Nabarun Dasgupta presented two projects from the Opioid Data Lab: an algorithm to predict out-of-hospital death using insurance claims data, and an initiative to understand and communicate the contents of street drugs.
Introducing the first project, Dasgupta explained that more people are dying at home than at any time in decades, a long-term trend that has accelerated during the COVID-19 pandemic, and that there are several disparities between people who die at home and those who die in the hospital. The large observational datasets used in epidemiology often do not record deaths or disenrollment from health insurance plans, so Dasgupta and his team set out to create an algorithm that could predict death. The team used MarketScan, a large dataset of insurance records from Watson Health, together with records from the Social Security Administration, to create a sample of 400,000,000. The researchers used SuperLearner to build elastic net regression models that form an open source base model, then fine-tuned it by incorporating data and calendar artifacts alongside external studies. Dasgupta reported that applying the algorithm to distinguish death from disenrollment produced estimates that closely mirrored those based on validated death data.
The second project aims to reduce both the harm from and the use of street drugs by studying their ingredients. Dasgupta's team works with harm reduction programs and community partners around the country to receive samples of street drugs in the mail. They analyze the samples in detail using a mass spectrometer at UNC-Chapel Hill, link the results to specific clinical harms, and issue drug alerts written in plain language so that people can make more informed health decisions. Using natural language processing, the researchers have identified strains of the “same” street drug that have vastly different ingredients.
The team's next machine learning challenge is to compile a “Pantone book” of street drugs and record the effect of color on harm reduction, as the colors of street drugs have begun to vary significantly in recent years. Click here to view the talk on YouTube.


Nabarun Dasgupta, Senior Scientist

Department: UNC Injury Prevention Research Center | Faculty Profile

Featured on: May 26, 2022 (Event Page)

Session Title: Improving Health Outcomes (Event Recap)

Tools, Information, and Resources:

  • Opioid Data Lab: The Opioid Data Lab is a collaboration between the University of North Carolina at Chapel Hill, the University of Kentucky in Lexington, and the University of Florida aimed at studying opioid analgesics, stimulants, and heroin-fentanyl. The Opioid Data Lab collaborates with pain patients and people who use drugs to answer research questions with real-life implications.
  • UNC Street Drug Analysis Lab: The UNC Street Drug Analysis Lab is a public service of UNC-CH involving public health drug checking by mail. Results are typically returned in a week and samples are kept completely anonymous.
  • Opioid Data Lab GitHub: Check out the Jupyter Notebooks, Data, and Code shared publicly by the Opioid Data Lab. Datasets contain no personally identifying information and are either used with permission or only contain public data. Each set of code listed below is provided for public use, with the author's consent. Each code set contains a license that details sharing and reuse permissions. Code is written in SAS, Python, Stata, and/or R. These materials are open for use for anyone with an interest in reducing the harm from prescription opioids and heroin.
  • SuperLearner: SuperLearner is an algorithm that uses cross-validation to estimate the performance of multiple machine learning models, or the same model with different settings.
  • Excel: Microsoft Excel is the industry-leading spreadsheet software program, a powerful data visualization and analysis tool.
  • Stata: Stata is a complete, integrated software package that provides all your data science needs—data manipulation, visualization, statistics, and automated reporting.
  • Jupyter Notebooks: The Jupyter Notebook is the original web application for creating and sharing computational documents. It offers a simple, streamlined, document-centric experience.
  • Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
  • OneDrive: Save your files and photos to OneDrive and access them from any device, anywhere.
  • Longleaf Cluster: The Longleaf cluster is a Linux-based computing system available to researchers across the campus free of charge.
  • Squarespace: Create a customizable website or online store with an all-in-one solution from Squarespace.
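
The SuperLearner entry above describes using cross-validation to estimate how well several candidate models perform. The sketch below illustrates only that idea, in plain Python on synthetic data. It is not the SuperLearner package's API and not the Opioid Data Lab's model: the two candidate learners, the data, and every function name are hypothetical. It also stops at picking the single best candidate, whereas the full SuperLearner approach can go on to combine candidates into a weighted ensemble.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Hypothetical candidate 1: ignore x and always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Hypothetical candidate 2: one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def cv_mse(fit, xs, ys, k=5):
    """Average held-out mean squared error of `fit` across k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Every k-th observation, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        fold_errors.append(mean((ys[i] - model(xs[i])) ** 2 for i in test_idx))
    return mean(fold_errors)

# Synthetic data (an assumption for illustration): y depends linearly on x plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

candidates = {"mean_only": fit_mean, "least_squares_line": fit_line}
scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest cross-validated error
print(scores, best)
```

Because each candidate is scored only on observations it did not see during fitting, the comparison reflects out-of-sample performance rather than how closely a model can memorize its training data.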