Research

Research Direction

Our primary research efforts are centered on the development and utilization of artificial intelligence techniques and statistical methods to analyze large-scale healthcare data and social media data. This analysis is aimed at improving infectious disease population surveillance, population health informatics, clinical decision support, health care quality improvement, healthcare cost reduction, and pathology informatics.

Transfer Learning

As more machine-learned models are generated, the biggest challenge is that a model developed in one healthcare system (denoted as source) may be expected to underperform in another healthcare system (target). An innovative transfer learning framework is developed in our lab to enable sharing of both data and models between source and target. Our research enables knowledge-sharing under heterogeneous scenarios and provides an approach for understanding transfer learning performance between source and target in terms of differences in features and their distributions and sample sizes. The model-transfer algorithm can be viewed as a new Bayesian network learning algorithm that has a flexible representation of prior knowledge.

We are now continuing research on Bayesian and deep learning approaches to improve the re-usability of models. Re-using models is expected to benefit the public’s health by:

Improving case detection during epidemics by enabling the re-use of automatic case detectors developed in the earliest affected regions with other regions, and, more generally;
Increasing the impact of NIH’s investment in machine learning by enabling machine-learned models to be used in more institutions and locations.

Public Health Surveillance

Traditionally, public health surveillance relies mainly on sentinel physician reporting and laboratory reporting, which usually result in reporting delays and issues due to underreporting and undertesting. Our lab has developed a near real-time, active surveillance approach that uses machine-learned case detection systems to automatically capture cases from electronic medical records. A case detection system consists of a natural language processing parser (NLP) and a Bayesian network classifier. It uses NLP to infer the presence or absence of clinical findings from narrative notes. With these findings, a Bayesian network classifier infers each patient’s diagnosis probabilities, as well as the likelihood of patient clinical evidence, to support outbreak detection and forecast at the population level.

Our case detection systems enhance the communications between clinicians and public health officials, automatically inferring patients’ diagnoses from free-text clinical notes for population monitoring and automatically providing population prevalence information to support clinical decisions on differential diagnoses. Our research shows the impact of accurate natural language processing and feature selection on classification performance and demonstrates the advantages of automated predictive modeling versus experts’ simple judgment with regard to making correlations between NLP-parsed clinical findings and disease status. Regarding model parameterization, our experiments also show that using a changing prevalence of disease could increase the discriminative ability of an influenza detection model compared to using a constant prevalence. Moreover, we demonstrated high performance for influenza detection in a five-year retrospective study in both Allegheny County and Salt Lake County.

Challenges, Opportunities of Artificial Intelligence in Biomedicine

Growing numbers of artificial intelligence applications are being developed and applied to biomedicine. These technologies introduce risks and benefits that must be assessed and managed. In one collaborative article, we discussed how long-standing principles of medical and scientific ethics can be applied to artificial intelligence. In another collaborative article, we focus on the role of AI in clinical and translational research (CTR), including preclinical research, clinical research, clinical implementation, and public (or population) health. For each CTR phase, we addressed challenges, successes, failures, and opportunities for AI. We present three complementary perspectives: (1) scoping literature review, (2) survey, and (3) analysis of federally funded projects. In a collaborative review paper, we presented the existing sustainability models for open-source software (OSS) and describes 10 OSS use cases, including 3D Slicer, Bioconductor, Cytoscape, Globus, i2b2 (Informatics for Integrating Biology and the Bedside) and tranSMART, Insight Toolkit, Linux, Observational Health Data Sciences and Informatics tools, R, and REDCap (Research Electronic Data Capture), in 10 sustainability aspects: governance, documentation, code quality, support, ecosystem collaboration, security, legal, finance, marketing, and dependency hygiene.

Evaluation of Information Systems

The development of a medical information system must include comprehensive evaluation of its successes and deficiencies when applied in real-world scenarios. During her internship at the U.S. CDC, Ye designed an evaluation plan to study the effectiveness of an EMR alerting service. This plan showed how to take into account the objectives of major stakeholders, select an evaluation framework, identify evaluation elements, and clarify measurement methods. In the plan, Ye recommends the evaluation of causal relationships among input, process, and outcomes, as well as correlations between objective and subjective indicators. These causal relationships can indicate the validity of many measurements. This evaluation could be easily adapted for evaluation of other information systems.

Working History in Mesothelioma Patients

Malignant mesothelioma is a devastating cancer that is often caused by exposure to certain mineral fibers, particularly asbestos, in the environment or workplace. To better understand the risk factors associated with this disease, the National Mesothelioma Virtual Bank (NMVB) has been collecting detailed information on the industry, occupation, and exposure history of mesothelioma patients across the United States. This collaborative effort, led by the Departments of Biomedical Informatics and Pathology at the University of Pittsburgh and funded by a grant from the National Institute of Occupational Health and Safety of the Centers for Disease Control and Prevention, aims to provide a valuable resource for studying the high-risk industries and jobs that may put workers at risk for mesothelioma. Our laboratory is working in partnership with the CDC NIOSH to standardize the data collected by the NMVB, describe the natural history of patient cohorts' occupations, compare work history differences between genders, and explore potential risk factors that may impact the survival time of mesothelioma patients. Additionally, we are pleased to provide feedback and improvement suggestions on data collection when utilizing the NMVB cohort for research purposes.

R00LM013383 Transfer learning to improve the re-usability of computable biomedical knowledge
05/01/22-04/30/25 (PI: Dr. Ye Ye)
K99LM013383 Transfer Learning to Improve the Re-usability of Computable Biomedical Knowledge
05/04/20-04/30/22 (PI: Dr. Ye Ye, Mentors: Dr. Michael Becich, Dr. Gregory Cooper, Dr. Michael Wagner)
U24 OH009077 National Mesothelioma Virtual Bank, Continued Innovation (PI: Dr. Michael Becich)
09/30/16-09/29/21
R01LM013509 Automated Surveillance of Overlapping Outbreaks and New Outbreak Diseases (PI: Dr. Gregory Cooper)
08/01/21-09/29/24
R01 LM011370 Probabilistic Disease Surveillance (PI: Dr. Michael Wagner)
08/01/13-08/31/16, 05/01/17-08/31/17 (Ye was a GSR in this project)
Case Detection Using Natural Language Processing and Bayesian Network Classifiers Arts and Sciences Fellowship, University of Pittsburgh
09/01/16-04/31/17
Transfer Learning for Bayesian Case Detection Systems Andrew Mellon Fellowship, University of Pittsburgh
09/01/17-04/31/18

Artificial Intelligence in Population Health Lab