Executive Summary
Introduction and Background
The Training Data for Machine Learning to Enhance PCOR Data Infrastructure project (hereafter, the Project), led by the Office of the National Coordinator for Health Information Technology (ONC), conducted foundational work to support future applications of artificial intelligence (AI), specifically machine learning (ML), to further health, health care, and patient-centered outcomes research (PCOR), and in turn to enhance the adoption and implementation of a PCOR data infrastructure. The Project is funded through the PCOR Trust Fund (PCORTF), which was established under the Patient Protection and Affordable Care Act of 2010 and is managed by the Department of Health and Human Services (HHS) Assistant Secretary for Planning and Evaluation (ASPE), the office that leads projects to build PCOR data capacity and infrastructure.
A major challenge in advancing AI/ML applications that accelerate clinical innovation and support evidence-based decisions in clinical settings is the lack of high-quality training data. To address this challenge, ONC partnered with the National Institutes of Health (NIH) National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) to define and develop high-quality training datasets, which were provisionally tested using three ML algorithms. The Project used data from the United States Renal Data System (USRDS) to prepare these training datasets and to apply ML techniques to an end-stage kidney disease (ESKD)/end-stage renal disease (ESRD) use case. A key aspect of implementing the Project was the engagement of a technical expert panel (TEP) made up of AI/ML and health information technology experts and a patient advocate. The TEP played a crucial role in vetting the criteria for high-quality training datasets, as well as the methods and results of building the training datasets and ML models.
The resources generated by this Project are available in the Implementation Guide and this Final Report. They include the detailed methodology and the code that was developed, points to consider when building training datasets and ML models, and recommendations for future projects gathered from the TEP. Disseminating these resources further promotes the broader application of AI/ML by PCOR researchers.
Development of High-Quality Training Datasets and ML Models
The use case, predicting mortality in the first 90 days of dialysis, was selected because mortality among ESKD/ESRD patients remains notably high during the first 90 days after dialysis initiation. From a patient-centered perspective, an ML model that predicts 90-day mortality could inform joint patient-provider clinical decisions about whether to initiate dialysis.
The overall dataset was prepared using USRDS variables with clinical relevance and prognostic value for mortality in the first 90 days after dialysis initiation. The criteria for high-quality training datasets were defined with input from the TEP and other stakeholders and included applying inclusion/exclusion criteria for cohort selection, structuring and curating the data so that missing values and outliers were handled appropriately, scaling and balancing the data, and preparing a data dictionary covering all features selected for ML modeling. The training dataset contained one record per patient and 188 features, all restricted to information known on or before the first day of dialysis. The features fall into two sets: features taken directly from the USRDS data and constructed features.
Three ML algorithms, a mixture of non-parametric and parametric methods, were selected with guidance from the TEP to provisionally test the training datasets and develop ML models: eXtreme gradient boosting (XGBoost), logistic regression, and multilayer perceptron (MLP). Both the non-imputed and the multiply imputed datasets were used for XGBoost modeling to assess the contribution of multiple imputation to model performance, whereas only the multiply imputed dataset was used for logistic regression and MLP, because these algorithms cannot natively handle missing values. Because the models have different input requirements, additional data processing steps were performed, including one-hot encoding, standardization, and balancing. Hyperparameters were tuned on the training dataset; the final models were then trained on the training dataset and evaluated on the testing dataset.
Model performance, measured by the area under the receiver operating characteristic curve (ROC AUC), was high, ranging from 0.812 to 0.827. Calibration plots of observed versus estimated risk for the XGBoost models indicated accurately estimated probabilities of mortality across the full range of predicted risk. Features ranked in the top 10 by XGBoost and logistic regression included indicators of general health status, length of time prior to ESKD/ESRD, and the quality of care delivered. A fairness assessment comparing ROC AUC across demographic categories (age, race, sex) and initial dialysis modality showed that XGBoost performed more consistently across these categories than the logistic regression and MLP models.
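As a minimal sketch of how the outcome for this use case could be labeled, the Python function below flags a patient as a 90-day death when the recorded death date falls within 90 days of the first dialysis date. The function name `mortality_label`, the date-based inputs, and the decisions to count day 90 as inside the window and to ignore censoring are illustrative assumptions, not the Project's documented outcome definition.

```python
from datetime import date
from typing import Optional

HORIZON_DAYS = 90  # prediction window after dialysis initiation (assumed inclusive of day 90)


def mortality_label(first_dialysis: date, death: Optional[date]) -> int:
    """Return 1 if death occurred within HORIZON_DAYS of the first dialysis day, else 0.

    Hypothetical helper. Assumptions: `death` is None when no death is recorded,
    the first dialysis day is day 0, and every patient without a recorded death
    was followed for the full window (no censoring logic is applied here).
    """
    if death is None:
        return 0
    days_elapsed = (death - first_dialysis).days
    if days_elapsed < 0:
        raise ValueError("death date precedes first dialysis date")
    return int(days_elapsed <= HORIZON_DAYS)
```

In practice, patients whose follow-up ends before day 90 without a recorded death would need an explicit rule (exclusion or censoring), which this sketch does not implement.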
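The requirement that features use only information known on or before the first day of dialysis, with one record per patient, can be illustrated with the hypothetical helper below. It assumes long-format observations of the form `(patient_id, observation_date, feature_name, value)` and keeps the most recent eligible value of each feature; both the input layout and the "latest value wins" rule are assumptions for illustration rather than the USRDS file structure or the Project's actual code.

```python
from datetime import date
from typing import Dict, List, Tuple

Observation = Tuple[str, date, str, float]  # (patient_id, observation_date, feature, value)


def build_patient_rows(
    observations: List[Observation], first_dialysis: Dict[str, date]
) -> Dict[str, Dict[str, float]]:
    """Collapse long-format observations into one feature row per cohort patient.

    Observations dated after a patient's first dialysis day are discarded so that no
    post-initiation information leaks into the features. If a feature was recorded
    more than once, the most recent eligible value is kept. Features never observed
    stay absent and are left for the missing-value handling step.
    """
    latest: Dict[str, Dict[str, Tuple[date, float]]] = {pid: {} for pid in first_dialysis}
    for patient_id, obs_date, feature, value in observations:
        start = first_dialysis.get(patient_id)
        if start is None or obs_date > start:
            continue  # not in the cohort, or recorded after dialysis initiation
        previous = latest[patient_id].get(feature)
        if previous is None or obs_date >= previous[0]:
            latest[patient_id][feature] = (obs_date, value)
    # Drop the dates and keep only the feature values.
    return {
        pid: {feat: val for feat, (_, val) in feats.items()}
        for pid, feats in latest.items()
    }
```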
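The sketch below shows, in plain Python, the three additional processing steps named above: one-hot encoding of a categorical feature, standardization whose mean and standard deviation are learned from the training data only, and balancing. The balancing method is not specified in this summary, so simple random oversampling of the minority class is used as an assumed stand-in; all function names and the example category labels are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def one_hot(values: Sequence[str], categories: Sequence[str]) -> List[List[int]]:
    """Encode each categorical value as 0/1 indicators, one column per category."""
    return [[int(v == c) for c in categories] for v in values]


def fit_standardizer(train_column: Sequence[float]) -> Tuple[float, float]:
    """Learn mean and standard deviation from the TRAINING data only."""
    mean = statistics.fmean(train_column)
    sd = statistics.pstdev(train_column) or 1.0  # avoid division by zero for constant columns
    return mean, sd


def standardize(column: Sequence[float], params: Tuple[float, float]) -> List[float]:
    """Apply previously fitted (mean, sd) to any split, training or testing."""
    mean, sd = params
    return [(x - mean) / sd for x in column]


def random_oversample(
    rows: Sequence[list], labels: Sequence[int], seed: int = 0
) -> Tuple[list, List[int]]:
    """Duplicate random minority-class rows until both classes have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(rows), list(labels)  # nothing to resample from
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]
```

Standardization parameters fitted on the training data would then be reused to transform the testing dataset, and balancing would be applied to the training data only, so that the testing dataset keeps its original class distribution.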
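To make the evaluation metrics concrete, the sketch below computes ROC AUC from the rank-sum (Mann-Whitney) formulation, repeats it within subgroups in the spirit of the fairness assessment, and tabulates mean predicted risk against the observed event rate per probability bin, which is what a calibration plot displays. The function names, the use of equal-width bins, and skipping subgroups that contain only one outcome class are assumptions for illustration; outcome labels are assumed to be coded 0/1.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney rank-sum statistic; tied scores get average ranks."""
    n = len(y_score)
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auc_by_group(
    y_true: Sequence[int], y_score: Sequence[float], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """ROC AUC within each subgroup (e.g., age band, race, sex, initial modality)."""
    buckets: Dict[Hashable, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for y, s, g in zip(y_true, y_score, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    # Subgroups containing only one outcome class have no defined AUC and are skipped.
    return {
        g: roc_auc(ys, ss)
        for g, (ys, ss) in buckets.items()
        if 0 < sum(ys) < len(ys)
    }


def calibration_bins(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> List[Tuple[float, float]]:
    """(mean predicted risk, observed event rate) for each non-empty equal-width bin."""
    totals: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for y, p in zip(y_true, y_prob):
        b = min(int(p * n_bins), n_bins - 1)  # a probability of exactly 1.0 joins the top bin
        totals[b][0] += p
        totals[b][1] += y
        totals[b][2] += 1
    return [(s / c, e / c) for _, (s, e, c) in sorted(totals.items())]
```

Established libraries offer equivalent functionality, for example scikit-learn's `roc_auc_score` and `calibration_curve`, which would be the more practical choice on a dataset of USRDS size.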
Recommendations for Supporting the Future Application of ML to Health, Health Care, and PCOR
A major objective of this foundational Project was to identify areas for future PCOR studies based on the challenges encountered and the findings from building the training datasets and ML models. To that end, throughout the Project, the TEP and other stakeholders provided substantial input and multiple recommendations for building on its outputs and outcomes. These recommendations are detailed in this Final Report and include general strategic recommendations for industry to consider in advancing the application of AI/ML for PCOR and health care, as well as more specific, pragmatic recommendations for future PCOR researchers building on the training dataset and ML models developed in this Project.
Conclusion
The Project addressed the goal of building and testing high-quality training datasets for a kidney disease use case that could potentially be used in AI/ML applications, including informed decision making shared by clinicians and patients. PCOR researchers can build on the foundational work completed through this Project, extend these methods to a wider array of use cases, and advance the application of ML to enhance PCOR data infrastructure.