AACC Competition Demonstrates How Labs Can Use Data Analytics to Solve Real Problems
Posted on 17 Oct 2022
Clinicians rely on parathyroid hormone-related peptide (PTHrP) measurement to help establish a diagnosis of humoral hypercalcemia of malignancy, a complication of cancer that causes, among other things, high levels of calcium in the blood. The problem: clinicians often order the test for patients with low pretest probability. Excessive PTHrP testing can lead to expensive, unnecessary, and potentially harmful procedures, including invasive testing to locate a possibly nonexistent cancerous tumor. A successful predictive algorithm would help laboratorians quickly and accurately identify potentially inappropriate PTHrP test orders by predicting whether laboratory data available at the time of the order already suggest an abnormal PTHrP result. A machine-learning challenge introduced for the first time by the American Association for Clinical Chemistry (Washington, DC, USA; www.aacc.org) at the 2022 AACC Annual Scientific Meeting & Clinical Lab Expo demonstrated how laboratories can use data analytics to solve real problems like this one facing patients and clinicians.
The Predicting PTHrP Results Competition was introduced by the AACC at the event in association with the informatics section in the department of pathology and immunology of Washington University School of Medicine, St. Louis (WUSM, St. Louis, MO, USA). It aimed to engage the community of laboratory medicine practitioners in a fun and friendly online environment where they could practice their data analytics skills, learn from each other, and see how others approach problems on the data-driven side of laboratory medicine. Competition participants formed teams and built their predictive algorithms using securely shared, real, de-identified clinical data from PTHrP orders at WUSM, termed the “practice dataset”. The use of real clinical data was notable because most machine-learning competitions use synthesized datasets. Organizers set up the competition on Kaggle, a popular online platform for machine-learning modeling and contests, and selected the F1 score (the harmonic mean of precision and recall, where recall is the same as sensitivity) as the performance metric.
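For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score is computed from predicted and actual labels, using the standard definition (harmonic mean of precision and recall). The function names and the example labels are illustrative assumptions, not the competition's scoring code or data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives (1 = abnormal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2 * precision * recall / (precision + recall)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0  # no correct positive calls: F1 is 0 by convention
    precision = tp / (tp + fp)  # fraction of predicted positives that are real
    recall = tp / (tp + fn)     # fraction of real positives that were found
    return 2 * precision * recall / (precision + recall)


# Made-up labels for eight hypothetical orders (1 = abnormal PTHrP result).
actual    = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
score = f1_score(actual, predicted)  # precision 2/3, recall 2/3
```

True negatives do not enter the calculation, which is why F1 is often preferred when the positive class (here, abnormal results) is the one that matters and may be uncommon.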

A major challenge for the teams was developing a predictive model that achieved high accuracy without overfitting to the public dataset (the practice dataset). An overfitted algorithm works well on the initial data but fails when applied to new data, meaning it is not generalizable. To guard against this, organizers used a second, private dataset to judge each algorithm’s effectiveness. From May to June 2022, 24 teams ran a total of 395 iterations of their predictive models through the public dataset. Each time a team submitted a predictive model for an attempt, they used the resulting F1 score to improve, or “train”, the model. For the final attempt, each team ran their predictive model through the private dataset. The winning team, Team Kagglist, achieved an F1 score of 0.9 with their predictive model. For reference, WUSM’s manual approach for identifying patients likely to have an abnormal PTHrP result had an F1 score of 0.6, making the algorithm a significant improvement over standard practice.
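The value of a held-out private dataset can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the feature values, the cutoffs, and both "models" are invented, carry no clinical meaning, and are not drawn from the competition. One model simply memorizes the public records; the other applies a simple rule. Both look perfect on the public data, and only the private data separates them.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[float, float]        # (calcium, PTH) - hypothetical feature pair
Labeled = List[Tuple[Record, int]]  # label 1 = abnormal PTHrP result


def f1(data: Labeled, predict: Callable[[Record], int]) -> float:
    """F1 score of a prediction function over labeled records."""
    tp = sum(1 for x, y in data if y == 1 and predict(x) == 1)
    fp = sum(1 for x, y in data if y == 0 and predict(x) == 1)
    fn = sum(1 for x, y in data if y == 1 and predict(x) == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Invented toy data; values are illustrative only.
public: Labeled = [((13.5, 8), 1), ((12.8, 12), 1), ((10.9, 45), 0),
                   ((11.2, 60), 0), ((14.1, 5), 1), ((10.5, 38), 0)]
private: Labeled = [((13.9, 10), 1), ((12.6, 15), 1), ((11.0, 50), 0),
                    ((10.7, 42), 0), ((13.2, 9), 1)]

# Overfitted "model": a lookup table of the public records, 0 for anything unseen.
lookup: Dict[Record, int] = dict(public)


def memorizer(x: Record) -> int:
    return lookup.get(x, 0)


# Simpler "model": a threshold rule with made-up cutoffs.
def rule(x: Record) -> int:
    calcium, pth = x
    return 1 if calcium > 12.0 and pth < 20 else 0


results = {
    "memorizer": (f1(public, memorizer), f1(private, memorizer)),
    "rule": (f1(public, rule), f1(private, rule)),
}
for name, (pub, priv) in results.items():
    print(f"{name}: public F1 = {pub:.2f}, private F1 = {priv:.2f}")
```

In this contrived case the memorizer scores 1.00 on the public data and 0.00 on the private data, while the rule scores 1.00 on both. Real overfitting is rarely this extreme, but the same gap between the two scores is what a private dataset is meant to expose.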
“We shouldn’t expect a predictive model trained on data from one hospital to automatically work at other hospitals,” said Team Kagglist’s Yingheng Wang. “Ultimately, we should aim to create adaptive models that can be fine-tuned by other institutions for their specific populations.”
“The quality of all 24 models was excellent and showed a high degree of accuracy for the very difficult task we challenged participants with,” said competition organizer Mark Zaydman, MD, PhD, an assistant professor of pathology and immunology at WUSM. “This competition really showed our community is ready to engage with sophisticated machine learning and data analytics tools.”
Related Links:
AACC