AI-Based Tool Uses Tumor Gene Sequencing Data to Identify Site of Origin for Enigmatic Cancers

By LabMedica International staff writers
Posted on 08 Aug 2023

For a small segment of cancer patients, the origin of their cancer remains unknown. This makes it difficult to select the most effective treatment, because most cancer medications are designed for specific cancer types. Researchers have now developed a machine-learning method to pinpoint the origins of these elusive cancers. The computational model analyzes the sequences of roughly 400 genes to predict a tumor's site of origin. In a dataset of about 900 patients with such tumors, the model made high-confidence predictions for 40% of them, potentially increasing the number of patients eligible for genomically guided, targeted treatment 2.2-fold.

In 3 to 5% of cancer cases, especially when tumors have metastasized throughout the body, the site where the cancer originated cannot be identified. These tumors are termed cancers of unknown primary (CUP). Not knowing their origin hampers the use of "precision" drugs, which are tailored to specific cancer types. These precision medications are not only more effective but also have fewer side effects than the more general treatments often prescribed to CUP patients. To address this issue, researchers from the Massachusetts Institute of Technology (MIT, Cambridge, MA, USA) and Dana-Farber Cancer Institute (Boston, MA, USA) used genetic data routinely collected at Dana-Farber to predict cancer types. The data consisted of sequences of about 400 genes that are frequently mutated in cancer. Using these data, the researchers trained a machine-learning model on nearly 30,000 patients diagnosed with one of 22 known cancer types.


Image: The AI model can help determine where a patient’s cancer arose (Photo courtesy of Freepik)

When this model, called OncoNPC, was tested on 7,000 tumors it had never seen before but whose site of origin was known, it predicted their origin with about 80% accuracy. For the tumors on which it made high-confidence predictions, about 65% of the total, accuracy rose to 95%. The researchers then applied the model to around 900 CUP tumors from Dana-Farber and obtained high-confidence origin predictions for 40% of them. Further, when they compared the model's predictions with germline (inherited) mutations in tumors for which such data were available, the team found that the model's predictions matched the cancer type most strongly suggested by the germline mutations more often than they matched any other cancer type.
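To make the relationship between the overall and the high-confidence figures concrete, here is a minimal Python sketch of how accuracy on a confident subset of predictions can be computed. It is not OncoNPC's code: the function name `selective_accuracy`, the rule that a prediction counts as "high-confidence" when its top class probability clears a threshold, and the toy data are all hypothetical illustrations. The last lines take the article's rounded figures (80% overall, 95% on the 65% high-confidence subset) at face value and back out the implied accuracy on the remaining lower-confidence tumors.

```python
def selective_accuracy(probs, labels, threshold):
    """Return (coverage, confident_accuracy, overall_accuracy).

    probs: one list of class probabilities per tumor.
    labels: true class index per tumor.
    threshold: hypothetical cut-off on the top class probability
    that marks a prediction as "high-confidence".
    """
    n = len(labels)
    correct_all = correct_conf = n_conf = 0
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)  # most probable class
        hit = pred == y
        correct_all += hit
        if p[pred] >= threshold:  # keep only confident calls
            n_conf += 1
            correct_conf += hit
    coverage = n_conf / n
    conf_acc = correct_conf / n_conf if n_conf else float("nan")
    return coverage, conf_acc, correct_all / n


# Toy data (hypothetical): 4 tumors, 2 candidate cancer types.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
labels = [0, 1, 1, 0]
print(selective_accuracy(probs, labels, threshold=0.7))  # (0.5, 1.0, 0.75)

# Back out accuracy on the low-confidence remainder from the article's
# rounded figures: overall = coverage * conf_acc + (1 - coverage) * rest_acc
overall, coverage, conf_acc = 0.80, 0.65, 0.95
rest_acc = (overall - coverage * conf_acc) / (1 - coverage)
print(round(rest_acc, 2))  # 0.52
```

With these rounded inputs the implied accuracy on the lower-confidence 35% of tumors is about 52%; because the inputs are rounded, this is only a rough estimate.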

The researchers further validated the model's accuracy by comparing CUP patients' survival times with the typical prognosis for the cancer type the model predicted. For instance, CUP patients predicted to have a cancer with a poor prognosis, such as pancreatic cancer, did have comparatively shorter survival, while those predicted to have a cancer with a more favorable prognosis, such as neuroendocrine tumors, lived longer. The researchers also examined whether the model's predictions could be useful by looking at the treatments the CUP patients in the study had received. Around 10% of these patients had received targeted treatment based on their oncologists' best guess about where their cancer had originated. Among these patients, those whose treatment was consistent with the cancer type the model predicted fared better than those who received a treatment generally given for a different cancer type than the one the model predicted.

Furthermore, the researchers used the model to identify an additional 15% of patients (a 2.2-fold increase) who could have benefited from targeted treatments had their cancer's origin been known; instead, these patients were treated with general chemotherapy drugs. The team now plans to expand the model to incorporate other data types, such as pathology and radiology images, so that it can make predictions from multiple data modalities. This would give the model a more comprehensive view of each tumor, allowing it to predict not only the tumor type but also the most appropriate treatment.

“That was the most important finding in our paper, that this model could be potentially used to aid treatment decisions, guiding doctors toward personalized treatments for patients with cancers of unknown primary origin,” said Intae Moon, an MIT graduate student in electrical engineering and computer science who was the lead author of the new study.
