Machine Learning Reveals Consistent Gut Microbiome Patterns in Colorectal Cancer
Posted on 27 Jun 2026
Colorectal cancer has been repeatedly linked to alterations in the gut microbiome, yet findings have often varied across small, heterogeneous studies. Reproducibility has been limited by differing sequencing methods and cohorts, complicating efforts to define reliable noninvasive signals. Early-stage disease and precancerous adenomas are especially challenging to detect from stool profiles. Researchers now report a large-scale analysis identifying a robust colorectal cancer–associated microbiome signature and a unifying analytical framework.
At the European Molecular Biology Laboratory (EMBL), investigators working within the Mi-EOCRC consortium spanning Germany, Switzerland and the Netherlands characterized a reproducible microbial signature of colorectal cancer. The effort reanalyzed 27 independent studies comprising 6,779 publicly available gut microbiome profiles from stool and assessed 906 intestinal tissue samples to compare fecal signals with microbes present in tumors. The resulting signature was consistent across populations, sequencing methods and age at diagnosis, including both early-onset and late-onset disease.

The team developed computational approaches to integrate datasets generated by different sequencing modalities at scale and trained a machine-learning classifier that assigns a “cancer-like” score to any human gut microbiome. Using taxonomic profiling down to bacterial strains and functional analyses of virulence factors, the approach could be applied retrospectively to diverse cohorts, including datasets not originally recruited for colorectal cancer research. Microbes enriched in tumor tissue mirrored the stool-based signature, indicating that tumor-associated taxa can be detected directly at the lesion site.
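To make the idea of a "cancer-like" score concrete, the sketch below shows one simple way such a score could be computed from a stool profile: relative abundances are log-ratio transformed and passed through a logistic model. It is an illustration only, not the classifier used in the study. The weights, the bias, the pseudocount and the placeholder taxon names are all hypothetical, and the choice of a centered log-ratio transform is an assumption.

```python
import math

# Hypothetical weights for illustration only; NOT values from the study.
# A positive weight marks a taxon assumed to be enriched in colorectal cancer,
# a negative weight a taxon assumed to be depleted.
ILLUSTRATIVE_WEIGHTS = {
    "Fusobacterium nucleatum subsp. animalis": 2.0,
    "hypothetical_depleted_taxon": -1.0,
}
ILLUSTRATIVE_BIAS = 0.0   # assumed intercept
PSEUDOCOUNT = 1e-6        # assumed value; avoids log(0) for absent taxa


def clr_transform(abundances, pseudocount=PSEUDOCOUNT):
    """Centered log-ratio transform of a relative-abundance profile."""
    if not abundances:
        raise ValueError("profile must contain at least one taxon")
    logs = {taxon: math.log(value + pseudocount)
            for taxon, value in abundances.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {taxon: value - mean_log for taxon, value in logs.items()}


def cancer_like_score(abundances,
                      weights=ILLUSTRATIVE_WEIGHTS,
                      bias=ILLUSTRATIVE_BIAS):
    """Return a score between 0 and 1; higher means more 'cancer-like'."""
    features = clr_transform(abundances)
    # Taxa missing from the profile contribute 0, i.e. they are treated
    # as sitting at the profile's geometric mean (a simplification).
    linear = bias + sum(weight * features.get(taxon, 0.0)
                        for taxon, weight in weights.items())
    return 1.0 / (1.0 + math.exp(-linear))


# Two made-up profiles (relative abundances summing to 1).
profile_high = {
    "Fusobacterium nucleatum subsp. animalis": 0.10,
    "hypothetical_depleted_taxon": 0.01,
    "other": 0.89,
}
profile_low = {
    "Fusobacterium nucleatum subsp. animalis": 0.0,
    "hypothetical_depleted_taxon": 0.20,
    "other": 0.80,
}

print(cancer_like_score(profile_high))
print(cancer_like_score(profile_low))
```

Because a function like this needs only an abundance profile as input, it can be applied to any existing stool dataset, which is what makes retrospective scoring of cohorts not recruited for colorectal cancer research possible. A real classifier would learn its weights from many studies rather than use hand-set values.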
Detection patterns varied by specimen type and disease stage. Cancer-associated microbes were already detectable in tissue from early-stage tumors, whereas stool-based detection was less accurate for early-stage cancers and for tumors located more proximally in the colon. Precancerous adenomas exhibited weaker and less consistent microbial shifts, and machine-learning classifiers trained for adenoma detection performed inconsistently across cohorts. Compared with existing noninvasive screening approaches, microbiome-based classifiers did not match the performance of the fecal immunochemical test.
Dietary analyses linked a stronger colorectal cancer–associated microbiome pattern to lower fiber intake, while increased fiber intake in intervention studies reduced the cancer-like score. Strain-level resolution highlighted important differences among Fusobacterium taxa: Fusobacterium nucleatum subsp. animalis showed consistent enrichment across continents, whereas other Fusobacterium species and subspecies displayed geographically heterogeneous patterns, including taxa found almost exclusively in cancer patients from Asia.
The study, published in Cell Host & Microbe in 2026, underscores the value of open data for large-scale evidence synthesis and provides a reference resource to support future investigations into detection, risk assessment and prevention.
“The strength of this study is its comprehensiveness. We combined stool and tissue comparisons, dietary data, taxonomic analysis down to bacterial strains, and functional analysis of virulence factors,” said Georg Zeller, Visiting Team Leader at EMBL Heidelberg and Professor at the Leiden University Medical Center (LUMC).
“These results suggest that colorectal cancer-associated changes in the microbiome may appear early in disease development and raise the question of how the tumor shapes the microbiome and how the microbes impact the tumor microenvironment through signaling, metabolic and other interactions,” said Michael Zimmermann, Group Leader at EMBL Heidelberg.