New Metagenomics Analysis Tool Reduces False Discovery Rates

By LabMedica International staff writers
Posted on 31 Mar 2015
Genomic researchers recently described a new tool for analyzing the complex data generated during DNA screens of mixed populations of organisms, such as the human gut microbiome.

DNA screening of entire communities of organisms has been termed metagenomics. Such screening generates an enormous data set of short sequences, or "reads," which must be evaluated in order to yield meaningful information. While existing microbial community profiling methods have attempted to rapidly classify the millions of reads output from modern DNA sequencing platforms, the combination of incomplete databases, similarity among otherwise divergent genomes, errors and biases in sequencing technologies, and the large volumes of sequencing data required for metagenome sequencing has led to unacceptably high false discovery rates (FDR).

Image: Many molecular biology studies begin with purified DNA and RNA extracted from complex environments such as the human gut (Photo courtesy of Los Alamos [US] National Laboratory).

To correct these problems, investigators at the Los Alamos National Laboratory (New Mexico, USA) developed a new method for analysis of DNA sequencing data. The new tool, described in the March 12, 2015, online edition of the journal Nucleic Acids Research, is called Genomic Origins through Taxonomic CHAllenge, or GOTTCHA. It makes use of a database of reference genomes that have been preprocessed to retain only those genome segments that are unique at a given level of taxonomy.
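The idea of keeping only unique genome segments can be illustrated with a toy sketch. This is not GOTTCHA's actual preprocessing pipeline: the function name unique_kmers, the use of fixed-length k-mers, and the two short example sequences are all assumptions made for illustration only.

```python
from collections import defaultdict


def unique_kmers(genomes, k):
    """For each genome, return the k-mers that occur in no other genome.

    Toy illustration only: fixed-length k-mers stand in for the
    "unique segments" described in the article.
    """
    owners = defaultdict(set)  # k-mer -> names of genomes containing it
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    unique = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:  # seen in exactly one genome
            unique[next(iter(names))].add(kmer)
    return unique


# Hypothetical reference sequences, far shorter than real genomes.
genomes = {"org_a": "ACGTACGGA", "org_b": "ACGTTTGGA"}
signatures = unique_kmers(genomes, 4)
print(sorted(signatures["org_a"]))
print(sorted(signatures["org_b"]))
```

In this sketch the shared 4-mer ACGT is discarded from both organisms, so a read matching it could not be credited to either one.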

GOTTCHA analyzes the distribution and depth of coverage of only the unique fraction of each reference genome (the unique genome) to identify the true community composition and the accurate relative abundance of members of the community. GOTTCHA uses empirically derived coverage limits, supported by machine-learning approaches, to set the limits of detection. The result, according to the developers, is a scalable, all-purpose metagenomic community profiler with classification and statistical performance superior to that of all currently available tools.
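A minimal sketch of coverage-based profiling follows. It assumes reads have already been mapped to each organism's unique sequence as half-open (start, end) intervals. The function names, the min_breadth cutoff of 0.5, and the sample data are hypothetical and are not taken from GOTTCHA, whose limits are described as empirically derived.

```python
def coverage_stats(unique_len, hits):
    """Return (breadth, mean depth) of read coverage over a unique region.

    hits is a list of half-open (start, end) intervals on that region.
    """
    depth = [0] * unique_len
    for start, end in hits:
        for pos in range(max(start, 0), min(end, unique_len)):
            depth[pos] += 1
    breadth = sum(1 for d in depth if d > 0) / unique_len
    mean_depth = sum(depth) / unique_len
    return breadth, mean_depth


def profile(samples, min_breadth=0.5):
    """Drop organisms below the breadth cutoff, then report each remaining
    organism's share of total mean depth as its relative abundance.

    min_breadth is an illustrative value, not a GOTTCHA parameter.
    """
    kept = {}
    for name, (unique_len, hits) in samples.items():
        breadth, mean_depth = coverage_stats(unique_len, hits)
        if breadth >= min_breadth:
            kept[name] = mean_depth
    total = sum(kept.values())
    return {name: d / total for name, d in kept.items()} if total else {}


# Hypothetical mapping results: unique-region length and read intervals.
samples = {
    "org_a": (10, [(0, 5), (3, 10), (0, 10)]),
    "org_b": (10, [(0, 10)]),
    "org_c": (10, [(0, 2)]),  # only 20% of its unique region is covered
}
abundance = profile(samples)
print(abundance)
```

Here org_c is treated as a likely false positive because reads touch only a small part of its unique region, which is the intuition behind using breadth of coverage to limit false discoveries.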

"We have developed a new tool in this rapidly expanding and evolving field of what is called metagenomics," said senior author Dr. Patrick Chain, metagenomics team leader at the Los Alamos National Laboratory. "It uses nucleic acid data and looks for sections that map uniquely to a preconstructed database."

"Metagenomics is the study of entire microbial communities using genomics, such as when you sequence the DNA of a whole community of organisms at once," said Dr. Chain. "The result is an enormous data set of short sequences, or reads, that you need to sort through to try to understand which organisms are actually present, and what they may be doing. Here at Los Alamos, we specialize in incredibly large data sets; we know how to handle them whether it is for physics, ocean, or climate modeling, or for complex biological insights."

The GOTTCHA software, associated databases, and training datasets are accessible to biotech researchers online (please see Related Links below).

Related Links:
Los Alamos [US] National Laboratory
GOTTCHA


