Supercomputer Shown to Significantly Speed Up Genome Analysis

By LabMedica International staff writers
Posted on 06 Mar 2014
Investigators, working with one of the world’s fastest supercomputers designed for life sciences, recently reported that genome analysis could now be drastically accelerated.

The supercomputer, named Beagle, is based at Argonne National Laboratory (Argonne, IL, USA) and can analyze 240 full genomes in about two days. Although the time and cost of sequencing a complete human genome have nosedived, analyzing the resulting three billion base pairs of genetic data from a single genome can take many months.
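As a back-of-the-envelope illustration of the figures above, the reported run can be turned into an aggregate throughput. This is only a sketch: the function names are hypothetical, "about two days" is treated as exactly 48 hours, and the result is an average across genomes processed in parallel, not the time any single genome takes.

```python
# Rough throughput implied by the reported figures.
# Assumption: "about two days" is treated as exactly 2 days.
GENOMES_PER_RUN = 240   # genomes analyzed in one run (from the article)
RUN_DAYS = 2.0          # approximate wall-clock duration (from the article)


def genomes_per_day(genomes: int, days: float) -> float:
    """Average number of genomes completed per day of wall-clock time."""
    return genomes / days


def effective_minutes_per_genome(genomes: int, days: float) -> float:
    """Wall-clock minutes divided by genomes processed in the run.

    This is an aggregate figure for a parallel run, not the latency
    of analyzing one genome on its own.
    """
    return days * 24 * 60 / genomes


if __name__ == "__main__":
    print(genomes_per_day(GENOMES_PER_RUN, RUN_DAYS))
    print(effective_minutes_per_genome(GENOMES_PER_RUN, RUN_DAYS))
```

Under these assumptions the run averages 120 genomes per day, or roughly 12 minutes of wall-clock time per genome.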

Image: The Beagle computer at Argonne National Laboratory, one of the world’s fastest supercomputers designed for life sciences, is able to analyze 240 full genomes in about two days (Photo courtesy of the University of Chicago Medicine).

University of Chicago (IL, USA) scientists working with the Beagle supercomputer published their findings online on February 12, 2014, in the journal Bioinformatics. “This is a resource that can change patient management and, over time, add depth to our understanding of the genetic causes of risk and disease,” said study author Elizabeth McNally, MD, PhD, a professor of medicine and human genetics and director of the Cardiovascular Genetics clinic at the University of Chicago Medicine.

“The supercomputer can process many genomes simultaneously rather than one at a time,” said first author Megan Puckelwartz, a graduate student in McNally’s laboratory. “It converts whole genome sequencing, which has primarily been used as a research tool, into something that is immediately valuable for patient care.”

Because the genome is so large, investigators in clinical genetics have turned to exome sequencing, which focuses on the 2% or less of the genome that codes for proteins. This approach is often useful because an estimated 85% of disease-causing mutations are located in coding regions. However, the remaining 15% or so of clinically significant mutations come from noncoding regions, once referred to as “junk DNA” but now known to serve important functions. Were it not for the daunting data-processing challenges, whole genome sequencing would be the method of choice.

To evaluate the system, the scientists analyzed raw sequencing data from 61 human genomes on Beagle, using publicly available software packages and 25% of the computer’s total capacity. They found that shifting to the supercomputer setting improved accuracy and dramatically increased speed. “Improving analysis through both speed and accuracy reduces the price per genome,” Dr. McNally said. “With this approach, the price for analyzing an entire genome is less than the cost of looking at just a fraction of the genome. New technology promises to bring the costs of sequencing down to around USD 1,000 per genome. Our goal is to get the cost of analysis down into that range.”

“This work vividly demonstrates the benefits of dedicating a powerful supercomputer resource to biomedical research,” said co-author Dr. Ian Foster, director of the Computation Institute and a professor of computer science. “The methods developed here will be instrumental in relieving the data analysis bottleneck that researchers face as genetic sequencing grows cheaper and faster.”

The finding has immediate medical applications. Dr. McNally’s Cardiovascular Genetics clinic, for example, relies on comprehensive examination of the genes from an initial patient as well as a number of family members to understand, treat, and prevent disease. More than 50 genes can contribute to cardiomyopathy. Other genes can cause rhythm disorders, heart failure, or vascular problems. “We start genetic testing with the patient,” she said, “but when we find a significant mutation we have to think about testing the whole family to identify individuals at risk.”

The range of testable mutations has greatly expanded. “In the early days we would test one to three genes,” Dr. McNally said. “In 2007, we did our first five-gene panel. Now we order 50 to 70 genes at a time, which usually gets us an answer. At that point, it can be more useful and less expensive to sequence the whole genome.”

These genomic data, combined with meticulous attention to patient and family histories, “adds to our knowledge about these inherited disorders,” Dr. McNally said. “It can refine the classification of these disorders. By paying close attention to family members with genes that place them at increased risk, but who do not yet show signs of disease, we can investigate early phases of a disorder. In this setting, each patient is a big-data problem.”

Beagle, a Cray XE6 supercomputer housed in the Theory and Computing Sciences (TCS) building at Argonne National Laboratory, supports computation, simulation, and data analysis for the biomedical research community. It was named after the HMS Beagle, the ship that carried Charles Darwin on his celebrated scientific voyage in 1831.



