Software Developed to Accelerate Genome Research

By Biotechdaily staff writers
Posted on 24 Apr 2008
A U.S. research team has developed software that can analyze half a million DNA sequences in 10 minutes.

The proprietary PyroBayes software from the Marth laboratory (Boston College [BC]; Chestnut Hill, MA, USA) is one of a new class of computer programs able to accurately process the mountains of genome data flowing from the latest generation of gene decoding machines. Those machines have placed a premium on computational speed and accuracy in the data-compiling fields known as bioinformatics and high-throughput biology, according to Dr. Gabor Marth, an associate professor of biology at BC.

"We're on the edge of a real technological revolution that I think will help us understand the genetic causes of diseases in humans and how genetic materials determine traits in animals," said Dr. Marth. "It is going to lead to less expensive technologies that will allow researchers to decode any individual."

The PyroBayes software will aid researchers involved in the 1000 Genomes Project, which recently announced a plan to sequence the genomes of 1000 individuals from around the world. The U.S. National Institutes of Health (NIH; Bethesda, MD, USA), which helps direct the project, has awarded Dr. Marth more than US$1.3 million to develop the software over the next four years.

The advances of the Marth lab were reported in two articles published by the professor and his assistants in the February 2008 issue of the journal Nature Methods. In the first article, co-authored by Dr. Marth, post-doctoral researcher Dr. Chip Stewart, and graduate students Aaron Quinlan and Mike Strömberg, the group unveiled the lab's PyroBayes base caller software. The program examines data from one of the latest generation of DNA decoding machines, developed by Roche/454 Life Sciences (Branford, CT, USA), faster and with far greater accuracy than other programs for pyrosequencing. Pyrosequencing is a technology that uses the detection of pyrophosphate to decode the sequence of DNA, the carrier of genetic information in living organisms.

A second Nature Methods article in the same issue, written in collaboration with colleagues from the Washington University School of Medicine (St. Louis, MO, USA), reported that three other computer programs developed by the Marth lab made it possible to rapidly and effectively study the whole genome of a laboratory worm and identify key differences between the sample strain and an earlier strain. This comparative process, known as resequencing, is now being applied to the genomes of humans and other organisms. The second study used another next-generation DNA sequencing platform, the Illumina/Solexa machine.

Recent developments are driving resequencing costs down, but researchers must still prove the effectiveness of the new technology by working with smaller organisms, which made the worm study critical, Dr. Marth reported. "This brings us closer to a major milestone in human individual resequencing--the decoding of the genome of human beings in routine fashion."

Of the few computer programs available for the new sequencing machines, the software package developed by the Marth lab is the only one that works with a variety of decoding machines, according to Dr. Marth. It also offers greater accuracy, allowing researchers to separate genuine genetic variations from data errors. PyroBayes, a Linux-based package, is available to fellow academic researchers at no cost.

As an analysis group member, the Marth lab participates in the analysis of data from the 1000 Genomes Project, which was launched in February 2008. The goal of the project is to sequence the genomes of at least 1000 people worldwide to create the most detailed and medically useful picture to date of human genetic variation.

Ultimately, advances in bioinformatics will help push genetic science forward, providing new insights into human health and disease, according to Dr. Marth. He sees his lab's role as providing critical tools that help researchers organize and interpret data and visualize genome variations.

"We are excited to develop the software that will help these super-fast, high-throughput sequencing machines to realize their potential to produce invaluable data for research," Dr. Marth said.


Related Links:
Marth Laboratory of Boston College
