Big Data Toolbox Designed to Advance Genome Research

By LabMedica International staff writers
Posted on 17 Oct 2012

A US professor is leading a research team that is developing a series of solutions based on high-performance computing techniques. Newer, more sophisticated DNA sequencing technology is inundating researchers with data and overwhelming existing applications in biological computing.

The researchers’ goal is to develop core techniques, parallel algorithms, and software libraries that help researchers adapt parallel computing technology to high-throughput DNA sequencing, the next generation of sequencing technologies.

Those technologies are now used extensively, “enabling single investigators with limited budgets to carry out what could only be accomplished by an international network of major sequencing centers just a decade ago,” said Srinivas Aluru, a professor of computer engineering at Iowa State University (Ames, USA), and lead researcher of the project. “Seven years ago we were able to sequence DNA one fragment at a time,” he said. “Now researchers can read up to 6 billion DNA sequences in one experiment. How do we address these big data issues?”

A three-year, USD 2 million grant from the BIGDATA program of the US National Science Foundation and the National Institutes of Health (Bethesda, MD, USA) will support the search for a solution by Prof. Aluru and researchers from Iowa State, Stanford University (Stanford, CA, USA), Virginia Polytechnic Institute and State University (Virginia Tech; Blacksburg, USA), and the University of Michigan (Ann Arbor, USA).

Most of the grant will support research at Iowa State. The investigators will begin by identifying a large set of building blocks frequently used in genomic studies. They will then devise the parallel algorithms and high-performance implementations required to carry out the essential data analysis.
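As a rough illustration of what a parallelized building block might look like, the Python sketch below counts k-mers (short fixed-length substrings of DNA reads) by splitting the reads into chunks, counting each chunk separately, and merging the partial counts. The article does not name k-mer counting or any specific building block, so the choice of task, the function names `count_kmers` and `count_kmers_parallel`, and the sample reads are all assumptions made for illustration and are not taken from the project.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) in a list of reads."""
    counts = Counter()
    for read in reads:
        # Reads shorter than k yield an empty range and contribute nothing.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def count_kmers_parallel(reads, k, workers=4):
    """Hypothetical helper: split reads into chunks, count each, merge."""
    chunk_size = max(1, -(-len(reads) // workers))  # ceiling division
    chunks = [reads[i:i + chunk_size]
              for i in range(0, len(reads), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is counted independently; partial counts are summed.
        for partial in pool.map(lambda chunk: count_kmers(chunk, k), chunks):
            total.update(partial)
    return total


# Made-up sample reads, for illustration only.
reads = ["ACGTAC", "GTACGT", "TTACG"]
print(count_kmers_parallel(reads, k=3, workers=2).most_common(3))
```

A thread pool is used here only to keep the sketch short and portable. In CPython it shows the split, count, and merge structure rather than delivering a real speedup; a process pool or a distributed framework would be the natural substitute for data at the scale the article describes.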

They will also integrate all of those technologies into software libraries that researchers can draw on. Finally, they will design a domain-specific language that automatically generates computing code for researchers.

According to Prof. Aluru, this should be much more effective than asking high-performance computing specialists to devise parallel tools for each and every application. “The goal is to empower the broader community to benefit from clever parallel algorithms, highly tuned implementations, and specialized high performance computing hardware, without requiring expertise in any of these,” according to a summary of the research project.

Prof. Aluru reported that the resulting software libraries will be fully open source. Researchers will be free to use the libraries and to develop, edit, and modify them as needed. “We’re hoping this approach can be the most cost-effective and fastest way to gain adoption in the research community,” Prof. Aluru concluded. “We want to get everybody up to speed using high performance computing.”

Related Links:
Iowa State University
