DNA Sudoku: Math Puzzle Logic Used to Enhance Genome Sequencing Capability
By LabMedica International staff writers
Posted on 23 Jul 2009
A popular math-based game may be poised to transform the fast-changing world of genome sequencing and the field of medical genetics, a team of scientists suggests. Combining a 2,000-year-old Chinese math theorem with concepts from cryptology, the scientists have devised "DNA Sudoku." The strategy allows tens of thousands of DNA samples to be combined, and their sequences--the order in which the letters of the DNA alphabet (A, T, G, and C) line up in the genome--to be determined all at once.
This achievement is in stark contrast to earlier approaches that allowed only a single DNA sample to be sequenced at a time. It also significantly improves upon current approaches that, at best, can combine hundreds of samples for sequencing. "In theory, it is possible to use the Sudoku method to sequence more than a hundred thousand DNA samples," said Cold Spring Harbor Laboratory (CSHL; Cold Spring Harbor, NY, USA) professor Gregory Hannon, Ph.D., a genomics expert, and leader of the team that invented the Sudoku approach. At that level of efficiency, the method has the potential to greatly reduce costs. A sequencing project that costs upwards of US$10 million using conventional methods might be accomplished for US$50,000 to US$80,000 using DNA Sudoku, he estimated.
Originally devised to overcome a sequencing limitation that plagued one of the Hannon lab's research projects, the new method has tremendous potential for clinical applications. It can be utilized, according to Dr. Hannon, to analyze specific regions of the genomes of a large population and identify individuals who carry mutations that cause genetic diseases--a process known as genotyping.
The CSHL team has already begun to explore this possibility via collaboration with Dor Yeshorim (New York, NY, USA), an organization that has collected DNA from thousands of members of orthodox Jewish communities. The organization's aim is to prevent genetic diseases such as Tay-Sachs or cystic fibrosis that occur frequently within specific ethnic populations. The new method will now allow the many thousands of DNA samples gathered by Dor Yeshorim to be processed and sequenced in a single time-saving and cost-effective experiment, which should identify individuals who carry disease-causing mutations.
The mixing together and simultaneous sequencing of a massive number of DNA samples is known as multiplexing. In earlier multiplexing approaches, scientists first tagged each sample with a barcode--a short string of DNA letters, known as an oligonucleotide--before mixing it with other samples that each carried their own unique tag. After the sample mix had been sequenced, scientists could use the barcode tags on the resulting sequences as identification markers and thus tell which sequence belonged to which sample.
"But this approach is very limiting," explained Dr. Yaniv Erlich, a graduate student in the Hannon laboratory and first author on the article. "It's time-consuming and costly to have to design a unique barcode for each sample prior to sequencing, especially if the number of samples runs in the thousands."
To circumvent this limitation, Dr. Erlich and others in the Hannon lab came up with the plan of mixing the samples in specific patterns, thereby creating pools of samples. Moreover, instead of tagging the individual samples within each pool, the scientists tagged each pool as a whole with one barcode. "Since we know which pool contains which samples, we can link a sequence to an individual sample with high confidence," said Dr. Erlich.
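As a rough illustration of pool-level barcoding, the Python sketch below groups sequencing reads by the pool whose barcode they carry. It is not the CSHL team's software: the function name, the barcodes, and the pool labels are hypothetical, and it assumes each read begins with a fixed-length barcode that matches exactly.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_pool, barcode_len):
    """Group reads by pool, using a fixed-length barcode prefix (assumed layout)."""
    by_pool = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        pool = barcode_to_pool.get(tag)
        if pool is not None:  # reads with an unknown barcode are dropped
            by_pool[pool].append(insert)
    return dict(by_pool)

# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "pool_A", "TGCA": "pool_B"}
reads = ["ACGTGGGCTA", "TGCATTTAAC", "ACGTCCCGGA", "NNNNAAAAAA"]
print(demultiplex(reads, barcodes, 4))
# {'pool_A': ['GGGCTA', 'CCCGGA'], 'pool_B': ['TTTAAC']}
```

Only one barcode per pool is needed here, so the number of barcodes grows with the number of pools rather than with the number of samples.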
The key to the researchers' development is the pooling strategy, which is based on the 2,000-year-old Chinese remainder theorem. "It minimizes the number of pools and the amount of sequencing," stated Dr. Hannon of their method, which they dubbed DNA Sudoku because of its similarity to the logic and combinatorial number-placement rules used in the popular game.
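To show how the Chinese remainder theorem can drive a pooling design, here is a simplified Python sketch. It should not be read as the published DNA Sudoku design: the function names and the choice of moduli are assumptions, and it covers only the easiest case, in which a single sample carries the mutation. Sample number i goes into pool (m, i mod m) for each modulus m. Because the moduli are pairwise coprime and their product is at least the number of samples, no two samples share the same set of pools.

```python
from itertools import combinations
from math import gcd, prod

def build_pools(n_samples, moduli):
    """Place sample i into pool (m, i % m) for every modulus m."""
    # Pairwise-coprime moduli with a large enough product give every
    # sample a unique combination of pools (Chinese remainder theorem).
    assert all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))
    assert prod(moduli) >= n_samples
    pools = {}
    for i in range(n_samples):
        for m in moduli:
            pools.setdefault((m, i % m), set()).add(i)
    return pools

def decode(pools, positive_pools, moduli):
    """Return the samples found in a positive pool for every modulus."""
    candidates = None
    for m in moduli:
        hits = set()
        for mod, residue in positive_pools:
            if mod == m:
                hits |= pools[(mod, residue)]
        candidates = hits if candidates is None else candidates & hits
    return sorted(candidates or [])

# Illustrative numbers only: 1,000 samples and three hypothetical moduli.
moduli = (31, 32, 33)
pools = build_pools(1000, moduli)
carrier = 777
positive = {(m, carrier % m) for m in moduli}  # pools where the mutation shows up
print(len(pools))                        # 96
print(decode(pools, positive, moduli))   # [777]
```

In this toy setting, 96 pools (and 96 barcodes) stand in for 1,000 individually tagged samples. When several carriers are present, intersecting pools can produce ambiguous candidates, so a practical design needs further constraints that this sketch does not model.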
The technique, which the CSHL team has patented, is currently best suited for genotype analyses that require only short segments of an individual's genome to be sequenced to find out whether the individual is carrying a specific variant of a gene or a rare mutation. But as sequencing technologies improve and researchers gain the ability to generate sequences for longer segments of the genome, Dr. Hannon foresees wider clinical applications for their method, such as human leukocyte antigen (HLA) typing, already an important tool for diagnosing autoimmune diseases and cancer and for predicting the risks associated with organ transplantation.
The report was published in the July 1, 2009, issue of the journal Genome Research.