Techniques Identify Thousands of New DNA Sequences Missing from the Human Genome Reference Map
By LabMedica International staff writers
Posted on 19 May 2010
Researchers have discovered 2,363 new DNA sequences corresponding to 730 regions on the human genome by using new approaches. These sequences represent segments of the genome that were not charted in the reference map of the human genome.
"A large portion of those sequences are either missing, fragmented or misaligned when compared to results from next-generation sequencing genome assemblies on the same samples," said Dr. Evan Eichler, senior author on the findings published April 19, 2010, in the advance online edition of the journal Nature Methods. Dr. Eichler is a University of Washington (UW; Seattle, USA) professor of genome sciences. "These findings suggest that new genome assemblies based solely on next-generation sequencing might miss many of these sites."
Dr. Jeffrey M. Kidd was lead author of the article, which described the new techniques the research team used to find some of the missing sequences. Dr. Kidd headed the study while earning his Ph.D. at the University of Washington in the Eichler lab. He is now a postdoctoral fellow at Stanford University (Stanford, CA, USA). "Over the past several years, the extent to which the structure of the genome varies among humans has become clearer. This variation suggested that there must be portions of the human genome where DNA sequences had yet to be discovered, annotated and characterized," he said. "We hope that these sequences ultimately will be included as part of future releases of the reference human genome sequence."
The reference genome is a standard for comparison for studies of human genetics. The human reference genome was first created in 2001 and is updated every couple of years, Dr. Kidd explained. It is a mosaic of DNA sequences derived from several individuals. He noted that about 80% of the reference genome came from eight people. One of them actually accounts for more than 66% of the total.
Along with their collaborators at Agilent (Santa Clara, CA, USA), the team designed ways to study these newly identified sequences in a panel of individuals representing populations worldwide. The researchers discovered that, in some cases, the number of copies of these sequences varied from person to person. The fact that a person can have one or more copies, or no copy at all, of a specific DNA sequence may explain why these sequences were missing from the reference genome. The researchers also found that some of these sequences were common in some populations and rare in others, depending on the part of the globe from which a population's ancestors originated.
"Each segment of the reference genome is from a single person, and reflects the genome of that individual. If the donor sample was missing a sequence that many other people have, that sequence would not be represented in the reference genome," Dr. Kidd explained. "That is why some of the positions on the reference genome represent rare structural configurations or entirely omit sequences found in the majority of people." Dr. Kidd noted that the study published in Nature Methods used information from nine individuals, representing various world populations, to search for and fill in some of the missing pieces.
By looking at genomes from seven kinds of animals, the researchers were also able to show that some of the newly identified DNA sequences appear to have been conserved during the evolution of mammals, including humans. The animals whose genomes were examined were chimpanzee, Bornean orangutan, Rhesus monkey, house mouse, Norway rat, dog, and horse. "Some of the sequences were present in several different species, but were absent from the reference genome," Dr. Kidd said. "Some of the sequences present in several mammals actually correspond to sites of variation in humans: some people have retained a particular sequence, and others have lost it."
The researchers also developed a method to effectively genotype many of the newly found DNA sequences and created a way to look at variations in the number of copies of these sequences, thereby opening up regions of the human genome previously inaccessible to such studies. "Scientists can now begin trying to understand the functional importance of these sequences and their variations," Dr. Kidd said.
The 1000 Genomes Project (an international effort to fully sequence the genomes of a thousand anonymous individuals) and other genome studies are gathering massive amounts of data on DNA sequences that are then mapped to the reference genome, he added. Any study that improves the completeness and quality of the reference genome assembly, he continued, will therefore benefit these projects and lead to a fuller picture of the extent of human genomic variation.
Related Links:
University of Washington
Agilent