Software Tools Designed to Improve Identification of Cancer Biomarkers
By LabMedica International staff writers
Posted on 11 Sep 2009
The explosive growth of genomic and proteomic data has ushered in a new era of molecular medicine in which cancer detection, diagnosis, and treatment are customized to each individual's molecular profile. However, this personalized medicine approach requires that researchers discover biomarkers--such as genes or proteins--and tie them to specific disease behaviors, such as the rate of tumor progression and different responses to treatments.
Two new software programs that help address that challenge, caCORRECT and omniBioMarker, have recently earned silver-level compatibility certification from the U.S. National Cancer Institute's (Bethesda, MD, USA) cancer Biomedical Informatics Grid (caBIG). The programs improve the process of identifying cancer biomarkers from gene expression data.
The programs were developed by Dr. May Dongmei Wang and her team in the Wallace H. Coulter department of biomedical engineering at Georgia Institute of Technology (Georgia Tech; Atlanta, GA, USA) and Emory University (Atlanta, GA, USA). "Certification by caBIG means the tools can be easily used by everyone in the cancer community to improve approaches to cancer detection, diagnosis, treatment, and prevention," said Dr. Wang, an associate professor in the Coulter department.
caBIG is a collaborative information network that enables researchers, physicians, and patients to share data, tools, and knowledge to accelerate the discovery of new approaches that they hope will ultimately improve cancer patient outcomes. To become caBIG-certified, caCORRECT and omniBioMarker had to meet a rigorous set of requirements, assuring the cancer research community that the software tools are of high quality and interoperable with all other caBIG-certified systems for U.S. deployment.
caCORRECT--chip artifact CORRECTion--is a software program that improves the quality of collected microarray data, ultimately leading to improved biomarker selection. Widely used Affymetrix (Santa Clara, CA, USA) microarrays contain thousands of probes, each consisting of a 25-mer oligonucleotide sequence, which are used to detect mRNA expression levels.
"Once someone has collected microarray data, it is important to run quality control on it and remove any problematic points of data that could highlight incorrect biomarkers when analyzed," explained Dr. Wang, who is also director of the biocomputing and bioinformatics core in the Emory-Georgia Tech National Cancer Institute Center for Cancer Nanotechnology Excellence (CCNE).
Since each microarray chip contains thousands of spots, it is easy for a few spots to become marred by artifacts and noise. These unusable portions are typically the result of experimental variations by different laboratory technicians or errors that create scratches, edge effects, and bubble effects on the data. caCORRECT removes the noise and artifacts from the data, while retaining high-quality genes on the array. The software can also effectively recover lost information that has been obscured by artifacts.
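As a rough illustration of this kind of quality control, the Python sketch below flags spots that deviate sharply from the same spot on replicate chips and fills them in from the unaffected replicates. It is not caCORRECT's actual algorithm, which the article does not describe. The function names, the nested-list input layout, the threshold, and the assumption that the chips are comparable replicates are all hypothetical choices made for the example.

```python
from statistics import median

def flag_artifacts(chips, threshold=5.0):
    """Return one boolean mask per chip; True marks a suspected artifact.

    chips: list of replicate chips, each a list of rows of intensities,
    all with the same shape (a hypothetical input layout).
    """
    n_rows, n_cols = len(chips[0]), len(chips[0][0])
    # The per-spot median across chips acts as a robust reference value.
    reference = [[median(chip[r][c] for chip in chips) for c in range(n_cols)]
                 for r in range(n_rows)]
    residuals = [[[chip[r][c] - reference[r][c] for c in range(n_cols)]
                  for r in range(n_rows)] for chip in chips]
    # Pool all residuals to estimate the typical noise level robustly (MAD).
    pooled = [x for chip in residuals for row in chip for x in row]
    center = median(pooled)
    mad = median(abs(x - center) for x in pooled)
    scale = 1.4826 * mad if mad > 0 else 1.0  # fallback when MAD is zero
    return [[[abs(x - center) > threshold * scale for x in row] for row in chip]
            for chip in residuals]

def repair(chips, masks):
    """Replace flagged spots with the median of the unflagged replicates."""
    repaired = [[row[:] for row in chip] for chip in chips]
    for r in range(len(chips[0])):
        for c in range(len(chips[0][0])):
            clean = [chip[r][c] for chip, mask in zip(chips, masks)
                     if not mask[r][c]]
            if not clean:
                continue  # every replicate is flagged; leave values untouched
            fill = median(clean)
            for k, mask in enumerate(masks):
                if mask[r][c]:
                    repaired[k][r][c] = fill
    return repaired
```

A real tool would also exploit the spatial pattern of scratches, edge effects, and bubbles on the chip; this sketch looks at each spot in isolation.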
In collaboration with Dr. Andrew N. Young, an associate professor in pathology and laboratory medicine at Emory University School of Medicine, Dr. Wang and graduate students Todd Stokes, Martin Ahrens, and Richard Moffitt validated the caCORRECT software. A large-scale survey of public data and data from Dr. Young's laboratory demonstrated the ability of caCORRECT to assess and improve the quality of a wide array of datasets.
"caCORRECT is a quality assurance tool that allows researchers to utilize and trust imperfect experimental microarray data that they spent a tremendous amount of time and money to generate," added Dr. Wang. "caCORRECT improves the downstream analysis of microarray data and should be used before conducting biomarker selection, therapeutic target studies, or pathway analysis studies in bioinformatics and systems biology."
Once the quality of the data is assured with caCORRECT, researchers can use the caBIG-certified omniBioMarker software to identify and validate biomarkers from the high-throughput gene expression data.
Candidate cancer biomarkers are typically genes expressed at different levels in cancer patients compared to healthy subjects. omniBioMarker searches these groups of patient data for genes with the highest potential for accurately determining whether a patient has cancer. However, because individual genes are not expressed independently, the software also identifies groups of genes that act in concert.
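The single-gene part of that search can be pictured with a short Python sketch that ranks genes by how strongly their expression differs between cancer and healthy samples. The Welch t-statistic used here is an assumption chosen for illustration; the article does not say which statistics omniBioMarker applies. The function names and the dictionary-based input are likewise hypothetical, and the sketch does not attempt the multi-gene analysis mentioned above.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples (each needs at least 2 values)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def rank_genes(expression, labels):
    """Order genes from most to least differentially expressed.

    expression: dict mapping gene name -> list of expression values,
                one per sample (hypothetical layout).
    labels:     list of "cancer" / "healthy" strings, one per sample.
    """
    scores = {}
    for gene, values in expression.items():
        cancer = [v for v, lab in zip(values, labels) if lab == "cancer"]
        healthy = [v for v, lab in zip(values, labels) if lab == "healthy"]
        # The sign only says which group is higher; rank by magnitude.
        scores[gene] = abs(welch_t(cancer, healthy))
    return sorted(scores, key=scores.get, reverse=True)
```

Genes at the top of the returned list would be the first candidates to examine; with thousands of genes, some correction for multiple testing would also be needed before drawing conclusions.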
The advantage of the omniBioMarker software is that it customizes biomarker selection to a specific dataset or clinical problem based on prior biologic knowledge. It also applies unique analysis parameters for each specific clinical problem. The parameters are optimal when the software selects genes that are known to be relevant biomarkers based on clinical observations and laboratory experiments available in literature and public databases. Then the software finds new potential biomarkers for experimental validation.
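One way to picture this knowledge-driven tuning is the Python sketch below: several candidate ranking metrics are tried, the one whose top-ranked genes recover the most already-known biomarkers is kept, and the remaining top-ranked genes become new candidates. This is a simplified reading of the description above, not omniBioMarker's implementation. The two metrics, the function names, and the top-k cutoff are assumptions introduced for the example.

```python
from statistics import mean, median

def mean_shift(a, b):
    """Absolute difference of group means (sensitive to outliers)."""
    return abs(mean(a) - mean(b))

def median_shift(a, b):
    """Absolute difference of group medians (robust to outliers)."""
    return abs(median(a) - median(b))

def top_k(expression, labels, metric, k):
    """Return the k genes with the highest score under one metric."""
    scores = {}
    for gene, values in expression.items():
        cancer = [v for v, lab in zip(values, labels) if lab == "cancer"]
        healthy = [v for v, lab in zip(values, labels) if lab == "healthy"]
        scores[gene] = metric(cancer, healthy)
    return sorted(scores, key=scores.get, reverse=True)[:k]

def choose_metric(expression, labels, metrics, known, k):
    """Pick the metric whose top-k list contains the most known biomarkers.

    metrics: dict mapping a metric name -> scoring function.
    known:   set of gene names already accepted as biomarkers.
    Ties go to the metric listed first.
    """
    best_name, best_hits, best_top = None, -1, []
    for name, metric in metrics.items():
        top = top_k(expression, labels, metric, k)
        hits = len(set(top) & known)
        if hits > best_hits:
            best_name, best_hits, best_top = name, hits, top
    return best_name, best_top
```

Given the chosen metric's top-k list, the genes not already in the known set are the ones that would go forward for experimental validation.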
Dr. Wang, graduate student John Phan, and Dr. Young tested the ability of the software to identify biomarkers in clinical renal cancer microarray data. The researchers selected renal cancer for study because it has several distinct subtypes, which could appear in the same person in varying degrees and must be treated according to the diagnosed subtype to maximize treatment success. The results indicate that integrating prior laboratory and clinical knowledge with the microarray data improves biomarker selection.
"Using omniBioMarker to create an optimal metric for ranking and identifying novel biomarkers reduces the number of false discoveries, increases the number of true discoveries, reduces the required time for validation, and increases the overall efficiency of the process," noted Dr. Wang.
Since receiving caBIG silver-level compatibility certification for caCORRECT and omniBioMarker, Dr. Wang and her team have been working on getting two more software programs certified--Q-IHC, a tool that analyzes and quantifies multi-spectral images such as quantum dot-stained histopathologic images, and omniVisGrid, a grid-based tool that visualizes data and analysis processes of microarrays, biologic pathways, and clinical outcomes.
Related Links:
Georgia Institute of Technology
Emory University