Huge Modifiable Biomedical Database to Be Available on the Wikidata Site
By LabMedica International staff writers Posted on 28 Apr 2016
Genome researchers are exploiting the power of Wikidata, an open, community-edited Internet database, to create a comprehensive resource for geneticists, molecular biologists, and other interested life scientists.
While the efficiency of generating scientific data improves almost daily, establishing meaningful relationships between taxonomic and genetic entities requires a structured and integrative approach. Currently, knowledge is distributed across a multitude of sites, from government-funded institutions to topic-focused databases to the supplemental tables of primary publications.
It is becoming increasingly difficult to organize this huge amount of information, since expert-curated databases are expensive to maintain and extend. To overcome these difficulties, investigators at The Scripps Research Institute (La Jolla, CA, USA) have turned to Wikidata, an openly editable, semantic-web-compatible framework for knowledge representation. Wikidata is a project of the Wikimedia Foundation and offers knowledge integration capabilities ideally suited to the challenge of representing the exploding body of genomics information.
The investigators described the initial priming of their Wikidata resource in a paper published in the March 17, 2016, online edition of the journal Database. They imported all human and mouse genes and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes were imported from the [U.S.] National Center for Biotechnology Information (NCBI), and 27,306 human proteins and 16,728 mouse proteins were imported from the Swiss-Prot subset of UniProt. As Wikidata is open and can be edited by anybody, this body of imported data is expected to serve as the starting point for integration of further data by scientists, the Wikidata community, and citizen scientists alike.
In a second paper, published in the March 28, 2016, online edition of the journal Database, the investigators focused on data of particular interest to molecular microbiologists and drug developers. This is an effort to develop a microbe-specific data model, based on Wikidata’s semantic web compatibility, that represents bacterial species, strains, and the genes and gene products that define them. Currently, they have loaded 43,694 gene and 37,966 protein items for 21 species of bacteria, including the human pathogen Chlamydia trachomatis. Using this pathogen as an example, they explored complex interactions between the pathogen, its host, associated genes, other microbes, disease, and drugs.
In the next phase of development, the investigators will add another 99 bacterial genomes and their genes and gene products, totaling about 900,000 additional entities.
“Open data is vital for progress and research,” said senior and contributing author Dr. Ben Good, assistant professor of molecular and experimental medicine at The Scripps Research Institute. “We need to break down those barriers.”
Related Links:
Scripps Research Institute