Huge Modifiable Biomedical Database to Be Available on the Wikidata Site

By LabMedica International staff writers
Posted on 28 Apr 2016

Genome researchers are harnessing Wikidata, the openly editable knowledge base that is a sister project of Wikipedia, to create a comprehensive resource for geneticists, molecular biologists, and other interested life scientists.

While the efficiency of generating scientific data improves almost daily, establishing meaningful relationships between taxonomic and genetic entities requires a structured and integrative approach. Currently, knowledge is distributed across a multitude of sites, from government-funded institutions to topic-focused databases to the supplemental tables of primary publications.

Organizing this huge amount of information is becoming increasingly difficult, since expert-curated databases are expensive to maintain and extend. To overcome these difficulties, investigators at The Scripps Research Institute (La Jolla, CA, USA) have turned to Wikidata, an openly editable, semantic-web-compatible framework for knowledge representation. A project of the Wikimedia Foundation, Wikidata offers knowledge integration capabilities ideally suited to the challenge of representing the exploding body of genomics information.

The investigators described the initial seeding of their Wikidata resource in a paper published in the March 17, 2016, online edition of the journal Database. They imported all human and mouse genes and all human and mouse proteins into Wikidata. In total, 59,721 human genes and 73,355 mouse genes were imported from the [U.S.] National Center for Biotechnology Information (NCBI), and 27,306 human proteins and 16,728 mouse proteins were imported from the Swiss-Prot subset of UniProt. As Wikidata is open and can be edited by anybody, this body of imported data is expected to serve as the starting point for integration of further data by scientists, the Wikidata community, and citizen scientists alike.
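For readers who want to check how many such gene items exist in Wikidata today, the following is a minimal sketch in Python that only builds a SPARQL query string for the public Wikidata Query Service; it does not send the query. The function name is hypothetical, and the identifiers used (P351 for Entrez Gene ID, P703 for "found in taxon", Q15978631 for Homo sapiens) are assumptions that should be verified on Wikidata before use. Live counts will almost certainly differ from the 2016 import figures quoted above.

```python
def build_gene_count_query(taxon_qid: str, id_property: str = "P351") -> str:
    """Build a SPARQL query that counts items in one taxon carrying an external ID.

    Assumptions (verify on Wikidata): P351 is the Entrez Gene ID property and
    P703 is the "found in taxon" property. The wd: and wdt: prefixes are
    predefined by the Wikidata Query Service.
    """
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        f"  ?item wdt:{id_property} ?external_id ;\n"
        f"        wdt:P703 wd:{taxon_qid} .\n"
        "}"
    )


# Assumed Wikidata item for the taxon Homo sapiens; check before relying on it.
HOMO_SAPIENS = "Q15978631"

query = build_gene_count_query(HOMO_SAPIENS)
print(query)
```

The printed query can be pasted into the Wikidata Query Service web interface. Passing a different taxon item or a different external-identifier property would adapt the same sketch to mouse genes or to proteins.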

In a second paper, published in the March 28, 2016, online edition of the journal Database, the investigators focused on data of particular interest to molecular microbiologists and drug developers. They are developing a microbe-specific data model, built on Wikidata's semantic web compatibility, that represents bacterial species, strains, and the genes and gene products that define them. So far, they have loaded 43,694 gene and 37,966 protein items for 21 species of bacteria, including the human pathogen Chlamydia trachomatis. Using this pathogen as an example, they explored complex interactions among the pathogen, its host, associated genes, other microbes, disease, and drugs.
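The hierarchy described here (species, strains, genes, gene products) can be pictured with a small Python sketch. This is an illustration only, not the investigators' actual Wikidata data model: every class name, field name, and placeholder label below is hypothetical, and real Wikidata items are linked through properties rather than nested objects.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Protein:
    label: str


@dataclass
class Gene:
    label: str
    products: List[Protein] = field(default_factory=list)  # gene products


@dataclass
class Strain:
    label: str
    genes: List[Gene] = field(default_factory=list)


@dataclass
class Species:
    label: str
    strains: List[Strain] = field(default_factory=list)


def count_items(species: Species) -> Tuple[int, int]:
    """Return (gene count, protein count) across all strains of a species."""
    genes = sum(len(strain.genes) for strain in species.strains)
    proteins = sum(
        len(gene.products) for strain in species.strains for gene in strain.genes
    )
    return genes, proteins


# Placeholder labels only; these are not real strain, gene, or protein names.
species = Species(
    label="Chlamydia trachomatis",
    strains=[
        Strain(
            label="example strain",
            genes=[
                Gene(label="example gene 1", products=[Protein("example protein 1")]),
                Gene(label="example gene 2"),  # e.g. a gene with no protein item
            ],
        )
    ],
)

gene_count, protein_count = count_items(species)
print(gene_count, protein_count)
```

Keeping genes and proteins as separate item types mirrors the separate gene and protein counts reported in the article.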

In the next phase of development, the investigators will add another 99 bacterial genomes with their genes and gene products, totaling about 900,000 additional entities.

“Open data is vital for progress and research,” said senior and contributing author Dr. Ben Good, assistant professor of molecular and experimental medicine at The Scripps Research Institute. “We need to break down those barriers.”
