Researchers Use Natural-Language Processing (NLP) Algorithms to Predict SARS-CoV-2 Virus Mutations

By LabMedica International staff writers
Posted on 18 Jan 2021
Image: Researchers Use NLP Algorithms to Predict SARS-CoV-2 Virus Mutations (Photo courtesy of Baidu)
Natural-language processing (NLP) algorithms are now able to generate protein sequences and predict virus mutations, including key changes that help the SARS-CoV-2 virus evade the immune system.

The key insight making this possible is that many properties of biological systems can be interpreted in terms of words and sentences. In the last few years, a handful of researchers have shown that protein sequences and genetic codes can be modeled using NLP techniques. Now, computational biologists at the Massachusetts Institute of Technology (MIT; Cambridge, MA, USA) have pulled several of these strands together and used NLP to predict mutations that allow viruses to avoid detection by antibodies in the human immune system, a process known as viral immune escape. The basic idea is that the interpretation of a virus by an immune system is analogous to the interpretation of a sentence by a human.

The team uses two different linguistic concepts: grammar and semantics (or meaning). The genetic or evolutionary fitness of a virus - characteristics such as how good it is at infecting a host - can be interpreted in terms of grammatical correctness. A successful, infectious virus is grammatically correct; an unsuccessful one is not. Similarly, mutations of a virus can be interpreted in terms of semantics. Mutations that make a virus appear different to things in its environment - such as changes in its surface proteins that make it invisible to certain antibodies - have altered its meaning. Viruses with different mutations can have different meanings, and a virus with a different meaning may need different antibodies to read it.

To model these properties, the researchers used an LSTM, a type of neural network that predates the transformer-based ones used by large language models like GPT-3. These older networks can be trained on far less data than transformers and still perform well for many applications. Instead of millions of sentences, they trained the NLP model on thousands of genetic sequences taken from three different viruses: 45,000 unique sequences for a strain of influenza, 60,000 for a strain of HIV, and between 3,000 and 4,000 for a strain of the SARS-CoV-2 virus.
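To make the "sequence as sentence" idea concrete, the sketch below treats each residue of a protein sequence as a word, maps it to an integer token, and lists every single-residue substitution a model could then score. It is only an illustration: the 20-letter amino-acid alphabet, the function names `encode` and `single_mutants`, and the toy sequence are assumptions, not details from the MIT work.

```python
# Assumption: sequences are protein strings over the 20 standard amino-acid letters.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Map each residue (the 'word') to an integer token id."""
    return [TOKEN_ID[aa] for aa in sequence]


def single_mutants(sequence):
    """Yield (position, original, replacement, mutated_sequence) for
    every possible single-residue substitution."""
    for pos, original in enumerate(sequence):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                mutated = sequence[:pos] + replacement + sequence[pos + 1:]
                yield pos, original, replacement, mutated


toy_sequence = "MKT"  # hypothetical three-residue example, not real viral data
tokens = encode(toy_sequence)
mutants = list(single_mutants(toy_sequence))
print(tokens)
print(len(mutants), "candidate single-residue mutants")
```

A sequence of length n yields 19 x n candidates, so even a protein of a thousand residues produces a list small enough to score exhaustively.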

NLP models work by encoding words in a mathematical space in such a way that words with similar meanings are closer together than words with different meanings. This is known as an embedding. For viruses, the embedding of the genetic sequences grouped viruses according to how similar their mutations were. The overall aim of the approach is to identify mutations that might let a virus escape an immune system without making it less infectious - that is, mutations that change a virus’s meaning without making it grammatically incorrect.
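The search for mutations that change meaning while staying grammatical can be sketched as a ranking problem. In the example below, "semantic change" is the distance between a mutant's embedding and the original virus's embedding, and "grammaticality" is the probability a language model assigns to the mutation. The article does not say how the two are combined, so the rank-sum rule, the `beta` weight, the function names and all the numbers here are assumptions for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_ascending(values):
    """Return the rank of each value (1 = smallest); ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_escape_candidates(wild_type_embedding, candidates, beta=1.0):
    """Order candidates so that large semantic change combined with high
    grammaticality comes first.

    candidates: list of (name, embedding, grammaticality) tuples.
    Assumed scoring rule: rank(semantic change) + beta * rank(grammaticality).
    """
    semantic_change = [euclidean(wild_type_embedding, emb) for _, emb, _ in candidates]
    grammaticality = [prob for _, _, prob in candidates]
    sem_rank = rank_ascending(semantic_change)
    gram_rank = rank_ascending(grammaticality)
    scores = [s + beta * g for s, g in zip(sem_rank, gram_rank)]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [(candidates[i][0], scores[i]) for i in order]


# Hypothetical toy data: 2-D embeddings and made-up model probabilities.
wild_type = [0.0, 0.0]
candidates = [
    ("mut_a", [0.1, 0.0], 0.90),  # little change in meaning, very plausible
    ("mut_b", [2.0, 1.0], 0.80),  # large change in meaning, still plausible
    ("mut_c", [3.0, 3.0], 0.01),  # large change in meaning, implausible
    ("mut_d", [0.2, 0.1], 0.05),  # little change in meaning, implausible
]
ranking = rank_escape_candidates(wild_type, candidates)
for name, score in ranking:
    print(name, score)
```

In this toy data the candidate that is both far from the original in embedding space and still judged plausible by the model rises to the top, which is the profile the approach looks for in an escape mutation.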

To test their approach, the team used a common metric for assessing predictions made by machine-learning models that scores accuracy on a scale between 0.5 (no better than chance) and 1 (perfect). In this case, they took the top mutations identified by the tool and, using real viruses in a lab, checked how many of them were actual escape mutations. Their results ranged from 0.69 for HIV to 0.85 for one coronavirus strain. This is better than results from other state-of-the-art models, according to the researchers.
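The metric described here is consistent with the area under the ROC curve (AUC), although the article does not name it, so that identification is an assumption. AUC can be read as the probability that a randomly chosen true escape mutation receives a higher score than a randomly chosen non-escape mutation. In general it can fall below 0.5 for a ranking that is worse than chance. The sketch below computes it directly from that definition using made-up labels and scores.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive has the higher score.
    Tied pairs count as half."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = confirmed escape mutation in the lab, 0 = not.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]  # made-up model scores
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

With these toy numbers, seven of the nine positive-negative pairs are ordered correctly, giving an AUC of about 0.78.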

The team has been running models on new variants of the coronavirus, including the so-called UK mutation, the mink mutation from Denmark, and variants taken from South Africa, Singapore and Malaysia. Using NLP accelerates a slow process. Previously, the genome of the virus taken from a COVID-19 patient in hospital could be sequenced and its mutations re-created and studied in a lab. However, that can take weeks, whereas the NLP model predicts potential mutations straight away, which focuses the lab work and speeds it up.

“We’re learning the language of evolution,” said Bonnie Berger, a computational biologist at the Massachusetts Institute of Technology. “Biology has its own language.”

