Identifying resistance genes in tuberculosis

Newly published in PLOS Biology is our work identifying genes that confer resistance to common and last-resort antibiotics in bacteria that cause tuberculosis. Resistance to these drugs contributes to mortality and sickness on a pandemic scale every year, and disproportionately affects the poorest people in the world.

This new article is one of a series presenting results generated by more than 100 scientists in 23 countries over more than 5 years as part of a collaboration called CRyPTIC.

Our role in CRyPTIC was the discovery of genes and mutations likely to cause drug resistance by applying a tool known as a genome-wide association study (GWAS), an approach we helped adapt to bacteria.

Using GWAS, we identified previously uncatalogued genes and mutations underlying resistance to every one of the 13 drugs we investigated. These include new and repurposed drugs, as well as the first- and second-line drugs more often used to treat tuberculosis.

Thanks to its generous funders, CRyPTIC dedicated scale (10,000+ genomes) and technical innovation (new high-throughput MIC assays) to help decode the DNA blueprint of antibiotic resistance. Pushing these boundaries has yielded a steep increase of up to 36% in the variation in resistance attributable to the genome for the important and previously understudied new and repurposed drugs.

Science at this scale can produce a seemingly overwhelming wealth of new information. We avoided the temptation to over-emphasize any individual result for the sake of simple narrative. Instead, we highlighted discoveries of uncatalogued genes or genetic variants that we found for every drug investigated:

The amidase AmiA2 and GTPase Era for bedaquiline.

The cytochrome P450 enzyme Cyp142 for clofazimine.

The serine/threonine protein kinase PknH for delamanid.

The antitoxin VapB20 for linezolid.

The PPE-motif family outer membrane protein PPE42 for amikacin and kanamycin.

The antibiotic-induced transcriptional regulator WhiB7 for ethionamide.

The rRNA methylase TlyA for levofloxacin.

The DNA gyrase subunit B GyrB for moxifloxacin.

The putative rhodaneses CysA2 and CysA3 for rifabutin.

The tRNA/rRNA methylase SpoU for ethambutol and rifampicin.

The multidrug efflux transport system repressor Rv1219 for isoniazid.

All these hits passed stringent evidence thresholds that take into account the large amount of data crunched. For each hit, we identified possible relationships between gene functions, such as they are known, and the mechanism of action of the antibiotics.

Beyond the biological discoveries of primary interest, this new paper unveils methodological advances in bacterial GWAS. We introduced a systematic, whole-genome approach to analysing not just short DNA sequences (so-called oligonucleotide or “kmer”-based approaches), but also short sequences of the proteins that the DNA codes for (an oligopeptide-based approach). We have released our software in an open-source GitHub repository.
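To give a feel for the difference between the two kinds of sequence "words", here is a minimal Python sketch, not our released software. The function names (`kmers`, `translate`, `oligopeptides`) are made up for illustration, and it assumes the standard genetic code, forward reading frames only, and peptides that do not span stop codons.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def kmers(seq, k):
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def translate(dna):
    """Translate DNA codon by codon, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def oligopeptides(dna, k):
    """Return peptide k-mers from the three forward reading frames.

    Simplifying assumption for this sketch: the reverse strand is ignored
    and peptides are not allowed to span a stop codon ('*').
    """
    peptides = set()
    for frame in range(3):
        for segment in translate(dna[frame:]).split("*"):
            peptides |= kmers(segment, k)
    return peptides


dna = "ATGGCCAAA"
nucleotide_words = kmers(dna, 3)      # DNA-level features
peptide_words = oligopeptides(dna, 2)  # protein-level features
```

In a GWAS, the presence or absence of each such word in each genome would then be tested for association with resistance. Working at the protein level groups together DNA changes that encode the same amino acids.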

We also discovered a relationship that may help disentangle a technical issue in bacterial GWAS, where the co-occurrence of traits can trick us into thinking that a gene influences one trait when it influences another instead. For antimicrobial resistance, this issue is known as artefactual cross resistance. We observed that true associations tended to have larger effect sizes (as measured by the 'coefficient', rather than the p-value), providing a possible way to prioritize signals in the future.
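The idea of ranking signals by coefficient rather than by significance can be sketched in a few lines of Python. This is a toy illustration with made-up data, not the analysis in the paper: it assumes the coefficient is simply the difference in mean log2 MIC between carriers and non-carriers of a variant, and the names `effect_and_t` and `rank_by_coefficient` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def effect_and_t(genotype, log2_mic):
    """Return (coefficient, t statistic) for one variant.

    Assumption for this sketch: the coefficient is the difference in mean
    log2 MIC between carriers (1) and non-carriers (0), and the t statistic
    is Welch's. A real bacterial GWAS would also control for population
    structure.
    """
    carriers = [y for g, y in zip(genotype, log2_mic) if g == 1]
    non_carriers = [y for g, y in zip(genotype, log2_mic) if g == 0]
    coefficient = mean(carriers) - mean(non_carriers)
    standard_error = sqrt(variance(carriers) / len(carriers)
                          + variance(non_carriers) / len(non_carriers))
    return coefficient, coefficient / standard_error


def rank_by_coefficient(variants, log2_mic):
    """Order variant names by absolute coefficient, largest first."""
    results = {name: effect_and_t(g, log2_mic) for name, g in variants.items()}
    return sorted(results, key=lambda name: abs(results[name][0]), reverse=True)


# Made-up data: seven isolates, one drug, two variants that tend to co-occur.
log2_mic = [3, 4, 5, 0, 1, 0, 1]
variants = {
    "direct": [1, 1, 1, 0, 0, 0, 0],  # carried by every resistant isolate
    "linked": [1, 1, 0, 0, 0, 0, 0],  # co-occurs with "direct" in two isolates
}
ranking = rank_by_coefficient(variants, log2_mic)
```

In this toy example both variants are associated with higher MICs, but the variant present in every resistant isolate has the larger coefficient and is ranked first.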

This paper was published alongside the CRyPTIC Data Compendium in PLOS Biology, in which we released our data open source to the community, with resources provided by the European Bioinformatics Institute.

Some of the results of CRyPTIC have already been rushed into service by the World Health Organization on the grounds of exceptional importance based on a candidate gene approach; this includes the DNA gyrase subunit B – moxifloxacin association spotlighted above (Walker et al. 2022). However, the new results go beyond a candidate gene approach, detecting a range of previously uncatalogued genes via an agnostic, whole-genome strategy.

Unpicking the genetics of antimicrobial resistance is a priority for improving rapid susceptibility tests for individual patients, selecting drug regimens that inhibit the evolution of multidrug resistance, and developing improved treatment options. The need is particularly great in M. tuberculosis, which killed 1.4 million people in 2019: traditional susceptibility testing has a slow (6-12 week) turnaround, and multidrug-resistant tuberculosis is an alarming threat. The discovery of many new candidate resistance variants therefore represents an advance that we hope will contribute to progress in reducing the burden of disease.

New paper: GenomegaMap for dN/dS in over 10,000 genomes

Published this week in Molecular Biology and Evolution is a new paper, joint with the CRyPTIC Consortium: "GenomegaMap: within-species genome-wide dN/dS estimation from over 10,000 genomes".

The dN/dS ratio is a popular statistic in evolutionary genetics that quantifies the relative rates of protein-altering and non-protein-altering mutations. The ratio is adjusted so that under neutral evolution - i.e. when the survival and reproductive advantage of all variants is the same - it equals 1. Typically, dN/dS is observed to be less than 1, meaning that new mutations tend to be disfavoured, implying they are harmful to survival or reproduction. Occasionally, dN/dS is observed to be greater than 1, meaning that new mutations are favoured, implying they provide some survival or reproductive advantage. The aim of estimating dN/dS is usually to identify mutations that provide an advantage.
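As a rough illustration of the adjustment, the simplest counting version of the ratio divides each kind of change by the number of sites at which it could have occurred. The Python sketch below uses invented counts and a hypothetical function name, `dn_ds`. It is a crude textbook-style estimate with no correction for multiple changes at the same site, and it is not how genomegaMap estimates the ratio.

```python
def dn_ds(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """Crude counting estimate of dN/dS.

    dN = protein-altering changes per protein-altering site
    dS = non-protein-altering changes per non-protein-altering site
    Dividing by the number of sites is the adjustment that makes the
    ratio equal 1 when both kinds of change occur at the same rate.
    """
    dn = nonsyn_changes / nonsyn_sites
    ds = syn_changes / syn_sites
    return dn / ds


# Invented counts for a single gene.
disfavoured = dn_ds(nonsyn_changes=30, nonsyn_sites=750,
                    syn_changes=20, syn_sites=250)   # ratio below 1
favoured = dn_ds(nonsyn_changes=90, nonsyn_sites=750,
                 syn_changes=10, syn_sites=250)      # ratio above 1
```

Note that in the first example there are more protein-altering changes than non-protein-altering ones in absolute terms (30 versus 20), yet the ratio is below 1 because there are three times as many sites at which a protein-altering change could occur.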

Theoreticians are often critical of dN/dS because it is more of a descriptive statistic than a process-driven model of evolution. This overlooks the problem that currently available models make simplifying assumptions such as minimal interference between adjacent mutations within genes. These assumptions are not obviously appropriate in many species, including infectious micro-organisms, that exchange genetic material infrequently.

There are many methods for measuring dN/dS. This new paper overcomes two common problems:
  • It is fast no matter how many genomes are analysed together.
  • It is robust whether there is frequent genetic exchange (which causes phylogenetic methods to report spurious signals of advantageous mutation) or infrequent genetic exchange.
The paper includes detailed simulations that establish the validity of the approach, and it goes on to demonstrate how genomegaMap can detect advantageous mutations in 10,209 genomes of Mycobacterium tuberculosis, the bacterium that causes tuberculosis. The method reproduces known signals of advantageous mutations that make the bacteria resistant to antibiotics, and it discovers a new signal of advantageous mutations in a cold-shock protein called deaD or csdA.

Software that implements genomegaMap is available on Docker Hub, and the source code and documentation are available on GitHub.

With the steady rise in the number of genome sequences, analysing the data becomes increasingly challenging even with modern computers. We hope that this new method provides a useful way to exploit the opportunities in such large datasets to gain new insights into evolution.

Postdoctoral Scientist in Statistical Genomics

We are recruiting for a Postdoctoral Scientist in Statistical Genomics working on Antimicrobial Resistance (AMR) gene discovery and focused on Tuberculosis. This will be a joint position at the University of Oxford between Derrick Crook's group and mine, and part of the large international CRyPTIC consortium.

The role is for a population geneticist or statistical geneticist to develop and apply statistical methods, including genome-wide association studies, for discovering rare and common genetic variants underlying antimicrobial resistance in Mycobacterium tuberculosis.

One third of the world's population - 2.5 billion people - are thought to be infected with tuberculosis (TB). This post offers an opportunity to work with global TB experts from five continents, statistical geneticists, clinicians, medical statisticians and software engineers, integrating statistical genetics, bioinformatics and machine learning methods with the aim of uncovering all genomic variants causing at least 1% resistance to first-line anti-TB drugs.

We're looking for candidates with a PhD in genomics, evolutionary biology, statistics or a related subject. The post is full-time and fixed-term for up to 3 years initially.

The deadline for applications is noon on Friday 6th May 2016.

CRyPTIC: rapid diagnosis of drug resistance in TB

The Modernising Medical Microbiology consortium has announced a new worldwide collaboration called CRyPTIC to speed up diagnosis of antibiotic-resistant tuberculosis (TB).

TB infects nearly 10 million people each year and kills 1.5 million, making it one of the leading causes of death worldwide. Almost half a million people each year develop multidrug-resistant (MDR) TB, which defies common TB treatments. Time-consuming tests must be run to identify MDR-TB and to determine which drugs will work or fail. This delays diagnosis and creates uncertainty about the best drugs to prescribe to individual patients.

CRyPTIC aims to hasten the identification of MDR-TB using whole genome sequencing to identify genetic variants that give resistance to particular drugs. The project is funded by a $2.2m grant from the Bill & Melinda Gates Foundation and a £4m grant from the Wellcome Trust and MRC Newton Fund.

CRyPTIC aims to collect and analyse 100,000 TB cases from across the world, providing a database of MDR-TB that will underpin diagnosis using whole genome sequencing (WGS). Samples from across Africa, Asia, Europe and the Americas will be collected by teams at more than a dozen centres. They will conduct drug resistance testing and much of the genome sequencing. Read more information here.