SCOTTI wins PLoS Computational Biology Research Prize

Work from our group has been recognised in the PLoS Computational Biology 2017 Research Prizes. SCOTTI, which infers transmission routes from genetic and epidemiological information, won the Breakthrough Advance/Innovation category. The citation reads:
Our Breakthrough Advance/Innovation winning article presents a new computational tool, called SCOTTI (Structured COalescent Transmission Tree Inference), developed by Nicola De Maio of the University of Oxford (UK), and colleagues. De Maio says, “SCOTTI represents a convenient tool to reconstruct who-infected-whom within outbreaks… [and] has been used in particular for the study of bacterial hospital outbreaks”. It combines epidemiological information about patient exposure with genetic information about the infectious agent itself.
Work is nominated and selected as described in the announcement:
The journal invited the community to nominate their favorite 2016 published Research Articles. From these nominations the PLOS Computational Biology Research Prize Committee, made up of Editorial Board members Dina Schneidman, Nicola Segata, Maricel Kann, Isidore Rigoutsos, Avner Schlessinger, Lilia Iakoucheva, Ilya Ioshikhes, Shi-Jie Chen, and Becca Asquith, selected the winners. To help support future work, the authors of each winning paper will receive award certificates and a $2,000 (USD) prize.
You can read more about SCOTTI and the accompanying paper, written by Nicola De Maio, Jessie Wu and me, here.

Making the most of bacterial GWAS: new paper in Nature Microbiology

In a new paper published this week in Nature Microbiology, we report the performance of genome-wide association studies (GWAS) in bacteria to identify causal mechanisms of antibiotic resistance in four major pathogens, and introduce a new method, bugwas, to make the most of bacterial GWAS for traits under less strong selection.

As explained by Sarah Earle, joint first author with Jessie Wu and Jane Charlesworth, the problem with GWAS in bacteria is strong population structure and the consequent strong coinheritance of genetic variants throughout the genome. This phenomenon - known as genome-wide linkage disequilibrium (LD) - comes about because exchange of genes is relatively infrequent in bacteria, which reproduce clonally, compared to organisms that exchange genes every generation through sexual reproduction.

Genome-wide LD makes it difficult for GWAS to distinguish variants that causally influence a trait from other, coinherited variants that have no direct effect on the trait.
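To make the coinheritance problem concrete, here is a minimal sketch in Python. The data are invented for illustration and are not from the paper, and the names (r_squared, causal, hitchhiker, unlinked) are hypothetical. It measures LD as the squared correlation between two variants, and shows that a causal variant and a perfectly coinherited bystander are equally associated with the trait.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length 0/1 vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov * cov / (var_x * var_y)


# Toy data (invented): 8 genomes drawn from two clonal lineages.
# 1 = variant present / trait present, 0 = absent.
causal     = [1, 1, 1, 1, 0, 0, 0, 0]  # variant that drives the trait
hitchhiker = [1, 1, 1, 1, 0, 0, 0, 0]  # no effect, but always coinherited
unlinked   = [1, 0, 1, 0, 1, 0, 1, 0]  # no effect, scattered across lineages
phenotype  = [1, 1, 1, 1, 0, 0, 0, 0]

# LD between pairs of variants.
ld_causal_hitchhiker = r_squared(causal, hitchhiker)
ld_causal_unlinked = r_squared(causal, unlinked)

# Strength of association of each variant with the trait.
assoc_causal = r_squared(causal, phenotype)
assoc_hitchhiker = r_squared(hitchhiker, phenotype)
assoc_unlinked = r_squared(unlinked, phenotype)

print(ld_causal_hitchhiker, ld_causal_unlinked)
print(assoc_causal, assoc_hitchhiker, assoc_unlinked)
```

In this toy case the causal variant and the hitchhiker have identical association with the trait, so no test based on these genomes alone could tell them apart. Only the variant that is scattered across lineages is clearly ruled out.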

In the case of antibiotic resistance - a trait of high importance to human health - bacteria are under extraordinary selection pressures because resistance is a matter of life and death, to them as well as their human host. This helps overcome coinheritance and pinpoint causal variants because antibiotic usage selects for the independent evolution of the same resistance-causing variants in different genetic backgrounds.

Consequently, bacterial GWAS works very efficiently for antibiotic resistance: the variants most significantly associated with antibiotic resistance in 26 out of the 27 GWAS we performed were genuine resistance-conferring mutations. In the 27th we uncovered a putative novel mechanism of resistance to cefazolin in E. coli. These results for 17 antibiotics (ampicillin, cefazolin, cefuroxime, ceftriaxone, ciprofloxacin, erythromycin, ethambutol, fusidic acid, gentamicin, isoniazid, penicillin, pyrazinamide, methicillin, rifampicin, tetracycline, tobramycin and trimethoprim) across four species (E. coli, K. pneumoniae, M. tuberculosis and S. aureus) build on earlier work investigating beta-lactam resistance in S. pneumoniae, and convincingly demonstrate the potential for bacterial GWAS to discover new genes underlying important traits under strong selection.

What about traits under less strong selection, which probably include pretty much every other bacterial trait? Based on detailed simulations, we show that coinheritance poses a major challenge in this context. Often it may not be possible to use GWAS to pinpoint individual variants responsible for different traits because they are coinherited with - possibly many - other uninvolved variants.

But all is not lost. We show that even when individual locus-level effects cannot be pinpointed, there is often excellent power to characterize lineage-level differences in phenotype between strains. This is helpful for multiple reasons: (1) we often conceptualize trait variability in bacteria at the level of strain-to-strain differences; (2) these differences can be highly predictive; and (3) we can prioritize variants for functional follow-up based on their contribution to strain-level differences.

These concepts represent a substantial departure from regular GWAS. In the human setting, for instance, lineage-level differences are usually discarded as uninteresting or artefactual, and variants are almost always prioritized based on statistical evidence for involvement over and above any contribution to lineage-level differences. In the bacterial setting, we are forced to depart from these conventions because a large proportion of all genetic variation is strongly strain-stratified. To find out more, see the paper and try our methods.

BASTA: Improved method for phylogeography

This week sees publication of our paper New Routes to Phylogeography: a Bayesian Structured Coalescent Approximation in PLoS Genetics.

Phylogeography is the recovery of migration history from genome sequences, and has exploded as a field in recent years. Over a thousand papers have used contemporary sequences and ancient DNA to reconstruct migratory trends, locate the origin of outbreaks and track the spread of infectious diseases. In many high-profile examples, phylogeography has informed our understanding of how major human pathogens spread.

In our new paper we solve a severe and apparently widely unappreciated problem: that the most popular approaches to phylogeography are heavily biased, extremely sensitive to sampling structure and substantially underestimate statistical uncertainty. The problems stem from the treatment of migration as equivalent to mutation (discrete trait analysis; DTA), and the assumption that sampling locations are phylogeographically informative.
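As a rough illustration of what treating migration like mutation means, the sketch below models the location of a single lineage as a discrete character that changes along a branch under a continuous-time Markov chain, just as a nucleotide would. It assumes two locations and a symmetric migration rate; the function name and the parameter values are hypothetical, chosen only for illustration, and this is not code from BEAST2 or from the paper.

```python
import math


def prob_location_change(m, t):
    """Probability that a lineage ends a branch of length t in the other
    location, under a symmetric two-state continuous-time Markov chain
    with migration rate m per unit time (closed-form solution)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m * t))


# Hypothetical values: rate of 0.1 migrations per year.
for years in (0, 1, 10, 100):
    print(years, prob_location_change(0.1, years))
```

Note what this calculation leaves out: the shape of the tree and the number of samples taken from each location play no part in it. A mutation-style model of this kind does not describe how location affects which lineages can share ancestors, which is the ingredient the structured coalescent supplies.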

To solve these problems we introduce and demonstrate a new method, BASTA, implemented in the phylogenetic software package BEAST2, that employs a novel approximation to enable inference under the structured coalescent – the bottom-up population genetics model of migration. Previously, methods for exact inference under the structured coalescent have proven too slow for many practical purposes, hence the need for a fast and accurate approximation.

The biases we highlight with popular phylogeography methods are much more important than might appear from what is at one level a question of model choice. To underline this, we present an analysis of around 100 Ebola virus genome sequences to investigate the emergence of human outbreaks. Epidemiological studies have found that animals act as a reservoir, maintaining the virus between the sporadic human outbreaks that have unfolded over the past four decades, a scenario that our structured coalescent-based model correctly identifies.

Remarkably, DTA, the de facto standard method for phylogeography, wrongly concluded with high confidence that Ebola has been maintained since 1976 by undetected human-to-human transmission between outbreaks. Although such a conclusion would never be believed in the case of Ebola, it makes clear the potential for highly misleading inference about transmission that could, for much less well understood diseases, have serious implications for public health policy.

BASTA is the result of a lot of hard work by Nicola De Maio, who is a James Martin Fellow at the Oxford Martin School Institute for Emerging Infections, with help from Jessie Wu and Kathleen O'Reilly. You can read the paper here and download BASTA here.