New paper: GenomegaMap for dN/dS in over 10,000 genomes

Published this week in Molecular Biology and Evolution is a new paper, written jointly with the CRyPTIC Consortium: "GenomegaMap: within-species genome-wide dN/dS estimation from over 10,000 genomes".

The dN/dS ratio is a popular statistic in evolutionary genetics that quantifies the relative rates of protein-altering and non-protein-altering mutations. The ratio is adjusted so that under neutral evolution - i.e. when the survival and reproductive advantage of all variants is the same - it equals 1. Typically, dN/dS is observed to be less than 1, meaning that new mutations tend to be disfavoured, implying they are harmful to survival or reproduction. Occasionally, dN/dS is observed to be greater than 1, meaning that new mutations are favoured, implying they provide some survival or reproductive advantage. The aim of estimating dN/dS is usually to identify mutations that provide an advantage.
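As a rough illustration of what the ratio measures, here is a minimal Python sketch of a crude counting estimate. It is not the genomegaMap method, which is model-based; the function name dn_ds, its parameters and the example counts are all hypothetical, and it assumes the numbers of nonsynonymous and synonymous changes and sites have already been counted.

```python
def dn_ds(nonsyn_changes, syn_changes, nonsyn_sites, syn_sites):
    """Crude dN/dS: nonsynonymous changes per nonsynonymous site,
    divided by synonymous changes per synonymous site.

    Illustrative only: no correction for multiple hits, codon usage
    or recombination is applied.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_changes == 0:
        raise ValueError("dN/dS is undefined with no synonymous changes")
    dn = nonsyn_changes / nonsyn_sites  # protein-altering rate per site
    ds = syn_changes / syn_sites        # non-protein-altering rate per site
    return dn / ds


# Hypothetical gene: 750 nonsynonymous sites, 250 synonymous sites.
# 30 nonsynonymous and 20 synonymous changes give a ratio below 1,
# i.e. protein-altering mutations appear to be disfavoured.
ratio = dn_ds(nonsyn_changes=30, syn_changes=20, nonsyn_sites=750, syn_sites=250)
print(ratio)
```

Dividing by the number of sites of each type is the adjustment that makes the ratio equal 1 under neutrality: in this hypothetical gene, 60 nonsynonymous and 20 synonymous changes would give a ratio of 1, because there are three times as many nonsynonymous sites.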

Theoreticians are often critical of dN/dS because it is more of a descriptive statistic than a process-driven model of evolution. This overlooks the problem that currently available models make simplifying assumptions such as minimal interference between adjacent mutations within genes. These assumptions are not obviously appropriate in many species, including infectious micro-organisms, that exchange genetic material infrequently.

There are many methods for measuring dN/dS. The method in this new paper overcomes two common problems, slow computation on large datasets and sensitivity to the rate of genetic exchange:
  • It is fast no matter how many genomes are analysed together.
  • It is robust whether there is frequent genetic exchange (which causes phylogenetic methods to report spurious signals of advantageous mutation) or infrequent genetic exchange.
The paper includes detailed simulations that establish the validity of the approach, and it goes on to demonstrate how genomegaMap can detect advantageous mutations in 10,209 genomes of Mycobacterium tuberculosis, the bacterium that causes tuberculosis. The method reproduces known signals of advantageous mutations that make the bacteria resistant to antibiotics, and it discovers a new signal of advantageous mutations in a cold-shock protein called deaD or csdA.

Software that implements genomegaMap is available on Docker Hub, and the source code and documentation are available on GitHub.

As the number of available genome sequences keeps growing, analysing the data becomes an increasing challenge even with modern computers. I hope this new method provides a useful way to exploit such large datasets and gain new insights into evolution.

Selection in a putative meningitis vaccine target

In "Variation of the factor H-binding protein in Neisseria meningitidis", Carina Brehony, working in Martin Maiden's lab at Oxford, investigated a group of outer membrane proteins in the bacterium responsible for meningococcal meningitis. To date, attempts to develop a vaccine against the common serogroup B meningococci have been frustrated by the low immunogenicity of the serogroup B capsular polysaccharide, despite success with serogroups A and C. Outer membrane proteins, such as factor H-binding protein (fHbp), may provide alternative targets for vaccine development.

However, fHbp is genetically diverse, and our investigation showed evidence of structuring into three groups. OmegaMap analyses of the three groups revealed a signature consistent with strong selection pressure for antigenic variability at the gene. Notably, there was clear evidence of diversifying selection at several previously discovered epitopes - positions in the protein targeted by antibodies during the bacteria-killing immune response. (Analysis of one group is shown in the figure, with known epitopes marked.)

While these observations are encouraging in terms of understanding the biology of pathogen antigens, a pressing question is how to translate that understanding into practical vaccine design. Studies such as ours suggest a multi-component vaccine may be necessary to achieve broad coverage against serogroup B meningococci.

omegaMap at BioHPC

All evolutionary biologists wishing to make use of omegaMap now have access to a high performance parallel computing cluster via the internet courtesy of Cornell's CBSU and Microsoft. The software, which allows the detection of selection and recombination in DNA or RNA sequences, can be run via the web interface at cbsuapps.tc.cornell.edu/omegamap.aspx, or downloaded as part of the BioHPC suite.

The web interface consists of a simple form where users can upload their configuration file and sequences in FASTA format. Users are notified by e-mail when their jobs complete. To learn more about the project, visit the CBSU home page.

Meanwhile, I am working on several major updates to omegaMap, the most interesting of which will probably be the development of a new model that allows for the joint analysis of natural selection acting on sequences from different populations or species. The aim is to integrate population genetic and phylogenetic models of selection in order to exploit the signal of selection contained both in polymorphism within populations (or species) and divergence between them. I will be presenting progress on this work, in the context of hominid evolution, at the 2009 SMBE meeting in Iowa City this June.