New method inferring natural selection published today

I am pleased to report that my new paper "A population genetics-phylogenetics approach to inferring natural selection" is published today in PLoS Genetics. This is the culmination of two years work at the University of Chicago with Molly Przeworski, plus a good deal of follow-up since I moved to Oxford. In the paper we introduce a new way of combining population genetics and phylogenetics models of natural selection, and a statistical method (gammaMap) for estimating parameters under the model. From a collection of sequences within one or more species - in the paper, we use 100 X-linked coding sequences that Peter Andolfatto produced in Drosophila melanogaster and D. simulans - the method allows you to estimate the distribution of fitness effects within each lineage, and localize the signal of selection using a Bayesian sliding window approach. Using Ryan Hernandez's simulator SFSCODE we tested the method for robustness to demographic change and linkage disequilbrium, and we investigated the effect that common assumptions concerning spatial variation in selection coefficients (sitewise, genewise and sliding window approaches) have on inference of selection. During the winter break I will work on compiling the program for different platforms and writing the documentation, with a view to releasing the software early in the New Year. Subscribe to this blog for updates or - if you are too impatient to wait - send me an email.

Discovering the distribution of fitness effects

At this year's Society for Molecular Biology and Evolution meeting in Lyon I presented ongoing work estimating the distribution of fitness effects, which is a collaborative venture with Molly Przeworski and Peter Andolfatto. Earlier versions of this research appeared in talks I presented at Chicago in December (Ecology and Evolution Departmental seminar) and Liverpool in January (UK Population Genetics Group meeting), and it follows on from last year's SMBE presentation in which I discussed methods to tease out sub-genic variation in selection pressure.

There is intrinsic interest in the fitness effects of novel mutations in coding regions of the genome, especially the relative frequency of occurrence of neutral, beneficial and deleterious variants. Yet estimating the distribution of fitness effects (the DFE) is also of practical use when localizing the signal of adaptive evolution. The reason is that in Bayesian analyses, the assumed DFE can influence the strength of evidence for or against adaptation at a particular site. Consequently it is preferably to estimate the DFE at the same time as detecting adaptation at individual sites to avoid prior assumptions unduly influencing the results.

Having estimated the DFE, it is of use in quantifying the relative contribution of adaptation versus drift to genome evolution. The figure, taken from my talk in Lyon (slides here), illustrates the idea when a normal distribution is used to estimate the DFE; the relative area of the green to the yellow shaded regions represents the respective contribution of adaptation versus drift in amino acid substitutions accrued along the Drosophila melanogaster lineage.