Festival of Genomics 2024

I will be talking at the Festival of Genomics on Wednesday 24 January about identifying virulence and antimicrobial resistance genes in bacteria using genome-wide association studies. You can preview my talk here.

Announcing ProbGen22 in Oxford 28-30 March

The organizing committee is pleased to announce the 7th Probabilistic Modeling in Genomics Conference (ProbGen22), to be held at the Blavatnik School of Government and Somerville College, Oxford, from 28th to 30th March 2022.

The meeting will be a hybrid in-person and online event. Talk sessions will feature live speakers, both in-person and online, and will take place during the afternoons (making live attendance feasible for US timezones). Talks will be recorded and made available to registrants for a period of one month. Poster sessions will be held online during the evenings.

The conference will cover probabilistic models, algorithms, and statistical methods across a broad range of applications in genetics and genomics. We invite abstract submissions on a range of topics including population genetics, natural selection, quantitative genetics, methods for GWAS, applications to cancer and other diseases, causal inference in genetic studies, functional genomics, assembly and variant identification, phylogenetics, single cell 'omics, deep learning in genomics and pathogen genomics.

The registration deadline is 28th February 2022.

For more details visit the conference website. 

Two new positions: Senior Statistical Geneticist and Bioinformatician

Two new positions are available in my Infectious Disease Genomics group at the Big Data Institute, University of Oxford.

A Senior Postdoctoral Statistical Geneticist to jointly lead the implementation, design and application of new statistical tools for genome-wide association studies, lead the biological interpretation of key findings, develop methodologies and supervise junior group members. This post would suit a candidate with a PhD and relevant post-doctoral experience including direct experience in statistical genetics. Candidates without post-doctoral experience may be considered for a less senior appointment.

A Bioinformatician to provide expertise for computationally intensive analyses including genome-wide association studies and RNAseq studies of differential gene expression, as well as contributing to informatics projects as part of a wider collaboration with national biomedical cohorts. This post would suit a candidate with either a post-graduate degree related to Bioinformatics, Statistics or Computing, or equivalent experience in industry.

The application deadline for both posts is Noon GMT on Friday 7th January 2022.

New paper: Antimicrobial resistance determinants are associated with Staphylococcus aureus bacteraemia and adaptation to the healthcare environment

Staphylococcus aureus is a leading cause of infectious disease deaths in all countries, with bloodstream infection leading to sepsis a major concern. This new study, published in November in Microbial Genomics, reports genes and genetic variants in Staph. aureus associated with severe disease vs asymptomatic carriage, and with healthcare vs community carriage.

Our genome-wide association study of 2000 bacterial genomes showed that antibiotic resistance in Staph. aureus is associated with severe disease and the hospital environment:

  • A mutation conferring trimethoprim resistance (dfrB F99Y) and the presence of a gene conferring methicillin resistance (mecA) were both associated with bloodstream infection vs asymptomatic nose carriage.
  • Separately, we demonstrated that a mutation conferring fluoroquinolone resistance (gyrA L84S) and variation in a gene involved in resistance to multiple antibiotics (prsA) were preferentially associated with healthcare-associated carriage vs community-acquired carriage.

The implication – that antibiotic resistance genes may provide survival advantages which mechanistically contribute to the development of disease – is important in the face of the continued global rise of antibiotic resistance.


We were also able to shed light on a controversy as to whether different strains of Staph. aureus differ in their propensity to cause severe disease. Interest in this question dates back decades in the literature, and contradictory studies, often based on modest sample sizes, have reached different conclusions. Our comparatively large study, using a whole-genome method that we previously published in Nature Microbiology, found that all strains of Staph. aureus are equally likely to cause severe disease vs asymptomatic carriage.




New paper: Genome-wide association studies reveal the role of polymorphisms affecting factor H binding protein expression in host invasion by Neisseria meningitidis

In this paper, published in October in PLOS Pathogens, we discovered a novel genetic association between life-threatening invasive meningococcal disease (IMD) and bacterial genetic variation in factor H binding protein (fHbp) through two bacterial genome-wide association studies (GWAS), which we validated experimentally. This was a collaboration with the groups of Chris Tang and Martin Maiden, with the work in my group led by Sarah Earle.

fHbp is an important component of meningococcal vaccines that directly interacts with human complement factor H (CFH). Intriguingly, our discovery that bacterial genetic variation in fHbp associates with increased virulence mirrors an earlier discovery that human genetic variation in CFH associates with increased susceptibility to IMD (Nature Genetics 42: 772).

Our experiments showed that the fHbp risk allele increased expression. Interestingly, increased susceptibility to IMD has been previously associated with elevated CFH expression. Therefore over-expression of either fHbp by the bacterium or CFH by the host appears to increase the risk of IMD. Since complement evasion is necessary for pathogenesis, these insights offer new leads for improving treatment.

Key results from the paper:

  • A GWAS for IMD in 261 meningococci from the Czech Republic highlighted a highly polygenic architecture of meningococcal virulence (see Figure), including capsule biosynthesis genes, the meningococcal disease association island and the new signal near the fba and fHbp genes.
  • A replication GWAS for IMD in 1295 meningococcal genomes belonging to strain ST41/44 downloaded from pubMLST.org validated the novel signal of association near fba and fHbp.
  • SHAPE reactivity analyses revealed that IMD-associated variation in the regulatory region of fHbp disrupted the ability of the cell machinery to commence gene expression.
  • Flow cytometry assays of newly constructed genetically engineered strains, at different temperatures and in the presence and absence of human serum, attributed changes in gene expression to a non-synonymous candidate mutation in the fHbp gene.

In this study, our GWAS relied exclusively on publicly available genome sequences and metadata, highlighting the untapped potential of large-scale open source databases like pubMLST.org, and the value of big data for improving our understanding of disease.



The group’s research response to COVID-19

This is an update on the group's research response to the COVID-19 pandemic. As an infectious disease group we have been keen to contribute to the international research effort where we could be useful, while recognising the need to continue our research on other important infections where possible.

  • Bugbank. Thanks to a pre-existing collaboration between our group, Public Health England and UK Biobank, we were in a position to help rapidly facilitate COVID-19 research via SARS-CoV-2 PCR-based swab test results. Beginning mid-March, we worked to provide regular (usually weekly) updates of test results, which were made available to all UK Biobank researchers beginning April 17th. This is one of several resources on COVID-19 linked to UK Biobank. Beginning in May we provided feeds to other cohorts: INTERVAL, COMPARE, Genes & Health and the NIHR BioResource. We provide updates on this work through the project website www.bugbank.uk. We have published a paper describing the dynamic data linkage in Microbial Genomics (press release). Key collaborators in this project are Jacob Armstrong (Big Data Institute), Naomi Allen (UK Biobank), and David Wyllie and Anne Marie O'Connell (Public Health England).


  • Epidemiological risk factors for COVID-19. Graduate student Nicolas Arning and I are developing an approach to quantify the effects of lifestyle and medical risk factors for COVID-19 in the UK Biobank that accounts for inherent uncertainty in which risk factors to consider. The new method employs the harmonic mean p-value, a model-averaging approach for big data that we published previously. We are in the process of evaluating the performance of the approach, comparing it to machine learning, and interpreting the results.

  • Antibody testing for the UK Government. Postdoc Justine Rudkin has been working in the lab with Derrick Crook, Sir John Bell and others to measure the efficacy of antibody tests for the UK Government. They have tested many hundreds of kits to establish the sensitivity and specificity of the tests to help evaluate the utility of a national testing programme. This work was crucial in demonstrating the limitations of early blood-spot based tests, and the credibility of subsequent generations of antibody tests. The work has been published in Wellcome Open Research.
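
To make the harmonic mean p-value mentioned in the risk-factor item above more concrete, here is a minimal sketch of the raw statistic: a weighted harmonic mean of p-values, each of which might come from a test of a different candidate risk factor or model. The function name, the equal default weights and the input checks are assumptions for illustration, not the group's implementation. The raw value generally needs calibration before it can be read as a p-value in its own right, and that step is omitted here.

```python
from typing import Optional, Sequence


def harmonic_mean_p(p_values: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Raw weighted harmonic mean of p-values: sum(w) / sum(w / p).

    Illustrative sketch only. No calibration is applied, so the result
    should not be interpreted directly as a p-value.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        # Assumption: equal weight for every test when none are supplied.
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    if any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
        raise ValueError("weights must be non-negative with a positive sum")
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))
```

Because each term is weighted by 1/p, the smallest p-values dominate the result: combining 0.01 and 1.0 with equal weights gives about 0.0198, far closer to 0.01 than the arithmetic mean of about 0.5.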


Work on other infections that has continued during the lockdown. Postdoc Sarah Earle continues research into pathogen genetic risk factors for diseases including tuberculosis and meningococcal meningitis, while Steven Lin has continued to pursue work on hepatitis C virus genetics and epidemiology. Many of our close collaborators are infection doctors and they have of course been recalled to clinical duties. Laboratory work in the group has been severely disrupted, particularly several of Justine's Staphylococcus aureus projects. We are keen to pick up on those projects where we left off when the chance arrives.
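
For readers less familiar with the terms used in the antibody testing item above, sensitivity and specificity can be computed from counts of test results against samples of known status. The sketch below uses the standard definitions; the class and field names are hypothetical, and any counts passed in are the reader's own, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class TestCounts:
    """Counts from evaluating a test on samples of known status (hypothetical names)."""
    true_pos: int   # known positives that tested positive
    false_neg: int  # known positives that tested negative
    true_neg: int   # known negatives that tested negative
    false_pos: int  # known negatives that tested positive


def sensitivity(c: TestCounts) -> float:
    """Proportion of known positives that the test detects."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: TestCounts) -> float:
    """Proportion of known negatives that the test correctly clears."""
    return c.true_neg / (c.true_neg + c.false_pos)
```

For example, a kit that flags 90 of 100 known positive samples and clears 95 of 100 known negative samples has sensitivity 0.90 and specificity 0.95 under these definitions.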

Postdoc Available in Statistical Genetics

The closing date for applications for this post is noon on Wednesday 15th April 2020.

We are seeking an exceptional researcher with a track record in methods development for Statistical Genomics and an interest in Infectious Disease to join our group at the Big Data Institute. Our research focuses on Bacterial Genomics, Genome-Wide Association Studies and Population Genetics. The aim of the post is to conduct innovative research within the group's range of interests and to make use of the opportunities afforded by our outstanding collaborators. We welcome candidates who wish to use the opportunity as a stepping stone to independent funding.

The Oxford University Big Data Institute (BDI) is an interdisciplinary research centre aiming to develop, evaluate and deploy efficient methods for acquiring and analysing biomedical data at scale and for exploiting the opportunities arising from such studies. The Nuffield Department of Population Health, a partner in the BDI, contains world-renowned population health research groups and is an excellent environment for multi-disciplinary teaching and research.

The Postdoctoral Researcher in Statistical Genomics will join our team which has expertise in microbiology, genomics, evolution, population genetics and statistical inference. Responsibilities include planning a research project and milestones with help and guidance from the group, preparing manuscripts for publication, keeping records of results and methods and tracking milestones, and disseminating results.

To be considered, you need to hold, or be close to completion of, a PhD/DPhil involving statistical methods development. You also need experience of large-scale statistical data analysis, evidence of originating and executing your own academic research ideas, excellent interpersonal skills and the ability to work closely with others in a team.

For informal enquiries, please contact me.

Further details, including how to apply, are here: https://my.corehr.com/pls/uoxrecruit/erq_jobspec_details_form.jobspec?p_id=145506

New paper: PVL toxin associated with pyomyositis


In a new collaborative study published this week in eLife, we report a strong association between Staphylococcus aureus that carry the PVL toxin and pyomyositis, a muscle infection often afflicting children in the tropics.

Catrin Moore and colleagues at the Angkor Children's Hospital in Siem Reap, Cambodia, spent more than a decade collecting S. aureus bacteria from pyomyositis infections in young children, and built a comparable control group of S. aureus carried asymptomatically in children of similar age and location.

When Bernadette Young in our group compared the genomes of cases and controls using statistical tools we developed, she found some strong signals:

  • Most, but not all, pyomyositis was caused by the CC-121 strain, common in Cambodia.
  • The association with CC-121 was driven by the PVL toxin which it carries.

The ability to pinpoint the association to PVL came about because (i) a sub-group of CC-121 that lacked PVL caused no pyomyositis and (ii) pyomyositis-causing S. aureus from backgrounds that rarely caused pyomyositis were unusual in also possessing PVL.

The PVL-pyomyositis association was extraordinarily strong, so strong that PVL appeared all but necessary for disease. Moreover, disease appeared to be monogenic, with no other genes involved elsewhere in the bacterial genome. To discover an apparently monogenic disease mechanism for a common disease is very unusual nowadays.

The discovery has immediate practical implications because it draws parallels between pyomyositis and toxin-driven bacterial diseases like tetanus and diphtheria that have proven amenable to immunization. The fact that anti-PVL vaccines have already been developed in other contexts offers hope for the future treatment of this debilitating tropical infection.

Our study throws much-needed light on a topic that has been the subject of heated debate in recent years. Many bacterial toxins, PVL included, have been implicated in diverse S. aureus disease manifestations, often without sound evidence. Because PVL is known to contribute to angry, pus-filled skin infections, and has been observed in bacteria causing rare and severe S. aureus infections, some authors have implicated it in dangerous diseases including necrotizing pneumonia, septic arthritis and pyomyositis, but detailed meta-analyses have dismissed these claims as not substantiated. Our GWAS approach offers greater robustness than previous generations of candidate gene studies by accounting for bacterial genetic variation across the entire genome.

If you are interested, please take a closer look at the paper.

GRAF, a new tool for finding duplicates and closely related samples in large genomic datasets

Genome-wide association studies (GWAS) usually rely on the assumption that different samples aren’t from closely related individuals. If you’re using combined datasets that have been genotyped on different platforms, though, how do you detect duplicates and close relatives? The dbGaP …

dbGaP 10th Anniversary Symposium June 9, 2017

dbGaP (the NIH database of Genotypes and Phenotypes) is celebrating its 10th Anniversary this year! We are proud to support over 850 studies and 1.6 million samples. We invite you to join us at the dbGaP 10th Anniversary Symposium to …