New paper: Machine learning to predict the source of campylobacteriosis using whole genome data

This study, published in October in PLOS Genetics, brings together machine learning, large bacterial isolate collections and whole genome sequencing to address the general problem of how to trace the source of human infections.

Specifically, we investigated campylobacteriosis, a common infection of animal origin causing ~1.5 million cases of gastroenteritis and 10,000 hospitalizations every year in the United States alone. We show that our combined machine learning/genomics analyses:

  • Improve the accuracy with which infections can be traced back to farm reservoirs.
  • Identify evolutionary shifts in bacterial affinity for livestock host species.
  • Detect changes in human infection capability within related strains.

These results will improve understanding not only of Campylobacter but of bacterial pathogens more generally, since these technologies can readily be applied to other important species.

This paper builds on previous work published by the group, including our well-cited Tracing the source of campylobacteriosis (Wilson et al. 2008, PLOS Genetics 4:e1000203). The use of these methods for tracing infection has influenced public health policy and contributed to reducing disease burden.

This work demonstrates the potential for modern genomics and artificial intelligence approaches to address common and serious problems that affect our everyday lives. The awareness of the importance of infection to society has rarely been higher than in 2021, and while the current pandemic imposes an acute global problem, other infections continue to present long-term threats to health and productivity.

This work was led by Nicolas Arning, in collaboration with David Clifton and Sam Sheppard.

PhD Studentship: Genomic prediction of antimicrobial resistance spread

This position is now closed.
An opportunity has arisen for a D.Phil. (Ph.D.) place on the BBSRC-funded Oxford Interdisciplinary Bioscience Doctoral Training Partnership in the area of Artificial Intelligence, specifically Predicting the spread of antimicrobial resistance from genomics using machine learning.

If successful in a competitive application process, the candidate will join a cohort of students enrolled in the DTP’s one-year interdisciplinary training programme, before commencing the research project and joining my research group at the Big Data Institute.

This project addresses the BBSRC priority area “Combatting antimicrobial resistance” by using ML to predict the spread of antimicrobial resistance in human, animal and environmental bacteria exemplified by Escherichia coli. Understanding how quickly antimicrobial resistance (AMR) will spread helps plan effective prevention, improved biosecurity, and strategic investment into new measures. We will develop ML tools for large genomic datasets to predict the future spread of AMR in humans, animals and the environment. The project will create new methods based on award-winning probabilistic ML tools pioneered in my group (BASTA, SCOTTI) by training models using genomic and epidemiological data informative about past spread of AMR. We will apply the tools collaboratively to genomic studies of E. coli in Kenya, the UK and across Europe from humans, animals and the environment, Enterobacteriaceae in North-West England, and Campylobacter in Wales. Genomics has proven effective for asking “what went wrong” in the context of outbreak investigation and AMR spread; here we will address the greater challenge of repurposing such information using ML for forward prediction of future spread of AMR. Scrutiny will be intense because future predictions can and will be tested, raising the bar for the biological realism required while producing computationally efficient tools.

Attributes of suitable applicants: Understanding of genomics. Interest in infectious disease. Some numeracy, e.g. mathematics A-level, desirable. Experience of coding would help.

Funding notes: BBSRC eligibility criteria for studentship funding apply (https://www.ukri.org/files/legacy/news/training-grants-january-2018-pdf/). Successful students will receive a stipend of no less than the standard RCUK stipend rate, currently set at £14,777 per year.

How to apply: send me a CV and brief covering letter/email (no more than 1 page) explaining why you are interested and suitable by the Wednesday 11 July initial deadline. I will invite the best applicant/s to submit with me a formal application in time for the Friday 13 July second-stage deadline.

New paper: Rapid host switching in Campylobacter

Our new open access paper Rapid host switching in generalist Campylobacter strains erodes the signal for tracing human infections was published last week in the ISME Journal.

Figure from paper 
With Bethany Dearlove, Sam Sheppard and colleagues, we investigated common strains of campylobacter, the most frequent cause of bacterial gastroenteritis worldwide. Campylobacter infection is associated with food poisoning, particularly contaminated chicken. But in previous work, we found that certain strains (the ST-21, ST-45 and ST-828 complexes) are often found contaminating a range of meat and poultry, making it difficult to trace the source of human infection.

That previous work was based on partial genome sequencing known as MLST. In MLST, less than 1% of the information in the genome is captured. Now that whole genome sequencing is available, the expectation was that we should be able to distinguish easily between ST-21, ST-45 and ST-828 strains contaminating poultry versus beef versus lamb, and so on.

What we found was surprising. Instead of these strains harbouring previously unobserved sub-structure that allowed them to be associated with different animal sources, we found rapidly mixing populations undergoing extremely fast transmission between animal species, with campylobacter strains ricocheting among animal species on a timescale of just a few years. This is faster than they can accumulate enough mutations to differentiate populations colonizing different animal species.

Our results present an unforeseen roadblock to tracing transmission with whole genome sequencing, and suggest these strains are adapted to a generalist lifestyle, shedding new light on the ecology of this pathogen. These findings push back against the tide of opinion that whole genome sequencing is necessarily a panacea for detecting transmission, and demonstrate that going forwards, a detailed understanding of the biology of zoonotic bacteria (those transmitting between multiple species) and intensive sampling of potential sources are essential for effectively tracing the source of human infection.

World Health Day: Food-borne disease theme

For World Health Day 2015, the group's research into food-borne campylobacter infection was featured on the Nuffield Department of Medicine's home page. The piece features recent work Bethany Dearlove and I have conducted into zoonotic (animal-human) transmission with Sam Sheppard. The paper is currently under review, and a preprint can be downloaded from the website.

Geographical differences in transmission revealed by cryptic population structure

Two papers that I co-authored with colleagues at Lancaster and Massey Universities appear this month in the October 2010 issue of Epidemiology & Infection. The common theme is that cryptic differences in the population structure of the enteric pathogen Campylobacter jejuni, revealed by my method for attributing cases to source populations, suggest subtle differences in transmission between rural and urban districts.
The method, implemented in the software iSource (available on my website), allows strains of campylobacter to be characterized as poultry- or cattle-associated based on their genetic profiles. Interestingly, when the relative incidence of poultry- and cattle-associated strains is plotted on a map, there is a significantly higher occurrence of poultry-related disease in urban areas and cattle-related disease in rural areas. Both studies – one in Lancashire led by Edith Gabriel and one in New Zealand led by Petra Mullner – draw the same conclusion. These findings imply that there are subtle differences in transmission in rural and urban areas. Whether they represent geographical differences in the profile of food pathogens, environmental exposure, resistance to infection or other risk factors is not understood.
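
To give a feel for how a genetic profile can be attributed to a source, here is a minimal sketch. It is not the model implemented in iSource: it assumes a much simpler scheme in which loci are treated as independent, allele frequencies in each source are estimated with a pseudocount, and all sources have equal prior weight. The function names, the `n_alleles` parameter and the toy profiles are all hypothetical.

```python
from collections import Counter
from math import prod


def locus_counts(profiles):
    """Count alleles at each locus across one source population's isolates."""
    n_loci = len(profiles[0])
    return [Counter(profile[i] for profile in profiles) for i in range(n_loci)]


def profile_likelihood(profile, profiles, n_alleles, pseudocount=1.0):
    """Probability of a profile under one source, treating loci as independent."""
    counts = locus_counts(profiles)
    n = len(profiles)
    return prod(
        (counts[i][allele] + pseudocount) / (n + pseudocount * n_alleles)
        for i, allele in enumerate(profile)
    )


def attribute(profile, sources, n_alleles=5):
    """Posterior probability of each source, assuming equal prior weights."""
    likelihoods = {
        name: profile_likelihood(profile, isolates, n_alleles)
        for name, isolates in sources.items()
    }
    total = sum(likelihoods.values())
    return {name: value / total for name, value in likelihoods.items()}


# Hypothetical toy data: each isolate is a tuple of allele numbers at three loci.
sources = {
    "poultry": [(1, 2, 1), (1, 2, 3), (1, 4, 1)],
    "cattle": [(2, 3, 3), (2, 3, 1), (5, 3, 3)],
}
human_isolate = (1, 2, 1)

posterior = attribute(human_isolate, sources)
print(posterior)
```

In this toy example the human isolate shares most of its alleles with the poultry isolates, so most of the posterior weight falls on poultry. Mapping such per-case probabilities by district is the kind of summary that reveals the urban and rural contrast described above.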

Campylobacter source attribution in New Zealand

The source of the common food poisoning pathogen Campylobacter jejuni was the subject of a paper published in September last year in PLoS Genetics by my colleagues and me, in which we traced the origin of bacterial isolates collected from patients in Lancashire, England. In that study, and a subsequent investigation into campylobacteriosis across Scotland, we found that the majority of cases could be attributed to populations of C. jejuni typically found in poultry.

Now Petra Mullner, Nigel French and colleagues have genetically characterized the C. jejuni populations found in human patients, cattle, sheep, poultry and environmental samples from New Zealand covering the period March 2005 - February 2008. What is special about their study is that the New Zealand poultry industry is a closed system, with no foreign imports, making it possible to directly sample the putative source populations and disease-causing isolates concurrently.

As in the studies in England and Scotland, poultry was the inferred source of the majority of disease in New Zealand. Uniquely, however, it was possible to attribute cases separately to the three major poultry suppliers on the islands. One supplier in particular was attributed a disproportionate number of cases using three assignment methods, including my method (iSource, soon to be available on this website). Supported in part by this evidence, the New Zealand Food Safety Authority introduced mandatory targets for limiting Campylobacter contamination of poultry products in 2007. Remarkably, the number of cases fell from 15,873 in 2006, before the control measures were introduced, to 6,689 in 2008. The next chapter of this intriguing story will be a follow-up study to establish whether the fall in the number of cases corresponded to a reduction in the proportion of campylobacteriosis attributable to poultry sources.
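
For a sense of scale, the fall in case numbers quoted above can be expressed as a percentage. This is plain arithmetic on the two figures in the post; the function name is just for illustration.

```python
def percent_reduction(before, after):
    """Percentage fall from `before` to `after`."""
    return 100.0 * (before - after) / before


cases_2006 = 15_873  # before the control measures
cases_2008 = 6_689   # after the control measures

reduction = percent_reduction(cases_2006, cases_2008)
print(f"{reduction:.1f}% fewer cases in 2008 than in 2006")
```

That works out at a little under 58% fewer cases, although on its own it says nothing about how much of the fall is attributable to poultry, which is what the follow-up study would address.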

Neolithic origin of Campylobacter jejuni

As part of a recent trip to the University of Edinburgh to visit Andrew Rambaut, I gave a talk on some work of mine published in the February edition of Molecular Biology and Evolution and subsequently recommended on the Faculty of 1000 website about the evolution of the gut pathogen Campylobacter jejuni.

Part of the paper is concerned with the issue of the timescale of Campylobacter evolution, and using longitudinal samples of C. jejuni DNA sequences we attempted to calibrate the molecular clock in a similar way to that which is standard practice for viruses.

We detected surprisingly rapid evolution - 1,000 times faster than traditional estimates - which would place the split of C. jejuni from its closest relative C. coli during the Neolithic revolution. Interestingly, the point estimate of 6,500 years ago for the split from C. coli - which preferentially infects swine - coincides with the spread of pig domestication in the Near East and Europe in the 4th millennium BC.

The date is controversial because the traditional dating method, which is based on bounding deep phylogenetic splits such as the common ancestor of mitochondria and bacteria, would place the divergence of C. jejuni and C. coli closer to 10 million years ago.
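
The link between the rate and the date can be made concrete with a small sketch. It assumes a strict molecular clock, under which the time since two lineages split is the pairwise divergence divided by twice the substitution rate. The divergence value below is hypothetical and chosen only for illustration; it is not an estimate from the paper.

```python
def divergence_time(pairwise_divergence, rate_per_site_per_year):
    """Years since two lineages split under a strict molecular clock.

    Divergence accumulates along both lineages, hence the factor of 2.
    """
    return pairwise_divergence / (2.0 * rate_per_site_per_year)


# Hypothetical pairwise divergence (substitutions per site), for illustration only.
d = 0.1

# Back out the slow rate that would give the traditional date of ~10 million years.
traditional_years = 10_000_000
slow_rate = d / (2.0 * traditional_years)

# A rate 1,000 times faster shortens the implied date by the same factor.
fast_rate = 1_000 * slow_rate
fast_years = divergence_time(d, fast_rate)
print(f"Date with the faster clock: {fast_years:,.0f} years ago")

# Ratio between the traditional date and the 6,500-year point estimate.
implied_ratio = traditional_years / 6_500
print(f"Ratio of the two dates: {implied_ratio:,.0f}")
```

Because the date scales inversely with the rate, a thousand-fold faster clock turns roughly 10 million years into roughly 10,000 years, the same order of magnitude as the 6,500-year point estimate.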

After the seminar I had an interesting discussion with Paul Sharp, who was in the audience. Prof Sharp is actively researching the causes of conflict between long-term and short-term estimates of the rate of evolution in viruses. As he points out, short-term rate estimates (usually based on longitudinally-sampled viral sequences) frequently suggest that evolution is occurring much more rapidly than long-term estimates (based on deeper calibration points, such as co-phylogeny of host and pathogen). This phenomenon, observed in HIV and hepatitis C among others, may be caused by overly simplistic models of sequence evolution.

So how plausible is it that a ubiquitous bacterial pathogen such as C. jejuni evolved as recently as the Neolithic, possibly in response to changes brought about by agriculture or animal husbandry? Longitudinal studies of Helicobacter pylori and Neisseria gonorrhoeae have obtained similarly rapid rates of bacterial evolution, and evidence is mounting that the Neolithic revolution played an important role in creating new niches for human, plant and animal pathogens. Perhaps the best prospect for resolving these questions will be studies of ancient DNA preserved from the period in question.

Tracing the source of campylobacteriosis

Finally, it's out! The main piece of work to come out of my two-year period as Research Associate at Lancaster University is published today in PLoS Genetics.

The article reports a study in Lancashire, England, of the bacterium Campylobacter jejuni, the primary cause of bacterial gastro-enteritis in developed countries. We inferred the source of infection in 1,200 patients by comparing the DNA sequences of C. jejuni taken from those patients to 1,100 taken from different animal species and the environment. The result: livestock are the source of infection in 97% of cases.

In addition to preparing the figures, approving final drafts, and producing a press release in conjunction with the PLoS Genetics and university press offices, I have spent much of my time over the last three weeks revising a companion paper on the evolution of C. jejuni. On Friday that was resubmitted to Molecular Biology and Evolution, and should it be accepted, will draw a line under my Lancaster projects.