State restrictions and hospitalizations

The University of Oxford’s Blavatnik School of Government defined an index to track containment measures for the coronavirus. For The New York Times, Lauren Leatherby and Rich Harris plotted the index against cases and hospitalizations:

When cases first peaked in the United States in the spring, there was no clear correlation between containment strategies and case counts, because most states enacted similar lockdown policies at the same time. And in New York and some other states, “those lockdowns came too late to prevent a big outbreak, because that’s where the virus hit first,” said Thomas Hale, associate professor of global public policy at the Blavatnik School of Government, who leads the Oxford tracking effort.

A relationship between policies and the outbreak’s severity has become more clear as the pandemic has progressed.

States with more restrictions tend to have lower case and hospitalization rates.

From these plots, it seems clear what we need to do. But I think most people have made up their minds already, and the interpretation of the data leads people to different conclusions.

With the holidays coming up, I just hope you lean towards clarity.


Rooted phylogenetic networks for coronaviruses

In a previous post, Guido constructed trees for coronaviruses in the SARS group to search for evidence of recombination. He also constructed unrooted data-display networks using SplitsTree. Here, we discuss our attempts to construct rooted genealogical phylogenetic networks for the same dataset [6] but with some modifications.

In particular, we deleted some sequences, giving a smaller data set with only 12 taxa. In addition to SARS-CoV-2 (the virus causing COVID-19) and SARS-CoV (responsible for the SARS epidemic in 2002/2003), these taxa include the viruses MP789 and PCoV_GX-P1E, sampled from Malayan pangolins in two different Chinese provinces, and several viruses found in different species of the horseshoe bat genus (Rhinolophus), all from China.

This research was done by Rosanne Wallin, an MSc student at VU Amsterdam and UvA. Her full thesis, as well as all data and results, can be found on GitHub.

The first algorithm we applied to this data set was the TreeChild Algorithm [1], which is one of the methods that take a number of discordant (rooted, binary) trees as input and find a rooted network containing each input tree, minimizing the number of reticulate events in the network. To filter out some noise, we contracted some poorly-supported branches and then resolved multifurcations consistently across the trees (using a tool within the TreeChild Algorithm). This gave the network below. Note that the method is restricted to so-called tree-child networks, meaning that certain complex scenarios are excluded (those where a network node has only reticulate children). Also note that this is not necessarily the only optimal tree-child network, and that not all topological differences can be distinguished based on the trees [5].

Figure 1: Phylogenetic network constructed by the Tree-Child algorithm (blocks_A_len0.01_supp70).
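
The tree-child restriction described above can be checked mechanically. The following is a minimal Python sketch, not taken from the TreeChild Algorithm [1]: the edge-list representation, the function name is_tree_child and the two toy networks are assumptions made purely for illustration. A node is treated as reticulate when it has two or more parents.

```python
from collections import defaultdict

def is_tree_child(edges):
    """Return True if every non-leaf node has at least one non-reticulate child.

    `edges` is a list of (parent, child) pairs describing a rooted network.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    for node, kids in children.items():  # only non-leaf nodes appear here
        # The excluded scenario: a node whose children are all reticulate.
        if all(indegree.get(kid, 0) >= 2 for kid in kids):
            return False
    return True

# Toy network with a single reticulation r (hypothetical, not Figure 1).
one_reticulation = [
    ("root", "a"), ("root", "b"),
    ("a", "r"), ("a", "y"),
    ("b", "r"), ("b", "z"),
    ("r", "x"),
]

# Toy network in which node a has only reticulate children (r1 and r2).
only_reticulate_children = [
    ("root", "a"), ("root", "b"),
    ("a", "r1"), ("a", "r2"),
    ("b", "r1"), ("b", "r2"),
    ("r1", "x"), ("r2", "y"),
]

print(is_tree_child(one_reticulation))
print(is_tree_child(only_reticulate_children))
```

The first toy network satisfies the restriction, while the second does not, because node a has no tree-node child.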

The network shows no reticulation in the SARS-CoV-2 clade (the bottom four taxa) and puts SARS-CoV-2 right next to RaTG13. Furthermore, it shows a reticulation between an ancestor of HKU3-1 and a common ancestor of SARS-CoV-2 and RaTG13, giving rise to bat-SL-CoVZC45. However, it cannot identify exactly which common ancestor of SARS-CoV-2 and RaTG13 is the parent, so multiple branches (in red) lead into this reticulation. All these observations are consistent with previous research [2].

Importantly, we cannot directly conclude that each reticulation corresponds to a recombination event. See Table 2.1 of David’s book [10] for a nice overview of possible causes of reticulation. Nevertheless, based on [2], it does look like at least the reticulation leading to bat-SL-CoVZC45 corresponds to a recombination event.

The second algorithm we applied was TriLoNet [3], which constructs a rooted network directly from sequence data. It is restricted to so-called level-1 networks, meaning that it cannot construct overlapping cycles. This method produced the network below.

Figure 2: Phylogenetic network constructed by TriLoNet.
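
To make the level-1 restriction concrete, here is a small Python sketch that tests whether a rooted binary network has overlapping cycles. It is not part of TriLoNet [3]. It relies on the usual characterization that a network is level-1 when each biconnected component of its underlying undirected graph contains at most one reticulation; the function names, the edge-list input and the toy networks are assumptions for illustration, and the sketch assumes there are no parallel edges.

```python
from collections import defaultdict

def biconnected_components(edges):
    """Node sets of the biconnected components of the underlying undirected graph."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    index, low = {}, {}
    edge_stack, components = [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for w in adj[u]:
            if w not in index:
                edge_stack.append((u, w))
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= index[u]:
                    # u separates the subtree below w: pop one component.
                    component = set()
                    while True:
                        edge = edge_stack.pop()
                        component.update(edge)
                        if edge == (u, w):
                            break
                    components.append(component)
            elif w != parent and index[w] < index[u]:
                # Back edge to an ancestor closes a cycle.
                edge_stack.append((u, w))
                low[u] = min(low[u], index[w])

    for node in list(adj):
        if node not in index:
            dfs(node, None)
    return components

def is_level_1(edges):
    """True if no biconnected component holds more than one reticulation."""
    indegree = defaultdict(int)
    for _, child in edges:
        indegree[child] += 1
    reticulations = {n for n, d in indegree.items() if d >= 2}
    return all(
        len(component & reticulations) <= 1
        for component in biconnected_components(edges)
        if len(component) > 2  # single edges cannot contain a cycle
    )

# One cycle only (hypothetical toy network).
single_cycle = [
    ("root", "a"), ("root", "b"), ("a", "r"), ("b", "r"),
    ("r", "x"), ("a", "y"), ("b", "z"),
]

# Two cycles sharing the edges root-a and root-b, so they overlap.
overlapping_cycles = [
    ("root", "a"), ("root", "b"), ("a", "c"), ("a", "r1"), ("b", "r1"),
    ("b", "r2"), ("c", "r2"), ("c", "y"), ("r1", "x"), ("r2", "z"),
]

print(is_level_1(single_cycle))
print(is_level_1(overlapping_cycles))
```

The recursive depth-first search is adequate for networks with a dozen taxa; a much larger network would call for an iterative version.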

At first sight, the network may look a bit different from the previous one (Figure 1). However, note that the three observations above also hold for this second network. Moreover, the SARS-CoV-2 clade is identical in both networks. This network contains only one reticulation, which is most likely due to the level-1 restriction.

Nevertheless, we can still use this method to find more putative recombination events. To do so, we simply exclude the recombinant bat-SL-CoVZC45 from the analysis and rerun the algorithm. This gives the following network.

Figure 3: Phylogenetic network constructed by TriLoNet, after omitting bat-SL-CoVZC45.

We have now found a second putative recombination event with Rf1 as recombinant. Note that this is also consistent with the network in Figure 1. On the other hand, also note that the branching order in the SARS-CoV clade (the bottom 7 taxa in Figure 3) has changed a bit. This could mean that more recombination events are present in the SARS-CoV clade, as we also see in Figure 1.

One interesting follow-up question is whether the two (or more) networks produced by TriLoNet can be combined into a single higher-level network, in order to show multiple reticulations simultaneously (see [4] for an algorithm that could be useful).

Another interesting observation from these networks is that there is no sign of recombination involving the pangolin coronaviruses MP789 and PCoV_GX-P1E. It rather looks like these viruses evolved from common ancestors of SARS-CoV-2 and RaTG13, but it is important to note that we cannot exclude a recombination event on the basis of these networks. The relationship between SARS-CoV-2 and pangolin coronaviruses is still being debated in the literature [2,7,8,9].

Some limitations of the algorithms were noticed during this study. Firstly, the depicted networks are purely topological, i.e., the branch lengths do not represent anything. Adapting these algorithms to take branch length information into account could possibly improve their accuracy for this data set, since the extant taxa have precise time stamps and, for recent divergence events, these times can be estimated quite accurately (see [2]).

Another limitation is that we had to remove several taxa from the original data set [6] before the TreeChild algorithm could find a solution. By removing taxa, we reduced the number of reticulations needed to display the trees, making the TreeChild algorithm run in reasonable time. We made sure to include a diverse set of taxa (based on their pairwise distances [6]) to represent as much of the subgenus as possible. 

Rosanne also tried several other algorithms and taxon selections, and used trees based on genes rather than on fixed-length blocks (which we used above, following Guido’s post); see her thesis on GitHub.

Although rooted phylogenetic network methods are often limited in the number of taxa that can be analysed and/or the complexity of the networks that can be constructed, we have seen that these methods can be useful for constructing hypothetical evolutionary histories. Moreover, although the constructed networks are not identical, we have seen that they share certain key properties, which are also consistent with previous research.  

Rosanne Wallin, Leo van Iersel, Mark Jones, Steven Kelk and Leen Stougie

[1] Leo van Iersel, Remie Janssen, Mark Jones, Yukihiro Murakami and Norbert Zeh. A Practical Fixed-Parameter Algorithm for Constructing Tree-Child Networks from Multiple Binary Trees. arXiv:1907.08474 [cs.DM] (2019).

[2] Maciej F. Boni, Philippe Lemey, Xiaowei Jiang, Tommy Tsan-Yuk Lam, Blair W. Perry, Todd A. Castoe, Andrew Rambaut and David L. Robertson. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nat Microbiol 5, 1408–1417 (2020).

[3] James Oldman, Taoyang Wu, Leo van Iersel and Vincent Moulton. TriLoNet: Piecing together small networks to reconstruct reticulate evolutionary histories. Molecular Biology and Evolution, 33 (8): 2151-2162 (2016). (postprint)

[4] Yukihiro Murakami, Leo van Iersel, Remie Janssen, Mark Jones and Vincent Moulton. Reconstructing Tree-Child Networks from Reticulate-Edge-Deleted Subnetworks. Bulletin of Mathematical Biology, 81(10):3823–3863 (2019).

[5] Fabio Pardi and Celine Scornavacca. Reconstructible phylogenetic networks: do not distinguish the indistinguishable. PLoS Comput Biol, 11(4), e1004135 (2015).

[6] Grimm, Guido; Morrison, David (2020): Harvest and phylogenetic network analysis of SARS virus genomes (CoV-1 and CoV-2). figshare. Dataset.

[7] Lam, Tommy Tsan-Yuk, Marcus Ho-Hin Shum, Hua-Chen Zhu, Yi-Gang Tong, Xue-Bing Ni, Yun-Shi Liao, Wei Wei, et al. Identifying SARS-CoV-2 Related Coronaviruses in Malayan Pangolins. Nature, 583, 282–285 (2020).

[8] Wang, Hongru, Lenore Pipes, and Rasmus Nielsen. Synonymous Mutations and the Molecular Evolution of SARS-Cov-2 Origins. [Preprint] Evolutionary Biology, April 21, 2020.

[9] Li, Xiaojun, Elena E. Giorgi, Manukumar Honnayakanahalli Marichannegowda, Brian Foley, Chuan Xiao, Xiang-Peng Kong, Yue Chen, S. Gnanakaran, Bette Korber, and Feng Gao. Emergence of SARS-CoV-2 through Recombination and Strong Purifying Selection. Science Advances, Vol. 6, no. 27 (2020). 

[10] David Morrison, Introduction to Phylogenetic Networks. RJR Productions, Uppsala, Sweden (2011).

Coronavirus cases rising in prisons

Coronavirus cases are rising (again), including among prisoners and prison staff. The Marshall Project has been tracking cases since March and provides a state-by-state rundown:

New infections this week rose sharply to their highest level since the start of the pandemic, far outpacing the previous peak in early August. Iowa, Michigan and the federal prison system each saw more than 1,000 prisoners test positive this week, while Texas prisons surpassed 2,000 new cases.


Estimate your Covid-19 risk, given location and activities

The microCOVID Project provides a calculator that lets you put in where you are and various activities to estimate your risk:

This is a project to quantitatively estimate the COVID risk to you from your ordinary daily activities. We trawled the scientific literature for data about the likelihood of getting COVID from different situations, and combined the data into a model that people can use. We estimate COVID risk in units of microCOVIDs, where 1 microCOVID = a one-in-a-million chance of getting COVID.
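
As a rough illustration of the unit, the arithmetic can be sketched in a few lines of Python. Only the definition (1 microCOVID = a one-in-a-million chance) comes from the project's description above. The activity values are made-up numbers, and combining activities as independent events is a simplifying assumption of this sketch, not necessarily how the calculator itself works.

```python
MICROCOVID = 1e-6  # 1 microCOVID = a one-in-a-million chance of getting COVID

def microcovids_to_probability(microcovids):
    """Convert a risk expressed in microCOVIDs to a probability between 0 and 1."""
    return microcovids * MICROCOVID

def combined_risk(activity_microcovids):
    """Probability of at least one infection across several activities.

    Assumes the activities are independent (an assumption of this sketch).
    """
    p_no_infection = 1.0
    for microcovids in activity_microcovids:
        p_no_infection *= 1.0 - microcovids_to_probability(microcovids)
    return 1.0 - p_no_infection

# Hypothetical week of activities; the values are illustrative only.
weekly_activities = {
    "grocery trip": 20,
    "outdoor walk with a friend": 5,
    "indoor dinner": 300,
}

total = combined_risk(weekly_activities.values())
print(f"Estimated chance of infection this week: {total:.4%}")
```

For risks this small, the combined probability is very close to the simple sum of the microCOVID values, which is why adding them up is a reasonable shortcut.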


Spike past 100k Covid-19 cases in a day

Meanwhile… based on estimates from The COVID Tracking Project, the United States had an all-time high for daily counts yesterday, at 103,087. And 1,116 people died.


How masks work to filter out particles

Masks are effective in slowing down the spread of the coronavirus. The New York Times zoomed in at the particle level to show how masks do this.


Where coronavirus cases are peaking

With this simple choropleth map, Lauren Leatherby for The New York Times shows where coronavirus cases peaked in the past month or week. It appears the United States still has a way to go:

With case counts trending upward in almost every state — and 21 of those states adding more cases in the last week than in any other seven-day stretch — officials in parts of the country are once again implementing control measures. Residents of El Paso are under a two-week stay-at-home order, and indoor dining will be halted in Chicago beginning Friday, Oct. 30. Other officials are considering new restrictions in an effort to curb the virus’s rapid spread.



Illustrations show how to reduce risk at small gatherings

Risk of coronavirus infection changes depending on the amount of contagious particles you breathe in. El Pais illustrated the differences when you take certain measures, namely wearing masks, ventilation, and decreased exposure time.

The suggestions are based on statistical models, so there is more uncertainty than I think the explanations provide, but the sequence of illustrations provides a clear picture of what we can do — if you must do things indoors.


Covid-19 cases and state partisanship

From Dan Goodspeed, the bar chart race is back. The length of the bars represents Covid-19 case rates per state, and color represents partisanship. The animation currently starts on June 1 and runs through October 13. It plays out how most of us probably assumed at some level or another.


Covid-19, the third leading cause of death

For Scientific American, Youyou Zhou made a line chart that shows causes of death in the United States, from 2015 up to the present. Covid-19 was the leading cause of death in April and is now sitting at number 3. The rise in unclassified deaths also stands out.
