Statistical fallacies in the news

For UnHerd, Tom Chivers talks about David Spiegelhalter’s new book and why every statistical headline should be taken with a grain of salt. One way to make sure things check out:

As a non-mathematician, I have a few shortcuts for working out whether a statistic is worth believing, which seem to have done all right for me so far. One, which Spiegelhalter stresses, is that often the best statistical analysis you can do is simply visualising the data. There was a bit of a recent kerfuffle about suicides among girls and young women going up 83% since 2012; but simply looking at the ONS chart showed that the numbers were small, the data was noisy, and the only way you got the 83% figure was by choosing the lowest year on record. (It’s an old trick.)
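To make the baseline trick concrete, here's a minimal Python sketch. The yearly counts are invented for illustration and are not the ONS figures. Measuring the latest year against the lowest year on record produces a far bigger jump than measuring it against the average of the earlier years.

```python
# Hypothetical yearly counts, made up for illustration (NOT the ONS data):
# small numbers, noisy from year to year, roughly flat apart from one low year.
COUNTS = {2012: 50, 2013: 82, 2014: 78, 2015: 88, 2016: 80, 2017: 84, 2018: 85}


def pct_change(start: float, end: float) -> float:
    """Percentage change going from `start` to `end`."""
    return 100.0 * (end - start) / start


def compare_baselines(counts: dict[int, int]) -> tuple[float, float]:
    """Return (change vs. lowest earlier year, change vs. mean of earlier years)."""
    latest_year = max(counts)
    latest = counts[latest_year]
    earlier = {y: v for y, v in counts.items() if y != latest_year}
    # Cherry-picked baseline: the single lowest year before the latest one.
    vs_lowest = pct_change(min(earlier.values()), latest)
    # A steadier baseline: the average of all earlier years.
    vs_mean = pct_change(sum(earlier.values()) / len(earlier), latest)
    return vs_lowest, vs_mean


if __name__ == "__main__":
    vs_lowest, vs_mean = compare_baselines(COUNTS)
    print(f"vs. lowest year: {vs_lowest:+.0f}%")  # +70%
    print(f"vs. earlier mean: {vs_mean:+.0f}%")   # +10%
```

Same data, same latest year, and the headline number swings from about 10% to 70% depending only on which year you call the starting point. Plotting the whole series, as Spiegelhalter suggests, makes the low year obvious.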

See also: common statistical fallacies.

Salesforce to acquire Tableau

From Tableau CEO Adam Selipsky:

In 2003, Tableau set out to pioneer self-service analytics with an intuitive analytics platform that would empower people of any skill level to work with data. Our customers grew with us to form the strongest analytics community in the world. And today, that mission to help people see and understand data grows stronger.

I’m excited to announce that Tableau has entered into an agreement to be acquired by Salesforce in an acquisition that combines the #1 CRM with the #1 analytics platform. By joining forces we will accelerate our ability to accomplish our mission. Together, Salesforce and Tableau share a deep commitment to empowering their respective communities and enabling people of every skill level to transform their businesses, their careers, and their lives through technology.

I’m an outsider looking in, so this surprised me, but maybe it was expected by those closer to the company. Tableau sponsored this little site of mine for nearly a decade, so the company might have seemed smaller to me than it actually is.

Anyways, it’ll be interesting to see where Tableau goes from here, especially for those who worked with the software outside a marketing context.

Giorgia Lupi joins Pentagram

Giorgia Lupi, whose work exemplifies the use of data and visualization outside of analytic insights (think Dear Data), is now a partner at design consultancy Pentagram. For Fast Company, Mark Wilson has the news:

At Pentagram, where she’ll have access to the biggest brands in the world, Lupi believes she can find a greater reach for data design in general. “It’s a good opportunity to expand graphics beyond the niche field of data visualization, and figure out how data visualization can be part of our daily experiences–in the things we consume, wear, and see,” Lupi says. “I want to explore things I don’t think have been done before.”

News story lifespan charts

A zoomed-out view of the news cycle can look like a series of rises and falls. Something captures the general public’s attention, and then it fades away. Thank you, next. This collaboration between Schema Design and Google Trends charts search volume for news stories and aligns them by their peaks, so that you can see these rises and falls.

Transparent areas overlaid on each other show an “average” trend, and the more irregular shapes stand out against it.
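The peak-alignment idea is simple enough to sketch. Below is a minimal Python version, assuming each story is just a list of search-interest values at regular intervals. The numbers are hypothetical, and this is not how Schema Design actually built the project. Each series is shifted so its peak lands at the same position, scaled to its own peak, and then the columns are averaged to get the "average" shape.

```python
import math


def align_by_peak(series: list[list[float]], before: int, after: int) -> list[list[float]]:
    """Re-index each series so its peak sits at position `before`.

    Every returned row has length before + after + 1; positions that fall
    outside the original data are filled with NaN.
    """
    aligned = []
    for values in series:
        peak = max(range(len(values)), key=values.__getitem__)  # first index of the max
        row = []
        for offset in range(-before, after + 1):
            i = peak + offset
            row.append(float(values[i]) if 0 <= i < len(values) else math.nan)
        aligned.append(row)
    return aligned


def normalize_to_peak(rows: list[list[float]], peak_index: int) -> list[list[float]]:
    """Scale each row so its peak equals 1 (rows with a zero peak are left as-is)."""
    out = []
    for row in rows:
        scale = row[peak_index]
        out.append([v / scale for v in row] if scale else list(row))
    return out


def average_shape(rows: list[list[float]]) -> list[float]:
    """Column-wise mean that skips NaN padding (NaN if a column has no data)."""
    means = []
    for col in range(len(rows[0])):
        present = [row[col] for row in rows if not math.isnan(row[col])]
        means.append(sum(present) / len(present) if present else math.nan)
    return means


if __name__ == "__main__":
    # Hypothetical search-interest values for three stories of different lengths.
    stories = [
        [1, 3, 10, 40, 100, 30, 10, 5],
        [2, 100, 20, 8, 4, 2],
        [5, 10, 20, 50, 80, 100, 60],
    ]
    rows = normalize_to_peak(align_by_peak(stories, before=3, after=3), peak_index=3)
    print([round(v, 2) for v in average_shape(rows)])
```

Scaling each story to its own peak is a design choice: it compares shapes rather than sizes, so a small story that spikes and vanishes looks as different from the average as a big one does.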

See also the simpler view by Axios, who contributed stories to the project.

DataKind receives $20M grant to expand on data for social good

DataKind, the organization known for helping others use data for social good, received a $20 million grant from The Rockefeller Foundation and the Mastercard Center for Inclusive Growth:

The grant will allow DataKind to transition from a project to a platform-based model, thereby, supporting more organizations on a set of high impact areas, such as community health and inclusive growth. We’re humbled and honored that these two groups are supporting our mission with $20M over five years to help us grow to support the needs of the sector.

Amanda Cox promoted to New York Times data editor

Amanda Cox is the new data editor for The New York Times:

As data editor, Amanda will continue to provide direction for The Upshot, and she’ll add the expertise from Computer-Assisted Reporting journalists in New York and software developers here and in the Washington bureau. She’ll serve as the top adviser to the executive editor and managing editor on statistical questions like polling methodologies and election forecasting, and she’ll participate in conversations with desks as they discuss data-oriented reporting that may aid our economics, technology and investigative coverage.

In her time here, Amanda has helped bring together some of our best explanatory and statistical reporting efforts with our smartest visualization experts, and now she’ll do that on a bigger stage.

Great news for both Amanda and NYT.

xkcd and the needle of probability

xkcd referenced the ever-so-loved forecasting needle. I’m so not gonna look at it this year. Maybe.

News homepage design overview

As you click through the news, you can probably almost always figure out what source is loading without the URL or title. Just judge based on the layout. Noah Veltman made this overview to show how news orgs prioritize editorial content, ads, and sponsored content.

The Markup is a new journalism venture to examine technology through data

Founded by Sue Gardner, the former head of the Wikimedia Foundation, and by Julia Angwin and Jeff Larson, both formerly journalists at ProPublica, The Markup will aim to use data to help non-experts better understand everyday technologies that often go unchecked.

When Angwin and Larson worked together at ProPublica, their data-driven investigations included exposing discriminatory advertising practices at Facebook, bias in software that is used in criminal sentencing and algorithms that result in unfair car insurance pricing. They also uncovered evidence of domestic surveillance practices in the Snowden archives and revealed technology vulnerabilities at the President’s Mar-A-Lago country club.

“I’m excited to build a team with deep expertise that can really scale up and advance the work Jeff and I began at ProPublica,” Angwin said. “We see The Markup as a new kind of news organization, staffed with journalists who know how to investigate the uses of new technologies and make their effects understandable to non-experts.”

“People know that these new technologies are important and want to better understand their societal effects. We will help them do that,” said Larson. “The Markup will hold the powerful to account, raise the cost of bad behavior, and spur reforms.”

The venture is primarily backed by a $20 million donation from Craigslist founder Craig Newmark and $2 million from the Knight Foundation. Amazing.

Looking forward to this.

Limitations of the new book about HGT networks

This is a joint post by David Morrison and Ajith Harish.

There has been a flurry of reviewing activity recently about the new book:

The Tangled Tree: a Radical New History of Life
David Quammen. 2018. Simon & Schuster.

This book has received glowing reviews.

The book is intended for the general public rather than for specialists, and it explains the "new view" of evolutionary history that includes extensive horizontal gene transfer (HGT), especially in the microbial world. Quammen describes himself as a science, nature and travel writer, so his book is more than just a record of the science: it is as much about the people involved as about the scientific theory. In particular, it contains a biography of Carl Woese.

Quammen’s recent New York Times feature article, "The scientist who scrambled Darwin’s Tree of Life", is a very good primer for his book. For us, it indicates that the book overlaps considerably with Jan Sapp's earlier book The New Foundations of Evolution: on the Tree of Life (2009, Oxford University Press). The publisher’s advertised selling point for that book is: "This is the first book on (and first history of) microbial evolutionary biology, and that it puts forth a new theory of evolution", with HGT being the new theory. In this sense, the "radical new view" is simply that genetic material can be transferred without sexual reproduction, an idea that goes back a long way (see "The history of HGT") and that is often seen as anti-Darwinian.

In his review of Sapp’s book (2010. The trouble with trees. Science 327: 645-646), Bill Hanage argues that the book does not put forward a new theory, that the debate is not actually about horizontal gene transfer, and that the Tree of Life is thus far from settled. There are many other interesting points discussed in that review. Furthermore, even after almost 10 years, Hanage’s review of Sapp’s 2009 book can be used verbatim as a review of Quammen’s 2018 book! This PDF shows how the review would read if the author and book names in Hanage’s review were substituted [reproduced with the permission of the original author].

The debate allegedly involving HGT is, at heart, about explaining the pattern of extensively mixed genetic material found in the akaryotes. However, simply looking at a pattern does not tell you about the process that created the pattern. In order to study processes, we need a model, in this case a model about how evolution occurs. The "HGT model" is that the Last Universal Common Ancestor (LUCA) of life was a relatively simple organism genetically, and that subsequent evolutionary history has involved complexification of that ancestor, both by diversification and by HGT.

What the two books do not explore is the other major model for the current distribution of genetic material among akaryotes. This alternative scenario is that the LUCA was genetically complex, and that the subsequent evolutionary history involved independent losses of parts of the genetic material — the sporadically shared material is basically coincidental. All that this model requires is that there be evolutionary history prior to the LUCA, during which it became a complex organism from its simple beginnings — the LUCA is merely as far back as we can see into the past, with the prior history being unrecoverable by us (ie. we cannot see past the LUCA bottleneck).
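To see why the pattern alone cannot settle the question, here is a deliberately crude Python toy. It is an illustrative sketch only, not the method used in any of the papers listed below. On a fixed, hypothetical rooted tree, it counts the fewest events needed to produce a given presence/absence pattern for one gene under each model: gains only (a simple ancestor plus transfers) or losses only (a complex ancestor). The tree, tip names and pattern are all made up.

```python
from typing import Union

# A tree is either a tip name or a tuple of child subtrees (hypothetical encoding).
Tree = Union[str, tuple]


def tips(tree: Tree) -> set[str]:
    """All tip names below this node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def count_maximal_clades(tree: Tree, predicate) -> int:
    """Count the largest subtrees whose tip sets all satisfy `predicate`."""
    if predicate(tips(tree)):
        return 1
    if isinstance(tree, str):
        return 0
    return sum(count_maximal_clades(child, predicate) for child in tree)


def min_gains(tree: Tree, present: set[str]) -> int:
    """Simple-ancestor model: gene absent at the root, each presence via a gain, no losses."""
    return count_maximal_clades(tree, lambda ts: ts <= present)


def min_losses(tree: Tree, present: set[str]) -> int:
    """Complex-ancestor model: gene present at the root, each absence via a loss, no gains."""
    return count_maximal_clades(tree, lambda ts: not (ts & present))


if __name__ == "__main__":
    tree = ((("A", "B"), "C"), (("D", "E"), "F"))
    present = {"A", "C", "E"}  # a sporadic, hypothetical pattern
    print("gains needed:", min_gains(tree, present))    # 3
    print("losses needed:", min_losses(tree, present))  # 3
```

Here both stories need the same number of events, so counting alone does not favour either one; choosing between them requires an explicit model of the process (rates of gain, loss and transfer), which is the point being made above.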

Over the past couple of decades, a number of papers have explored the evidence for the latter idea, from both the RNA and protein perspectives, including:
  • Anthony Poole, Daniel Jeffares, David Penny (1999) Early evolution: prokaryotes, the new kids on the block. BioEssays 21: 880-889.
  • Christos A. Ouzounis, Victor Kunin, Nikos Darzentas, Leon Goldovsky (2006) A minimal estimate for the gene content of the last universal common ancestor — exobiology from a terrestrial perspective. Research in Microbiology 157: 57-68.
  • Miklós Csűrös, István Miklós (2009) Streamlining and large ancestral genomes in Archaea inferred with a phylogenetic birth-and-death model. Molecular Biology and Evolution 26: 2087-2095.
  • Kyung Mo Kim, Gustavo Caetano-Anollés (2011) The proteomic complexity and rise of the primordial ancestor of diversified life. BMC Evolutionary Biology 11: 140.
  • Ajith Harish, Charles G. Kurland (2017) Akaryotes and Eukaryotes are independent descendants of a universal common ancestor. Biochimie 138: 168-183.

Finally, even from the perspective of phylogenetic networks, Quammen's book is very one-sided. In particular, the other processes that lead to reticulate evolution (eg. introgression and hybridization) are pretty much ignored. That is, the focus is on akaryotes, not eukaryotes, and the latter are also of phylogenetic interest.