Communicating risk in the context of daily living

Wayne Oldford, a statistics professor at the University of Waterloo, explains risk in the context of daily life at the individual level, because “one in a million” is not especially intuitive:

A few years ago, I was the “go to guy” at the University of Waterloo, asked to speak to local media, whenever a lottery jackpot got stupendously large (and the news cycle got exceedingly slow). My purpose was to relate to their audience the size of the chance of winning in a way that was quick yet comprehensible, which I did with some success on local radio and television stations.

Inevitably, though, the next day I would hear back of listener disappointment – that some of the fun of purchasing a ticket had been removed. Joy came from anticipating winning the prize and my exposition killed that for many, by them having gained an appreciation of the chance of actually winning.

I felt a little bit bad about this. I wanted people to understand the probabilities but I didn’t want to be a kill joy.

Important reading if you’re trying to understand the odds of things these days.

My favorite explanation of risk in the day-to-day is still the one from David Spiegelhalter.

Cancer and statistics

Hannah Fry works with statistics and risk, but her perspective changed when she was diagnosed with cancer. Fry documented the experience in a film, which is available on the BBC:

Hannah Fry, a professor of maths, is used to investigating the world around her through numbers. When she’s diagnosed with cervical cancer at the age of 36, she starts to interrogate the way we diagnose and treat cancer by digging into the statistics to ask whether we are making the right choices in how we treat this disease. Are we sometimes too quick to screen and treat cancer? Do doctors always speak to us honestly about the subject? It may seem like a dangerous question to ask, but are we at risk of overmedicalising cancer?

At the same time, Hannah records her own cancer journey in raw and emotional personal footage, where the realities of life after a cancer diagnosis are laid bare.

You can only watch the film in the UK for now, but she spoke about the topic on the Numberphile podcast. Worth a listen.

Catching students cheating with R

Matthew Crump, a psychology professor who discovered high-volume cheating in his class via WhatsApp, outlines the saga in five parts. Bonus points for using R to analyze the evidence:

I do a lot of teaching on using computational tools for reproducible data analysis. I can input some data and run it through a script for analysis. When the data changes I can run it through the same script and get the new analysis. The chat archive had changed and this time it was easier to do the analysis all over again. I redid all the counts of academic integrity violations and rewrote the forms spelling out sanctions for each student. So many forms, I died a little inside once for every form.

Statistical personality quiz matches you to fictional characters

The Open-Source Psychometrics Project, which seems to have been around for a while, provides personality quizzes as an exercise in data collection and personality education:

This website has been offering a wide selection of psychological assessments, mostly personality tests, since late 2011 and has given millions of results since then. It exists to educate the public about various personality tests, their uses and meaning, the various theories of personality and also to collect data for research and develop new measures. This website is under continuous development and new tests and information are being added all the time.

One of the more recent quizzes matches your personality with fictional characters, and the results seem oddly close? I took the short version, and out of 2,000 characters, I was a 92% match to Data from Star Trek. I’m not totally sure how I feel about that.

You can also download anonymized data collected through the project.

AI says if you’re the a**hole

There’s a subreddit where people share a story and ask if they’re the asshole. WTTDOTM and Alex Petros trained AI models based on the responses so that you can enter your own story and see what the AI outputs as responses:

AYTA responses are auto-generated and based on different datasets. The red model has only been trained on YTA responses and will always say you are at fault. The green model has only been trained on NTA responses and will always absolve you. And the white model was trained on the pre-filtered data. Have fun!

Unfortunately you only get three responses from your input, one from each model. It would’ve been fun if the AI tried to make a final call.

Fracture and flow of Oreo cookies

Crystal Owens, Max Fan, John Hart, and Gareth McKinley from Massachusetts Institute of Technology published their research on how the cream in an Oreo behaves when you split the sandwich, in Physics of Fluids:

Using a laboratory rheometer, we measure failure mechanics of the eponymous Oreo’s “creme” and probe the influence of rotation rate, amount of creme, and flavor on the stress–strain curve and postmortem creme distribution. The results typically show adhesive failure, in which nearly all (95%) creme remains on one wafer after failure, and we ascribe this to the production process, as we confirm that the creme-heavy side is uniformly oriented within most of the boxes of Oreos. However, cookies in boxes stored under potentially adverse conditions (higher temperature and humidity) show cohesive failure resulting in the creme dividing between wafer halves after failure. Failure mechanics further classify the creme texture as “mushy.” Finally, we introduce and validate the design of an open-source, three-dimensionally printed Oreometer powered by rubber bands and coins for encouraging higher precision home studies to contribute new discoveries to this incipient field of study.

This is very important. [via kottke]

Applying sentiment analysis usefully

Sentiment analysis can be fun to apply to varying types of text, but the usefulness of the results, as Rachael Tatman argues, is often low:

[T]he places where it makes sense for a data scientist or NLP practitioner working in industry to use sentiment analysis are vanishingly rare. First, because it doesn’t work very well and second, because even when it does work it’s usually measuring the wrong thing.

It’s not a lost cause, though. Tatman also points out areas where sentiment analysis could provide value.

Calculating win probabilities

Zack Capozzi, for USA Lacrosse Magazine, explains how he calculates win probabilities pre-game and during games. On interpretation, which could easily apply to other sports and all forecasts:

But interpretation here matters quite a bit. And this is frustrating for some people, but that 61 percent should be interpreted as: “if these teams played 100 times, we would expect Marquette to win 61 of those games.” It definitely does not mean that the model is 61 percent confident that Marquette will win.

This is a bit odd, but this also means that if the Win Probability model gives Team A a 90% chance to beat Team B, there is nothing wrong with the model if Team B ends up winning the game. The issue would arise if, out of 100 90-percent win probability games, the favorite wasn’t winning around 90 of those games. When the model says 90 percent, you want it to mean 90 percent.

I wonder how many people incorrectly interpret the probability as “61 percent confident”. I bet a lot.

I do know that ever since the Golden State Warriors lost to the Cleveland Cavaliers in the 2016 NBA Finals, while holding a 90-something percent win projection by FiveThirtyEight, I haven’t paid much attention to win probability. But learning more about the calculation made it more interesting.
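
The check Capozzi describes, whether 90-percent favorites actually win about 90 percent of the time, is usually called calibration. Here is a minimal sketch of that idea in Python. It is not Capozzi's model or code: the `calibration_table` function is a hypothetical helper, and the games are simulated so that each favorite wins with exactly its forecast probability, which means the forecasts are calibrated by construction.

```python
import random


def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins and compare the
    mean forecast in each bin with the observed win rate."""
    bins = [[] for _ in range(n_bins)]
    for p, won in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # a forecast of 1.0 goes in the last bin
        bins[idx].append((p, won))

    table = []
    for games in bins:
        if not games:
            continue  # skip bins with no forecasts
        mean_forecast = sum(p for p, _ in games) / len(games)
        win_rate = sum(won for _, won in games) / len(games)
        table.append((mean_forecast, win_rate, len(games)))
    return table


# Simulated data, not real games: the favorite's forecast is drawn between
# 50% and 100%, and the favorite then wins with exactly that probability.
rng = random.Random(0)
forecasts = [rng.uniform(0.5, 1.0) for _ in range(20000)]
outcomes = [rng.random() < p for p in forecasts]

for mean_forecast, win_rate, n in calibration_table(forecasts, outcomes):
    print(f"forecast {mean_forecast:.2f}  observed {win_rate:.2f}  games {n}")
```

In each row, the observed win rate should land close to the mean forecast. With real forecasts, a bin where the two numbers sit far apart is where the model would be over- or under-confident. A single upset in a 90-percent game says nothing either way.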

Lessons learned from making covid dashboards

For Nature, Lynne Peeples spoke to the people behind many of the popular covid dashboards about the lessons they learned:

Among the shared themes for the dashboards were simplicity and clarity. Whether you are producing visuals and analytical tools for policymakers or for the public, Blauer says, the same rules of thumb apply. “Don’t overcomplicate your visualization, make the conclusions as clear as possible, and speak in the most basic of plain-language terms,” she says.

Yet, as other data scientists point out, presenting data simply might not be enough to ensure viewers get the message. For one thing, attention to detail matters. Ritchie recalls how she and her team spent hours focused on the titles and subtitles of charts, “because that is ultimately what most people will look at”. And in those titles and subtitles, the analysts made sure to specify ‘confirmed’ deaths or ‘confirmed’ cases. “An emphasis on ‘confirmed’ is really important because we know that it’s an underestimate of the total,” says Ritchie. “It might seem very basic, but it’s really crucial to how you understand the data and the scale of the pandemic.”

Increasing mortality baseline

There was a time not that long ago when a hundred covid deaths seemed like a lot, but now the United States is getting closer to one million deaths with over a thousand deaths per day. The country is unmasking and re-opening. For The Atlantic, Ed Yong discusses the shifting baseline and our perception of these big numbers:

The United States reported more deaths from COVID-19 last Friday than deaths from Hurricane Katrina, more on any two recent weekdays than deaths during the 9/11 terrorist attacks, more last month than deaths from flu in a bad season, and more in two years than deaths from HIV during the four decades of the AIDS epidemic. At least 953,000 Americans have died from COVID, and the true toll is likely even higher because many deaths went uncounted. COVID is now the third leading cause of death in the U.S., after only heart disease and cancer, which are both catchall terms for many distinct diseases. The sheer scale of the tragedy strains the moral imagination. On May 24, 2020, as the United States passed 100,000 recorded deaths, The New York Times filled its front page with the names of the dead, describing their loss as “incalculable.” Now the nation hurtles toward a milestone of 1 million. What is 10 times incalculable?
