Introduction to Probability for Data Science, a free book

Introduction to Probability for Data Science is a free-to-download book by Purdue statistics professor Stanley H. Chan:

We need a book that balances the theory and practice. We need a book that provides insights and not just theorems and proofs. We need a book that motivates the students, telling them why probability is so essential to their work. We need a book that highlights the impacts of the subject. From over than half a decade of teaching the course, I have distilled what I believe to be the core of probabilistic methods. I put the book in the context of data science, to emphasize the inseparability between data (computing) and probability (theory) in our time.

Download a free PDF copy or buy a physical copy.

Odds of winning the big Mega Millions prize

With tonight’s Mega Millions jackpot estimated at $1.28 billion, you might be wondering what the odds of winning are, even if you know the chances are super slim for an individual. (On the other hand, the more tickets purchased overall, the greater the chances that someone in the country wins.) For The Washington Post, Bonnie Berkowitz and Shelly Tan made a playful quiz to test your perception of 1 in 302.6 million.
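
The 1-in-302.6-million figure can be reproduced with a little combinatorics. The sketch below assumes the game matrix that produces that figure (5 white balls drawn from 70, plus 1 Mega Ball drawn from 25); the post itself doesn't state it. It also estimates the chance that at least one ticket wins, under the simplifying assumption that every ticket is an independent, uniformly random pick. `jackpot_combinations` and `p_someone_wins` are hypothetical helper names.

```python
import math

# Assumed game matrix (not stated in the post): pick 5 of 70 white balls
# plus 1 Mega Ball out of 25.
WHITE_BALLS, WHITE_PICKS, MEGA_BALLS = 70, 5, 25


def jackpot_combinations() -> int:
    """Number of equally likely tickets; exactly one of them hits the jackpot."""
    return math.comb(WHITE_BALLS, WHITE_PICKS) * MEGA_BALLS


def p_someone_wins(tickets: int) -> float:
    """P(at least one jackpot winner), assuming independent uniform tickets.

    Computes 1 - (1 - p)^tickets via log1p/expm1 to keep precision when p is tiny.
    """
    p_single = 1 / jackpot_combinations()
    return -math.expm1(tickets * math.log1p(-p_single))


print(jackpot_combinations())  # 302575350, about 1 in 302.6 million
```

Because real players' number choices cluster (birthdays, lucky numbers), `p_someone_wins` is an idealization of the "more tickets, more chances someone wins" point rather than a forecast.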

Calculating win probabilities

Zack Capozzi, for USA Lacrosse Magazine, explains how he calculates win probabilities pre-game and during games. On interpretation, which could easily apply to other sports and all forecasts:

But interpretation here matters quite a bit. And this is frustrating for some people, but that 61 percent should be interpreted as: “if these teams played 100 times, we would expect Marquette to win 61 of those games.” It definitely does not mean that the model is 61 percent confident that Marquette will win.

This is a bit odd, but this also means that if the Win Probability model gives Team A a 90% chance to beat Team B, there is nothing wrong with the model if Team B ends up winning the game. The issue would arise if, out of 100 90-percent win probability games, the favorite wasn’t winning around 90 of those games. When the model says 90 percent, you want it to mean 90 percent.

I wonder how many people incorrectly interpret the probability as “61 percent confident”. I bet a lot.

I do know that ever since the Golden State Warriors lost to the Cleveland Cavaliers in the 2016 NBA Finals — while holding a 90-something percent win projection by FiveThirtyEight — I stopped paying attention to win probability. But learning more about the calculation made it more interesting.
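
The check described in the quote (out of many 90-percent games, the favorite should win about 90 of every 100) is a calibration check. Below is a minimal sketch, assuming you have a list of past forecasts and the matching results; the function name and the 10-bin layout are illustrative choices, not Capozzi's method.

```python
import random
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=10):
    """Compare the average forecast with the observed win rate in each bin.

    forecasts: predicted win probabilities (0 to 1) for one side of each game
    outcomes:  1 if that side won, else 0
    Returns (bin_lower_edge, mean_forecast, observed_win_rate, n_games) tuples.
    """
    bins = defaultdict(list)
    for p, won in zip(forecasts, outcomes):
        # Clamp so p == 1.0 lands in the top bin rather than an extra one.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, won))

    table = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_forecast = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(won for _, won in rows) / len(rows)
        table.append((idx / n_bins, mean_forecast, win_rate, len(rows)))
    return table


if __name__ == "__main__":
    # Synthetic games from a perfectly calibrated (hypothetical) model:
    # each side wins with exactly its forecast probability.
    rng = random.Random(0)
    forecasts = [rng.random() for _ in range(10_000)]
    outcomes = [1 if rng.random() < p else 0 for p in forecasts]
    for lower, mean_p, rate, n in calibration_table(forecasts, outcomes):
        print(f"{lower:.1f}-{lower + 0.1:.1f}: forecast {mean_p:.2f}, won {rate:.2f} (n={n})")
```

For a well-calibrated model, the forecast and win-rate columns should track each other; a single upset, like a 90-percent favorite losing, says little on its own.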

Looking for similar NBA games, based on win probability time series

Inpredictable, a sports analytics site by Michael Beuoy, tracks win probabilities of NBA games going back to the 1996-97 season. When a team is up by a lot, its probability of winning is high, and the trailing team's probability is correspondingly low, since the two always sum to 1. So for each game, you have a minute-by-minute time series of win probability.

Beuoy added a new feature that looks for games with similar patterns, a.k.a. “Dopplegamers.”
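
The post doesn't say how Dopplegamers measures similarity. One simple approach, assumed here purely for illustration, is to sample every game's win probability at the same time points and rank the other games by root-mean-square distance to a target game. The toy series and game ids below are made up.

```python
import math


def rms_distance(a, b):
    """Root-mean-square gap between two win-probability series."""
    if len(a) != len(b):
        raise ValueError("series must be sampled at the same time points")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def most_similar(target, games, k=3):
    """Return ids of the k games whose series are closest to `target`."""
    ranked = sorted(games, key=lambda game_id: rms_distance(target, games[game_id]))
    return ranked[:k]


# Hypothetical home-team win probabilities at five checkpoints.
games = {
    "blowout": [0.50, 0.70, 0.85, 0.95, 1.00],
    "comeback": [0.50, 0.30, 0.20, 0.40, 1.00],
    "seesaw": [0.50, 0.60, 0.40, 0.60, 1.00],
}
print(most_similar([0.50, 0.65, 0.80, 0.90, 1.00], games, k=2))  # ['blowout', 'seesaw']
```

A real version would also have to handle overtime games, whose series are longer, for example by resampling every game onto a common time grid; other measures, such as dynamic time warping, are possible too.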

Probability you will break up with your partner

Rosenfeld et al. from Stanford University ran a survey in 2009 for a study on How Couples Meet and Stay Together. Dan Kopf and Youyou Zhou for Quartz used this dataset to estimate the probability that you will break up with your partner, given a few bits of information about your current relationship.

The Stanford data page says a 2017 release is on the way. I’m curious what, if anything, has changed in relationships between 2009 and now.
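
The post doesn't describe Quartz's model. As a rough illustration of the underlying idea, a conditional probability, the simplest estimate is the share of couples with matching characteristics who later split. The field names and toy records below are hypothetical; a real analysis of survey data would also need survey weights and enough matching couples in each group.

```python
def conditional_breakup_rate(records, **conditions):
    """Empirical P(breakup | conditions) from a list of couple records.

    Each record is a dict with a 0/1 "broke_up" field plus descriptive fields.
    Returns None when no record matches the conditions.
    """
    matching = [
        r for r in records if all(r.get(key) == value for key, value in conditions.items())
    ]
    if not matching:
        return None
    return sum(r["broke_up"] for r in matching) / len(matching)


# Hypothetical toy data, not from the Stanford survey.
couples = [
    {"married": True, "met_online": False, "broke_up": 0},
    {"married": True, "met_online": True, "broke_up": 0},
    {"married": False, "met_online": True, "broke_up": 1},
    {"married": False, "met_online": False, "broke_up": 0},
]
print(conditional_breakup_rate(couples, married=False))  # 0.5
```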

Chances it’s a Friend’s Birthday Every Single Day of the Year

If it seems like it’s someone’s birthday every day you log in to Facebook, you’re probably not that far off.
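
How many friends does it take? Under the simplifying assumptions that birthdays are independent and spread uniformly over 365 days (ignoring February 29 and real seasonal patterns), inclusion-exclusion gives the exact probability that every day has at least one friend's birthday. The sketch also returns the expected number of birthday-free days, 365 * (364/365)^n. Since the chance that some day is empty can't exceed that expectation, 1 minus it is a lower bound on the coverage probability. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def p_every_day_covered(n_friends: int, days: int = 365) -> float:
    """Exact P(every day has at least one birthday), by inclusion-exclusion.

    Assumes birthdays are independent and uniform over `days` days.
    """
    # Birthday assignments that leave no day empty:
    # sum over k of (-1)^k * C(days, k) * (days - k)^n
    favorable = sum(
        (-1) ** k * comb(days, k) * (days - k) ** n_friends for k in range(days + 1)
    )
    # Exact integer arithmetic avoids the cancellation a float sum would suffer.
    return float(Fraction(favorable, days**n_friends))


def expected_empty_days(n_friends: int, days: int = 365) -> float:
    """Expected number of days on which no friend has a birthday."""
    return days * (1 - 1 / days) ** n_friends


for n in (1000, 2000, 3000):
    print(n, round(p_every_day_covered(n), 3), round(expected_empty_days(n), 2))
```

With fewer than 365 friends, full coverage is impossible, and the exact sum returns 0.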

What That Election Probability Means

You’re going to see probability values mentioned a lot over the next few months. Many people will misinterpret them. But not you.

Nearly impossible to predict mass shootings with current data

Even if there were a statistical model that predicted a mass shooter with 99 percent accuracy, it would still produce a lot of false positives, because actual shooters are extremely rare among the people being screened. And when you’re dealing with individuals on a scale of millions, that’s a big deal. Brian Resnick and Javier Zarracina for Vox break down the simple math with a cartoon.
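
The base-rate problem is easy to see with expected counts. The numbers below are hypothetical, chosen only to show the shape of the math: a million people screened, 10 of whom are true cases, with "99 percent accuracy" read as 99 percent sensitivity (true cases flagged) and 99 percent specificity (everyone else cleared).

```python
def screening_outcome(population, true_cases, sensitivity, specificity):
    """Expected true/false positives and P(true case | flagged)."""
    negatives = population - true_cases
    true_pos = sensitivity * true_cases
    false_pos = (1 - specificity) * negatives
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


# Hypothetical inputs for illustration only.
tp, fp, precision = screening_outcome(1_000_000, 10, 0.99, 0.99)
print(f"{tp:.1f} true positives, {fp:.1f} false positives, "
      f"P(true case | flagged) = {precision:.4f}")
```

Under these assumptions, roughly 10,000 people get flagged to catch about 10 true cases, so only about 1 in 1,000 flagged people is a true case.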

Place your bets: sea level rise from Antarctic ice sheet collapse

I have a few bits of happy news! The BBC’s program Climate Change by Numbers, on which I was one of the main scientific consultants, has won the AAAS Science Journalism Award in the “in

What probability means in different fields

Statistically, probability ranges from 0 (impossible) to 1 (definitely, without a doubt). Math with Bad Drawings characterized what those values mean in various fields of expertise. This amuses me.
