Arguing in favor of dual axes to show correlation

Charts that put two different vertical scales on the same plot often get the automatic “misleading” label, because if you stretch and shrink two data series enough, you’ll eventually find a way to make them look related. Toph Tucker argues that the automatic dismissal is misguided:

So yes, dual axes transform the series, and that transformation can lie. But it is the same kind of transformation that is already built into the Pearson correlation coefficient. Insofar as dual axes are bad, so is the Pearson correlation coefficient. Their merits and their badness go together. Dual axes are good at showing spurious correlation because they are good at showing correlation.

The challenge is that when you see a line chart with time on the horizontal axis and multiple data lines, it’s hard to separate coordinate systems and we’ve learned to read the lines as patterns over time. On the other hand, a scatterplot (or a connected one for time) highlights the relationship.

So while you don’t need to avoid dual axes completely, you should be careful when you do use them.
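
One way to see Tucker's point: fitting each series to its own axis is a linear rescaling, and the Pearson correlation coefficient does not change under that kind of rescaling. Here is a minimal Python sketch. The numbers are made up, and `rescale` is a hypothetical helper standing in for what a charting tool does when it picks axis limits.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rescale(values, lo=0.0, hi=1.0):
    """Map values linearly onto [lo, hi], like fitting a series to its own axis."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

# Made-up series on very different scales (illustration only)
a = [2.0, 4.0, 5.0, 4.0, 7.0, 9.0]
b = [1200.0, 1500.0, 1900.0, 1700.0, 2600.0, 2500.0]

r_raw = pearson(a, b)
r_rescaled = pearson(rescale(a), rescale(b))

# Both values are the same up to floating-point error
print(r_raw, r_rescaled)
```

Stretching each line to fill its own axis leaves the coefficient alone, so the chart and the statistic are making the same kind of comparison.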


Spotting spurious correlations in health news

When it comes to diet and health, you might see one day that a certain food decreases life expectancy. Then on another day, a different study suggests that food increases life expectancy. It’s hard to know which to believe. David Epstein provides some guidance:

If you’ve watched a lot of NFL games, you’ve probably heard a commentator at some point trot out a stat like: “The Chicago Bears are undefeated in division games following a bye week when they wear their alternate jerseys.”

It is possible that the combination of extra rest and unusual attire gets the Bears amped for division rivals, thus improving their performance — just as it is possible that using the sauna precisely 9-12 times per month at a particular temperature protects against dementia (at first). It is far more likely, however, that given a large database of games and conditions, there will be loads of correlations that occur just by chance; the more you slice and dice the data, the more you’ll find.
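
The slice-and-dice effect is easy to reproduce. The sketch below is an illustration with invented parameters (100 games, 1,000 conditions, a cutoff of 0.2), not anything from Epstein's piece. Every condition is pure noise, generated independently of the outcome, yet some still clear the cutoff.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)   # fixed seed so the run is repeatable
n_games = 100             # assumed number of observations
n_conditions = 1000       # assumed number of unrelated "conditions" to test
threshold = 0.2           # assumed cutoff for a "notable" correlation

# The outcome is random noise, so no condition can truly explain it
outcome = [rng.gauss(0, 1) for _ in range(n_games)]

hits = []
for i in range(n_conditions):
    condition = [rng.gauss(0, 1) for _ in range(n_games)]
    r = pearson(condition, outcome)
    if abs(r) > threshold:
        hits.append((i, r))

print(f"{len(hits)} of {n_conditions} unrelated conditions have |r| > {threshold}")
```

Raising `n_conditions` raises the number of chance hits, which is the commentator's bye-week stat in miniature.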


Airport runway orientation reveals wind patterns

Airport runways are oriented in directions that correlate with the prevailing wind direction in the area, which helps planes land and take off more easily. So, when you map runways around the world, you also get wind patterns, which is what Figures did:

Winds circulate around the globe, forming patterns of gigantic proportions. These patterns become part of human culture and are reflected in our architecture. They are hidden designs, mapping the complexion of the earth, which we can uncover. By orienting on the direction of general winds, airports recreate wind patterns, forming a representation of a global wind map with steel and stone, thus making the invisible visible.


Waffle House index as a storm indicator

Waffle House activated their storm center in preparation for Hurricane Florence. Their restaurants are open 24/7, so they need to keep track of which ones need to close or limit their menus. This might also have to do with an informal Waffle House Index that FEMA described last year:

If a Waffle House can serve a full menu, they’ve likely got power (or are running on a generator). A limited menu means an area may not have running water or electricity, but there’s gas for the stove to make bacon, eggs, and coffee: exactly what hungry, weary people need.

It’s more than just a Waffle House though.

Businesses in communities are often some of the biggest drivers of recovery. If stores can open, people can go back to work. If people can go back to work, they can return to at least one piece of a normal life—and that little piece of normalcy can make a big difference.

Hold up. I think I got it. If we just keep all the businesses open, we can avoid all disaster. That’s how causation-correlation works, right? Nailed it.

(Stay safe, Carolinians.)


Common statistical interpretation mistakes

Statistics is a game of subtleties, and you lose when you don’t pay attention to the details. Here are a handful of common mistakes when interpreting the numbers. In a nutshell: You get into trouble when you assume and ignore.


Game: Guess the correlation

Guess the correlation game

Guess the Correlation is a straightforward game where you do just that, and it's surprisingly fun. You get a scatterplot and you guess the correlation coefficient. That's it. If you're off by too much, you lose a life, and if you're almost spot on, you gain a life. If you're somewhat right, you get a coin. Bonus points for streaks of correct guesses.

Have at it.


Gender prediction through trivia performance


Todd Schneider likes trivia, and he plays in an online league called LearnedLeague. Curious, Schneider wondered if there was anything interesting he could glean from the performance of the LLamas (Learned League members) that might apply to knowledge in general.

He looked at it from two angles. In the first, he simply calculated correlation coefficients between subjects. If you know world history, are you more likely to know geography? Yes. If you know math, are you more likely to be in tune with pop culture? Probably not. The correlations aren't too surprising, but the correlation strengths are fun to poke at.

The second angle: gender prediction through performance levels in various subjects.

LLamas optionally provide a bit of demographic information, including gender, location, and college(s) attended. It's not lost on me that my category performance is pretty stereotypically "male." For better or worse, my top 3 categories—business, math, and sports—are often thought of as male-dominated fields. That got me to wondering: does performance across categories predict gender?

As shown up top, Schneider used a decision tree and got decent results. [Thanks, Todd]


Fake correlation

Collider actresses

Gabriel Rossman, a sociology professor at UCLA, describes colliders — or when correlation does not equal causation and the former might not even exist either. Referring to the simulated plot above, Rossman uses Hollywood actor selection as an example:

For instance, suppose that in a population of aspiring Hollywood actors there is no correlation between acting ability and physical attractiveness. However assume that we generally pay a lot more attention to celebrities than to some kid who is waiting tables while going on auditions. That is, we can not readily observe people who aspire to be actors, but only those who actually are actors. This implies that we need to understand the selection process by which people get cast into films. In the computer simulation displayed below I generated a population of aspiring actors characterized by "body" and "mind," each of which follows a normal distribution and with these two traits being completely orthogonal to one another. Then imagine that casting directors jointly maximize talent and looks so only the aspiring actors with the highest sum for these two traits actually get work in Hollywood. I have drawn the working actors as triangles and the failed aspirants as hollow circles. Among those actors we can readily observe there then will be a negative correlation between looks and talent, even though there is no such correlation in the grand population. If we see only the working actors without understanding the censorship process we might think that there is some stupefaction of being ridiculously good-looking.
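
The selection effect Rossman describes can be sketched in a few lines of Python. This is not his code. The population size, the seed, and the rule that the top 10 percent by combined score get cast are all assumptions made for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 5000                 # assumed number of aspiring actors

# "body" and "mind" are drawn independently from a normal distribution
aspirants = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

# Assumed casting rule: only the top 10% by body + mind get work
ranked = sorted(aspirants, key=lambda p: p[0] + p[1], reverse=True)
working = ranked[: n // 10]

r_all = pearson([b for b, _ in aspirants], [m for _, m in aspirants])
r_working = pearson([b for b, _ in working], [m for _, m in working])

print(f"all aspirants:  r = {r_all:.2f}")      # close to zero
print(f"working actors: r = {r_working:.2f}")  # clearly negative
```

The traits are uncorrelated in the full population, but conditioning on their sum produces a negative correlation among the people you can actually observe.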


Think you know why obesity rates are rising? You’re probably wrong.

I know what’s been causing obesity rates to rise. So do you. So does just about everyone. Unfortunately, most of us are probably wrong.

As I tell my students, correlation doesn’t prove causation, but that’s not why we use it. …

The post Think you know why obesity rates are rising? You’re probably wrong. appeared first on PLOS Blogs Network.

Random things that correlate

Divorce rate in Maine vs margarine

This is fun. Tyler Vigen wrote a program that attempts to automatically find things that correlate. As of this writing, it had found 4,000 correlations (and over 100 more by the time I finished). Some of the gems include: the divorce rate in Maine versus per capita consumption of margarine, marriage rate in Alabama versus whole milk consumption per capita, and honey produced in bee colonies versus labor political action committees. Many things correlate with cheese consumption.
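
For a rough sense of why such a search turns up so much, here is a small Python sketch. It is not Vigen's program. It assumes 40 made-up yearly series generated as independent random walks, then checks every pair against an assumed cutoff of 0.8.

```python
import math
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def random_walk(rng, length):
    """Cumulative sum of independent normal steps."""
    level, walk = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk

rng = random.Random(7)   # fixed seed so the run is repeatable
n_series = 40            # assumed number of unrelated series
n_years = 15             # assumed length of each series
threshold = 0.8          # assumed cutoff for a "strong" correlation

series = [random_walk(rng, n_years) for _ in range(n_series)]

# Compare every pair of series and keep the strongly correlated ones
strong = []
for i, j in combinations(range(n_series), 2):
    r = pearson(series[i], series[j])
    if abs(r) > threshold:
        strong.append((i, j, r))

n_pairs = n_series * (n_series - 1) // 2
print(f"{len(strong)} of {n_pairs} pairs have |r| > {threshold}")
```

Short series that drift over time correlate with each other often, so a large enough pile of them will produce plenty of margarine-and-divorce pairs.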