Multivariate Beer

Beer-labels

Can you experience data? Sometimes visualization gets you part of the way there, putting data into context, serving as a trigger for your memory, and all that. But only so much can happen through the computer screen.

I want to feel data the way I want to taste the food in pictures. It's one thing to see something good, and it's another to be at a restaurant to taste a dish direct from the source.

Maybe we can use food to understand data. Instead of charts that use visual cues such as geometry and color, we can use ingredients at varying levels to represent variables in a dataset. Food has the potential to appeal to all the senses, rather than only sight.

Moritz Stefaner and prozessagenten have toyed around with the concept in their Data Cuisine workshops. Thomas Levine gave a talk not long ago on the gastronomification of data. Fish to represent emigration. Guacamole to represent test scores.

I'm curious. I like to cook. I like data. What do you get when you combine the two, and does the food help you understand data differently than you would from a bar graph?

Wait a second. I also like beer.

Data plus beer. Multivariate beer. Okay, gotta do it.

I've been playing around with the idea of an R package that spits out a beer recipe based on data from the latest American Community Survey release from the United States Census Bureau. The main function creates a recipe for each county. It takes into account the following:

  • Percent of people with at least a bachelor's degree
  • Percent of people who are employed
  • Percent of people covered by health care
  • Median household income
  • Population density
  • Percent of population that is white, black, Hispanic, and Asian

It was important that I incorporate multiple variables, because I want to find out if I end up with relationships or just disparate taste notes. I think I know what a single-variable beer would be, and I have a hunch I'd miss out on potential complexities.

The great thing about beer is that it has plenty of dimensions to work with: body, bitterness, head retention, hop profile, color, aroma, alcohol by volume, and plenty more. The amount of various ingredients affects how beer looks, tastes, and smells.

It's still a work in progress, but here's how a beer recipe is formed.

  • Head retention should increase with higher education, so a grain called Carapils is added.
  • More hop aroma represents higher employment. This comes from more hops at the end of a boil and dry hopping.
  • Rye adds spice and complexity to the beer as health care coverage increases.
  • A darker-colored and more full-bodied beer comes from higher median household income and Crystal Malt 40.
  • More hop bitterness and flavor means more people per square mile, and the type of hops — Cascade, Centennial, Citra, Warrior, and Magnum — represents the races of the population.
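
To make the mapping above concrete, here is a minimal sketch in Python rather than R. Everything numeric in it is an assumption for illustration: the input ranges, the ingredient ranges, the pairing of hops to population groups, and the fifth "other" group are all hypothetical, and none of them come from the actual package.

```python
# Hypothetical pairing of population groups to hop varieties (an assumption;
# the "other" group is added here only so that all five hops are used).
HOP_FOR_GROUP = {
    "white": "Cascade",
    "black": "Centennial",
    "hispanic": "Citra",
    "asian": "Warrior",
    "other": "Magnum",
}


def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi], clamped to the range."""
    if hi == lo:
        return out_lo
    t = (value - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))
    return out_lo + t * (out_hi - out_lo)


def county_recipe(stats):
    """Turn a dict of county statistics into ingredient amounts.

    All ranges are illustrative guesses, not the package's real values.
    """
    grains = {
        "American 2-row": 12.0,  # base malt held constant, in pounds
        "Carapils": scale(stats["pct_bachelors"], 0, 100, 0.0, 2.0),
        "Rye": scale(stats["pct_health_coverage"], 0, 100, 0.0, 2.0),
        "Crystal40": scale(stats["median_income"], 20000, 120000, 0.0, 2.0),
    }
    # Population density sets the total bittering hops, in ounces.
    bittering_total = scale(stats["people_per_sq_mile"], 0, 5000, 1.0, 12.0)
    # Group shares decide how that total is split among hop varieties.
    share_total = sum(stats["group_shares"].values())
    hops = {
        HOP_FOR_GROUP[group]: bittering_total * share / share_total
        for group, share in stats["group_shares"].items()
    }
    # Employment sets the aroma hops, in ounces.
    aroma = scale(stats["pct_employed"], 0, 100, 0.0, 5.0)
    return {"grains": grains, "hops": hops, "aroma_hops": aroma}


# Made-up inputs, not real survey figures.
example_county = {
    "pct_bachelors": 30.0,
    "pct_employed": 65.0,
    "pct_health_coverage": 85.0,
    "median_income": 60000,
    "people_per_sq_mile": 1400,
    "group_shares": {"white": 70, "black": 5, "hispanic": 15, "asian": 6, "other": 4},
}
recipe = county_recipe(example_county)
```

The point of the linear scaling is that a taster could, in principle, rank counties by a single variable from a single taste dimension. Whether that holds up once the variables mix in the glass is the open question.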

For example, here is the recipe for Salt Lake County, Utah:

SALT LAKE COUNTY ALE
-----------------------------
This recipe is for a 5-gallon batch.

Hop addition times decided by brewer. Suggestion: Continuous hopping every 10 minutes during a 60-minute boil. That's 1.44 ounces per interval, which includes the hop addition at the beginning of the boil.

Add half of aroma hops at flameout. Use the rest for dry-hopping.

HOPS
-----------------------------
Cascade: 7.3oz
Centennial: 0.2oz
Citra: 0.4oz
Warrior: 1.8oz
Magnum: 0.4oz
Cascade (for aroma): 3.4oz

GRAINS
-----------------------------
American 2-row: 12lbs
Carapils: 0.7lbs
Rye: 0.6lbs
Crystal40: 0.6lbs
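
The 1.44 ounces per interval in the recipe above can be checked with a little arithmetic. This short Python sketch assumes additions at minute 0 and then every 10 minutes through minute 60, which gives seven additions and reproduces the figure. The function name is made up for illustration.

```python
def continuous_hop_schedule(bittering_oz, boil_minutes=60, interval_minutes=10):
    """Split the total bittering hops evenly across the addition times."""
    # Addition times include the start of the boil (minute 0) and the end.
    times = list(range(0, boil_minutes + 1, interval_minutes))
    per_addition = sum(bittering_oz.values()) / len(times)
    return times, per_addition


# Bittering hop amounts from the Salt Lake County recipe, in ounces.
bittering = {
    "Cascade": 7.3,
    "Centennial": 0.2,
    "Citra": 0.4,
    "Warrior": 1.8,
    "Magnum": 0.4,
}
times, per_addition = continuous_hop_schedule(bittering)
print(len(times), round(per_addition, 2))  # 7 additions of 1.44 oz

# Aroma hops: half at flameout, the rest for dry hopping.
aroma_oz = 3.4
flameout_oz = aroma_oz / 2
dry_hop_oz = aroma_oz - flameout_oz
```

With only six additions (minute 0 through minute 50), each one would be about 1.68 ounces, so the seven-addition reading is the one that matches the recipe text.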

The recipe function also spits out some rough label sketches, as shown at the top of this post. One uses bar graphs to show how the county compares to others, another is a simple map, and a third is a dot plot based on multi-dimensional scaling.

Are there noticeable differences in look, aroma, and taste for various counties? Next step: brew county ales and see what happens. Stay tuned. Brewing takes about half a day and fermentation about a month. Maybe I should fix myself some data sandwiches in the meantime.


Chart-Topping Songs as Graphs and Diagrams

Chart-topping songs

Billboard ranked the top 100 songs since the creation of their Hot 100 list in 1958. The list is based on airplay and sales. Here are the songs in chart form.


Looking For the Closest Casino

Nearest casino

The New York Times covered casinos in the Northeast and noted that more than half of the population is within 25 miles of a casino. Naturally, I wondered what it is like in other parts of the country, so I sampled uniformly across the United States looking for the nearest casino. The results are shown in the map above.

Using the Google Places API, I took about 7,500 samples in twenty-mile increments, east to west and north to south. Of those samples, 36 percent are 25 miles or less from a casino.
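
The sampling idea can be sketched in Python as follows. This is a rough illustration under assumptions, not the code behind the map. It leaves out the Places lookup entirely and takes a list of casino coordinates as given. It also samples a plain bounding box, whereas a real run would need to clip points to the United States. The function names and the 69-miles-per-degree approximation are my own choices.

```python
import math

EARTH_RADIUS_MILES = 3958.8
MILES_PER_DEGREE_LAT = 69.0  # rough approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def grid_points(south, north, west, east, step_miles=20.0):
    """Yield (lat, lon) points roughly step_miles apart over a bounding box."""
    lat_step = step_miles / MILES_PER_DEGREE_LAT
    lat = south
    while lat <= north:
        # A degree of longitude shrinks with latitude, so widen the step.
        lon_step = step_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
        lon = west
        while lon <= east:
            yield lat, lon
            lon += lon_step
        lat += lat_step


def share_within(points, casinos, radius_miles=25.0):
    """Fraction of sample points with at least one casino within the radius."""
    points = list(points)
    near = sum(
        1
        for lat, lon in points
        if any(haversine_miles(lat, lon, clat, clon) <= radius_miles
               for clat, clon in casinos)
    )
    return near / len(points)


# Placeholder inputs: one casino location and two sample points.
casinos = [(36.17, -115.14)]
samples = [(36.17, -115.14), (40.0, -100.0)]
print(share_within(samples, casinos))  # 0.5
```

Checking every sample against every casino is slow for large lists, but with a few thousand samples it is simple enough to be worth the wait.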

This is significantly lower than the Northeast's 50 percent mark, but keep in mind that the sample also includes uninhabited areas. Lots of deserts and mountains. So the national percentage for population is likely higher. After all, some 80 percent of the country's population lives in urban areas.

However, it's tough to say what the actual percentage is based on this data. Google classifies places as casinos based on its own criteria, and not everything is an actual casino by a legal definition. For example, there's one Utah casino location in my sample, but gambling is illegal in the state, and there are no Indian reservation casinos. This particular place is a poker party business.

So think estimate rather than concrete count.

Speaking of reservations, also of note is how this map coincides with Indian reservations, especially in the West.

Indian reservations

In any case, if you're in the mood to lose money, alongside a three-dollar prime rib dinner, it's likely you're not too far off.


Where People Work and How Much They Make

Salaries by Industry

One of the things that drew me into Statistics is that you can apply it to a variety of fields, and in recent years, more industries hire in-house statisticians (or other varietals of data scientist and analyst). You can work in different industries with the same educational background.

However, the salaries can vary a lot between industries, which made me curious. How does salary vary across industries for other occupations? Here is an interactive to help you see, based on data from the Bureau of Labor Statistics for jobs in 2013. Search for your own occupation or browse random ones to see the changes and differences.

Bar charts on the left show the number of people employed in an industry for the selected occupation, and the ranges on the right show annual salaries from the 10th to 90th percentile. The dot in the middle of the line is the median salary. Mouse over bars and lines for more information about each industry.
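
For a sense of what each range line encodes, here is a small Python sketch that computes the 10th percentile, median, and 90th percentile from a list of salaries. The function name is made up. The percentile figures in the interactive come from the published data rather than from a calculation like this.

```python
import statistics


def salary_range(salaries):
    """Return the 10th percentile, median, and 90th percentile of salaries."""
    # With n=10, quantiles() returns the nine cut points between deciles,
    # so index 0 is the 10th percentile and index 8 is the 90th.
    deciles = statistics.quantiles(salaries, n=10, method="inclusive")
    return {
        "p10": deciles[0],
        "median": statistics.median(salaries),
        "p90": deciles[8],
    }


# Evenly spaced placeholder values, chosen so the answer is easy to check.
print(salary_range(list(range(101))))
```

The line on the chart runs from `p10` to `p90`, and the dot sits at `median`.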

Registered nurse

Some occupations are specific to certain industries. For example, registered nurses are primarily in the health care sector, although it is not uncommon to find nursing jobs in other places, such as government or educational services, as shown above.

On the other hand, network and computer systems administrators work in lots of industries, and salaries range from $32,050 all the way up to $158,170 (10th to 90th percentile), nearly a fivefold spread. In fact, with most industries and occupations, salaries can vary several times over. There's plenty to glean from the data, but if there's one takeaway, it's that if you're really good at what you do, you can make a good salary.

Have a look at the data.
