Random number generation with lava lamps

Tom Scott explains how Cloudflare uses a wall of lava lamps to generate random numbers. A video camera is pointed at the wall, and the movement in the lamps plus noise from the video provides randomness, which is used to secure websites.

Even though computers can do many things on their own, they still need help from the physical world for true unpredictability. The robot overlords aren’t here yet. [via kottke]
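
To make the idea concrete, here is a minimal sketch of how noisy image data could be condensed into a fixed-size seed. This is not Cloudflare's actual pipeline; the function name `seed_from_frame` and the fake frame are assumptions for illustration, and a real system would feed such a seed into a vetted cryptographic generator rather than use it directly.

```python
import hashlib
import os


def seed_from_frame(frame_bytes: bytes, extra_entropy: bytes = b"") -> bytes:
    """Condense raw frame bytes (plus optional extra entropy) into a 32-byte seed."""
    h = hashlib.sha256()
    h.update(frame_bytes)    # pixels from the (hypothetical) camera frame
    h.update(extra_entropy)  # e.g. randomness from the operating system
    return h.digest()


if __name__ == "__main__":
    # Stand-in for a real camera frame: random bytes sized like a 640x480 grayscale image.
    fake_frame = os.urandom(640 * 480)
    seed = seed_from_frame(fake_frame, os.urandom(32))
    print(seed.hex())
```

Hashing means that even tiny changes in the frame, like sensor noise in a few pixels, produce a completely different seed.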


Not so likely life of The Simpsons

For The Atlantic, Dani Alexis Ryskamp compares the financials of The Simpsons against present day medians, arguing that the fictional family’s lifestyle is no longer attainable:

The purchasing power of Homer’s paycheck, moreover, has shrunk dramatically. The median house costs 2.4 times what it did in the mid-’90s. Health-care expenses for one person are three times what they were 25 years ago. The median tuition for a four-year college is 1.8 times what it was then. In today’s world, Marge would have to get a job too. But even then, they would struggle. Inflation and stagnant wages have led to a rise in two-income households, but to an erosion of economic stability for the people who occupy them.

Someone should take this a step further and look at distributions and time series to show the shift, with The Simpsons as baseline.


Machine learning to find a recipe for a baked good that’s half cake and half cookie

Last year, around the time when people were baking a lot of things, Sarah Robinson used machine learning to find a recipe for a “cakie”:

Like many people, I’ve been entertaining myself at home by baking a ton and talking about my sourdough starter as if it were a real person. I’m pretty good at following recipes, but I decided I wanted to take things one step further and understand the science behind what differentiates a cake from a bread or a cookie. I also like machine learning so I thought: what if I could combine it with baking??!

Robinson provides the final recipe at the end of her post. First, I need to try it. Second, what other foods and beverages could this approach apply to?


Neural network creates images from text

OpenAI trained a neural network that they call DALL·E on a dataset of text and image pairs. So now the neural network can take text input and output images that combine descriptors and objects, like a purse in the style of a Rubik’s cube or a teapot imitating Pikachu.


Blob Opera is a machine learning model you can make music with

David Li, in collaboration with Google Arts and Culture, made a fun experiment to play with:

We developed a machine learning model trained on the voices of four opera singers in order to create an engaging experiment for everyone, regardless of musical skills. Tenor, Christian Joel, bass Frederick Tong, mezzo‑soprano Joanna Gamble and soprano Olivia Doutney recorded 16 hours of singing. In the experiment you don’t hear their voices, but the machine learning model’s understanding of what opera singing sounds like, based on what it learnt from them.

So smooth. So blobby.


Through the eyes of the algorithm

Eugene Wei looks closer at the algorithms that drive TikTok and how its design provided an effective feedback loop:

But for TikTok (or Douyin, its Chinese clone), who needed an algorithm that would excel at recommending short videos to viewers, no such massive publicly available training dataset existed. Where could you find short videos of memes, kids dancing and lip synching, pets looking adorable, influencers pushing brands, soldiers running through obstacle courses, kids impersonating brands, and on and on? Even if you had such videos, where could you find comparable data on how the general population felt about such videos? Outside of Musical.ly’s dataset, which consisted mostly of teen girls in the U.S. lip synching to each other, such data didn’t exist.

In a unique sort of chicken and egg problem, the very types of video that TikTok’s algorithm needed to train on weren’t easy to create without the app’s camera tools and filters, licensed music clips, etc.

At first I was confused by TikTok. I’m still confused by TikTok. But one thing is for sure: the system knows how to serve up videos that one might find interesting. Whether that’s good in the long run is anyone’s guess.


Scented candle reviews on Amazon and Covid-19

Prompted by a tweet about scented candles without smell and Covid-19, Kate Petrova plotted Amazon reviews for scented and unscented candles over time. Notice the downward trend for scented candles after the first confirmed case of Covid-19.

Interesting if true. I’m imagining a bunch of people opening their new scented candles, taking a big whiff, and not smelling anything.

But I wonder if there are outside forces (a.k.a. confounding factors) at work here. For example, Petrova only looked at reviews for the “top 3” scented candles. What do we see with other candles? Maybe higher demand for scented candles from more people staying at home put a strain on the manufacturer. Maybe there was a shortage of some scented ingredient, which led to less potent candles. Maybe new scented-candle customers have unrealistic expectations of what candles smell like.

I don’t know.

Maybe the decreasing average review really is related to Covid-19 symptoms.

Petrova put up the code and data, in case you want to dig into it.
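
If you want a starting point, the core of the chart is just average rating by month. Below is a minimal sketch, not Petrova’s code; the function name `monthly_average` and the three sample reviews are made up for illustration, and the sketch assumes each review is a date string in YYYY-MM-DD form paired with a star rating.

```python
from collections import defaultdict
from statistics import mean


def monthly_average(reviews):
    """Group (date, stars) pairs by YYYY-MM and return the mean rating per month."""
    by_month = defaultdict(list)
    for date_str, stars in reviews:
        by_month[date_str[:7]].append(stars)  # "2020-01-15" -> "2020-01"
    return {month: mean(stars) for month, stars in sorted(by_month.items())}


if __name__ == "__main__":
    # Made-up reviews, only to show the shape of the input.
    sample = [("2020-01-15", 5), ("2020-01-20", 3), ("2020-02-01", 4)]
    for month, avg in monthly_average(sample).items():
        print(month, avg)
```

Running the same grouping separately for scented and unscented candles, and for more than the top three products, would be one way to poke at the confounding questions above.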


Jobs of a data scientist

Roger Peng outlines four main roles of a data scientist:

If you’re reading this and find yourself saying “I’m not an X” where X is either scientist, statistician, systems engineer, or politician, then chances are that is where you are weak at data science. I think a good data scientist has to have some skill in each of these domains in order to be able to complete the basic data analytic iteration.

The good thing about data science is that you can apply the skills to different fields and tasks. It’s also one of the challenges when you’re in the early phases of learning, because you have to figure out what to work on. This should point you in the right direction.

See also: Peng’s tentpoles of data science.


Debunking claims of election rigging

There’s a video (one of too many I am sure) going around that “shows” election rigging. Statistician Kristian Lum shows, with good ol’ basic math and R plots, why the “evidence” is what happens during a normal election.


Estimate your Covid-19 risk, given location and activities

The microCOVID Project provides a calculator that lets you put in where you are and various activities to estimate your risk:

This is a project to quantitatively estimate the COVID risk to you from your ordinary daily activities. We trawled the scientific literature for data about the likelihood of getting COVID from different situations, and combined the data into a model that people can use. We estimate COVID risk in units of microCOVIDs, where 1 microCOVID = a one-in-a-million chance of getting COVID.
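
The unit makes for easy arithmetic. Here is a minimal sketch of converting microCOVIDs to a probability and combining several activities. The function names are hypothetical, and treating activities as independent is a simplifying assumption of this sketch, not necessarily how the project’s own model works.

```python
import math

MICROCOVIDS_PER_CERTAINTY = 1_000_000  # 1 microCOVID = a one-in-a-million chance


def microcovids_to_probability(microcovids: float) -> float:
    """Convert a risk in microCOVIDs to a probability between 0 and 1."""
    return microcovids / MICROCOVIDS_PER_CERTAINTY


def cumulative_risk(activities_microcovids) -> float:
    """Chance of at least one infection across activities, assuming independence."""
    p_no_infection = math.prod(
        1 - microcovids_to_probability(m) for m in activities_microcovids
    )
    return 1 - p_no_infection


if __name__ == "__main__":
    # Hypothetical week: the same 200-microCOVID activity repeated five times.
    week = [200] * 5
    print(f"{cumulative_risk(week):.6f}")
```

For small risks, the combined figure is close to simply adding up the microCOVIDs, which is why the unit is handy for budgeting.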
