Red-blue electoral map and the green-gray in satellite imagery

For NYT’s The Upshot, Tim Wallace and Krishna Karra looked at how the red-blue electoral map relates to the green and gray color spectrum in satellite imagery:

The pattern we observe here is consistent with the urban-rural divide we’re accustomed to seeing on traditional maps of election results. What spans the divide — the suburbs represented by transition colors — can be crucial to winning elections. It’s part of why President Trump, seeking to appeal to swing voters, has portrayed the suburbs as under siege and menaced by crime. But the suburbs are neither politically nor geographically monolithic. They are where Democratic and Republican voters meet and overlap, in a variety of ways.

The breakdown and process are impressive. Be sure to check out the full rundown. Wallace also provides more details about how this came together on Twitter.


Tic-Tac-Toe the Hard Way is a podcast about the human decisions in building a machine learning system

From Google’s People + AI Research team, David Weinberger and Yannick Assogba build a machine learning system that plays Tic-Tac-Toe. They discuss the choices, not just the technical ones, along the way in the ten-part podcast series:

A writer and a software engineer engage in an extended conversation as they take a hands-on approach to exploring how machine learning systems get made and the human choices that shape them. Along the way they build competing tic-tac-toe agents and pit them against each other in a dramatic showdown!

This is a podcast for anyone, from curious non-techies to developers dabbling in machine learning, interested in peeking under the hood at how people make and shape ML systems.

I’m a few episodes in. It’s entertaining.

This is an especially good listen if you’re interested in machine learning, but aren’t quite sure about how it works beyond a bunch of data going into a black box.


Park sounds before and during the pandemic

With lockdown orders around the world, the places we’re allowed to go sound different. The MIT Senseable City Lab looked at this shift in audio footprint through the lens of public parks:

Using machine learning techniques, we analyze the audio from walks taken in key parks around the world to recognize changes in sounds like human voices, emergency sirens, street music, sounds of nature (i.e., bird song, insects), dogs barking, and ambient city noise. We extracted audio files from YouTube videos of park walks from previous years, and compared them with walks recorded by volunteers along the same path during the COVID-19 pandemic. The analysis suggests an overall increase in birdsong and a decrease in city sounds, such as cars driving by, or construction work. The interactive visualization proposed in Sonic Cities allows users to explore and experience the changing soundscapes of urban parks.

The 3-D view is visually interesting, but the top-down view is the easiest to read, looking like a stacked area chart over a map.

At distinct points on the mapped paths, a gradient line represents the distribution of quieter and louder sounds. Louder sounds appear to take up more space during the pandemic.

It’s hard to say how accurate the sound classification is through this view, but as I poked around, it seemed a bit rough. For example, the chart for Central Park in New York shows bird sounds making up about 0% of the footprint, but you can hear birds pretty easily in the audio clips. I’d also be interested in how they normalized between YouTube clips and their own recorded audio to get a fair comparison.

Nevertheless, it’s an interesting experiment both statistically and visually. Worth a look.
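On the normalization question, here is a minimal sketch of one common approach: scaling every clip to the same root-mean-square (RMS) level before comparing them. This is only an assumption about what a fair comparison could look like, not the lab’s actual method. The function names, the target level, and the synthetic sine-wave “clips” are all made up for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_rms(samples, target=0.1):
    """Scale samples so their RMS level equals `target` (hypothetical value)."""
    level = rms(samples)
    if level == 0:
        # A silent clip cannot be scaled; return it unchanged.
        return list(samples)
    gain = target / level
    return [s * gain for s in samples]


def sine_clip(amplitude, freq=440, rate=8000, seconds=1):
    """Synthetic stand-in for a recording: a pure tone at a given amplitude."""
    n = rate * seconds
    return [amplitude * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


# Two made-up clips at very different recording levels,
# e.g. a compressed YouTube upload versus a phone recording.
youtube_clip = sine_clip(amplitude=0.5)
volunteer_clip = sine_clip(amplitude=0.01)

youtube_norm = normalize_rms(youtube_clip)
volunteer_norm = normalize_rms(volunteer_clip)

print(rms(youtube_clip), rms(volunteer_clip))   # very different levels
print(rms(youtube_norm), rms(volunteer_norm))   # both close to the target
```

Matching levels like this only removes differences in recording gain. It does nothing about microphone quality, compression, or wind noise, which is part of why the comparison is tricky.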


Machine learning to make a dictionary of words that do not exist

Thomas Dimson trained a model to generate words that don’t exist in real life and definitions for said imaginary words. If you didn’t tell me the words were machine-generated, I’d believe a lot of them were actual parts of the English dictionary.


Machine learning to help you not touch your face

The CDC recommends that you do not touch your face to minimize the spread of the coronavirus. We do this quite a bit without even thinking about it, so Do Not Touch Your Face uses machine learning to help you adjust. Train the algorithm, and then the algorithm trains you.


Machine learning to erase penis drawings

Working from the Quick, Draw! dataset, Moniker dares people to not draw a penis:

In 2018 Google open-sourced the Quickdraw data set. “The world’s largest doodling data set”. The set consists of 345 categories and over 15 million drawings. For obvious reasons the data set was missing a few specific categories that people enjoy drawing. This made us at Moniker think about the moral reality big tech companies are imposing on our global community and that most people willingly accept this. Therefore we decided to publish an appendix to the Google Quickdraw data set.

Draw what you want, and the application compares your sketch against a model, erasing any offenders.


Machine learning to steal baseball signs

Mark Rober, who is great at explaining and demonstrating math and engineering to a wide audience, gets into the gist of machine learning in his latest video.


Runway ML makes machine learning easier to use for creators

Machine learning can feel like a foreign concept only useful to those with access to big machines. Runway ML aims to make machine learning easier to use for a wider audience, specifically for creators. It provides a click-and-drag interface that lets you link algorithms, import datasets, and most importantly, experiment.

Looks like fun. Give it a go.


Myth of the impartial machine

In its inaugural issue, Parametric Press describes how bias can easily come about when working with data:

Even big data are susceptible to non-sampling errors. A study by researchers at Google found that the United States (which accounts for 4% of the world population) contributed over 45% of the data for ImageNet, a database of more than 14 million labelled images. Meanwhile, China and India combined contribute just 3% of images, despite accounting for over 36% of the world population. As a result of this skewed data distribution, image classification algorithms that use the ImageNet database would often correctly label an image of a traditional US bride with words like “bride” and “wedding” but label an image of an Indian bride with words like “costume”.
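To put the skew in the excerpt into one number, you can divide each region’s share of the data by its share of the world population. A ratio of 1 would mean proportional representation. The sketch below uses the rounded figures from the quote (the original says “over 45%” and “over 36%”, so these are approximations), and the function name is made up for illustration.

```python
def representation_ratio(data_share, population_share):
    """Share of the dataset divided by share of world population.

    1.0 means proportional; above 1 is overrepresented, below 1 is under.
    """
    return data_share / population_share


# Rounded percentages taken from the quoted passage.
us = representation_ratio(data_share=45, population_share=4)
china_india = representation_ratio(data_share=3, population_share=36)

print(f"US: {us:.2f}x its population share")
print(f"China + India: {china_india:.2f}x their population share")
print(f"Gap between the two: about {us / china_india:.0f}-fold")
```

With these rounded inputs, the United States comes out at roughly 11 times its population share, China and India combined at under a tenth of theirs.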

Click through to check out the interactives that serve as learning aids. The other essays in this first issue are also worth a look.


Algorithms to fix underrepresentation on Wikipedia

Wikipedia is human-edited, so naturally there are biases towards certain groups of people. Primer, an artificial intelligence startup, is working on a system that looks for people who should have an article. It’s called Quicksilver.

We trained Quicksilver’s models on 30,000 English Wikipedia articles about scientists, their Wikidata entries, and over 3 million sentences from news documents describing them and their work. Then we fed in the names and affiliations of 200,000 authors of scientific papers.

In the morning we found 40,000 people missing from Wikipedia who have a similar distribution of news coverage as those who do have articles. Quicksilver doubled the number of scientists potentially eligible for a Wikipedia article overnight.

Then, after it finds people, it generates sample articles to get things started.
