Machine learning to find a recipe for a baked good that’s half cake and half cookie

Last year, around the time when people were baking a lot of things, Sarah Robinson used machine learning to find a recipe for a “cakie”:

Like many people, I’ve been entertaining myself at home by baking a ton and talking about my sourdough starter as if it were a real person. I’m pretty good at following recipes, but I decided I wanted to take things one step further and understand the science behind what differentiates a cake from a bread or a cookie. I also like machine learning so I thought: what if I could combine it with baking??!

Robinson provides the final recipe at the end, so first, I need to try this recipe. Second, what other foods and beverages can this apply to?

Blob Opera is a machine learning model you can make music with

David Li, in collaboration with Google Arts and Culture, made a fun experiment to play with:

We developed a machine learning model trained on the voices of four opera singers in order to create an engaging experiment for everyone, regardless of musical skills. Tenor, Christian Joel, bass Frederick Tong, mezzo‑soprano Joanna Gamble and soprano Olivia Doutney recorded 16 hours of singing. In the experiment you don’t hear their voices, but the machine learning model’s understanding of what opera singing sounds like, based on what it learnt from them.

So smooth. So blobby.

Red-blue electoral map and the green-gray in satellite imagery

For NYT’s The Upshot, Tim Wallace and Krishna Karra looked at how the red-blue electoral map relates to the green and gray color spectrum in satellite imagery:

The pattern we observe here is consistent with the urban-rural divide we’re accustomed to seeing on traditional maps of election results. What spans the divide — the suburbs represented by transition colors — can be crucial to winning elections. It’s part of why President Trump, seeking to appeal to swing voters, has portrayed the suburbs as under siege and menaced by crime. But the suburbs are neither politically nor geographically monolithic. They are where Democratic and Republican voters meet and overlap, in a variety of ways.

The breakdown and process are impressive. Be sure to check out the full rundown. Wallace also provides more details about how this came together on Twitter.

The group’s research response to COVID-19

This is an update on the group's research response to the COVID-19 pandemic. As an infectious disease group we have been keen to contribute to the international research effort where we could be useful, while recognising the need to continue our research on other important infections where possible.

  • Bugbank. Thanks to a pre-existing collaboration between our group, Public Health England and UK Biobank, we were in a position to help rapidly facilitate COVID-19 research via SARS-CoV-2 PCR-based swab test results. Beginning mid-March, we worked to provide regular (usually weekly) updates of test results, which were made available to all UK Biobank researchers beginning April 17th. This is one of several resources on COVID-19 linked to UK Biobank. Beginning in May we provided feeds to other cohorts: INTERVAL, COMPARE, Genes & Health and the NIHR BioResource. We provide updates on this work through the project website www.bugbank.uk. We have published a paper describing the dynamic data linkage in Microbial Genomics (press release). Key collaborators in this project are Jacob Armstrong (Big Data Institute), Naomi Allen (UK Biobank), and David Wyllie and Anne Marie O'Connell (Public Health England).


  • Epidemiological risk factors for COVID-19. Graduate student Nicolas Arning and I are developing an approach to quantify the effects of lifestyle and medical risk factors for COVID-19 in the UK Biobank that accounts for inherent uncertainty in which risk factors to consider. The new method employs the harmonic mean p-value, a model-averaging approach for big data that we published previously. We are in the process of evaluating the performance of the approach, comparing it to machine learning, and interpreting the results.

  • Antibody testing for the UK Government. Postdoc Justine Rudkin has been working in the lab with Derrick Crook, Sir John Bell and others to measure the efficacy of antibody tests for the UK Government. They have tested many hundreds of kits to establish the sensitivity and specificity of the tests to help evaluate the utility of a national testing programme. This work was crucial in demonstrating the limitations of early blood-spot based tests, and the credibility of subsequent generations of antibody tests. The work has been published in Wellcome Open Research.
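
For readers unfamiliar with the harmonic mean p-value mentioned in the risk-factors item above, the following is a minimal sketch of the basic statistic only, not the analysis pipeline used in the project. The function name `harmonic_mean_p` and the example inputs are hypothetical and chosen purely for illustration. The sketch computes the raw weighted harmonic mean of a set of p-values and omits any further calibration step.

```python
from typing import Optional, Sequence


def harmonic_mean_p(
    p_values: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Raw weighted harmonic mean of p-values (illustrative sketch).

    With weights w_i, the statistic is sum(w_i) / sum(w_i / p_i).
    Equal weights are assumed when none are given.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")
    if any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
        raise ValueError("weights must be non-negative with a positive sum")

    total_weight = sum(weights)
    weighted_reciprocals = sum(w / p for w, p in zip(weights, p_values))
    return total_weight / weighted_reciprocals


if __name__ == "__main__":
    # Made-up p-values for three hypothetical candidate models.
    print(harmonic_mean_p([0.01, 0.5, 0.5]))
```

Because the statistic averages reciprocals, one small p-value dominates the combined value: in the illustrative call above, the result is far closer to 0.01 than to 0.5.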


Work on other infections has continued during the lockdown. Postdoc Sarah Earle continues research into pathogen genetic risk factors for diseases including tuberculosis and meningococcal meningitis, while Steven Lin has continued to pursue work on hepatitis C virus genetics and epidemiology. Many of our close collaborators are infection doctors and they have of course been recalled to clinical duties. Laboratory work in the group has been severely disrupted, particularly several of Justine's Staphylococcus aureus projects. We are keen to pick those projects up where we left off when the chance arrives.

Tic-Tac-Toe the Hard Way is a podcast about the human decisions in building a machine learning system

From Google’s People + AI Research team, David Weinberger and Yannick Assogba build a machine learning system that plays Tic-Tac-Toe. They discuss the choices, not just the technical ones, along the way in the ten-part podcast series:

A writer and a software engineer engage in an extended conversation as they take a hands-on approach to exploring how machine learning systems get made and the human choices that shape them. Along the way they build competing tic-tac-toe agents and pit them against each other in a dramatic showdown!

This is a podcast for anyone, from curious non-techies to developers dabbling in machine learning, interested in peeking under the hood at how people make and shape ML systems.

I’m a few episodes in. It’s entertaining.

This is an especially good listen if you’re interested in machine learning, but aren’t quite sure about how it works beyond a bunch of data going into a black box.

Park sounds before and during the pandemic

With lockdown orders around the world, places that we’re allowed to go sound different. The MIT Senseable City Lab looked at this shift in audio footprint through the lens of public parks:

Using machine learning techniques, we analyze the audio from walks taken in key parks around the world to recognize changes in sounds like human voices, emergency sirens, street music, sounds of nature (i.e., bird song, insects), dogs barking, and ambient city noise. We extracted audio files from YouTube videos of park walks from previous years, and compared them with walks recorded by volunteers along the same path during the COVID-19 pandemic. The analysis suggests an overall increase in birdsong and a decrease in city sounds, such as cars driving by, or construction work. The interactive visualization proposed in Sonic Cities allows users to explore and experience the changing soundscapes of urban parks.

The 3-D view shown above is visually interesting, but the top-down view is the easiest to read, looking like a stacked area chart over a map.

At distinct points on the mapped paths, a gradient line represents the distribution of quieter and louder sounds. Louder sounds appear to take up more space during the pandemic.

It’s hard to say how accurate the sound classification is through this view, but as I poked around, it seemed a bit rough. For example, the chart for Central Park in New York shows bird sounds making up about 0% of the footprint, but you can hear birds pretty easily in the audio clips. I’d also be interested in how they normalized between YouTube clips and their own recorded audio to get a fair comparison.

Nevertheless, it’s an interesting experiment both statistically and visually. Worth a look.

Machine learning to make a dictionary of words that do not exist

Thomas Dimson trained a model to generate words that don’t exist in real life and definitions for said imaginary words. If you didn’t tell me the words were machine-generated, I’d believe a lot of them were actual parts of the English dictionary.

Machine learning to help you not touch your face

The CDC recommends that you do not touch your face to minimize the spread of the coronavirus. We do this quite a bit without even thinking about it, so Do Not Touch Your Face uses machine learning to help you adjust. Train the algorithm, and then the algorithm trains you.

Request for proposals: Single Cell in the Cloud codeathon at NYGC in January

The New York Genome Center is hosting an NCBI Single Cell in the Cloud codeathon from January 15-17, 2020. Submissions for project proposals are due December 2nd. Please submit your proposal and apply here. What topics are in scope? This codeathon …

Machine learning to erase penis drawings

Working from the Quick, Draw! dataset, Moniker dares people to not draw a penis:

In 2018 Google open-sourced the Quickdraw data set. “The world’s largest doodling data set”. The set consists of 345 categories and over 15 million drawings. For obvious reasons the data set was missing a few specific categories that people enjoy drawing. This made us at Moniker think about the moral reality big tech companies are imposing on our global community and that most people willingly accept this. Therefore we decided to publish an appendix to the Google Quickdraw data set.

Draw what you want, and the application compares your sketch against a model, erasing any offenders.
