Analysis of compound curse words used on Reddit

As you know, Reddit is typically a sophisticated place of kind and pleasant conversation. So Colin Morris analyzed the usage of compound pejoratives in Reddit comments:

The full “matrix” of combinations is surprisingly dense. Of the ~4,800 possible compounds, more than half occurred in at least one comment. The most frequent compound, dumbass, appears in 3.6 million comments, but there’s also a long tail of many rare terms, including 444 hapax legomena (terms which appear only once in the dataset), such as pukebird, fartrag, sleazenozzle, and bastardbucket.
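For a rough idea of how a count like this could work, here is a minimal sketch. It is not Morris's actual code. The word lists and the sample comments are made up for illustration, using a few of the compounds quoted above, and a term counts as a hapax legomenon here if it shows up in exactly one comment.

```python
from collections import Counter
from itertools import product
import re

# Toy word lists, assumed for illustration only (built from compounds quoted in the post).
PREFIXES = ["dumb", "puke", "fart"]
SUFFIXES = ["ass", "bird", "rag"]


def count_compounds(comments, prefixes=PREFIXES, suffixes=SUFFIXES):
    """For each prefix+suffix compound, count the comments that contain it."""
    compounds = {p + s for p, s in product(prefixes, suffixes)}
    counts = Counter()
    for comment in comments:
        # Use a set so a compound is counted at most once per comment.
        words = set(re.findall(r"[a-z]+", comment.lower()))
        counts.update(words & compounds)
    return counts


def hapax_legomena(counts):
    """Return the compounds that were seen in exactly one comment."""
    return sorted(term for term, n in counts.items() if n == 1)


# Made-up comments, not real Reddit data.
comments = [
    "What a dumbass move.",
    "Only a dumbass would say that, you pukebird.",
    "Classic fartrag behavior.",
]

counts = count_compounds(comments)
possible = len(PREFIXES) * len(SUFFIXES)
print(f"{len(counts)} of {possible} possible compounds observed")
print("hapax legomena:", hapax_legomena(counts))
```

The "matrix" is just every prefix paired with every suffix, so the number of possible compounds is the product of the two list lengths.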

Stay classy.


Different languages, but similar information rates

Christophe Coupé and company analyzed speech rate (on the left) across different languages, and then compared it to information rate (on the right) in bits per second. While speech rate and information rate are still coupled, there’s less variation in information rate across languages. More syllables don’t necessarily mean more information.


Isotype, a picture language

Jason Forrest delves into the history of a single Isotype and a bit of the general background on the picture language:

Isotype is a highly refined picture language designed for educating people with as few words as possible. Created by Otto Neurath in 1925, the International System of Typographic Picture Education (ISOTyPE) evolved over the next two decades with the collaboration of Marie Neurath and Gerd Arntz. The trio developed their distinct approach to data visualization iteratively, and very collaboratively. Otto provided the overall direction, Marie “transformed” the data to present the story, and Gerd designed the pictogram units and highly-refined designs.


Why some Asian accents swap Ls and Rs

Vox delves into why Ls and Rs often get swapped by Asian speakers using English as a second language. Some sounds in English aren’t prevalent in other languages, and the pattern isn’t the same across all Asian languages.


Changing size analogies and the trends of everyday things

When you try to describe the size of something but don’t have an exact measurement, you probably compare it to an everyday object that others can relate to. Using the Google Books Ngram dataset, Colin Morris looked at how such comparisons changed over the past few centuries.

I especially like the bits of history to explain why some words fell into and out of fashion.


Dialect book of maps


In 2013, Josh Katz put together a dialect quiz that showed where people talk like you, based on your own vocabulary. Things like coke versus soda. It’s a fine example of how we’re often talking about the same thing but say or express it differently. Speaking American is the book version of the dialect quiz results.

It’s a fun coffee/kitchen table book to flip through casually. It’s not just a book of maps. It highlights the interesting bits and provides some short explanations for why the differences exist. I’ve been enjoying bits and pieces on the occasions when my son takes an unreasonable amount of time to finish his dinner.

Get it on Amazon.


It’s All Greek (or Chinese or Spanish or…) to Me


In English, there's an idiom that notes confusion: "It's all Greek to me." Other languages have similar sayings, but they don't use Greek as their point of confusion, and of course, there's a Wikipedia page for that. Mark Liberman graphed the relationships several years ago, but the table on Wikipedia references more languages now. So I messed around with it a bit.

"Chinese" is the leading point of confusion, then Spanish and Greek, and then you just move out from there. Languages with a lighter border, toward the edges, don't have any other languages that point to them.

Obviously the Wikipedia page isn't comprehensive, but hey, it was fun to poke at.
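If you want to poke at it yourself, the ranking is just a count of incoming edges in a directed graph. Here is a minimal sketch, assuming the table has been turned into (speaker language, confusing language) pairs. Only the English to Greek pair comes from this post. The other edges, and the "Language A/B/C" names, are placeholders and not data from the Wikipedia table.

```python
from collections import Counter

# Each edge reads: (language with the idiom, language it calls confusing).
# Only ("English", "Greek") comes from the post; the rest are placeholder edges.
edges = [
    ("English", "Greek"),
    ("Language A", "Chinese"),
    ("Language B", "Chinese"),
    ("Language C", "Spanish"),
    ("Greek", "Chinese"),
]


def rank_targets(edges):
    """Rank languages by how many other languages point to them (in-degree)."""
    return Counter(target for _, target in edges).most_common()


def never_targeted(edges):
    """Languages that point somewhere but that no other language points to."""
    sources = {source for source, _ in edges}
    targets = {target for _, target in edges}
    return sorted(sources - targets)


print(rank_targets(edges))
print(never_targeted(edges))
```

The languages returned by `never_targeted` are the ones that would sit at the edges of the graphic.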


Translating images to words


With Google's image search, the results kind of exist in isolation. There isn't a ton of context until you click through to see how an image is placed among words. So, researchers at Google are trying an approach similar to how they translate languages to automatically create captions for the images.

Now Oriol Vinyals and pals at Google are using a similar approach to translate images into words. Their technique is to use a neural network to study a dataset of 100,000 images and their captions and so learn how to classify the content of images.

But instead of producing a set of words that describe the image, their algorithm produces a vector that represents the relationship between the words. This vector can then be plugged into Google’s existing translation algorithm to produce a caption in English, or indeed in any other language. In effect, Google’s machine learning approach has learnt to “translate” images into words.


English versus Chinese color descriptors


Color exists on a continuous spectrum, but we bin it with names and descriptions that reflect perception and sometimes culture. We saw this with gender a while back. Wikipedia has a short description of cultural differences and color naming.

Muyueh Lee looked at this binning through the lens of English versus Chinese color naming. More specifically, he looked at Chinese color names on Wikipedia and compared them against English color names. This comes with its own sampling biases because of higher Wikipedia usage for English speakers, but when you divide by color categories, it's a different story.

Full scrolling explainer here. Fun.
