Analysis of compound curse words used on Reddit

As you know, Reddit is typically a sophisticated place of kind and pleasant conversation. So Colin Morris analyzed the usage of compound pejoratives in Reddit comments:

The full “matrix” of combinations is surprisingly dense. Of the ~4,800 possible compounds, more than half occurred in at least one comment. The most frequent compound, dumbass, appears in 3.6 million comments, but there’s also a long tail of many rare terms, including 444 hapax legomena (terms which appear only once in the dataset), such as pukebird, fartrag, sleazenozzle, and bastardbucket.

Stay classy.
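The counting described in the quote can be sketched in a few lines. This is only a toy illustration, not Morris's actual method: the prefix and suffix lists, the sample comments, and the function name `compound_stats` are all hypothetical, and the tokenizer simply splits on whitespace.

```python
from collections import Counter
from itertools import product


def compound_stats(comments, prefixes, suffixes):
    """Count how many comments contain each prefix+suffix compound.

    Returns (candidates, counts, hapaxes), where hapaxes are the
    compounds that appear in exactly one comment.
    """
    # The full "matrix" of possible compounds: every prefix with every suffix.
    candidates = {p + s for p, s in product(prefixes, suffixes)}
    counts = Counter()
    for comment in comments:
        # Naive whitespace tokenization (an assumption; real text needs more care).
        words = set(comment.lower().split())
        counts.update(words & candidates)
    hapaxes = sorted(w for w, n in counts.items() if n == 1)
    return candidates, counts, hapaxes


# Hypothetical toy data, not taken from the Reddit dataset.
toy_comments = ["what a dumbass", "total dumbass move", "you fartrag"]
candidates, counts, hapaxes = compound_stats(
    toy_comments, prefixes=["dumb", "fart"], suffixes=["ass", "rag"]
)
```

With a real prefix and suffix list, `len(counts) / len(candidates)` would give the share of the matrix that is filled in.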


Different languages, but similar information rates

Christophe Coupé and company analyzed speech rate (on the left) across different languages, and then compared it to information rate (on the right) in bits per second. While speech rate and information rate are still coupled, there’s less variation in information rate across languages. More syllables don’t necessarily mean more information.
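A rough way to see why the two can diverge: information rate is speech rate multiplied by how much information each syllable carries. The sketch below uses two made-up languages with invented numbers (they are not values from the study) to show a fast, low-density language and a slow, high-density one landing on the same rate.

```python
def information_rate(syllables_per_second, bits_per_syllable):
    """Information rate in bits per second."""
    return syllables_per_second * bits_per_syllable


# Hypothetical values for illustration only: (syllables/sec, bits/syllable).
toy_languages = {
    "fast_low_density": (8.0, 5.0),
    "slow_high_density": (5.0, 8.0),
}

rates = {
    name: information_rate(speed, density)
    for name, (speed, density) in toy_languages.items()
}
```

Both toy languages come out at the same bits per second even though one packs in far more syllables.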


Isotype, a picture language

Jason Forrest delves into the history of a single Isotype and a bit of the general background on the picture language:

Isotype is a highly refined picture language designed for educating people with as few words as possible. Created by Otto Neurath in 1925, the International System of Typographic Picture Education (ISOTyPE) evolved over the next two decades with the collaboration of Marie Neurath and Gerd Arntz. The trio developed their distinct approach to data visualization iteratively, and very collaboratively. Otto provided the overall direction, Marie “transformed” the data to present the story, and Gerd designed the pictogram units and highly-refined designs.


Why some Asian accents swap Ls and Rs

Vox delves into why Ls and Rs often get replaced by Asian speakers using English as a second language. Some sounds aren’t prevalent in other languages, and it’s not the same across all Asian languages.


Changing size analogies and the trends of everyday things

When you try to describe the size of something but don’t have an exact measurement, you probably compare it to an everyday object that others can relate to. Using the Google Books Ngram dataset, Colin Morris looked at how such comparisons changed over the past few centuries.

I especially like the bits of history to explain why some words fell into and out of fashion.


Dialect book of maps


In 2013, Josh Katz put together a dialect quiz that showed where people talk like you, based on your own vocabulary. Things like coke versus soda. It’s a fine example of how we’re often talking about the same thing but say or express it differently. Speaking American is the book version of the dialect quiz results.

It’s a fun coffee/kitchen table book to flip through casually. It’s not just a book of maps. It highlights the interesting bits and provides some short explanations for why the differences exist. I’ve been enjoying bits and pieces on the occasion my son takes an unreasonable amount of time to finish his dinner.

Get it on Amazon.


It’s All Greek (or Chinese or Spanish or…) to Me


In English, there's an idiom that notes confusion: "It's all Greek to me." Other languages have similar sayings, but they don't use Greek as their point of confusion, and of course, there's a Wikipedia page for that. Mark Liberman graphed the relationships several years ago, but the table on Wikipedia references more languages now. So I messed around with it a bit.

"Chinese" is the leading point of confusion, then Spanish and Greek, and then you just move out from there. Languages with a lighter border, toward the edges, don't have any other languages that point to them.
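The ranking above is essentially an in-degree count on a directed graph, where an edge runs from a language to the language its idiom names. Here is a minimal sketch of that idea. Only the English-to-Greek edge comes from the text; `LangA`, `LangB`, and `LangC` are placeholders, and `confusion_in_degree` is a made-up name.

```python
from collections import Counter


def confusion_in_degree(edges):
    """Count how often each language is named as the confusing one.

    edges: iterable of (speaker_language, confusing_language) pairs.
    Returns (in_degree, never_targeted), where never_targeted lists the
    languages that no other language points to.
    """
    in_degree = Counter(target for _, target in edges)
    languages = {lang for edge in edges for lang in edge}
    never_targeted = sorted(lang for lang in languages if in_degree[lang] == 0)
    return in_degree, never_targeted


# Placeholder edges, except English -> Greek, which is the idiom in the text.
toy_edges = [
    ("English", "Greek"),
    ("LangA", "LangC"),
    ("LangB", "LangC"),
]

in_degree, never_targeted = confusion_in_degree(toy_edges)
leading = in_degree.most_common(1)[0][0]
```

In the toy graph, `leading` is the most pointed-to language and `never_targeted` corresponds to the lighter-bordered nodes at the edges.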

Obviously the Wikipedia page isn't comprehensive, but hey, it was fun to poke at.


Translating images to words


With Google's image search, the results kind of exist in isolation. There isn't a ton of context until you click through to see how an image is placed among words. So, researchers at Google are trying an approach similar to how they translate languages to automatically create captions for the images.

Now Oriol Vinyals and pals at Google are using a similar approach to translate images into words. Their technique is to use a neural network to study a dataset of 100,000 images and their captions and so learn how to classify the content of images.

But instead of producing a set of words that describe the image, their algorithm produces a vector that represents the relationship between the words. This vector can then be plugged into Google’s existing translation algorithm to produce a caption in English, or indeed in any other language. In effect, Google’s machine learning approach has learnt to “translate” images into words.


You say “Hippopotamuses”, I say “Hippipotamus”

Apparently, a herd of hippos derived from animals kept by deceased drug cartel lord Pablo Escobar have been running amok in Colombia for something like two decades1. Unfortunately, I could not find any references to extinct South American members of the Hippopotamidae family. So, this cannot be considered an accidental experiment2 in rewilding.

The multiple articles that have sprung up (no reputable news organization could ignore this story) have heightened the focus on a key question of grammar: what is the plural of hippopotamus? In terms of authority, we have disagreement, with the Oxford University Press voting for hippopotamuses, “The Smartest Man in the World” comedian Greg Proops arguing on behalf of hippopotami, and would-be Internet language scholars suggesting hippopotamoi from the Greek.

What should the plural of hippopotamus be?

On this there can be troubling debate. The argument for hippopotamuses rests heavily on how the average person (i.e., unrepentant philistine) likes to pronounce words:

It may also depend on whether the Latin or Greek form of the plural is either easily recognizable or pleasant to the speaker of English…the usual plural is hippopotamuses. – Oxford Dictionaries

The advocacy for hippopotami is based on the bastardization of Greek words into something meant to look like Latin – a practice popular among the Romans themselves. To concede this corruption would be to also concede that octopodes is not innately superior to octopi.

Because hippopotamus is derived from a Greek second-declension noun, a reasonable suggestion would be to simply apply the Greek second-declension nominative plural ending, which would give us hippopotamoi.

But the word hippopotamus is a conjoining of the words hippos3 (horse) and potamos (river), based on “river horse” to describe the animal, and potamippo is a name only a pharmaceutical company could love. We don’t want to say “rivers horse” when talking about multiple hippos. We want to say “river horses4”, which would be hippipotamus (a suggestion already made by other amateur pedants).

I kind of like that. Try it for yourself. A herd of hippipotamus.

You know what? I really like that. So let it be written. So let it be done.

1. This would seem to be a testament to the bad assery of hippos. If you abandoned me at a drug kingpin’s palace in Colombia, I probably wouldn’t last 20 days.
2. The first step of intentional, experimental rewilding rarely requires the investigator in possession of the experimental subjects to be gunned down in a rooftop gun battle with the Colombian National Police. Getting Institutional Review Board (IRB) approval for such things is a nightmare.
3. Coincidentally, the Greek singular is hippos, which is the same as the English plural for the shortened form of hippopotamus. One could argue that we should be saying hippoi, which I am up for, if you go first.
4. Or “river horsies”, if you happen to be the responsible party for a 4 and a 6 year-old second instar larval human.

Filed under: Curiosities of Nature, Follies of the Human Condition Tagged: hippos, language, Pablo Escobar

English versus Chinese color descriptors


Color exists on a continuous spectrum, but we bin it with names and descriptions that reflect perception and sometimes culture. We saw this with gender a while back. Wikipedia has a short description of cultural differences in color naming.

Muyueh Lee looked at this binning through the lens of English versus Chinese color naming. More specifically, he looked at Chinese color names on Wikipedia and compared them against English color names. This comes with its own sampling biases because of higher Wikipedia usage for English speakers, but when you divide by color categories, it's a different story.

Full scrolling explainer here. Fun.
