Colexification

If you want to annoy a linguist, there are three easy ways to do so: ask them how many languages they speak; ask them for their opinion regarding the German spelling reform; or ask them whether it is true that the Eskimo language has 50 words for snow. What those three questions have in common is that they all touch upon some big issues in linguistics, which are so big that they give us a headache whenever we are reminded of them.

For the first question, asking about a linguist's linguistic talent touches upon the conviction of quite a few linguists that in order to practice linguistics, one does not need to study many languages. One language is usually enough; and even if that language is only English, this may also be sufficient (at least according to some fanatics who practice syntax). To put it in different words: knowing only one language does not prevent a linguist from making claims about the evolution of whole language families. Knowing how to describe a language, or how to compare several languages, does not necessarily require anyone to be able to speak them. After all, mathematicians also pride themselves on not being able to calculate.

The second question, regarding the German spelling reform, marks the last time that German linguists failed royally to prove the importance of their studies to the broader public. The problem was that the German spelling reform, the first after some 100 years of linguistic peace, was mostly carried out without any linguistic input. Those who commented on it were, instead, novelists, poets and journalists, usually somewhat older, who felt that the reform was proposed mainly in order to annoy them personally. At the same time, and this was maybe no coincidence, more and more institutes for comparative linguistics disappeared from German universities. The reason was again that the field had not succeeded in explaining its importance to the public. However, historical language comparison can, indeed, be important when discussing the reform of a writing system that is used by millions of people, not least because the investigation of historically evolving linguistic systems is one of the specialties of historical-comparative linguistics. This was completely ignored at the time.

The last question concerns the almost ancient debate about the hypothesis commonly attributed to Edward Sapir (1884-1939) and Benjamin Lee Whorf (1897-1941). This says, in its strong form (Whorf 1950), that speaking influences thinking to such an extent that we might, for example, develop a different kind of Relativity Theory in physics if we started to practice our science in languages different from English, French, and German. Given that Eskimo languages are said to have some 50 different words for snow (as people keep repeating), it should be clear enough that those speaking an Eskimo language must think completely differently from those who start to forget what snow is after all.

The latter concept leads to an interesting use of networks, which I will discuss here.

Words versus deeds

The hypothesis by Sapir and Whorf annoys many linguists (including myself), because it has long since been disproved, at least in its strong, naive form. It was disproved by linguistic data, not by arguments; and the data were the very data that Whorf had used to prove his point in the first place. However, although there is little evidence for the hypothesis in its strong form, people keep repeating it, especially in non-linguistic circles, where it is often instrumentalized.

Whether we can find evidence for a weak form of the hypothesis (which would say that we can find some influence of speaking on thinking) is another question, and one that is difficult to answer. It may well be possible that our thoughts are channeled to some degree by the material we use in order to express them. When distinguishing color shades, for example, such as light blue and dark blue, by distinct words, such as goluboj and sin'ij in Russian or celeste and azul in Spanish, it may be that we develop different thoughts when somebody talks about blue cheese, which is called dark blue cheese in Spanish (queso azul).

But this does not mean that somebody who speaks English would never know that there is some difference between light and dark blue, just because the language does not primarily make the distinction between the two color tones. It is possible that the stricter distinction in Russian and Spanish triggers an increased attention among speakers, but we do not know how large the underlying effect is in the end, and how many people would be affected by it.

Particular languages are thus neither a template nor a mirror of human thinking — they do not necessarily channel our thoughts, and may only provide small hints as to how we perceive things around us. For example, if a language expresses different concepts, such as "arm" and "hand" with the same word, this may be a hint that "arm" and "hand" are not that different from each other, or that they belong together functionally in some sense, which is why we may perceive them as a unit. This is the case in Russian, where we find only one expression ruka for both concepts. In daily conversations, this works pretty well, and there are rarely any situations where Russian speakers would not understand each other due to ambiguities, since most of the time the context in which people speak disambiguates all they want to express well enough.

Colexification network with the central concept "MIND" and the geographical distribution of languages colexifying "MIND" and "BRAIN"

These colexifications, as we now call the phenomenon (François 2008), occur frequently in the languages of the world. This is due to the polysemy of many of the words we use: hardly any word denotes only a single concept, and most denote several similar concepts at the same time. On the other hand, we also encounter identical word forms in the same language which express completely different things, resulting from coincidental processes by which originally different pronunciations came to sound alike (a process that would be called convergence in biology). Those colexifications that are not coincidental but result from polysemy are the most interesting ones for linguists, not least because the words are related by networks rather than trees (as shown above). When assembled in large enough numbers, across a sufficiently large sample of languages, they may allow us some interesting insights into human cognition.

The procedure to mine these insights from cross-linguistic data has already been discussed in a previous blog post from 2018. The main idea is to collect colexifications for as many concepts and languages as possible, in order to construct a colexification network, in which each concept is represented by a node, and weighted links between the nodes represent how often each colexification between the linked concepts occurs; that is, they represent how often we find a language that expresses the two linked concepts with the same word.
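The construction just described can be sketched in a few lines of Python. This is only a minimal illustration and not the CLICS workflow itself: the word lists are toy data, "LanguageX" and its word forms are invented, and the function name colexification_network is an assumption made for this example. The sketch also makes no attempt to separate polysemy from coincidental homophony.

```python
from collections import Counter
from itertools import combinations

# Toy word lists (illustrative only): language -> word form -> concepts expressed.
# "LanguageX" and its forms are invented for this example.
WORDLISTS = {
    "Russian": {"ruka": {"ARM", "HAND"}, "noga": {"FOOT", "LEG"}},
    "English": {"arm": {"ARM"}, "hand": {"HAND"}, "foot": {"FOOT"}, "leg": {"LEG"}},
    "LanguageX": {"ba": {"ARM", "HAND"}, "ku": {"FOOT"}, "mi": {"LEG"}},
}


def colexification_network(wordlists):
    """Map each concept pair to the number of languages that colexify it."""
    weights = Counter()
    for forms in wordlists.values():
        pairs = set()
        for concepts in forms.values():
            # Every pair of concepts sharing one word form is a colexification.
            pairs.update(combinations(sorted(concepts), 2))
        # Count each pair at most once per language.
        weights.update(pairs)
    return weights


network = colexification_network(WORDLISTS)
for (concept_a, concept_b), weight in sorted(network.items()):
    print(concept_a, concept_b, weight)
```

Here the concepts are the nodes and the counter values are the link weights. A real analysis would probably also need to control for related languages, since closely related languages tend to share the same colexifications by inheritance.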

Having proposed a first update of our Database of Cross-Linguistic Colexifications (CLICS) back in 2018, we have now been able to further increase the data. With this third installment of the database, we were able to double the number of language varieties, from 1,200 to 2,400. In addition, we were able to enhance the workflows that we use to aggregate data from different sources, in a rigorously reproducible way (Rzymski et al. 2020).

Current work

Even more interesting than these data, however, is a study initiated by psychology colleagues at the University of North Carolina, which was recently published after more than two years of intensive collaboration (Jackson et al. 2019). In this study, the colexifications for emotion concepts, such as "love", "pity", "surprise", and "fear", were assembled, and the resulting networks were statistically compared across different language families. The surprising result was that the structures of the networks differed quite considerably from each other (an effect that we could not find for color concepts derived from the same data). Some language families, for example, tend to colexify "surprise" and "fear (fright)" (see our subgraph for "surprised"), while others colexify "love" and "pity" (see the subgraph for "pity").

Not all aspects of the network structures were different. An additional analysis involving informants showed that especially the criterion of valence (that is, whether something is perceived as negative or positive) played an important role for the structure of the networks; and similar effects could be found for the degree of arousal.

These results show that the way in which we express emotion concepts in our languages is, on the one hand, strongly influenced by cultural factors, while on the other hand there are some cognitive aspects that seem to be reflected similarly across all languages.

What we cannot conclude from the results, however, is that those who speak languages in which "pity" and "love" are represented by the same word will not know the difference between the two emotions. Here again, it is important to emphasize what I mentioned above with respect to color terms: if a particular distinction is not present in a given language, this does not mean that the speakers do not know the difference.

It may be tempting to dig out the old hypothesis of Sapir and Whorf in the context of the study on emotions; but the results do not, by any means, provide evidence that our thinking is directly shaped and restricted by the languages we speak. Many factors influence how we think. Language is one aspect among many others. Instead of focusing too much on the question as to which languages we speak, we may want to focus on how we speak the language in which we want to express our thoughts.

References

François, Alexandre (2008) Semantic maps and the typology of colexification: intertwining polysemous networks across languages. In: Vanhove, Martine (ed.): From polysemy to semantic change. Amsterdam: Benjamins, pp. 163-215.

Jackson, Joshua Conrad, Joseph Watts, Teague R. Henry, Johann-Mattis List, Peter J. Mucha, Robert Forkel, Simon J. Greenhill and Kristen Lindquist (2019) Emotion semantics show both cultural variation and universal structure. Science 366.6472: 1517-1522. DOI: 10.1126/science.aaw8160

Rzymski, Christoph, Tiago Tresoldi, Simon Greenhill, Mei-Shin Wu, Nathanael E. Schweikhard, Maria Koptjevskaja-Tamm, Volker Gast, Timotheus A. Bodt, Abbie Hantgan, Gereon A. Kaiping, Sophie Chang, Yunfan Lai, Natalia Morozova, Heini Arjava, Nataliia Hübler, Ezequiel Koile, Steve Pepper, Mariann Proos, Briana Van Epps, Ingrid Blanco, Carolin Hundt, Sergei Monakhov, Kristina Pianykh, Sallona Ramesh, Russell D. Gray, Robert Forkel and Johann-Mattis List (2020) The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies. Scientific Data 7.13: 1-12. DOI: 10.1038/s41597-019-0341-x

Whorf, Benjamin Lee (1950) An American Indian Model of the Universe. International Journal of American Linguistics 16.2: 67-72.
