Questionable science diagrams

Sometimes illustrating scientific findings is a challenge. Sometimes the illustrations are published anyways, because there are no more options. Sometimes those illustrations end up on a Twitter feed called Science Diagrams that Look Like Shitposts.

Improving the Display of Type Material in the NCBI TaxBrowser

Have you ever been confused by multiple taxonomic names for a single organism? You’re not alone! It’s one of the challenges in maintaining any biological database. Recently we updated the NCBI TaxBrowser to assist with this. Let’s start with a …

Duplicated study of apologizers leads to a retraction — and an apology

The Journal of Consumer Research has retracted a 2019 paper because it overlapped significantly with a study previously published in Chinese by the same authors. But whether both authors agreed to the previous submission is a subject of some confusion on the part of one of them. The journal, published by Oxford Academic, added “RETRACTED” …


UST Bets on TELL-Seq

I've made a few references recently to TELL-Seq, both in my flawed analysis of BioNano Genomics (I missed a key business development in their raising $18M in October; I stand by the science comments and fear that the fund raise buys them about a year of time) and on 10X Genomics discontinuing their genome assay kits.  Now to actually dig into that technology -- a bit late given the preprint came out last fall, but better late than never.  So put on your sunglasses and hoodies, conjure up the image of early television chefs and key up the theme music for The Lone Ranger, because here I go.


From words to deeds?

If you want to annoy a linguist, there are three easy ways to do so: ask them how many languages they speak; ask them for their opinion on the German spelling reform; or ask them whether it is true that the Eskimo language has 50 words for snow. What those three questions have in common is that they all touch upon big issues in linguistics, issues so big that we get a headache whenever we are reminded of them.

For the first question, asking about a linguist's linguistic talent touches upon the conviction of quite a few linguists that in order to practice linguistics, one does not need to study many languages. One language is usually enough; and even if that language is only English, this may also be sufficient (at least according to some fanatics who practice syntax). To put it in different words: knowing only one language does not prevent a linguist from making claims about the evolution of whole language families. Knowing how to describe a language, or how to compare several languages, does not necessarily require anyone to be able to speak them. After all, mathematicians also pride themselves on not being able to calculate.

The second question, regarding the German spelling reform, marks the last time that German linguists failed royally to demonstrate the importance of their studies to the broader public. The problem was that the German spelling reform, the first after some 100 years of linguistic peace, was mostly carried out without any linguistic input. Those who commented on it were, instead, novelists, poets and journalists, usually somewhat older, who felt that the reform had been proposed mainly in order to annoy them personally. At the same time, and this was maybe no coincidence, more and more institutes for comparative linguistics disappeared from German universities. The reason was again that the field had not succeeded in explaining its importance to the public. However, historical language comparison can, indeed, be important when discussing the reform of a writing system that is used by millions of people, specifically because the investigation of historically evolving linguistic systems is one of the specialties of historical-comparative linguistics. At the time, this was completely ignored.

The last question concerns the almost ancient debate about the hypothesis commonly attributed to Edward Sapir (1884-1939) and Benjamin Lee Whorf (1897-1941). In its strong form (Whorf 1950), this hypothesis says that speaking influences thinking to such an extent that we might, for example, develop a different kind of Relativity Theory in physics if we started to practice our science in languages other than English, French, and German. Given that Eskimo languages are said to have some 50 different words for snow (as people keep repeating), it should be clear enough that those speaking an Eskimo language must think completely differently from those who are beginning to forget what snow even is.

The latter concept leads to an interesting use of networks, which I will discuss here.

Words versus deeds

The hypothesis by Sapir and Whorf annoys many linguists (including myself), because it has long since been disproved, at least in its strong, naive form. It was disproved by linguistic data, not by arguments; and the data were the very data that Whorf had used to prove his point in the first place. However, although there is little evidence for the hypothesis in its strong form, people keep repeating it, especially in non-linguistic circles, where it is often instrumentalized.

Whether we can find evidence for a weak form of the hypothesis, which would say that we can find some influence of speaking on thinking, is another question, and one that is difficult to answer. It may well be that our thoughts are channeled to some degree by the material we use to express them. When color shades such as light blue and dark blue are distinguished by distinct words, such as goluboj and sin'ij in Russian or celeste and azul in Spanish, it may be that speakers develop different thoughts when somebody talks about blue cheese, which is called dark blue cheese in Spanish (queso azul).

But this does not mean that somebody who speaks English would never know that there is some difference between light and dark blue, just because the language does not primarily make the distinction between the two color tones. It is possible that the stricter distinction in Russian and Spanish triggers an increased attention among speakers, but we do not know how large the underlying effect is in the end, and how many people would be affected by it.

Particular languages are thus neither a template nor a mirror of human thinking — they do not necessarily channel our thoughts, and may only provide small hints as to how we perceive things around us. For example, if a language expresses different concepts, such as "arm" and "hand" with the same word, this may be a hint that "arm" and "hand" are not that different from each other, or that they belong together functionally in some sense, which is why we may perceive them as a unit. This is the case in Russian, where we find only one expression ruka for both concepts. In daily conversations, this works pretty well, and there are rarely any situations where Russian speakers would not understand each other due to ambiguities, since most of the time the context in which people speak disambiguates all they want to express well enough.

Colexification network with the central concept "MIND" and the geographical distribution of languages colexifying "MIND" and "BRAIN"

These colexifications, as we now call the phenomenon (François 2008), occur frequently in the languages of the world. This is due to the polysemy of many of the words we use: no single word denotes one concept alone; words often denote several similar concepts at the same time. On the other hand, we also encounter identical word forms in the same language that express completely different things, resulting from coincidental processes by which originally different pronunciations came to sound alike (called convergence in biology). Those colexifications that are not coincidental but result from polysemy are the most interesting ones for linguists, not least because the words are related by networks, not trees (as shown above). When assembled in large enough numbers, across a sufficiently large sample of languages, they may allow us some interesting insights into human cognition.

The procedure to mine these insights from cross-linguistic data has already been discussed in a previous blog post, from 2018. The main idea is to collect colexifications for as many concepts and languages as possible, in order to construct a colexification network, in which each concept is represented by a node, and weighted links between the nodes represent how often each colexification between the linked concepts occurs; that is, they represent how often we find a language that expresses the two linked concepts with the same word.
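To make the construction concrete, here is a minimal sketch in Python. It is not the CLICS workflow itself: the function name, the data layout (one word form per concept and language), and the two "ToyLang" entries are assumptions invented for illustration. Only the Russian ruka example is taken from the text above.

```python
from collections import defaultdict
from itertools import combinations


def build_colexification_network(wordlists):
    """Count, for each pair of concepts, how many languages colexify them.

    `wordlists` maps a language name to a dict {concept: word form}.
    This layout is a simplifying assumption: real data can list several
    forms per concept, and coincidental homophones would need filtering.
    Returns a dict {(concept_a, concept_b): number of languages}.
    """
    weights = defaultdict(int)
    for lexicon in wordlists.values():
        # Group the concepts of one language by the form that expresses them.
        concepts_by_form = defaultdict(set)
        for concept, form in lexicon.items():
            concepts_by_form[form].add(concept)
        # Every pair of concepts sharing a form is one colexification;
        # sorting gives each undirected link a single canonical key.
        for concepts in concepts_by_form.values():
            for a, b in combinations(sorted(concepts), 2):
                weights[(a, b)] += 1
    return dict(weights)


# Toy data: "ToyLang1" and "ToyLang2" and their forms are made up.
TOY_WORDLISTS = {
    "Russian": {"ARM": "ruka", "HAND": "ruka"},
    "English": {"ARM": "arm", "HAND": "hand"},
    "ToyLang1": {"ARM": "ta", "HAND": "ta", "MIND": "ko", "BRAIN": "ko"},
    "ToyLang2": {"ARM": "a", "HAND": "ha", "MIND": "mi", "BRAIN": "mi"},
}

network = build_colexification_network(TOY_WORDLISTS)
for (a, b), weight in sorted(network.items()):
    print(f"{a} -- {b}: colexified in {weight} language(s)")
```

The nodes are implicit here: they are simply the concepts that occur in at least one link. For larger samples, the same dictionary could be loaded into a graph library in order to draw subgraphs around a central concept.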

Having proposed a first update of our Database of Cross-Linguistic Colexifications (CLICS) back in 2018, we have now been able to further increase the data. With this third installment of the database, we could double the number of language varieties, from 1,200 to 2,400. In addition, we could enhance the workflows that we use to aggregate data from different sources, in a rigorously reproducible way (Rzymski et al. 2020).

Current work

Even more interesting than these data, however, is a study initiated by colleagues in psychology at the University of North Carolina, which was recently published after more than two years of intensive collaboration (Jackson et al. 2019). In this study, the colexifications for emotion concepts, such as "love", "pity", "surprise", and "fear", were assembled, and the resulting networks were statistically compared across different language families. The surprising result was that the structures of the networks differed quite considerably from each other (an effect that we could not find for color concepts derived from the same data). Some language families, for example, tend to colexify "surprise" and "fear (fright)" (see our subgraph for "surprised"), while others colexify "love" and "pity" (see the subgraph for "pity").

Not all aspects of the network structures were different. An additional analysis involving informants showed that the criterion of valence in particular (that is, whether something is perceived as negative or positive) played an important role for the structure of the networks; and similar effects could be found for the degree of arousal.

These results show that the way in which we express emotion concepts in our languages is, on the one hand, strongly influenced by cultural factors, while on the other hand there are some cognitive aspects that seem to be reflected similarly across all languages.

What we cannot conclude from the results, however, is that those who speak languages in which "pity" and "love" are represented by the same word will not know the difference between the two emotions. Here again, it is important to emphasize what I mentioned above with respect to color terms: if a particular distinction is not present in a given language, this does not mean that the speakers do not know the difference.

It may be tempting to dig out the old hypothesis of Sapir and Whorf in the context of the study on emotions; but the results do not, by any means, provide evidence that our thinking is directly shaped and restricted by the languages we speak. Many factors influence how we think. Language is one aspect among many others. Instead of focusing too much on the question as to which languages we speak, we may want to focus on how we speak the language in which we want to express our thoughts.


François, Alexandre (2008): Semantic maps and the typology of colexification: intertwining polysemous networks across languages. In: Vanhove, Martine (ed.): From polysemy to semantic change. Amsterdam: Benjamins, pp. 163-215.

Jackson, Joshua Conrad, Joseph Watts, Teague R. Henry, Johann-Mattis List, Peter J. Mucha, Robert Forkel, Simon J. Greenhill and Kristen Lindquist (2019): Emotion semantics show both cultural variation and universal structure. Science 366.6472: 1517-1522. DOI: 10.1126/science.aaw8160

Rzymski, Christoph, Tiago Tresoldi, Simon Greenhill, Mei-Shin Wu, Nathanael E. Schweikhard, Maria Koptjevskaja-Tamm, Volker Gast, Timotheus A. Bodt, Abbie Hantgan, Gereon A. Kaiping, Sophie Chang, Yunfan Lai, Natalia Morozova, Heini Arjava, Nataliia Hübler, Ezequiel Koile, Steve Pepper, Mariann Proos, Briana Van Epps, Ingrid Blanco, Carolin Hundt, Sergei Monakhov, Kristina Pianykh, Sallona Ramesh, Russell D. Gray, Robert Forkel and Johann-Mattis List (2020): The Database of Cross-Linguistic Colexifications, reproducible analysis of cross-linguistic polysemies. Scientific Data 7.13: 1-12. DOI: 10.1038/s41597-019-0341-x

Whorf, Benjamin Lee (1950): An American Indian Model of the Universe. International Journal of American Linguistics 16.2: 67-72.

Weekend reads: Texas A&M vs. Harvard; scientific publishers a “threatened species”; six researchers with “greed and a disregard” for rules

Before we present this week’s Weekend Reads, a question: Do you enjoy our weekly roundup? If so, we could really use your help. Would you consider a tax-deductible donation to support Weekend Reads, and our daily work? Thanks in advance. The week at Retraction Watch featured: A researcher starting 2020 off with a forthright retraction; A …

For Hottest Planet, a Major Meltdown, Study Shows

In the scorching atmosphere of exoplanet KELT-9b, even molecules are torn to shreds.

Responding to the novel coronavirus (2019-nCoV) emerging in Wuhan, China

Map of China highlighting Wuhan City where a novel coronavirus has emerged

By Scott J. Becker, executive director, APHL

As news spreads of the 2019 novel coronavirus (2019-nCoV) emerging in Wuhan, China, we at APHL are taking this threat seriously while also remaining calm and confident that our public health system is prepared. APHL has activated our incident command structure (ICS) to support our members and partners during the response.

Although this is a new respiratory virus strain, there is a familiarity that is reassuring to many of us in public health but can be unsettling to others. This new outbreak resembles SARS, MERS, H5N1 bird flu and other emerging respiratory diseases from the past. However, the illness does not appear to be as severe as those caused by the previous viruses, although our understanding of 2019-nCoV is still developing.

While there is a lot we don’t know about 2019-nCoV, this is what we do know about the outbreak response to prevent its spread:

  • As the first 2019-nCoV patient was identified in the United States, our public health system worked. Efforts to disseminate information to the public and to health care providers led to the patient self-identifying and allowed his providers to quickly initiate screening, isolation and eventual diagnosis. The specimen was immediately sent to CDC for rapid testing and results were promptly reported.
  • Public health laboratories are ready to process and ship specimens to CDC, whose laboratory is currently the only one able to perform diagnostic testing in the US. CDC is working hard to develop and qualify a test that public health laboratories can use. Performing testing close to where the patient is being treated is ideal, but developing an effective test requires strong science, and that takes time. We expect this new test to be ready for public health lab use in the coming weeks. CDC is already working closely with FDA to get an emergency use authorization (EUA) to deploy the test across the country in the event a US public health emergency is declared. (An EUA cannot be given until the US Secretary of Health and Human Services declares a public health emergency.)
  • For all of the critical players in our public health system – public health laboratory scientists, epidemiologists, CDC, FDA, health care providers and others – this is all in a day’s work. Frequent preparedness training and routine outbreak responses ensure that when a new disease emerges, the public health system is ready.

An outbreak of a new virus like 2019-nCoV can sometimes stir up panic and fear. We understand why some feel that way, but we are also confident that the public health system is working to stop this virus just as it has done with many others. We hope that our confidence in their expertise and abilities is reassuring for you. It is not time to panic – it is time to wash those hands, catch your coughs and continue to be vigilant during this cold and flu season.

We will continue to update this post with more information as it becomes available.


The post Responding to the novel coronavirus (2019-nCoV) emerging in Wuhan, China appeared first on APHL Lab Blog.

December 2019 RefSeq annotations: human, Tasmanian devil and more

In December, the NCBI Eukaryotic Genome Annotation Pipeline released new annotations in RefSeq for the following organisms: Anarrhichthys ocellatus (wolf-eel) Apis florea (little honeybee) Contarinia nasturtii (swede midge) Cucumis sativus (cucumber) Galleria mellonella (greater wax moth) Homo sapiens (human) Nasonia …