Inclusion of “personal correspondence” in evolution paper prompts retraction, new journal policy

Hearsay is not admissible as evidence in court — and it doesn’t seem to go very far in science, either. A pair of researchers in the field of human evolution have lost a paper which contained data from “personal correspondence” that the providing party apparently did not enjoy seeing in print. The article, “Early hominin […]

The post Inclusion of “personal correspondence” in evolution paper prompts retraction, new journal policy appeared first on Retraction Watch.

On stemmatics and phylogenetic methods

No se publica un libro sin alguna divergencia entre cada uno de los ejemplares. Los escribas prestan juramento secreto de omitir, de interpolar, de variar. [No book is published without some divergence between each of the copies. Scribes take a secret oath to omit, to interpolate, to change.] (Jorge Luis Borges, La lotería en Babilonia, in Ficciones, 1962)
This is the first in a series of posts on stemmatics, a field just as much in love with trees and networks as are phylogenetics and historical linguistics. As this is an introduction, I explain what the field does, present the most important jargon, and offer a list of references that, while suitable for the audience of this blog, is denser than what one might expect for a blog post.

Thank you to Mattis and David for inviting me to write!

Textual criticism

Textual criticism (or, less precisely, "philology") is a discipline concerned with the investigation of the history of literary, legal, and religious texts for explaining how differences among the copies of a text (its "witnesses") arose, and with the production of "critical editions", either scholarly curated versions of a text that aim to reconstruct the lost original or corrected versions of an existing copy.

The problem of divergence between copies of text, with the accumulation of involuntary and deliberate errors, as well as the need for a systematic study of such differences, is as old as writing itself. For example, our current editions for the epic poems of Homer descend from Ancient philological attempts to restore an uncontaminated original (see the first two figures). These include the edition of Pisistratus (VI century BCE, which determined what was to be sung at the Panathenaic Games), and the so-called VMK (Viermännerkommentar, "commentary of the four men") of the Alexandrian School (I-II century BCE), which is generally assumed to be the root of the witnesses that we have.

Van der Valk's reconstruction of the sources for Venetus A, one of the most
important manuscripts of Homer's Iliad (source: Wikipedia).

Erbse's reconstruction of the sources for Venetus A, one of the most important
manuscripts of the Iliad (source: Wikipedia).

Before stemmatics, an edition could be based on a "good copy" (a version considered to be less contaminated or more faithful than others), on a "majority reading" (in which the most attested variant would be chosen), or on a principle of "eclecticism" (with each best reading individually selected by the editor's judgment). Each new version, as expected, contributed even more to the confusion, particularly when changes were voluntary.

Among the texts with long and complex traditions, objects of countless and sometimes bloody disputations on the "correct" readings, are the Bible and codes of laws, for which it was not uncommon to have a different version in each city, with predictable consequences. For example, the first published textual tree, as already covered in this blog (The first Darwinian evolutionary tree), was authored by Carl Johan Schlyter in 1827 in a study precisely on the multiple and conflicting copies of Swedish law.

As such, it is no surprise that objective approaches were soon developed (Homer's VMK edition being one of the first examples), culminating with the development of stemmatics, with its study of the genealogical relationship between witnesses, and its representation of such relationships by means of trees.

Stemmatics

As a scientific approach to textual criticism, stemmatics established itself from the beginning of the 19th century as an alternative to emendations based on the opinions and wishes of editors, possibly inspiring both Charles Darwin and August Schleicher (for a general discussion on the development and significance of this method, see Timpanaro 2005). However, more than a "source", we should consider it a branch equally stemming from the "cultural framework" (Macé and Baret 2006: 91) that also gave us Darwinism and historical linguistics.

As was true for these latter disciplines, stemmatics was at first opposed, because of the revolution it brought to its field, along with its genealogical trees. However, just as in these sister disciplines, the results of the new mindset introduced by the explanation of evolution with trees could not be ignored, and this approach is so central to textual criticism that the latter can be divided into periods before and after the work of Karl Lachmann, the "father" of stemmatics, in particular the publication of his edition of Lucretius' De rerum natura (1850). In his commentaries, besides demonstrating the number of lines per page in the lost manuscript at the root of the tradition, Lachmann was even able to demonstrate the kind of script used to write it (Lachmann 1850).

The work he chose, given the importance of Lucretius in the development of the scientific mindset (and, as we should remember when dealing with cultural evolution, of Darwin's theories), is unlikely to have been chosen by chance, but this is a matter for a different blog post.

Trees

Genealogical trees are so central to the stemmatic method that the field itself is actually named after them. The main goal of an editor is to produce a stemma codicum ("family tree of manuscripts"), or simply stemma, a tree-like structure that supports the textual emendation and represents the "tradition" (the witnesses' genealogy), in analogy with the family trees of Roman families that figured in many texts reviewed by 19th century philologists. Stemma, in fact, is a Greek word meaning garland or wreath, that was incorporated in Imperial Latin to designate a family tree (and, figuratively, nobility itself), as family trees were drawn with a stemma at their top.

In short, stemmatics begins with a recensio, which is an investigation of all total and partial copies of a work. This review is followed by a collatio, a systematic scrutiny of the manuscripts' contents, when readings are aligned and compared. The results of this alignment are used to produce the stemma, following the principle that "community of errors implies community of origin". By analyzing the stemma and the errors, editors finally proceed to the emendatio, which is a reconstruction that explains the known variants, and is intended to represent the "archetype" (a lost witness at the root of the ramification, assumed to be closer to the original than any other copy).
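The principle that "community of errors implies community of origin" can be illustrated with a minimal sketch. The collation table below is entirely hypothetical (the sigla, the locus numbers, and the threshold `min_shared` are assumptions made for illustration), and the sketch ignores the distinction between monogenetic and polygenetic errors that a real editor would have to make.

```python
from itertools import combinations

# Hypothetical result of a collatio: witness siglum -> set of loci
# (numbered passages) where that witness carries an error.
collation = {
    "A": {1, 4},
    "B": {1, 4, 7},
    "C": {2, 5},
    "D": {2, 5, 9},
    "K": {3},
}

def shared_errors(collation):
    """Return {(w1, w2): errors common to both witnesses} for every pair."""
    return {
        (w1, w2): collation[w1] & collation[w2]
        for w1, w2 in combinations(sorted(collation), 2)
    }

def candidate_families(collation, min_shared=2):
    """Pairs of witnesses that share at least `min_shared` errors."""
    return sorted(
        pair
        for pair, errors in shared_errors(collation).items()
        if len(errors) >= min_shared
    )

families = candidate_families(collation)
print(families)
```

With these made-up data, A and B group together, as do C and D, while K shares no errors with any other witness and so stands alone.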

A stemma is conventionally drawn top-to-bottom, with vertical placements roughly indicating the date of the manuscript (the higher, the older). Solid edges ("arrows") indicate descent, while dashed ones imply contamination (scribes using more than one source). Witnesses are usually labeled with abbreviated names or Latin letters, when the manuscript is available, or with Greek letters, when it is missing (with α usually reserved for the archetype and ω for the original). Below is a reproduction of Petrocchi's partial stemma for the tradition of Dante Alighieri's Divine Comedy, which I will cover in a future post. Note that the genealogy is actually a reticulating network rather than a simple tree.

Petrocchi's partial stemma for the Divine Comedy, presented in the
introduction to his critical edition (1965).

The example stemma offered by Maas (1958), adapted below, is still useful to demonstrate the principles of stemmatics. In this example, for a textual emendation manuscript H should be eliminated (as it descends from F), as well as I and J (copies of G). Manuscript C shows a contamination from its collateral D, something which should be considered when weighing errors. Sub-archetypes β and γ are to be inferred from the available witnesses of their branches, and their readings will have the same weight as K, the only member of the third family branching from the archetype (even though it is a recent manuscript), in establishing the "lesson" of α. Errors might be presumed in α itself, or even in the original ω, and in both cases a corrected "lesson" might be offered by the editor on the basis of internal and external evidence.

Exemplary stemma adapted from Maas (1958).
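The two steps described for this stemma, eliminating descripti and giving each branch below the archetype an equal vote, can be sketched as follows. This is only an illustration under stated assumptions: the witness list contains just the sigla named in the text (the full stemma is in the figure), the readings at the sample locus are invented, and a simple majority rule stands in for the editor's judgment.

```python
from collections import Counter

# Only the witnesses named in the text; the figure contains the full stemma.
witnesses = ["C", "D", "F", "G", "H", "I", "J", "K"]

# Descripti and the preserved witnesses they descend from, as stated in the text.
descripti = {"H": "F", "I": "G", "J": "G"}

# Eliminate descripti: they add no independent evidence.
retained = [w for w in witnesses if w not in descripti]

def archetype_reading(branch_readings):
    """Return the reading supported by a majority of independent branches,
    or None when no reading has a majority (the editor must then decide)."""
    counts = Counter(branch_readings.values())
    reading, votes = counts.most_common(1)[0]
    return reading if votes > len(branch_readings) / 2 else None

# Hypothetical readings at one locus for the three branches below the archetype.
locus = {"beta": "amor", "gamma": "amor", "K": "amore"}

print(retained)
print(archetype_reading(locus))
```

Note that K counts as much as each sub-archetype, regardless of how many witnesses stand behind β and γ.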

Adoption and practice

Stemmatics has been criticized and confronted since Lachmann's time. It requires very specialized knowledge, for example in distinguishing between monogenetic and polygenetic errors, i.e. those that arose once and those that emerged independently more than once (and that, as such, are not disjunctive). A number of its suppositions are routinely called into question, such as the idea that each copy always derives from a single source (accepting contamination, at most), that each copy has at least as many errors as its source, and, fundamentally, that traditions have one and only one archetype.

Many measures tend to be adopted to reduce the editorial effort. These include eliminating manuscripts considered to be descripti (i.e. proved to descend from a preserved witness, in theory sharing all the errors of their sources), and only performing the collatio on a set of critical passages (loci critici). While a complete stemma and a full collatio are desirable, such compromises might be unavoidable for long texts with ample traditions. For example, in the case of Dante Alighieri's Divine Comedy, after considering the time employed by scholars such as Petrocchi, Sanguineti, and Shaw for their editions, Trovato (2016) estimated that a full stemmatic approach would take 400 man-years.

An alternative to stemmatic methods and suppositions, which also reduces the editorial effort, is found in scholars who follow the work of Joseph Bédier, who successfully challenged the limits of stemmatics by adopting a renewed version of the method of the "good copy" for his editions of medieval texts. The Bédierian method does not reject a scientific approach or methods such as the recensio, the collatio, or even the production of a stemma, but these are used to support the editor's judgment in selecting and curating a bon manuscrit, that is, a good copy of the text to be corrected only where errors can be proved beyond reasonable doubt. In short, trees (and networks) have been central to textual criticism even when stemmatics itself, as a method, is being challenged.

Considering the editorial effort and the analogies with linguistics and biology, it is no surprise that digital workflows have been proposed, along with the development of computer resources and phylogenetic methods. Ideas for new approaches were explored by Froger (1969), and formal phylogenetic methods were attempted by Platnick and Cameron (1977). Recently, the number of editions supported by formal phylogenetic methods and software has increased (see, for example, Barbrook et al. 1998; Stolz 2003; and Lantin, Baret and Macé 2004), also in the face of scientific evaluations of performance (Roos and Heikkilä 2009).

Besides advances in speed and replicability, the new technologies are allowing us to expand the goals of the discipline, moving from electronic editing to computational philology. In fact, while the field has for centuries been defined by the production of critical editions, digital approaches have been shown to support a reduction in the importance of "authorial intention", allowing researchers to focus on the reception of texts by the public, in line with developments of literary theory (Jauss 1982), and with the goals established by the "New Philology" (Cerquiglini 1989). Manuscripts with readings that differ from a supposed original, traditionally described as "corrupted", are changing from copies that were meant to be discarded into data points that contribute to an investigation of human history that is assisted by quantitative data and methods.

References

Barbrook A.C., Howe C.J., Blake N., Robinson P. (1998) The phylogeny of the Canterbury Tales. Nature 394 (6696): 839.

Cerquiglini B. (1989) Éloge de la variante: histoire critique de la philologie. Aux Travaux. Paris: Éditions du Seuil.

Froger J. (1969) La critique des textes et son automatization. Bulletin De L’Association Guillaume Budé 1(1): 125–129.

Jauss H.-R. (1982) Toward an Aesthetic of Reception. Minneapolis: University of Minnesota Press.

Lachmann C. (1850) De Rerum Natura. Commentarius. Berolini: Impensis Georgii Reimeri.

Lantin A.-C., Baret P.V., Macé C. (2004) Phylogenetic analysis of Gregory of Nazianzus’ Homily 27. 7èmes Journées Internationales d’Analyse statistique des Données Textuelles, pp. 700-707.

Maas P. (1958) Textual Criticism. Translated by Barbara Flower. Oxford: Oxford University Press.

Macé C., Baret P.V. (2006) Why phylogenetic methods work: the theory of evolution and textual criticism. Linguistica Computazionale. The Evolution of Texts: Confronting Stemmatological and Genetical Methods 24: 89–108.

Platnick N.I., Cameron H.D. (1977) Cladistic methods in textual, linguistic, and phylogenetic analysis. Systematic Zoology 26: 380–385.

Roos T., Heikkilä T. (2009) Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets. Literary and Linguistic Computing fqp002.

Stolz M. (2003) New philology and new phylogeny: aspects of a critical electronic edition of Wolfram’s Parzival. Literary and Linguistic Computing 18(2): 139–150.

Timpanaro S. (2005) The Genesis of Lachmann's Method. Translated and edited by G. W. Most. Chicago: University of Chicago Press.

Trovato P. (2016) Metodologia editoriale per la Commedia di Dante Alighieri. Ferrara. See Youtube; date of access: March 19, 2017.

Multimedia phylogeny?


Evolutionary concepts have often been transferred to other fields of study, or derived independently in them, especially in anthropology in the broadest sense, covering all cultural products of the human mind. This includes phylogenetic studies of languages, texts, tales, artifacts, and so on — you will find many examples of such studies in this blog. One of the more recent applications has been to what is sometimes called multimedia phylogeny — the research field that "studies the problem of discovering phylogenetic dependencies in digital media".

I have noted before that phylogenetics in the biological sense is an analogy when applied to other fields, because only in biology is genetic information physically transferred between generations — in the other fields, cultural information transfer is all in the minds of the people, not in their genes (see False analogies between anthropology and biology). This analogy often becomes problematic when applied to other fields, because the practical application of bioinformatics techniques separates the informatics from the bio, and the mathematical analyses focus on trying to implement the informatics without any biological justification.


A recent paper that discusses the application of bioinformatics to multimedia phylogeny exemplifies the potential problems:
Guilherme D Marmerola, Marina A Oikawa, Zanoni Dias, Siome Goldenstein, Anderson Rocha (2017) On the reconstruction of text phylogeny trees: evaluation and analysis of textual relationships. PLoS One 11(12): e0167822.
The authors described their background information thus:
Articles on news portals and collaborative platforms (such as Wikipedia), source code, posts on social networks, and even scientific publications or literary works, are some examples in which textual content can be subject to changes in an evolutionary process. In this scenario, given a set of near-duplicate documents, it is worthwhile to find which one is the original and the history of changes that created the whole set. Such functionality would have immediate applications on news tracking services, detection of plagiarism, textual criticism, and copyright enforcement, for instance.
However, this is not an easy task, as textual features pointing to the documents' evolutionary direction may not be evident and are often dataset dependent. Moreover, side information, such as time stamps, are neither always available nor reliable. In this paper, we propose a framework for reliably reconstructing text phylogeny trees, and seamlessly exploring new approaches on a wide range of scenarios of text reusage. We employ and evaluate distinct combinations of dissimilarity measures and reconstruction strategies within the proposed framework.
So, their solution to the separation of bio from informatics is to try a range of techniques, none of which are based on any particular model of how phylogenetic changes might occur in text documents. All of these methods involve distance-based tree-building.

The essential problem, as I see it, is that without a model of change there is no reliable way to separate phylogenetic information from any other type of information. For example, similarity can arise from many sources, only some of which provide information about phylogenetic history; phylogenetic similarity is a form of "special similarity". In biology, other sources of similarity are usually lumped together as chance similarities, such as convergence, parallelism, etc. Without this basic separation of phylogenetic and chance similarity, it does not matter how many distance measures you use, or how many tree-building methods you employ: if you can't separate phylogeny from chance then you are wasting your time constructing a hypothetical evolutionary history.

The authors' only saving grace is their claim that: "In text phylogeny, unlike stemmatology [the analysis of hand-written rather than digital texts], the fundamental aim is to find the relationships among near-duplicate text documents through the analysis of their transformations over time." The expectation, then, is that the phylogenetic similarity of the texts will be high, which will thus reduce the possibility of chance similarities. Sadly, it will also reduce the probability that the similarities will contain any phylogenetic information at all — this is the classic short-branches-are-hard-to-reconstruct problem in phylogenetics.

For digital texts, the authors employ three distance measures: edit distance, normalized compression distance, and cosine similarity. None of these are model-based in any phylogenetic sense (although the first one is used in alignment programs such as Clustal) — I have discussed this in the post on Non-model distances in phylogenetics. Their tree-building methods include: parsimony, support vector machines (a machine-learning form of classification), and random forests (a decision-tree form of classification). Once again, none of these is model-based in terms of textual changes.
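To make the point about non-model distances concrete, here is a minimal sketch of the three measures, using only the Python standard library. It is not the implementation used in the paper: the choice of `zlib` as the compressor, whitespace tokenization for the cosine measure, and all function names are assumptions made for illustration.

```python
import math
import zlib
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def ncd(a, b):
    """Normalized compression distance, here with zlib as the compressor."""
    ca = len(zlib.compress(a.encode("utf-8")))
    cb = len(zlib.compress(b.encode("utf-8")))
    cab = len(zlib.compress((a + b).encode("utf-8")))
    return (cab - min(ca, cb)) / max(ca, cb)

def cosine_similarity(a, b):
    """Cosine of the angle between word-count vectors of two texts."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

print(edit_distance("kitten", "sitting"))
```

None of these functions knows anything about how texts actually change when they are copied or edited. They measure overall resemblance only, which is the limitation discussed above.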

A final issue is the insistence on trees as the model of a phylogeny. In stemmatology, for example, a network is a more obvious phylogenetic model, because hand-written texts can be copied from multiple sources. Indeed, this distinction plays an important role in the first application of phylogenetics to stemmatology (see the post on An outline history of phylogenetic trees and networks). Perhaps this is not an issue for "near-duplicate text documents", but it does seem like an unnecessary restriction. Moreover, one of the empirical examples used in the paper actually has a network history, which therefore does not match the authors' reconstructed tree.

The Genome Cellar is no such thing


In an earlier blog post, I noted that The Music Genome Project is no such thing. The use of the word "genome" in this context is an analogy, in which the musical characteristics are seen as producing a sort of genetic fingerprint. However, this is a false analogy, because the data used for the Music Genome Project are actually phenotypic, not genotypic. Indeed, music has no analog of a genotype.

In a similar vein, the data used for The Genome Cellar are phenotypic, not genotypic, and so this is also a false analogy.


The Genome Cellar is the database used by the Next Glass app. This app was released in November 2014, and a concurrent press release explained the concept:
Next Glass is the breakthrough app that uses science and machine learning software to provide accurate, personalized recommendations to consumers. Next Glass has analyzed tens of thousands of bottles of wine and beer with a mass spectrometer and stores the "DNA" of each product in its Genome Cellar™, which combines with users' Taste Profiles™ to provide product-specific recommendations.
So, the beer / wine data in the Genome Cellar are peaks in a mass spectrometer output. This is made clear in another press release:
Next Glass has developed the world’s first Genome Cellar, an extensive database that contains the chemical makeup – or "DNA" – of tens of thousands of wines and beers. By looking at each bottle on a molecular level, Next Glass defines a unique taste profile for every bottle by analyzing thousands of chemical elements.
This procedure will, indeed, provide a unique fingerprint for each alcoholic product, but it will be a phenotypic one not a genotypic one. Genetics is often chemistry but not all chemistry is genetics.

The idea of the Next Glass app is the same as that for the Music Genome Project — to use the fingerprint of currently liked products (music or wines / beers) to make recommendations for other products that might appeal to the customer. This approach can be expected to work for alcoholic beverages, because the subjective preferences will be based to some extent on the sensory components of the chemical makeup. If you document enough of the chemistry then you are bound to include a large proportion of the sensory part.

Anyway, you can see a short video about the laboratory here.

Finally, you might like to compare this approach with that of WineFriend, which tries to assess your taste in wine with multiple-choice questions, instead of complex chemistry. WineFriend:
uses a simple eight question taste survey that gives insights into a customer's thresholds for sweet, sour, bitterness and intensity of flavour. It then creates a profile which enables it to select wines that are tailored to the individual customer's tastes.
No mention of genomes here.

Changes in Playboy’s women through 60 years


It has long been known that ideas about female attractiveness, and concern with body weight among young women, are closely related to exposure to mass media images (see the review by Spettigue & Henderson 2004). The print media are particularly involved in this issue, not least the so-called "men's magazines", such as Playboy. It therefore created a great deal of media interest when it was announced in October 2015 that Playboy would no longer feature nude centerfolds (known as Playmates).

Indeed, Playboy has often been claimed as a purveyor of the US society's image of the "ideal woman", although this is surely media exaggeration. Playboy, whether we love it or hate it, has simply portrayed females that the editors thought would sell magazines at the time. Nevertheless, the magazine's choice of models has been used in the professional medical and psychological literature as representative of a prevalent cultural idealization of an ultra-slender female body shape (eg. Garner et al. 1980; Wiseman et al. 1992; Szabo 1996; Spitzer et al. 1999; Katzmarzyk & Davis 2001; Pettijohn & Jungeberg 2004).

It therefore comes as no surprise that the magazine's database of model statistics was subjected to scrutiny in the online media after the 2015 announcement, particularly with regard to how things had changed during the magazine's 62 years. Sadly, some of this analysis was quite poor (eg. Playboy's image of the ideal woman sure has changed). Here, I try to correct this by presenting a more thorough study of the available data.


The data I have used cover all of the Playmates of the Month that have appeared in the US edition of the magazine since its inception. The data are contained in a searchable version of the pmstats.txt file that has been maintained by Jim Dean, Johnny Corvin and Doug Ewell, as currently available on Peggy Wilkins' website. This file is an updated compilation of the so-called "vital statistics" of the Playmates from December 1953 to February 2016, inclusive, as reported in Playboy, sometimes supplemented from other available sources.

Note, especially, that the data are basically self-reported by the Playmates. Some of the information has been questioned at various times, notably where it seems to contradict the associated photographic evidence. As a reputable scientist, I should probably have personally checked all of this evidence, but I have not done so (you can do so yourself, based on whatever photos you can find on the internet). I have simply assumed that, at a minimum, the information presents whatever the Playmates thought was a desirable public image at the time of publication.

There are 753 records in the dataset, separately including twins and triplets appearing in the same magazine issue, as well as multiple appearances by the same woman in different issues. The data include: magazine issue month; Playmate name, birth date and birth location; height in inches and weight in pounds; breast, waist and hip dimensions in inches; and photographer name. From this information, for each Playmate I calculated their age at the time of publication, along with standard measurements for determining whether a body is healthy or not: Body Mass Index (BMI), for body size (ie. underweight, normal weight, overweight, obese), and Waist to Hip Ratio (WHR), for body curvaceousness.
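For readers who want to repeat the calculations, the derived measurements can be sketched as follows. The function names are my own, and the age calculation assumes that each issue is dated to the first day of its month, since the dataset records only the issue month.

```python
from datetime import date

LB_TO_KG = 0.45359237  # kilograms per pound
IN_TO_M = 0.0254       # metres per inch

def bmi(height_in, weight_lb):
    """Body Mass Index in kg/m^2, from height in inches and weight in pounds."""
    height_m = height_in * IN_TO_M
    return (weight_lb * LB_TO_KG) / (height_m ** 2)

def whr(waist_in, hip_in):
    """Waist to Hip Ratio (dimensionless)."""
    return waist_in / hip_in

def age_at_publication(birth, issue):
    """Age in years at the issue date (assumed to be the 1st of the month)."""
    return (issue - birth).days / 365.25

# Illustrative values only, not a record from the dataset.
print(round(bmi(66, 115), 2))
print(round(whr(24, 36), 3))
print(round(age_at_publication(date(1990, 1, 1), date(2012, 1, 1)), 1))
```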

Analysis

As is usual in this blog, the data can be summarized using a phylogenetic network as a form of exploratory data analysis (see How to interpret splits graphs).

I first range-standardized the data (so that all of the measurements are compared on the same scale), and log-transformed the BMI and WHR measurements (because otherwise these ratios will have non-linear relationships to the other variables). I then used the manhattan distance to calculate the similarity of the different publication years and birth locations, based on the Playmates' body dimensions. This was followed by a neighbor-net analysis to display the between-year and the between-location similarities as two phylogenetic networks.
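The preprocessing and distance steps can be sketched as below. This is an illustration under stated assumptions: the log transform is applied before the range standardization (so that no zero is ever logged), the table holds invented yearly averages rather than the real data, and the neighbor-net step itself is not reproduced, because it requires dedicated network software.

```python
import math

def range_standardize(values):
    """Rescale values linearly so that the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def manhattan(u, v):
    """Manhattan (city-block) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def distance_matrix(table, log_columns=()):
    """Pairwise Manhattan distances between rows of `table`, after
    log-transforming the columns listed in `log_columns` and then
    range-standardizing every column."""
    labels = sorted(table)
    n_cols = len(next(iter(table.values())))
    columns = []
    for c in range(n_cols):
        col = [table[k][c] for k in labels]
        if c in log_columns:
            col = [math.log(x) for x in col]
        columns.append(range_standardize(col))
    rows = {k: [columns[c][i] for c in range(n_cols)]
            for i, k in enumerate(labels)}
    return {(a, b): manhattan(rows[a], rows[b]) for a in labels for b in labels}

# Hypothetical yearly averages: (height in inches, BMI).
table = {1960: (65.0, 19.0), 1990: (67.0, 18.0), 2010: (68.0, 18.5)}
d = distance_matrix(table, log_columns=(1,))
print(round(d[(1960, 1990)], 3))
```

The resulting matrix of pairwise distances is the kind of input that a neighbor-net analysis takes.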

The network of relationships among the years is shown first. Years that are closely connected in the network are similar to each other based on the body dimensions of their Playmates, and those that are further apart are progressively more different from each other.


The network shows that there has been a strong and consistent change in Playmate age, size and shape through time. In the graph there is a simple gradient through time from top-right to bottom-left: the 1950s and 1960s are intermingled at the top, with the 1970s below them, the 1980s and 1990s below that, and the 2000s and 2010s intermingled at the bottom.

So, it will be worth looking at time graphs of the individual measurements. Let's start with age.


This does not show a particularly consistent trend, but the average age of the models does increase from 21 to 24 years from beginning to end of the time period.

The next graph shows that the reported height of the Playmates also increases across the 62 years, by 2.5" on average. There is almost no change in average weight across the decades (and so the graph is not shown).


However, far more notable is the relationship between height and weight, as expressed by the BMI, which is shown in the next graph. This does not show a linear trend at all, but a distinctly curved one. That is, the size of Playmates definitely changed through time, becoming thinner for the first 40 years, but then thickening up again for the next 20 years.


This trend has not been discussed in the professional literature, as far as I can determine, perhaps because previous assessments have been based only on a relatively short period of time, not the full 6 decades. Note that the bottom point of the curve occurs in c. 1997, and that by 2016 the BMI measurements had returned to the 1975 level (40 years earlier). I wonder whether they would return to the 1950s level in another 20 years?

More importantly, given that Playmates are to one degree or another reflecting a contemporary societal image of a desirable woman, we can note that 48% of these models are classified as being underweight. The lower limit of a healthy BMI is 18.5, as shown in the next graph, which also shows the boundaries between Mild thinness (17-18.5), Moderate thinness (16-17) and Severe thinness (<16).
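The classification used here can be written down as a short sketch. The thresholds are those given above; how the exact boundary values are assigned (for example, whether a BMI of exactly 17 counts as mild or moderate thinness) is an assumption, as are the function names and the sample values.

```python
def bmi_category(bmi):
    """Classify a BMI value using the thinness thresholds given in the text."""
    if bmi < 16:
        return "severe thinness"
    if bmi < 17:
        return "moderate thinness"
    if bmi < 18.5:
        return "mild thinness"
    return "not underweight"

def fraction_underweight(bmis):
    """Share of BMI values below the lower limit of a healthy BMI (18.5)."""
    return sum(1 for b in bmis if b < 18.5) / len(bmis)

# Illustrative values only, not records from the dataset.
sample = [15.8, 16.5, 17.9, 18.5, 19.4, 20.1]
print([bmi_category(b) for b in sample])
print(fraction_underweight(sample))
```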


Clearly, during the period 1975-1995 the vast majority of the models reported being underweight, while in the 1950s and 1960s very few of them did. This situation has improved recently, with roughly half being underweight during the past 20 years. Also, several of the reported body sizes are very unhealthy. However, perhaps the BMI values below 16 are unreliable, in the sense that such a person is not likely to be very photogenic.

We can now move on to the circumferences of the models. The next graph shows the time trend for the reported circumference at breast level. This shows the biggest and most consistent change of all, with a dramatic reduction in bustiness.


Indeed, chest sizes of >36" have hardly been reported since the start of 1990, and yet in the early years a buxom 36-24-36 figure was the most common claim by the Playmates. Interestingly, very few of the models have claimed a chest size of 33" (as opposed to 32" or 34"); is this some sort of superstition?

The other large and consistent change in circumference is for waist size, as shown in the next graph. This shows the opposite trend, with an increase in average reported size of 2" across the 60 years.


There was a slight but not consistent reduction in hip circumference through time (and so the graph is not shown). This means that the WHR, the measure of curvaceousness, changed greatly through time, as shown in the next graph. So, with the waists reportedly becoming larger, there was apparently a very large reduction in the curvaceousness of the models through time.


Note that the reduction in BMI was apparently achieved in spite of an increase in waist size — the BMI reduction seems to be related to the increase in average reported height without an increase in weight, and partly to the decrease in chest size.

When combined with the reduction in breast circumference, this means that the Playmates of the 21st century have been a very different shape from those of the mid 20th century. They were taller, with smaller breasts and larger waists, and thus had fewer curves.

We can end this discussion by considering where these Playmates were born. Most of them reported being born in the USA (83%). This means that we can consider how the various states compare in producing nude models. Obviously, more models are likely to come from the most populous states, and so we need to standardize the data by dividing by the population size of each state (as estimated for 2015 in Wikipedia), to yield the number of Playmates per million people in each state.


Apparently, Hawaii and California are more likely than the other states to produce models who are prepared to take their clothes off in public, while Delaware and Vermont have not yet done so, at least as far as Playboy is concerned. The apparently large value for Washington DC represents only 2 models from a relatively small population.

We can also consider whether the dimensions of the models vary in any consistent way between the states. This can be done with a phylogenetic network, as discussed above. In the following network, states that are closely connected are similar to each other based on the body dimensions of their Playmates, and those that are further apart are progressively more different from each other.


There appear to be no consistent patterns here.

So, we can finish by considering the countries from which the remaining 17% of the models originated. Once again, the data are standardized, to yield the number of Playmates per million people in each country (or province, for Canada). The apparently large value for Malta represents one set of twins from a relatively small population.


There have been a relatively large number of models from Scandinavia (Norway, Denmark and Sweden). This presumably represents the number of females whose body shape matches the image required by the Playboy editors, as much as the willingness of Scandinavians to disrobe publicly. However, it is notable that the rate of models from Norway is double those for Denmark and Sweden.

References

Garner DM, Garfinkel P, Schwartz D, Thompson M (1980) Cultural expectations of thinness in women. Psychological Reports 47: 484-491.

Katzmarzyk PT, Davis C (2001) Thinness and body shape of Playboy centerfolds from 1978 to 1998. International Journal of Obesity 25: 590-592.

Pettijohn TF, Jungeberg BJ (2004) Playboy Playmate curves: changes in facial and body feature preferences across social and economic conditions. Personality and Social Psychology Bulletin 30: 1186-1197.

Spettigue W, Henderson KA (2004) Eating disorders and the role of the media. Canadian Child and Adolescent Psychiatry Review 13: 16-19.

Spitzer BL, Henderson KA, Zivian MT (1999) Gender differences in population versus media body sizes: a comparison over four decades. Sex Roles 40: 545-565.

Szabo CP (1996) Playboy centrefolds and eating disorders - from male pleasure to female pathology. South African Medical Journal 86: 838-839.

Wiseman CV, Gray JJ, Mosimann JE, Ahrens AH (1992) Cultural expectations of thinness in women: an update. International Journal of Eating Disorders 11: 85-89.

The practical limits of networks?


Network techniques are becoming more widespread in biology and anthropology. However, the data in both of these disciplines can form very complicated patterns indeed, and there must be practical limits to what one can do with a network analysis. This post discusses an example that covers both disciplines, and which may well exceed those limits.

The data come from:
Pugach I, Matveev R, Spitsyn V, Makarov S, Novgorodov I, Osakovsky V, Stoneking M, Pakendorf B (2016) The complex admixture history and recent southern origins of Siberian populations. Molecular Biology and Evolution 33: 1777-1795.

The authors note:
Siberia is an extensive geographical region of North Asia stretching from the Ural Mountains in the west to the Pacific Ocean in the east, and from the Arctic Ocean in the north to the Kazakh and Mongolian steppes in the south. This vast territory is inhabited by a relatively small number of indigenous peoples, with most populations numbering only in the hundreds or few thousands. These indigenous peoples speak a variety of languages belonging to the Turkic, Tungusic, Mongolic, Uralic, Yeniseic, Chukotko-Kamchatkan, and Aleut-Yupik-Inuit families, as well as a few isolates. There is also variation in traditional subsistence patterns ... This linguistic and cultural diversity suggests potentially different origins and historical trajectories of the Siberian peoples.
Previous studies of the genetic history of Siberian populations were hampered by the extensive admixture that appears to have taken place among these populations, because commonly used methods assume a tree-like population history and at most single admixture events.
This suggests the use of network techniques, instead of tree-based ones. However, under the circumstances described here it may be unwise to try to produce a phylogenetic network. The situation, as described, does not resemble a "tree with reticulations" but more of an "anastomosing plexus". The latter may be more confusing than helpful, when visualized as a network.

So, the authors do not mention the word "network" nor even "reticulation". Instead:
Here we analyze geogenetic maps and use other approaches to distinguish the effects of shared ancestry from prehistoric migrations and contact, and develop a new method based on the covariance of ancestry components, to investigate the potentially complex admixture history. We furthermore adapt a previously devised method of admixture dating for use with multiple events of gene flow, and apply these methods to whole-genome genotype data [genome-wide SNPs] from over 500 individuals belonging to 20 different Siberian ethnolinguistic groups [plus 9 reference populations].
The results of these analyses indicate that there have been multiple layers of admixture detectable in most of the Siberian populations, with considerable differences in the admixture histories of individual populations.
The admixture (or introgression) patterns among the populations are illustrated using a map. Each bar represents a population, with the colors denoting the different ethnolinguistic groups. Note that every population shows admixture.


The reconstructed migration relationships among the populations are also illustrated using a map. This time, the colors of the arrows represent the different ethnolinguistic groups.


I would not like to have to represent these patterns using a network, and make that network comprehensible. So, this dataset may exceed the practical limits of networks.

Inheritance in cultural evolution


I recently reviewed a book anthology devoted to the application of phylogenetic methods in archaeology (see List 2016, PDF here). This book, entitled Cultural Phylogenetics: Concepts and Applications in Archaeology, edited by Larissa Mendoza Straffon (2016), assembles eight articles by scholars who discuss or illustrate the application of phylogenetic approaches in different fields of anthropology and archaeology.

The volume presents a rich collection of different approaches, covering various topics ranging from the evolution of skateboards (Prentiss et al.) to the spread of the potter's wheel (Knappett). The articles dealing with theoretical questions range from historical accounts of tree-thinking in biology and anthropology (Kressing and Krischel) to an overview of the impact of Darwinian thinking on archaeology and anthropology (Rivero). Although I missed a common thread when reading the eight articles of the volume, it is definitely worth a read for those interested in evolutionary approaches in a broader sense, as most articles explicitly reflect on differences and commonalities between biological and cultural evolution, providing concrete insights into the challenges that archaeologists face when trying to promulgate quantitative approaches.

It is clear that evolution in the general sense is much broader than merely evolution in biology, as I have often tried to illustrate in this blog when showing how phylogenetic approaches can be applied in linguistics. Provided that descent with modification also holds, in a broader sense, for cultural artifacts, it is natural to search for fruitful analogies between biological and cultural evolution, in order to profit from methodological transfer in disciplines like anthropology and archaeology. It is also clear, however, that certain analogies between biological evolution and evolution in other fields should be considered with great care. Even in linguistics, this is clearly evident, and I have pointed to this problem in the past (see Productive and unproductive analogies...). The goal cannot be to try to press biological methods into the anthropological template. Instead, we have to rigorously test our proposed analogies, and adapt the biological methods to our needs if necessary.

What surprised me when reading the book was that the majority of the articles did not really seem to care about the crucial differences between biological and cultural evolution, but rather tried to fit the feet and heels of cultural evolution into biology's shoes. Tree thinking dominated most of the articles (with Knappett as a notable exception), and the scholars tried hard to find a clear distinction between vertical and lateral inheritance in cultural evolution. While it is clear that this distinction is the basis for phylogenetic tree applications, where patterns that do not fit a tree are explained as instances of homoplasy or lateral transfer, it is by no means clear why one would go through all the pain to identify these patterns in cultural evolution.

Consider, as an example, the evolution of skateboards. At some point in the history of mankind (some late point!), people decided to put wheels on a board and to do artistic tricks with it. Later, other people merchandised this idea, and started to sell those boards with wheels. Later on, other companies jumped on the bandwagon and started to produce their own brands, thus instigating a fight for the "best" model for a certain kind of clientele. In all of these cases, ideas for design were clearly exchanged among groups of people, and further modified by specific needs or trends, until the current variety of skateboards arose. But which of these ideas were transferred vertically, and which ideas were transferred laterally? Can we identify processes of "speciation" in skateboard evolution, during which new brands were born?

In biology and linguistics we have the clear-cut criteria of interfertility and intelligibility. They cause us enough problems, given that we have ring species in biology and dialect chains in linguistics, but at least they give us some idea how to classify a given exemplar as belonging to a certain group. But what is the counterpart in the evolution of skateboards? Their brand? Their shape? Their users? The analogy simply does not hold. We have neither vertical nor lateral transfer in topics such as skateboard evolution. All we have is a before and an after: a complex network in which objects were constantly recreated and modified, be it based on ideas that were inspired by other objects or people, or independently developed. It seems completely senseless to search for a distinction between vertical and lateral patterns here, as it is not even clear to what degree we are actually dealing with descent with modification.

It seems to me that the problem of inheritance needs to be addressed in cultural evolution before any further quantitative applications using tree-building methods are carried out. Given that ideas can easily be developed independently, the crucial question for studies of cultural evolution is whether similar ideas can be shown to share a common history. It is (as David mentioned earlier in a blog post on False analogies between anthropology and biology) the general problem of homology that does not seem to be solved in most studies on cultural evolution. Here, linguistics has generally fewer problems, given that linguists have developed methods to test whether two words are homologous. In cultural evolution, however, the assessment of homology is far from being obvious.

I think that cultural evolution studies such as the ones presented in the book would generally profit from network approaches. By network approaches, I do not necessarily mean evolutionary networks (in the sense of Morrison 2011), as the problem of inheritance is difficult to solve. Instead, I am thinking of exploratory data analysis using phylogenetic networks (Morrison 2011), or some version of similarity networks (Bapteste et al. 2012). Phylogenetic network approaches are frequently used in biology, and are now also very popular in linguistics. Similarity networks are more common in biology, but we have carried out some promising studies of linguistic data (List et al. 2016). As all of these approaches are exploratory and very flexible regarding the data that is fed to them, they might offer new possibilities for exploratory studies on cultural evolution.

References
  • Bapteste, E., P. Lopez, F. Bouchard, F. Baquero, J. McInerney, and R. Burian (2012) Evolutionary analyses of non-genealogical bonds produced by introgressive descent. Proceedings of the National Academy of Sciences 109.45. 18266-18272.
  • Knappett, C. (2016) Resisting Innovation? Learning, Cultural Evolution and the Potter’s Wheel in the Mediterranean Bronze Age. In: Mendoza Straffon, L. (ed.) Cultural Phylogenetics: Concepts and Applications in Archaeology. Springer International Publishing: Cham and Heidelberg and New York and Dordrecht, pp. 97-111.
  • List, J.-M., P. Lopez, and E. Bapteste (2016) Using sequence similarity networks to identify partial cognates in multilingual wordlists. In: Proceedings of the Association of Computational Linguistics 2016 (Volume 2: Short Papers), pp. 599-605.
  • List, J.-M. (2016) [Review of] Cultural Phylogenetics: Concepts and Applications in Archaeology; edited by Larissa Mendoza Straffon. Systematic Biology (published online before print).
  • Morrison, D. (2011) An Introduction to Phylogenetic Networks. RJR Productions: Uppsala.
  • Prentiss, A., M. Walsh, R. Skelton, and M. Mattes (2016) Mosaic evolution in cultural frameworks: skateboard decks and projectile points. In: Mendoza Straffon, L. (ed.) Cultural Phylogenetics: Concepts and Applications in Archaeology. Springer International Publishing: Cham and Heidelberg and New York and Dordrecht, pp. 113-130.
  • Rivero, D. (2016) Darwinian archaeology and cultural phylogenetics. In: Mendoza Straffon, L. (ed.) Cultural Phylogenetics: Concepts and Applications in Archaeology. Springer International Publishing: Cham and Heidelberg and New York and Dordrecht, pp. 43-72.
  • Mendoza Straffon, L. (2016) Cultural Phylogenetics. Concepts and Applications in Archaeology. Springer International Publishing: Cham.

Network of who marries whom, by profession


This blog is supposed to be about phylogenetic networks, not social networks. However, this post is a blatant exception.

Earlier this year, Adam Pearce and Dorothy Gambrell released this interesting web page:
This chart shows who marries CEOs, doctors, chefs and janitors
It is an interactive interface to a database of who marries whom. It is well known that people in certain professions tend to marry others with a given profession, and this database quantifies this pattern. The data are from the United States Census Bureau’s 2014 American Community Survey, which covers 3.5 million households. However, much of the dataset clearly also applies to many countries in the "western world".


The infographic is a matrix of professions organized left to right by more male-dominated to more female-dominated (as determined from the data in the database). If you move the mouse-pointer over any profession (or use the search box) then lines link the most common professions that the focus profession tends to marry, with line thickness indicating quantity. The pink and blue color gradients indicate the sexes of the two spouses.

You could try well-known marriage links like those for veterinarians (who tend to marry other veterinarians) and nurses (who tend to marry medical doctors), but more interesting ones for readers of this blog might be: biologists, mathematicians and statisticians (shown in the image above), computer programmers, or information professionals.

However, if you want to get really confused, try looking at "waitresses", "cooks" and "chefs", which seem to offer intransitive relationships.

Networks of music history


Networks are currently popular in studies of music. However, they tend to be unrooted similarity networks, showing some form of alleged commonality among artists or their music, as shown in the first graph. This example displays phenotypic similarity among the named artists, although how the "similarity" is measured is not always clear (the post on The Music Genome Project is no such thing briefly discusses this).


[Note: For an alternative approach, Glenn McDonald's Every Noise at Once has a two-dimensional scatter-plot of 1,491 music genres.]

Of more interest to us is the use of a network to study the historical development of music genres, for which we need a rooted network. Clearly, music history will be reticulate rather than tree-like, given the obvious transfers of musical modes between and within cultures, and even the possible resurrection of earlier styles at a later time and even place. A similar argument applies to musical instruments, of course (see Cornets: from a tree to a network; Guitars and networks).

Music networks appear in a previous post, on Reconstructing ancestors in a splits network. That post discusses the paper by J. Miguel Díaz-Báñez, Giovanna Farigu, Francisco Gómez, David Rappaport & Godfried T. Toussaint (2004) El Compás flamenco: a phylogenetic analysis. Proceedings of BRIDGES Conference: Mathematical Connections in Art, Music and Science, pp. 61-70.

The authors provide an analysis of the hand-clapping patterns of the flamenco music of Andalucia, in southern Spain. There are four recognized patterns, plus the fandango pattern, and the authors use two different distance measures to assess their rhythmic similarities. They produce unrooted phylogenetic networks based on each of these distances, using NeighborNet, one of which is shown in the second graph.
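As a rough illustration of how a distance matrix between clapping patterns can be built before it is handed to a network method, here is a minimal Python sketch. It rests on several assumptions: the three 12-pulse patterns are hypothetical (they are not claimed to be the flamenco patterns from the paper), and plain Hamming distance is swapped in as a simple stand-in, since the two distance measures used by the authors are not described here.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the pulses at which two equal-length rhythm patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of pulses")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 12-pulse patterns: "x" is a clap, "." is a silent pulse.
patterns = {
    "pattern_a": "x..x..x..x..",
    "pattern_b": "..x..x.x.x.x",
    "pattern_c": "x..x..x.x.x.",
}

# Pairwise distances, i.e. the upper triangle of the distance matrix.
distances = {
    (p, q): hamming(patterns[p], patterns[q])
    for p, q in combinations(patterns, 2)
}

for (p, q), d in distances.items():
    print(p, q, d)
```

A matrix of this kind is the sort of input that a distance-based method such as NeighborNet takes; the choice of distance measure matters, which is presumably why the authors compared two of them.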


The authors ignore the fact that "it is well established that the fountain of flamenco music is the fandango", which would make the fandango the outgroup for rooting if we did wish to treat the networks as rooted. Instead, they try to "reconstruct the 'ancestral' rhythms corresponding to the nodes" by using mid-point rooting. This is a tricky business for networks, because there are multiple paths through the graph, and so the mid-point is not necessarily unique.

A similar NeighborNet analysis had previously been provided by Godfried Toussaint (2003) Classification and phylogenetic analysis of African ternary rhythm timelines. Proceedings of BRIDGES Conference: Mathematical Connections in Art, Music and Science, pp. 25-36. This involved an analysis of the 12/8 time bell rhythms in African and Afro-American music. The distances were based on "measures of rhythmic oddity and off-beatness" (this is briefly discussed in Hunting for rhythm’s DNA).


Very few people seem to be interested in producing rooted phylogenetic diagrams directly, except when their model is a tree rather than a network. Perhaps the most ambitious of these is by Victor Grauer (2011) Sounding the Depths: Tradition and the Voices of History. This is available as a paperback or for Kindle. The audio-visual examples are available as a blog page, as are the figures.

His tree is shown in the next graph, including the characters on which it is based. Note that group B3. "Social Unison" is associated with a historical bottleneck, so that the prior history appears to be uncertain.


Finally, not everyone agrees about the importance of the obvious reticulation patterns in music history, notably Sylvie Le Bomin, Guillaume Lecointre, Evelyne Heyer (2016) The evolution of musical diversity: the key role of vertical transmission. PLoS One 11: e0151570. These authors study the music of groups of farmer and hunter-gatherer Bantu and Ubanguian speakers from Gabon, in western Africa. Their music characters are from three groups: repertoire (set of pieces including circumstance and social or symbolic implicit information), performative (polyphonic process, form, instruments and vocal techniques), and intrinsic (metrics, rhythm and melodic).


The authors present a rooted phylogenetic tree, but there is also a "filtered" NeighborNet tucked away in an appendix. It seems to contradict any claim for the data being particularly tree-like.

Finally, to return to where I started, you could take a look at Musicmap, which allegedly covers The Genealogy and History of Popular Music Genres from Origin till Present (1870-2016). To quote from the info:
Musicmap attempts to provide the ultimate genealogy of popular music genres, including their relations and history. It is the result of more than seven years of research with over 200 listed sources and cross examination of many other visual genealogies. Its aim is to focus on the delicate balance between comprehensibility, accuracy and accessibility.

You need to zoom in a long way to appreciate the complexity of the network, covering 230 music genres. There is nominally a timeline from top to bottom (starting in 1870), although the network connections are not strictly time-consistent. As the (mostly Belgian) creators (led by Kwinten Crauwels) note:
The ideal genealogy is not only complete and correct, but also easy to understand despite its complexity. This is a utopian balance that can never be achieved but only approached. By choosing the right amount of genres, determining forms of hierarchy and analogy and ordering everything in a logical but authentic manner, a satisfactory balance can be obtained ... Musicmap is a platform in search for the perfect balance of popular music genres to provide a powerful tool for educational means or a complementary framework in the field of music metadata and automatic taxonomy.

Can biologists learn from linguists?


Of course they can. Biologists who know nothing about linguistics can learn a lot about linguistics from linguists, including the most nerdy, the most boring, and the most interesting things.

However, it is obvious that the question in the title of this post implies a different object of learning, and a more precise title would have been "Can biologists learn about evolution from linguists?" As a linguist, I would of course also provide an affirmative answer, but I doubt that most biologists would agree. At the moment, we have a situation in which the majority of interdisciplinary papers state that linguists can learn from biologists. The opposite, that biologists can learn from linguists, can rarely be found.

Biology to linguistics

An abundance of analogies between biology and linguistics has been noticed so far, and new analogies are regularly being proposed. When looking at the analogies that have been made so far, we find that most of them have never been really followed up. Languages, for example, have been compared with organisms (Schleicher 1848: 16f), species (Pagel 2009), microbes (Nelson-Sathi et al. 2011, List et al. 2014), mutualist symbionts (van Driem 2004), and populations (Mufwene 2001). Words have been compared with cells (Schleicher 1863: 23f), amino-acids (Zwick 1978), codons (Enguix et al. 2012, Jakobson 1973) and genes (Pagel 2009). Sounds (phonemes) have been compared with nucleic bases (Hruschka et al. 2015, Enguix et al. 2012) and atoms (Zwick 1978). Only a small number of these analogies have received broader attention, many have been rejected quickly after they were first proposed, and only recently has an explicit transfer of methods and models been initiated (Atkinson and Gray 2005).

The tenor of most recent studies, especially in the literature published during the past one to two decades, is often that we finally realize that language evolution is largely the same as biological evolution, surprisingly (for a recent account in this direction, see Pagel 2016). As a result, it is claimed that we can easily use biological methods to study language evolution. We need to use them, since linguistics is in a poor state with no methods of its own, and linguists have never quantified what they know about the history of their languages. Then, finally, with these new methods developed in biology, we see light at the end of the tunnel, and we can draw nice trees of our languages and see how they evolved into their current shape.

I am in complete favour of increasing the objectivity in historical linguistics, making it a more data-driven and a more transparent discipline. I also advocate interdisciplinary transfer of methods and models, and there are quite a few things we can actually learn from biologists in linguistics. What I do not like is this tone, which suggests that biology is the discipline that saved linguistics, waking it up from its 200-year-long sleep in the ivory tower. At the same time, I also do not like the horror-scenarios in traditional linguistics, which state that quantitative approaches would deprive our discipline of all its wit (see the figure below as a not too serious attempt to visualize these two perspectives). In this context, it is quite interesting to look back in history and to recapitulate what actually happened.

The biological storm of bits and bytes: Will it destroy the ivory tower of historical linguistics
or ultimately help it to shine with a new gloss?

The discipline of historical linguistics is about 200 years old, starting with the legendary scholarly work of people like Rasmus Rask (Rask 1818), Jakob Grimm (Grimm 1822), and Franz Bopp (Bopp 1816). Using family trees to model language history goes back to the 17th century, pre-dating the first networks in biology by one century (see David's overview in Morrison 2016). The first explicit alignments showing homologous sounds across words occur at least as early as the beginning of the 20th century (Dixon and Kroeber 1919), cladistic frameworks date back to the second half of the 19th century (Brugmann 1886), and even algorithms for tree reconstruction based on distance data date back to the 1960s (Dyen's comment in Hymes 1960).

The discipline of historical linguistics can look back on a remarkable history of excellent scholarship. Thanks to this scholarship, we have gained invaluable insights, not only into the history of the world's languages, but also into the mechanisms that trigger linguistic diversity. It is undeniable that methods from evolutionary biology have given us some fresh insights during the past 20 years, but their actual influence is often exaggerated. On the one hand, our experience (since the quantitative turn in historical linguistics) shows that in most cases we cannot use biological methods to analyze our data directly. Instead, we need to carefully adapt them to our needs in order to get the best out of them (as I have tried to show in more detail in List 2014).

On the other hand, there is no example during the past 20 years, that I would know of, where the modern biological methods have really revolutionized our insights into language history. They have undeniably shifted our attention towards data and quantification. They have exposed weak spots in our argumentation, and they have forced us to restate questions that we had forgotten to ask. But no new language family has been detected, no deeper genealogies between existing languages have been proposed, and no deeper insights into human prehistory have been achieved by the use of biological methods alone. Historical linguistics has profited from evolutionary biology, but not as a small oasis in the desert that was given water and seeds by the lords of bits and bytes, but as a discipline in which scholars learned to make active and critical use of interdisciplinary approaches.

Linguistics to biology

This brings us back to the question of the title. Can biology learn from linguistics? It has done so undoubtedly in the past. Tree-drawing in biology, for example, was popularized by Ernst Haeckel, who was himself influenced by the linguist August Schleicher (Sutrop 2012: 300). In the early days of genetics, a multitude of metaphors were borrowed from linguistics to describe biological phenomena with words like "alphabet", "word" (Gamov 1954), or "translation" (Crick 1959).

While not all biologists have been in favor of this tendency (see, for example, Shanon 1978), and the borrowing of terms does not necessarily imply methodological transfer, we also find examples for the explicit transfer of methods and theories from the linguistic to the biological domain. As an example, consider the theory of formal grammar (Chomsky 1959) which still plays a very important role in addressing certain problems in bioinformatics (Searls 1997), like RNA folding and protein structure analysis. Biological textbooks on sequence comparison still tend to include a chapter on formal grammars and their application in biology (Durbin et al. 1998).

Biology could also profit from linguistic insights in the future, and this becomes a bit clearer when we recall what Schleicher mentioned 150 years ago (and what has obviously been forgotten since then):
Observing how new forms descend from old ones can be done more straightforwardly and in a larger scale in linguistics than in biology. For once, the linguists have an advantage over the natural scientists. (Schleicher 1863: 18, my translation)
The advantage of linguistics, which Schleicher points out, is the availability of very concrete, very detailed, very valuable data in linguistics. This data allows us to see evolutionary forces in a detailed way of which biologists can only dream. Written sources allow us to trace the history of whole language families like Romance (and to some extent also Chinese dialects) from their ancestral speech varieties down to today. Language change is fast enough to allow us to investigate it in action. Recent topics in biology, like the importance of invoking a system perspective in evolution, have been long since debated and discussed in linguistics (Tynjanow and Jakobson 1928), since they are so much easier to detect.

In the past, when I worked intensively on the implementation of the Minimal Lateral Network method (Dagan and Martin 2007, Dagan et al. 2008) on linguistic data (List et al. 2014, List 2015), I stumbled upon numerous examples showing the limits of tree topology as a predictor for lateral transfer events. Given that the same necessarily also holds for lateral gene transfer, I was asking myself whether these false positives and false negatives in the analyses would simply not matter due to the large amount of data in biology, or whether they were ignored due to the lack of good data for algorithmic evaluation. Later, when I read David's post on Tardigrades and phylogenetic networks, where he pointed to two analyses on the same data that explained them once with lateral gene transfer (Boothby et al. 2015) and once with errors in the data (Koutsovoulos 2015), I became aware of the strong advantage of my linguistic data, since I could test it against written records, tracing the history of words through centuries, thus being able to spot errors immediately when looking up a data point.

The detail of our data in linguistics is both a blessing and a curse. It enables us to write detailed word histories without ever having heard of tree reconciliation methods. On the other hand, it seduces us to get lost in details, forgetting about the bigger picture, and the bigger questions that we could ask, if this data was properly digitized and formalized. In this regard, historical linguistics still needs to learn from biology, as we have failed to turn historical linguistics into a modern, data-driven discipline. With more and more detailed data becoming available, however, the day will come when Schleicher is proven right, and when biologists can learn from linguists about evolution.

References
  • Atkinson, Q. and R. Gray (2005): Curious Parallels and Curious Connections: Phylogenetic Thinking in Biology and Historical Linguistics. Syst. Biol. 54.4. 513-526.
  • Boothby, T., J. Tenlen, F. Smith, J. Wang, K. Patanella, E. Osborne Nishimura, S. Tintori, Q. Li, C. Jones, M. Yandell, D. Messina, J. Glasscock, and B. Goldstein (2015): Evidence for extensive horizontal gene transfer from the draft genome of a tardigrade. Proceedings of the National Academy of Sciences 112.52. 15976-15981.
  • Bopp, F. (1816): Über das Conjugationssystem der Sanskritsprache in Vergleichung mit jenem der griechischen, lateinischen, persischen und germanischen Sprache. Nebst Episoden des Ramajan und Mahabharas in genauen metrischen Uebersetzungen aus dem Originaltexte und einigen Abschnitten aus den Veda’s. Andreäische Buchhandlung: Frankfurt am Main.
  • Brugmann, K. (1886): Einleitung und Lautlehre: Vergleichende Laut-, Stammbildungs- und Flexionslehre der Indogermanischen Sprachen [Introduction and Phonetics. Comparative Studies of Sound Systems, Stem Formations, and Inflexion Systems of Indo-European Languages]. Grundriß der vergleichenden Grammatik der indogermanischen Sprachen [Foundations of the comparative grammar of the Indo-European languages], vol. 1. Walter de Gruyter, Berlin, Leipzig.
  • Chomsky, N. (1959): On certain formal properties of grammars. Information and Control 2. 137-167.
  • Crick, F. (1959): The present position of the coding problem. The Brookhaven Symposia in Biology 12. 35-39.
  • Dagan, T. and W. Martin (2007): Ancestral genome sizes specify the minimum rate of lateral gene transfer during prokaryote evolution. Proceedings of the National Academy of Sciences 104.3. 870-875.
  • Dagan, T., Y. Artzy-Randrup, and W. Martin (2008): Modular networks and cumulative impact of lateral transfer in prokaryote genome evolution. Proceedings of the National Academy of Sciences 105.29. 10039-10044.
  • Dixon, R. and A. Kroeber (1919): Linguistic families of California. University of California Press: Berkeley.
  • van Driem, G. (2004): Language as organism: A brief introduction to the Leiden theory of language evolution. In: Lin, Y.-c., F.-m. Hsu, C.-c. Lee, J.-S. Sun, H.-f. Yang, and D.-a. Ho (eds.): Studies on Sino-Tibetan Languages. Academia Sinica: Taipei. 1-9.
  • Durbin, R., S. Eddy, A. Krogh, and G. Mitchison (2002): Biological sequence analysis. Probabilistic models of proteins and nucleic acids. Cambridge University Press: Cambridge.
  • Enguix, G. and M. Jimenez-Lopez (2012): Natural language and the genetic code: From the semiotic analogy to biolinguistics. In: Proceedings of the 10th World Congress of the International Association for Semiotic Studies (IASS/AIS). 771-780.
  • Gamow, G. (1954): Possible relation between deoxyribonucleic acid and protein structures. Nature 173. 318.
  • Grimm, J. (1822): Deutsche Grammatik. Dieterichsche Buchhandlung: Göttingen.
  • Hruschka, D., S. Branford, E. Smith, J. Wilkins, A. Meade, M. Pagel, and T. Bhattacharya (2015): Detecting regular sound changes in linguistics as events of concerted evolution. Curr. Biol. 25.1. 1-9.
  • Hymes, D. (1960): Lexicostatistics so far. Curr. Anthropol. 1.1. 3-44.
  • Jakobson, R. (1973): Six lectures on sound and meaning. MIT Press: Cambridge and London.
  • Koutsovoulos, G., S. Kumar, D. Laetsch, L. Stevens, J. Daub, C. Conlon, H. Maroon, F. Thomas, A. Aboobaker, and M. Blaxter (2015): The genome of the tardigrade Hypsibius dujardini. bioRxiv.
  • List, J.-M., S. Nelson-Sathi, H. Geisler, and W. Martin (2014): Networks of lexical borrowing and lateral gene transfer in language and genome evolution. Bioessays 36.2. 141-150.
  • List, J.-M. (2014): Sequence comparison in historical linguistics. Düsseldorf University Press: Düsseldorf.
  • List, J.-M. (2015): Network perspectives on Chinese dialect history. Bull. Chin. Linguist. 8. 42-67.
  • Morrison, D.A. (2016): Genealogies: Pedigrees and phylogenies are reticulating networks not just divergent trees. Evol. Biol. in press.
  • Mufwene, S. (2001): The ecology of language evolution. Cambridge University Press: Cambridge.
  • Nelson-Sathi, S., J.-M. List, H. Geisler, H. Fangerau, R. Gray, W. Martin, and T. Dagan (2011): Networks uncover hidden lexical borrowing in Indo-European language evolution. Proc. R. Soc. London, Ser. B 278.1713. 1794-1803.
  • Pagel, M. (2009): Human language as a culturally transmitted replicator. Nature Reviews. Genetics 10. 405-415.
  • Pagel, M. (2016): Darwinian perspectives on the evolution of human languages. Psychonomic Bulletin & Review. 1-7.
  • Rask, R. (1818): Undersögelse om det gamle Nordiske eller Islandske sprogs oprindelse [Investigation of the origin of the Old Norse or Icelandic language]. Gyldendalske Boghandlings Forlag: Copenhagen.
  • Schleicher, A. (1848): Zur vergleichenden Sprachengeschichte. König: Bonn.
  • Schleicher, A. (1863): Die Darwinsche Theorie und die Sprachwissenschaft. Offenes Sendschreiben an Herrn Dr. Ernst Haeckel. Hermann Böhlau: Weimar.
  • Searls, D. (1997): Linguistic approaches to biological sequences. Comput. Appl. Biosci. 13.4. 333-344.
  • Shanon, B. (1978): The genetic code and human language. Synthese 39.3. 401-415.
  • Sutrop, U. (2012): Estonian traces in the Tree of Life concept and in the language family tree theory. Journal of Estonian and Finno-Ugric Linguistics 3. 297-326.
  • Tynjanow, J. and R. Jakobson (1991): Probleme der Literatur- und Sprachforschung. In: Viehoff, R. (ed.): Alternative Traditionen.10. Vieweg: Braunschweig. 67-69.
  • Zwick, M. (1978): Some analogies of hierarchical order in biology and linguistics. In: Klir, G. (ed.): Applied General Systems Research: Recent Developments & Trends. Plenum Press: New York. 521-529.