Data shelf life

Stephen M. Stigler argues that data have a limited shelf life. The abstract:

Data, unlike some wines, do not improve with age. The contrary view, that data are immortal, a view that may underlie the often-observed tendency to recycle old examples in texts and presentations, is illustrated with three classical examples and rebutted by further examination. Some general lessons for data science are noted, as well as some history of statistical worries about the effect of data selection on induction and related themes in recent histories of science.

In a nutshell, while data itself doesn’t change, everything around it — the people who collected the data, the things that the data is about, and where the data came from — changes over time.

Some hitherto unknown genealogical trees of music

In last week's post, I discussed Petter Hellström's recent doctoral thesis: Trees of Knowledge: Science and the Shape of Genealogy. In this thesis he discusses three "genealogical trees" in detail. Augustin Augier’s tree of plant families and Félix Gallet’s family tree of languages have already been covered in this blog (you can look them up using the Search box, to the right), but Henri Montan Berton’s family tree of chords has not.

Indeed, the historical literature at large has pretty much ignored the idea of a genealogical tree being associated with music. Nevertheless, the tree itself is explicitly labeled a Genealogical Tree of Chords. This tree, and its predecessor by François Guillaume Vial, thus deserve examination.

Henri Montan Berton (1767–1844) is well known within the history of music; and his tree was published as an independent broadsheet in two (almost identical) editions, in c. 1807 and 1815. It seems to have been produced as a teaching tool, as indeed were the trees of Augier and Gallet. As Petter Hellström notes, for these authors "genealogy did not necessarily involve chronology or change ... the introduction of family trees into secular knowledge production had more to do with the needs of information management, visualisation and communication".

Berton himself states (translated from the French):
In composing the Genealogical Tree, one has had the intention to present to the eye, at a single glance, the reunion of the great family of Chords, and to demonstrate to the eye that there is only one Primordial [Chord], and that it is the source of all Harmonies.
At the base of the tree is a fundamental bass note along with its 12th and 17th major — this was the harmonic series in 18th century music theory. From here the tree produces 8 branches above, each labeled (at the bottom) with a musical chord, and with another 20 chords labeled further up the branches (all highlighted by arrows at the left). The main trunk (denoted A) is labeled Perfect or Constant Chord. The eight branches are intended to show the relationships between "8 fundamental chords [bottom arrow] and 20 inverted chords [the upper arrows]".

The tree thus displays the harmonic relationships among the chords, rather than any sort of chronological development. It was devised as an aid to learning the fundamentals of music composition.

Berton was not the first to use this idea within music theory. Four decades earlier, in 1766, François Guillaume Vial (1725–?) had produced another broadsheet, this time labeled Genealogical Tree of Harmony.

Like Berton's tree, this is not about chronology, but is about "family relationships" in a different sense. Moreover, in this instance the branching aspect of the tree is abandoned, and the tree foliage is simply festooned with medallions, labeled with chords — it is the different sections of the tree's crown that show relationships, not different branches.

The objective here was to illustrate "the most natural order of harmonic modulation", once again devised as a teaching tool. The two compass roses at the bottom left and right show the circle of fifths (left), guiding horizontal modulation among the chords, and the circle of thirds (right), guiding vertical modulation among the chords.

Vial himself states (translated from the French):
This Genealogical Tree simplifies and allows those who are capable of intonation [to practice] the art of preluding not only on a leading note, but even to change between the most desired modulations of any instrument.
Hellström traces these uses of the "family tree" metaphor in music back to Jean-Philippe Rameau (1683–1764), an influential music theorist. Thus, he concludes that we should:
read the trees of Vial and Berton as graphical codifications of an already established metaphor and manner of thinking about harmony, especially as both authors were informed by Rameau in their understanding of harmony in the first place.
In constructing their respective tree diagrams, Berton and Vial both seized upon an already existing metaphor and made it visible on paper. Their trees are not 'genealogical' in the sense that they charted family history or cross-generational relationships, they are 'genealogical' in the sense that they depict presumably natural, organic relationships, in which every part has its place in the whole, and where every part can be referred back to a common source or root.
These trees do not, therefore, fit into the usual history of genealogical trees, as this blog recognizes them, denoting a chronological history. They would, however, fit neatly into the post on Relationship trees drawn like real trees.

The early beginnings of visual thinking

Visualization is a relatively new field. Sort of. The increased availability of data has pushed visualization forward in more recent years, but its roots go back centuries. Michael Friendly and Howard Wainer rewind back to the second half of the 1800s, looking at the rise of visual thinking.

On the first construction of the periodic table of elements:

On February 17, 1869, right after breakfast, and with a train to catch later that morning, Mendeleev set to work organizing the elements with his cards. He carried on for three days and nights, forgetting the train and continually arranging and rearranging the cards in various sequences until he noticed some gaps in the order of atomic mass. He later recalled, “I saw in a dream, a table, where all the elements fell into place as required. Awakening, I immediately wrote it down on a piece of paper.” (Strathern, 2000) He named his discovery the “periodic table of the elements.”

I sometimes wonder what they will say about current visualization work a couple of centuries from now. At what point will the historians say, “This is when visualization crashed and burned, never to be seen again.” Or, maybe it’ll go the other way: “This is when everyone understood and communicated with data, and visualization was the vehicle to do it.”

A recent thesis about Trees of Knowledge

Recently, Petter Hellström successfully defended his doctoral thesis:
Trees of Knowledge: Science and the Shape of Genealogy
Department of the History of Science and Ideas
Uppsala University, Sweden
The thesis itself is obviously of great interest to readers of this blog. It is not currently online, but you can obtain a printed or electronic copy by contacting:

Here is the abstract:
This study investigates early employments of family trees in the modern sciences, in order to historicise their iconic status and now established uses, notably in evolutionary biology and linguistics. Moving beyond disciplinary accounts to consider the wider cultural background, it examines how early uses within the sciences transformed family trees as a format of visual representation, as well as the meanings invested in them.
Historical writing about trees in the modern sciences is heavily tilted towards evolutionary biology, especially the iconic diagrams associated with Darwinism. Trees of Knowledge shifts the focus to France in the wake of the Revolution, when family trees were first put to use in a number of disparate academic fields. Through three case studies drawn from across the disciplines, it investigates the simultaneous appearance of trees in natural history, language studies, and music theory. Augustin Augier’s tree of plant families, Félix Gallet’s family tree of dead and living languages, and Henri Montan Berton’s family tree of chords served diverse ends, yet all exploited the familiar shape of genealogy.
While outlining how genealogical trees once constituted a more general resource in scholarly knowledge production — employed primarily as pedagogical tools — this study argues that family trees entered the modern sciences independently of the evolutionary theories they were later made to illustrate. The trees from post-revolutionary France occasionally charted development over time, yet more often they served to visualise organic hierarchy and perfect order. In bringing this neglected history to light, Trees of Knowledge provides not only a rich account of the rise of tree thinking in the modern sciences, but also a pragmatic methodology for approaching the dynamic interplay of metaphor, visual representation, and knowledge production in the history of science.
The trees of Augier and Gallet have been covered in this blog, but that of Berton has not. I will discuss it in the next post.

Where are we, 60 years after Hennig?

Phylogenetic analysis is common in the modern study of evolutionary biology, and yet it often seems to be a poorly understood tool. Indeed, it seems to often be seen as nothing more than a tool, and one for which one does not need much expertise.

For example, we do not need to spend much time on Twitter to realize that many evolutionary biologists do not understand even the most basic things about the difference between taxa and characters. Taxa are often referred to as "primitive", particularly by people studying the so-called Origin of Life. However, taxa themselves cannot be either primitive or derived; instead, they are composed of mixtures of primitive and derived characters — they have derived characters relative to their ancestors and primitive ones compared to their descendants.

The logical relationship between common ancestors and monophyletic / paraphyletic groups is also apparently unknown to many evolutionary biologists. There is endless debate about whether the Last Universal Common Ancestor was a Bacterium or an Archaean when, of course, it cannot be either. That is, we sample contemporary organisms for analysis, which come from particular taxonomic groupings, and from these data we infer hypothetical ancestors. However, those ancestors cannot be part of the same taxonomic group as their descendants unless that taxonomic group is monophyletic.
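
The logic in the previous paragraph can be sketched in a few lines of Python. This is only a toy illustration: the tree, its node names (`root`, `n1`, `n2`, `X`, `Y`, `Z`, `W`) and the helper functions are all made up for the example, and they say nothing about any real organisms. A group of sampled taxa is treated as monophyletic when it contains every leaf descended from the group's most recent common ancestor.

```python
# Hypothetical rooted tree, stored as child -> parent.
# Shape: root -> (X, n1); n1 -> (Y, n2); n2 -> (Z, W)
PARENT = {"X": "root", "n1": "root", "Y": "n1", "n2": "n1", "Z": "n2", "W": "n2"}


def ancestors(node):
    """Return the path from a node up to the root, the node itself included."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(taxa):
    """Most recent common ancestor of a collection of leaves."""
    taxa = list(taxa)
    common = set(ancestors(taxa[0]))
    for taxon in taxa[1:]:
        common &= set(ancestors(taxon))
    # Walking up from any one leaf, the first shared node is the MRCA.
    return next(node for node in ancestors(taxa[0]) if node in common)


def leaves_below(node):
    """All leaves descended from a node (a leaf returns itself)."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return {node}
    return set().union(*(leaves_below(child) for child in children))


def is_monophyletic(taxa):
    """True if the group holds every leaf descended from its own MRCA."""
    return leaves_below(mrca(taxa)) == set(taxa)
```

In this toy tree the group {Z, W} passes the test, so its inferred ancestor `n2` can sensibly be placed in the same group. The group {Y, Z} fails, because its common ancestor `n1` is also the ancestor of W; calling `n1` "a member of {Y, Z}" would be the same mistake as calling a common ancestor of two named groups a member of just one of them.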

This is all basic stuff, first expounded in the 1950s by Willi Hennig. So, why do so many people apparently still not know any of this 60 years later? I suspect that somewhere along the line the molecular geneticists got the idea that Hennig was part of Parsimony Analysis, and since they adopted Likelihood Analysis, instead, he is thus irrelevant.

However, Hennigian Logic underlies all phylogenetic analyses, of whatever mathematical ilk. All such analyses are based on the search for unique shared derived characters, which is the only basis on which we can objectively produce a rooted phylogenetic tree or network.

In the molecular world, many analysis techniques are based on analyzing the similarity of the taxa. However, similarity is only relevant if it is based on shared derived characters — if it is based on shared primitive characters then it cannot reliably detect phylogenetic history. This was Hennig's basic insight, and it is as true today as it was 60 years ago.

The confusing thing here is that most similarity among taxa will be based on both primitive and derived characters. This means that some of the analysis output reflects phylogenetic history and some does not. The further we go back in evolutionary time, the more likely it is that similarity reflects shared primitive characters rather than shared derived characters. This simple limitation seems to be poorly understood by evolutionary biologists.
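
A small Python sketch may make this concrete. Everything in it is invented for illustration: three hypothetical taxa, five binary characters, and the assumption that state 0 is primitive and state 1 is derived (in a real analysis that polarity would have to be established, for example by an outgroup). It is not a phylogenetic method, just a comparison of two ways of scoring similarity.

```python
from itertools import combinations

# Hypothetical character matrix: 0 = primitive state, 1 = derived state.
# A and B share one derived state (first character); B has also acquired
# four unique derived states, while A and C both retain the primitive ones.
CHARACTERS = {
    "A": [1, 0, 0, 0, 0],
    "B": [1, 1, 1, 1, 1],
    "C": [0, 0, 0, 0, 0],
}


def overall_similarity(x, y):
    """Count matching states, whether primitive or derived."""
    return sum(a == b for a, b in zip(x, y))


def shared_derived(x, y):
    """Count only the characters where both taxa have the derived state."""
    return sum(a == b == 1 for a, b in zip(x, y))


def closest_pair(score):
    """Return the pair of taxa with the highest score under `score`."""
    return max(
        combinations(sorted(CHARACTERS), 2),
        key=lambda pair: score(CHARACTERS[pair[0]], CHARACTERS[pair[1]]),
    )
```

With this matrix, `closest_pair(overall_similarity)` picks A and C, which match on four characters, all of them shared primitive states. `closest_pair(shared_derived)` picks A and B, the only pair united by a derived state. The two scores disagree purely because the first one counts retained primitive states as evidence of relationship.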

Perhaps it would be a good idea if university courses in molecular evolutionary biology actually taught phylogenetics as a topic of its own, rather than as an incidental tool for studying evolution. After all, there is more to getting a scientific answer than feeding data into a computer program.

Obviously, I may be wrong in painting my picture with such a broad brush. If so, then it must be that the people I have described seem to have gathered on Twitter, like birds of a feather.

And yet, I see the same thing in the literature, as well. Consider this recent paper:
A polyploid admixed origin of beer yeasts derived from European and Asian wine populations. Justin C. Fay, Ping Liu, Giang T. Ong, Maitreya J. Dunham, Gareth A. Cromie, Eric W. Jeffery, Catherine L. Ludlow, Aimée M. Dudley. 2019. PLoS Biology 17(3): e3000147.
This seems to be quite an interesting study of a reticulate evolutionary history involving budding yeasts, from which the authors conclude that:
The four beer populations are most closely related to the Europe/wine population. However, the admixture graph also showed strong support for two episodes of gene flow into the beer lineages resulting in 40% to 42% admixture with the Asia/sake population.

However, they then undo all of their good work with this sentence:
The inferred admixture graph grouped the four beer populations together, with the lager and two ale populations being derived from the lineage leading to the Beer/baking population.
Nonsense! Neither lineage derives from the other, but instead they both derive from a common ancestor. This is like saying that I derive from the lineage leading to my younger brother, when in fact we both derive from the same parents. I doubt that the authors believe the latter idea, so why do they apparently believe the former?

That is a little test that you can all use when writing about phylogenetics. If your words don't make sense for a family history, then they don't make sense for phylogenetics either.

Is racism Christian?

I was taught that racism developed out of Johannes Blumenbach’s Anthropological Treatises in the late eighteenth century, specifically his doctoral thesis On the Natural Variety…

What is R, what it was, and what it will become

Roger Peng provides a lesson on the roots of R and how it got to where it is now:

Chambers was referring to the difficulty in naming and characterizing the S system. Is it a programming language? An environment? A statistical package? Eventually, it seems they settled on “quantitative programming environment”, or in other words, “it’s all the things.” Ironically, for a statistical environment, the first two versions did not contain much in the way of specific statistical capabilities. In addition to a more full-featured statistical modeling system, versions 3 and 4 of the language added the class/methods system for programming (outlined in Chambers’ Programming with Data).

I’m starting to feel my age, as some of the “history” feels more like recent experience.

You can also watch Peng’s keynote in the video version.

My father on D-Day: 75 years ago

Today is the 75th anniversary of D-Day—the day British, Canadian, and American troops landed on the beaches of Normandy.1

For us baby boomers it always meant a day of special significance for our parents. In my case, it was my father who took part in the invasion. That's him on the right as he looked in 1944. He was an RAF pilot flying rocket-firing Typhoons in close support of the ground troops. His missions were limited to quick strikes and reconnaissance during the first few days of the invasion because Normandy was at the limit of their range from southern England. During the second week of the invasion (June 14th) his squadron landed in Crepon, Normandy, and things became very hectic from then on, with several close support missions every day [see Hawker Hurricanes and Typhoons in World War II].

I have my father's log book and here are the pages from June 1944 (below). The red letters on June 6 say "DER TAG." It was his way of announcing D-Day. On the right it says "Followed SQN across channel. Saw hundreds of ships ... jumped by 190s. LONG AWAITED 2nd FRONT IS HERE." Later that day they shot up German vehicles south-east of Caen where there was heavy fighting by British and Canadian troops. The next few weeks saw several sorties over the allied lines. These were mostly attack missions using rockets to shoot up German tanks, vehicles, and trains.

The photograph on the right shows a crew loading rockets onto a Typhoon based just a few kilometers from the landing beaches in Normandy. You can see from the newspaper clipping in my father's log book that his squadron was especially interested in destroying German headquarters units, and they almost got Rommel. It was another RAF squadron that wounded Rommel on July 17th.

The colorized photo on the left is my father in his Typhoon.

The log book entry (above) for June 10th says, "Wizard show. Recco area at 2000' south west of Caen F/S Moore and self destroyed 2 flak trucks, 2 arm'd trucks, and 1 arm'd command vehicle, Every vehicle left burning but one. Must have been a divisional headquarters? No casualties."

Here's another description of that rocket-firing Typhoon raid [Air Power Over the Normandy Beaches and Beyond].
Intelligence information from ULTRA set up a particularly effective air strike on June 10. German message traffic had given away the location of the headquarters of Panzergruppe West on June 9, and the next evening a mixed force of forty rocket-armed Typhoons and sixty-one Mitchells from 2 TAF struck at the headquarters, located in the Chateau of La Caine, killing the unit's chief of staff and many of its personnel and destroying fully 75 percent of its communications equipment as well as numerous vehicles. At a most critical point in the Normandy battle, then, the Panzer group, which served as a vital nexus between operating armored forces, was knocked out of the command, control, and communications loop; indeed, it had to return to Paris to be reconstituted before resuming its duties a month later.

My father was awarded the Distinguished Flying Cross (DFC) for his efforts during the war.

(This article was first posted on June 6, 2014.)

1. The British landed at Sword Beach and Gold Beach, the Canadians at Juno Beach, and American troops landed at Omaha and Utah Beaches.

The role of cartography in early global explorations

For Lapham’s Quarterly, Elizabeth Della Zazzera turns back the clock to maps used for navigation, starting with the 1300s and through 1720:

From the fifteenth to the eighteenth century, European powers sent voyagers to lands farther and farther away from the continent in an expansionist period we now call the Age of Exploration. These journeys were propelled by religious fervor and fierce colonial sentiment—and an overall desire for new trade routes. They would not have been possible without the rise of modern cartography. While geographically accurate maps had existed before, the Age of Exploration saw the emergence of a sustained tradition of topographic surveying. Maps were being made specifically to guide travelers. Technology progressed quickly through the centuries, helping explorers and traders find their way to new imperial outposts—at least sometimes. On other occasions, hiccups in cartographic reasoning led their users even farther astray.

I particularly liked the part where in fourteen hundred ninety two, Columbus sailed the ocean blue — and made miscalculations because he misread the units on a map and ended up in the wrong part of the world.

A phylogenetic network outside science

I have written before about the presentation of historical information using the pictorial representation of a phylogeny (e.g. Phylogenetic networks outside science; Another phylogenetic network outside science). These diagrams are often representations of the evolutionary history of human artifacts, and so a phylogeny is quite appropriate. They are of interest because:
  • they are usually hybridization networks, rather than divergent trees, because the artifact ideas involve horizontal transfer (ideas added) and recombination (ideas replaced);
  • they are often not time consistent, because ideas can leap forward in time, so that the reticulations do not connect contemporary artifacts (see Time inconsistency in evolutionary networks); and
  • they are sometimes drawn badly, in the sense that the diagram does not reflect the history in a consistent way.
The last point often involves poor indication of the time direction (see Direction is important when showing history), or involves subdividing the network into a set of linearized trees.

One particularly noteworthy example that I have previously discussed is of the GNU/Linux Distribution Timeline, which illustrates the complex history of the computer operating system. The problems with this diagram as a phylogeny are discussed in the blog post section History of Linux distributions.

In this new post I will simply point out that there is a more acceptable diagram, showing the key Unix and Unix-like operating systems. I have reproduced a copy of it below.

This version of the information correctly shows the history as a network, not a series of linearized trees (each with a central axis). It also draws the reticulations in an informative manner, rather than having them be merely artistic fancies.

It is good to know that phylogenetic diagrams can be drawn well, even outside biology and linguistics.