Not in the room where it happens!

My 9th great grandfather is John Banks (1619-1685) who was born in Essex (England) and moved to the colony of Connecticut in 1634. He married Marie Tainter (1619-1667) and settled down in Greenwich, Fairfield, Connecticut. I descend from his son John Banks (1650-1699).

My 8th great grandfather had a sister named Hannah Banks (1654-1684) and she married Daniel Burr (1660-1727). This was exciting because I knew that Daniel Burr was the grandfather of Aaron Burr who became the third vice president of the United States. But that wasn't Aaron's main claim to fame because he also murdered Alexander Hamilton and that caused a musical named after the victim to erupt in New York City.

Could it be that I was a distant cousin of the man who sang songs like "The room where it happens"?

Alas, no. Daniel Burr and my distant cousin, Hannah Banks, were only married for a few years before she died. Burr then married Mary Sherwood1 but she had only two children (Eleanor and Hannah) before she died. Daniel Burr's third wife, Jane "Elizabeth" Pinkley, is the mother of Rev. Aaron Burr who is the father of the vice president and successful duelist.


1. I'm also related to Mary Sherwood.

My father on D-day

Today is the 78th anniversary of D-Day—the day British, Canadian, and American troops landed on the beaches of Normandy in World War II.1

For us baby boomers it always meant a day of special significance for our parents. In my case, it was my father who took part in the invasion. That's him on the right as he looked in 1944. He was an RAF pilot flying rocket-firing Typhoons in close support of the ground troops. His missions were limited to quick strikes and reconnaissance during the first few days of the invasion because Normandy was at the limit of their range from southern England. During the second week of the invasion (June 14th) his squadron landed in Crepon, Normandy and things became very hectic from then on with several close support missions every day [see Hawker Hurricanes and Typhoons in World War II].



Happy St. Patrick’s Day!

Happy St. Patrick's Day! These are my great-grandparents Thomas Keys Foster, born in County Tyrone on September 5, 1852 and Eliza Ann Job, born in Fintona, County Tyrone on August 18, 1852. Thomas came to Canada in 1876 to join his older brother, George, on his farm near London, Ontario, Canada. Eliza came the following year and worked on the same farm. Thomas and Eliza decided to move out west where they got married in 1882 in Winnipeg, Manitoba, Canada.

The couple obtained a land grant near Saltcoats, Saskatchewan, a few miles south of Yorkton, where they built a sod house and later on a wood frame house that they named "Fairview" after a hill in Ireland overlooking the house where Eliza was born. That's where my grandmother, Ella, was born.

Other ancestors in this line came from the adjacent counties of Donegal (surname Foster) and Fermanagh (surnames Keys, Emerson, Moore) and possibly Londonderry (surname Job).

One of the cool things about studying your genealogy is that you can find connections to almost everyone. This means you can celebrate dozens of special days. In my case it was easy to find other ancestors from England, Scotland, Netherlands, Germany, France, Spain, Poland, Lithuania, Belgium, Ukraine, Russia, and the United States. Today, we will be celebrating St. Patrick's Day. It's rather hectic keeping up with all the national holidays but somebody has to keep the traditions alive!

It's nice to have an excuse to celebrate, especially when it means you can drink beer. However, I would be remiss if I didn't mention one little (tiny, actually) problem. Since my maternal grandmother is pure Irish, I should be 25% Irish but my DNA results indicate that I'm only 4% Irish. That's probably because my Irish ancestors were Anglicans and were undoubtedly the descendants of settlers from England, Wales, and Scotland who moved to Ireland in the 1600s. This explains why they don't have very Irish-sounding names.

I don't mention this when I'm in an Irish pub.


A recent thesis about Trees of Knowledge


Recently, Petter Hellström successfully defended his doctoral thesis:
Trees of Knowledge: Science and the Shape of Genealogy
Department of the History of Science and Ideas
Uppsala University, Sweden
The thesis itself is obviously of great interest to readers of this blog. It is not currently online, but you can obtain a printed or electronic copy by contacting:


Here is the abstract:
This study investigates early employments of family trees in the modern sciences, in order to historicise their iconic status and now established uses, notably in evolutionary biology and linguistics. Moving beyond disciplinary accounts to consider the wider cultural background, it examines how early uses within the sciences transformed family trees as a format of visual representation, as well as the meanings invested in them.
Historical writing about trees in the modern sciences is heavily tilted towards evolutionary biology, especially the iconic diagrams associated with Darwinism. Trees of Knowledge shifts the focus to France in the wake of the Revolution, when family trees were first put to use in a number of disparate academic fields. Through three case studies drawn from across the disciplines, it investigates the simultaneous appearance of trees in natural history, language studies, and music theory. Augustin Augier’s tree of plant families, Félix Gallet’s family tree of dead and living languages, and Henri Montan Berton’s family tree of chords served diverse ends, yet all exploited the familiar shape of genealogy.
While outlining how genealogical trees once constituted a more general resource in scholarly knowledge production — employed primarily as pedagogical tools — this study argues that family trees entered the modern sciences independently of the evolutionary theories they were later made to illustrate. The trees from post-revolutionary France occasionally charted development over time, yet more often they served to visualise organic hierarchy and perfect order. In bringing this neglected history to light, Trees of Knowledge provides not only a rich account of the rise of tree thinking in the modern sciences, but also a pragmatic methodology for approaching the dynamic interplay of metaphor, visual representation, and knowledge production in the history of science.
The trees of Augier and Gallet have been covered in this blog, but that of Berton has not. I will discuss it in the next post.

My father on D-Day: 75 years ago

Today is the 75th anniversary of D-Day—the day British, Canadian, and American troops landed on the beaches of Normandy.1

For us baby boomers it always meant a day of special significance for our parents. In my case, it was my father who took part in the invasion. That's him on the right as he looked in 1944. He was an RAF pilot flying rocket-firing Typhoons in close support of the ground troops. His missions were limited to quick strikes and reconnaissance during the first few days of the invasion because Normandy was at the limit of their range from southern England. During the second week of the invasion (June 14th) his squadron landed in Crepon, Normandy and things became very hectic from then on with several close support missions every day [see Hawker Hurricanes and Typhoons in World War II].


I have my father's log book and here are the pages from June 1944 (below). The red letters on June 6 say "DER TAG." It was his way of announcing D-Day. On the right it says "Followed SQN across channel. Saw hundreds of ships ... jumped by 190s. LONG AWAITED 2nd FRONT IS HERE." Later that day they shot up German vehicles south-east of Caen where there was heavy fighting by British and Canadian troops. The next few weeks saw several sorties over the allied lines. These were mostly attack missions using rockets to shoot up German tanks, vehicles, and trains.


The photograph on the right shows a crew loading rockets onto a Typhoon based just a few kilometers from the landing beaches in Normandy. You can see from the newspaper clipping in my father's log book that his squadron was especially interested in destroying German headquarters units and they almost got Rommel. It was another RAF squadron that wounded Rommel on July 17th.

The colorized photo on the left is my father in his Typhoon.

The log book entry (above) for June 10th says, "Wizard show. Recco area at 2000' south west of Caen F/S Moore and self destroyed 2 flak trucks, 2 arm'd trucks, and 1 arm'd command vehicle, Every vehicle left burning but one. Must have been a divisional headquarters? No casualties."

Here's another description of that rocket-firing Typhoon raid [Air Power Over the Normandy Beaches and Beyond].
Intelligence information from ULTRA set up a particularly effective air strike on June 10. German message traffic had given away the location of the headquarters of Panzergruppe West on June 9, and the next evening a mixed force of forty rocket-armed Typhoons and sixty-one Mitchells from 2 TAF struck at the headquarters, located in the Chateau of La Caine, killing the unit's chief of staff and many of its personnel and destroying fully 75 percent of its communications equipment as well as numerous vehicles. At a most critical point in the Normandy battle, then, the Panzer group, which served as a vital nexus between operating armored forces, was knocked out of the command, control, and communications loop; indeed, it had to return to Paris to be reconstituted before resuming its duties a month later.

My father was awarded the Distinguished Flying Cross (DFC) for his efforts during the war.

(This article was first posted on June 6, 2014.)


1. The British landed at Sword Beach and Gold Beach, the Canadians at Juno Beach, and American troops landed at Omaha and Utah Beaches.

A phylogenetic network outside science


I have written before about the presentation of historical information using the pictorial representation of a phylogeny (e.g. Phylogenetic networks outside science; Another phylogenetic network outside science). These diagrams are often representations of the evolutionary history of human artifacts, and so a phylogeny is quite appropriate. They are of interest because:
  • they are usually hybridization networks, rather than divergent trees, because the artifact ideas involve horizontal transfer (ideas added) and recombination (ideas replaced);
  • they are often not time consistent, because ideas can leap forward in time, so that the reticulations do not connect contemporary artifacts (see Time inconsistency in evolutionary networks); and
  • they are sometimes drawn badly, in the sense that the diagram does not reflect the history in a consistent way.
The latter point often involves poor indication of the time direction (see Direction is important when showing history), or involves subdividing the network into a set of linearized trees.
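The difference between a divergent tree and a hybridization network can be made concrete with a minimal sketch. It assumes the history is stored as a mapping from each artifact to the artifacts it drew on; the names `A` to `D` are hypothetical placeholders, not taken from any real diagram. Any artifact with more than one parent is a reticulation, and a single reticulation is enough to make the history a network.

```python
def reticulation_nodes(parents):
    """Return the nodes that have more than one distinct parent."""
    return sorted(node for node, ps in parents.items() if len(set(ps)) > 1)


def is_tree(parents):
    """A rooted history is tree-like only if no node has multiple parents."""
    return not reticulation_nodes(parents)


# Hypothetical artifact history: D combines ideas from both B and C.
history = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
}

print(reticulation_nodes(history))
print(is_tree(history))
```

Drawing such a history as a set of linearized trees amounts to dropping one of the parents of `D`, which is exactly the information the reticulation is meant to convey.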

One particularly noteworthy example that I have previously discussed is of the GNU/Linux Distribution Timeline, which illustrates the complex history of the computer operating system. The problems with this diagram as a phylogeny are discussed in the blog post section History of Linux distributions.

In this new post I will simply point out that there is a more acceptable diagram, showing the key Unix and Unix-like operating systems. I have reproduced a copy of it below.


This version of the information correctly shows the history as a network, not a series of linearized trees (each with a central axis). It also draws the reticulations in an informative manner, rather than having them be merely artistic fancies.

It is good to know that phylogenetic diagrams can be drawn well, even outside biology and linguistics.

What happens when twins get their DNA tested?

The Canadian Broadcasting Corporation (CBC) has a TV show called Marketplace that promotes itself as an advocate of consumers' rights. It has a history of testing the claims of advertisers and usually shows that these claims are misleading or false. Here's what they say on their website.
On air since 1972, Marketplace is Canada’s consumer watchdog. We get the goods to help you shop smarter and protect yourself from slick scams and misleading marketing claims. We investigate the products and services we all use every day and push companies and government for answers. And we expose the truth on stories that matter to you and your family.
Last spring, one of the hosts of the show, Charlsie Agro, decided to get her DNA tested by five of the leading ancestry sites to see how accurate they were at predicting where her ancestors came from. The twist in this story is that she has an identical twin sister, Carly Agro, who submitted her DNA to the same five companies. The results were widely reported on the Canadian national news and in social media, and the dominant theme was that the results were very different for the two sisters, calling into question the claims of companies like Ancestry.com, 23andMe, MyHeritage, Living DNA, and FamilyTreeDNA. The on-air show was also pretty negative about the ancestry results and the fact that the twins had different results [Twins get some 'mystifying' results when they put 5 ancestry DNA kits to the test]. Let's see whether the negative press was justified.

The results from the two most popular DNA testing companies, Ancestry.com and 23andMe, were pretty accurate so I'm going to ignore the other companies. I'll concentrate on explaining the Ancestry.com results since I know more about that service and I've posted a couple of articles showing that my results were quite accurate at predicting where my ancestors came from and who I'm related to [On the accuracy of Ancestry.com DNA predictions] [My DNA story].

Here are the twins' results from Ancestry.com and 23andMe.



There are two issues here and I'll discuss them separately: (1) why are the tests not identical for identical twins, and (2) how accurate is the estimate of where their ancestors come from?

Why are the results not identical?

Adult identical twins do not have identical DNA sequences in every cell of their bodies. That's the first myth we have to dispel. Yes, it's true that they come from the same zygote so their DNA should be very similar but it won't be identical because of mutations that occurred subsequent to the splitting of the early embryo. There's some controversy over the somatic cell mutation rate with some workers arguing that it's 3-10× higher than the germ line mutation rate of 0.5 mutations per cell division but let's just use the much more reliable germ line mutation rate to see how different the twins' DNA could be when it's extracted from adult epithelial cells (e.g. cheek cells from inside your mouth) [Somatic cell mutation rate in humans].

Epithelial cells divide fairly rapidly but I don't know how many cell divisions have occurred from zygote to cheek cells of an adult. Let's guess that it's 1000 cell divisions; that means 500 mutations in each twin so their DNA will differ at 1000 sites.1 That may not seem like much in a genome of 6.4 billion base pairs but keep in mind that the DNA testing companies are looking at 700,000 bp covering most of the hotspots where the mutation rates are higher than normal. Chances are pretty good that they'll detect a few of these differences so the twins' DNA results will not be identical because of somatic cell mutations.
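The back-of-the-envelope estimate above can be written out as a short sketch. The mutation rate, the guess of 1000 cell divisions, and the genome size are the figures quoted in the text; the function name and the assumption that the two twins' mutations never fall on the same site are mine.

```python
MUTATIONS_PER_DIVISION = 0.5  # germ line rate quoted in the text
CELL_DIVISIONS = 1000         # the guess made in the text
GENOME_SIZE_BP = 6.4e9        # diploid genome size quoted in the text


def twin_differences(rate, divisions):
    """Return (mutations per twin, sites at which the twins differ).

    Assumes each twin accumulates mutations independently and that the
    two sets of mutations do not overlap.
    """
    per_twin = rate * divisions
    return per_twin, 2 * per_twin


per_twin, differing = twin_differences(MUTATIONS_PER_DIVISION, CELL_DIVISIONS)
print(per_twin, differing)
print(differing / GENOME_SIZE_BP)  # fraction of the genome that differs
```

Changing `CELL_DIVISIONS` shows how sensitive the estimate is to the guess about the number of divisions.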

But that's probably not the main source of the difference between the twins' DNA results. The main problem by far is due to the way the tests are done which is by hybridizing the customers' DNA to DNA on a microchip and reading the chip to see if there's a match. (Ancestry.com uses the latest Illumina microchip that assays 700,000 SNPs.) I think the rate of false positives is quite low but the rate of false negatives is about 2% according to 23andMe [Ancestry]. The absence of a match where there should be one can be due to bad luck and differences in the threshold level of binding that constitutes a "hit." It's these "no-reads" that make up most of the false negatives. Because of these limitations of the assay the twins' DNA results could differ by 2-4% of the SNPs being tested.
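One way to see where the upper end of that range comes from is the following sketch. It assumes that no-reads strike each twin's chip independently at the quoted 2% rate, and that any SNP with a no-read on either chip shows up as a difference; both are simplifying assumptions of mine, not a description of how the companies score their chips.

```python
def fraction_differing(no_read_rate):
    """Fraction of SNPs with a no-read on at least one of two chips.

    Assumes no-reads occur independently on the two chips.
    """
    both_read = (1 - no_read_rate) ** 2
    return 1 - both_read


rate = 0.02  # false-negative rate quoted in the text
print(f"{fraction_differing(rate):.2%}")
```

With a 2% rate per chip the result is just under 4%, because a small share of the no-reads happen to land on the same SNP in both twins.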

Charlsie Agro visited the lab of Mark Gerstein at Yale University to see if he could explain why the sisters' DNA was not identical. Gerstein says the following on the video ....
I have to say that one really shocked us. I mean, we expected two identical twins to have the exact same ancestry, and they should. So the fact that they present different results between you and your sister I find very mystifying ...
His group looked at the raw data and found that the DNA from the twins was between 98.4% and 99.7% in agreement, a result they report as statistically identical. In the case of Ancestry.com, for example, the company looked at 664,429 sites and 656,197 (98.8%) were identical. Nevertheless, there were still more than 8,000 sites that differed between the two twins. (This is probably due to no-reads on one or the other of the twins' microchips and it's a lower frequency than I estimated above.) Gerstein's group doesn't explain why identical twins' DNA wouldn't be identical and that's a missed opportunity to educate the public on the accuracy of these tests.
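The Ancestry.com figures can be checked with a few lines of arithmetic. The two site counts come from the text; the variable names are only illustrative.

```python
sites_compared = 664_429   # sites examined, from the text
sites_identical = 656_197  # sites that agreed between the twins, from the text

sites_differing = sites_compared - sites_identical
concordance = sites_identical / sites_compared
discordance = sites_differing / sites_compared

print(sites_differing)
print(f"{concordance:.1%} identical, {discordance:.2%} different")
```

The discordance works out to a little over 1%, which is below the 2-4% range suggested by the false-negative rate.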

Are the ancestry predictions accurate?

In order to predict where your ancestors came from, the testing company needs to compare your haplotypes to a large database of people from different parts of the world. If you have a particular haplotype, say XYZ, and people from Italy have a high frequency of the XYZ haplotype then chances are good that you have Italian ancestors. The accuracy of this prediction depends to a large extent on the size of the database and that's why the results from Ancestry.com and 23andMe are bound to be more accurate than the predictions from smaller companies.
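The reasoning behind the XYZ example can be illustrated with a toy calculation. This is only a sketch: the regions, the haplotype frequencies, and the use of a simple Bayes' rule with equal priors are all hypothetical, and it is not a description of the algorithms that Ancestry.com or 23andMe actually use.

```python
def region_posteriors(haplotype, frequencies, priors=None):
    """Probability of each region given one haplotype, via Bayes' rule.

    frequencies maps region -> {haplotype: frequency in that region}.
    Equal priors are assumed unless priors are supplied.
    """
    regions = list(frequencies)
    if priors is None:
        priors = {r: 1 / len(regions) for r in regions}
    weights = {
        r: priors[r] * frequencies[r].get(haplotype, 0.0) for r in regions
    }
    total = sum(weights.values())
    if total == 0:
        return {r: 0.0 for r in regions}
    return {r: w / total for r, w in weights.items()}


# Hypothetical reference database: XYZ is common in Italy, rare elsewhere.
reference = {
    "Italy": {"XYZ": 0.30},
    "Eastern Europe": {"XYZ": 0.05},
    "Ireland": {"XYZ": 0.05},
}

posteriors = region_posteriors("XYZ", reference)
print(max(posteriors, key=posteriors.get), posteriors)
```

The frequencies in a real reference panel are themselves estimates, so a larger database gives tighter frequency estimates and therefore more reliable assignments.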

There are many ways of parsing the haplotype data to divide it into geographical regions and there are different ways of labeling those regions. The clustering algorithms are constantly being improved as more and more data comes in, which is why Ancestry.com recently revised its ancestry predictions. It's not surprising that two different companies would give slightly different predictions, and that's why you see different percentages of Italian and Eastern European ancestry when comparing Ancestry.com and 23andMe. The companies agree that almost two thirds of the twins' DNA comes from ancestors who lived in Italy and Eastern Europe but they apportion those predictions differently. I suspect this is largely due to differences in clustering and labeling; for example, if a haplotype is common in the Trieste region of Northern Italy do you include it in "Italian" or "Eastern European"? The fact that the percentages are different for each twin is probably due to differences in how the algorithms handled the slight differences in the microchip data due to false negatives.

We aren't told very much about the ancestors of the Agro twins beyond the fact that some of them, presumably on their father's side, are from Sicily and some are Polish/Ukrainian. It's too bad that they didn't report more about their genealogy so they could confirm that the DNA results were accurate.

I conclude that for Ancestry.com and 23andMe the results are consistent with identical twins given that their DNA is not identical and that the assay has an associated error rate. I conclude that the ancestry predictions are probably fairly accurate given the current state of the databases and the quality of the clustering algorithms although I didn't expect such a big difference between Ancestry.com and 23andMe. Nevertheless, it's clear that the Agro twins' immediate ancestors are from Italy/Sicily (father) and Eastern Europe (mother) and that fits with what they said in the show.


1. This will depend on the age of the sisters but if you think I'm going to guess their age then you must think I'm crazy.

On the accuracy of Ancestry.com DNA predictions

I'm very impressed with the DNA test administered by Ancestry.com. They report that I have over 600 fourth cousins or closer but I have confirmed some even more distant relationships. See below for the most distant relationship that the DNA tests reveal.

In the vast majority of cases the people who share DNA markers with me have no family tree that's on Ancestry.com so it's impossible to say for sure whether we are related. There are often clues based on who else shares our haplotypes but unless the person reveals their name and some of their ancestors that's all I can do. I usually contact those people who could help me sort out some unknown relationships but I rarely get a reply.

The ones who already have a tree are much easier because then I have a list of names I can search and, furthermore, they are much more likely to correspond. We've found some of our common ancestors by comparing notes but in many cases we can't make the connection because one of us is missing some key links. The most interesting discovery was a 3rd cousin relationship from someone whose parent had been adopted but who didn't know their biological parents.

Some people have detailed family trees so it's possible to trace the exact connection and the results confirm the DNA prediction. There are no false positives so far but that's not surprising since it would be very difficult to prove that two putative DNA matches are NOT related. Each of the subjects would have to have a very detailed and accurate family tree extending back more than 7 generations and it would have to include all ancestors plus all siblings and their descendants.

The most distant connection so far is a seventh cousin once removed. We are direct descendants of two people who lived more than three hundred years ago! Daniel Robbins/Robinson (1627-1714) was born in Blair Atholl, Perthshire, Scotland. He was a Scottish soldier fighting on the side of King Charles I in the English Civil War when he was captured and sent to the colonies (Connecticut) as an indentured servant. There he met and married Hope Potter (1641-1687) and they moved to Woodbridge, New Jersey when he had served his time.


A seventh cousin and I share a French Canadian and a Dutch ancestor from the former colony in New York: Peter Montras (1715-1790) and Emmetje Anderzon (1717-1790). Peter was born in Phillipsburgh (Tarrytown), New York and he was baptized in the Dutch Reformed church in Sleepy Hollow [Sleepy Hollow]. His parents are buried in the Sleepy Hollow Cemetery (see photo). Emmetje Anderzon was born in New York (formerly New Amsterdam).


There are lots of fifth cousins. Here's an example of a connection to the Fraser clan in Scotland. John Fraser (1754-1807) was born in Inverness, Scotland just eight years after the Battle of Culloden. His wife Isabella Mackay (1762-1856) was also born in Inverness. They emigrated to Canada in 1803 with their five children.


So far I've established 25 direct connections to my DNA relatives and there are another 25 or so where we know our common ancestor but not the details. There are three problem groups in my tree but, unfortunately, the DNA results haven't helped sort them out (yet).


My DNA story

This is the latest update from Ancestry.com. Their algorithms are getting better and better. This corresponds very closely to what I know of my ancestors.



The pedigree of grape varieties


We are all familiar with the concept of a family tree (formally called a pedigree). People have been compiling them for at least a thousand years, as the first known illustration is from c.1000 CE (see the post on The first royal pedigree). However, these are not really tree-like, in spite of their name, unless we exclude most of the ancestors from the diagram. After all, family histories consist of males and females inter-breeding in a network of relationships, and this cannot be represented as a simple tree-like diagram without leaving out most of the people. I have written blog posts about quite a few famous people who have really quite complex and non-tree-like family histories (including Cleopatra, Tutankhamun, Charles II of Spain, Charles Darwin, Henri Toulouse-Lautrec, and Albert Einstein).

A history of disease within an Amish community

Clearly, the history of domesticated organisms is even more complex than that of humans. After all, in most cases we have gone to a great deal of trouble to make these histories complex, by deliberately cross-breeding current varieties (of plants) and breeds (of animals) to make new ones. So, I have previously raised the question: Are phylogenetic trees useful for domesticated organisms? The answer is the same: no, unless you leave out most of the ancestry.

In most cases, we have no recorded history for domesticated organisms, because most of the breeding and propagating was undocumented. Until recently, it was effectively impossible to reconstruct the pedigrees. This has changed with modern access to genetic information; and there is now quite a cottage industry within biology, trying to work out how we got our current varieties of cats, dogs, cows and horses, as well as wheat, rye and grapes, etc. I have previously looked at some of these histories, including Complex hybridizations in wheat, and Complex hybridizations in barley and its relatives.

Grapes

One example of particular interest has been grape varieties. I have discussed some of the issues in a previous post: Grape genealogies are networks, not trees, including the effects of unsampled ancestors when trying to perform the reconstruction.

There are a number of places around the web where you can see heavily edited summaries of what is currently known about the grape pedigree. However, these simplifications defeat the purpose of this blog post, which is to emphasize the historical complexity. The only diagram that I know of that shows you the full network (as currently known) is one provided by Pop Chart (The Genealogy of Wine), a commercial group who provide infographic posters for just about anything. They will sell you a full-sized poster of the pedigree (3' by 2'), but here I have provided a simple overview (which you can click on to see somewhat larger).

Grape variety genealogy from Pop Chart

You can actually zoom in on the diagram on the Pop Chart web page to see all of the details. This allows you to spend a few happy hours finding your favorite varieties, and to see how they are related. You will presumably get lost among the maze of lines, as I did.