There are several ways to report a mutation rate. You can state it as the number of mutations per base pair per year, in which case a typical mutation rate for humans is about 5 × 10⁻¹⁰. Or you can express it as the number of mutations per base pair per generation (~1.5 × 10⁻⁸). You can use the number of mutations per generation or per year if you are only discussing one species. In humans, for example, you can describe the mutation rate as 100 mutations per generation and just assume that everyone knows the number of base pairs (6.4 × 10⁹).
The intrinsic mutation rate depends on the error rate of DNA replication. We don't know the exact value of this error rate but it's pretty close to 10⁻¹⁰ per base pair when you take repair into account [Estimating the Human Mutation Rate: Biochemical Method]. For single-cell species you simply multiply this number by the number of base pairs in the genome to get a good estimate of the mutation rate per generation.
The calculation for multicellular species is much more complicated because you have to know the number of cell divisions between zygote and mature germ cells. In some cases it's impossible to know this number (e.g. flowering plants, yeast). In other cases we have a pretty good estimate: for example, in humans there are about 400 cell divisions between zygote and mature sperm and about 30 cell divisions between zygote and mature egg cells. The number of cell divisions depends on the age of the parent, especially in males [Parental age and the human mutation rate]. This effect is significant: older parents pass on twice as many mutations as the youngest parents.
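As a rough illustration of how the cell-division counts feed into a per-generation estimate, here is a minimal sketch. It assumes the replication error rate of 10⁻¹⁰ per base pair applies to every division, that each gamete carries a haploid genome of 3.2 × 10⁹ bp (half the diploid figure quoted earlier), and that the division counts above are exact. The function and constant names are made up for this example.

```python
# All values come from the text; the names are illustrative only.
ERROR_RATE_PER_BP_PER_DIVISION = 1e-10   # replication errors after repair
HAPLOID_GENOME_BP = 3.2e9                # assumed: half of the 6.4e9 bp diploid genome
DIVISIONS_TO_SPERM = 400                 # zygote to mature sperm (depends on age)
DIVISIONS_TO_EGG = 30                    # zygote to mature egg


def mutations_per_gamete(divisions):
    """Expected new mutations in one haploid gamete after `divisions` replications."""
    return ERROR_RATE_PER_BP_PER_DIVISION * HAPLOID_GENOME_BP * divisions


paternal = mutations_per_gamete(DIVISIONS_TO_SPERM)
maternal = mutations_per_gamete(DIVISIONS_TO_EGG)
total = paternal + maternal

print(f"paternal: {paternal:.1f}")
print(f"maternal: {maternal:.1f}")
print(f"total per child: {total:.1f}")
```

Under these assumptions the total comes out somewhere around 140 new mutations per child, most of them from the father. Changing the number of divisions to sperm shows how strongly the result depends on paternal age.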
The parental age effect is comparable to the extremes in estimations of the human mutation rate based on different ways of measuring it [Human mutation rates - what's the right number?] [Human mutation rates]. Those values range from about 60 mutations per generation to about 160 mutations per generation.
Thus, in the case of humans, we're dealing with estimates that differ by a factor of two depending on method and parental age.
Let's assume that each child is born with 100 new mutations. This seems like a reasonable number. It's on the high end of direct counts by sequencing parents and siblings but there are reasons to believe these counts are underestimated (Scally, 2016). On the other hand, this value (100 mutations) is on the low end of the estimates using the biochemical method and the phylogenetic method.
Most of these mutations occur in the father but some were contributed by the mother. Since the child is diploid, we calculate the mutation rate per bp as: 100 ÷ (6.4 × 10⁹) = 1.56 × 10⁻⁸ per base pair per generation. Assuming an average generation time of 30 years, this gives 1.56 × 10⁻⁸ ÷ 30 = 5.2 × 10⁻¹⁰ mutations per bp per year. That's the value given above (rounded to 5 × 10⁻¹⁰). Scally (2016) uses this same value except he assumes a generation time of 29 years.
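The unit conversion is easy to check. This short sketch redoes the arithmetic with the numbers from the text (100 mutations per child, a 6.4 × 10⁹ bp diploid genome, a 30-year generation). The function names are just illustrative.

```python
# Figures taken from the text; the names are illustrative only.
MUTATIONS_PER_GENERATION = 100   # new mutations per child (assumed in the text)
DIPLOID_GENOME_BP = 6.4e9        # base pairs in a diploid human genome
GENERATION_TIME_YEARS = 30       # assumed average generation time


def rate_per_bp_per_generation(mutations, genome_bp):
    """Mutations per base pair per generation."""
    return mutations / genome_bp


def rate_per_bp_per_year(mutations, genome_bp, generation_years):
    """Mutations per base pair per year, assuming a fixed generation time."""
    return rate_per_bp_per_generation(mutations, genome_bp) / generation_years


per_generation = rate_per_bp_per_generation(MUTATIONS_PER_GENERATION, DIPLOID_GENOME_BP)
per_year = rate_per_bp_per_year(
    MUTATIONS_PER_GENERATION, DIPLOID_GENOME_BP, GENERATION_TIME_YEARS
)

print(f"{per_generation:.3e} per bp per generation")
print(f"{per_year:.3e} per bp per year")
```

Swapping in a 29-year generation time, as Scally does, moves the yearly rate only slightly.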
There are many who think this value is considerably lower than previous estimates and this casts doubt on the traditional times of divergence of chimps and humans and the other great apes. For example, Scally (2016) says that prior to the availability of direct sequencing data the "consensus value" was 10 × 10⁻¹⁰ per bp per year.¹ That's twice the value he prefers today. It works out to 186 mutations per generation!
I think it's been a long time since workers in the field assumed such a high mutation rate but let's assume he is correct and current estimates are considerably lower than those from twenty years ago.
You can calculate a time of divergence (t) between any two species if you know the genetic distance (d) between them measured in base pairs and the mutation rate (μ) in mutations per year.² The genetic distance can be estimated by comparing genome sequences and counting the differences. It represents the number of mutations that have become fixed in the two lineages since they shared a common ancestor. Haploid reference genome sequences are sufficient for this estimate.
The mutation rate (μ) is 100 mutations per generation divided by 30 years = 3.3 mutations per year.
The time of divergence is then calculated by dividing half that distance (in nucleotides) by the mutation rate (t = d/2 ÷ μ). (There are all kinds of "corrections" that can be applied to these values but let's ignore them for now and see what the crude data says.)
Human and chimp genomes differ by about 1.4%, which corresponds to 44.8 million nucleotide differences and d/2 = 22.4 million. Using 100 mutations per generation as the mutation rate means 5 × 10⁻¹⁰ per bp per year. From t = d/2 ÷ μ we get t = 6.8 million years.
This is a reasonable number. It's consistent with the known fossil record and it's in line with the current views of a divergence time for chimps and humans.
However, there are reasons to believe that some of the assumptions in this calculation are wrong. For example, the average generation time is probably not 30 years in both lineages over the last few million years. It's probably shorter, at least in the chimp lineage where the current generation time is 25 years. Using a generation time of 25 years gives a divergence time of 5.6 million years.
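Here is a small sketch of the crude calculation t = d/2 ÷ μ so you can see how the generation time changes the answer. It assumes the differences are counted over a haploid genome of 3.2 × 10⁹ bp (which is what 1.4% = 44.8 million implies) and that μ is simply mutations per generation divided by generation time. The function name and defaults are invented for this example.

```python
HAPLOID_GENOME_BP = 3.2e9  # assumed: 1.4% of this is the 44.8 million quoted in the text


def divergence_time_years(fraction_different, mutations_per_generation,
                          generation_years, genome_bp=HAPLOID_GENOME_BP):
    """Crude divergence time: half the genetic distance divided by mutations per year."""
    d = fraction_different * genome_bp                 # total nucleotide differences
    mu = mutations_per_generation / generation_years   # mutations per year
    return (d / 2) / mu


t_30 = divergence_time_years(0.014, 100, 30)
t_25 = divergence_time_years(0.014, 100, 25)

print(f"30-year generations: {t_30 / 1e6:.2f} million years")
print(f"25-year generations: {t_25 / 1e6:.2f} million years")
```

With μ left unrounded the 30-year case comes out a little under the 6.8 million years quoted above, which was obtained using μ rounded to 3.3 mutations per year. The 25-year case gives 5.6 million years either way.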
In addition, the overall differences between the human and chimp genomes may be only 1.2% instead of 1.4% (see Moorjani et al., 2016). If you combine this value with the shorter generation time, you get about 4.8 million years for the time of divergence (19.2 million differences per lineage ÷ 4 mutations per year).
Given the imprecision of the mutation rate, the question of real generation time, and problems in estimating the overall difference between humans and chimps, we can't know for certain what time of divergence is predicted by a molecular clock. On the other hand, the range of values (e.g. 4.8 - 6.8 million years) isn't cause for great concern.
So, what's the problem? The problem is that applying the human mutation rate (100 mutations per generation) to more distantly related species gives strange results. For example, Scally (2016) uses this mutation rate and a difference of 2.6% to estimate the time of divergence of humans and orangutans. The calculation yields a value of 26 million years. This is far too old according to the fossil record.
Several recent papers have addressed this issue (Scally, 2016; Moorjani et al., 2016a; Moorjani et al., 2016b). Most of the problem is solved by assuming a much higher mutation rate in the past. The biggest effect is the generation time in years. It may have been as low as 15 years for much of the past ten million years. Many of the problems go away when you adjust for this effect.
What puzzles me is the approach taken by Moorjani et al. in their two recent papers. They say that the "new" mutation rate is 5 × 10⁻¹⁰ per bp per year. That's exactly the value I use above. It's roughly 100 new mutations per child (per generation). Moorjani et al. (2016a) think this value is surprisingly low because it leads to a surprising result. They explain it in a section titled "The Puzzle."
They assume that the human and chimp genomes differ by 1.2%. That works out to 38 million mutations over the entire genome. This is 19 million fixed mutated alleles in each lineage if the mutation rate in both lineages is equal and constant.
If the mutation rate is 5 × 10⁻¹⁰ per bp per year then for a haploid genome this is 1.6 mutations per year. Dividing 19 million by 1.6 gives 11.9 million years (rounded to 12 million) for the time of divergence. This is the value quoted by the authors.
Taken at face value, this mutation rate suggests that African and non-African populations split over 100,000 years and a human-chimpanzee divergence time of 12 million years ago (Mya) (for a human–chimpanzee average nucleotide divergence of 1.2% at putatively neutral sites). These estimates are older than previously believed, but not necessarily at odds with the existing, and very limited, paleontological evidence for Homininae. More clearly problematic are the divergence times that are obtained for humans and orangutans or humans and OWMs [Old World Monkeys]. As an illustration, using whole genome divergence estimates for putatively neutral sites suggests a human–orangutan divergence time of 31 Mya and human–OWM divergence time of 62 Mya. These estimates are implausibly old, implying a human-orangutan divergence well into the Oligocene and OWM-hominoid divergence well into or beyond the Eocene. Thus, the yearly mutation rates obtained from pedigrees seem to suggest dates that are too ancient to be readily reconciled with the current understanding of the fossil record.
Here's the problem. If the mutation rate is 100 mutations per generation then this applies to DIPLOID genomes. Some of the mutations are contributed by the mother and some (more) by the father. If you apply this rate to a DIPLOID genome then the number of mutations per year is 3.3 (100/30 years). Or,
5 × 10⁻¹⁰ per bp per year × 6.4 × 10⁹ bp (diploid) = 3.2 mutations per year
Dividing 19 million mutations by 3.2 gives a time of divergence of 5.9 million years. This is a reasonable number but it's half the value calculated by Moorjani et al. (2016a).
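The two competing calculations differ only in which genome size is multiplied by the per-bp rate. This sketch puts them side by side. It takes the 19 million fixed mutations per lineage and the 5 × 10⁻¹⁰ rate from the text and makes no claim about which genome size is the right one to use. The names are invented for the illustration.

```python
RATE_PER_BP_PER_YEAR = 5e-10     # per bp per year, from the text
HAPLOID_GENOME_BP = 3.2e9        # haploid genome size
DIPLOID_GENOME_BP = 6.4e9        # diploid genome size
FIXED_PER_LINEAGE = 19e6         # half of the ~38 million human-chimp differences


def years_to_accumulate(fixed_mutations, rate_per_bp_per_year, genome_bp):
    """Years needed to fix `fixed_mutations` at the given genome-wide yearly rate."""
    mutations_per_year = rate_per_bp_per_year * genome_bp
    return fixed_mutations / mutations_per_year


t_haploid = years_to_accumulate(FIXED_PER_LINEAGE, RATE_PER_BP_PER_YEAR, HAPLOID_GENOME_BP)
t_diploid = years_to_accumulate(FIXED_PER_LINEAGE, RATE_PER_BP_PER_YEAR, DIPLOID_GENOME_BP)

print(f"haploid genome size: {t_haploid / 1e6:.1f} million years")
print(f"diploid genome size: {t_diploid / 1e6:.1f} million years")
```

The factor-of-two gap between the two results is exactly the ratio of the genome sizes, so the whole disagreement comes down to that one choice.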
They also calculate a value of 12.1 million years for the human-chimp divergence in their second paper (and 15.1 million years for the divergence of humans and gorillas) (Moorjani et al., 2016b).
I think their calculations are wrong because they used the haploid genome size rather than the diploid genome where the mutations are accumulating. Both these papers appear in good journals and both were peer-reviewed. Furthermore, the senior author, Molly Przeworski, is a Professor at Columbia University (New York, NY, USA) and she's an expert in this field.
What am I doing wrong? Is it true that a mutation rate of ~100 mutations per generation means that humans and chimpanzees must have been separated for 12 million years as Moorjani et al. say? Or is the real value 5.9 million years as I've calculated above?
Image Credit: The chromosome image is from Wikipedia: Creative Commons Attribution 2.0 Generic license. The chimp photo is also from Wikipedia.
1. Scally takes this value from Nachman and Crowell (2000) who claim that the mutation rate is ~2.5 × 10⁻⁸ mutations per bp in humans. This works out to 160 mutations per generation and an overall mutation rate of 8 × 10⁻¹⁰ based on a generation time of 30 years, not 10 × 10⁻¹⁰ as Scally states.
2. This assumes that all mutations are neutral. The rate of fixation of neutral alleles over time is equal to the mutation rate. Since 8% of the genome is under selection, it's not true that all mutations are neutral but to a first approximation it's not far off.
Moorjani, P., Gao, Z., and Przeworski, M. (2016a) Human germline mutation and the erratic evolutionary clock. PLoS Biology, 14:e2000744. [doi: 10.1371/journal.pbio.2000744]
Moorjani, P., Amorim, C.E.G., Arndt, P.F., and Przeworski, M. (2016b) Variation in the molecular clock of primates. Proc. Nat. Acad. Sci. (USA) 113:10607-10612. [doi: 10.1073/pnas.1600374113]
Nachman, M.W., and Crowell, S.L. (2000) Estimate of the mutation rate per nucleotide in humans. Genetics, 156:297-304. [PDF]
Scally, A. (2016) The mutation rate in human evolution and demographic inference. Current opinion in genetics & development, 41:36-43. [doi: 10.1016/j.gde.2016.07.008]
Most mitochondrial genes have been transferred from the ancestral mitochondrial genome to the nuclear genome over the course of 1-2 billion years of evolution. They are no longer present in mitochondria but they are easily recognized because they resemble α-proteobacterial sequences more than the other nuclear genes [see Endosymbiotic Theory].
This process of incorporating mitochondrial DNA into the nuclear genome continues to this day. The latest human reference genome has about 600 examples of nuclear sequences of mitochondrial origin (= numts). Some of them are quite recent while others date back almost 70 million years, which is the limit of resolution for junk DNA [see Mitochondria are invading your genome!].
Estimating the number of numts isn't as easy as you might imagine. There are two main problems according to Hazkani-Covo and Martin (2017).
- Simple BLAST searches using mitochondrial sequences against the nuclear genome may overestimate the number of insertion events. That's because the hits need to be concatenated to see the extent of the insertion. You also need to take into account subsequent events, such as the insertion of a transposon into the mitochondrial fragment, that make a single insertion event look like two independent events in genomic analyses.
- The number of numts may be underestimated because mitochondrial sequences are usually thought to be contaminants and they are removed from the genome sequence. There are several documented cases.
The authors examined 36 genomes for the presence of mitochondrial DNA. They looked at each potential event separately to verify that it was a genuine numt. They also looked for nupts (nuclear sequences of plastid origin) in 24 genomes.
The results vary from a low of 7 numts to a high of 6550 numts depending on the size of the genome. The best estimate for humans is 592, which is pretty much in line with earlier results. The number of nupts in plants and algae is about the same.
Image Credit: Moran, L.A., Horton, H.R., Scrimgeour, K.G., and Perry, M.D. (2012) Principles of Biochemistry 5th ed., Pearson Education Inc. page 175 [Pearson: Principles of Biochemistry 5/E]
Hazkani-Covo, E., and Martin, W.F. (2017) Quantifying the number of independent organelle DNA insertions in genome evolution and human health. Genome Biology and Evolution, evx078. [doi: 10.1093/gbe/evx078]
Lateral gene transfer (LGT), or horizontal gene transfer (HGT), is widespread in bacteria. It leads to the creation of pangenomes for many bacterial species where different subpopulations contain different subsets of genes that have been incorporated from other species. It also leads to confusing phylogenetic trees such that the history of bacterial evolution looks more like a web of life than a tree [The Web of Life]. Bacterial-like genes are also found in eukaryotes. Many of them are related to genes found in the ancestors of modern mitochondria and chloroplasts and their presence is easily explained by transfer from the organelle to the nucleus. Eukaryotic genomes also contain examples of transposons that have been acquired from bacteria. That's also easy to understand because we know how transposons jump between species.
The literature on eukaryotic genomes is full of additional claims of LGT from bacteria (and other eukaryotes) but many of those have subsequently been attributed to contamination of genomic DNA [see Contaminated genome sequences]. Nevertheless, it's commonly accepted that lateral gene transfer from bacteria to eukaryotes is real and each new eukaryotic genome has several hundred genes acquired from bacteria. It usually accounts for about 1% of the genome. For example, even after extensive analysis of tardigrade genome sequences, there's still somewhere between 1% and 2% HGT/LGT (Yoshida et al., 2017).
An extensive analysis of the finished human genome sequence still suggested that there were 145 genes derived from LGT (Crisp et al., 2015). Those same authors claim to have detected a low level of LGT/HGT in dozens of other eukaryotic species. Here's what they say in their abstract ...
We have taken advantage of the recent availability of a sufficient number of high-quality genomes and associated transcriptomes to carry out a detailed examination of HGT in 26 animal species (10 primates, 12 flies and four nematodes) and a simplified analysis in a further 14 vertebrates. Genome-wide comparative and phylogenetic analyses show that HGT in animals typically gives rise to tens or hundreds of active ‘foreign’ genes, largely concerned with metabolism. Our analyses suggest that while fruit flies and nematodes have continued to acquire foreign genes throughout their evolution, humans and other primates have gained relatively few since their common ancestor. We also resolve the controversy surrounding previous evidence of HGT in humans and provide at least 33 new examples of horizontally acquired genes.
That result was challenged by Salzberg (2017) who presented convincing evidence that many of the LGT claims were due to contamination, or they are mitochondrial genes, or they did not meet the minimal standards for LGT claims. He says,
In this study, I re-examined the claims of Crisp et al. focusing on the human genes. Instead of using a large-scale, automated analysis, which by its very nature could enrich the results for artifactual findings, I looked at each human gene individually to determine whether the evidence is sufficient to support the conclusion that HGT occurred. An important principal here is that extraordinary claims require extraordinary evidence: there is no doubt that the vast majority of human genes owe their presence in the human genome to the normal process of inheritance by vertical descent. Thus, if other, more mundane processes can explain the alignments of a human gene sequence, these explanations are far more likely than HGT.
Bill Martin is also skeptical. He also claims that even a low level of LGT in eukaryotes is too much. He claims there's no solid evidence to support those claims and they persist because researchers are not thinking critically about their results and the consequences (Martin, 2017). He says,
Claims for LGT among eukaryotes essentially did not exist before we had genomes because, in contrast to prokaryotes, there are no characters known among eukaryotes that require LGT in order to explain their distribution, except perhaps the spread of plastids via secondary symbiosis. Today, claims for eukaryote LGT are common in the literature, so common that students or nonspecialists might get the impression that there is no difference between prokaryotic and eukaryotic genetics. The time has come where we need to ask whether the many claims for eukaryote LGT – prokaryote to eukaryote LGT and eukaryote to eukaryote LGT – are true.
There are several problems with these claims according to Bill Martin. First, the pattern of LGT doesn't conform to what we see in bacteria where entire clades have inherited genes transferred from bacteria. Most of the claims of LGT are confined to a single species. Second, there's no reasonable mechanism for LGT as there is in bacteria.
The reality checks are simple. If the claims are true, then we need to see evidence in eukaryotic genomes for the cumulative effects of LGT over time, as we see with pangenomes in prokaryotes, and as we see with sequence divergence. That is, the number of genes acquired by LGT needs to increase in eukaryotic lineages as a function of time. We also need to see evidence for genetic mechanisms that could spread genes across eukaryote species (and order, and phylum) boundaries, as we see in prokaryotes. If we do not see the cumulative effects, and if there are no tangible genetic mechanisms, then we have to openly ask why, and entertain the possibility that the claims might not be true. Could it be that eukaryote LGT does not really exist to any significant extent in nature, but is an artefact produced by genome analysis pipelines?
This is not a popular view. That's not surprising coming from Bill Martin because he often challenges the current dogmas. He raises an issue that's more important than the presence of LGT in eukaryotes and that's the tendency of today's scientists to adopt a consensus view without thinking critically.
Why should I care about eukaryote LGT anyway? Is not the practical solution to just believe what everyone else does and “get with the programme” as a prominent eukaryote LGT proponent recently recommended that I do (Dan Graur is my witness). At eukaryote genome meetings, where folks pride themselves on the amounts and kinds of LGT they are finding in a particular eukaryote genome (not in all genomes), I feel like Winston Smith in Orwell's novel 1984, listening to an invented truth recited by members of the Inner Party. My mentors taught me that students of the natural sciences are not obliged to get with anyone's program, instead we are supposed to think independently and always to critically inspect, and re-inspect, current premises. Doing "get with the program" science in herds can produce curious effects. For example, the well-managed ENCODE project that ascribed a function to 80% of the human genome was a textbook case of everyone "getting with the program," and everyone, however, also missing the point, obvious to evolutionary biologists, that the headline result of 80% function cannot be true.
Image Credit: Scientific American, Doolittle, W. (2000) Uprooting the Tree of Life. Scientific American, February 2000.
Crisp, A., Boschetti, C., Perry, M., Tunnacliffe, A., and Micklem, G. (2015) Expression of multiple horizontally acquired genes is a hallmark of both vertebrate and invertebrate genomes. Genome biology, 16:50. [doi: 10.1186/s13059-015-0607-3]
Martin, W.F. (2017) Too Much Eukaryote LGT. BioEssays, 1700115. [doi: 10.1002/bies.201700115]
Salzberg, S.L. (2017) Horizontal gene transfer is not a hallmark of the human genome. Genome biology, 18:85. [doi: 10.1186/s13059-017-1214-2]
Yoshida, Y., Koutsovoulos, G., Laetsch, D.R., Stevens, L., Kumar, S., Horikawa, D.D., Ishino, K., Komine, S., Kunieda, T., and Tomita, M. (2017) Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus. PLoS Biology, 15:e2002266. [doi: 10.1371/journal.pbio.2002266]
This is the second post discussing creationist¹ papers on pseudogenes. The first post addressed a paper by Jeffrey Tomkins on the β-globin pseudogene [Creationists questioning pseudogenes: the beta-globin pseudogene]. This post covers another paper by Tomkins claiming that the GULO pseudogenes in various primate species are not derived from a common ancestor but instead have been deactivated independently in each lineage.
The Tomkins article was published in 2014 in Answers Research Journal, a publication that describes itself like this:
ARJ is a professional, peer-reviewed technical journal for the publication of interdisciplinary scientific and other relevant research from the perspective of the recent Creation and the global Flood within a biblical framework.
Tomkins explains two fundamental axioms of Young Earth Creationism.
- "An emerging theme from the continuing progression of genomics research across the spectrum of eukaryotic life is the widespread decay of pathways for vitamin-synthesis (Helliwell, Wheeler, and Smith 2013). This paradigm is of great importance to the creationist model of genetic entropy which postulates that genomes are in a continual state of degradation over time, not forward progressing evolution (Sanford 2010)."
- "Another important component of the creationist model of origins is the idea of molecular discontinuity between unrelated taxon (Tomkins and Bergman 2013). As will be demonstrated in this report, the enigma of the GULO pseudogene analyzed in the light of new genomic evidence most closely aligns with a creationist model incorporating both of these paradigms."
The idea here is that the loss of a gene for synthesizing vitamin C (GULO gene)² is consistent with the YEC view of increasing loss and degradation of the genome. Such degradation must occur within species since the YEC model doesn't allow for shared ancestry. The main question Tomkins addresses is whether the pattern of GULO pseudogenes in various species is consistent with gene loss in an ancestral species and subsequent inheritance of a pseudogene in different lineages or whether the pattern is consistent with separate and independent loss in related species.
As you might have guessed, Tomkins argues that the pattern is inconsistent with common ancestry and lends support to Young Earth Creationism. Here's the article ...
The Human GULO Pseudogene—Evidence for Evolutionary Discontinuity and Genetic Entropy
Jeffrey P. Tomkins, Institute for Creation Research, Dallas, TX, USA
Answers in Genesis
Abstract: Modern genomics provides the ability to screen the DNA of a wide variety of organisms to scrutinize broken metabolic pathways. This wealth of data has revealed wide-spread genetic entropy in human and other genomes. Loss of the vitamin C pathway due to deletions in the GULO (L-gulonolactone oxidase) gene has been detected in humans, apes, guinea pigs, bats, mice, rats, pigs, and passerine birds. Contrary to the popularized claims of some evolutionists and neo-creationists, patterns of GULO degradation are taxonomically restricted and fail to support macroevolution. Current research and data reported here show that multiple GULO exon losses in human, chimpanzee, and gorilla occurred independently in each taxon and are associated with regions containing a wide variety of transposable element fragments. Thus, they are another example of sequence deletions occurring via unequal recombination associated with transposable element repeats. The 28,800 base human GULO region is only 84% and 87% identical compared to chimpanzee and gorilla, respectively. The 13,000 bases preceding the human GULO gene, which corresponds to the putative area of loss for at least two major exons, is only 68% and 73% identical to chimpanzee and gorilla, respectively. These DNA similarities are inconsistent with predictions of the common ancestry paradigm. Further, gorilla is considerably more similar to human in this region than chimpanzee—negating the inferred order of phylogeny. Taxonomically restricted gene degradation events are emerging as a common theme associated with genetic entropy and systematic discontinuity, not macroevolution.
The GULO gene encodes the enzyme L-gulono-γ-lactone oxidase, the terminal enzyme in the synthesis of ascorbic acid. Ascorbic acid is required in the synthesis of collagen and a few other processes in mammals. Mutations in the GULO gene can lead to loss of function but this is not lethal in many species because they get enough ascorbic acid in their diet.
The human gene is nonfunctional giving rise to a unitary pseudogene located on chromosome 8 at p21. As a result, ascorbic acid is now an essential component of the human diet. Because it has become essential, it is now called a vitamin (vitamin C) (see Helliwell et al., 2013) [Human GULOP Pseudogene].
The standard explanation for the origin of this pseudogene—and all other unitary pseudogenes—is that the original gene became inactivated by mutation at some time in the past. That null allele then became fixed in the population by random genetic drift. All descendants of that population inherited the pseudogene.
Tomkins takes a scattergun approach to the problem by bringing up all kinds of objections to the standard explanation. I don't have time to discuss all of his objections and I don't have enough knowledge of some of the issues to respond to his points. For example, I don't know enough about bird evolution to say whether the pattern of GULO gene loss is compatible with common ancestry or not.
Let's just look at the pseudogenes in primates to see which explanation is more reasonable. Lachapelle and Drouin (2011) looked at the pattern of neutral substitutions in the primate lineages. All Haplorrhini³ primates (e.g. humans, chimpanzees, macaques, gibbons, etc.) have a pseudogene with certain shared characteristics, including a number of identical substitutions. This suggests that the ancestor of all Haplorrhini primates contained the pseudogene, which must have arisen shortly after the split between Haplorrhini and Strepsirrhini (lemurs, galagos, etc.). According to the fossil record, the split occurred about 63 million years ago.
Lachapelle and Drouin calculated that the pseudogene must have arisen about 61 Mya based on the neutral substitution rate. The fact that these values are so close lends support to the idea that all Haplorrhini species are derived from a common ancestor that lost the GULO gene.
The authors also looked at specific deletions to see if the results are consistent with common ancestry. All of the primate pseudogenes are missing exons 1 and 2 of the intact, functional gene.⁴ They compared the sequences of the human, chimpanzee, and macaque genes to that of the galago gene. The result is shown in Figure 4 of their paper (see below).
Note that the large deletion of the two exons ("deletion") occurs at the same position in the human, chimpanzee, and macaque genomes. All three genomes also have two identical seven base pair indels in the upstream region preceding exon 1. This is evidence of common ancestry.
Lachapelle and Drouin were testing the hypothesis that large deletions in the GULO pseudogene were due to aberrant recombination between flanking transposable elements (TE). They mapped all surrounding transposons in the primate genes and concluded that TE's did not play a role in the deletion.
Tomkins discusses this paper in his creationist journal article. He ignores the evidence of common descent and focuses instead on the transposable elements. He points out that Lachapelle and Drouin failed to find evidence that TE's were responsible for the deletions. Here's what he wants his readers to conclude from a paper that strongly supports common descent ....
Despite the fact that TEs are apparently one of the main genomic drivers of deletion events in the genome, the researchers (Lachapelle and Drouin 2011) concluded that the lineage specific TE insertion patterns, which defied the standard inferred evolutionary model for primates, did not contribute to the loss of exons in the GULO gene. Thus, their evolutionary presuppositions caused the rejection of otherwise strong genomic data that implicated TE related unequal recombination at the GULO locus (resulting in exon deletion) that occurred in taxonomically restricted events.
I think it's disingenuous of Tomkins to focus on that aspect of the study while ignoring all the evidence for common descent.
The GULO pseudogene locus on human chromosome 8 is in a gene-rich region. Orthologous genes are present at the same site in all vertebrate species although the order of the surrounding genes has been repeatedly shuffled by microrearrangements (Yang, 2013).
The presence and order of the exons within the GULO gene/pseudogene in diverse vertebrates is consistent with several independent inactivations and descent from a common ancestor (Yang, 2013). One of them occurred in the primate lineage. All of the primate pseudogenes are missing exons 1 & 2 as well as exons 5, 7, and 10 as shown in the figure below.
The data is consistent with an ancestral pseudogene that was missing exons 1, 2, 5, 7, and 10. Exons 3 & 4 were subsequently lost in a separate event in the gibbon lineage. The orangutan and human pseudogenes are similar with respect to exon loss and the chimpanzee pseudogene is probably the same. (The 5′ region of the GULO pseudogene was not present in the chimp genome sequence.)
Tomkins doesn't discuss the evidence for the common ancestry of the primate pseudogenes and he doesn't try to explain the pattern according to a Young Earth Creationist worldview. Instead, he draws attention to another part of Yang's paper—the part where he documents the rearrangements of the genes surrounding the GULO locus. There's nothing unusual about such rearrangements. They are common between closely related species and even within a species. Over time, blocks of genes are shuffled and re-ordered so that distantly related vertebrates show very little synteny.
Tomkins thinks this is a serious problem for evolution ...
The GULO gene lies within a gene-dense region in all vertebrate genomes studied thus far (Yang 2013). Related to this fact is the evolutionary anomaly that the gene neighborhood surrounding the GULO locus is rearranged across the vertebrate spectrum of life, and the patterns cannot be readily resolved into the standard inferred evolutionary lineages (Yang 2013).
Once again, Tomkins is cherry-picking the data to focus on minor anomalies that don't fit with his strawman version of evolution. Once again, he ignores the much more important data in the same paper that supports an ancient origin of an ancestral GULO pseudogene.
Let me close by mentioning one other "anomaly" that Tomkins raises. He questions whether the functional rat gene is an appropriate standard of comparison. You might be amused by his logic ...
Traditionally, the human GULO pseudogene has been compared to the functional rat GULO gene (Nishikimi, Kawai, and Yagi 1992; Nishikimi et al. 1994; Ohta and Nishikimi 1999). According to the UCSC genome browser (genome.ucsc.edu) and the Rat Genome Database (rgd.mcw.edu), the rat GULO gene (chr15, region p12) is oriented and transcribed on the minus strand. Interestingly, the human and ape GULO pseudogenes are oriented in the plus strand configuration (chr8, region p21.2 in human). While the rat GULO gene may serve as a general guide to exon presence and absence in degraded GULO genes in other mammals, the rat GULO is clearly in a different chromosomal configuration (compared to humans and apes) and represents a unique design pattern specific to rodents (mouse GULO is on chr15, minus strand).
1. In this case, Young Earth Creationist.
2. For more information on the GULO pseudogene see ...
How do Intelligent Design Creationists deal with pseudogenes and false claims?
Junk & Jonathan: Part 8—Chapter 5
Human GULOP Pseudogene
3. Also spelled Haplorhini.
4. Lachapelle and Drouin include a short 5′ exon (#1) that isn't present in most species. It's likely an artifact. I renumbered the exons according to Yang (2013).
Helliwell, K.E., Wheeler, G.L., and Smith, A.G. (2013) Widespread decay of vitamin-related pathways: coincidence or consequence? Trends in Genetics, 29:469-478. [doi: 10.1016/j.tig.2013.03.003]
Lachapelle, M.Y., and Drouin, G. (2011) Inactivation dates of the human and guinea pig vitamin C genes. Genetica, 139:199-207. [doi: 10.1007/s10709-010-9537-x]
Yang, H. (2013) Conserved or lost: molecular evolution of the key gene GULO in vertebrate vitamin C biosynthesis. Biochemical Genetics, 51:413-425. [doi: 10.1007/s10528-013-9574-0]
Jonathan Kane recently (Oct. 6, 2017) posted an article on The Panda's Thumb where he claimed that Young Earth Creationists often don't get enough credit for raising serious issues about evolution [Five principles for arguing against creationism].
He mentioned some articles about pseudogenes as prime examples. I asked him for references and he responded with two articles by Jeffrey Tomkins that were published on the Answers in Genesis website. The first was on the β-globin pseudogene and the second was on the GULO pseudogene. Both articles claim that these DNA sequences aren't really pseudogenes because they have functions. I'll deal with the β-globin pseudogene in this post and the GULO pseudogene in a subsequent post.
Here's the article ....
The Human Beta-Globin Pseudogene is Non-Variable and Functional
Jeffrey P. Tomkins, Institute for Creation Research, Dallas, TX, USA
Answers Research Journal
Abstract: One of the iconic (yet enigmatic) arguments for human-ape common ancestry has been the β-globin pseudogene (HBBP1). Evolutionists originally speculated that apparent mutations in HBBP1 were shared mutational mistakes derived from a human-chimpanzee common ancestor. However, others noted that if the gene was indeed non-functional, then it should have mutated markedly in the past 3 to 6 million years of human evolution due to a lack of selective constraint on the region. Recent research confirms that the HBBP1 region of the 6-gene β-globulin cluster is highly non-variable compared to the other β-globin genes based on large-scale DNA diversity assessment within both humans and chimpanzees. Highlighting the lack of HBBP1 sequence variability is genetic data from three different reports that link point mutations in the HBBP1 gene with β-thalassemia disease pathologies. Biochemical evidence for functionality is indicated by multiple categories of functional genomics data showing that the HBBP1 gene is transcriptionally active and a key interactive component of the β-globin gene network. In brief, the HBBP1 gene encodes two consensus regulatory RNAs that are alternatively transcribed and/or post-transcriptionally spliced. This functional complexity produces at least 16 different exon variant transcripts and 42 different intron variant transcripts. Two major regulatory regions in the HBBP1 locus contain active transcription factor binding sites that overlap multiple categorical regions of epigenetic data for functionally active chromatin. The HBBP1 gene also has the most regulatory associations with active and open chromatin within the entire β-globin cluster and its transcripts are expressed in at least 251 different human cell and/or tissue types. Instead of being a useless genomic fossil according to evolutionary predictions, the HBBP1 gene appears to be a highly functional and cleverly integrated feature of the human genome that is intolerant of mutation.
Before addressing the specific criticisms in this article it's important to not lose sight of the bigger issue. Creationists tend to focus on particular examples while ignoring the big picture. In this case, there is abundant evidence of gene duplications in all species and there's abundant evidence that the fate of one duplicated copy of a gene is often to become inactivated rendering it a pseudogene. This has given rise to a robust explanation of multigene families referred to as Birth-and-Death Evolution [The Evolution of Gene Families] [On the evolution of duplicated genes: subfunctionalization vs neofunctionalization]. In order for Young Earth Creationists to mount a serious challenge to evolution they need to provide a better explanation for all this data and they need to provide solid evidence that the Earth is less than 10,000 years old.
There are about 15,000 pseudogenes of various kinds in the human genome. You can't challenge the big picture of pseudogenes and junk DNA by picking out one example and trying to prove it has a function. This will not refute evolution even if it turns out to be true that one particular stretch of DNA looks like a pseudogene but actually has a function. And it certainly won't be evidence of a Young Earth.
Now let's deal with the Tomkins article. Here's a diagram showing the pseudogene in the β-globin gene cluster in humans and chimps.
There's a pseudogene at this locus in most of the great apes, an observation that's consistent with a duplication event tens of millions of years ago followed by the loss of function of one of the copies. The pseudogene became fixed in the ancestral population and was passed down to all modern species. The rate at which most of the pseudogene sequence has accumulated base substitutions is consistent with the rate at which neutral mutations are fixed by random genetic drift. This indicates that most of the sequence is not under negative selection. As far as I know, creationists (especially Young Earth Creationists) haven't offered a reasonable explanation of this observation.
Tomkins' main point is that this stretch of DNA has a function so, presumably, the creator(s) copied this useful part of the genome and plugged it into one of the chromosomes as they were building each of the species. They didn't really care very much about the surrounding DNA so they didn't worry about copying it exactly. As it turns out, they introduced differences in the surrounding DNA so that the sequences of chimps and humans differ by about 2% and chimps and gorillas differ by about 4%. Humans and gorillas also differ by about 4%. The important point is that there are far fewer differences in the exons of the functional genes so they look "conserved" if you adopt an evolutionary perspective.
There's a stretch of DNA near the human β-globin pseudogene that has far fewer changes if you examine the chimp and human genomes. In evolutionary terms, it is "conserved." (It is reusable design if you are a Young Earth Creationist.) Tomkins quotes a paper by Moleirinho et al. (2013) documenting this conservation. The explanation is that the region between the γ-globin genes and the pseudogene is involved in regulating expression of the β-globin genes, probably because it contains a scaffold attachment site and associated sequences that regulate chromatin conformation. This role appears to have arisen shortly before the divergence of chimps and humans.
Here's what the sequence similarities look like on the UCSC Genome Browser. The degree of sequence similarity between the human genome and the genomes of chimps, gorillas, orangutans, and monkeys is shown as a histogram where the height of the bar indicates significant similarity. As you can see, the exons of the functional genes are conserved but the pseudogene sequence is not conserved. This is exactly what you expect if the pseudogene sequence is gradually drifting away from the ancestral gene that was functional right after the gene duplication event.
Much of the sequence surrounding the γ-globin genes is under selection, including a stretch that extends toward the pseudogene. This is the regulatory region that controls expression of the entire locus.
Thus, the evolutionary explanation is that a gene duplication occurred and one of the copies became a pseudogene. Subsequently, a region in the vicinity of the pseudogene acquired a new function involved in chromatin looping and regulation. That's why a large stretch of DNA near the γ-globin gene is conserved. I don't know how Tomkins explains the data other than just saying that the presence of function casts doubt on evolution.
Tomkins' other evidence for function relies on the ENCODE data. He notes that the pseudogene region is transcribed as part of the pervasive transcription noted by ENCODE. It also contains numerous transcription factor binding sites, DNase I sensitive regions, and histone markers. Some of this might be remnants of the original gene but most are just spurious events that occur throughout the genome in junk DNA. Sandwalk readers will be familiar with the idea that ENCODE data does not prove function.
John Harshman sent me his comparison of the β-globin region on the UCSC Genome Browser.
As you can see, the pseudogene region seems to be only slightly less conserved than the functional genes in this analysis. This isn't unexpected. The functional genes will drift apart over 100 million years by accumulating neutral mutations in the coding regions. The pseudogene arose about 65 million years ago in primate ancestors so it will have accumulated mutations at a faster rate since that time but not before. The difference in the primate lineage should amount to about 20% in that time.
When you compare the "conservation" of the various loci using an outgroup to the primate lineage, the pseudogene will only be about 20% less conserved than the functional genes. That's pretty much what you see in the figure.
When you do a binary comparison (e.g. chimp vs human), I'm assuming the algorithm subtracts the neutral mutation rate in order to calculate whether a sequence is conserved or not. Thus, in my figure, the pseudogene region only shows up as a small blip. This may be statistical error or a small bit of conserved sequence within the second exon.
That's how I interpret the results. Any help will be appreciated. If you know how to get % sequence similarity comparisons on this browser then please post that information in the comments or email me.
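One way to get a percent similarity figure without relying on the browser is to compute it directly from a pairwise alignment. The following is only a minimal sketch, not the method the browser uses. It assumes the two sequences have already been aligned by some other tool, with `-` marking gaps. The function name `percent_identity` and the two short fragments are made up for illustration; they are not real β-globin sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of ungapped alignment columns where both sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments (illustration only, not real data).
species_1 = "ATGGTGCACCTGACT-CTGAG"
species_2 = "ATGGTGCATCTGACTCCTGAG"
print(f"{percent_identity(species_1, species_2):.1f}% identical")
```

Gap columns are simply ignored here. Other conventions count gaps as mismatches, which gives a lower figure, so any number produced this way should be reported together with the convention used.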
Moleirinho, A., Seixas, S., Lopes, A.M., Bento, C., Prata, M.J., and Amorim, A. (2013) Evolutionary constraints in the β-globin cluster: the signature of purifying selection at the δ-globin (HBD) locus and its role in developmental gene regulation. Genome Biology and Evolution 5:559-571. [doi: 10.1093/gbe/evt029]
I'm working (slowly) on a book called What's in Your Genome?: 90% of your genome is junk! The first chapter is an introduction to genomes and DNA [What's in Your Genome? Chapter 1: Introducing Genomes]. Chapter 2 is an overview of the human genome. It's a summary of known functional sequences and known junk DNA [What's in Your Genome? Chapter 2: The Big Picture]. Chapter 3 defines "genes" and describes protein-coding genes and alternative splicing [What's in Your Genome? Chapter 3: What Is a Gene?]. Chapter 4 is all about pervasive transcription and genes for functional noncoding RNAs. I've finally got a respectable draft of this chapter. This is an updated summary; the first version is at: What's in Your Genome? Chapter 4: Pervasive Transcription.
Chapter 4: Pervasive Transcription
How much of the genome is transcribed?
The latest data indicates that about 90% of the human genome is transcribed if you combine all the data from all the cell types that have been analyzed. This is about the same percentage that was reported by ENCODE in their preliminary study back in 2007 and about the same percentage they reported in the 2012 papers. Most of the transcripts are present in less than one copy per cell. Most of them are only found in one or two cell types. Most of them are not conserved in other species.
How do we know about pervasive transcription?
There are several technologies that are capable of detecting all the transcripts in a cell. The most powerful is RNA-Seq, a technique that copies RNAs into cDNA then performs massively parallel sequencing ("next gen" sequencing) on all the cDNAs. The sequences are then matched back to the reference genome to see which parts of the genome were transcribed. The technique is capable of detecting concentrations of less than one transcript per cell.
Different kinds of noncoding RNAs
There are ribosomal RNAs, tRNAs, and a variety of unique RNAs like those that are part of RNase P, signal recognition particle, etc. In addition, there are six main classes of other noncoding RNAs in humans: small nuclear RNAs (snRNAs); small nucleolar RNAs (snoRNAs); microRNAs (miRNAs); short interfering RNAs (siRNAs); PIWI-interacting RNAs (piRNAs); and long noncoding RNAs (lncRNAs). There are many proven examples of functional RNAs in each of the main classes but there are also large numbers of putative members that may or may not be true functional noncoding RNAs.
Box 4-1: Long noncoding RNAs (lncRNAs)
There are more than 100,000 transcripts identified as lncRNAs. Nobody knows how many of these are actually real functional lncRNAs and how many are just spurious transcripts. The best analyses suggest that fewer than 20,000 meet the minimum criteria for function and probably only a fraction of these are actually functional.
Understanding transcription
It's important to understand that transcription is an inherently messy process. Regulatory proteins and RNA polymerase initiation complexes will bind to thousands of sites in the human genome that have nothing to do with transcription of nearby genes.
Box 4-2: Revisiting the Central Dogma
Many scientists and journalists believe that the discovery of massive numbers of noncoding RNAs overthrows the Central Dogma of Molecular Biology. They are wrong.
Box 4-3: John Mattick proves his hypothesis?
John Mattick claims that the human genome produces tens of thousands of regulatory RNAs that are responsible for fine-tuning the expression of the protein-coding genes. He was given the 2012 Chen Award by the Human Genome Organization for "proving his hypothesis over the course of 18 years." He has not proven his hypothesis.
Antisense transcription
Some transcripts are complementary to the coding strand in protein-coding genes. This is consistent with spurious transcription to yield junk RNA but many workers have suggested functional roles for most of these antisense RNAs.
What the scientific papers don't tell you
There are hundreds of scientific papers devoted to proving that most newly-discovered noncoding RNAs have a biological function. What they don't tell you is that most of these transcripts are present in concentrations that are inconsistent with function (<1 molecule per cell). They also don't tell you that conservation is the best measure of function and these transcripts are (mostly) not conserved. More importantly, the majority of these papers don't even mention the possibility that these transcripts could be junk RNA produced by spurious transcription. That's a serious omission: it means that science writers who report on this work are unaware of the controversy.
On the origin of new genes
Some scientists are willing to concede that most transcripts are just noise but they claim this is an adaptation for future evolution. The idea here is that the presence of these transcripts makes it easier to evolve new protein-coding genes. While it's true that such genes could evolve more readily in a genome full of noise and junk, this cannot be a reason for such a sloppy genome.
How do you determine function?
The best way to determine function is to take a single transcript and show that it has a demonstrable function. If you take a genomics approach, then the best way to narrow down the list is to concentrate on those transcripts that are present in sufficient concentrations and are conserved in related species. In the absence of evidence, the null hypothesis is junk.
Biochemistry is messy
We're used to the idea that errors in DNA replication give rise to mutations and mutations drive evolution. We're less used to the idea that all other biochemical processes have much higher error rates. This is true of highly specific enzymes and it's even more true of complex processes like transcription, RNA processing (splicing), and translation. The idea that transcription errors could give rise to spurious transcripts in large genomes is perfectly consistent with everything we know about such processes. In fact, it's inevitable that spurious transcripts will be common in such genomes.
Box 4-4: The random genome project
Sean Eddy has proposed an experiment to establish a baseline level of spurious transcripts and to demonstrate that the null hypothesis is the best explanation for the majority of transcripts. He suggests that scientists construct a synthetic chromosome of random DNA sequences and insert it into a human cell line. The next step is to perform an ENCODE project on this DNA. He predicts that the methods will detect hundreds of transcription factor binding sites and transcripts.
Change your worldview
There are two ways of looking at biochemical processes within cells. The first imagines that everything has a function and cells are as fine-tuned and functional as a Swiss watch. The second imagines that biochemical processes are just good enough to do the job and there's lots of mistakes and sloppiness. The first worldview is inconsistent with the evidence. The second worldview is consistent with the evidence. If you are one of those people who think that cells and genomes are the products of adaptive excellence then it's time to change your worldview.
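The logic behind the random genome project in Box 4-4 can be illustrated with a rough numerical sketch. It generates random DNA and counts how often one short sequence motif turns up purely by chance, then compares that count with the expected number, (L − k + 1) / 4^k for a motif of length k in a sequence of length L. This is only a toy model under stated assumptions: equal base frequencies, exact matches on one strand, and an arbitrary 7 bp motif chosen for illustration. Real transcription factor binding depends on much more than an exact sequence match, and this is not the experiment Eddy proposed. The function names are made up for this example.

```python
import random


def count_motif(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def expected_hits(length: int, motif_len: int) -> float:
    """Expected chance matches of one fixed motif in random DNA
    with equal base frequencies (one strand only)."""
    return (length - motif_len + 1) / 4 ** motif_len


rng = random.Random(42)      # fixed seed so the run is repeatable
length = 1_000_000           # one million base pairs of random DNA
random_dna = "".join(rng.choices("ACGT", k=length))
motif = "TGACTCA"            # arbitrary 7 bp motif, for illustration only

observed = count_motif(random_dna, motif)
print(f"observed: {observed}, expected: {expected_hits(length, len(motif)):.1f}")
```

Under the same assumptions, a genome of 3.2 billion base pairs would contain roughly 200,000 chance matches to any single 7 bp motif on one strand, which is why binding sites detected by sequence alone are a poor indicator of function.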
This is a podcast from Cold Spring Harbor [Dark Matter of the Genome, Pt. 1 (Base Pairs Episode 8)]. The authors try to convince us that most of the genome is mysterious "dark matter," not junk. The main theme is that the genome contains transposons that could play an important role in evolution and disease.
There's much value in research on ALS but does it have to be coupled with an incorrect view of our genome? How many errors can you recognize in this podcast? Keep in mind that this is sponsored by one of the leading labs in the world.
Here's a few facts.
- A gene is a DNA sequence that's transcribed. There are about 20,000 protein-coding genes and they cover about 25% of the genome (including introns). It's false to say that genes only occupy 2% of the genome. In addition to protein-coding genes, there are about 5,000 noncoding genes that take up about 5% of the genome. Most of them have been known for decades.
- It has been known for many decades that the human genome has no more than 30,000 genes. This fact was known by knowledgeable scientists long before the human genome sequence was published.
- It has been known for decades that about 50% of our genome is composed of defective bits and pieces of once-active transposons. Thus, most of our genome looks like junk and behaves like junk. It is not some mysterious "dark matter." (The podcast actually says that 50% of our genome is defective transposons but they claim this is a recent discovery and it's not junk.)
- The evidence for junk DNA comes from many different sources. It's not a mystery. It's really junk DNA. The term "junk DNA" was not created to disguise our ignorance of what's in your genome.
- In addition to genes, there are lots of other functional regions of the genome. No knowledgeable scientists ever thought that the only functional parts of the genome were the exons of protein-coding genes.
Most of the genome is not genes, but another form of genetic information that has come to be known as the genome’s “dark matter.” In this episode, we explore how studying this unfamiliar territory could help scientists understand diseases such as ALS.