Press release from the Francis Crick Institute misrepresents junk DNA

Press releases have become a serious problem. I'm frequently upset whenever I read a press release covering a field I'm familiar with. They rarely do a good job of explaining what's actually in the paper and putting it into the proper context. The people who write press releases are more concerned with sensationalizing the work than they are with teaching the general public about how science works. They often do this with the blessing and participation of the scientists who did the work.

Let me illustrate the problem using a recent example from the Francis Crick Institute in London, UK [Non-coding DNA changes the genitals you're born with]. The press release covers a recent Science paper from the Lovell-Badge lab ...
Gonen, N., Futtner, C.R., Wood, S., Garcia-Moreno, S.A., Salamone, I.M., Samson, S.C., Sekido, R., Poulat, F., Maatouk, D.M., and Lovell-Badge, R. (2018) Sex reversal following deletion of a single distal enhancer of Sox9. Science. [doi: 10.1126/science.aas9408]
These workers discovered and characterized a regulatory region upstream of the mouse Sox9 gene. The Sox9 gene controls the development of the testis, and deletion of the regulatory region reduces the level of Sox9 expression, leading to XY individuals that are phenotypically female.

We have known about regulatory DNA for more than 50 years so the paper doesn't make any contribution to our general understanding of transcriptional regulation. In fact, it fits right in with decades of work on enhancers, promoters, and transcription factors.

You wouldn't know that from reading the press release. Even the title of the article (Non-coding DNA changes the genitals you're born with) suggests that there's something unusual about noncoding DNA that has a function. This point is highlighted in the press release ...
Only 2% of human DNA contains the 'code' to produce proteins, key building blocks of life. The remaining 98% is 'non-coding' and was once thought to be unnecessary 'junk' DNA, but there is increasing evidence that it can play important roles.
This is 2018. Isn't it about time that science writers stopped spreading this fake news? There was never a time when knowledgeable scientists thought that all noncoding DNA was junk. Never.

Furthermore, we've had a pretty good understanding of regulatory DNA since the early 1980s. Think about what that means. It means that "increasing evidence" is a misleading way of saying that the basic facts have been known and understood for more than thirty years. Forty years ago you might have gotten away with saying that there's "increasing evidence" that noncoding DNA has a function, but not today.

Is this just sloppy science written by an employee who really doesn't understand the history of gene expression and genome composition? No, it isn't just ignorance on the part of the press office because they have the support of the lead author on the paper, a postdoc named Nitzan Gonen. She is quoted in the press release ...
Dr Nitzan Gonen, first author of the paper and postdoc at the Crick, says: "Typically, lots of enhancer regions work together to boost gene expression, with no one enhancer having a massive effect. We identified four enhancers in our study but were really surprised to find that a single enhancer by itself was capable of controlling something as significant as sex."

"Our study also highlights the important role of what some still refer to as 'junk' DNA, which makes up 98% of our genome. If a single enhancer can have this impact on sex determination, other non-coding regions might have similarly drastic effects. For decades, researchers have looked for genes that cause disorders of sex development but we haven't been able to find the genetic cause for over half of them. Our latest study suggests that many answers could lie in the non-coding regions, which we will now investigate further."
Here's a better way of explaining the significance of this paper.
The opening sentence of the paper says, "The regulation of genes with important roles in embryonic development can be complex, involving multiple, often redundant enhancers, repressors, and insulators."

These regulatory elements are usually found near the genes they regulate and they represent an important part of the genome. This study identifies a regulatory element that controls the Sox9 gene in mice. Defects in regulatory elements are known to cause genetic disorders and it has long been suspected that disorders of sex development are also due to mutations in regulatory elements. This study identifies an important regulatory element that controls sex development and demonstrates that mutations in this element cause sex development disorders.
There's no mention of "noncoding DNA" in the paper and no mention of junk DNA. That's because nobody is surprised to find regulatory elements that aren't in coding exons. Nobody who reads the paper is going to be surprised to learn that noncoding DNA has a function even though they understand that 90% of our genome is junk. Why can't the press release make this clear to the general reader? Why can't the authors make sure the press release accurately represents the published report?

Is lateral gene transfer (LGT) Lamarckian?

There's an interesting discussion going on about lateral gene transfer (LGT) in eukaryotes. LGT is the process by which DNA from one species invades the genome of another species. It was apparently very common among primitive bacteria several billion years ago and it's still quite common in modern bacteria.

There are many reports of LGT in eukaryotes but some of them seem to be due to contamination from bacteria rather than true LGT. Many scientists are skeptical of these reports; notably Bill Martin (Heinrich Heine Universität, Düsseldorf, Germany) who suggests that almost all of them are artifacts and lateral gene transfer in eukaryotes is extremely rare [see Lateral gene transfer in eukaryotes - where's the evidence?].

Andrew J. Roger studies deep evolution at Dalhousie University in Halifax, Nova Scotia, Canada. He has vigorously defended the existence of LGT in eukaryotes (see his comments in my earlier post).

In addition to this debate about the existence of LGT in eukaryotes, there's a discussion about whether lateral gene transfer in eukaryotes, if it exists, is a fundamentally Lamarckian process. The latest exchange took place with two letters published in the May issue of Nature Ecology & Evolution.
Eukaryote lateral gene transfer is Lamarckian, William F. Martin
[doi: 10.1038/s41559-018-0521-7]

Reply to 'Eukaryote lateral gene transfer is Lamarckian,' Andrew J. Roger
[doi: 10.1038/s41559-018-0522-6]
Martin is mostly concerned about adaptationist claims of LGT in eukaryotes. He references a number of papers that make such claims; for example, Hirooka et al. (2017) claim that the green alga, Chlamydomonas eustigma, recently adapted to an acidic environment by taking up genes from bacteria or other eukaryotes. These genes are not present in related species that have not adapted to an acidic environment. Here's how Martin describes such studies.
The core of eukaryotic LGT adaptation claims is that eukaryotes lack the genetic material required to survive in particular environments and acquire the genes needed in order to access those environments from those that already live there. Lamarckian? Yes. In eukaryote LGT adaptationism, the environment is the source of natural variation, not the evolving organism itself.
I see Martin's objection as essentially an attack on naive adaptationism invoking LGT as a mechanism for adaptive change. The fact that many of those claims of LGT are probably false is only part of the problem. The other part is the adaptationist claim to justify recent and abundant LGT.

It's true that there are Lamarckian characteristics behind these (probably false) claims. However, that's probably not the best way to criticize the hyper-adaptationism that's associated with false conclusions about LGT in eukaryotes. In addition, Martin clearly goes too far when he implies that all eukaryotic LGT is Lamarckian and false.

Andrew Roger takes up the challenge implied by Martin's exaggeration. The gist of his argument can be found in the first two sentences of his letter ...
Martin argues here and elsewhere that nearly all claims of lateral gene transfer (LGT) into eukaryotic genomes are untrue, and that accompanying narratives are fundamentally 'Lamarckian.' Some eukaryotic claims have proven false, but this does not mean that most are. Although rare, gene transfers have had a profound effect on the evolution of traits in eukaryotes.
He goes on to explain the "proper" view of LGT.
Chunks of DNA are accidentally incorporated into chromosomes creating genetic variation that is neutral, deleterious or, in rare cases, beneficial. If they enhance fitness, acquired genes are likely to be fixed in the population by natural selection. Any reasonable adaptive LGT claim has a similar etiological narrative that respects modern evolutionary principles.
Here's the problem. Martin knows the proper role of LGT but that's not what he was criticizing. He was criticizing many "unreasonable" adaptive LGT claims but he went too far by implying that all claims were of this type.

The real issue here is that a great many claims of LGT in eukaryotes are probably false—I suspect that most are false. We should not let bickering over Lamarck obscure that fact. I wish Bill Martin had not raised the issue about Lamarckian evolution.
The world is not inhabited exclusively by fools and when a subject arouses intense interest and debate, as this one has, something other than semantics is usually at stake.

Stephen Jay Gould (1982)

As Stephen Jay Gould once said, when scientists squabble over semantics, there's usually something more at stake. In this case, it's the origin of basic metabolic processes. Martin is one of a group of scientists who propose that primitive eukaryotes were facultative anaerobes. They were capable of growing and reproducing in the presence of oxygen and in its absence. They acquired this capability because the primitive mitochondrial endosymbiont had all of the enzymes necessary for both types of metabolism. In some lineages, the ability to carry out anaerobic metabolism has been lost. This hypothesis is sometimes called the "hydrogen hypothesis" because an important terminal electron acceptor is protons that can be reduced to form hydrogen.

Here's how Müller et al. (2012) explain the controversy over the origin of anaerobic metabolism in eukaryotes (e.g. protists).
For the origin of anaerobic energy metabolism in protists, the question is, Were the genes present in the single eukaryote common ancestor, or do they clearly reflect multiple origins, and if the former is true, does their single origin coincide with the origin of mitochondria? This has in turn given rise to two main competing alternative hypotheses for the origin of anaerobic energy metabolism in protists: (i) the enzymes were present in the eukaryote ancestor and were inherited vertically by modern groups, or (ii) they were lacking in the eukaryote ancestor (which would then implicitly have been a strict aerobe) and were acquired in different eukaryotes groups independently via lateral gene transfers (LGTs). Those views generated very different predictions with regard to the evolutionary patterns of the underlying genes.
The authors, including Bill Martin, conclude that the enzymes were present in the early mitochondrial ancestor although they don't preclude that 1-2% of the genes could have been acquired by LGT.

Andrew Roger has proposed that many of the genes required for anaerobic metabolism were acquired by LGT after the initial symbiotic event (Hug et al., 2010).

Thus, the two participants in the exchange of letters are on opposite sides of the bigger debate on the origin of genes for anaerobic metabolism in eukaryotes. Bill Martin favors the hydrogen hypothesis and the idea that the primitive bacterium giving rise to mitochondria was a facultative anaerobe carrying the genes necessary for anaerobic metabolism. These genes have been lost in many eukaryotes but the core genes all descend from a common ancestor. Andrew Roger is on the side of those who argue that anaerobic metabolism arose independently by LGT in many eukaryotic lineages. This is why he ends his letter in Nature Ecology & Evolution with ...
So why such resistance to LGT in eukaryotes? Endosymbiotic organelle origins and endosymbiotic gene transfer have been championed as dominant mechanisms in eukaryotic gene evolution. Indeed, the widely publicized 'hydrogen hypothesis' of eukaryogenesis depends heavily on assuming a mitochondrial ancestry of 'bacterial-like' enzymes of anaerobic energy metabolism in eukaryotes. Acknowledging LGT as an important mechanism provides an alternative explanation for such patchily distributed genes in eukaryotes that do not show the hallmarks of mitochondrial or plastid origin.
Keep in mind that if Andrew Roger is correct about how laterally transferred genes are eventually fixed in a lineage, then LGT must be very common because most events will not give rise to an adaptive advantage. They will be neutral or disadvantageous. Bill Martin believes that most claims of LGT in eukaryotes are false (he's probably right about this) and that LGT must be quite rare. If Bill Martin is correct then it's unlikely that LGT can account for all the examples of anaerobic metabolism in eukaryotes.

I'm not a big fan of either explanation for anaerobic metabolism. The idea that it's the primitive condition that has been lost in many lineages seems a bit far-fetched given the patchy distribution in eukaryotes. On the other hand, using LGT to explain this patchy distribution of fundamentally similar enzyme activities seems equally unlikely.

The various tests of these hypotheses rely on sophisticated analyses of sequences that diverged billions of years ago. Both of these men (Roger and Martin) are experts in this field but they are pushing the boundaries of the field using algorithms that are incomprehensible to the average scientist.

Photos: The first photo is of Bill Martin and me having coffee at Tim Hortons in Toronto last year. The second one is Andrew Roger and me at the "Tree of Life" meeting in Halifax in 2009.

Hug, L.A., Stechmann, A., and Roger, A.J. (2010) Phylogenetic distributions and histories of proteins involved in anaerobic pyruvate metabolism in eukaryotes. Molecular Biology and Evolution, 27:311-324. [doi: 10.1093/molbev/msp237]

Müller, M., Mentel, M., van Hellemond, J.J., Henze, K., Woehle, C., Gould, S.B., Yu, R.-Y., van der Giezen, M., Tielens, A.G., and Martin, W.F. (2012) Biochemistry and evolution of anaerobic energy metabolism in eukaryotes. Microbiology and Molecular Biology Reviews, 76:444-495. [doi: 10.1128/MMBR.05024-11]

Fixing carbon by reversing the citric acid cycle

The citric acid cycle1 is usually taught as depicted in the diagram on the right.2 A four-carbon compound called oxaloacetate is joined to a two-carbon compound called acetyl-CoA to produce a six-carbon tricarboxylic acid called citrate. In subsequent reactions, two carbons are released in the form of carbon dioxide to regenerate the original oxaloacetate. The cycle then repeats. The reactions produce one ATP equivalent (ATP or GTP), three NADH molecules, and one QH2 molecule.

The GTP/ATP molecule and the reduced coenzymes (NADH and QH2) are used up in a variety of other reactions. In the case of NADH and QH2, one of the many pathways to oxidation is the membrane-associated electron transport system that creates a proton gradient across a membrane. The electron transport complexes are buried in membranes—plasma and internal membranes in bacteria and the inner mitochondrial membrane in eukaryotes. Students are often taught that this is the only fate of NADH and QH2 but that's not true.

One of the other common misconceptions is that the citric acid cycle runs exclusively in one direction; namely, the direction shown in the diagram. That's also not true. The reactions of the citric acid cycle are near-equilibrium reactions like most reactions in the cell. What this means is that the concentrations of the reactants and products are close to the equilibrium values so that a slight increase in one of them will lead to a rapid equilibration. The reactions can run in either direction.3

Furthermore, the citric acid cycle does not exist in isolation. Many of the intermediates are also intermediates in other reactions such as the synthesis and degradation of amino acids and fatty acids, to mention just two possibilities. The best biochemistry courses will make sure that students understand this and make sure they understand that all these molecules exist in a "soup" inside the cell where the fate of any one molecule depends on its concentration and the concentration of surrounding molecules. The very best courses will explain that many species of bacteria have some of the enzymes but can't make a "cycle" because one of the key enzymes is missing. This leads to an understanding of how such an irreducibly complex pathway could have evolved [The evolution of the citric acid cycle].

Sorry for the long-winded introduction. What I want to tell you about is a couple of papers that were recently published in Science along with a summary in the news section of the journal.
Ragsdale, S. W. (2018) Stealth reactions driving carbon fixation. Science 359:517-518. [doi: 10.1126/science.aar6329]

Nunoura, T., Chikaraishi, Y., Izaki, R., Suwa, T., Sato, T., Harada, T., Mori, K., Kato, Y., Miyazaki, M., Shimamura, S., Yanagawa, K., Shuto, A., Ohkouchi, N., Fujita, N., Takaki, Y., Atomi, H., and Takai, K. (2018) A primordial and reversible TCA cycle in a facultatively chemolithoautotrophic thermophile. Science 359:559-563. [doi: 10.1126/science.aao3407]

Mall, A., Sobotta, J., Huber, C., Tschirner, C., Kowarschik, S., Bačnik, K., Mergelsberg, M., Boll, M., Hügler, M., Eisenreich, W., and Berg, I.A. (2018) Reversibility of citrate synthase allows autotrophic growth of a thermophilic bacterium. Science 359:563-567. [doi: 10.1126/science.aao2410]
The papers report solid evidence that carbon dioxide can be fixed in two different species of bacteria by reversing the citric acid cycle. This shouldn't come as a big surprise given everything that I said in the first part of the post. If all of the reactions are really near-equilibrium reactions then the enzymes can catalyze reactions in either direction. This is what I taught students in introductory biochemistry and it's what's in my textbook.

So, why do these papers deserve to be published in one of the most prestigious journals? It's because most biochemists have a very different view of biochemistry than the one I described. That view is false, in my opinion, but it leads to the conclusion that reversal of the citric acid cycle is impossible. That's why the two papers seem so revolutionary. They "refute" a concept that my students already knew was false!

Most biochemists think that some reactions are irreversible because of unfavorable thermodynamics. One of these reactions is catalyzed by citrate synthase, the enzyme that interconverts oxaloacetate and citrate in the citric acid cycle [EC].
acetyl-CoA + H2O + oxaloacetate = citrate + HS-CoA + H+
The standard Gibbs free energy change for this reaction in the direction written above is about −36 kJ/mol (ΔG°′ = −36 kJ/mol). This is a very big number, one that's normally associated with reactions such as the hydrolysis of ATP to ADP + Pi. Scientists such as Stephen Ragsdale, the author of the news article, hold the view that such reactions are so thermodynamically favorable that the reverse reaction was thought to be impossible. That's why the papers are thought to be so important. They challenge the prevailing (false) view that standard Gibbs free energy changes determine whether reactions are "exergonic" (release energy) or "endergonic" (absorb energy).

This is why many biochemists will be surprised to discover that a reaction with what was thought to be a highly favorable Gibbs free energy change can be reversible. They shouldn't be surprised because the actual free energy change inside the cell is close to zero: the reaction is near equilibrium. To those who understand this, it's no surprise that the reaction is readily reversible.
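Here's a minimal sketch of the arithmetic behind that distinction, using the textbook relation ΔG = ΔG°′ + RT ln Q, where Q is the mass-action ratio (products over reactants, under biochemical standard-state conventions). The only number taken from this post is ΔG°′ = −36 kJ/mol for citrate synthase. The temperature of 25 °C, the function names, and the sample values of Q are assumptions chosen for illustration; they are not measurements from the papers, and the bacteria in those papers grow at much higher temperatures.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)

def actual_delta_g(delta_g_standard, q, temp_k=298.15):
    """Actual free energy change (kJ/mol) at mass-action ratio q.

    Implements delta_G = delta_G_standard + R*T*ln(Q).
    temp_k defaults to 25 C, an assumption made only for illustration.
    """
    return delta_g_standard + R * temp_k * math.log(q)

def equilibrium_constant(delta_g_standard, temp_k=298.15):
    """Mass-action ratio at which the actual free energy change is zero."""
    return math.exp(-delta_g_standard / (R * temp_k))

# Standard free energy change for citrate synthase, as quoted in the post.
DELTA_G_STANDARD = -36.0  # kJ/mol

k_eq = equilibrium_constant(DELTA_G_STANDARD)
print(f"Keq = {k_eq:.3g}")

# Hypothetical mass-action ratios: standard conditions (Q = 1),
# slightly below equilibrium, at equilibrium, slightly above equilibrium.
for q in (1.0, k_eq / 10, k_eq, k_eq * 10):
    print(f"Q = {q:.3g}  ->  delta G = {actual_delta_g(DELTA_G_STANDARD, q):+.2f} kJ/mol")
```

The point of the sketch is that the large negative standard value fixes only the equilibrium constant (on the order of a million under these assumptions). When Q is close to that value, the actual free energy change is close to zero, and a modest shift in concentrations is enough to change its sign and reverse the net direction of the reaction.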

(I need to acknowledge one of the earlier co-authors on my textbook, Ray Ochs of St. John's University in Queens, New York, USA. He taught me how to understand the difference between standard Gibbs free energy changes and the real free energy changes that take place inside cells.)

1. Otherwise known as the tricarboxylic acid (TCA) cycle or the Krebs cycle.

2. Image Credit: Moran, L.A., Horton, H.R., Scrimgeour, K.G., and Perry, M.D. (2012) Principles of Biochemistry 5th ed., Pearson Education Inc. page 391 and page 409. © Pearson/Prentice Hall

3. You would think that every single biochemistry course and textbook would at least get all the reactions correct. Unfortunately, this is not the case. Most teachers and most textbook authors have copied errors that were introduced decades ago. If you search the web you will find that it's almost impossible to find a site where all the reactions are depicted correctly [Biochemistry on the Web: The Citric Acid Cycle].

Philosophers talking about genes

It's important to define what you mean when you use the word "gene." I use the molecular definition since most of what I write refers to DNA sequences. There's no perfect definition but, for most purposes, a good working definition is: A gene is a DNA sequence that is transcribed to produce a functional product. [What Is a Gene?].

There are two types of genes: protein-coding genes and those that specify a functional noncoding RNA (e.g., ribosomal RNA, lincRNA). The gene is the part of the DNA that's transcribed so it includes introns. Transcription is controlled by regulatory sequences such as promoters, operators, and enhancers but these are not part of the gene.

In addition to genes, there are many other functional parts of the genome. In the case of eukaryotic genomes, these include centromeres, telomeres, origins of replication, SARs, and some other bits. None of this is new ... these functions have been known for decades and the working definition I use has been common among knowledgeable experts for half-a-century. Scientists know what they are talking about when they say that the human genome contains about 20,000 protein-coding genes and at least 5,000 genes for non-coding RNAs. They are comfortable with the idea that our genome has lots of other functional regions that lie outside of the genes.

Non-experts may not be familiar with the topic and they may have many misconceptions about genes and DNA sequences but we don't base our science on the views of non-experts.

Because of my interest in this topic, I was intrigued by the title of a new book, The Gene: from Genetics to Postgenomics. I ordered it as soon as I heard about it and I've just finished reading it. The version I read has been translated from German by Adam Bostanci.

The authors are Hans-Jörg Rheinberger of the Max Planck Institute for the History of Science in Berlin, Germany, and Staffan Müller-Wille of the Centre for the Study of the Life Sciences at the University of Exeter, UK. They are philosophers. They have two goals in mind: (1) to cover the history of the gene concept, and (2) to demonstrate that recent discoveries have radically undermined the concept of a gene.

... those ignorant of history are not condemned to repeat it; they are merely destined to be confused.

Stephen Jay Gould
Ontogeny and Phylogeny (1977)
They have only partially achieved the first goal. They recognize that the word "gene" can be used in many different contexts. In the first half of the twentieth century it referred almost exclusively to a unit of heredity or a unit of selection (or, more correctly, a unit of evolution). With the recognition that DNA was the genetic material, the word "gene" took on an additional meaning as a physical unit of function. In other words, it acquired a physical form in contrast to the nebulous genetic meaning of the word. This is the molecular gene. It's at this point in their book that the authors lose their way. They never give us a molecular definition. I suspect they are thinking of a gene as coding sequences but you have to struggle to interpret their view of the molecular definition. They talk about "structural genes" and imply that the discovery of "regulatory genes" altered our concept of the gene but these terms were never used by experts in the way that the authors imagine (p. 66).

The authors never discuss the definition I prefer. It's not clear they have even considered it since they rely on the work of other philosophers who have also ignored it [see Debating philosophers: The molecular gene].

The problem with this part of the book (the part about the molecular gene) is that the authors seem to be confused about the difference between a molecular gene and the view that "genes" are the only thing that count in genetics, evolution, metabolism etc. They seem to think that the gene-centric view requires that everything be attributed to DNA sequences that encode proteins. Thus, when they recognize that important functional elements exist outside of genes, they conclude that the gene-centric view is fatally flawed. This leads us to their second goal where they try to convince us that the definition of "gene" is fatally flawed because genes aren't the only things that play an important role in genetics.

They fail in this goal because they are arguing against a strawman version of biology that no experts believe in.

This seems to be a common problem among philosophers. They refuse to use critical thinking to unravel the meaning of the molecular gene, a meaning that is really quite simple even though it's not perfect. Then they confuse themselves by thinking that knowledgeable experts use the word "gene" as a synonym for all functional sequences in the genome. Finally, they misunderstand the term "gene-centric" where the word "gene" is used metaphorically to refer to any DNA sequence that functions in population genetics and evolution. (Philosophers also tend to greatly over-estimate the influence of Richard Dawkins and the selfish gene.)

The book contains all the usual misconceptions that come from reading the uninformed literature and assuming it represents the views of experts. Here's a short list of views that have been effectively challenged, and sometimes refuted, in the scientific literature ...
  1. Scientists were surprised that the human genome didn't contain 100,000 genes or more (p. 84)
  2. Crick's sequence hypothesis is no longer valid (p. 68)
  3. junk DNA is just a term used to describe DNA of no known function (p. 69)
  4. alternative splicing means that most genes can make many different proteins (pp. 70, 84, 107)
  5. evolutionary-developmental biology (evo-devo) threatens our understanding of the gene concept (pp. 88, 94-98)
  6. the ENCODE results have transformed our understanding of genes and genomes (p. 91)
  7. "the existence of epigenetic systems of inheritance poses the greatest challenge for the classical molecular gene concept" (p. 92)
  8. the discovery of Lamarckian inheritance casts doubt on the central dogma of molecular genetics (p. 92)
  9. plasticity is a problem (p. 98)
  10. 98% of the genome was thought to be junk but, thanks to ENCODE, we now know that it's full of regulatory elements (pp. 104-105)
In addition to this list of the usual misunderstandings and misconceptions, the authors have come up with two others that are quite novel. I'll quote directly from page 84 and let you see for yourselves ...
[There are] ... two further unexpected results of the genome project that complemented each other but also pointed in opposite directions. First, comparisons of the human genome with those of other primates revealed a surprisingly high degree of sequence conservation. Given remarkable differences in the physical constitution of these closest relatives of Homo sapiens, in particular differences in the so-called higher, mental faculties as a consequence of several million years of evolution, this degree of genomic affinity was astonishing. Major changes in the phenotype were apparently compatible with relatively minor changes in the genotype. The second surprising finding was that the genomes of different human individuals exhibit considerable differences. This genetic polymorphism was not, however, necessarily accompanied by correspondingly pronounced phenotypic differences.

Observations of this kind presented a serious challenge for gene-centrism and prompted proponents of the big genome projects to herald the dawn of an age of "postgenomics" in which the whole cell and the whole organism would move into the limelight.1
A little learning is a dangerous thing;
drink deep, or taste not the Pierian spring:
there shallow draughts intoxicate the brain,
and drinking largely sobers us again.

Alexander Pope
This book was published in 2017. It was revised and updated at that time. The scientific literature is full of debate and discussion about the topics covered here but you won't find any mention of controversy in this book. This can't be blamed exclusively on philosophers since there are many scientists who also ignore the controversies over junk DNA, alternative splicing, evolutionary theory, epigenetics etc. Like Rheinberger and Müller-Wille, they are content with promoting only one side of the story—the one that corresponds to their biases. Perhaps one should expect better critical thinking from philosophers?

There's one way in which this book differs from similar books written by scientists [see Human genome books]. Whereas scientists tend to quote scientific papers, Rheinberger and Müller-Wille rely heavily on the views of other philosophers. I get the distinct impression that almost all philosophers of science have reached the same conclusions and they support those (mostly false) conclusions by referencing each other instead of going back to the scientific literature [see When philosophers talk about genomes] [Debating philosophers: The Lu and Bourrat paper].

The views in this book are remarkably similar to those of Evelyn Fox Keller who is a Professor Emerita in the History and Philosophy of Science at the Massachusetts Institute of Technology in Cambridge, Massachusetts, USA. I have already commented on one of her articles, "The Postgenomic Genome," in a previous post [When philosophers talk about genomes]. She is quoted several times in this book and her misconceptions are the same as those expressed by Rheinberger and Müller-Wille. You should follow the link to see what she says about genes and junk DNA in order to see for yourselves how badly modern philosophers have misinterpreted the science.

1. If you don't immediately see what's wrong with these arguments then ask a question in the comments.

Required reading for the junk DNA debate

This is a list of scientific papers on junk DNA that you need to read (and understand) in order to participate in the junk DNA debate. It's not a comprehensive list because it's mostly papers that defend junk DNA and refute arguments for massive amounts of function. The only exception is the paper by Mattick and Dinger (2013).1 It's the only anti-junk paper that attempts to deal with the main evidence for junk DNA. If you know of any other papers that make a good case against junk DNA then I'd be happy to include them in the list.

If you come across a publication that argues against junk DNA, then you should immediately check the reference list. If you do not see some of these references in the list, then don't bother reading the paper because you know the author is not knowledgeable about the subject.

Brenner, S. (1998) Refuge of spandrels. Current Biology, 8:R669-R669. [PDF]

Brunet, T.D., and Doolittle, W.F. (2014) Getting “function” right. Proceedings of the National Academy of Sciences, 111:E3365-E3365. [doi: 10.1073/pnas.1409762111]

Casane, D., Fumey, J., and Laurenti, P. (2015) L’apophénie d’ENCODE ou Pangloss examine le génome humain. Med. Sci. (Paris) 31: 680-686. [doi: 10.1051/medsci/20153106023] [PDF]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) published online March 11, 2013. [PubMed] [doi: 10.1073/pnas.1221376110]

Doolittle, W.F., Brunet, T.D., Linquist, S., and Gregory, T.R. (2014) Distinguishing between “function” and “effect” in genome biology. Genome Biology and Evolution, 6:1234-1237. [doi: 10.1093/gbe/evu098]

Doolittle, W.F., and Brunet, T.D. (2017) On causal roles and selected effects: our genome is mostly junk. BMC biology, 15:116. [doi: 10.1186/s12915-017-0460-9]

Eddy, S.R. (2012) The C-value paradox, junk DNA and ENCODE. Current Biology, 22:R898. [doi: 10.1016/j.cub.2012.10.002]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology, 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Graur, D. (2017) Rubbish DNA: The functionless fraction of the human genome. In Evolution of the Human Genome I (pp. 19-60): Springer. [doi: 10.1007/978-4-431-56603-8_2 (book)] [PDF]

Graur, D. (2017) An upper limit on the functional fraction of the human genome. Genome Biology and Evolution, 9:1880-1885. [doi: 10.1093/gbe/evx121]

Graur, D., Zheng, Y., Price, N., Azevedo, R. B., Zufall, R. A., and Elhaik, E. (2013) On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution published online: February 20, 2013 [doi: 10.1093/gbe/evt028]

Graur, D., Zheng, Y., and Azevedo, R.B. (2015) An evolutionary classification of genomic function. Genome Biology and Evolution, 7:642-645. [doi: 10.1093/gbe/evv021]

Gregory, T. R. (2005) Synergy between sequence and size in large-scale genomics. Nature Reviews Genetics, 6:699-708. [doi: 10.1038/nrg1674]

Haerty, W., and Ponting, C.P. (2014) No Gene in the Genome Makes Sense Except in the Light of Evolution. Annual review of genomics and human genetics, 15:71-92. [doi: 10.1146/annurev-genom-090413-025621]

Hurst, L.D. (2013) Open questions: A logic (or lack thereof) of genome organization. BMC biology, 11:58. [doi: 10.1186/1741-7007-11-58]

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G. E., and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]

Mattick, J. S., and Dinger, M. E. (2013) The extent of functionality in the human genome. The HUGO Journal, 7:2. [doi: 10.1186/1877-6566-7-2]

Morange, M. (2014) Genome as a Multipurpose Structure Built by Evolution. Perspectives in biology and medicine, 57:162-171. [doi: 10.1353/pbm.2014.000]

Niu, D. K., and Jiang, L. (2012) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and biophysical research communications, 430:1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Ohno, S. (1972) An argument for the genetic simplicity of man and other mammals. Journal of Human Evolution, 1:651-662. [doi: 10.1016/0047-2484(72)90011-5]

Ohno, S. (1972) So much "junk" in our genome. In H. H. Smith (Ed.), Evolution of genetic systems (Vol. 23, pp. 366-370): Brookhaven symposia in biology.

Palazzo, A.F., and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]

Rands, C. M., Meader, S., Ponting, C. P., and Lunter, G. (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLOS Genetics, 10:e1004525. [doi: 10.1371/journal.pgen.1004525]

Thomas Jr, C.A. (1971) The genetic organization of chromosomes. Annual review of genetics, 5:237-256. [doi:]

1. The paper by Kellis et al. (2014) is ambiguous. It's clear that most of the ENCODE authors are still opposed to junk DNA even though the paper is mostly a retraction of their original claim that 80% of the genome is functional.

I’m going to a birthday party!

It's Bruce Alberts' 80th birthday party in San Francisco. There will be food, wine, cake, and (probably) dancing but first you go to the symposium on education.

Bruce Alberts’ 80th Birthday Gathering and Symposium

Saturday, April 14
Symposium on Science Education and Science Policy in Honor of Bruce Alberts’ 80th Birthday
(At the Metropolitan Club, 640 Sutter St., San Francisco 94102)

9a Guests arrive and register

10a Introduction by Master of Ceremonies Gregor Eichele

10:10a Session 1 How do we convey the importance of science to the public?
Moderator: Maureen Munn
Panelists: Janet Coffey, Will Colglazier, Janet English, Caroline Kiehle

11:40a Break

12p Buffet Lunch served in the Garden Room

1:30p Session 2 Innovations in Teaching and Learning in Higher Education
Moderators: Doug Kellogg and Kimberly Tanner.
Panelists: Judy Miner, Sally Pasion (one more panelist TBA)

2:30p Coffee and tea break

3p Session 3 Challenges Facing the Next Generation of Scientists
Moderators: Cynthia Fuhrmann and Bill Theurkauf.
Panelists: Marc Kirschner, Barry Selick, Nolan Sigal

4p Break

4:30p Session 4 Science Policy
Moderators: Mary Maxon and Jason Rao
Panelists: Bill Colglazier, Haile Debas, Donna Riordan, Keith Yamamoto

5:30p Elaine Bearer’s Duet for clarinet and viola: “Replication Machine”

6:15p Reception at Metropolitan Club Bar (4th Floor)

7p Buffet Dinner (Metropolitan Club Main Dining Hall — 4th Floor) Ending at 9:30p.

Sunday, April 15

10a - 2p Drop-in Brunch for all hosted at Beth Alberts’ home

Photo: Bruce Alberts with his first three graduate students: Glenn Herrick (right), Keith Yamamoto (left), Larry Moran (middle right), Bruce Alberts (middle left).

Cafe Scientific Mississauga: The Good, Bad, & Natural

Dan Riskin: The Good, Bad, & Natural: What Mother Nature says about morality?

Thursday, April 12, 2018
7:30 - 10:00 pm
The Franklin House
263 Queen Street S
Streetsville (Mississauga), Ontario, Canada

"People often act like “natural” is synonymous with “good.” Using heinous examples from the scientific literature, Dan Riskin will blow the hinges off that misconception. Then he’ll give some thoughts about where, if not from nature, the roots of human morality might lie.

Dan Riskin, PhD, is a television personality, scientist, author, and podcaster. He is best known as the co-host of Discovery's flagship science program, Daily Planet, and as the host of Animal Planet's show about parasites, Monsters Inside Me. To make science accessible and interesting to wide audiences, Dan has appeared as a guest on The Tonight Show with Jay Leno, The Late Late Show with Craig Ferguson, The Dr. Oz Show, and on several news outlets, including CP24, CTV, CNN, and CBS. Dan has published more than 20 papers in scientific journals, and his first popular book, Mother Nature is Trying to Kill You was a Canadian bestseller.

This meetup starts 30 minutes later than our regular meeting time to give Dan time to drive to Mississauga from Scarborough.
You are welcome to come at 7 or 7:30, but don't expect the talk to begin before 8 pm. It will definitely be worth it.

Subhash Lakhotia: The concept of ‘junk DNA’ becomes junk

Continuing my survey of recent papers on junk DNA, I stumbled upon a review by Subhash Lakhotia that has recently been accepted in The Proceedings of the Indian National Science Academy (Lakhotia, 2018). It illustrates the extent of the publicity campaign mounted by ENCODE and opponents of junk DNA. In the title of this post, I paraphrased a sentence from the abstract that summarizes the point of the paper; namely, that the 'recent' discovery of noncoding RNAs refutes the concept of junk DNA.

Lakhotia claims to have written a review of the history of junk DNA but, in fact, his review perpetuates a false history. He repeats a version of history made popular by John Mattick. It goes like this. Old-fashioned scientists were seduced by Crick's central dogma into thinking that the only important part of the genome was the part encoding proteins. They ignored genes for noncoding RNAs because they didn't fit into their 'dogma.' They assumed that most of the noncoding part of the genome was junk. However, recent discoveries of huge numbers of noncoding RNAs reveal that those scientists were very stupid. We now know that the genome is chock full of noncoding RNA genes and the concept of junk DNA has been refuted.

Here's the abstract ...
Major discoveries like the one gene-one enzyme hypothesis, demonstration of DNA as the genetic material and finally the elucidation of the double helical structure of DNA in 1940s and early 1950s set the stage for emergence of molecular biology. Parallel cell biological studies during this period also indicated a correlation between rate of protein synthesis in a cell and the amount of cytoplasmic RNA. Following the proposal of George Gamow, a physicist, about the triplet genetic code and possible involvement of RNA in the transfer of information from DNA to proteins, Crick proposed the 'central dogma of molecular biology' to suggest the paths of information transfer between nucleic acids and proteins, with the limitation that the information cannot flow back from protein to nucleic acids. With emphasis on proteins as the central phenotypic determinants and the continuing enigma of heterochromatin, which largely appeared to be ‘gene desert’, enriched in repetitive DNA sequences and claimed to be inert in transcription, the many observations in 1960s of a large variety of heterogeneous nuclear RNAs remained ignored. Curiosity in the nuclear RNAs that do not see the face of cytoplasm appeared to be quelled by concepts of ‘selfish’ or ‘junk’ DNA in the early 1980s, notwithstanding the fact that active transcription of typical heterochromatin regions and repetitive and other noncoding DNA sequences was well demonstrated in 1960s and 1970s. With a few exceptions like the hsrω and roX transcripts in Drosophila and the Xist RNAs in mammals, the noncoding RNAs remained largely ignored for nearly two decades. The discovery of RNA interference and sequencing of different eukaryotic genomes, including the human genome, led to revisits to possible significance of noncoding RNAs (ncRNAs) in the new millennium. 
The occasional identification of ncRNAs in early 2000s has in recent years transformed into a ‘tsunami’, resulting in concepts of ‘selfish’ or ‘junk’ DNA themselves becoming junk. There is now increasing realization that the subtle and large phenotypic effects of heterochromatin and the existence of diverse nucleus-limited RNAs reported through painstaking genetic and biochemical studies that were undertaken before molecular biology had grown fully, can be largely related to the enormous diversity of short and long ncRNAs now known to be produced by all genomes. Although Crick’s proposal of the Central Dogma was only about the directions of information transfer, its mis-interpretation due to the great emphasis on the central roles of proteins and the reductionist linear approach of molecular biology that led to widespread belief in concepts of 'selfish' or 'junk' DNA, delayed the appreciation of multi-dimensional roles that ncRNAs actually play in maintaining homeostasis in complex biological networks.
There are two (major) things that bother me about this review. First, even if there are 100,000 functional noncoding RNA genes—an absurdly high number—that would still only account for a few percent of the genome. The logic of the argument against junk DNA is fatally flawed.
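The arithmetic behind that first point can be sketched in a few lines of Python. This is only a rough check: the genome size and the average functional length per gene are illustrative assumptions of the sketch, not numbers taken from the review or from this post.

```python
# Back-of-envelope check. Both constants are illustrative assumptions,
# not figures taken from the review or from this post.
GENOME_SIZE_BP = 3_200_000_000      # assumed haploid human genome size
AVG_FUNCTIONAL_BP_PER_GENE = 1_000  # assumed functional length per gene


def ncrna_fraction(num_genes: int) -> float:
    """Fraction of the genome occupied by num_genes noncoding RNA genes."""
    return num_genes * AVG_FUNCTIONAL_BP_PER_GENE / GENOME_SIZE_BP


fraction = ncrna_fraction(100_000)
print(f"{fraction:.1%} of the genome")
```

Under these assumptions, 100,000 genes cover about 3% of the genome. The average functional length would have to be many times larger before noncoding RNA genes could account for most of it.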

Second, there is an extensive literature on the subject [Five Things You Should Know if You Want to Participate in the Junk DNA Debate]. It includes papers that discuss the actual history and papers that discuss the role (or not) of noncoding RNAs. Many of them defend junk DNA and point out the abundant evidence for the concept. Subhash Lakhotia ignores most of those papers in his review. This is not good science.

It's 2018, why are papers like this one still getting published? What happened to peer review?

I close with another quotation from this paper. The irony is palpable.
The present review briefly examines history of development of these concepts and how misunderstanding and/or mis-interpretation of some concepts thwarted the appreciation of great functional significance of the noncoding RNAs in biological organization.

NOTE: Most biochemists and molecular biologists had a very protein-centric view of genes and gene expression. They believed that a gene could be defined as a DNA sequence that encodes a protein. I do not dispute that claim. Indeed, I suspect that it is still the dominant view today—it certainly is the view taught to undergraduates by most professors. However, one should not write the history of an idea based on the misunderstandings of the average scientist outside of the field. It's the experts who count. Those experts had good reasons to believe that most of our genome is junk and those reasons are as valid today as they were forty years ago.

Subhash Lakhotia has been sent a link to this post. Looking forward to his response.

Lakhotia, S.C. (2018) Central Dogma, Selfish DNA and Noncoding RNAs: a Historical Perspective. Proceedings of the Indian National Science Academy. [PDF]

Peter Larsen: “There is no such thing as ‘junk DNA'”

The March 2018 issue of Chromosome Research is a Special Issue on Transposable Elements and Genome Function. I found it as I was doing my routine search for papers on junk DNA in order to see whether scientists are finally beginning to understand the issue. Peter Larsen (guest editor) wrote the introduction to the special issue. He says ...
There is no such thing as “junk DNA.” Indeed, a suite of discoveries made over the past few decades have put to rest this misnomer and have identified many important roles that so-called junk DNA provides to both genome structure and function (this special issue; Biémont and Vieira 2006; Jeck et al. 2013; Elbarbary et al. 2016; Akera et al. 2017; Chen and Yang 2017; Chuong et al. 2017). Nevertheless, given the historical focus on coding regions of the genome, our understanding of the biological function of non-coding regions (e.g., repetitive DNA, transposable elements) remains somewhat limited, and therefore, all those enigmatic and poorly studied regions of the genome that were once identified as junk are instead best viewed as genomic “dark matter.”

This is very disappointing. Anyone working on transposons should know that more than half of our genome is composed of various bits and pieces of defective transposons. Nobody has ever provided convincing evidence that most of that flotsam and jetsam is functional. The default explanation is that it is junk and that makes a lot of sense since it certainly looks like junk.

Larsen proposes that transposable elements are involved in the third and fourth dimensions of the genome. The third dimension is DNA & chromatin structure and the fourth dimension is time-related biological processes. He provides no evidence that half of our genome plays a functional role in these "dimensions."

There is evidence that some transposon-related sequences have been co-opted to perform regulatory and structural roles but that doesn't mean that all of them do. That crazy form of argument has been ridiculed so many times that I'm surprised to see it resurface in 2018. It's almost as though the scientists who use it don't even read the literature on junk DNA.

Furthermore, the evidence for junk DNA is not confined to speculation about the role of transposon fragments. There's lots of other data that must be refuted before you announce the death of junk DNA [Five Things You Should Know if You Want to Participate in the Junk DNA Debate]. If you don't know what that evidence is, then you have no business writing about the subject.

I'm also annoyed about sloppy use of the term "dark matter." As far as I can tell, it's an attempt to: (1) shift the burden of proof, and (2) glamorize ignorance. The default explanation for transposon fragments is junk. The burden of proof is on those who want to prove function. By saying that it's "dark matter" they ignore the default explanation and shift the burden of proof onto those who say it's junk DNA. The glamorous part is due to associating the term with the dark matter of the universe. There's plenty of evidence for the existence of that kind of dark matter even though astronomers don't know exactly what it's composed of. The idea here is that by referring to the 'dark matter' of the genome you imply that there really is something mysterious and important going on but we just don't know what it is.

That's not true. We know a lot about genomes and there are no great mysteries [What's In Your Genome? - The Pie Chart]. We know that most of the human genome is junk in spite of what Peter Larsen says.

Can someone explain what's going on? There really isn't much of a controversy any more. Knowledgeable scientists have examined the data and concluded that about 90% of our genome is junk. How can you write about junk DNA without mentioning that data and how does an article like this get past peer review?

Peter Larsen has received a link to this post. I'm looking forward to his response.

Larsen, P.A. (2018) Transposable elements and the multidimensional genome. Chromosome Research, 26:1-3. [doi: 10.1007/s10577-018-9575-2]

What’s In Your Genome? – The Pie Chart

Here's my latest compilation of the composition of the human genome. It's depicted in the form of a pie chart.1 [UPDATED: March 29, 2018]

There are several ways of estimating the amount of functional DNA and the amount of junk DNA. All of them are approximations but they only differ by a few percent. Note that several categories overlap. For example, introns and pseudogenes contain substantial amounts of DNA derived from transposons. The total amount of transposon-related sequence is about 60% when you include this fraction.

Here's the list of DNA sequences that are known or presumed to have a function (i.e. they are not junk).
  • functional parts of protein-coding genes (mostly coding regions): 1%
  • functional parts of genes for likely noncoding RNAs: 1%
  • regulatory sequences: 0.2%
  • scaffold attachment regions (SARs): 0.3%
  • origins of replication: 0.3%
  • centromeres: 1%
  • telomeres: 0.1%
  • functional virus sequences: 0.1%
  • functional transposons: 0.1%
  • conserved sequences of unknown function: ~3.9% (maximum)
This adds up to 8% of the genome. The remaining 92% is junk.
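As a quick check on that total, here is a minimal Python sketch that sums the categories listed above. The percentages are copied from the list; the dictionary keys are just shortened labels chosen for the sketch.

```python
# Percentages of the human genome, copied from the list above.
functional = {
    "protein-coding genes (functional parts)": 1.0,
    "noncoding RNA genes (functional parts)": 1.0,
    "regulatory sequences": 0.2,
    "scaffold attachment regions": 0.3,
    "origins of replication": 0.3,
    "centromeres": 1.0,
    "telomeres": 0.1,
    "functional virus sequences": 0.1,
    "functional transposons": 0.1,
    "conserved sequences of unknown function": 3.9,  # listed as a maximum
}

# Round to one decimal place to hide floating-point noise in the sum.
total_functional = round(sum(functional.values()), 1)
junk = round(100.0 - total_functional, 1)

print(f"functional: {total_functional}%  junk: {junk}%")
```

Because the conserved-sequence figure is a maximum, the 8% functional total is an upper bound and the 92% junk figure is a lower bound on this accounting.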

Most of the junk consists of: (1) very obvious examples of broken genes (pseudogenes 5%); (2) bits and pieces of transposon sequences that used to be capable of transposing but have mutated over time (45%); and (3) ancient viral sequences that have degenerated (9%). That's 59% of the genome that's clearly junk DNA. In addition, there's plenty of evidence that most intron sequences are dispensable. That accounts for another 28% of the genome. The total amount of junk DNA is at least 87%.
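The same kind of tally works for the junk categories in the paragraph above. This is a minimal sketch that uses only the percentages stated there.

```python
# Percentages of the human genome, copied from the paragraph above.
clearly_junk = {
    "pseudogenes": 5,
    "degraded transposon sequences": 45,
    "degenerated ancient viral sequences": 9,
}
dispensable_introns = 28  # intron sequence that appears to be dispensable

clear_total = sum(clearly_junk.values())
minimum_junk = clear_total + dispensable_introns

print(f"clearly junk: {clear_total}%  at least junk: {minimum_junk}%")
```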

Note that protein-coding genes take up about 23% of the genome (1% exons, 22% introns). Genes for functional noncoding RNAs take up an additional 7% of the genome (1% exons, 6% introns). (Much of the functional region of noncoding RNA genes consists of 300 copies of ribosomal RNA genes (0.4%).) The important point is that roughly 30% of the genome is genes when we define a gene as a DNA sequence that's transcribed. A lot of this is junk within introns.

Also keep in mind that the well-characterized functional parts of the genome account for about 4% of the total but the functional regions of genes are only half of this total. Thus, we know that genes make up less than half of the total functional DNA in the human genome. This fact is not widely known even though the data is half-a-century old. I guess it takes some scientists a long time to learn the facts about the human genome.
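Here is a short Python sketch of the arithmetic in the last two paragraphs. It assumes that "well-characterized" means the 8% functional total from the list minus the 3.9% of conserved sequence with unknown function; that reading is an interpretation made for the sketch.

```python
# Gene footprints as percentages of the genome (figures from the text above).
protein_coding = {"exons": 1, "introns": 22}
noncoding_rna = {"exons": 1, "introns": 6}

# Share of the genome that is "genes" when a gene is any transcribed sequence.
genes_total = sum(protein_coding.values()) + sum(noncoding_rna.values())

# Functional parts of genes: the exon figures only.
functional_in_genes = protein_coding["exons"] + noncoding_rna["exons"]

# Assumption: well-characterized functional DNA is the 8% functional total
# minus the 3.9% of conserved sequence with unknown function.
well_characterized = 8.0 - 3.9

share_from_genes = functional_in_genes / well_characterized

print(f"genes: {genes_total}% of the genome")
print(f"gene share of well-characterized functional DNA: {share_from_genes:.0%}")
```

On these figures the functional parts of genes come to just under half of the well-characterized functional DNA, which is the point being made above.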

1. I have to use a pie chart because they were invented by my wife's ancestor, William Playfair.