Improvements made to genomes FTP site

We’ve been making improvements to the contents of NCBI’s genomes FTP site. Highlights include: addition of new file types, including a feature_count.txt file with counts of gene, RNA, and CDS features of specific types and a translated_cds.faa file with conceptual …

Philosophers talking about genes

It's important to define what you mean when you use the word "gene." I use the molecular definition since most of what I write refers to DNA sequences. There's no perfect definition but, for most purposes, a good working definition is: A gene is a DNA sequence that is transcribed to produce a functional product. [What Is a Gene?].

There are two types of genes: protein-coding genes and those that specify a functional noncoding RNA (e.g. ribosomal RNA, lincRNA). The gene is the part of the DNA that's transcribed, so it includes introns. Transcription is controlled by regulatory sequences such as promoters, operators, and enhancers, but these are not part of the gene.

In addition to genes, there are many other functional parts of the genome. In the case of eukaryotic genomes, these include centromeres, telomeres, origins of replication, SARs, and some other bits. None of this is new ... these functions have been known for decades and the working definition I use has been common among knowledgeable experts for half-a-century. Scientists know what they are talking about when they say that the human genome contains about 20,000 protein-coding genes and at least 5,000 genes for non-coding RNAs. They are comfortable with the idea that our genome has lots of other functional regions that lie outside of the genes.

Non-experts may not be familiar with the topic, and they may have many misconceptions about genes and DNA sequences, but we don't base our science on the views of non-experts.

Because of my interest in this topic, I was intrigued by the title of a new book, The Gene: from Genetics to Postgenomics. I ordered it as soon as I heard about it and I've just finished reading it. The version I read has been translated from German by Adam Bostanci.

The authors are Hans-Jörg Rheinberger of the Max Planck Institute for the History of Science in Berlin, Germany, and Staffan Müller-Wille of the Centre for the Study of the Life Sciences at the University of Exeter, UK. They are philosophers. They have two goals in mind: (1) to cover the history of the gene concept, and (2) to demonstrate that recent discoveries have radically undermined the concept of a gene.

... those ignorant of history are not condemned to repeat it; they are merely destined to be confused.

Stephen Jay Gould
Ontogeny and Phylogeny (1977)

They have only partially achieved the first goal. They recognize that the word "gene" can be used in many different contexts. In the first half of the twentieth century, it referred almost exclusively to a unit of heredity or a unit of selection (or, more correctly, a unit of evolution). With the recognition that DNA was the genetic material, the word "gene" took on an additional meaning as a physical unit of function. In other words, the gene acquired a physical form in contrast to the nebulous genetic meaning of the word. This is the molecular gene. It's at this point in their book that the authors lose their way. They never give us a molecular definition. I suspect they are thinking of a gene as coding sequences, but you have to struggle to interpret their view of the molecular definition. They talk about "structural genes" and imply that the discovery of "regulatory genes" altered our concept of the gene, but these terms were never used by experts in the way that the authors imagine (p. 66).

The authors never discuss the definition I prefer. It's not clear they have even considered it since they rely on the work of other philosophers who have also ignored it [see Debating philosophers: The molecular gene].

The problem with this part of the book (the part about the molecular gene) is that the authors seem to be confused about the difference between a molecular gene and the view that "genes" are the only thing that count in genetics, evolution, metabolism etc. They seem to think that the gene-centric view requires that everything be attributed to DNA sequences that encode proteins. Thus, when they recognize that important functional elements exist outside of genes, they conclude that the gene-centric view is fatally flawed. This leads us to their second goal where they try to convince us that the definition of "gene" is fatally flawed because genes aren't the only things that play an important role in genetics.

They fail in this goal because they are arguing against a strawman version of biology that no experts believe in.

This seems to be a common problem among philosophers. They refuse to use critical thinking to unravel the meaning of the molecular gene, a meaning that is really quite simple even though it's not perfect. Then they confuse themselves by thinking that knowledgeable experts use the word "gene" as a synonym for all functional sequences in the genome. Finally, they misunderstand the term "gene-centric," where the word "gene" is used metaphorically to refer to any DNA sequence that functions in population genetics and evolution. (Philosophers also tend to greatly overestimate the influence of Richard Dawkins and the selfish gene.)

The book contains all the usual misconceptions that come from reading the uninformed literature and assuming it represents the views of experts. Here's a short list of views that have been effectively challenged, and sometimes refuted, in the scientific literature ...
  1. Scientists were surprised that the human genome didn't contain 100,000 genes or more (p. 84)
  2. Crick's sequence hypothesis is no longer valid (p. 68)
  3. junk DNA is just a term used to describe DNA of no known function (p. 69)
  4. alternative splicing means that most genes can make many different proteins (pp. 70, 84, 107)
  5. evolutionary-developmental biology (evo-devo) threatens our understanding of the gene concept (pp. 88, 94-98)
  6. the ENCODE results have transformed our understanding of genes and genomes (p. 91)
  7. "the existence of epigenetic systems of inheritance poses the greatest challenge for the classical molecular gene concept" (p. 92)
  8. the discovery of Lamarckian inheritance casts doubt on the central dogma of molecular genetics (p. 92)
  9. plasticity is a problem (p. 98)
  10. 98% of the genome was thought to be junk but, thanks to ENCODE, we now know that it's full of regulatory elements (pp. 104-105)
In addition to this list of the usual misunderstandings and misconceptions, the authors have come up with two others that are quite novel. I'll quote directly from page 84 and let you see for yourselves ...
[There are] ... two further unexpected results of the genome project that complemented each other but also pointed in opposite directions. First, comparisons of the human genome with those of other primates revealed a surprisingly high degree of sequence conservation. Given remarkable differences in the physical constitution of these closest relatives of Homo sapiens, in particular differences in the so-called higher, mental faculties as a consequence of several million years of evolution, this degree of genomic affinity was astonishing. Major changes in the phenotype were apparently compatible with relatively minor changes in the genotype. The second surprising finding was that the genomes of different human individuals exhibit considerable differences. This genetic polymorphism was not, however, necessarily accompanied by correspondingly pronounced phenotypic differences.

Observations of this kind presented a serious challenge for gene-centrism and prompted proponents of the big genome projects to herald the dawn of an age of "postgenomics" in which the whole cell and the whole organism would move into the limelight.1
A little learning is a dangerous thing;
drink deep, or taste not the Pierian spring:
there shallow draughts intoxicate the brain,
and drinking largely sobers us again.
Alexander Pope

This book was published in 2017. It was revised and updated at that time. The scientific literature is full of debate and discussion about the topics covered here but you won't find any mention of controversy in this book. This can't be blamed exclusively on philosophers since there are many scientists who also ignore the controversies over junk DNA, alternative splicing, evolutionary theory, epigenetics etc. Like Rheinberger and Müller-Wille, they are content with promoting only one side of the story—the one that corresponds to their biases. Perhaps one should expect better critical thinking from philosophers?

There's one way in which this book differs from similar books written by scientists [see Human genome books]. Whereas scientists tend to quote scientific papers, Rheinberger and Müller-Wille rely heavily on the views of other philosophers. I get the distinct impression that almost all philosophers of science have reached the same conclusions and they support those (mostly false) conclusions by referencing each other instead of going back to the scientific literature [see When philosophers talk about genomes] [Debating philosophers: The Lu and Bourrat paper].

The views in this book are remarkably similar to those of Evelyn Fox Keller, who is a Professor Emerita in the History and Philosophy of Science at the Massachusetts Institute of Technology in Cambridge, Massachusetts, USA. I have already commented on one of her articles, "The Postgenomic Genome," in a previous post [When philosophers talk about genomes]. She is quoted several times in this book and her misconceptions are the same as those expressed by Rheinberger and Müller-Wille. You should follow the link to see what she says about genes and junk DNA in order to see for yourselves how badly modern philosophers have misinterpreted the science.

1. If you don't immediately see what's wrong with these arguments then ask a question in the comments.

Required reading for the junk DNA debate

This is a list of scientific papers on junk DNA that you need to read (and understand) in order to participate in the junk DNA debate. It's not a comprehensive list because it's mostly papers that defend junk DNA and refute arguments for massive amounts of function. The only exception is the paper by Mattick and Dinger (2013).1 It's the only anti-junk paper that attempts to deal with the main evidence for junk DNA. If you know of any other papers that make a good case against junk DNA then I'd be happy to include them in the list.

If you come across a publication that argues against junk DNA, then you should immediately check the reference list. If you do not see some of these references in the list, then don't bother reading the paper because you know the author is not knowledgeable about the subject.

Brenner, S. (1998) Refuge of spandrels. Current Biology, 8:R669-R669. [PDF]

Brunet, T.D., and Doolittle, W.F. (2014) Getting “function” right. Proceedings of the National Academy of Sciences, 111:E3365-E3365. [doi: 10.1073/pnas.1409762111]

Casane, D., Fumey, J., and Laurenti, P. (2015) L’apophénie d’ENCODE ou Pangloss examine le génome humain [ENCODE's apophenia, or Pangloss examines the human genome]. Med. Sci. (Paris), 31:680-686. [doi: 10.1051/medsci/20153106023] [PDF]

Doolittle, W.F. (2013) Is junk DNA bunk? A critique of ENCODE. Proc. Natl. Acad. Sci. (USA) published online March 11, 2013. [PubMed] [doi: 10.1073/pnas.1221376110]

Doolittle, W.F., Brunet, T.D., Linquist, S., and Gregory, T.R. (2014) Distinguishing between “function” and “effect” in genome biology. Genome Biology and Evolution, 6:1234-1237. [doi: 10.1093/gbe/evu098]

Doolittle, W.F., and Brunet, T.D. (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology, 15:116. [doi: 10.1186/s12915-017-0460-9]

Eddy, S.R. (2012) The C-value paradox, junk DNA and ENCODE. Current Biology, 22:R898. [doi: 10.1016/j.cub.2012.10.002]

Eddy, S.R. (2013) The ENCODE project: missteps overshadowing a success. Current Biology, 23:R259-R261. [doi: 10.1016/j.cub.2013.03.023]

Graur, D. (2017) Rubbish DNA: The functionless fraction of the human genome. In Evolution of the Human Genome I (pp. 19-60). Springer. [doi: 10.1007/978-4-431-56603-8_2 (book)] [PDF]

Graur, D. (2017) An upper limit on the functional fraction of the human genome. Genome Biology and Evolution, 9:1880-1885. [doi: 10.1093/gbe/evx121]

Graur, D., Zheng, Y., Price, N., Azevedo, R. B., Zufall, R. A., and Elhaik, E. (2013) On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution, published online February 20, 2013. [doi: 10.1093/gbe/evt028]

Graur, D., Zheng, Y., and Azevedo, R.B. (2015) An evolutionary classification of genomic function. Genome Biology and Evolution, 7:642-645. [doi: 10.1093/gbe/evv021]

Gregory, T. R. (2005) Synergy between sequence and size in large-scale genomics. Nature Reviews Genetics, 6:699-708. [doi: 10.1038/nrg1674]

Haerty, W., and Ponting, C.P. (2014) No Gene in the Genome Makes Sense Except in the Light of Evolution. Annual Review of Genomics and Human Genetics, 15:71-92. [doi: 10.1146/annurev-genom-090413-025621]

Hurst, L.D. (2013) Open questions: A logic (or lack thereof) of genome organization. BMC Biology, 11:58. [doi: 10.1186/1741-7007-11-58]

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G. E., and Dekker, J. (2014) Defining functional DNA elements in the human genome. Proc. Natl. Acad. Sci. (USA) 111:6131-6138. [doi: 10.1073/pnas.1318948111]

Mattick, J. S., and Dinger, M. E. (2013) The extent of functionality in the human genome. The HUGO Journal, 7:2. [doi: 10.1186/1877-6566-7-2]

Morange, M. (2014) Genome as a Multipurpose Structure Built by Evolution. Perspectives in Biology and Medicine, 57:162-171. [doi: 10.1353/pbm.2014.000]

Niu, D. K., and Jiang, L. (2012) Can ENCODE tell us how much junk DNA we carry in our genome? Biochemical and Biophysical Research Communications, 430:1340-1343. [doi: 10.1016/j.bbrc.2012.12.074]

Ohno, S. (1972) An argument for the genetic simplicity of man and other mammals. Journal of Human Evolution, 1:651-662. [doi: 10.1016/0047-2484(72)90011-5]

Ohno, S. (1972) So much "junk" in our genome. In H. H. Smith (Ed.), Evolution of genetic systems (Vol. 23, pp. 366-370): Brookhaven symposia in biology.

Palazzo, A.F., and Gregory, T.R. (2014) The Case for Junk DNA. PLoS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]

Rands, C. M., Meader, S., Ponting, C. P., and Lunter, G. (2014) 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage. PLOS Genetics, 10:e1004525. [doi: 10.1371/journal.pgen.1004525]

Thomas Jr, C.A. (1971) The genetic organization of chromosomes. Annual Review of Genetics, 5:237-256.

1. The paper by Kellis et al. (2014) is ambiguous. It's clear that most of the ENCODE authors are still opposed to junk DNA even though the paper is mostly a retraction of their original claim that 80% of the genome is functional.

Subhash Lakhotia: The concept of ‘junk DNA’ becomes junk

Continuing my survey of recent papers on junk DNA, I stumbled upon a review by Subhash Lakhotia that has recently been accepted in The Proceedings of the Indian National Science Academy (Lakhotia, 2018). It illustrates the extent of the publicity campaign mounted by ENCODE and opponents of junk DNA. In the title of this post, I paraphrased a sentence from the abstract that summarizes the point of the paper; namely, that the 'recent' discovery of noncoding RNAs refutes the concept of junk DNA.

Lakhotia claims to have written a review of the history of junk DNA but, in fact, his review perpetuates a false history. He repeats a version of history made popular by John Mattick. It goes like this. Old-fashioned scientists were seduced by Crick's central dogma into thinking that the only important part of the genome was the part encoding proteins. They ignored genes for noncoding RNAs because they didn't fit into their 'dogma.' They assumed that most of the noncoding part of the genome was junk. However, recent discoveries of huge numbers of noncoding RNAs reveal that those scientists were very stupid. We now know that the genome is chock full of noncoding RNA genes and the concept of junk DNA has been refuted.

Here's the abstract ...
Major discoveries like the one gene-one enzyme hypothesis, demonstration of DNA as the genetic material and finally the elucidation of the double helical structure of DNA in 1940s and early 1950s set the stage for emergence of molecular biology. Parallel cell biological studies during this period also indicated a correlation between rate of protein synthesis in a cell and the amount of cytoplasmic RNA. Following the proposal of George Gamow, a physicist, about the triplet genetic code and possible involvement of RNA in the transfer of information from DNA to proteins, Crick proposed the 'central dogma of molecular biology' to suggest the paths of information transfer between nucleic acids and proteins, with the limitation that the information cannot flow back from protein to nucleic acids. With emphasis on proteins as the central phenotypic determinants and the continuing enigma of heterochromatin, which largely appeared to be ‘gene desert’, enriched in repetitive DNA sequences and claimed to be inert in transcription, the many observations in 1960s of a large variety of heterogeneous nuclear RNAs remained ignored. Curiosity in the nuclear RNAs that do not see the face of cytoplasm appeared to be quelled by concepts of ‘selfish’ or ‘junk’ DNA in the early 1980s, notwithstanding the fact that active transcription of typical heterochromatin regions and repetitive and other noncoding DNA sequences was well demonstrated in 1960s and 1970s. With a few exceptions like the hsrω and roX transcripts in Drosophila and the Xist RNAs in mammals, the noncoding RNAs remained largely ignored for nearly two decades. The discovery of RNA interference and sequencing of different eukaryotic genomes, including the human genome, led to revisits to possible significance of noncoding RNAs (ncRNAs) in the new millennium. 
The occasional identification of ncRNAs in early 2000s has in recent years transformed into a ‘tsunami’, resulting in concepts of ‘selfish’ or ‘junk’ DNA themselves becoming junk. There is now increasing realization that the subtle and large phenotypic effects of heterochromatin and the existence of diverse nucleus-limited RNAs reported through painstaking genetic and biochemical studies that were undertaken before molecular biology had grown fully, can be largely related to the enormous diversity of short and long ncRNAs now known to be produced by all genomes. Although Crick’s proposal of the Central Dogma was only about the directions of information transfer, its mis-interpretation due to the great emphasis on the central roles of proteins and the reductionist linear approach of molecular biology that led to widespread belief in concepts of 'selfish' or 'junk' DNA, delayed the appreciation of multi-dimensional roles that ncRNAs actually play in maintaining homeostasis in complex biological networks.
There are two (major) things that bother me about this review. First, even if there were 100,000 functional noncoding RNA genes (an absurdly high number), they would still account for only a few percent of the genome. The logic of using noncoding RNAs to argue against junk DNA is therefore fatally flawed.
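A rough calculation makes the "few percent" point concrete. This is a sketch under two assumptions that are not taken from Lakhotia's review: a haploid human genome of about 3.1 billion base pairs, and a deliberately generous average of 1,000 bp of functional sequence per noncoding RNA gene. The function name `percent_of_genome` is just an illustrative label.

```python
# Back-of-the-envelope check: how much of the genome could 100,000
# functional noncoding RNA genes occupy?
# ASSUMPTIONS (illustrative, not from the review): a haploid genome of
# ~3.1 billion bp and a generous 1,000 bp of functional sequence per gene.

GENOME_SIZE_BP = 3_100_000_000  # assumed haploid human genome size


def percent_of_genome(n_genes: int, functional_bp_per_gene: int,
                      genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Percentage of the genome covered by n_genes functional regions."""
    return 100.0 * n_genes * functional_bp_per_gene / genome_size_bp


if __name__ == "__main__":
    # 100,000 genes x 1,000 bp = 100 million bp of functional sequence
    pct = percent_of_genome(100_000, 1_000)
    print(f"{pct:.1f}% of the genome")
```

Even doubling the assumed functional length per gene leaves the total well under 10%, which is why the number of noncoding RNA genes, by itself, says little about how much of the genome is junk.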

Second, there is an extensive literature on the subject. It includes papers that discuss the actual history and papers that discuss the role (or not) of noncoding RNAs. Many of them defend junk DNA and point out the abundant evidence for the concept. Subhash Lakhotia ignores most of those papers in his review. This is not good science.

It's 2018, why are papers like this one still getting published? What happened to peer review?

I close with another quotation from this paper. The irony is palpable.
The present review briefly examines history of development of these concepts and how misunderstanding and/or mis-interpretation of some concepts thwarted the appreciation of great functional significance of the noncoding RNAs in biological organization.

NOTE: Most biochemists and molecular biologists had a very protein-centric view of genes and gene expression. They believed that a gene could be defined as a DNA sequence that encodes a protein. I do not dispute that claim. Indeed, I suspect that it is still the dominant view today—it certainly is the view taught to undergraduates by most professors. However, one should not write the history of an idea based on the misunderstandings of the average scientist outside of the field. It's the experts who count. Those experts had good reasons to believe that most of our genome is junk and those reasons are as valid today as they were forty years ago.

Subhash Lakhotia has been sent a link to this post. Looking forward to his response.

Lakhotia, S.C. (2018) Central Dogma, Selfish DNA and Noncoding RNAs: a Historical Perspective. Proceedings of the Indian National Science Academy. [PDF]

Peter Larsen: “There is no such thing as ‘junk DNA'”

The March 2018 issue of Chromosome Research is a Special Issue on Transposable Elements and Genome Function. I found it as I was doing my routine search for papers on junk DNA in order to see whether scientists are finally beginning to understand the issue. Peter Larsen (guest editor) wrote the introduction to the special issue. He says ...
There is no such thing as “junk DNA.” Indeed, a suite of discoveries made over the past few decades have put to rest this misnomer and have identified many important roles that so-called junk DNA provides to both genome structure and function (this special issue; Biémont and Vieira 2006; Jeck et al. 2013; Elbarbary et al. 2016; Akera et al. 2017; Chen and Yang 2017; Chuong et al. 2017). Nevertheless, given the historical focus on coding regions of the genome, our understanding of the biological function of non-coding regions (e.g., repetitive DNA, transposable elements) remains somewhat limited, and therefore, all those enigmatic and poorly studied regions of the genome that were once identified as junk are instead best viewed as genomic “dark matter.”

This is very disappointing. Anyone working on transposons should know that more than half of our genome is composed of various bits and pieces of defective transposons. Nobody has ever provided convincing evidence that most of that flotsam and jetsam is functional. The default explanation is that it is junk and that makes a lot of sense since it certainly looks like junk.

Larsen proposes that transposable elements are involved in the third and fourth dimensions of the genome. The third dimension is DNA & chromatin structure and the fourth dimension is time-related biological processes. He provides no evidence that half of our genome plays a functional role in these "dimensions."

There is evidence that some transposon-related sequences have been co-opted to perform regulatory and structural roles but that doesn't mean that all of them do. That crazy form of argument has been ridiculed so many times that I'm surprised to see it resurface in 2018. It's almost as though the scientists who use it don't even read the literature on junk DNA.

Furthermore, the evidence for junk DNA is not confined to speculation about the role of transposon fragments. There's lots of other data that must be refuted before you announce the death of junk DNA. If you don't know what that evidence is, then you have no business writing about the subject.

I'm also annoyed about sloppy use of the term "dark matter." As far as I can tell, it's an attempt to: (1) shift the burden of proof, and (2) glamorize ignorance. The default explanation for transposon fragments is junk. The burden of proof is on those who want to prove function. By saying that it's "dark matter" they ignore the default explanation and shift the burden of proof onto those who say it's junk DNA. The glamorous part is due to associating the term with the dark matter of the universe. There's plenty of evidence for the existence of that kind of dark matter even though astronomers don't know exactly what it's composed of. The idea here is that by referring to the 'dark matter' of the genome you imply that there really is something mysterious and important going on but we just don't know what it is.

That's not true. We know a lot about genomes and there are no great mysteries [What's In Your Genome? - The Pie Chart]. We know that most of the human genome is junk in spite of what Peter Larsen says.

Can someone explain what's going on? There really isn't much of a controversy any more. Knowledgeable scientists have examined the data and concluded that about 90% of our genome is junk. How can you write about junk DNA without mentioning that data and how does an article like this get past peer review?

Peter Larsen has received a link to this post. I'm looking forward to his response.

Larsen, P.A. (2018) Transposable elements and the multidimensional genome. Chromosome Research, 26:1-3. [doi: 10.1007/s10577-018-9575-2]

What’s In Your Genome? – The Pie Chart

Here's my latest compilation of the composition of the human genome. It's depicted in the form of a pie chart.1 [UPDATED: March 29, 2018]

There are several ways of estimating the amount of functional DNA and the amount of junk DNA. All of them are approximations but they only differ by a few percent. Note that several categories overlap. For example, introns and pseudogenes contain substantial amounts of DNA derived from transposons. The total amount of transposon-related sequence is about 60% when you include this fraction.

Here's the list of DNA sequences that are known or presumed to have a function (i.e. they are not junk).
  • functional parts of protein-coding genes (mostly coding regions): 1%
  • functional parts of genes for likely noncoding RNAs: 1%
  • regulatory sequences: 0.2%
  • scaffold attachment regions (SARs): 0.3%
  • origins of replication: 0.3%
  • centromeres: 1%
  • telomeres: 0.1%
  • functional virus sequences: 0.1%
  • functional transposons: 0.1%
  • conserved sequences of unknown function: ~3.9% (maximum)
This adds up to 8% of the genome. The remaining 92% is junk.

Most of the junk consists of: (1) very obvious examples of broken genes (pseudogenes 5%); (2) bits and pieces of transposon sequences that used to be capable of transposing but have mutated over time (45%); and (3) ancient viral sequences that have degenerated (9%). That's 59% of the genome that's clearly junk DNA. In addition, there's plenty of evidence that most intron sequences are dispensable. That accounts for another 28% of the genome. The total amount of junk DNA is at least 87%.
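The tallies in the list and the paragraph above can be checked with a few lines of Python. This is only a sketch: the percentages are copied from the post, the dictionary keys are my own labels, and the code checks the arithmetic, not the underlying estimates.

```python
# Percent of the human genome per category, copied from the list above.
FUNCTIONAL = {
    "functional parts of protein-coding genes": 1.0,
    "functional parts of noncoding RNA genes": 1.0,
    "regulatory sequences": 0.2,
    "scaffold attachment regions (SARs)": 0.3,
    "origins of replication": 0.3,
    "centromeres": 1.0,
    "telomeres": 0.1,
    "functional virus sequences": 0.1,
    "functional transposons": 0.1,
    "conserved sequences of unknown function (maximum)": 3.9,
}

# Junk categories from the paragraph on junk DNA.
CLEARLY_JUNK = {
    "pseudogenes": 5,
    "defective transposon fragments": 45,
    "degenerate viral sequences": 9,
}
DISPENSABLE_INTRONS = 28

# round() hides floating-point noise from summing decimal fractions.
functional_total = round(sum(FUNCTIONAL.values()), 1)   # 8.0
remaining = round(100 - functional_total, 1)             # 92.0
clearly_junk = sum(CLEARLY_JUNK.values())                # 59
junk_lower_bound = clearly_junk + DISPENSABLE_INTRONS    # 87

print(f"functional: {functional_total}%, remainder: {remaining}%")
print(f"clearly junk: {clearly_junk}%, junk including introns: >= {junk_lower_bound}%")
```

The 87% figure is a lower bound; the gap between it and the 92% remainder is DNA that this tally does not assign to any of the named junk categories.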

Note that protein-coding genes take up about 23% of the genome (1% exons, 22% introns). Genes for functional noncoding RNAs take up an additional 7% of the genome (1% exons, 6% introns). (Much of the functional region of noncoding RNA genes consists of 300 copies of ribosomal RNA genes (0.4%).) The important point is that roughly 30% of the genome is genes when we define a gene as a DNA sequence that's transcribed. A lot of this is junk within introns.

Also keep in mind that the well-characterized functional parts of the genome account for about 4% of the total but the functional regions of genes are only half of this total. Thus, we know that genes make up less than half of the total functional DNA in the human genome. This fact is not widely known even though the data is half-a-century old. I guess it takes some scientists a long time to learn the facts about the human genome.
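The same kind of check works for the gene figures in the two paragraphs above. A minimal sketch, with percentages copied from the post; "well-characterized" is taken here to mean the listed functional categories minus the "conserved sequences of unknown function" bin, which is my reading of the roughly 4% figure.

```python
# Percent of the genome, copied from the two paragraphs above.
PROTEIN_CODING_GENES = {"exons": 1, "introns": 22}
NONCODING_RNA_GENES = {"exons": 1, "introns": 6}

genes_total = sum(PROTEIN_CODING_GENES.values()) + sum(NONCODING_RNA_GENES.values())
introns_total = PROTEIN_CODING_GENES["introns"] + NONCODING_RNA_GENES["introns"]

# Well-characterized functional DNA (ASSUMED reading): the listed functional
# categories minus the "conserved sequences of unknown function" bin.
WELL_CHARACTERIZED = [1.0, 1.0, 0.2, 0.3, 0.3, 1.0, 0.1, 0.1, 0.1]
well_characterized_total = round(sum(WELL_CHARACTERIZED), 1)  # 4.1
functional_parts_of_genes = 1.0 + 1.0  # first two entries in the list
share_in_genes = functional_parts_of_genes / well_characterized_total

print(f"genes (transcribed DNA): {genes_total}%, of which introns: {introns_total}%")
print(f"genes' share of well-characterized functional DNA: {share_in_genes:.0%}")
```

The intronic total (22% + 6% = 28%) is the same 28% that the junk tally treats as mostly dispensable, and the functional parts of genes come to just under half of the well-characterized functional DNA.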

1. I have to use a pie chart because they were invented by my wife's ancestor, William Playfair.

What is “dark DNA”?

Some DNA sequencing technologies aren't very good at sequencing and assembling DNA that's rich in GC base pairs. What this means is that some sequenced genomes could be missing stretches of GC-rich DNA if they rely exclusively on those techniques. This difficult-to-sequence DNA was called "dark DNA" in a paper published last summer (July 2017).

The paper looked at some missing genes in the genome of the sand rat Psammomys obesus. The authors initially used a standard shotgun strategy to sequence the sand rat genome. They combined millions of short reads (<200 bp) to assemble a complete genome. A large block of genes seemed to be missing: genes that were conserved and present in the genomes of related species (Hargreaves et al., 2017). They knew the genes were present because they could detect the mRNAs corresponding to those genes.

Hargreaves et al. isolated GC-rich DNA and sequenced it using a different technique. This revealed the missing DNA and the missing genes. As expected, the entire block of DNA, containing 88 genes, had a high percentage of GC base pairs relative to AT base pairs. The authors attribute this to insertion of GC-rich repeats and to a phenomenon known as "gene conversion."
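The idea that GC-rich DNA can be physically present yet missing from an assembly can be illustrated with a toy simulation. This is not the method used in the sand rat paper. Everything here is an assumption chosen for illustration: the genome is random, the block sizes and GC levels are made up, the 0.65 window threshold is arbitrary, and the read-loss rule in `simulate_coverage` is a hypothetical linear bias model rather than a measured property of any sequencing platform.

```python
import random


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_windows(seq: str, window: int, step: int):
    """Yield (start, GC fraction) for every full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def random_seq(rng: random.Random, length: int, gc: float) -> str:
    """Random DNA whose expected GC content is gc."""
    return "".join(
        rng.choice("GC") if rng.random() < gc else rng.choice("AT")
        for _ in range(length)
    )


def simulate_coverage(seq, read_len, n_reads, rng, gc_penalty):
    """Sample reads uniformly, then drop GC-rich reads with a HYPOTHETICAL
    linear bias: a read with GC fraction g > 0.5 survives with probability
    1 - gc_penalty * (g - 0.5), floored at 0. Returns per-base coverage."""
    coverage = [0] * len(seq)
    for _ in range(n_reads):
        start = rng.randrange(len(seq) - read_len + 1)
        g = gc_fraction(seq[start:start + read_len])
        keep = max(0.0, 1.0 - gc_penalty * max(0.0, g - 0.5))
        if rng.random() < keep:
            for i in range(start, start + read_len):
                coverage[i] += 1
    return coverage


# Toy genome (sizes and GC levels are assumptions): a 5 kb GC-rich block
# flanked by 20 kb of ordinary, 40% GC sequence on each side.
rng = random.Random(42)
flank_a = random_seq(rng, 20_000, 0.40)
gc_block = random_seq(rng, 5_000, 0.80)
flank_b = random_seq(rng, 20_000, 0.40)
genome = flank_a + gc_block + flank_b
block_start, block_end = len(flank_a), len(flank_a) + len(gc_block)

# 1) Scan for GC-rich windows (the 0.65 threshold is an arbitrary choice).
flagged = [s for s, g in gc_windows(genome, 500, 250) if g > 0.65]

# 2) Simulate biased short-read sequencing and compare mean coverage.
cov = simulate_coverage(genome, 100, 20_000, rng, gc_penalty=3.0)
mean_block = sum(cov[block_start:block_end]) / (block_end - block_start)
mean_flank = sum(cov[:block_start]) / block_start

print(f"{len(flagged)} GC-rich windows flagged, first at {flagged[0] if flagged else None}")
print(f"mean coverage: GC-rich block {mean_block:.1f}x, flank {mean_flank:.1f}x")
```

Under these assumptions the GC-rich block ends up with a small fraction of the flanking coverage, so an assembler working only from these reads would tend to lose it, even though a direct GC scan of the true sequence finds it easily.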

Gene conversion, or more appropriately, biased gene conversion, is associated with recombination. Recombination results in stretches of DNA containing mismatched base pairs such as A:C or G:T. The mismatches must be repaired to restore the normal GC and AT base pairs. There's plenty of evidence showing a bias in the repair process such that the final product favors GC pairs over AT pairs. This is biased gene conversion.1

Biased gene conversion leads to a gradual increase in GC content in regions of the genome that are hotspots for recombination. This is a well-understood and reasonable explanation of the GC-rich region in the sand rat genome.
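A minimal toy model may help show why a repair bias toward G:C pairs raises GC content over time. It is a sketch under stated assumptions: the mismatch rate, the 70% repair bias, and the single-sequence framing are illustrative choices, not measured values, and real biased gene conversion is usually modeled as acting on alleles across a population. Sites are coded "S" (strong, G:C) and "W" (weak, A:T).

```python
import random


def gc_content(seq: list) -> float:
    """Fraction of sites that are strong (S = G:C) rather than weak (W = A:T)."""
    return seq.count("S") / len(seq)


def bgc_round(seq: list, rng: random.Random, mismatch_rate: float, gc_bias: float) -> None:
    """Apply one toy round of biased gene conversion in place.

    With probability mismatch_rate a site sits in a heteroduplex mismatch
    (e.g. G:T or A:C); repair then leaves a G:C pair with probability
    gc_bias and an A:T pair otherwise.
    """
    for i in range(len(seq)):
        if rng.random() < mismatch_rate:
            seq[i] = "S" if rng.random() < gc_bias else "W"


def expected_gc(p0: float, mismatch_rate: float, gc_bias: float, rounds: int) -> float:
    """Closed-form expectation for this toy model: p_t = b + (p0 - b)(1 - u)^t."""
    return gc_bias + (p0 - gc_bias) * (1 - mismatch_rate) ** rounds


rng = random.Random(1)
seq = ["S" if rng.random() < 0.40 else "W" for _ in range(10_000)]  # ~40% GC to start
start_gc = gc_content(seq)
for _ in range(100):
    bgc_round(seq, rng, mismatch_rate=0.05, gc_bias=0.70)
end_gc = gc_content(seq)

print(f"GC content: start {start_gc:.3f}, after 100 rounds {end_gc:.3f}")
print(f"toy-model expectation: {expected_gc(0.40, 0.05, 0.70, 100):.3f}")
```

Per round, the expected change is u(b - p): weak sites become strong at rate u·b and strong sites become weak at rate u·(1 - b). GC content therefore drifts toward the repair bias b itself, so any bias above 0.5 pushes a frequently recombining region toward GC richness.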

Biased gene conversion and GC-rich regions are not new. What's new in the paper is the idea that large regions of the genome may be missing from a genome assembly because of limitations of standard sequencing technology. This is "dark DNA."

A species related to the sand rat also has a high GC content in the same region, suggesting that the shift to high GC content occurred before their last common ancestor. GC-rich genes are also missing from the chicken genome assembly, suggesting that dark DNA may be more common than anyone suspected.

If that's all there is to the story, we probably wouldn't have heard about it. However, the lead author of the paper, Adam Hargreaves, is mainly interested in how changes in the genome can lead to innovation and adaptation. He wrote an article last summer for The Conversation in which he emphasized the possible role of dark DNA in evolution [Introducing ‘dark DNA’ – the phenomenon that could change how we think about evolution].

He said,
Most textbook definitions of evolution state that it occurs in two stages: mutation followed by natural selection. DNA mutation is a common and continuous process, and occurs completely at random. Natural selection then acts to determine whether mutations are kept and passed on or not, usually depending on whether they result in higher reproductive success. In short, mutation creates the variation in an organism’s DNA, natural selection decides whether it stays or if it goes, and so biases the direction of evolution.

But hotspots of high mutation within a genome mean genes in certain locations have a higher chance of mutating than others. This means that such hotspots could be an underappreciated mechanism that could also bias the direction of evolution, meaning natural selection may not be the sole driving force.

So far, dark DNA seems to be present in two very diverse and distinct types of animal. But it’s still not clear how widespread it could be. Could all animal genomes contain dark DNA and, if not, what makes gerbils and birds so unique? The most exciting puzzle to solve will be working out what effect dark DNA has had on animal evolution.
This is pure speculation. There's only a hint of this idea in the original paper. It's true that hotspots of mutation are going to show more variation than other regions of the genome. That's just common sense. The question that's important is whether there's some underlying selection for hotspots in order to shift the species in a certain direction. The other, more likely, possibility is that the formation of hotspots is fortuitous and evolution just has to cope with the problem.

The role of hotspots in evolution is an interesting question but Hargreaves seems to be capitalizing on the sexy term "dark DNA" when, in fact, he's just speculating that hotspots of recombination may play a role in evolution. The hotspot regions may or may not be "dark DNA" depending on how you sequence a genome. If you use the right sequencing methods then the DNA won't be "dark" at all.

Hargreaves followed up on his popularity by publishing an article in a recent issue of New Scientist (March 10-16, 2018). The title is Dark DNA: The missing matter at the heart of nature.2 It made the cover of the magazine.

As you can see, the title conveys the idea that there's a connection between "dark DNA" and "dark matter." The former, like the latter, is supposed to be some mysterious stuff that scientists can't explain. But, as I pointed out above, we have a perfectly good explanation of "dark DNA"—it's GC-rich DNA that can't easily be sequenced using some sequencing technologies.

Here's how Hargreaves hypes his work in the New Scientist article ....
The discovery of dark DNA is so recent that we are still trying to work out how widespread it is and whether it benefits those species that possess it. However, its very existence raises some fundamental questions about genetics and evolution. We may need to look again at how adaptation occurs at the molecular level. Controversially, dark DNA might even be a driving force of evolution.
It's true that we don't know whether extensive GC-rich regions are rare or common. The evidence so far suggests they are not common, judging by the quality of the genomes that have been sequenced. Thus, I believe that Hargreaves is misleading readers on this point.

In the absence of evidence, we assume that the GC-rich regions do not benefit the species. This is the default assumption. I think it's misleading to speculate that hotspots benefit the species.

In my opinion, the existence of large blocks of GC-rich regions of the genome does not raise fundamental questions about genetics and evolution. Hargreaves is wrong about that.

On the other hand, these articles do raise fundamental questions about the quality of science journalism and how we communicate with the public. Is it acceptable to hype your own work to make it seem far more important than it actually is? Do we, as scientists, have a responsibility to speak out against this behavior?

DNA Image Credit: Moran, L.A., Horton, H.R., Scrimgeour, K.G., and Perry, M.D. (2012) Principles of Biochemistry 5th ed., Pearson Education Inc. page 581 © Pearson/Prentice Hall

1. A mismatched A:C base pair can be converted by removing and substituting either base. If gene conversion were unbiased then the repair would yield A:T and G:C pairs at the same frequency. Instead, there is a higher probability of generating the G:C product.
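The effect described in footnote 1 can be put into a little arithmetic. The sketch below is a toy model of my own, not anything from the Hargreaves et al. paper. It assumes an infinitely large, randomly mating population, a hypothetical fraction `conversion_rate` of heterozygotes that undergo gene conversion each generation, and a hypothetical `gc_bias`, the probability that a conversion ends up as G:C rather than A:T. With unbiased repair (gc_bias = 0.5) the frequency of the G:C allele doesn't change. With even a modest bias it climbs steadily, even though the G:C allele has no effect on fitness.

```python
def gc_allele_frequency(p0, conversion_rate, gc_bias, generations):
    """Track the frequency of a G:C allele at a site where it pairs with an A:T allele.

    Toy assumptions (hypothetical, for illustration only):
    - infinite population with random mating (Hardy-Weinberg genotype frequencies);
    - a fraction `conversion_rate` of heterozygotes undergo gene conversion each generation;
    - each conversion ends as G:C with probability `gc_bias` (0.5 means unbiased repair).
    Returns the allele frequency in every generation, starting with p0.
    """
    p = p0
    trajectory = [p]
    for _ in range(generations):
        heterozygotes = 2 * p * (1 - p)
        # A converted heterozygote contributes GC at frequency gc_bias instead of 0.5,
        # so each one shifts the population frequency by (gc_bias - 0.5).
        p = p + heterozygotes * conversion_rate * (gc_bias - 0.5)
        trajectory.append(p)
    return trajectory


# Same starting frequency and conversion rate, with and without a repair bias.
no_bias = gc_allele_frequency(0.1, 0.01, 0.5, 1000)
with_bias = gc_allele_frequency(0.1, 0.01, 0.7, 1000)
print(f"unbiased repair: {no_bias[-1]:.3f}; biased repair: {with_bias[-1]:.3f}")
```

The per-generation change works out to conversion_rate × (2 × gc_bias − 1) × p(1 − p), which has the same form as weak directional selection. That's why a region subject to biased gene conversion can move toward high GC content without any benefit to the organism.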

2. This is the title of the online version of the article. The print version title is The hunt for dark DNA.

Hargreaves, A.D., Zhou, L., Christensen, J., Marlétaz, F., Liu, S., Li, F., Jansen, P.G., Spiga, E., Hansen, M.T., Pedersen, S.V.H., Biswas, S., Serikawa, K., Fox, B.A., Taylor, W.R., Mulley, J.F., Zhang, G., Heller, R.S., and Holland, P.W.H. (2017) Genome sequence of a diabetes-prone rodent reveals a mutation hotspot around the ParaHox gene cluster. Proc. Natl. Acad. Sci. (USA) 114:7677-7682. [doi: 10.1073/pnas.1702930114]

Making Sense of Genes by Kostas Kampourakis

Kostas Kampourakis is a specialist in science education at the University of Geneva, Geneva (Switzerland). Most of his book is an argument against genetic determinism in the style of Richard Lewontin. You should read this book if you are interested in that argument. The best way to describe the main thesis is to quote from the last chapter.

Here is the take-home message of this book: Genes were initially conceived as immaterial factors with heuristic values for research, but along the way they acquired a parallel identity as DNA segments. The two identities never converged completely, and therefore the best we can do so far is to think of genes as DNA segments that encode functional products. There are neither 'genes for' characters nor 'genes for' diseases. Genes do nothing on their own, but are important resources for our self-regulated organism. If we insist in asking what genes do, we can accept that they are implicated in the development of characters and disease, and that they account for variation in characters in particular populations. Beyond that, we should remember that genes are part of an interactive genome that we have just begun to understand, the study of which has various limitations. Genes are not our essences, they do not determine who we are, and they are not the explanation of who we are and what we do. Therefore we are not the prisoners of any genetic fate. This is what the present book has aimed to explain.
If you are interested in real facts about genes and the history of gene definitions, then you will be sorely disappointed because the author has fallen for the ENCODE hype. Similarly, if you want to know about genomes and junk DNA then don't read this book. The author takes his cues from Junk DNA by Nessa Carey and The Deeper Genome by John Parrington.

Genomes and junk are the topics that interest me so let's look at some other excerpts from the book, keeping in mind that the main part of the book is about genetic determinism and the large-scale phenotypic effects of genes and alleles.

The concept of a "gene" was poorly defined in the first part of the twentieth century. That fuzzy definition is still common today. It imagines a gene as a nebulous entity responsible for some visible trait. It's the way most people still think of a gene and it's the way students are often taught when they study genetics. Kostas Kampourakis does a pretty good job of describing the history of this idea up until 1953.

The next stage is something he calls the "molecularization" of genes. That's the transformation from a gene as the subject of genetics to the idea that a gene is the subject of biochemistry and molecular biology. This is an important shift and the author is justified in emphasizing the transformation.

From this point on, the book gets pretty confusing. The part I like is that the author doesn't get bogged down in the old-fashioned idea that genes only encode proteins. From fairly early on in the book he recognizes that a gene can specify either a protein or a functional RNA.1 So far, so good.

The problems begin when he starts describing all the things that make a precise definition of a gene so difficult. Rather than treat these as exceptions that can be accommodated by a good working definition [What Is a Gene?], he focuses on the problems ...
Regulatory sequences, discontinuous genes, overlapping genes, trans-splicing, RNA editing, among other things, have made impossible the structural individualization of genes on DNA. Looking more closely into the phenomena presented in this chapter might make one argue that the RNA transcript should be considered as the "true" gene. ... The important conclusion from all these phenomena is that DNA does not contain distinct segments corresponding to the genes it is supposed to contain, or, in other words, that genes cannot be structurally individuated. These phenomena can therefore put the existence of genes into doubt. Do genes really exist? Perhaps they are a heuristic tool for research but nevertheless a human invention that we are still trying to force into existence.
Kampourakis has created a problem for himself by failing to point out that there are functional DNA sequences that don't count as genes using the molecular definition (regulatory sequences, centromeres, origins of replication) but do count as "genes" in the classic genetic sense since mutations in these sequences can produce an effect on the organism. His description would be much clearer if he had made this distinction.

In addition, he got confused by reading the ENCODE papers and falling for their paradigm shaft about the nature of genes [What is a gene, post-ENCODE?].

Now let's look at how the author deals with junk DNA. It's the subject of Chapter 11: "Genomes Are More than the Sum of Genes." That's an interesting title. It's correct, of course, especially if you take into account essential DNA sequences that aren't genes. However, it's a bit late in the book to be bringing up this topic. Here's what he says on page 210.
Is 98 percent of our DNA meaningless, as in the [example] above? Is it really "junk," perhaps the relic of our evolutionary history during which DNA sequences were simply accumulated? The answer is no, and in this chapter I explain why. The relevant knowledge has been emerging during the recent years, and we have come to know that much of what we used to call "junk" DNA seems to have important functions, particularly in the regulation of the expression of genes.
As I pointed out above, Kampourakis should have addressed this point early on when he was discussing how to define a gene. He left readers with the impression that the only important genome sequences were genes. He brings up the old canard that protein-coding regions are the only ones that count and all the rest was thought to be junk. Now he proposes to refute this strawman by explaining what he should have made clear 100 pages earlier.2

In fairness, he notes that the strawman view has been challenged in the past.
However, it should be noted that although the details have emerged recently, several researchers had been long aware that "junk" DNA was not entirely useless and that some DNA that does not code for proteins has important roles (Palazzo & Gregory, 2014).
I find it interesting that he cites a four-year-old paper from my colleagues where they explain the real history of the problem. The details have not emerged recently as Kampourakis claims. We've known about important non-coding DNA for 50 years!

So, what is this recent data that calls into question the existence of junk DNA? You can probably guess the answer. Kampourakis recognizes that the genes for transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs) had been identified long ago. But then he says,
By that time [late 1960s], it had already become clear that nontranslated or noncoding RNA molecules, such as rRNA and tRNA, have an important role in gene expression. But as the ENCODE project showed, there are other functional sequences outside protein-coding genes, which encode certain noncoding RNA molecules. This led to the expanded definition of genes presented in Chapter 4, which includes the genes for noncoding RNA as well. Except for tRNA and rRNA, these genes encode other types of RNA molecules, such as small nuclear RNAs (snoRNAs) that are involved in RNA editing and micro RNAs (miRNAs) that have important regulatory functions. Although the details are still under study, the emerging evidence suggest there are a lot more genes encoding regulatory RNAs than proteins in the human genome (Morris & Mattick, 2014).
There are several things wrong with those sentences. For one thing, they totally misrepresent the history of the field. Noncoding RNAs such as snRNAs, miRNAs, and others were well known long before ENCODE was started. Also, the definition of a gene as a DNA sequence that specifies a functional RNA was common in textbooks long before ENCODE. The ENCODE results did not prompt a serious revision of the definition of a gene in spite of the claims of ENCODE researchers. Finally, it is not true that there are more genes for regulatory RNAs than for proteins. (There are about 20,000 protein-coding genes.) The final results are not in, but it's very unlikely that there are 20,000 different genes for noncoding RNAs. And even if that statement turns out to be true, those genes would not represent a significant fraction of the genome.

It's clear that Kampourakis is solidly in the ENCODE camp and it's clear that he does not understand the Palazzo & Gregory paper and does not understand the evidence for junk DNA [Five Things You Should Know if You Want to Participate in the Junk DNA Debate].

Some beating of dead horses may be ethical, where here and there they display unexpected twitches that look like life.

Zuckerkandl and Pauling (1965)

Sandwalk readers are probably annoyed at me for beating a dead horse but here's the problem. It's been more than ten years since the initial ENCODE results were published and more than five years since the main results were published in 2012 (along with the massive publicity campaign). Criticisms of the ENCODE hype have been widely available in the scientific literature and elsewhere since 2007. Many experts in evolutionary biology have explained the evidence for junk DNA and pointed out the limitations of the ENCODE conclusions.

All of this information is available to anyone who studies the problem. All knowledgeable scientists recognize that the case for junk DNA is very strong. Kampourakis addresses some of this criticism (notably the lack of conservation of presumed functional RNAs) but he ignores most of the other criticisms. Why? Why do so many authors perpetuate the ENCODE hype in the face of so much evidence that it's wrong? Is it because the publicity campaign organized by ENCODE researchers, with the help of Nature and Science, was so effective that it continues to overwhelm any attempt to correct the record? That's not a very good excuse for someone who is supposed to do the research before publishing a book on the subject of genes and genomes.

1. He's not very consistent. There are times in the second half of the book when he talks about genes as sequences that encode proteins.

2. Keep in mind that we include introns when we define a gene as a sequence that's transcribed. Thus, protein-coding genes, including their introns, make up about 25% of our genome and known noncoding genes account for another 5%. Genes occupy about 30% of our genome, a fact that should be mentioned in a book about genes.

Palazzo, A.F. and Gregory, T.R. (2014) The Case for Junk DNA. PLOS Genetics, 10:e1004351. [doi: 10.1371/journal.pgen.1004351]

Junk DNA and selfish DNA

Selfish DNA is a term that became popular with the publication of a series of papers in Nature in 1980. The authors were referring to viruses and transposons that insert themselves into a genome where they exist solely for the purposes of propagating themselves. These selfish DNA sequences are often thought, incorrectly, to be the same as the Selfish Genes of Richard Dawkins1 [Selfish genes and transposons]. In fact, "selfish genes" refers to the idea that some DNAs enhance fitness and the frequency of these genes will increase in a population through their effect on the vehicle that carries them. It's an adaptationist view of evolution. The selfish DNA of transposons and viruses is quite different. These sequences only propagate themselves—the fitness of the organism is largely irrelevant. These elements do not contribute directly to the adaptive evolution of the species.

Transposons and integrated viruses are subjected to mutation just like the rest of the genome. Deleterious mutations cannot be purged by natural selection because inactivating a transposon has no effect on the fitness of the organism.2 As a result, large genomes are littered with defective transposons and bits and pieces of dead transposons. This is not selfish DNA by any definition. It is junk DNA [What's in Your Genome?].

It's important to remember that real selfish DNA makes up only a tiny percentage of the human genome. This is a fact that was not widely known in 1980 although some of the discussion back then alluded to the possibility.

This brings me to a recent article by Itai Yanai and Martin Lercher [Life doesn’t make trash]. They are the authors of The Society of Genes. I wrote a short review of this book where I said that my main beef was their over-emphasis on The Selfish Gene and their adaptationist approach to evolution [Human genome books].

The article in Aeon continues the emphasis on selfish genes and adaptation. Read it yourself to see if Yanai and Lercher are adaptationists or not.

Most of the article is about junk DNA. You have to read very carefully to see that the authors have gotten the basic facts correct. They conclude that about 10% of our genome is functional based on the criterion that it is conserved—although I'm not sure that point comes across very clearly. They say,
There is good evidence for this 10 per cent. If we compare our genome to that of other mammals, we find that 90 per cent of the genome was free to change through random mutations. Those DNA letters apparently did not contribute to the efficiency of the survival machine, us. By contrast, mutations in the remaining 10 per cent were weeded out by natural selection because they would have compromised the DNA sequences’ ability to spread – either by damaging the survival machine’s functioning, or by reducing the sequences’ freeloading capacity. This is the definition of function that has traditionally been used by evolutionary biologists as well as by philosophers of science: if something is conserved by natural selection, then it is functional. Function, then, is identified as the feature that ensures the spread or maintenance of a particular DNA sequence.
So far, so good. I disagree with their description of the rest of the genome. They imply that most of it is selfish DNA composed of transposons like Alu and LINE-1 elements. I wish they had put more emphasis on the fact that much of our genome consists of defective transposons and viruses that are junk, plain and simple. They aren't selfish DNA today, although they once were.

1. The confusion stems from the fact that Dawkins briefly mentioned these selfish DNAs in his book The Selfish Gene.

2. Strictly speaking, this isn't true. There may be some fitness advantage to eliminating transposons. In species with large populations, this small fitness advantage can lead to small genomes. That explains why most bacteria are not littered with defective transposons.
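Footnote 2 rests on a standard result from population genetics: whether natural selection can "see" a tiny fitness difference depends on the product of the effective population size and the selection coefficient. The sketch below uses Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population. Bacteria are haploid, so treat this as an illustration of the principle rather than a model of any real species. The cost of one extra transposon copy (s = -1e-6) and the two population sizes are hypothetical numbers chosen for illustration.

```python
import math


def fixation_probability(s, n_e):
    """Kimura's approximation for the chance that a new mutation eventually fixes.

    Assumes a diploid Wright-Fisher population whose census size equals its
    effective size n_e, with the mutation starting as a single copy (frequency 1/(2*n_e)).
    s is the selection coefficient; negative values mean slightly harmful.
    """
    if s == 0:
        return 1 / (2 * n_e)  # neutral: fixation happens by drift alone
    x = -4 * n_e * s
    if x > 700:
        # Strongly harmful relative to drift; exp(x) would overflow and the
        # true probability is vanishingly small anyway.
        return 0.0
    # (1 - exp(-2s)) / (1 - exp(-4 n_e s)); the minus signs cancel, and expm1
    # keeps precision when s is tiny.
    return math.expm1(-2 * s) / math.expm1(x)


cost = -1e-6  # hypothetical fitness cost of carrying one extra transposon copy
for n_e in (10_000, 100_000_000):
    neutral = 1 / (2 * n_e)
    ratio = fixation_probability(cost, n_e) / neutral
    print(f"N_e = {n_e:,}: fixation probability relative to neutral = {ratio:.3g}")
```

In the smaller population the slightly harmful insertion fixes almost as often as a neutral one (the ratio is close to 1), so selection can't purge it. In the larger population the ratio is effectively zero. That's the sense in which a small fitness advantage to eliminating transposons only matters in species with large populations.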

Human genome books

& Junk DNA

I'm trying to read all the recent books on the human genome and anything related. There are a lot of them. Here's a list with some brief comments. You should buy some of these books. There are others you should not buy under any circumstances.

The Deeper Genome: Why there is more to the human genome than meets the eye
by John Parrington
Oxford University Press (2015)
ISBN 978-0-19-968873-9

John Parrington is an Associate Professor in Molecular and Cellular Pharmacology at the University of Oxford (UK). He claims that most of our genome is functional (not junk) based largely on the results of the ENCODE study. He ignores most of the scientific evidence in favor of junk DNA. This is a very bad book [Georgi Marinov reviews two books on junk DNA] [John Parrington discusses genome sequence conservation].

Junk DNA: A Journey Through the Dark Matter of the Genome
by Nessa Carey
Columbia University Press (2015)
ISBN 978-0-231-53941-8

Nessa Carey is a former researcher in epigenetics. She is currently a science writer based in the United Kingdom. She claims that recent discoveries have revealed that most of the mysterious “dark matter” of the genome (formerly junk DNA) is actually required for the regulation of gene expression. This book is even worse than Parrington’s [Georgi Marinov reviews two books on junk DNA] [Teaching about genomes using Nessa Carey's book: Junk DNA] [Nessa Carey doesn't understand junk DNA]. It's even worse than the book written by ID proponent Jonathan Wells (see below). In fact, it's a classic example of everything that's wrong with modern science writing [On explaining science to the general public].

The Myth of Junk DNA
by Jonathan Wells
Discovery Institute Press (2011)
ISBN 978-1-9365990-0-4

Jonathan Wells has a Ph.D. in Molecular & Cell Biology from the University of California, Berkeley. He is a leading advocate of intelligent design. According to Wells, the idea that most of our genome is junk is a myth promoted by Darwinian scientists. The science in this book is far superior to that in the first two books on the list. Wells acknowledges and deals with the main evidence for junk DNA but he still reaches the wrong conclusion [The Myth of Junk DNA by Jonathan Wells].

Human Evolution: Genes, Genealogies and Phylogenies
by Graeme Finlay
Cambridge University Press (2013)
ISBN 978-1-107-04012-0

Graeme Finlay is a professor in the Department of Molecular Medicine and Pathology at the University of Auckland, Auckland, New Zealand. This is an excellent book on retroviruses, transposons, pseudogenes, and new (de novo) genes. Those topics are very well described at a fairly sophisticated level with an emphasis on their adaptive roles. Junk DNA is not discussed even though most of the sequences Finlay discusses are junk. The emphasis is on the possible evolutionary significance of co-opted sequences and pseudogenes, giving the impression that they aren't junk [Human Evolution: Genes, Genealogies and Phylogenies by Graeme Finlay]. I agree with Norman Johnson when he says that the book is hyperadaptationist in tone [Making sense of the human genome].

Ancestors in Our Genome: The new science of human evolution
by Eugene E. Harris
Oxford University Press (2015)
ISBN 978-0-19-997803-8

Eugene Harris is a professor of Biological Sciences and Geology at City University of New York, New York (USA). He has written an excellent analysis of modern human evolution from a molecular evolution perspective. His descriptions of some complex concepts, such as selective sweeps and coalescence, are very good. His explanation of the difference between gene trees and species trees is excellent. The science is well above the level of some of the dumbed-down books at the top of this list. This is the best book I've ever read on the subject of random genetic drift. Harris understands that most of our genome is junk. Buy this book.

Inside the Human Genome: A case for non-intelligent design
by John C. Avise
Oxford University Press (2010)
ISBN 978-0-19-539343-9

John Avise is a molecular evolutionary biologist at the University of California, Irvine. His goal in this book is to demonstrate that our genome is sloppy and disorganized. It doesn’t look like it was intelligently designed. Avise writes from the perspective of someone who is opposed to intelligent design creationism but favors accommodation between science and religion. He is non-committal, but skeptical, about the view that most of our genome is junk DNA [Shoddy But Not "Junk"?]. The book is pretty good if you're looking for evidence to refute Intelligent Design Creationists. It's a whole lot better than his more recent book on 70 breakthroughs or paradigm shifts in biology [John Avise doesn't understand the Central Dogma of Molecular Biology].

Drawing the Map of Life: Inside the Human Genome Project
by Victor K. McElheny
Basic Books (2010, updated in the paperback edition 2012)
ISBN 978-0-465-02895-5

Victor McElheny is a science writer based in the United States. He has written an entertaining, and accurate, account of the Human Genome Project from an historical perspective. The book does not cover the implications of the genome sequence and it does not explain the science behind the work. McElheny thinks that the results of the human genome sequence were revolutionary, especially the surprisingly small number of genes. He thinks there’s very little junk DNA.

A Brief History of Everyone Who Ever Lived: The Stories in Our Genes
by Adam Rutherford
Weidenfeld & Nicolson (2016)
ISBN 978-0-297-60937-7

Adam Rutherford is a science writer with a Ph.D. in developmental genetics. While working at Nature in 2012 he was a prominent member of the team that helped hype the ENCODE results. His current view of the human genome is that we just don't know what most of it is doing. He doesn't spend any time at all on the evidence for junk DNA, nor does he explain how controversial the topic is. Nevertheless, the other parts of the book are excellent, especially the parts on our ancestors. I recommend it in spite of its shortcomings.

The Mysterious World of the Human Genome
by Frank Ryan
William Collins (2015)
ISBN 978-0-00-754906-1

Frank Ryan is a physician, a science writer, and a leading member of The Third Way. His book purports to explain the human genome but it does nothing of the sort. Instead, Ryan has fallen for every bit of hype, and every "revolution," that has been promoted in the 17 years since publication of the human genome sequence (e.g. alternative splicing, pervasive transcription, epigenetics, etc.). He rejects junk DNA, misinterprets the Central Dogma, and doesn't understand what a gene is. It's no wonder that the genome appears "mysterious" to someone like Frank Ryan since he ignores most of the relevant scientific literature while focusing on the "fact" that recent discoveries have challenged everything we thought we knew [Another failure: The Mysterious World of the Human Genome].

The Gene: An Intimate History
by Siddhartha Mukherjee
Scribner (2016)
ISBN 978-1-4767-3350-0

This is a very big book by Siddhartha Mukherjee. You might recognize the name: he's a physician who won a Pulitzer Prize for an earlier book, The Emperor of All Maladies: A Biography of Cancer. In spite of the title of his latest book, you can get through all 592 pages without learning what a gene is, among many other things [What is a "gene" and how do genes work according to Siddhartha Mukherjee?] [Siddhartha Mukherjee tries to correct his book]. Mukherjee is not interested in the human genome because his focus is on genes, genetic disease, and the future of genetic manipulation. Matthew Cobb has reviewed that part of his book [On the heredity trail].

Herding Hemingway's Cats: Understanding How Our Genes Work
by Kat Arney
Bloomsbury Sigma (2016)
ISBN 978-1-4729-1004-2

Here's what I wrote earlier about this book [Herding Hemingway's Cats by Kat Arney]. "Kat Arney is a science writer based in the UK. She has a Ph.D. from the University of Cambridge where she worked on epigenetics and regulation in mice. She also did postdoc work at Imperial College in London. Her experience in the field of molecular biology and gene expression shows up clearly in her book where she demonstrates the appropriate skepticism and critical thinking in her coverage of the major advances in the field." Buy this book.

The Society of Genes
by Itai Yanai and Martin Lercher
Harvard University Press (2016)
ISBN 978-0-674-42502-6

Itai Yanai is a professor at New York University School of Medicine (New York, NY, USA) and Martin Lercher is a professor at Heinrich Heine University in Düsseldorf, Germany. I like this book because the authors get their facts right and they understand evolution, drift, and junk DNA. Unfortunately, this advantage is somewhat tainted by an over-emphasis on the cooperative nature of protein-coding genes (the society of genes). This gives rise to a somewhat adaptationist view of evolution and a misplaced admiration of Richard Dawkins and The Selfish Gene. Don't let this dissuade you from buying this book because there's lots of other good stuff in it. I recommend it.

The Age of Genomes: Tales from the Front Lines of Genetic Medicine
by Steven Monroe Lipkin with Jon R. Luoma
Beacon Press (2016)
ISBN 978-0-8070-7458-9

Steven Monroe Lipkin is a clinical geneticist at Cornell Medical College in New York. This is a book about the limitations of genetics and why we have to be cautious about analyzing our genome. I highly recommend this book because it debunks some of the myths surrounding genetic testing. The author doesn't have much to say about the gross organization of the human genome since he focuses on protein-coding genes.

Postgenomics: Perspectives on Biology after the Genome
edited by Sarah S. Richardson and Hallam Stevens
Duke University Press (2015)
ISBN 978-0-8223-5922-7

This book is a series of 12 essays by different authors discussing "new ways of thinking" about biology in light of recent advances in genomics. Some of the essays are ridiculous, such as Evelyn Fox Keller's diatribe against junk DNA and praise of John Mattick. Some of the essays are brilliant, such as the one by Rachel Ankeny and Sabina Leonelli on the importance of genome curation. Most of them are somewhere in between. There's no serious discussion of junk DNA and the controversy over function. The general tone of the book is that the human genome sequence contained lots of surprises and revelations and we still don't know what most of it does.

Adam and the Genome: Reading Scripture after Genetic Science
by Dennis R. Venema and Scot McKnight
Brazos Press (2017)
ISBN 978-1-58743-394-8

There are two parts to this book. The first part is written by Dennis Venema, a Christian evangelical who teaches biology at Trinity Western University in Langley, British Columbia (Canada). He is an active member of BioLogos. His goal is to show that biology is compatible with Christianity and incompatible with Intelligent Design Creationism.1 He presents the science in a very straightforward and readable format that I greatly admire. Moreover, he gets it right, including evolution and the fact that our genome is full of junk. If you want a good overview of modern molecular genetics then this is the book to buy. The second part of the book is written by Scot McKnight, a professor of New Testament at Northern Baptist Theological Seminary in Lombard, Illinois (USA). Its purpose is to explain why the science in the first part of the book is compatible with the Biblical story of Adam and Eve. This is apologetics at its worst. I did not enjoy this part of the book.

Making Sense of Genes
by Kostas Kampourakis
Cambridge University Press (2017)
ISBN 978-1-107-12813-2

The best short review of this book is supplied by the author in the last chapter.

"Here is the take-home message of this book: Genes were initially conceived as immaterial factors with heuristic values for research, but along the way they acquired a parallel identity as DNA segments. The two identities never converged completely, and therefore the best we can do so far is to think of genes as DNA segments that encode functional products. There are neither 'genes for' characters nor 'genes for' diseases. Genes do nothing on their own, but are important resources for our self-regulated organism. If we insist in asking what genes do, we can accept that they are implicated in the development of characters and disease, and that they account for variation in characters in particular populations. Beyond that, we should remember that genes are part of an interactive genome that we have just begun to understand, the study of which has various limitations. Genes are not our essences, they do not determine who we are, and they are not the explanation of who we are and what we do. Therefore we are not the prisoners of any genetic fate. This is what the present book has aimed to explain."

Most of the book is an essay against genetic determinism in the style of Richard Lewontin. If you are interested in that argument then you should read this book. If you are interested in real facts about genes and the history of gene definitions then you will be sorely disappointed because the author has fallen for the ENCODE hype. Similarly, if you want to know about genomes and junk DNA don't read this book. The author takes his cues from Junk DNA by Nessa Carey and The Deeper Genome by John Parrington [see Making Sense of Genes by Kostas Kampourakis].

1. As you might imagine, the Intelligent Design Creationists are not happy about this. In recent months they have attacked Dennis repeatedly on their websites [e.g. Adam and the Genome and Citation Bluffing]. This is strange since we are told repeatedly that ID has nothing to do with Christianity or god(s) or Genesis.