Thesis defense: 50th anniversary

Today is the 50th anniversary of my Ph.D. oral defense. The event took place in the Department of Biochemical Sciences at Princeton University back in 1974. It began with a departmental seminar. When the seminar was over I retired with my committee to a small classroom for the oral exam.

I don't remember everyone who was on my committee. My Ph.D. supervisor (Bruce Alberts) was there, as was my second reader, Abe Worcel. I know Uli Laemmli was there and so was Arnie Levine. I'm pretty sure the external member of the committee was Nancy Nossal from NIH in Bethesda, MD (USA). It's a bit of a blur after all these years.

I remember being fairly confident about the exam. After five and a half years I was pretty sure that everyone on my committee wanted to get rid of me and the easiest way to do that was to let me pass. Bruce stood to gain $3000 per year of research money and Uli was going to get back the basement of his house where I had been living for the past month after getting kicked out of the married graduate students housing project for taking too long to complete my thesis.

The toughest questions were from Uli Laemmli, which should not come as a surprise to anyone who knows him. He has this annoying habit of expecting people to understand the basic physics and chemistry behind the biochemical sciences. Fortunately, my inability to answer most of his questions didn't deter him from voting to pass me.


Science misinformation is being spread in the lecture halls of top universities

Should universities remove online courses that contain incorrect or misleading information?

There are lots of scientific controversies where different scientists have conflicting views. Eventually these controversies will be solved by normal scientific means involving evidence and logic but for the time being there isn't enough data to settle a genuine scientific controversy. Many of us are interested in these controversies and some of us have chosen to invest time and effort into defending one side or the other.

But there's a dark side of science that infects these debates—false or misleading information used to support one side of a legitimate controversy. To give just one example, I'm frustrated at the constant reference to junk DNA being defined as non-coding DNA. Many scientists believe that this was the way junk DNA was defined by its earliest proponents and then they go on to say that the recent discovery of functional non-coding DNA refutes junk.

I don't know where this idea came from because there's nothing in the scientific literature from 50 years ago to support such a ridiculous claim. It must be coming from somewhere since the idea is so widespread.

Where does misinformation come from and how is it spread?


How do proteins move around amidst the jumble of molecules inside a living cell?

I've been reading Philip Ball's book on "How Life Works" and I find it increasingly frustrating because he consistently describes things that he's "discovered" that biochemists like me must have missed. Here's an example from pages 231-232.

He presents a cartoon image of a cell showing that it's full of all kinds of molecules packed closely together, then he says,

Read more »

Nils Walter disputes junk DNA: (9) Reconciliation

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the ninth and last post in the series. I'm going to discuss Walter's view on how to tone down the dispute over the amount of junk in the human genome. Here's a list of the previous posts.


"Conclusion: How to Reconcile Scientific Fields"

Walter concludes his paper with some thoughts on how to deal with the controversy going forward. I'm using the title that he chose. As you can see from the title, he views this as a squabble between two different scientific fields, which he usually identifies as geneticists and evolutionary biologists versus biochemists and molecular biologists. I don't agree with this distinction. I'm a biochemist and molecular biologist, not a geneticist or an evolutionary biologist, and still I think that many of his arguments are flawed.

Let's see what he has to say about reconciliation.

Science thrives from integrating diverse viewpoints—the more diverse the team, the better the science.[107] Previous attempts at reconciling the divergent assessments about the functional significance of the large number of ncRNAs transcribed from most of the human genome by pointing out that the scientific approaches of geneticists, evolutionary biologists and molecular biologists/biochemists provide complementary information[42] was met with further skepticism.[74] Perhaps a first step toward reconciliation, now that ncRNAs appear to increasingly leave the junkyard,[35] would be to substitute the needlessly categorical and derogative word RNA (or DNA) “junk” for the more agnostic and neutral term “ncRNA of unknown phenotypic function”, or “ncRNAupf”. After all, everyone seems to agree that the controversy mostly stems from divergent definitions of the term “function”,[42, 74] which each scientific field necessarily defines based on its own need for understanding the molecular and mechanistic details of a system (Figure 3). In addition, “of unknown phenotypic function” honors the null hypothesis that no function manifesting in a phenotype is currently known, but may still be discovered. It also allows for the possibility that, in the end, some transcribed ncRNAs may never be assigned a bona fide function.

First, let's take note of the fact that this is a discussion about whether a large percentage of transcripts are functional or not. It is not about the bigger picture of whether most of the genome is junk in spite of the fact that Nils Walter frames it in that manner. This becomes clear when you stop and consider the implications of Walter's claim. Let's assume that there really are 200,000 functional non-coding genes in the human genome. If we assume that each one is about 1000 bp long then this amounts to 6.5% of the genome—a value that can easily be accommodated within the 10% of the genome that's conserved and functional.
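The arithmetic behind that 6.5% figure is easy to check. Here's a minimal sketch, assuming a haploid genome size of about 3.1 billion base pairs (my assumption for illustration; the exact value isn't stated above). The function name is hypothetical.

```python
# Assumed haploid human genome size in base pairs (an illustrative
# assumption, roughly 3.1 Gb; not a figure taken from Walter's paper).
GENOME_SIZE_BP = 3_100_000_000


def genome_fraction(n_genes: int, mean_length_bp: int,
                    genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Fraction of the genome occupied by n_genes of a given mean length."""
    return n_genes * mean_length_bp / genome_size_bp


# Walter's implied scenario: 200,000 non-coding genes of about 1000 bp each.
fraction = genome_fraction(200_000, 1_000)
print(f"{fraction:.1%} of the genome")

# The point of the paragraph: this still fits inside the ~10% that is functional.
print("Fits within 10%:", fraction < 0.10)
```

Longer genes would change the number, of course, so treat the 1000 bp average as the weakest part of the estimate.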

Now let's look at how he frames the actual disagreement. He says that the groups on both sides of the argument provide "complementary information." Really? One group says that if you can delete a given region of DNA with no effect on the survival of the individual or the species then it's junk, and the other group says that it still could have a function as long as it's doing something like being transcribed or binding a transcription factor. Those don't look like "complementary" opinions to me.

His first step toward reconciliation starts with "now that ncRNAs appear to increasingly leave the junkyard." That's not a very conciliatory way to start a conversation because it immediately brings up the question of how many ncRNAs we're talking about. Well-characterized non-coding genes include ribosomal RNA genes (~600), tRNA genes (~200), the collection of small non-coding genes (snRNA, snoRNA, microRNA, siRNA, PiWi RNA) (~200), several lncRNAs (<100), and genes for several specialized RNAs such as 7SL and the RNA component of RNase P (~10). I think that there are no more than 1000 extra non-coding genes falling outside these well-known examples and that's a generous estimate. If he has evidence for large numbers that have left the junkyard then he should have presented it.
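To put those numbers side by side, here's a rough tally. It's only a sketch: the category counts are the approximate values from the paragraph above (with "<100" taken as an upper bound of 100), and the 200,000 figure is the hypothetical total discussed earlier in this post, not a measured value. The variable names are mine.

```python
# Approximate counts of well-characterized non-coding genes, taken from
# the estimates in the text. "<100" lncRNAs is treated as an upper bound.
known_noncoding_genes = {
    "ribosomal RNA": 600,
    "tRNA": 200,
    "small non-coding RNAs": 200,
    "lncRNA (upper bound)": 100,
    "specialized RNAs (7SL, RNase P, ...)": 10,
}

# Generous allowance for additional non-coding genes outside those classes.
extra_generous_estimate = 1_000

total_estimate = sum(known_noncoding_genes.values()) + extra_generous_estimate
print("Estimated non-coding genes:", total_estimate)

# Hypothetical number of functional non-coding genes assumed earlier.
hypothetical_claim = 200_000
ratio = hypothetical_claim / total_estimate
print(f"The hypothetical claim is about {ratio:.0f} times larger")
```

Even with the generous allowance, the gap is nearly two orders of magnitude, which is why the evidence matters.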

Walter goes on to propose that we should divide non-coding transcripts into two categories: those with well-characterized functions and "ncRNA of unknown function." That's ridiculous. That is not an "agnostic and neutral term." It implies that non-conserved transcripts that are present at less than one copy per cell could still have a function in spite of the fact that spurious transcription is well-documented. In fact, he basically admits this interpretation at the end of the paragraph where he says that using this description (ncRNA of unknown function) preserves the possibility that a function might be discovered in the future. He thinks this is the "null hypothesis."

The real null hypothesis is that a transcript has no function until it can be demonstrated. Notice that I use the word "transcript" to describe these RNAs instead of "ncRNA" or "ncRNA of unknown phenotypic function." I don't think we lose anything by using the word "transcript."

Walter also addresses the meaning of "function" by claiming that different scientific fields use different definitions, as though that excuses the conflict. But that's not an accurate portrayal of the problem. All scientists, no matter what field they identify with, are interested in coming up with a way of identifying functional DNA. There are many biochemists and molecular biologists who accept the maintenance definition as the best available definition of function. As scientists, they are more than willing to entertain any reasonable scientific arguments in favor of a different definition but nobody, including Nils Walter, has come up with such arguments.

Now let's look at the final paragraph of Walter's essay.

Most bioscientists will also agree that we need to continue advancing from simply cataloging non-coding regions of the human genome toward characterizing ncRNA functions, both elementally and phenotypically, an endeavor of great challenge that requires everyone's input. Solving the enigma of human gene expression, so intricately linked to the regulatory roles of ncRNAs, holds the key to devising personalized medicines to treat most, if not all, human diseases, rendering the stakes high, and unresolved disputes counterproductive.[108] The fact that newly ascendant RNA therapeutics that directly interface with cellular RNAs seem to finally show us a path to success in this challenge[109] only makes the need for deciphering ncRNA function more urgent. Succeeding in this goal would finally fulfill the promise of the human genome project after it revealed so much non-protein coding sequence (Figure 1). As a side effect, it may make updating Wikipedia and encyclopedia entries less controversial.

I agree that it's time for scientists to start identifying those transcripts that have a true function. I'll go one step further; it's time to stop pretending that there might be hundreds of thousands of functional transcripts until you actually have some data to support such a claim.

I take issue with the phrase "solving the enigma of human gene expression." I think we already have a very good understanding of the fundamental mechanisms of gene expression in eukaryotes, including the transitions between open and closed chromatin domains. There may be a few odd cases that deviate from the norm (e.g. Xist) but that hardly qualifies as an "enigma." He then goes on to say that this "enigma" is "intricately linked to the regulatory roles of ncRNAs" but that's not a fact, it's what's in dispute and why we have to start identifying the true function (if any) of most transcripts. Oh, and by the way, sorting out which parts of the genome contain real non-coding genes may contribute to our understanding of genetic diseases in humans but it won't help solve the big problem of how much of our genome is junk because mutations in junk DNA can cause genetic diseases.

Sorting out which transcripts are functional and which ones are not will help fill in the 10% of the genome that's functional but it will have little effect on the bigger picture of a genome that's 90% junk.

We've known that less than 2% of the genome codes for proteins since the late 1960s, long before the draft sequence of the human genome was published in 2001, and we've known for just as long that lots of non-coding DNA has a function. It would be helpful if these facts were made more widely known instead of implying that they were only discovered when the human genome was sequenced.

Once we sort out which transcripts are functional, we'll be in a much better position to describe all the facts when we edit Wikipedia articles. Until that time, I (and others) will continue to resist the attempts by the students in Nils Walter's class to remove all references to junk DNA.


Walter, N.G. (2024) Are non‐protein coding RNAs junk or treasure? An attempt to explain and reconcile opposing viewpoints of whether the human genome is mostly transcribed into non‐functional or functional RNAs. BioEssays:2300201. [doi: 10.1002/bies.202300201]

Nils Walter disputes junk DNA: (8) Transcription factors and their binding sites

I'm discussing a recent paper published by Nils Walter (Walter, 2024). He is arguing against junk DNA by claiming that the human genome contains large numbers of non-coding genes.

This is the eighth post in the series. The first one outlines the issues that led to the current paper and the second one describes Walter's view of a paradigm shift/shaft. The third post describes the differing views on how to define key terms such as 'gene' and 'function.' In the fourth post I discuss his claim that differing opinions on junk DNA are mainly due to philosophical disagreements. The fifth, sixth, and seventh posts address specific arguments in the junk DNA debate.


What really happened between Rosalind Franklin, James Watson, and Francis Crick?

That's part of the title of a podcast by Kat Arney, who interviews Matthew Cobb [Double helix double crossing? What really happened between Rosalind Franklin, James Watson and Francis Crick?].

Matthew Cobb is one of the world's leading experts on the history of molecular biology.

The way it’s usually told, Franklin was effectively ripped off and belittled by the Cambridge team, especially Watson, and has only recently been restored to her rightful place as one of the key discoverers of the double helix. It’s a dramatic narrative, with heroes, villains and a grand prize. But, as I found out when I sat down for a chat with Matthew Cobb, science author and Professor of Zoology at the University of Manchester, the real story is a lot more nuanced.

Photo 51 did not belong to Rosalind Franklin and it had (almost) nothing to do with solving the structure of DNA. Franklin and Wilkins would never have gotten the structure on their own. Crick and Watson did not "steal" any data. Whether they behaved ethically is debatable.


Donald Voet (1938-2023)

I just learned that Don Voet died on April 11th, 2023. Don and Judy Voet were the authors of one of the most successful biochemistry textbooks of all time and for a long time they were the editors of the journal Biochemistry and Molecular Biology Education (BAMBED). I've known Don for over thirty years and we met often at conferences.

He will be greatly missed. Here's an excerpt from the obituary from the American Society for Biochemistry and Molecular Biology written by my old friend Charlotte Pratt, who collaborated with me on my textbook and with Don and Judy Voet on theirs [Don Voet (1938-2023)].

Don’s work over the years demonstrated his conviction that biochemical knowledge has limited value unless it is transmitted fully and honestly to the next generation of scientists. His writing style was intentionally aimed at students of all levels, never dumbed down, and straightforward — a way to invite readers to enter a conversation among professional scientists.

Ever collegial, Don insisted on dropping names into the text, referring to the discoveries of specific researchers wherever possible and borrowing figures from the original publications rather than rendering simplified versions. In cases where visual information was lacking, Don created his own molecular graphics, at a time when modeling software was not accessible to amateurs.


Happy DNA Day 2023!

It was 70 years ago today that the famous Watson and Crick paper was published in Nature along with papers by Franklin & Gosling and Wilkins, Stokes, & Wilson. There's a great deal of misinformation circulating about this discovery so I wrote up a brief history of the events based largely on Horace Freeland Judson's book The Eighth Day of Creation. Every biochemistry and molecular biology student must read this book or they don't qualify to be an informed scientist. However, if you are not a biochemistry student then you might enjoy my short version.

Some practising scientists might also enjoy refreshing their memories so they have an accurate view of what happened in case their students ask questions.

The Story of DNA (Part 1)

Where Rosalind Franklin teaches Jim and Francis something about basic chemistry.

The Story of DNA (Part 2)

Where Jim and Francis discover the secret of life.

Here's the latest version of Rosalind Franklin's contribution written by Matthew Cobb and Nathaniel Comfort: What Rosalind Franklin truly contributed to the discovery of DNA's structure. If you want to know the accurate version of her history then this is a must-read. Cobb is working on a biography of Crick and Comfort is writing a biography of Watson.

Here are some other posts that might interest you on DNA Day.



How many enhancers in the human genome?

In spite of what you might have read, the human genome does not contain one million functional enhancers.

The Sept. 15, 2022 issue of Nature contains a news article on "Gene regulation" [Two-layer design protects genes from mutations in their enhancers]. It begins with the following sentence.

The human genome contains only about 20,000 protein-coding genes, yet gene expression is controlled by around one million regulatory DNA elements called enhancers.

Sandwalk readers won't need to be told the reference for such an outlandish claim because you all know that it's the ENCODE Consortium summary paper from 2012, the one that kicked off their publicity campaign to convince everyone of the death of junk DNA (ENCODE, 2012). ENCODE identified several hundred thousand transcription factor (TF) binding sites and in 2012 they estimated that the total number of base pairs involved in regulating gene expression could account for 20% of the genome.

How many of those transcription factor binding sites are functional and how many are due to spurious binding to sites that have nothing to do with gene regulation? We don't know the answer to that question but we do know that there will be a huge number of spurious binding sites in a genome of more than three billion base pairs [Are most transcription factor binding sites functional?].

The scientists in the ENCODE Consortium didn't know the answer either but what's surprising is that they didn't even know there was a question. It never occurred to them that some of those transcription factor binding sites have nothing to do with regulation.

Fast forward ten years to 2022. Dozens of papers have been published criticizing the ENCODE Consortium for their lack of knowledge of the basic biochemical properties of DNA binding proteins. Surely nobody who is interested in this topic believes that there are one million functional regulatory elements (enhancers) in the human genome?

Wrong! The authors of this Nature article, Ran Elkon at Tel Aviv University (Israel) and Reuven Agami at the Netherlands Cancer Institute (Amsterdam, Netherlands), didn't get the message. They think it's quite plausible that the expression of every human protein-coding gene is controlled by an average of 50 regulatory sites even though there's not a single known example of any such gene.

Not only that, for some reason they think it's only important to mention protein-coding genes in spite of the fact that the reference they give for 20,000 protein-coding genes (Nurk et al., 2022) also claims there are an additional 40,000 noncoding genes. This is an incorrect claim since Nurk et al. have no proof that all those transcribed regions are actually genes but let's play along and assume that there really are 60,000 genes in the human genome. That reduces the average to "only" 17 enhancers per gene. I don't know of a single gene that has 17 or more proven enhancers, do you?
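For anyone who wants to check the averages, here's a quick sketch of the division using only the numbers quoted above: one million claimed enhancers, 20,000 protein-coding genes, and 40,000 claimed noncoding genes. The function and variable names are hypothetical.

```python
def enhancers_per_gene(n_enhancers: int, n_genes: int) -> float:
    """Average number of enhancers per gene if every enhancer is functional."""
    return n_enhancers / n_genes


CLAIMED_ENHANCERS = 1_000_000
PROTEIN_CODING_GENES = 20_000
CLAIMED_NONCODING_GENES = 40_000  # the Nurk et al. claim, taken at face value

# Counting protein-coding genes only.
protein_coding_only = enhancers_per_gene(CLAIMED_ENHANCERS, PROTEIN_CODING_GENES)
print(f"Protein-coding genes only: {protein_coding_only:.0f} per gene")

# Playing along with 60,000 total genes.
all_claimed_genes = enhancers_per_gene(
    CLAIMED_ENHANCERS, PROTEIN_CODING_GENES + CLAIMED_NONCODING_GENES
)
print(f"All claimed genes: {all_claimed_genes:.0f} per gene")
```

Either way the average is far above what has been demonstrated for any individual gene.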

Why would two researchers who study gene regulation say that the human genome contains one million enhancers when there's no evidence to support such a claim and it doesn't make any sense? Why would Nature publish this paper when surely the editors must be aware of all the criticism that arose out of the 2012 ENCODE publicity fiasco?

I can think of only two answers to the first question. Either Elkon and Agami don't know of any papers challenging the view that most TF binding sites are functional (see below) or they do know of those papers but choose to ignore them. Neither answer is acceptable.

I think that the most important question in human gene regulation is how much of the genome is devoted to regulation. How many potential regulatory sites (enhancers) are functional and how many are spurious non-functional sites? Any paper on regulation that does not mention this problem should not be published. All results have to be interpreted in light of conflicting claims about function.

Here are some examples of papers that raise the issue. The point is not to prove that these authors are correct - although they are correct - but to show that there's a controversy. You can't just state that there are one million regulatory sites as if it were a fact when you know that the results are being challenged.

"The observations in the ENCODE articles can be explained by the fact that biological systems are noisy: transcription factors can interact at many nonfunctional sites, and transcription initiation takes place at different positions corresponding to sequences similar to promoter sequences, simply because biological systems are not tightly controlled." (Morange, 2014)

"... ENCODE had not shown what fraction of these activities play any substantive role in gene regulation, nor was the project designed to show that. There are other well-studied explanations for reproducible biochemical activities besides crucial human gene regulation, including residual activities (pseudogenes), functions in the molecular features that infest eukaryotic genomes (transposons, viruses, and other mobile elements), and noise." (Eddy, 2013)

"Given that experiments performed in a diverse number of eukaryotic systems have found only a small correlation between TF-binding events and mRNA expression, it appears that in most cases only a fraction of TF-binding sites significantly impacts local gene expression." (Palazzo and Gregory, 2014)

One surprising finding from the early genome-wide ChIP studies was that TF binding is widespread, with thousand to tens of thousands of binding events for many TFs. These number do not fit with existing ideas of the regulatory network structure, in which TFs were generally expected to regulate a few hundred genes, at most. Binding is not necessarily equivalent to regulation, and it is likely that only a small fraction of all binding events will have an important impact on gene expression. (Slattery et al., 2014)

Detailed maps of transcription factor (TF)-bound genomic regions are being produced by consortium-driven efforts such as ENCODE, yet the sequence features that distinguish functional cis-regulatory sites from the millions of spurious motif occurrences in large eukaryotic genomes are poorly understood. (White et al., 2013)

One outstanding issue is the fraction of factor binding in the genome that is "functional", which we define here to mean that disturbing the protein-DNA interaction leads to a measurable downstream effect on gene regulation. (Cusanovich et al., 2014)

... we expect, for example, accidental transcription factor-DNA binding to go on at some rate, so assuming that transcription equals function is not good enough. The null hypothesis after all is that most transcription is spurious and alternative transcripts are a consequence of error-prone splicing. (Hurst, 2013)

... as a chemist, let me say that I don't find the binding of DNA-binding proteins to random, non-functional stretches of DNA surprising at all. That hardly makes these stretches physiologically important. If evolution is messy, chemistry is equally messy. Molecules stick to many other molecules, and not every one of these interactions has to lead to a physiological event. DNA-binding proteins that are designed to bind to specific DNA sequences would be expected to have some affinity for non-specific sequences just by chance; a negatively charged group could interact with a positively charged one, an aromatic ring could insert between DNA base pairs and a greasy side chain might nestle into a pocket by displacing water molecules. It was a pity the authors of ENCODE decided to define biological functionality partly in terms of chemical interactions which may or may not be biologically relevant. (Jogalekar, 2012)


Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A. V., Mikheenko, A., et al. (2022) The complete sequence of a human genome. Science, 376:44-53. [doi:10.1126/science.abj6987]

The ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57-74. [doi: 10.1038/nature11247]

Big diagram of metabolic pathways

The contents of this diagram are outside my scope, but it is a very big, detailed diagram of metabolic pathways, with many steps and many arrows.
