Where Does the Genetic Code Come From? An Interview with Dr. Charles Carter, Part II

“Translating the genetic code is the nexus connecting pre-biotic chemistry to biology.” — Dr. Charles Carter

Last week we discussed the general question of how the genetic code evolved, and noted that the idea of the code as merely a frozen accident — an almost completely arbitrary key/value pairing of codons and amino acids — is not consistent with the evidence that has been amassed over the past three decades. Instead, there are deeper patterns in the code that go beyond the obvious redundancy of synonymous codons. These patterns give us important clues about the evolutionary steps that led to the genetic code that was present in the last universal common ancestor of all present-day life.

Charles Carter and his colleague Richard Wolfenden at the University of North Carolina at Chapel Hill recently authored two papers that suggest the genetic code evolved in two key stages, and that those two stages are reflected in two codes present in the acceptor stem and anti-codon of tRNAs.

In the first part of my interview with Dr. Carter, he reviewed some of the previous work in this field. In the present installment, he comments on the important results that came out of his two recent studies with Dr. Wolfenden. But before we continue with the interview, let’s review the main findings of the papers.

The key result is that there is a strong relationship between the nucleotide sequence of tRNAs, specifically in the acceptor stem and the anti-codon, and the physical properties of the amino acids with which those tRNAs are charged. In other words, tRNAs do more than merely code for the identity of amino acids. There is also a relationship between tRNA sequence and the physical role performed by the associated amino acids in folded protein structures. As Dr. Carter summarized it, “Our work shows that the close linkage between the physical properties of amino acids, the genetic code, and protein folding was likely essential from the beginning, long before large, sophisticated molecules arrived on the scene.” Perhaps it also suggests (this is my possibly unfounded speculation) that today’s genetic code was preceded by a more coarse-grained code that specified sets of amino acids according to their physical functions, rather than their specific identity.

How do the papers show this? To oversimplify a bit, the logic of the two studies proceeds in two steps. The first step is to determine the relevant amino acid properties. The second is to ask whether those properties correlate with tRNA sequence.

The first paper examined amino acid hydrophobicities and their temperature dependence. The rationale for choosing hydrophobicity as a relevant physical property is that it is closely related to the roles different amino acids play in 3D protein structures: more hydrophobic residues are buried in globular cores, and so on. But in order to understand whether hydrophobicity played a similar role in protein structure during the early evolution of the genetic code, in the relatively warm conditions of the RNA (or RNA plus peptide) World, Wolfenden and his team determined amino acid hydrophobicities at high temperatures. They concluded that the changes in hydrophobicity with temperature “are such that they would have tended to minimize the otherwise disruptive effects of a changing thermal environment on the evolution of protein structure.”

In the second paper, Carter and Wolfenden used this data to build a regression model of the relationship between the physical properties of amino acids (as defined by two key free energies) and the tRNA sequences. The regression analysis shows that there are two independent tRNA codes:

“The anticodon encodes the hydrophobicity of each amino acid side-chain as represented by its water-to-cyclohexane distribution coefficient, and this relationship holds true over the entire temperature range of liquid water. The acceptor stem codes preferentially for the surface area or size of each side-chain, as represented by its vapor-to-cyclohexane distribution coefficient.”

So what does this tell us about the evolution of the genetic code? In comments to the University of North Carolina, Dr. Carter explained it nicely:

“Dr. Wolfenden established physical properties of the twenty amino acids, and we have found a link between those properties and the genetic code. That link suggests to us that there was a second, earlier code that made possible the peptide-RNA interactions necessary to launch a selection process that we can envision creating the first life on Earth.”

For more detail, you can go read the papers or check out the good non-technical summary by the UNC news office. The news piece includes additional comments by Drs. Wolfenden and Carter, some of which I’ve quoted above.

And now on to part two of my interview with Dr. Carter:

MW: You describe two codes in tRNA sequences that specify amino acid properties, one in the acceptor stem and one in the anti-codon. How do these two codes work?

CC: This is a wonderful question, one of many to which we do not claim to have an answer. We are pretty confident that the relationships we describe are real and that they therefore signal a multi-stage evolution from stereochemical coding first to indirect coding via an adaptor RNA related to tRNA, and finally to the ability to read a blueprint in mRNA by recognizing the appropriate triplet of bases (which is the first stage that can be described as “genetic”). Here are several thoughts:

a. What we report are actually strong correlations; they are novel, but by themselves they lack explanatory power.

b. “How they work…” has several different connotations. At one level, they work simply by the strong correlations, but that is not what you intended for me to answer. At the level of the modern code, the answer is also pretty clear, as the 3D structures have been determined for many key intermediates in the translation of messages.

c. An important component of the modern translation system that we more or less assume in our treatment is the ribosome, which is essentially an RNA enzyme for making peptide bonds combined with a complex information processor that reads mRNA. Although we do not comment extensively on this aspect of the generation of the code, the evolution of the ribosome itself must have played an important role. There are two schools of thought on the evolution of the ribosome. One holds that the peptide bond-making machine (the 50S subunit) came first (Petrov, A. S., et al., 2014, and Petrov, A. S. & Williams, L. D., 2015). The other holds that the decoding 30S subunit preceded the appearance of the large ribosomal subunit (Harish, A. & Caetano-Anollés, G., 2012).

d. The key mystery in my mind is whether, and if so, how molecules smaller than tRNA that contained only the acceptor stem might have been used to align amino acids in somewhat the same way that the acceptor stems are aligned within the 50S ribosomal subunit, but using various aspects of the pairing of acceptor stems, rather than downloading this task to the anticodon within the 30S subunit. It is quite difficult for me to imagine how such alignments might have obeyed sequences in an RNA blueprint. Yarus has, however, described one possible model for the evolution of templating by messenger RNA in the reference cited above in 2b.
 
MW: You and Dr. Wolfenden decided to measure an important amino acid property, hydrophobicity, at a much higher temperature than is typical: 100 °C. Why did you and Dr. Wolfenden do these measurements at this higher temperature, and what did you discover about hydrophobicity?

CC: This is a wonderful question for my co-author, and I’ll give you my interpretation. Dr. Wolfenden is somewhat more convinced than I am that life began almost as soon as the earth cooled to allow liquid water. There is much magnificent (and controversial!) thermodynamics associated with what happens to hydrophobicities at 100 C, especially because dyed-in-the-wool physical chemists claim that the hydrophobic effect, per se, goes away at 100 C. One thing that seems certain is that the range of hydrophobicities for different amino acids narrows substantially at higher temperatures. However, they narrow in ways that do not disrupt the correlations we established with the coding properties of the anticodon.
 
MW: What do your results tell us about how the genetic code evolved? If we discovered DNA-based life elsewhere, say, on Mars, how similar would the Martian genetic code be to the one on earth — assuming life originated independently in both places?
 
CC: I myself am something of a “terrestrial chauvinist” by which I mean that the long arm of coincidence means something. I believe that much of what we have learned about life on earth is so closely aligned with (perhaps unknown) physical laws that we would recognize life elsewhere in the universe as being very much like life here. In particular, I am pretty confident that it would be carbon based, and the chemical free energy would be exchanged via phosphate esters, as it is on earth. It would be based on polypeptides and polynucleotides, because the former have an incredible manifold of functional variation in the space of folded proteins for controlling differential binding and catalysis, whereas nucleic acids have rock solid information storage capacity. The likelihood that their respective thermodynamically stable helical structures are structurally complementary leads me to believe that these two polymers are uniquely suited for life. Further, the properties of the four nucleic acid bases appear to be unique enough that life elsewhere in the universe would be made from the same four bases.

Ironically, and despite my terrestrial chauvinism, I really don’t think we know enough yet to make educated guesses on the question you pose here. Although our work and the previous work by Delarue do thaw the frozen accident somewhat, I think we know too little to say with any confidence that the code that evolved on earth has such advantages that it would always win the competition elsewhere. Many studies on the code have shown that although it is extremely robust to mutation (Freeland, S. J. & Hurst, L. D., 1998), it cannot be unique, as the combinatorial space of codes is so vast.


Filed under: Curiosities of Nature Tagged: evolution, genetic code

Where Does the Genetic Code Come From? An Interview with Dr. Charles Carter, Part I.

“I’m more and more inclined to think that we can actually penetrate at least some of the steps by which nature invented the code.” — Charles Carter

The genetic code is one of biology’s few universals*, but rather than being the result of some deep underlying logic, it’s often said to be a “frozen accident” — the outcome of evolutionary chance, something that easily could have turned out another way. This idea, though it’s often repeated, has been challenged for decades. The accumulated evidence shows that the genetic code isn’t as arbitrary as we might naively think. And more importantly, this evidence also offers some tantalizing clues to how the genetic code came to be.

The origin of the genetic code has long been a research focus of University of North Carolina biophysicist Charles Carter and his UNC enzymologist colleague Richard Wolfenden. They authored a pair of recent papers suggesting that behind the genetic code are actually two codes, reflecting key steps in its evolution. Dr. Carter kindly agreed to answer some questions about the papers, which present some interesting results that add to the growing pile of evidence that the genetic code is much less accidental than it may seem.

These papers deal with the machinery that implements the genetic code. Conceptually the code is simple: it is a set of dictionary entries or key-value pairs mapping codons to amino acids. But to make this mapping happen physically, you need, as Francis Crick correctly hypothesized back in 1958, an adapter. That adapter, as most of our readers know, is tRNA, a nucleic acid molecule that is “charged” with an amino acid.
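The dictionary analogy can be taken quite literally. Here is a minimal Python sketch of the standard genetic code as a key-value mapping, using one-letter amino acid symbols and `*` for stop codons. The names `CODON_TABLE` and `translate` are my own, chosen for illustration, and the sketch ignores real-world details such as start-codon selection and the variant codes mentioned in the footnote.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering
# (the third base varies fastest). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# The "dictionary": 64 codon keys mapped to 20 amino acids plus stop.
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an mRNA string three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGUUUAAAUAA"))  # Met-Phe-Lys, then stop

# Redundancy of synonymous codons: count the keys that map to leucine.
print(sum(1 for aa in CODON_TABLE.values() if aa == "L"))
```

The lookup itself is trivial. What the cell needs is a physical way to perform it, and that is the adapter's job.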

But the existence of tRNAs creates another coding problem: how does the right tRNA get paired with the correct amino acid? The answer to this question is at the heart of the origin of the genetic code, and it’s the subject of these two recent papers. More about this story, as well as the first part of my interview with Dr. Carter, is below the fold.

So how do you get the correct codon/amino acid pairings on a tRNA? This, as you’ll remember from your biochemistry courses, is accomplished through a set of enzymes called aminoacyl-tRNA synthetases (tRNA synthetases for short) that “charge” the tRNAs with their corresponding amino acids. tRNA synthetases are central to the secret of how the genetic code evolved. As Dr. Carter noted in a piece about “Thawing the ‘Frozen Accident’”, “The emergence of the genetic code was inseparable from the ancestry of the RNA adaptors and protein catalysts that implement it now.” He’s been interested in exactly that problem: how early RNAs and protein catalysts developed into the universal coding system we have today.

The striking result in the recent work by Carter and Wolfenden has to do with how tRNAs are recognized by certain tRNA synthetases. Their results indicate that tRNAs carry two codes: the well-known one in the anti-codon (the part that directly matches genetic code codons), and a second one in the “acceptor stem.” (See the figure below.)

[Figure: tRNA structure]

As it turns out, these two codes aren’t arbitrary, as you might expect from a purely frozen accident perspective. Instead, the nucleic acid sequence of the acceptor stem and the anti-codon both code for distinct physical properties of amino acids. In other words, the codon/amino acid pairings reflect the different physical roles that different amino acids play in the structure of full, folded proteins. Or as Carter and Wolfenden put it:

“These and other results suggest that genetic coding of 3D protein structures evolved in distinct stages, based initially on the size of the amino acid and later on its compatibility with globular folding in water.”

And now, I’ll hand the mic over to Dr. Carter, who explains how he has approached this question. He also discusses some of the research into the origins of the genetic code conducted over the past several decades, citing some key papers that would make a great start for those who want to dig deeper. On Monday, we’ll dive into the details of the latest paper by Carter and Wolfenden, and present the second half of our interview.

MW: How did you come to this project – what led you to investigate the connection between the physical properties of amino acids and the sequences of their corresponding tRNAs? Is this a question you’ve focused on before, or did this emerge out of a different line of inquiry?
 
CC: I’ve been interested in the roles of polypeptide and RNA structure in the origin of life since my 1974 paper describing a stereochemical model for interactions of antiparallel extended beta polypeptide chains and RNA. That model led to my interest in the aminoacyl-tRNA synthetases, and hence to the existence of two such families that appeared to be unrelated to each other. Three earlier observations raised my curiosity about the coding properties of tRNA acceptor stem bases:

a. Others noted that synthetases and their cognate tRNAs both have two recognizable interacting modules, and had demonstrated that intact synthetases would acylate “minihelices” derived from tRNA acceptor stems. They suggested that synthetase catalytic domains might have functioned during an earlier stage of evolution to acylate the tRNA acceptor stems. They called this type of recognition an “operational RNA code”. I wondered how that code might have worked, and realized that the first step was to see if acceptor stem bases formed a code related to the properties of the amino acids.

b. My own work on aminoacyl-tRNA synthetases produced “Urzymes”, which are modified forms of the structurally invariant cores shared by all members of an enzyme superfamily that can be expressed separately and that retain major fractions of the catalytic activity of the modern enzymes. Urzymes from Class I and II aminoacyl-tRNA synthetases are roughly 10-30% the size of the full-length enzymes. This means that they are too small to recognize the tRNA anticodon, even though they acylate cognate tRNAs. That observation validated the suggestion that there was an operational code in the acceptor stem.

c. Richard Giegé had published a rather extensive survey of the specific bases in tRNA that were recognized by synthetases, and most of these “identity elements” were located either in the acceptor stem or in the anticodon. Thus, the database necessary to pose the question of how tRNA coding discriminates between different amino acids was suitably complete. I had become adept at using the regression methods necessary to look for coding relationships between amino acid properties and the identity elements summarized by Giegé. I began simply by tabulating all properties I could find of the 20 amino acids. My initial discovery was that the acceptor stem bases, which I had hoped would be correlated with hydrophobicity, were instead correlated strongly with amino acid masses, whereas the anticodon was closely correlated with their hydrophobicities. It became clear that the physical properties of the amino acids studied by my colleague, Dick Wolfenden, furnished a compelling and experimentally based pair of independent attributes. In particular, he and I discovered that the vapor-to-cyclohexane transfer free energies were tightly correlated with amino acid masses (i.e., via their volumes). The questions I wanted to address were thus a natural fit with my curiosity, aptitudes, resources, and colleagues.
 
MW: It’s not obvious to me why the genetic code shouldn’t be almost completely arbitrary. It’s often been cited as an example of a frozen accident: nearly universal among organisms, but simply the result of a chance evolutionary outcome. Aside from the redundancy of synonymous codons, which reduces the impact of mutations, from a naive perspective we wouldn’t expect the DNA codon sequence or the tRNA sequence to be related to the physical properties of amino acids.
 
And yet your work, building on previous studies, shows that there is a strong relationship: both the anti-codon and acceptor stem sequences correlate with the role of amino acids in folded proteins. Why is nucleic acid sequence so closely related to the physical properties of amino acids?

CC: There are lots of challenging questions wrapped up inside this one, and we’re beginning, I think, to be able at least to think about how to go about answering them. Of course, the evolution of life is indeed a probabilistic process—a game of chance—and for that reason it is a “frozen accident” at some level. However, I’m more and more inclined to think that we can actually penetrate at least some of the steps by which nature invented the code.

a. At a basic level one should appreciate the fact that the purpose of the genetic code is to code for protein structures. Thus, it should not be surprising that Wolfenden first identified correlations between the physical properties of amino acids, protein folding, and the genetic code.

b. Michael Yarus has used the selection of oligonucleotides from complex combinatorial libraries to demonstrate the existence of RNA aptamers that bind to specific amino acids, and to an intriguing extent, these short RNA molecules often contain either the appropriate codons or anticodons. These correlations appear with frequencies much in excess of that expected for random correlations, so they must be related in some fashion to the genesis of the code. However, there are two puzzling aspects of this work: (i) cognate triplet associations have been identified for 7 different amino acids activated by Class I synthetases, but only 1 amino acid activated by a Class II synthetase. One might have expected a more balanced result. (ii) Codons and anticodons are identified with essentially equal frequency for the 8 amino acids studied. That ambiguity points toward a role for double-stranded RNA in the stereochemical stage of code development, much as the sense/antisense coding of the two aminoacyl-tRNA synthetases does.

c. Marc Delarue had published a remarkable analysis of how the universal genetic code might have become settled by a series of binary choices in successive bases of the anticodon, leading at each step to specification of codons for one new Class I and one new Class II amino acid. That paper furnishes a paradigm that avoids the frozen accident to some extent, and was quite influential in how I thought about the problem. In particular, the redundancy of synonymous codons may have resulted from the successive recruitment of groups of tRNAs to the same amino acid from earlier stages of lower specificity as the code became defined. See my earlier commentary on this point.

d. James Zull identified a curious and fundamental aspect of the code by noting that the codons for amino acids that contribute to cores in folded proteins are actually always anticodons for amino acids associated with the surfaces of proteins. This inversion symmetry implies that proteins coded by opposite strands of the same gene are in some sense “inside out” (Chandrasekaran, et al., 2013).

e. Wolfenden and I have now actually thrown something of a monkey wrench into the notion described in (2) by pointing out that the coding properties in the acceptor stem likely preceded those in the anticodon bases. How that conundrum is eventually resolved should be fun to witness.
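To make the inversion symmetry described in (d) concrete, here is a small Python sketch that takes a few codons for hydrophobic, core-forming amino acids and looks up what their reverse complements encode. This is my own illustration, not an analysis from the papers: the four codons are hand-picked, the helper name `reverse_complement` is an assumption of the sketch, and a handful of examples does not test the general claim.

```python
from itertools import product

# Standard genetic code, U/C/A/G ordering; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Watson-Crick pairing for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(codon):
    """Return the triplet read 5'->3' on the opposite strand."""
    return codon.translate(COMPLEMENT)[::-1]


# Hand-picked codons for hydrophobic amino acids (Phe, Val, Leu, Ile).
for codon in ("UUU", "GUU", "CUC", "AUC"):
    partner = reverse_complement(codon)
    print(codon, CODON_TABLE[codon], "<->", partner, CODON_TABLE[partner])
```

For these four codons the opposite strand specifies lysine, asparagine, glutamate and aspartate respectively, all polar or charged residues of the kind usually found on protein surfaces.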

Stay tuned for part II.

tRNA image by Yikrazuul via Wikimedia Commons.

*OK, the genetic code, like everything else in biology, has exceptions, but these are clearly derivatives of the original code.



How to Use the Genetic Code for Passwords

Need a password for a new device or service? Try the genetic code. Messenger RNA triplets and the amino acids they specify provide nearly endless password possibilities. And it’s timely: the People’s Choice for Science magazine’s Breakthrough of …

The post How to Use the Genetic Code for Passwords appeared first on PLOS Blogs Network.

Science Caturday: One Code, Two Code…

Like a DNA nucleotide, this LOLcat is capable of playing multiple roles. It is good for creation vs. evolution, and so much more. Global warming vs. something else? You are covered. Homeopathy vs. physics? Done. Duons vs. the genetic code? In the bag.

Duons vs. the genetic code? What is a duon?

Good question. A duon is a DNA nucleotide that can play two roles. This perhaps makes it a rather lame nucleotide. DNA nucleotides have a lot of potential tasks they can do (e.g., help encode an amino acid, be part of a protein binding site, indicate a splice site, etc.) as part of their role storing information in our cells. The idea that a nucleotide might be subject to evolutionary pressure from several different tasks simultaneously is nothing new.

There is, as Emily Willingham points out at Forbes, no real “duon” controversy outside the minds of the folks who wrote the press release (and, perhaps, John Stamatoyannopoulos, if the press release quote is accurate, which I suspect it might be, based on his advocacy for the ENCODE Consortium’s “junk DNA is functional” boondoggle). These researchers have provided some evidence to support the hypothesis that evolutionarily conserved codon bias (using one codon, of the several possible for an amino acid in the genetic code, more than expected by chance) is due to selection to maintain transcription factor binding sites.

This is not an unreasonable hypothesis, but it is hardly shocking, hardly requires a new term, and is hardly a controversy.


Filed under: Science Caturday Tagged: ENCODE, evolution, Gene expression, genetic code, Genetics

Scouting for recent pubs on origin/evolution of the genetic code – suggestions wanted

A student in my Intro Bio class is interested in learning more about the origin and evolution of the genetic code. I am looking for some relatively recent papers to suggest to her. I have found the following:


Other suggestions wanted ...

---------------------------
A suggestion from Twitter