Degeneracy of the Genetic Code has Played an Important Role in Evolution of Organisms

Introduction
Genetic information transferred from DNA to mRNA by transcription is passed on to protein through translation on the ribosome. During translation, the universal (standard) genetic code, written in triplet base sequences, is used; each triplet specifies one amino acid. For example, GUC, CAC, AAG and UUC correspond to Val, His, Lys and Phe, respectively. As is well known, the genetic code is degenerate mainly at the third codon position: for example, GGN, GCN and GUN, four codons each, code for Gly, Ala and Val, respectively (Figure 1). However, the reason why the genetic code is degenerate is entirely unknown.
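The codon assignments quoted above can be checked with a short script. This is only an illustrative sketch: the table is the standard genetic code written with one-letter amino acid symbols, and the names `CODON_TABLE`, `third_position_amino_acids` and `fourfold_families` are introduced here for illustration and do not come from the paper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acids ("*" = stop), listed in the
# conventional order where the third base varies fastest (UUU, UUC, UUA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def third_position_amino_acids(prefix):
    """Amino acids encoded by the four codons prefix+N (N = U, C, A or G)."""
    return {CODON_TABLE[prefix + n] for n in BASES}


# Two-base prefixes for which the third base is completely free,
# i.e. all four codons of the family encode the same amino acid.
fourfold_families = sorted(
    p for p in ("".join(xy) for xy in product(BASES, repeat=2))
    if len(third_position_amino_acids(p)) == 1
)

print([CODON_TABLE[c] for c in ("GUC", "CAC", "AAG", "UUC")])  # V, H, K, F
print({p + "N": third_position_amino_acids(p) for p in ("GG", "GC", "GU")})
print(fourfold_families)
```

Families such as GGN, GCN and GUN appear in `fourfold_families`, which is the sense in which the code is degenerate at the third codon position.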
The genetic code occupies a core position connecting genetic function with catalytic function in the fundamental life system (Figure 2). This means that the genetic code is not merely a correspondence between triplet base sequences and amino acids, but also a key element for clarifying how the fundamental life system, which is composed of gene (genetic function), genetic code and protein (catalytic function), was formed, and hence for solving the riddle of the origin of life.
In this Short Communication, I describe how the degeneracy of the genetic code plays an important role in producing an entirely new (EntNew) gene and the corresponding EntNew protein from a nonstop frame on the antisense strand of a GC-rich gene (GC-NSF(a)), according to the GC-NSF(a) hypothesis that I have proposed [1][2][3] (Figure 3). That would be the significance of the genetic code.

Abstract
The genetic code is degenerate mainly at the third codon position. However, the reason for the degeneracy remains unknown. On the other hand, we have proposed the GC-NSF(a) hypothesis, which assumes that an entirely new gene emerges from a non-stop frame on the antisense codon sequence of a GC-rich gene (GC-NSF(a)). The amino acid sequence of a protein encoded by GC-NSF(a) can vary widely without any change in the amino acid sequence of the protein produced from the corresponding gene on the sense strand. Therefore, various amino acid sequences can be tried repeatedly until a required function is found on the surface of a protein encoded by GC-NSF(a). This indicates that the degeneracy of the genetic code has contributed greatly to producing entirely new proteins from GC-NSF(a). Thus, entirely new proteins having a required function could be produced effectively. The degeneracy of the genetic code made it possible for organisms to prosper on Earth. Organisms that did not use a degenerate genetic code would have gone extinct on this planet. This is one significant point of the genetic code.

Copyright: © 2016 Ikehara

Theories proposed for new gene formation
Organisms on the Earth have evolved gradually and steadily, and sometimes abruptly, over several billion years. The evolution of life should have been tightly coupled to the acquisition of new functions required to adapt to new environments.
Two main routes for the creation of new genes have been proposed so far. One is the gene duplication theory proposed by Ohno [4], which predicts that after duplication of a gene, one duplicate may acquire a new adaptive function while the other retains the original function. The second is the exon-shuffling theory proposed by Gilbert et al. [5], which assumes that a new functional gene is created as a new combination of exons shuffled among two or more genes.
However, neither theory addresses the most fundamental problem: the creation of an EntNew gene, that is, the first ancestor gene of a family, from which various descendant genes could be derived. This means that the key problem about the formation of a gene family remains unsolved.
In contrast, I have proposed the GC-NSF(a) hypothesis on the creation of EntNew genes [1][2][3]. The hypothesis assumes that an EntNew gene, which is totally different from any previously existing gene, is created from GC-NSF(a) after gene duplication (Figure 3). I would like to emphasize here that the GC-NSF(a) hypothesis is not a purely theoretical idea but one reached through database analysis of microbial genes and proteins, which were themselves obtained by experiments.

The reason why EntNew gene can be produced from GC-NSF(a)
EntNew genes can be produced according to the GC-NSF(a) hypothesis for the following reasons.
1. A protein encoded by GC-NSF(a) satisfies the six conditions for formation of a water-soluble globular protein. These conditions were obtained as average ranges (average value ± standard deviation) of six properties (hydropathy, α-helix, β-sheet and turn/coil formabilities, and acidic and basic amino acid compositions) of water-soluble globular proteins encoded by the genomes of seven extant microorganisms carrying chromosomal DNA with widely different GC contents [6]. This indicates that a polypeptide chain produced from GC-NSF(a) would fold into a water-soluble globular structure with high probability.
2. A protein encoded by GC-NSF(a) satisfies the six conditions well partly because GC-NSF(a) is similar to the SNS repeating codon sequence, (SNS)n, which also satisfies the six conditions [7]. Here S means G or C, and N means any one of the four bases.
3. GC-NSF(a) codes for a protein that has a completely different amino acid sequence from any previously existing protein but an amino acid composition similar to those of proteins encoded by a GC-rich genome (Figure 4). The protein encoded by GC-NSF(a) has a nearly random amino acid sequence in a protein 0th-order structure. A protein 0th-order structure is a specific amino acid composition in which even random joining of amino acids produces a water-soluble globular protein with high probability. One such protein 0th-order structure is the GC-NSF(a)-encoding amino acid composition, which is similar to the SNS-encoding amino acid composition (Figure 4). Other protein 0th-order structures include the SNS-encoding ten amino acids.
Of course, the sequence diversity of a protein encoded by GC-NSF(a) is much smaller than ~10^130, since the amino acid composition of the protein deviates strongly from that of a protein containing about equal amounts of the 20 amino acids. However, the diversity is still larger than ~10^100, the extraordinarily large diversity of a protein encoded by (SNS)100, whose codon sequence is similar to that of GC-NSF(a) (Figure 4). Note that SNS codes for 10 amino acids.
5. In addition, the degeneracy at the third codon position increases the probability that an EntNew protein is produced from GC-NSF(a), because the EntNew protein can be generated without any change in the amino acid sequence encoded by the GC-rich gene on the sense strand.
A base replacement at the third codon position of a GC-rich gene corresponds to a replacement at the first codon position of GC-NSF(a) (Figure 5). If a required function were detected on the surface of the protein encoded by one of the various GC-NSF(a)s, that protein could be newly born and subsequently evolve into a mature protein [8].

Therefore, the degeneracy of the genetic code is quite important for the production of EntNew proteins, because every amino acid sequence that appears upon base replacement can be examined until a required catalytic activity is found.
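The correspondence between the third codon position on the sense strand and the first codon position of the antisense frame can be checked on a toy sequence. The sketch below is only illustrative: the two nine-base sequences `sense_a` and `sense_b` are hypothetical examples made up for this purpose, not sequences from the paper, and the antisense frame is assumed to be the one whose codons are the reverse complements of the sense codons.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acids, third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("UCAG", "AGUC")


def translate(rna):
    """Translate an RNA string codon by codon, starting at the first base."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))


def antisense(rna):
    """Reverse complement, written 5'->3', so that each antisense codon
    pairs with one whole sense codon (sense position 3 <-> antisense position 1)."""
    return rna.translate(COMPLEMENT)[::-1]


# Hypothetical sense sequences that differ only at third codon positions.
sense_a = "GGCGCCGUC"
sense_b = "GGGGCGGUG"

print(translate(sense_a), translate(sense_b))                        # sense proteins
print(translate(antisense(sense_a)), translate(antisense(sense_b)))  # antisense proteins
```

In this toy case both sense sequences encode Gly-Ala-Val, while the two antisense frames encode different peptides, which is the behaviour the degeneracy argument relies on.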

Discussion
A base at the third codon position on the sense strand can be replaced widely without any change in the amino acid sequence, because the genetic code is degenerate at that position. On the other hand, such a replacement changes the amino acid sequence encoded by GC-NSF(a) with high probability, since it alters the base at the first codon position of GC-NSF(a) and thereby induces an amino acid replacement in the encoded protein. Nevertheless, the amino acid composition of that protein, as opposed to its sequence, does not deviate largely from the protein 0th-order structure. The reason is that the GC-mutation pressure acting on a GC-rich gene of a microorganism carrying a GC-rich genome keeps the composition close to the SNS-encoding amino acid composition, which is also one of the protein 0th-order structures (Figure 4). Thus, it is considered that EntNew proteins having a required function could be produced effectively owing to the degeneracy of the genetic code, and that organisms on this planet, which have used the degenerate genetic code, have prospered, although this may have been an outcome rather than a purpose. Conversely, organisms that did not use a degenerate genetic code would have gone extinct, even if they once emerged on this planet. The genetic code of organisms prospering on this planet must be degenerate, whatever the reason for the degeneracy.
Then, when did the genetic code become degenerate? According to my scenario on the origin and evolution of the genetic code, the GNC-SNS primitive genetic code hypothesis [6], it can be supposed that the degeneracy started at the evolutionary step from the first genetic code, GNC, to the next code, GNS, which encodes five amino acids: the GNC-encoded [GADV]-amino acids plus Glu [E]. The GNS code might have triggered the evolution to the SNS code and the effective production of EntNew genes/proteins from the antisense codon sequence ((SNS)n sequence) of an (SNS)n gene, in the same way as EntNew genes/proteins emerged from GC-NSF(a) (Figure 3).
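The amino acid sets encoded by the GNC, GNS and SNS codes can be enumerated directly from the standard genetic code. The following is a minimal sketch with one-letter amino acid symbols; the helper names `encoded` and `codons` are introduced here for illustration only. It also checks that the reverse complement of every SNS codon is again an SNS codon, so that the antisense frame of an (SNS)n sequence contains no stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acids ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("UCAG", "AGUC")


def codons(first, second, third):
    """All codons whose three positions are drawn from the given base sets."""
    return {a + b + c for a in first for b in second for c in third}


def encoded(first, second, third):
    """Set of amino acids encoded by the codons of a restricted code."""
    return {CODON_TABLE[c] for c in codons(first, second, third)}


gnc = encoded("G", BASES, "C")    # first code in the scenario
gns = encoded("G", BASES, "GC")   # third position opened up to G or C
sns = encoded("GC", BASES, "GC")  # S = G or C, N = any base

sns_codons = codons("GC", BASES, "GC")
antisense_sns_codons = {c.translate(COMPLEMENT)[::-1] for c in sns_codons}

print(sorted(gnc), sorted(gns), sorted(sns))
print(antisense_sns_codons == sns_codons)
```

Under the standard code, GNC gives G, A, D and V, GNS adds E, and SNS gives ten amino acids with no stop codon, in agreement with the text.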

Figure 1: Degeneracy of the genetic code. Almost all extant organisms on the Earth use the so-called universal or standard genetic code, which is degenerate mainly at the third codon position.

Figure 2: Position of the genetic code in the fundamental life system. The genetic code occupies a core position connecting genetic function, composed of DNA and mRNA, with catalytic function, carried out by protein and metabolism, in the life system.

4. As is well known, the sequence diversity of even a small protein composed of 100 amino acids is extraordinarily large, reaching about 20^100 = ~10^130. This tremendous diversity makes it possible to produce from a GC-NSF(a) a protein without any homology to previously existing proteins, that is, an EntNew protein, because the probability that the amino acid sequence of a protein arising from a GC-NSF(a) coincides with any one previously existing protein is practically nil, about 1/10^130.
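The order of magnitude quoted for a 100-residue protein can be reproduced with a few lines of arithmetic. This is a sketch only; the function name `log10_diversity` is introduced here for illustration, and the ten-letter case is included for comparison with the ten amino acids encoded by SNS.

```python
import math


def log10_diversity(n_amino_acids, length):
    """log10 of the number of distinct sequences of the given length
    that can be built from n_amino_acids different amino acids."""
    return length * math.log10(n_amino_acids)


full_alphabet = log10_diversity(20, 100)  # all 20 amino acids
sns_alphabet = log10_diversity(10, 100)   # the 10 amino acids encoded by SNS

print(f"20^100 is about 10^{full_alphabet:.1f}")
print(f"10^100 is exactly 10^{sns_alphabet:.0f}")
print(len(str(20 ** 100)))  # number of decimal digits of the exact integer
```

Since 20^100 has 131 decimal digits, it is of the order of 10^130, which matches the figure in the text.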

Figure 3: GC-NSF(a) hypothesis. The hypothesis assumes that an entirely new gene is created from GC-NSF(a) after gene duplication. The thick line and the broken line indicate a gene on the sense strand and a non-stop frame on the antisense strand, respectively.

Figure 4: Base compositions of GC-NSF(a) at three codon positions. The base compositions at the three codon positions of GC-NSF(a) and of the GC-rich gene are similar to each other and can both be approximated by SNS. The base compositions at the three codon positions were obtained as average values over seven genes of Pseudomonas aeruginosa, which carries a GC-rich genome.