DNA sequence space

From: Pim van Meurs <pimvanmeurs@yahoo.com>
Date: Thu Sep 29 2005 - 13:00:59 EDT

Cornelius Hunter wrote:

> Actually, the idea that the full design space need not be searched is
> weak, and the idea that the search is not random is non evolutionary.

Huh? The search is non-random in the sense that selection guides it.
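
To make that concrete, here is a toy sketch, not a model of real biology. It assumes a fixed target sequence and a smooth fitness measure (number of matching positions); both are simplifications introduced only for illustration, as are the function names. It contrasts blind sampling of sequence space with a mutate-and-select walk.

```python
import random

ALPHABET = "ACGT"

def fitness(seq, target):
    # Toy fitness (an assumption): number of positions matching the target.
    return sum(a == b for a, b in zip(seq, target))

def blind_search(target, budget, rng):
    # Draw independent random sequences; no information carries over.
    for trial in range(1, budget + 1):
        guess = "".join(rng.choice(ALPHABET) for _ in target)
        if guess == target:
            return trial
    return None  # not found within the budget

def guided_search(target, budget, rng):
    # Mutate one random position per step; keep the mutant only if
    # its fitness is not lower than the current sequence (selection).
    current = [rng.choice(ALPHABET) for _ in target]
    for trial in range(1, budget + 1):
        if fitness(current, target) == len(target):
            return trial
        mutant = list(current)
        mutant[rng.randrange(len(target))] = rng.choice(ALPHABET)
        if fitness(mutant, target) >= fitness(current, target):
            current = mutant
    return None  # not found within the budget

if __name__ == "__main__":
    rng = random.Random(0)
    target = "ACGT" * 5  # 20 bases, so 4**20 possible sequences
    print("size of space:", 4 ** len(target))
    print("blind search :", blind_search(target, 10_000, rng))
    print("guided search:", guided_search(target, 100_000, rng))
```

On this deliberately easy landscape the guided walk needs only a tiny fraction of the 4**20 possible sequences, which is all the sketch is meant to show; it says nothing about how rugged real fitness landscapes are.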

> The only way I know that the full design space would not need to be
> searched would be if that space was largely filled with useful,
> functioning designs. But this clearly is not the case.

In fact, you need to distinguish between sequence space and design
space. For RNA it indeed seems that space is largely filled with
functional designs and that the non-neutral distance between such
designs is quite small.

> It certainly is true that there is flexibility in known genomes. One
> can make all sorts of changes and still have a functioning genome. But
> this should not be confused with any idea that functioning genomes are
> common in the DNA space. Quite the opposite. The bottom line is a
> search through the DNA space would have to cover the majority of the
> space before obtaining appreciable probabilities of hitting on
> functioning genomes.

That begs the question, especially given the available evidence.
Cornelius, I presented you with the necessary references about RNA. Have
you already forgotten?

Recent Nature news also shows some relevant and interesting
characteristics that contradict your notion about DNA space.

First, an article on the evolution of complex biological systems:

Structural biology: Origins of chemical biodefence
"The idea that complex biological systems can evolve through a series
of simple, random events is not universally accepted. The structure of
a vital immune protein shows how such evolution can occur at a
molecular level."

Then two papers on the protein fold:

Nature 437, 512-518 (22 September 2005) | doi: 10.1038/nature03991

Evolutionary information for specifying a protein fold
Michael Socolich, Steve W. Lockless, William P. Russ, Heather Lee, Kevin H. Gardner and Rama Ranganathan

Abstract: Classical studies show that for many proteins, the information required for specifying the tertiary structure is contained in the amino acid sequence. Here, we attempt to define the sequence rules for specifying a protein fold by computationally creating artificial protein sequences using only statistical information encoded in a multiple sequence alignment and no tertiary structure information. Experimental testing of libraries of artificial WW domain sequences shows that a simple statistical energy function capturing coevolution between amino acid residues is necessary and sufficient to specify sequences that fold into native structures. The artificial proteins show thermodynamic stabilities similar to natural WW domains, and structure determination of one artificial protein shows excellent agreement with the WW fold at atomic resolution. The relative simplicity of the information used for creating sequences suggests a marked reduction to the potential complexity of the protein-folding problem.

Nature 437, 579-583 (22 September 2005) | doi: 10.1038/nature03990

Natural-like function in artificial WW domains
William P. Russ, Drew M. Lowery, Prashant Mishra, Michael B. Yaffe and Rama Ranganathan

Protein sequences evolve through random mutagenesis with selection for optimal fitness. Cooperative folding into a stable tertiary structure is one aspect of fitness, but evolutionary selection ultimately operates on function, not on structure. In the accompanying paper, we proposed a model for the evolutionary constraint on a small protein interaction module (the WW domain) through application of the SCA, a statistical analysis of multiple sequence alignments. Construction of artificial protein sequences directed only by the SCA showed that the information extracted by this analysis is sufficient to engineer the WW fold at atomic resolution. Here, we demonstrate that these artificial WW sequences function like their natural counterparts, showing class-specific recognition of proline-containing target peptides. Consistent with SCA predictions, a distributed network of residues mediates functional specificity in WW domains. The ability to recapitulate natural-like function in designed sequences shows that a relatively small quantity of sequence information is sufficient to specify the global energetics of amino acid

A somewhat older paper (2004):

Simulating protein evolution in sequence and structure space
Current Opinion in Structural Biology, Volume 14, Issue 2, April 2004, Pages 202-207
Yu Xia and Michael Levitt

Naturally occurring proteins comprise a special subset of all plausible 
sequences and structures selected through evolution. Simulating protein 
evolution with simplified and all-atom models has shed light on the 
evolutionary dynamics of protein populations, the nature of evolved 
sequences and structures, and the extent to which today’s proteins are 
shaped by selection pressures on folding, structure and function. 
Extensive mapping of the native structure, stability and folding rate in 
sequence space using lattice proteins has revealed organizational 
principles of the sequence/structure map important for evolutionary 
dynamics. Evolutionary simulations with lattice proteins have 
highlighted the importance of fitness landscapes, evolutionary 
mechanisms, population dynamics and sequence space entropy in shaping 
the generic properties of proteins. Finally, evolutionary-like 
simulations with all-atom models, in particular computational protein 
design, have helped identify the dominant selection pressures on 
naturally occurring protein sequences and structures.

Expanding protein universe and its origin from the biological Big Bang
PNAS | October 29, 2002 | vol. 99 | no. 22 | 14132-14136
Nikolay V. Dokholyan, Boris Shakhnovich and Eugene I. Shakhnovich

The bottom-up approach to understanding the evolution of organisms is by studying molecular evolution. With the large number of protein structures identified in the past decades, we have discovered peculiar patterns that nature imprints on protein structural space in the course of evolution. In particular, we have discovered that the universe of protein structures is organized hierarchically into a scale-free network. By understanding the cause of these patterns, we attempt to glance at the very origin of life.

There are many good papers on this topic available for reading.

OK, one more for good measure:

The Emergence of Scaling in Sequence-Based Physical Models of Protein
Eric J. Deeds and Eugene I. Shakhnovich
Biophysical Journal, Volume 88, June 2005, 3905–3911

ABSTRACT It has recently been discovered that many biological systems, 
when represented as graphs, exhibit a scale-free topology. One such 
system is the set of structural relationships among protein domains. The 
scale-free nature of this and other systems has previously been 
explained using network growth models that, although motivated by 
biological processes, do not explicitly consider the underlying physics 
or biology. In this work we explore a sequence-based model for the
evolution of protein structures and demonstrate that this model is able
to recapitulate the scale-free nature observed in graphs of real protein
structures. We find that this model also reproduces other statistical
features of the protein domain graph. This represents, to our knowledge,
the first such microscopic, physics-based evolutionary model for a 
scale-free network of biological importance and as such has strong 
implications for our understanding of the evolution of protein 
structures and of other biological networks.

Hope this helps clarify some of the issues.
Received on Thu Sep 29 13:02:43 2005
