Evolution of proteins in sequence space

From: pruest@pop.dplanet.ch
Date: Fri Aug 10 2001 - 10:22:53 EDT

    Wayne Dawson wrote (WD, 6 Aug 2001 11:21:49 EDT):
    > Peter Ruest wrote (PR, 02 Aug 2001 17:18:27 +0200):
    > > Basically, any sequence within the transastronomically huge
    > > combinatorial space of the L^20 possible sequences of proteins of length
    > > L would be accessible during evolution, if there is a mutational path
    > > which leads from an existing sequence to the target considered and which
    > > does not contain any intermediates which are selected against (or even
    > > lethal). In order to evaluate this mechanism of evolution and the
    > > probability of its success, we should have an idea about the frequency
    > > of useful sequences in sequence space. This information has been
    > > missing, but now some indications about it are available.
    WD: > You should at least write a mental note somewhere that correlation
    effects in a polymer are *not* limited to single peptides, nor single
    nucleotides, nor any other monomer that you can name. Typically,
    nearest neighboring monomers tend to be coupled due to the lack of free
    rotation about their bond axes. For nucleotides, this includes
    correlation between the aromatic rings. For peptides, it is more
    complicated because you have more interactions: hydrophobic,
    hydrophilic, and acid/base interactions. In any case, you can (and
    should) expect a polymer to have correlation between its nearest
    neighbors. <

    PR: I assume you meant "amino acid" when you wrote "peptide". I agree
    with your comment, as I have also dealt with the problem of the
    interdependence between amino acid positions before. I certainly do not
    claim each one of the L^20 _formal_ sequence possibilities would
    correspond to a _physically possible_ protein. All that is needed for my
    discussion of the two papers I dealt with is the formal protein
    configurational space, as non-viable amino acid sequences may be
    specified by possible DNA sequences, but are then weeded out anyway.

    > (snip) <

    In your post of 8 Aug 2001 10:15:24 EDT, you corrected the following:
    WD: > I thought over what I wrote here and I think this needs to be corrected.
    > > Hence, I would be inclined to argue that the number of degrees
    > > of freedom have been greatly overestimated in L^20, and
    > > L^(20/3) is a more realistic estimate of the odds involved.
    > > That is admittedly still a big number for any long protein
    > > chain, and may still lead to astronomically huge odds, but
    > > certainly not _as_ huge.
    > I was reasoning that the distinguishability of the different peptides is reduced by the extended persistence length, but that should have been worked out from the following.
    > (1) The persistence length affects the *base* of the expression L' = L/3 (approximately).
    > (2) On the other hand, the exponent (n') of the expression should be largely defined by the basic chemistry of the interacting side chains: hydrophobic, hydrophilic, acid, base. That allows a maximum of, say, 8 categories:
    > hydrophobic: weakly -> moderately -> strongly
    > hydrophilic: weakly -> moderately -> strongly
    > acidic
    > basic
    > I think weakly hydrophobic/philic is really the same thing (Gly for example), but perhaps a special class involving steric interactions (e.g., Trp or Pro) could also be invoked, so perhaps a maximum of 8 classes of truly *distinguishable* peptides is reasonable in this case.
    > Of course there are some examples where a single peptide change can be lethal, but more often the changes are far less pernicious tending only to accumulate noticeable problems in old age. In any case, polymorphism in the human genome makes such things as the CD4 receptor more vulnerable to HIV infection in some groups, and less so in others, so variation in proteins is not something particularly profound.
    > Thus, I think 8 represents an estimate of the chemically *distinguishable* set of peptides in a sequence which means the exponent in the expression is probably about 8. Smaller values are probably too small, but I also don't see a lot of reason to argue that there should be more categories in such a rough estimation procedure. Certainly 20 is pushing it.
    > This means that a reasonable estimate of the upper bound for the odds of getting a correct sequence is probably around (L/3)^8. Again, this can be a large number for L large. <

    PR: Again, I agree with you that interdependencies between amino acids
    at different positions eliminate the _physically_ possible occurrence
    of a large part of the _formally_ possible sequences. The only thing I
    am not so sure of is the specific formula you are deriving - both for
    the persistence length and for the number of amino acid categories.
    Depending on the particular cases, some of the equivalences you assume
    may not always apply, increasing (or occasionally decreasing) the number
    of distinguishable physically possible proteins. Also, long-range
    interactions between amino acids in a protein are common.
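
To give a feel for the magnitudes under discussion, here is a minimal sketch that evaluates the expressions as written in this thread, L^20 and (L/3)^8, for a sample chain length. For comparison it also prints the standard combinatorial count of strings of length L over k letters, k^L, which is not a formula used in the thread. The chain length of 150 and the helper name `log10_power` are assumptions made for illustration only, and the form 8^(L/3) simply applies WD's two reductions (about L/3 effective positions, about 8 residue classes) to the standard count.

```python
import math


def log10_power(base: float, exponent: float) -> float:
    """Return log10(base ** exponent) without forming the huge number itself."""
    return exponent * math.log10(base)


L = 150  # hypothetical chain length, chosen only for illustration

# Expressions exactly as written in the thread.
thread_full = log10_power(L, 20)          # L^20
thread_reduced = log10_power(L / 3, 8)    # (L/3)^8

# Standard count of strings of length L over k letters is k^L (added for comparison).
standard_full = log10_power(20, L)        # 20^L
standard_reduced = log10_power(8, L / 3)  # 8^(L/3), assuming WD's two reductions

for label, value in [
    ("L^20", thread_full),
    ("(L/3)^8", thread_reduced),
    ("20^L", standard_full),
    ("8^(L/3)", standard_reduced),
]:
    print(f"{label}: about 10^{value:.1f}")
```

Working with logarithms keeps the comparison readable whichever form is adopted; in both forms the reduced estimate is many orders of magnitude smaller than the full one.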
    WD: > Since there is as yet no evidence of intelligent life elsewhere in
    the universe, the probability of this process progressing to the point
    where intelligent life can emerge is clearly small. Perhaps "bacteria"
    levels of "life" may exist elsewhere but even that remains questionable
    if the exponent really implies "inevitable" as some people might wish to
    think. <

    PR: I would even doubt the feasibility of natural self-organization of
    matter forming viable "proto-bacteria", apart from divine guidance.
    WD: > In that sense, a chance in a trillion is not too far out of reason
    to allow possibility in God's formation economy, but not mere
    inevitability. Since I have enough problems with my own ego and
    submitting to Christ's call in my life, and I'm sure I am not alone in
    that regard, that seems like God's divine wisdom in action. <

    PR: Here, I continue with your post of 6 Aug 2001 11:21:49 EDT,
    discussing what I wrote primarily about the darwinian evolution of
    already existing, viable genomes, not about the emergence of the first

    > > (snip)
    > >
    > > As the human genome contains an estimated 30,000 genes, and the number
    > > of different protein folds is estimated to be a few thousand, we may, as
    > > a very rough approximation, assume that there are less than 10^4
    > > basically different protein families in the biosphere, within each of
    > > which a number of similar proteins can be derived from each other by
    > > feasible evolutionary paths.
    > >
    > > The question is whether each of the 10^4 different protein families can
    > > be similarly derived from one or very few initial sequences, or by
    > > random mutational walks. If a novel enzyme or other functional protein
    > > is to arise, which is not easily derivable by a few selected mutations
    > > from an already existing one, we need a mutational random walk. The
    > > probability of finding any sequence with the activity required is about
    > > 10^-11. If, at a given moment in the evolution of a species, any one of
    > > 10^4 different novel activities will prove advantageous, the probability
    > > of finding any such sequence is about 10^-7.
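
The step from 10^-11 to 10^-7 in the quoted paragraph can be checked with a short calculation. This is a sketch under the assumption, not stated in the original, that the 10^4 candidate activities are independent and that each is found with the same probability of 10^-11; the function name `prob_any_hit` is hypothetical.

```python
import math


def prob_any_hit(p_single: float, n_targets: int) -> float:
    """Probability that at least one of n_targets independent events occurs,
    each with probability p_single."""
    # 1 - (1 - p)^n, computed via log1p/expm1 to avoid rounding loss for tiny p.
    return -math.expm1(n_targets * math.log1p(-p_single))


p_single = 1e-11   # probability of finding one specific activity (figure from the text)
n_targets = 10**4  # number of novel activities that would be advantageous (from the text)

exact = prob_any_hit(p_single, n_targets)
approx = n_targets * p_single  # linear approximation, valid while n * p is much less than 1

print(f"exact: {exact:.6e}, linear approximation: {approx:.6e}")
```

Because n * p is far below 1 here, the exact value and the simple product agree to many significant digits, which is why multiplying the two figures is adequate for this estimate.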
    WD: > I am still not sure myself exactly what to make of the folds. Do
    they represent a language? If so, to what extent: are they mere
    commands or is there something more? By (admittedly rather dangerously
    poor) analogy, the early 8008 processor functioned successfully with
    only 17 instructions. Hence, if the "function" of a protein is quite
    limited, then the required "instruction set" could also be quite small.

    PR: Once life is here, the letters, in the language metaphor, could
    correspond to the nucleotides, the words to the amino acids, the
    sentences or instructions (commands) to the proteins. If we assume that
    one protein can easily evolve into a slightly different one (within the
    same protein family or fold), we need not be overly concerned about this
    variability. But evolution into a different family (or fold) is
    presumably much more difficult. The 8008 processor analogy is not useful
    - unless a viable self-replicating "organism" consisting of just a few
    (such as 17) types of proteins (or RNAs) only is really synthesized in
    the lab. Restricting the size of the instruction set or the
    functionalities of the instructions is of no use if this kills the
    WD: > So one thing that seems to need clarification is the level of
    complexity of a given protein. There is a big difference between the
    complexity of a human language, and the complexity of a simple computer
    program carrying out a small instruction set. Likewise, how many
    instructions are actually necessary is not fully clear to me. In that
    case, it is not so much the _number_ of folds, but what the folds
    actually _do_ that needs to be defined clearly. <

    PR: This is exactly what we need to know. (1) What is the minimal set of
    proteins (or RNAs) required for life? and (2) what is the minimal
    complexity of each of these polymers? At present, no one has the
    slightest idea about this. All we know is the complexity of the simplest
    organisms living today, which is of the order of a few hundred proteins
    of at least a few dozen required amino acid positions of at least 8
    (according to your estimate) different types, that is, very much more
    complex than any computer language, and clearly way beyond random-walks
    and self-organization.
    > > These estimates assume that directed evolution in the lab is a valid
    > > model for natural evolution. Of course, this is not the case, as in
    > > directed evolution one does not have to bother about the viability of
    > > each intermediate organism in a linear sequence of point mutations, but
    > > only about the isolated activity of a new protein sequence after several
    > > or many mutations. Directed evolution jumps around in sequence space,
    > > whereas natural evolution is limited to single-step paths, and none of
    > > these steps must go downhill on the fitness surface.
    > >
    > > How, then, is it possible that any one of the 10^3 or 10^4 basically
    > > different protein folds (families) arose (anywhere in the biosphere),
    > > let alone all of them? If there was the need for 10^3 different searches
    > > with probabilities of around 10^-10, it seems a hopeless proposition.
    > > (And the few million years available for the formation of the first
    > > viable organism appear transastronomically inadequate.)
    > >
    > > The only possibility of a way out seems to be to claim that every single
    > > one of the different protein families used in the biosphere is
    > > intimately connected in sequence space, such that simple linear
    > > sequences of point mutations, with all intermediates naturally selected,
    > > will do for all proteins. In this case, more than 99.999999999% (eleven
    > > nines altogether) of sequence space is barren for life and was never
    > > visited by any sequence during evolution. Whether this is a feasible
    > > proposition will have to be shown experimentally.
    > >
    > > This still leaves us with the mystery of the origin of the first living
    > > organism capable of natural evolution.
    > >
    > > But the very interesting finding of the two papers mentioned is that the
    > > protein sequence space is extremely sparsely populated with useful
    > > sequences. This makes evolution (which, for theological reasons, I
    > > believe has happened) an astonishingly marvelous process.
    WD: > You are beginning to rant again here. I can agree to some extent
    that the laboratory conditions _somewhat_ favor the expectations of the
    experimentalist. When these ideal conditions are removed, and these
    materials have to compete with all the other crud in a vat full of brown
    tar, it is not particularly clear that the results will be favorable. <

    PR: Apparently, my last sentence, "This makes evolution (which, for
    theological reasons, I believe has happened) an astonishingly marvelous
    process", has misled you into thinking I am happy with the usual
    evolutionary speculations. This is not at all the case. On the contrary,
    on the scientific level, I fully agree with your skepticism. The only
    way we can expect favorable evolutionary results is by way of divine
    guidance (or providence). In my post of 28 Nov 2000 17:30:17 +0100 (ASA
    digest vol #1889), I explained what I mean by this (extended version to
    appear in PSCF, Sep 2001).
    WD: > I think it also pertinent to say here that often the one thing
    that seems seriously lacking in these exchanges (perhaps more so from
    the evolution side) is reverence for how astoundingly lucky we really
    are to even have the privilege to think about where we came from. YEC
    folk err greatly in other ways, but I recognize that (in part) this is
    because they respect the Lord. In much the same way, I'm sure this is
    probably at the heart of ID arguments, viz., by invoking evolution, we
    seem to be denying the Lord's providence in our lives. I think it fair
    enough to say that ignoring the Lord is folly, and I understand that I
    have regularly come up short on more than this account alone. <

    PR: Without the Lord's providence, evolution is clearly incapable of
    achieving what it is usually believed to do. But autonomous evolution is
    not what God created. The most powerful and perfect computer does
    nothing whatsoever without the appropriate software, input data, and
    starting command.
    WD: > That being said, I am not fully decided on this matter, but I
    would contend that there are a lot of curious properties in polymers
    that allow for interesting possibilities. The abiogenesis arguments
    although persuasive *may* turn out to be wrong, but they are certainly
    arguments that can be tested and a testable hypothesis is something that
    a scientist can work on. "Give up" arguments are not (or at least, not
    until the funding runs out).
    > As I currently see it, the major problems that plague an abiogenesis scenario are probably as follows.
    > (1) A power source for running an RNA world. RNA does not appear to have a very large diversity of catalytic activity (at least compared to proteins). Without an engine and something to burn, the RNA world would "run out of gas" rather quickly. Introducing proteins brings us back to the chicken or egg question and greatly increases the complexity of the prebiotic world.
    > (2) The "replicators" in a prebiotic world. If proteins must be an integral part of the abiogenesis process, the transcription machinery becomes more complicated as well. There have been a few attempts at replicators for RNA (I suspect mostly inadequate), but if this must include the replication of proteins, then the difficulty of making "first base" becomes far greater.
    > (3) Even if we can eventually find a way to explain (1) and (2), let's not forget that life is an astoundingly lucky privilege and we should not forget to honor the Lord. Our call to follow Christ is in no way diminished whether life came about by probabilities or miracles. Life itself is a "miracle," and it is a blessing that we *can* even choose to follow.
    > by Grace we proceed,
    > Wayne

    PR: Fully in agreement. And I would add, the most important ingredient
    lacking (apart from divine guidance/providence), is a source for the
    information needed to define the option to be chosen at each of the
    myriad crucial but random-looking events.


    Dr Peter Ruest			Biochemistry
    Wagerten			Creation and evolution
    CH-3148 Lanzenhaeusern		Tel.:	++41 31 731 1055
    Switzerland			E-mail:	<pruest@dplanet.ch>
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    	In biology - there's no free lunch -
    		and no information without an adequate source.
    	In Christ - there is free and limitless grace -
    		for those of a contrite heart.
