Re: Evolution of proteins in sequence space

From: Lawrence Johnston (
Date: Fri Aug 03 2001 - 10:19:20 EDT


    Peter - Thanks x 10^6 for that beautiful analysis of our situation in sequence space. It
    looks to me like this leaves us with two options:

    1. we adopt Van Til's hypothesis of ultra-smart atoms, or 2. assume that Someone has been
    injecting huge amounts of information into the Universe from outside. Other options?

    Peter Ruest said:
    > Proteins may evolve in two basically different modes. One mode is by a
    > sequence of point mutations. The other mode is by genetic recombination
    > of preexisting modules or fragments. (Let me ignore deletions, which
    > presumably are deleterious in the vast majority of cases, except
    > perhaps for some occasional deletions of entire codons.) Each of the new
    > sequences produced must then be accepted (and fixed in the population)
    > by natural selection or by random drift (if it is lost, it does not
    > contribute to evolution). Novel sequence information is generated only
    > in the first case, a series of several point mutations.
    > Basically, any sequence within the transastronomically huge
    > combinatorial space of the 20^L possible sequences of proteins of length
    > L would be accessible during evolution, if there is a mutational path
    > which leads from an existing sequence to the target considered and which
    > does not contain any intermediates which are selected against (or even
    > lethal). In order to evaluate this mechanism of evolution and the
    > probability of its success, we should have an idea about the frequency
    > of useful sequences in sequence space. This information has been
    > missing, but now some indications about it are available.
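    Since there are 20 standard amino acids and each of the L positions can hold any of them, the space has 20^L members (the exponent is the length, not 20). A minimal sketch of that count, using L = 45 (the ATP-binding core) and L = 80 (the randomized region) from the Keefe and Szostak paper cited below; the function names are illustrative only:

```python
import math

# Each of the L positions can hold any of the 20 standard amino acids,
# so the number of possible sequences is 20**L (not L**20).
ALPHABET_SIZE = 20

def sequence_space_size(length: int) -> int:
    """Exact count of possible protein sequences of the given length."""
    return ALPHABET_SIZE ** length

def log10_sequence_space(length: int) -> float:
    """Base-10 exponent of the sequence-space size, easier to read for large L."""
    return length * math.log10(ALPHABET_SIZE)

for L in (45, 80):
    print(f"L = {L}: 20^L is about 10^{log10_sequence_space(L):.1f}, "
          f"whereas L^20 would be only about 10^{20 * math.log10(L):.1f}")
```

    For L = 80 the space is roughly 10^104 sequences, which is what makes "transastronomically huge" apt.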
    > Keefe A.D., Szostak J.W., "Functional proteins from a random-sequence
    > library", Nature 410 (2001), 715-718, generated a library of 6x10^12
    > proteins, each containing 80 contiguous random amino acids, and enriched
    > those proteins that bound to ATP. They found four new families of
    > ATP-binding proteins unrelated to each other and unrelated to the
    > natural ones. The selectively enriched substitutions were distributed
    > over 62 of the 80 randomized amino acids, and a core domain of 45 amino
    > acids sufficient for ATP-binding was defined. Keefe et al. estimated
    > that roughly 1 in 10^11 of all random-sequence proteins have ATP-binding
    > activity.
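    A rough consistency check on these figures, assuming (as a simplification, not something the paper claims) that the 1-in-10^11 frequency applies uniformly across the library: the expected number of ATP binders is 6 x 10^12 x 10^-11, and the library covers only a minute fraction of all possible 80-mers.

```python
import math

# Figures quoted above from Keefe & Szostak (Nature 410, 2001).
LIBRARY_SIZE = 6e12        # random-sequence proteins in the library
BINDER_FREQUENCY = 1e-11   # estimated fraction with ATP-binding activity
RANDOM_LENGTH = 80         # randomized amino acids per protein

# Expected number of ATP binders, assuming the frequency applies uniformly.
expected_binders = LIBRARY_SIZE * BINDER_FREQUENCY

# Fraction of all 20**80 possible 80-mers that the library sampled,
# expressed as a base-10 exponent so the result is easy to read.
log10_fraction_sampled = math.log10(LIBRARY_SIZE) - RANDOM_LENGTH * math.log10(20)

print(f"expected ATP binders in the library: about {expected_binders:.0f}")
print(f"fraction of 80-mer space sampled: about 10^{log10_fraction_sampled:.1f}")
```

    About 60 expected binders is in line with finding a handful of independent families, even though the library samples only around 10^-91 of the 80-mer space.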
    > Silverman J.A., Balakrishnan R., Harbury P.B., "Reverse engineering the
    > (β/α)8 barrel fold", Proceedings of the National Academy of
    > Sciences USA 98 (2001), 3092-3097, analyzed the most commonly occurring
    > fold among protein catalysts, the TIM (triosephosphate isomerase) barrel
    > consisting of 8 analogous units of beta strand, loop, alpha helix, and
    > turn, which together form a barrel accommodating a variable active site,
    > used in a large family of different enzymes. Silverman et al. applied
    > combinatorial mutagenesis of 182 amino acid positions in the barrel and
    > functional selection for TIM activity in E. coli, requiring a minimal
    > threshold of 10^-4 of wild-type activity. They estimate that fewer than
    > 1 in 10^10 of the sequences in their degenerate library are able to
    > complement in vivo.
    > Thus, the two estimates agree quite well, even though they are derived
    > in very different ways. If we look at protein sequence space, fewer (by
    > how much?) than 1 in 10^10 sequences are triosephosphate isomerase
    > enzymes, and about 1 in 10^11 sequences binds ATP, which is a partial
    > activity of many enzymes.
    > As the human genome contains an estimated 30,000 genes, and the number
    > of different protein folds is estimated to be a few thousand, we may, as
    > a very rough approximation, assume that there are fewer than 10^4
    > basically different protein families in the biosphere, within each of
    > which a number of similar proteins can be derived from each other by
    > feasible evolutionary paths.
    > The question is whether each of the 10^4 different protein families can
    > be similarly derived from one or very few initial sequences, or must
    > arise by random mutational walks. If a novel enzyme or other functional protein
    > is to arise, which is not easily derivable by a few selected mutations
    > from an already existing one, we need a mutational random walk. The
    > probability of finding any sequence with the activity required is about
    > 10^-11. If, at a given moment in the evolution of a species, any one of
    > 10^4 different novel activities will prove advantageous, the probability
    > of finding any such sequence is about 10^-7.
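    The step from 10^-11 to 10^-7 is the small-probability approximation P(at least one) ≈ n·p. A short sketch comparing it with the exact value 1 - (1 - p)^n, under the added assumption that the 10^4 activities can be treated as independent events:

```python
import math

p_single = 1e-11       # chance that one random sequence has a given activity
n_activities = 10**4   # distinct novel activities that would each be advantageous

# Shortcut used in the text: for very small p, P(at least one) is about n * p.
p_any_approx = n_activities * p_single

# Exact value for n independent events, 1 - (1 - p)**n, evaluated with
# log1p/expm1 so the tiny probabilities are not lost to rounding.
p_any_exact = -math.expm1(n_activities * math.log1p(-p_single))

print(f"approximate: {p_any_approx:.3e}, exact: {p_any_exact:.3e}")
```

    The two values agree to about seven significant figures, so the shortcut is safe at these magnitudes.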
    > These estimates assume that directed evolution in the lab is a valid
    > model for natural evolution. Of course, this is not the case, as in
    > directed evolution one does not have to bother about the viability of
    > each intermediate organism in a linear sequence of point mutations, but
    > only about the isolated activity of a new protein sequence after several
    > or many mutations. Directed evolution jumps around in sequence space,
    > whereas natural evolution is limited to single-step paths, and none of
    > these steps may go downhill on the fitness surface.
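    To make the constraint concrete, here is a minimal sketch of the rule just stated: every step is a single point mutation, and fitness never decreases (neutral steps allowed). The fitness function `toy_fitness` is purely hypothetical and stands in for real selection; none of these names come from either paper. It also shows that a sequence of length L has only 19L single-mutation neighbours, so a natural walk sees a tiny neighbourhood at each step, while directed evolution may jump several mutations at once.

```python
from typing import Callable, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (deletions ignored)")
    return sum(x != y for x, y in zip(a, b))

def single_point_neighbours(seq: str) -> List[str]:
    """Every sequence reachable from seq by exactly one substitution (19 * L)."""
    return [
        seq[:i] + aa + seq[i + 1:]
        for i, current in enumerate(seq)
        for aa in AMINO_ACIDS
        if aa != current
    ]

def is_non_downhill_path(path: Sequence[str], fitness: Callable[[str], float]) -> bool:
    """True if each step is one point mutation and fitness never decreases."""
    for prev, nxt in zip(path, path[1:]):
        if hamming(prev, nxt) != 1 or fitness(nxt) < fitness(prev):
            return False
    return True

def toy_fitness(seq: str) -> float:
    # Hypothetical fitness: number of tryptophans (W). Illustration only.
    return float(seq.count("W"))

print(len(single_point_neighbours("A" * 80)))                         # 1520
print(is_non_downhill_path(["AAAA", "WAAA", "WWAA"], toy_fitness))    # True
print(is_non_downhill_path(["AAAA", "WWAA"], toy_fitness))            # False: two-mutation jump
print(is_non_downhill_path(["WAAA", "AAAA"], toy_fitness))            # False: downhill step
```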
    > How, then, is it possible that any one of the 10^3 or 10^4 basically
    > different protein folds (families) arose (anywhere in the biosphere),
    > let alone all of them? If 10^3 different searches were needed, each with
    > a success probability of around 10^-10, it seems a hopeless proposition.
    > (And the few million years available for the formation of the first
    > viable organism appear transastronomically inadequate.)
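    One way to quantify "hopeless", assuming (this is an interpretation, not stated in the post) that the 10^3 searches are independent and each gets a single random draw: the joint probability is (10^-10)^1000 = 10^-10000, too small even to store as a floating-point number. The sketch also gives the mean number of random draws per search until a first hit, 1/p, from the geometric distribution.

```python
import math

n_searches = 10**3   # distinct protein families that must each be found
p_each = 1e-10       # success probability of one random draw in each search

# Direct multiplication underflows: 1e-10 ** 1000 is far below the smallest
# positive float, so Python returns 0.0.
naive_joint = p_each ** n_searches

# Working with base-10 exponents keeps the answer representable.
log10_joint = n_searches * math.log10(p_each)

# Mean number of independent random draws per search until the first hit
# (geometric distribution), and the total across all searches.
draws_per_search = 1 / p_each
total_draws = n_searches * draws_per_search

print(f"P(all {n_searches} succeed on one draw each) = 10^{log10_joint:.0f}")
print(f"expected draws: {draws_per_search:.0e} per search, {total_draws:.0e} in total")
```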
    > The only possible way out seems to be to claim that all the different
    > protein families used in the biosphere are intimately connected in
    > sequence space, such that simple linear
    > sequences of point mutations, with all intermediates naturally selected,
    > will do for all proteins. In this case, more than 99.999999999% (eleven
    > nines altogether) of sequence space is barren for life and was never
    > visited by any sequence during evolution. Whether this is a feasible
    > proposition will have to be shown experimentally.
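    The "eleven nines" follows directly from the 1-in-10^11 frequency: if a fraction 10^-11 of sequence space is useful, the barren fraction is 1 - 10^-11. A short check with Python's `decimal` module, used here to avoid binary floating-point rounding:

```python
from decimal import Decimal

useful_fraction = Decimal("1e-11")   # fraction of sequence space assumed useful
barren_percent = ((1 - useful_fraction) * 100).normalize()

print(f"{barren_percent}% barren, containing {str(barren_percent).count('9')} nines")
```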
    > This still leaves us with the mystery of the origin of the first living
    > organism capable of natural evolution.
    > But the very interesting finding of the two papers mentioned is that the
    > protein sequence space is extremely sparsely populated with useful
    > sequences. This makes evolution (which, for theological reasons, I
    > believe has happened) an astonishingly marvellous process.

    All God's best, Larry Johnston

    "He has made everything beautiful in its time. He has also set
     eternity in the hearts of men" - - Ecclesiastes 3:11, NIV trans

    Lawrence H. Johnston               home: 917 E. 8th St.
    professor of physics, emeritus     Moscow, ID 83843
    University of Idaho                (208) 882-2765

    This archive was generated by hypermail 2b29 : Fri Aug 03 2001 - 10:17:54 EDT