Evolution of proteins in sequence space

From: pruest@pop.dplanet.ch
Date: Fri Aug 03 2001 - 14:25:45 EDT

    David Campbell wrote:
    > It's good to see more realistic estimates of probability. A few considerations that will affect the calculations:
    > PR>Basically, any sequence within the transastronomically huge combinatorial space of the 20^L possible sequences of proteins of length L would be accessible during evolution, if there is a mutational path which leads from an existing sequence to the target considered and which does not contain any intermediates which are selected against (or even lethal). <
    > DC: Do we have any idea of the relative proportions of theoretically useful versus non-useful proteins that are inaccessible due to these reasons?

    PR: Both of the papers I mentioned [Keefe A.D., Szostak J.W.,
    "Functional proteins from a random-sequence library", Nature 410 (2001),
    715-718; Silverman J.A., Balakrishnan R., Harbury P.B., "Reverse
    engineering the (β/α)8 barrel fold", Proceedings of the
    National Academy of Sciences USA 98 (2001), 3092-3097] mostly deal with
    experimentally functional proteins which would probably be inaccessible
    to natural evolution, although it may be difficult or impossible to
    determine how many are in this category. Now, if 1 in 10^11 proteins is
    a useful ATP binder and less than 1 in 10^10 is a useful triosephosphate
    isomerase, this gives us the first estimate of the proportion of
    possibly useful versus probably non-useful proteins of a _given
    specificity_ or protein family. If there are 10^4 different protein
    families, this first experimentally based estimate of the proportion you
    are asking about would be around 10^-7. Of course, the uncertainties are
    still huge. But these are the only relevant data I have come across
    until now.
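
To make the arithmetic behind this first estimate explicit, here is a minimal Python sketch. It uses only the figures quoted above (1 in 10^11, less than 1 in 10^10, and 10^4 families). The function name `useful_fraction` is made up for illustration, and the sketch assumes, as a simplification not stated in the post, that every family is about equally rare and that the families do not overlap in sequence space.

```python
from fractions import Fraction

# Figures quoted in the post
ATP_BINDER_FRACTION = Fraction(1, 10**11)  # about 1 in 10^11 (Keefe & Szostak)
TIM_FRACTION_UPPER = Fraction(1, 10**10)   # less than 1 in 10^10 (Silverman et al.)
N_FAMILIES = 10**4                         # assumed number of protein families

def useful_fraction(per_family_fraction, n_families):
    """Fraction of random sequences useful in *any* family.

    Assumption (illustrative only): all families are equally rare and
    non-overlapping, so the per-family fractions simply add up.
    """
    return per_family_fraction * n_families

low = useful_fraction(ATP_BINDER_FRACTION, N_FAMILIES)
high = useful_fraction(TIM_FRACTION_UPPER, N_FAMILIES)

print(f"from the ATP-binder figure: {float(low):.0e}")
print(f"from the TIM upper bound:   {float(high):.0e}")
```

The ATP-binder figure gives the 10^-7 mentioned above, while the triosephosphate isomerase bound only says the proportion would be below 10^-6 under the same assumptions.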

    I am not sure I understand what you mean by "theoretically useful".
    There might be protein families which have never occurred in the
    biosphere, but which might be useful to human technology. But I do not
    think our present knowledge of protein structure-function relationship
    is sufficient to explore this question theoretically.
    > PR>As the human genome contains an estimated 30,000 genes, and the number of different protein folds is estimated to be a few thousand, we may, as a very rough approximation, assume that there are fewer than 10^4 basically different protein families in the biosphere, within each of which a number of similar proteins can be derived from each other by feasible evolutionary paths. <
    > DC: Are there other theoretical biospheres using a different set of basic proteins?

    PR: We know that on earth there is only one biosphere. At least no one
    has come across any organism not genetically related to our biosphere,
    as far as I know. And to date, extraterrestrial life is 100%
    speculation. Given Hugh Ross's probability estimates, I don't
    expect there to be any second life-supporting planet in the universe
    [Ross H., "Big Bang Refined by Fire" (Pasadena, CA: Reasons to Believe,
    > PR>These estimates assume that directed evolution in the lab is a valid model for natural evolution. Of course, this is not the case, as in directed evolution one does not have to bother about the viability of each intermediate organism in a linear sequence of point mutations, but only about the isolated activity of a new protein sequence after several or many mutations. Directed evolution jumps around in sequence space, whereas natural evolution is limited to single-step paths, and none of these steps may go downhill on the fitness surface. <
    > DC: On the other hand, the directed evolution in the lab started from a random set of proteins, whereas evolution of a new protein sequence from an existing sequence starts with an already functional protein, from which basic structural units can be retained.

    PR: Yes, Keefe et al. started with a random set, but searched for a
    single function only (ATP binding). Silverman et al., by contrast,
    started with a functional TIM and then jumped around in sequence
    space, although they also investigated some specific one-step
    mutations. In natural
    evolution, you may start with a functional sequence, but if you want to
    find a novel protein family, you may need a sequence of _non-selected_
    mutations until you reach the first minimal activity of the novel
    functionality. In this respect, the function(s) of the starting protein
    may not be of much help.

    > DC: This relates closely to the question of how easy it is to produce a protein of a different family via mutation. If it is not that difficult, then an enormous number of starting points are feasible.

    PR: There may occasionally be a feasible mutational path leading from
    one fold or family to a different one, but if a novel activity is to be
    found, this as yet non-existent activity cannot be selected for, as long
    as it has not been produced, to a minimal extent, through a mutational
    _random walk_. And the longer such a random path is, the less probable
    it will be. In my post of 22 Sep 2000 on "Random origin of biological
    information" (ASA digest V1 #1804), I presented an estimate that such
    non-selected random walks could not be longer than 2 new specific amino
    acids, on average. And I doubt that this would be enough for a novel
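
As a rough illustration of why longer non-selected paths become improbable so quickly, here is a minimal Python sketch. It is not the estimate from the 22 Sep 2000 post. It only assumes that each required position independently takes any of the 20 amino acids with equal probability, which is a deliberate simplification; both function names are made up for illustration.

```python
from fractions import Fraction

N_AMINO_ACIDS = 20  # standard amino acid alphabet

def sequence_space_size(length):
    """Number of possible protein sequences of a given length (20^L)."""
    return N_AMINO_ACIDS ** length

def chance_of_specific_residues(k):
    """Chance that k given positions all hold the required amino acid.

    Assumption (illustrative only): positions are independent and all
    20 amino acids are equally likely at each of them.
    """
    return Fraction(1, N_AMINO_ACIDS ** k)

for k in (1, 2, 5, 10):
    p = chance_of_specific_residues(k)
    print(f"{k:2d} specific residues: about 1 in {p.denominator:.1e}")
```

Under these assumptions every additional specific amino acid multiplies the number of unselected trials needed by 20, so the chance falls geometrically with the length of the random walk.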

    > DC: Note also that small decreases in fitness, and sometimes large ones, may still be viable. The level of selective pressure will determine how far downhill each step may be without being eliminated by selection.
    > Dr. David Campbell

    PR: In a small population, yes, but there you don't have much variation
    to play with, so this caveat may not help much. In a large population,
    even neutral mutations may often fail to reach fixation.
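
The dependence on population size can be illustrated with Kimura's standard diffusion approximation for the fixation probability of a single new mutation. This minimal Python sketch is an added illustration, not a calculation from either correspondent. It assumes a diploid population whose effective size equals its census size N and a mutation with an additive selection coefficient s; the function name and the example values of s and N are arbitrary.

```python
import math

def fixation_probability(s, n):
    """Kimura's approximation for one new mutation in a diploid population.

    s: selection coefficient (negative = deleterious, 0 = neutral)
    n: population size (assumption: effective size equals census size)
    """
    if s == 0:
        # A neutral mutation fixes with probability equal to its
        # initial frequency, 1/(2N).
        return 1.0 / (2 * n)
    exponent = -4 * n * s
    if exponent > 700:
        # Strongly disfavoured in a large population: the probability
        # is vanishingly small (this guard also avoids float overflow).
        return 0.0
    # (1 - e^(-2s)) / (1 - e^(-4Ns)), written with expm1 for accuracy
    return math.expm1(-2 * s) / math.expm1(exponent)

for n in (100, 10**6):
    neutral = fixation_probability(0, n)
    slightly_bad = fixation_probability(-0.001, n)
    print(f"N={n:>7}: neutral {neutral:.2e}, s=-0.001 {slightly_bad:.2e}")
```

With these illustrative numbers, a slightly deleterious mutation fixes almost as often as a neutral one when N = 100, whereas for N = 10^6 the neutral mutation has only a 1 in 2,000,000 chance and the deleterious one effectively none.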

    Thank you, David, for your comments!


    Dr Peter Ruest			Biochemistry
    Wagerten			Creation and evolution
    CH-3148 Lanzenhaeusern		Tel.:	++41 31 731 1055
    Switzerland			E-mail:	<pruest@dplanet.ch>
     - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    	In biology - there's no free lunch -
    		and no information without an adequate source.
    	In Christ - there is free and limitless grace -
    		for those of a contrite heart.
