David Campbell wrote:
> It's good to see more realistic estimates of probability. A few considerations that will affect the calculations:
> PR>Basically, any sequence within the transastronomically huge combinatorial space of the 20^L possible sequences of proteins of length L would be accessible during evolution, if there is a mutational path which leads from an existing sequence to the target considered and which does not contain any intermediates which are selected against (or even lethal). <
> DC: Do we have any idea of the relative proportions of theoretically useful versus non-useful proteins that are inaccessible due to these reasons?
PR: Both of the papers I mentioned [Keefe A.D., Szostak J.W.,
"Functional proteins from a random-sequence library", Nature 410 (2001),
715-718; Silverman J.A., Balakrishnan R., Harbury P.B., "Reverse
engineering the ([beta]/[alpha])8 barrel fold", Proceedings of the
National Academy of Sciences USA 98 (2001), 3092-3097] mostly deal with
experimentally functional proteins which would probably be inaccessible
to natural evolution, although it may be difficult or impossible to
determine how many are in this category. Now, if 1 in 10^11 proteins is
a useful ATP binder and less than 1 in 10^10 is a useful triosephosphate
isomerase, this gives us the first estimate of the proportion of
possibly useful versus probably non-useful proteins of a _given
specificity_ or protein family. If there are 10^4 different protein
families, this first experimentally based estimate of the proportion you
are asking about would be around 10^-7. Of course, the uncertainties are
still huge. But that's the only relevant data I have come across until
now.

I am not sure I understand what you mean by "theoretically useful".
There might be protein families which have never occurred in the
biosphere, but which might be useful to human technology. But I do not
think our present knowledge of protein structure-function relationship
is sufficient to explore this question theoretically.
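
The arithmetic behind the 10^-7 figure above can be sketched as follows. This is only an illustration: it assumes the per-family fraction is taken as the 1-in-10^11 ATP-binder value and is simply multiplied by an assumed 10^4 families, which ignores any overlap between families. The variable names are illustrative and do not come from either paper.

```python
from fractions import Fraction

# Experimental estimates quoted in the post (order of magnitude only).
atp_binder_fraction = Fraction(1, 10**11)   # about 1 in 10^11 random sequences binds ATP
tim_fraction_upper = Fraction(1, 10**10)    # fewer than 1 in 10^10 is a useful TIM

# Assumption from the post: roughly 10^4 basically different protein families.
n_families = 10**4

# Crude estimate: fraction of sequences useful in *any* family,
# assuming each family is about as rare as the ATP binders.
useful_fraction = n_families * atp_binder_fraction

# The same calculation with the TIM figure gives an upper bound.
tim_upper = n_families * tim_fraction_upper

print(float(useful_fraction))  # 1e-07
print(float(tim_upper))        # 1e-06
```

The two results bracket the estimate within an order of magnitude, which is far smaller than the uncertainties already acknowledged.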
> PR>As the human genome contains an estimated 30,000 genes, and the number of different protein folds is estimated to be a few thousand, we may, as a very rough approximation, assume that there are less than 10^4 basically different protein families in the biosphere, within each of which a number of similar proteins can be derived from each other by feasible evolutionary paths. <
> DC: Are there other theoretical biospheres using a different set of basic proteins?
PR: We know that on earth there is only one biosphere. At least no one
has come across any organism not genetically related to our biosphere,
as far as I know. And to date, extraterrestrial life is 100%
speculation. Going by Hugh Ross's probability estimates, I don't
expect there to be any second life-supporting planet in the universe
[Ross H., "Big Bang Refined by Fire" (Pasadena, CA: Reasons to Believe,
> PR>These estimates assume that directed evolution in the lab is a valid model for natural evolution. Of course, this is not the case, as in directed evolution one does not have to bother about the viability of each intermediate organism in a linear sequence of point mutations, but only about the isolated activity of a new protein sequence after several or many mutations. Directed evolution jumps around in sequence space, whereas natural evolution is limited to single-step paths, and none of these steps may go downhill on the fitness surface. <
> DC: On the other hand, the directed evolution in the lab started from a random set of proteins, whereas evolution of a new protein sequence from an existing sequence starts with an already functional protein, from which basic structural units can be retained.
PR: Yes, Keefe et al. started with a random set, but searched for a
single function only (ATP binding). Silverman et al., by contrast,
started with a functional triosephosphate isomerase (TIM) and then
jumped around in sequence space, although they also investigated some
specific one-step mutations. In natural
evolution, you may start with a functional sequence, but if you want to
find a novel protein family, you may need a sequence of _non-selected_
mutations until you reach the first minimal activity of the novel
functionality. In this respect, the function(s) of the starting protein
may not be of much help.
> DC: This relates closely to the question of how easy it is to produce a protein of a different family via mutation. If it is not that difficult, then an enormous number of starting points are feasible.
PR: There may occasionally be a feasible mutational path leading from
one fold or family to a different one, but if a novel activity is to be
found, this as yet non-existent activity cannot be selected for, as long
as it has not been produced, to a minimal extent, through a mutational
_random walk_. And the longer such a random path is, the less probable
it will be. In my post of 22 Sep 2000 on "Random origin of biological
information" (ASA digest V1 #1804), I presented an estimate that such
non-selected random walks could not be longer than 2 new specific amino
acids, on average. And I doubt that this would be enough for a novel
> DC: Note also that small decreases in fitness, and sometimes large ones, may still be viable. The level of selective pressure will determine how far downhill each step may be without being eliminated by selection.
> Dr. David Campbell
PR: In a small population, yes, but there you don't have much variation
to play with, so this caveat may not help much. In a large population,
even neutral mutations often fail to reach fixation.
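
To illustrate the point about large populations: under the standard neutral-theory result (an assumption brought in here, not something cited in this thread), a new neutral mutation in a diploid population of N individuals reaches fixation with probability 1/(2N). The function name below is hypothetical and the sketch ignores selection, population structure and changes in population size.

```python
def neutral_fixation_probability(population_size: int, ploidy: int = 2) -> float:
    """Probability that a single new neutral mutation eventually fixes.

    Assumes the textbook neutral model: the mutation starts as one copy
    among ploidy * population_size gene copies, and each copy is equally
    likely to become the ancestor of the whole future population.
    """
    if population_size <= 0 or ploidy <= 0:
        raise ValueError("population_size and ploidy must be positive")
    return 1.0 / (ploidy * population_size)


# The chance of fixation shrinks in proportion to population size.
for n in (100, 10_000, 1_000_000):
    print(n, neutral_fixation_probability(n))
```

On this model the fixation probability falls by a factor of ten for every tenfold increase in population size, while a small population offers correspondingly fewer new mutations per generation.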
Thank you, David, for your comments!
--
Dr Peter Ruest
Biochemistry; Creation and evolution
Wagerten, CH-3148 Lanzenhaeusern, Switzerland
Tel.: ++41 31 731 1055
E-mail: <firstname.lastname@example.org>

In biology - there's no free lunch - and no information without an adequate source.
In Christ - there is free and limitless grace - for those of a contrite heart.
This archive was generated by hypermail 2b29 : Fri Aug 03 2001 - 14:26:21 EDT