Emergence of information out of nothing?

From: Peter Ruest (pruest@pop.mysunrise.ch)
Date: Sat May 18 2002 - 12:55:02 EDT


    Glenn,

    At your request, I append - after the current discussion - some
    statements, extracted from the following posts of mine, from our last
    discussion about "Random origin of biological information":
    Date: Fri, 22 Sep 2000 13:51:34 +0200 (ASA-digest V1 #1804)
    Date: Sun, 24 Sep 2000 09:19:09 +0200 (ASA-digest V1 #1806)
    Date: Wed, 27 Sep 2000 21:03:58 +0200 (ASA-digest V1 #1812)
    Date: Mon, 02 Oct 2000 20:18:36 +0200 (ASA-digest V1 #1818)

    It's quite voluminous, though, and if Terry snips it out, check the
    archive.

    I am sorry this is a long post, as in your answers you often branch out
    into many side trails, making the whole discussion somewhat confusing.
    Yet I dare not snip out things, lest you again misunderstand me. So I'll
    just comment wherever I can't agree with what you say. But remember that
    the whole long argument started (04 May 2002 16:46:30 +0200) with my
    simple claim that we have to distinguish between:
    (I) Maximum information carrying capacity;
    (II) Functional information relevant for biological systems.
    Your reaction was that Shannon information is the only valid
    information, and that (II) has nothing to do with information.

    > > Glenn wrote (8 May 2002 21:45:23 -0700):
    > >> You use the example of Yockey's work with cytochrome c below. He
    > >> found there
    > >> were 10^93 different cytochrome c's which would perform the work of that
    > >> molecule. What he didn't prove was the possibility of huge
    > >> numbers of other
    > >> FAMILIES of proteins which will also do that job.
    >
    > I replied:
    > >Agreed - in principle. Yet, if there are any other families (let alone
    > >huge numbers) which will perform the same function (in the same
    > >organismal environment), I find it strange that no such example has been
    > >found to date, as far as I know.
    >
    > You replied (13 May 2002 21:03:42 -0700):
    > I believe I cited an example to you earlier. The work of Szostack and
    > Ellington points in that direction. This work is cited below.

    That's artificial selection of RNA function in vitro, rather than
    spontaneous emergence of minimal protein function without selection in
    vivo (or in the prebiotic world).

    > >You may say this is because just one
    > >family happened to be present in the universal common ancestor and was
    > >inherited by the whole biosphere. But such an argument would just push
    > >the problem back to the origin of life: if there were such huge numbers
    > >of unrelated possibilities, why was there a universal common ancestor
    > >at all, rather than a huge number of unrelated ones?
    >
    > This would point to one origin of life, not multiple origins of life. That
    > is why. If life evolved many times, then there should be many lineages under
    > this assumption. But with one origin of life, from which all others
    > evolved, then having one family is quite reasonable, explainable
    >and expected.

    Yes, what I wrote underlines that there was one origin of life, not
    multiple ones, and having one family is expected. But the point I made
    is that this fact strongly argues against your assumption that huge
    numbers of synonymous families are possible. If that were the case, you
    would expect multiple families, even if there were only one origin of
    life.

    > >> Given the work of Joyce
    > >> and others (which you didn't mention in your reply) they have
    > >> found that if
    > >> you choose a function and search for it with random molecules and random
    > >> mutation, you can find any given function with a probability of 1 in 10^14
    > >> to 1 in 10^18. I cite this:
    > >> Andrew Ellington and Jack W. Szostak "used small organic dyes as the
    > >> target. They screened 10^13 random-sequence RNAs and found molecules
    > >> that bound tightly and specifically to each of the dyes.
    > >> "Recently they repeated this experiment using random-sequence
    > >> DNAs and arrived at an entirely different set of dye-binding
    > >> molecules. ...
    > >> "That observation reveals an important truth about directed
    > >> evolution (and indeed, about evolution in general): the forms
    > >> selected are not necessarily the best answers to a problem in some
    > >> ideal sense, only the best answers to arise in the evolutionary
    > >> history of a particular macromolecule."~Gerald F. Joyce, "Directed
    > >> Evolution," Scientific America, Dec. 1992, p. 94-95.
    > >p.48 not 94
    > >> And I cite this:
    > >> "We designed a pool of random sequence RNAs, using the minimal ATP
    > >> aptamer as a core structure. By creating a pool that was
    > >> predisposed to bind ATP specifically and with high affinity we hoped
    > >> to increase the likelihood of generating molecules with ATP-dependent
    > >> kinase activity. The ATP aptamer core was surrounded by three
    > >> regions of random sequence of 40, 30 and 30 nucleotides in length,
    > >> respectively. The ATP-binding domain itself was mutagenized such
    > >> that each base had a 15% chance of being non-wild-type, to allow for
    > >> changes in the aptamer sequence that might be required for optimal
    > >> activity. To increase the likelihood of finding active molecules, we
    > >> attempted to create a pool containing as many different molecules as
    > >> possible. Because it is difficult to obtain an acceptable yield from
    > >> the synthesis of a single oligonucleotide of this length (174
    > >> nucleotides), we made two smaller DNA templates and linked them
    > >> together to generate the full-length DNA pool. Transcription of this
    > >> DNA yielded between 5 x 10^15 and 2 x 10^16 different RNA
    > >> molecules."~Jon R. Lorsch and Jack W. Szostak, "In Vitro Evolution of
    > >> New Ribozymes with Polynucleotide Kinase Activity," Nature, 371,
    > >> Sept. 1994, p. 31
    > >> We can act as if the probability is very low to find a given
    > >> functionality,
    > >> like YECs act as if the earth is young, but acting like it isn't going to
    > >> change the fact that functionality is found much more readily than
    > >> anti-evolutionary activists want to believe.
    > >
    > >Glenn, you know very well that I am neither a YEC nor an
    > >anti-evolutionary activist (cf.
    > >http://www.asa3.org/ASA/PSCF/1999/PSCF12-99Held.html). All I insist on
    > >is that an adequate mechanism for producing evolutionary novelty is as
    > >yet elusive.
    >
    > This has nothing to do with YEC, it has to do with multiple families of
    > biopolymers being able to perform the same function. And the probability
    > argument which you are using IS an anti-evolutionary argument. Very few
    > pro-evolutionists are worried about it because they know the data I just
    > posted but on which you failed to comment.

    Just like in Sept. 2000, when we discussed this last time, you keep
    talking about RNA artificial selection in vitro, rather than protein
    natural selection in vivo or prebiotic random walk emergence of minimal
    function, which is very different, and I explained why. I know that very
    few pro-evolutionists are worried about this, but that is not a factual
    argument; it is an appeal to authority - which I would not expect from
    you.

    > Pay attention to the issue at hand. I am saying that ignoring the data I
    > posted above is exactly LIKE, ANALOGOUS, SIMILAR to the way the YECs act.
    > And indeed, you skipped right by it without any comment.

    The issue at hand is random evolution of novel protein functionality,
    and, in particular, the first minimal functionality of a novel protein,
    before natural selection can set in. This has nothing to do with
    artificial selection of RNA in vitro, particularly if some of the
    functionality selected is already present. It's you who are evading the
    issue, not I.

    > >> So, given that I am mentioning this work for a second time, will
    > >> you respond
    > >> to its import now?
    > >
    > >You have not mentioned these papers (if I remember correctly), but
    > >similar ones, and I responded in detail. But I may do it again, giving
    > >you a new example if you insist. A. Lombardi, et al., "Miniaturized
    > >metalloproteins: Application to iron-sulfur proteins", PNAS 97 (2000),
    > >11922, attempted to design a minimal redox enzyme, but haven't achieved
    > >their goal as yet. Their dimeric undecapeptide can hold an iron atom,
    > >but is unstable, being too small to shield off the environmental water.
    > >The invariant of their (intelligently designed) construct amounts to at
    > >least 5 specific amino acid occupations, which is too much to be
    > >attainable by an evolutionary process without selection.
    >
    > What Lombardi is doing is not at all what Joyce, Szostak and Ellington are
    > doing. Lombardi is trying to shrink the proteins down to miniature versions,
    > of smaller length.

    This is exactly my point, see above. These miniature proteins are the
    ones which may give indications about the origin of semantic or
    functional biological information, about which I was talking (case a). I
    did not want to deal with improvement of a preexisting functionality
    (case b), because there you may just be taking over some "information"
    from the environment by means of selection. And I did not doubt there
    are some RNA functions (case c) that are not very difficult to find (if
    you do have RNA!), just as Joyce, Szostak and others have found, even if
    you are looking for a function not yet present in the starting mix.
    Again, we have no means of telling whether any information has emerged
    de novo. With proteins, there is a way of dealing with semantic
    information (II), cf. Yockey's book. With RNA, I know of no similarly
    promising way of dealing with functional information, because residue
    conservation is much less clearly definable (you have only 4
    nucleotides, and there is the additional complication of base pairing).

    So, I am looking for examples of case a, but you keep pointing to
    examples of case b and/or case c.

    > The article starts with:
    > "Miniaturized proteins are peptide-based synthetic models of natural
    > macromolecular systems. They contain a minimum set of constituents necessary
    > for an accurate reconstruction of defined structures and for a fine-tuned
    > reproduction of defined functions" A. Lombardi, et al., "Miniaturized
    > metalloproteins: Application to iron-sulfur proteins", PNAS 97 (2000), 11922.
    >
    > Joyce and the others are producing similar functionality from similar length
    > molecules. Your Lombardi article is a rabbit trail full of red herrings. It
    > is a non-sequitur response. It is taking something different and throwing
    > it out there, which confuses rather than illuminates.

    No, Glenn, they are not at all similar. There are fundamental
    differences between proteins and RNAs. Structure-function relationships
    are completely different; and with proteins, you need the
    genotype-phenotype code translation - to just mention two factors. You
    questioned my concept of semantic biological information, but you refuse
    to consider my definition of it. I don't see anything relevant to this
    type of information in the RNA artificial selection work - although it
    certainly is of interest in other respects. It's just not applicable to
    what I said and you questioned.

    > >> >This only works because you first give me the book, which contains all
    > >> >the relevant semantic information. With the signal, you just send me
    > >> >ln(3) bits of information, not lots.
    > >>
    > >> I believe that is exactly what I said in my note. I haven't sent
    > >> you lots of
    > >> Shannon information, but I have sent you lots of colloquial information.
    > >
    > >[Sorry, I should have written log2(3), instead of ln(3).] How can you
    > >transmit colloquial information (in the book) without any Shannon
    > >information? Whatever you transmit through whatever medium can be
    > >measured by Shannon information (which, however, also includes all the
    > >uninteresting noise and the irrelevant part of the colloquial
    > >information).
    >
    > Either log or ln will work with Shannon entropy. One merely uses a different
    > constant.

    When you are talking about a certain amount of information, it does
    matter which log you use. It's only in fractions (comparisons) that the
    constant drops out.

    > And you need to study up on Shannon entropy because you are not
    > even getting signal to noise correct. A signal is what you want to
    > transmit. It is the sequence you have in your hand. That sequence does not
    > have noise. Noise is what happens to the signal as it goes through the
    > system. It is the differences at the reception end from what was actually
    > sent. Generally people try not to transmit the noise from the start of the
    > system and they only want to transmit signal. To consciously transmit noise
    > would be like blowing a big fan on a microphone when Pavarotti comes up to
    > sing. He won't be happy and neither will the audience. However, in the act
    > of transmitting Pavarotti's voice, system noise, phase distortion, amplitude
    > absorption, frequency dispersion, etc., all occur and that is the noise.

    I agree with all these principles. But I wasn't using "signal" and
    "noise" in this technical sense at all. In your example of the book and
    the pointer following it, I was talking of the colloquial information in
    the book, which contained all the information you wanted me to know,
    namely the operational variant you afterwards pointed to, but also the
    two other variants which were not to be executed and which therefore
    were redundant. Furthermore, any colloquial text usually contains some
    redundant writing. This is what I called "all the uninteresting noise
    and the irrelevant part", hoping you would understand what I meant. If
    you transmit anything more than the absolutely minimal algorithm needed
    to produce the message intended to be transmitted, you are transmitting
    noise. Now, neither colloquial book contents nor a pointer consisting of
    an 8-bit character is free of redundancy or noise in this sense.

    > >> >You want to keep the signal small
    > >> >in order to transmit it fast, therefore it cannot carry all the semantic
    > >> >information you want me to have for executing your plan, so you transmit
    > >> >the large amount of information beforehand and make the signal nothing
    > >> >but a pointer to one of the 3 large texts you transmitted beforehand.
    > >>
    > >> You miss my point. You had stated that semantic information is related to
    > >> SHannon information. I gave you a case where that wasn't the case. Shannon
    > >> information isn't related to semantic information.
    > >
    > >It _is_ related, see above, just not 1-to-1. I didn't miss your point,
    > >but your example doesn't work. Without the book the transmitted pointer
    > >is of no use at all. Its Shannon information remains the same, but its
    > >semantic information is zero. With the book and the pointer, the total
    > >Shannon information transmitted is huge, the semantic information just
    > >equivalent to what you wanted to have me know at the end, namely about
    > >one third of the semantic information in the book.
    >
    > Lets use Lucien's excellent example of 328945 in decimal or 504F1 in hex,
    > 1202361 in octal, or 1010000010011110001 in binary. The shannon entropy is
    > different for each sequence but each sequence has the same meaning.
    > If I do my math correctly, 1010000010011110001 has
    > H = -K(.4 ln(.4) + .6 ln(.6)) = K(.366 + .306) = .672 K
    > 504F1 has
    > H = -K(5 x .2 x ln(.2)) = 1.609 K
    > 1202361 has
    > H = (.357 + .357 + .277 + .277 + .277) K = 1.545 K
    > And 328945 has
    > H = -6 x (1/6) ln(1/6) K = 1.791 K
    > Same meaning different Shannon entropies because the Shannon entropy has
    > absolutely NOTHING to do with meaning, semantic meaning, function or
    > anything other than a measure of how easy or hard it will be to transmit a
    > given sequence.
    > If you disagree, then please show the mathematics showing that Shannon
    > entropy is related to meaning and how.
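    These entropies can be checked with a short sketch (natural logarithms,
    K = 1; using the exact symbol frequencies gives slightly different
    values than the rounded .4/.6 figures above):

    ```python
    from collections import Counter
    from math import log

    def shannon_entropy(seq):
        """H = -sum(p_i * ln(p_i)) over the symbol frequencies of seq (K = 1)."""
        counts = Counter(seq)
        n = len(seq)
        return -sum((c / n) * log(c / n) for c in counts.values())

    # The same number 328945 written in binary, hex, octal and decimal:
    for s in ["1010000010011110001", "504F1", "1202361", "328945"]:
        print(f"{s:>20}  H = {shannon_entropy(s):.3f} K")
    # -> approximately 0.681 K, 1.609 K, 1.550 K, 1.792 K
    ```

    Same meaning in every case, four different entropies - exactly the
    point of the example.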

    I don't dispute these calculations at all. But again and again, I have
    emphasized that we have to distinguish between
    (I) Maximum information carrying capacity;
    (II) Functional information relevant for biological systems.
    Shannon entropy is related to (I), not directly to (II). Meaning
    (biological or otherwise) is found in (II) and is a function of a
    functional system like a given language or biological system. (I), which
    is a function of sequence length and alphabet size, specifies nothing
    but a maximum amount of functional information (II) which can be stored
    in a given sequence having a maximal capacity (I). Never have I claimed
    a 1-to-1 correspondence between a "value" of (I) and a "value" of (II).

    You may compute the Shannon entropy of a given DNA sequence (4-letter
    alphabet) or a given protein (20-letter alphabet). You'll get different
    values, even for a length ratio of 3:1. Or you may compute protein
    sequence entropies by taking into considerations further restrictions
    given by biological circumstances like available monomer frequencies,
    sequence restrictions, amino acid frequencies at given positions within
    protein families, etc. Do you want to call these Shannon entropies?
    Yockey does [H.P. Yockey, "Information theory and molecular biology"
    (Cambridge Univ.Press, 1992, ISBN 0-521-35005-0)]: "... the entropy of
    the genome is the Shannon entropy or the Kolmogorov-Chaitin algorithmic
    entropy" (p.261). The algorithmic entropy has to do with the shortest
    possible algorithm generating a given sequence. "The entropy that is
    applicable to the case of the evolution of the genetic message is ...
    the Shannon entropy of information theory or the Kolmogorov-Chaitin
    algorithmic entropy" (p.312). "... _highly organized_ sequences ... have
    a large Shannon entropy and are embedded in the portion of the Shannon
    entropy scale also occupied by _random sequences_" (p.313).
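    The capacity comparison in the first paragraph can be made concrete with
    a small sketch (the 101-residue length is illustrative, chosen to match
    cytochrome c):

    ```python
    from math import log2

    def max_capacity_bits(length, alphabet_size):
        """Capacity (I): depends only on sequence length and alphabet size."""
        return length * log2(alphabet_size)

    protein_len = 101              # illustrative: cytochrome c has 101 residues
    dna_len = 3 * protein_len      # the coding DNA, 3 nucleotides per codon

    print(max_capacity_bits(dna_len, 4))        # 606.0 bits for the DNA
    print(max_capacity_bits(protein_len, 20))   # ~436.5 bits for the protein
    ```

    The two values differ even at the fixed 3:1 length ratio; the gap
    reflects the degeneracy of the genetic code (64 codons mapping to 20
    amino acids plus stops), i.e. capacity (I) is an upper bound on, not a
    measure of, the functional information (II) a sequence carries.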

    Yockey also shows the connection to meaningful biological information:
    "Let us consider evolution as a communication system from past to
    present. At some time in the history of life the first cytochrome c
    appeared. As a result of drift, random walk and natural selection, this
    ancestor genetic message was communicated along the dendrites of a
    fractal ... representing a phylogenetic tree ... Some dendrites lead to
    modern organisms, the sequence having changed with time. Thus the
    original genetic message of the common ancestor specifying cytochrome c,
    regarded as an input, has many outcomes that nevertheless carry the same
    specificity. The evolutionary processes can be considered as random
    events along an ergodic Markov chain ... that have introduced
    uncertainty in the original genetic message. This uncertainty is
    measured by the conditional entropy in the same manner as the
    uncertainty of random genetic noise is measured ... Since the
    specificity of the modern cytochrome c is preserved, although many
    substitutions have been accepted, this conditional entropy may be
    subtracted from the source entropy ..., to obtain the mutual entropy or
    information content needed to specify at least one cytochrome c sequence
    ... The information content of the sequence that determines at least one
    cytochrome c molecule is the sum of the information content of each
    site. The total information content is a measure of the complexity of
    cytochrome c" (p.132). For the mathematical formulation, please refer to
    Yockey's book.

    > >Homonyms may be difficult to find in biology! They occasionally occur in
    > >our languages, even within the same language.
    >
    > See Szostak and Ellington above and Joyce. They are finding homonyms in
    > biology but you don't seem to want to discuss them.

    This is in vitro RNA chemistry using some biochemical molecules. It may
    not have much to do with biology. The RNA world is completely
    hypothetical, and we have no idea how it might have emerged. Presumed
    natural evolutionary processes in it are completely different from known
    evolutionary processes in living organisms.

    > >> And American english doesn't
    > >> have terms like 'jobworthy', or 'puckle' or 'bobbies' as English
    > >> english and
    > >> Doric english do. How do you quantify the clear and obvious (to
    > >> me) semantic
    > >> meaning when you don't know the semantic meaning? And because of this,
    > >> semantic information becomes SUBJECTIVE not OBJECTIVE. It has nothing
    > >> whatsoever to do with ambiguity. Puckle is a clearly defined word with no
    > >> imprecision.
    > >> Hearing German means nothing to me because I don't know the language. I
    > >> can't even tell if someone using a guttural language is really speaking
    > >> German. I can have an idea that they are, but that doesn't mean that they
    > >> are. Thus I can't OBJECTIVELY determine meaning without being in on the
    > >> private agreement about what sounds mean what.
    > >
    > >It's the same with biological functions we don't understand yet. I never
    > >claimed to understand all biological functionality, even of a single
    > >enzyme. But I claim that biological molecules _do_ have precise
    > >functions - and therefore semantic information -, just as linguistic
    > >words do. Meaning is relative to a specific language, as you maintain,
    > >and it's the same with biological "words", but this doesn't eliminate
    > >information for the system that "knows" the appropriate language. And
    > >that's what counts in biology.
    > >
    > >> >> It is the same problem as trying to determine which of the following
    > >> >> sequences has meaning.
    > >> >> ni ru gua wo shou bu de bu dui jiao wo hao hao?
    > >> >[I skip some of your long "message"]
    > >> >> 7ZPTF0)WNO1%OSYYCP20NFGlP#DOWN:AQ[OVV,JFUsyjdyj
    > >> >> If you can tell which has meaning, then you can determine biological
    > >> >> functionality.
    > >> >
    > >> >Which meaning? Which functionality? What language or code? I.e. I agree
    > >> >that meaning or biological functionality is not derivable from the
    > >> >sequence alone, but must be found by the knowledge of the language or
    > >> >biological observations.
    > >>
    > >> The very fact that you have to ask what meaning, what language what code
    > >> admits of the fact that meaning isn't objectively determinable.
    >
    > Of my meaning test you wrote:
    > >I don't think you seriously require knowledge of Chinese for anyone who
    > >wants to think about biology... ;-)
    >
    > No, one doesn't need to know chinese to think about biology. But if you are
    > going to claim that meaning is related to shannon entropy or that one can
    > tell meaning via shannon's entropy, then I would suggest my test is an
    > appropriate test of that hypothesis. One can't predict meaning by looking at
    > a sequence any more than one can predict functionality by looking at a
    > single molecule. Indeed, one can't even predict IF there is a function. If
    > you could then you could pass my test WITHOUT knowing Chinese, which I speak
    > very poorly (hen bu hou).

    Of course, today one cannot predict biological function (if any) from a
    sequence alone. I never claimed this. However, as researchers are
    getting better at understanding the biological systems which can "read"
    and express such sequences in the appropriate functional environment, a
    measure of meaningful prediction will emerge. This is what the new field
    of proteomics is all about. This confirms the relationship between
    information (I) and information (II).

    > >> The concept is useless, empty and misleading. It does nothing
    > >> for us other
    > >> than make us feel like we are really being scientific when in fact we
    > >> aren't.
    > >
    > >Maybe we'd better talk about this again after you had a look at Yockey's
    > >book. Otherwise, we may not get any productive discussion.
    >
    > I have my notes from Yockey's book which includes lots of info from that
    > part of the book. Why don't we give it a go?
    >
    > >> Can you cite an experiment which shows that the same is not applicable to
    > >> proteins? I mean experimental data, not merely someone's opinion.
    > >> After all, RNA
    > >> is related to DNA and DNA makes proteins.
    > >
    > >I did that last time we discussed this, if I remember correctly. Of
    > >course, you realize that positive results (which are feasible with
    > >artificial selection of RNA in vitro) are published, but this is usually
    > >not done (or not possible!) with negative results (with natural
    > >selection of proteins in vivo). Thus, we can at most expect to find
    > >partial results, such as the one by Lombardi I cited above.
    >
    > Lombardi is irrelevant as I noted above. They aren't even doing the same
    > thing. And why don't you refresh my memory about what you said. I haven't
    > been on this list very much for the past 2 years and my memory isn't that
    > good.
    >
    > glenn

    Lombardi is very relevant. But I'll be happy to look at more relevant
    work in the field of minimal amino acid placement requirements for a
    specific protein function (including possible homonyms) if you can
    provide the references.

    Peter

    ...............................................................................

    At your request, I append some statements, extracted from the following
    posts of mine:
    Date: Fri, 22 Sep 2000 13:51:34 +0200 (ASA-digest V1 #1804)
    Date: Sun, 24 Sep 2000 09:19:09 +0200 (ASA-digest V1 #1806)
    Date: Wed, 27 Sep 2000 21:03:58 +0200 (ASA-digest V1 #1812)
    Date: Mon, 02 Oct 2000 20:18:36 +0200 (ASA-digest V1 #1818)

    Date: Fri, 22 Sep 2000 13:51:34 +0200 (ASA-digest V1 #1804)

    > [snip]

    But let's look more closely at what really happens in evolution! Hubert
    P. Yockey ("A calculation of the probability of spontaneous biogenesis
    by information theory", J.theoret.Biol. 67 (1977), 377) compared the
    then known sequences of the small enzyme cytochrome c from different
    organisms. He found that 27 of the 101 amino acid positions were
    completely invariant, 2 different amino acids occurred at 14 positions,
    3 at 21, etc., more than 10 nowhere. Optimistically assuming that the
    101 positions are mutually independent and that chemically similar amino
    acids can replace each other at the variable positions without harming
    the enzymatic activity, he calculated that 4 x 10^61 different sequences
    of 101 amino acids might have cytochrome c activity. But this implies
    that the probability of spontaneous emergence of any one of them is only
    2 x 10^(-65), which is way too low to be considered reasonable (it is
    unlikely that these numbers would change appreciably by including all
    sequences known today). A similar situation applies to other enzymes,
    such as ribonucleases.
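    For comparison, here is a cruder back-of-the-envelope version of that
    probability, assuming a uniform 20-letter alphabet at every site.
    (Yockey's published 2 x 10^(-65) is larger than this because he weights
    each site by observed amino acid frequencies rather than assuming
    uniformity; the uniform model here is only an illustration.)

    ```python
    from math import log10

    functional = 4e61     # Yockey's count of sequences with cytochrome c activity
    total = 20.0 ** 101   # all 101-residue sequences, uniform 20-letter alphabet

    p = functional / total
    print(f"p ~ 10^{log10(p):.0f}")   # about 10^-70 under this uniform model
    ```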

    Thus, a modern enzyme activity is extremely unlikely to be found by a
    random-walk mutational process. But "primitive" enzymes, near the origin
    of life, may be expected to have much less activity and to be much less
    sensitive to variation. Unfortunately, before someone synthesizes a set
    of "primitive" cytochromes c, we have no way of knowing the effects of
    these factors.

    What we can do, however, is to estimate how many invariant sites can be
    expected to be correctly occupied by means of a random walk before a new
    enzyme activity becomes selectable by darwinian evolution (of course,
    such an invariant set may be distributed among more sites which are
    correspondingly more variable, without affecting the conclusions). So,
    let's start with some extremely optimistic assumptions (cf. P. Rüst,
    "How has life and its diversity been produced?" PSCF 44 (1992), 80):

    Let's assume that all of the Earth's biomass consists of the most
    efficient biosynthesis "machines" known, bacteria, and all of them
    continually churn out test sequences for a new enzyme function, which
    doesn't exist yet in any organism. They start with random sequences or
    sequences having a different function. Natural selection starts only
    after a minimal enzymatic activity of the type wanted is discernible. In
    today's biosphere, t = 10^16 moles of carbon are turned over yearly,
    there are n = 10^14 bacteria per mole of carbon, each bacterium having b
    = 4.7 x 10^6 base pairs in its DNA. This yields R = tnb = 4.7 x 10^36
    nucleotide replications per year on Earth.

    In protein biosynthesis, there are c = 61/20 = 3.05 codons per amino
    acid, a = 2.16 mutations per amino acid replacement (geometric average
    of all possible shortest mutational walks in the modern code table), a
    mutation rate of 1 mutation in m = 10^8 nucleotides replicated.
    Therefore, r = 1/(c(3/m)^a) = 5.8 x 10^15 nucleotide replications are
    required for 1 specific amino acid replacement (the factor 3 represents
    the codon length in the triplet code).

    In order to get s specific amino acid replacements, r^s nucleotide
    replications are needed, and the average waiting period for 1 hit
    anywhere on Earth is W = (r^s)/R. For s = 1, W = 4 x 10^(-14) seconds;
    for s = 2, W = 4 minutes; for s = 3, W = 40 billion years!
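    The arithmetic above can be reproduced directly (all parameter values
    are taken from the text; 3.156 x 10^7 seconds per year is assumed for
    the unit conversions):

    ```python
    t = 1e16        # moles of carbon turned over per year
    n = 1e14        # bacteria per mole of carbon
    b = 4.7e6       # base pairs per bacterial genome
    R = t * n * b   # nucleotide replications per year on Earth: 4.7e36

    c = 61 / 20     # codons per amino acid
    a = 2.16        # mutations per amino acid replacement
    m = 1e8         # nucleotides replicated per mutation

    # replications required for 1 specific amino acid replacement: ~5.8e15
    r = 1 / (c * (3 / m) ** a)

    for s in (1, 2, 3):
        W = r ** s / R                  # average waiting period, in years
        print(f"s = {s}: W = {W:.1e} years")
    # s = 1: ~1.2e-21 years (~4e-14 seconds)
    # s = 2: ~7.2e-06 years (~4 minutes)
    # s = 3: ~4.2e+10 years (~40 billion years)
    ```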

    Thus the minimal set for a starting enzymatic activity cannot contain
    more than 2 specific amino acid occupations! Of course, for the origin
    of life, biosynthesis "machines" like bacteria were not yet available,
    and certainly not in an amount equalling today's biomass! Does it still
    sound reasonable to assume that biological information is easily
    generated by random processes? Or is there something wrong with the
    model underlying the above estimate?

    If God used only random processes and natural selection when He created
    life 3.8 billion years ago, we should be able to successfully simulate
    it in a computer. You may even cheat: the genome sequences of various
    non-parasitic bacteria and archaea are available. The challenge stands.
    By grace alone we proceed, to quote Wayne.

    ..............

    Date: Sun, 24 Sep 2000 09:19:09 +0200 (ASA-digest V1 #1806)

    > [snip]

    Glenn:
    You are right, there IS randomness in all these 21-letter sequences, no
    matter whether they were generated by encrypting a meaningful phrase or
    by running a random number generator, and ANY meaningful 21-letter
    message can be generated from ANY of the 26^21 possible sequences if the
    right key is found.
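    The point about keys can be illustrated with a sketch: under a simple
    mod-26 (Vigenère-style) cipher, for ANY 21-letter ciphertext and ANY
    desired 21-letter message there exists a key mapping one to the other
    (the target message here is an arbitrary example of my own):

    ```python
    import random
    import string

    ALPHA = string.ascii_uppercase

    def key_for(ciphertext, target):
        """The shift key that decrypts ciphertext to target (mod-26 subtraction)."""
        return "".join(ALPHA[(ALPHA.index(c) - ALPHA.index(t)) % 26]
                       for c, t in zip(ciphertext, target))

    def decrypt(ciphertext, key):
        return "".join(ALPHA[(ALPHA.index(c) - ALPHA.index(k)) % 26]
                       for c, k in zip(ciphertext, key))

    random_seq = "".join(random.choice(ALPHA) for _ in range(21))
    target = "ATTACKATDAWNTOMORROWX"   # any 21-letter "meaningful" message

    key = key_for(random_seq, target)
    print(decrypt(random_seq, key))    # recovers the target, whatever random_seq was
    ```

    The "meaning" is supplied entirely by whoever constructs the key, not by
    the random sequence itself - which is exactly the point of the paragraph
    that follows.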

    But this fact does NOT imply that meaning or semantics can arise
    spontaneously by random processes, without some intelligent input of
    information. Either this happens when the sender encrypts his message
    and gives the key to the designated receiver, or when an eavesdropper
    searches for meaning, using very much intelligence and effort in the
    process.

    Do such encrypted messages really tell us anything about the process of
    evolution? There, we have a random number generator alright, and we have
    natural selection. But for finding meaning, natural selection isn't as
    patient and powerful as an intelligent cryptographer with his computer.
    In the evolutionary process, the only possible natural source of
    information is the environment. But the extraction of this information
    is extremely slow, probably only a fraction of a bit per generation -
    when any useful mutants are available at all. And if they are, they must
    penetrate the entire population before being fixed. For small selective
    advantages and large populations, the mutation still risks being lost by
    random drift.
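The risk of loss by drift can be illustrated with the classical branching-process result (Haldane, 1927): a single new mutant with small selective advantage s escapes early random loss with probability of only about 2s. The simulation below is an illustration added here, not part of the original argument:

```python
import math
import random

def poisson(lam, rng=random):
    """Knuth's algorithm for one Poisson(lam) draw; fine for small means."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def survival_fraction(s, trials=10_000, max_gen=100, established=100):
    """Fraction of single-copy mutants (Poisson offspring, mean 1+s)
    whose lineage is neither extinct nor still tiny after max_gen
    generations; should come out near Haldane's 2s for small s."""
    survived = 0
    for _ in range(trials):
        n = 1
        for _ in range(max_gen):
            if n == 0 or n >= established:   # extinct, or safely established
                break
            n = sum(poisson(1 + s) for _ in range(n))
        if n > 0:
            survived += 1
    return survived / trials

random.seed(42)
print(survival_fraction(0.05))   # expected near 2s = 0.10
```

So even a 5% selective advantage - large by natural standards - is lost by drift about 9 times out of 10.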

    If we compare this process with the huge amount of information in
    today's biosphere, I'm pretty sure 4 billion years is by far too little
    time. It is estimated that about 1000 different protein folds exist in
    living organisms, comprising about 5000 different protein families (Wolf
    Y.I., Grishin N.V., Koonin E.V. "Estimating the number of protein folds
    and families from complete genome data", J.Molec.Biol. 299 (2000),
    897-905). When we compare the prebiotic Earth with today's biosphere as
    a whole, each of these folds, families and individual proteins with
    their functions had to arise at least once somewhere. There is NO
    evidence that all or most of them could be derived from one or a few
    initial sequences through step-by-step mutation, each of the
    intermediates being positively selected, and this within a few billion
    years.

    In my post, I was discussing the evolution of functional proteins in a
    DNA-RNA-protein world, not evolution in an RNA world. I never talked
    about ribozymes (I did mention ribonucleases, but these are protein
    enzymes). I know about the in vitro selection of functional ribozymes,
    but I do not consider these as valid models of evolution at all. They
    just are techniques for finding active ribozymes among as many sequences
    as possible. Of course, mutagenizing steps generate new diversity, but
    the selection procedures most certainly are NOT natural. What we can
    learn from some of these experiments is the frequency of a given
    ribozyme activity among the pool of RNA sequences supplied (which
    usually is just a very tiny sample of all possible sequences, and of
    unknown bias).

    Further problems of the ribozyme work are: (1) Usually artificial
    "evolution" tapers off at activities several orders of magnitude lower
    than natural ribozymes (not to speak of protein enzymes) (cf. Bartel &
    Szostak, Science 261, 1411). (2) We don't yet know whether there ever
    was an RNA world. (3) We don't know whether it would be viable at all.
    (4) We don't know how it could have arisen by natural processes. Leslie
    E. Orgel, one of the pioneers in this field, wrote (Trends Bioch.Sci. 23
    (1998), 491):

    "There are three main contending theories of the prebiotic origin of
    biomonomers [1. strongly reducing primitive atmosphere, 2. meteorites,
    3. deep-sea vents]. No theory is compelling, and none can be rejected
    out of hand ... The situation with regard to the evolution of a
    self-replicating system is less satisfactory; there are at least as many
    suspects, but there are virtually no experimental data ... [There is] a
    very large gap between the complexity of molecules that are readily
    synthesized in simulations of the [suspected] chemistry of the early
    earth and the molecules that are known to form potentially replicating
    informational structures ... Several alternative scenarios might account
    for the self-organization of a self-replicating entity from prebiotic
    organic material, but all of those that are well formulated are based on
    hypothetical chemical syntheses that are problematic ... I have
    neglected important aspects of prebiotic chemistry (e.g. the origin of
    chirality, the organic chemistry of solar bodies other than the earth,
    and the formation of membranes) ... There is no basis in known chemistry
    for the belief that long sequences of reactions can organize
    spontaneously - and every reason to believe that they cannot."

    Against this background, I think it is moot, at present, to speculate
    about the probabilities of evolutionary steps in an RNA world. We DO
    know, on the other hand, how the microevolutionary mechanisms work in
    our world. This is why I chose to deal with this only, rather than with
    ribozymes.

    You are right in pointing out that Yockey revised his probability
    estimate for cytochrome c (now iso-1-cytochrome c) in his book
    "Information theory and molecular biology" (Cambridge: Cambridge
    Univ.Press, 1992). On p.254, he gives the probability of accidentally
    finding any one of the presumably active iso-1-cytochromes c as 2 x
    10^(-44), which is 21 orders of magnitude better than his 1977 estimate
    for cytochrome c. But I think most of this difference is NOT due to new
    experimental evidence (e.g. new sequences), but to his refined
    calculating method, taking into account adjusted probabilities for the
    individual amino acids, to find their "effective number", so it is
    hardly likely that this new estimate will increase any more. As 10^(-44)
is still much too low to be of any use, I didn't think it worthwhile to
try to present his much more complicated new procedure.

    One problem which remains is his assumption that there are no
    interdependencies between the different amino acid occupations within
    the sequence. On p.141, he even cites one observed case where the
    equivalence prediction of his procedure fails. We don't know how many
    more there are. Such interdependencies would reduce the overall
    probability massively.
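As a toy count (hypothetical residues and pairing, not from Yockey) of how such interdependencies shrink the number of acceptable sequences:

```python
from itertools import product

tolerated = "DEKR"   # hypothetical: 4 tolerated residues at each of two sites

# Independent sites: any tolerated residue at i with any at j.
independent_pairs = set(product(tolerated, repeat=2))   # 16 combinations

# Interdependent sites: each residue at i demands one specific partner at j
# (say, a charge pair), so only the matched combinations remain functional.
partner = {"D": "K", "E": "R", "K": "D", "R": "E"}      # assumed pairing
coupled_pairs = {(a, partner[a]) for a in tolerated}    # only 4 combinations

print(len(independent_pairs), len(coupled_pairs))
```

Every such coupled pair divides the count of acceptable sequences - and hence the probability - by another factor, which is why Yockey's independence assumption can only overestimate the probability.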

    Furthermore, Yockey deals with modern cytochromes c (and some artificial
    derivatives) only, which are the result of a few billion years of
    optimization. A "primitive" enzyme may be more easily accessible. The
    only reason I quoted him was that we have NO information about ANY
    "primitive" enzyme.

    The important point is to find cases where natural selection does NOT
    work (yet), because then only we can do meaningful probability
    calculations, which apply only to random walks without selection of
    intermediate steps. The case I considered was the origin of a new
    enzymatic activity which did not exist before (anywhere in the
    biosphere, e.g. a new one of those 1000 folds, and using wildly
    over-optimistic assumptions). As soon as a minimal activity has arisen,
    natural selection can attack and speed up evolution by unknown amounts.
    This is another reason why the artificial ribozyme selection experiments
    are irrelevant in this connection.

    By the way, I would still be very interested to hear any comments about
    the model I calculated, from you, Glenn, or anyone else!

    In both of the cases you quote, an initial catalytic activity of the
    type selected for was present initially (gamma-thiophosphate transfer in
    Lorsch J.R., Szostak J.W., Nature 371 (1994), 31, and
    oligoribonucleotide linkage in Bartel D.P., Szostak J.W., Science 261
    (1993), 1411), and the same applies, as far as I know, to all other in
    vitro ribozyme selection experiments done to date.

    Thus, on both counts (random-path mutagenization to generate a
    previously non-existing activity, and natural vs. intelligent
    selection), in vitro ribozyme selection experiments are NOT valid
    models of the crucial steps in darwinian evolution, and the artificial
    ribozyme figures of 10^(-16) or 10^(-13) are irrelevant. The
    apocryphal joke about a horse's teeth is therefore quite
    inappropriate. We do NOT have ANY experimental or observational data
    about these critical steps which would indicate whether macroevolution
    by natural means alone is plausible or not - even quite apart from the
    origin of life itself.

    .............

    Date: Wed, 27 Sep 2000 21:03:58 +0200 (ASA-digest V1 #1812)

    > [snip]

    You keep misunderstanding what I argued. There are (at least) five
    different types of search processes that have surfaced in our
    discussion:

    (a) search for a meaningful letter sequence among random ones,
    (b) artificial selection of a functional ribozyme from a collection of
    random RNA sequences,
    (c) evolution of a functional ribozyme in RNA world organisms,
    (d) evolution of a protein by mutation of the DNA and natural selection
    of the protein,
    (e) a random DNA mutational walk finding a minimally active protein.

    I fully agree with you that both (a) and (b) are relatively easy, and
    certainly successfully doable (although you may be overestimating the
    fraction of letter sequences representing a recognizable meaning - but I
    don't know). These are the only two types you have been dealing with up
    to now. As we don't know anything about the feasibility of an RNA
    world, it is too uncertain to speculate about the chances for success of
    (c). But suppose there was a viable RNA world, I assume (c) might not
    have been much more difficult than (b) - apart from needing more time.
    So we may also agree on (c). With (d), there is an additional layer of
    complexity between the mutable genotype (DNA) and the selectable
    phenotype (protein), namely translation using a triplet code and a 64:21
    code table. So, numerical estimates derived from (a) or (b) cannot be
    applied immediately. In (a) and (b) each individual string or molecule
    has to be considered an "organism", while in (d), an organism is very
    much more complex, and consequently, there usually are very much fewer
    of them in a population capable of exchanging information. But we know
    from experiments that the process, microevolution, works. As expected,
    it is much slower than (b), and its progress usually levels off quite
    rapidly, because the starting enzymes we can work with are already
    pretty well optimized for their job. So, I don't hesitate to concede
    that (d) also is workable and has been going on for the past 3.8 billion
    years.

    Where we part company, for the moment, is with case (e), which you have
    never considered in our discussion, although my argument focussed on
    this case alone, from the beginning, with the calculated model of the
    probability of a random walk leading to a minimal enzyme activity within
    the geologically available time. What's so different about case (e)? As
    the activity wanted does not yet exist, not even to a minimal degree,
    there is nothing to select, and natural selection of intermediates in
    the mutational random walk just is not possible - by definition. Both in
    (a) and (b), and presumably in (c), some activity or meaning is present
    in the sample collection from the beginning, or can be generated
    relatively easily by mutagenization. In (d), it is present by
    definition, because (e) is its precursor.

    A question which remains, of course, is the amount of semantic
    information at the transition point between (e) and (d). If this is just
    a few bits, my problem doesn't exist. What we can do is to try to define
    an upper and a lower limit for this transition point. Presumably, the
    two limits are very far from each other, but this is the best we can do
    for the moment. For the upper limit we may look at the amount of
    semantic information required for a modern (i.e. a known) enzyme. This
    is what Yockey did. To find a lower limit, we may estimate how much
    semantic (specified) information can be generated in a random walk and
    how much time this would take. And that's exactly what I tried to
    present for discussion in my first post. But you dismissed my
    (tentative) conclusion out of hand, without discussing it, by referring
    to cases (a) and (b), which cannot be compared with it at all.

    > [snip]

    All this is just Shannon information. For a string of length L and 4
    nucleotides, the maximum amount of information corresponds to 4^L
    possibilities. This may be called information potential. But none of
    this tells us anything about usable or semantic information or meaning
    in the sense of specification of biological function. Mutations add
    nothing to the semantic information until they are tested against the
    environment.
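As a sketch, the "information potential" of a nucleotide string is just log2 of the number of possibilities, i.e. 2 bits per position - a measure of carrying capacity only, saying nothing about function:

```python
import math

def capacity_bits(L, alphabet=4):
    """Maximum Shannon information of a length-L string: log2(alphabet^L).
    This is carrying capacity; it says nothing about semantic content."""
    return L * math.log2(alphabet)

print(capacity_bits(92))   # 184.0 bits for a 92-nucleotide string
```

A random string and a gene of the same length have the same capacity; only testing against an environment distinguishes them.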

    > [snip]

    Your calculation omits some very crucial details about how an organism
    functions and how the biosphere communicates. Before you apply natural
    selection, you have no semantic or functional information whatever. Your
    string of a huge amount of Shannon information (which equals amount of
    randomness or entropy) is nothing but raw material for selection, bit by
    bit. First you need a functioning organism coded by the string (how do
    you get that?), then you can start testing each of the other bits
    against the environment in which this organism lives - a rather slow
    process. Furthermore, it's no use having all these bits randomly
    distributed in 10 million bags (species), or even further spread out
    among the individuals of a species. Biology only works if the right
    information is in the right place at the right time. Each individual
    must have all the information it requires. That will slow down the
    process tremendously. For each bit of information, you must consider
    that it can be input into the biosphere almost anywhere on earth. One
    bit improves cytochrome c in a fish on an Australian shelf, the next one
    improves a kinase in a worm in Canadian soil, the next one improves an
    ATPase in a heterotrophic bacterium 1 km below the surface in a Siberian
    rock, etc. This may help if each of the functionalities needed is
    already in place in each organism and is just made a little bit better.
    To make use of the improvements, the other organisms of the same species
    would have to trade their genes among themselves, which is not a matter
    of seconds, nor even of a few years. And if other species should profit,
    the trade between species or even higher taxa is much slower. But, most
    importantly, how about the origin of new functionalities by process (e)?
    This last factor might easily transcend any estimate for process (d) by
    a transastronomical magnitude.

    > [snip]

    No, you misunderstood. You may want to read the Wolf et al. paper. Their
    1000 protein folds don't concern the problem of folding specific
    proteins into their native configurations. Different proteins whose
    sequences are somewhat similar and which have somewhat similar functions
    are grouped into protein families and these into less similar
    superfamilies. Different superfamilies which, despite having no
    recognizable sequence similarity, fold into (almost) the same
    3-dimensional structure belong to the same "fold". And of these folds, there are an
    estimated 1000. How each individual sequence folds into its own specific
    native conformation when exiting from the ribosome is an entirely
    different question. So I'll just snip out your comments on this.

    > [snip]

    This fits in very nicely with Yockey's cytochrome c estimate. Now, using
    his "effective number of amino acids" 17.621, we get 17.621^92 = 4.3 x
    10^114 possible sequences, and the probability of finding any one of the
    10^57 [lambda] repressor sequences is 0.23 x 10^(-57), rather low!
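The arithmetic can be checked quickly; the numbers are the ones quoted (Yockey's effective amino acid number, 92 positions, 10^57 active sequences):

```python
import math

EFFECTIVE_AA = 17.621   # Yockey's "effective number of amino acids"
LENGTH = 92             # positions considered
ACTIVE = 1e57           # estimated number of active lambda repressor sequences

# Work in log10 to avoid overflow: 17.621^92 ~ 4.3e114 possible sequences.
log10_total = LENGTH * math.log10(EFFECTIVE_AA)
# Probability of hitting any active sequence: 1e57 / 4.3e114 ~ 0.23e-57.
log10_p = math.log10(ACTIVE) - log10_total
print(f"total ~ 10^{log10_total:.1f}, P ~ 10^{log10_p:.1f}")
```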

    > [snip]

    > In other words, there are lots and lots of proteins which will perform the
    > function they studied also. Why is this never really raised and discussed by
    > the anti-evolutionists?

    At least for the last 20 years, this has been taken into consideration
    by critics of evolution (e.g. in my papers at the 1988 Tacoma, WA,
    conference about Sources of Information Content in DNA, and in PSCF 44
    (June 1992), 80). But nevertheless, even with this caveat, asking
    questions about the feasibility of evolution is not accepted in the
    established big journals (in the early 80's, I tried J. of theoretical
    Biology, Nature, Origins of Life, Philosophy of Science, and a German
    journal, all in vain). It is not politically correct to question the
    possibility of evolution. The editors' justifications of refusal were
    quite evasive. As you see, even the huge numbers of possibly active
    sequences are still far from sufficient.

    > [snip]

    > absolutely conserved. The results reveal that high level of
    >degeneracy in the
    > information that specifies a particular protein fold."~John F. Reidhaar-Olson
    > and Robert T. Sauer, "Functionally Acceptable Substitutions in Two [alpha]-
    > helical Regions of [lambda] Repressor," Proteins: Structure, Function, and
    > Genetics, 7:315, 1990. p. 306

    These artificial mutations were targeted intelligently to specific small
    sequence regions to be tested, which makes it practical to recover
    biologically active mutants. Thus, this is not an experimental
    simulation of darwinian evolution. If you want to use these results for
    probability estimates, you have to factor this in.

    > [snip]

    > And before you say that there is an invariant region that must be as it is in
    > order to assure protein function, have you ruled out that other sequences in
    > other protein folded structures can't perform the same thing?

    The sequences of the same fold are already taken into consideration in
    the 10^57 sequences. Whether there are sequences of different folds with
    the same activity is not known. If I remember correctly, cases of
    different folds having the same activity are extremely rare, if they
    exist at all.

    > [snip]

    What I meant with "unknown bias" is this: the starting pool of RNAs was
    certainly about random (within the limits of biochemical precision), but
    this was only a minute fraction of all possible sequences. Whatever is
    contained therein has a greater chance of being selected than sequences
    not in the starting pool, which just might, but need not, be formed by
    later mutagenesis. And Lorsch & Szostak (Nature 371 (1994), 31), for
    instance, indicate that their starting pool already contained the ATP
    binding site required, "which greatly increased the odds of finding
    catalytically active sequences". Furthermore, they suggest it would be
    better to mix, match and modify small functional domains.

    > [snip]

    > > Further problems of the ribozyme work are: (1) Usually artificial
    > > "evolution" tapers off at activities several orders of magnitude lower
    > > than natural ribozymes (not to speak of protein enzymes) (cf. Bartel &
    > > Szostak, Science 261, 1411). (2) We don't yet know whether there ever
    > > was an RNA world. (3) We don't know whether it would be viable at all.
    > > (4) We don't know how it could have arisen by natural processes. Leslie
    > > E. Orgel, one of the pioneers in this field, wrote (Trends Bioch.Sci. 23
    > > (1998), 491):
    >
    > All arguments from ignorance and all arguments that we will never know
    > therefore we can believe what we want. Is there anything positive
    >that you can
    > offer from your point of view about what data we should observe in
    >some future
    > experiment that would prove that evolution is incompatible with the evidence.
    > By this, I don't mean the other guy's failure. I want to see if you have
    > anything you can predict that if found would be amazing and support your view
    > that randomness plays no role in living systems.

    The don't-knows are Orgel's! (You clipped out his very relevant
    comments I quoted.) You don't want to claim he hasn't done anything
    worthwhile, during several decades of work, to solve these questions,
    do you? It's
    not just one "guy's failure", but the failure of a whole field of
    research, in ALL research groups having had a try at it. Orgel is one of
    the leaders in the field.

    > [snip]

    .............

    Date: Mon, 02 Oct 2000 20:18:36 +0200 (ASA-digest V1 #1818)

    > [snip]

    As a basis for discussion, I repeat the definition of the 5 different
    cases:
    > > (a) search for a meaningful letter sequence among random ones,
    > > (b) artificial selection of a functional ribozyme from a collection of
    > > random RNA sequences,
    > > (c) evolution of a functional ribozyme in RNA world organisms,
    > > (d) evolution of a protein by mutation of the DNA and natural selection
    > > of the protein,
    > > (e) a random DNA mutational walk finding a minimally active protein.

    The problem we keep running into is that you assume that (a) and (b) are
    representative for (d) and (e), which I contest. I group the points
    discussed under different headings, A **** etc.:

    A **** Is it necessary to distinguish (a) and (b) from (d) and (e)?

    > I raised that only as a response to your contention that proteins wouldn't
    > behave as does an RNA. I think the evidence says that they do.

    They don't: a nucleotide is worth 2 bits, an amino acid about 4.3 bits,
    which can only be selected as a whole. This may not amount to much
    difference if each mutational step is selected individually, but
    whenever you have intermediates without functional improvement, the
    probability factors are multiplied at each step. RNA can be made by
    "organisms" consisting of 1 RNA molecule each, in a soup containing RNA
    polymerase and 4 nucleotide triphosphates, whereas a selection system
    doing translation of DNA (on which mutation works) across RNA into
    protein (on which selection works) requires a bacterium. You may
    mutagenize RNA at rates of 10^(-4), perhaps also at 10^(-3) per
    nucleotide and generation, but a bacterium will hardly survive such
    treatments (the usual, i.e. naturally optimized, mutation rate is
    10^(-8)). This rate, too, gets multiplied each time a step leads to an
    unselected intermediate.
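The multiplication described above can be sketched with the mutation rates quoted in the text (10^(-3) per nucleotide for in vitro RNA, 10^(-8) for a bacterium):

```python
# If each required mutation occurs at rate mu per site per generation and
# no intermediate is selectable, the mutations must co-occur in one
# lineage, so the rates multiply: joint rate ~ mu**steps.
def per_lineage_rate(mu, steps):
    return mu ** steps

for mu, label in ((1e-3, "in vitro RNA"), (1e-8, "bacterial DNA")):
    for s in (1, 2, 3):
        print(f"{label}, {s} unselected step(s): {per_lineage_rate(mu, s):.0e}")
```

At the bacterial rate, a mere two-step path through an unselected intermediate already occurs at only ~10^(-16) per site pair and generation.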

    > [snip]

    That different sequences of the same protein family (having recognizable
    sequence similarities) often have the same function (but in different
    organisms or environments!) is clear. The experimental evidence for
    different folds having the same function, however, is very meager if
    they occur at all (I don't know of any example, although it might be
    feasible occasionally).

    > > This is what Yockey did. To find a lower limit, we may estimate how much
    > > semantic (specified) information can be generated in a random walk and
    > > how much time this would take. And that's exactly what I tried to
    > > present for discussion in my first post. But you dismissed my
    > > (tentative) conclusion out of hand, without discussing it, by referring
    > > to cases (a) and (b), which cannot be compared with it at all.
    >
    > It ignores the possibility I discuss above about different families of
    > solutions. With the RNA experiments, we have already seen the same experiment
    > run twice yielding totally different sequences that perform the same function
    > exactly as I illustrated in the sentences above.

    RNAs aren't proteins, although both can be specified by DNA. And
    sentences can be compared even less with proteins. They are analogous
    because sentences, RNA, and proteins all may contain coded information,
    but an analogy may not be used to transfer ALL details. Christ being a
    vine doesn't mean he is literally rooted in the ground.

    > [snip]

    B **** What is the frequency of active RNA's in ribozyme selection (b)?

    > The question is how efficient is nature at finding solutions.
    > The experiments with biopolymers that I have cited clearly show that
    > functionality occurs at a rate of 10^-13 or so. In the case of one
    >of Joyce's
    > RNAs the classical probability argument would say that he had
    >something like a
    > 1 chance in 10^236 of finding a useful sequence. But Joyce has been showing
    > that he can find functionality in a vat of 10^13 ribozymes. Surely that must
    > cause the anti-evolutionist pause because at that rate, there are
    >10^223 or so
    > different sequences that will perform a given function. I really fail to see
    > how someone can not see the implication of this except for theological
    > reasons.

    To which paper are you referring? We would have to look at the details.
    Exactly the opposite conclusion was drawn in C.Wilson, J.W.Szostak,
    Nature 374 (1995), 777: "A pool of 5 x 10^14 different random sequence
    RNAs was generated... On average, any given 28-nucleotide sequence has a
    50% probability of being represented... Remarkably, a single sequence
    accounted for more than 90% of the selected pool... This result
    indicates that there are relatively few solutions to the problem of
    binding biotin." The probability of accidentally hitting on a
    functional combination composed of L nucleotides is 4^(-L), no matter
    how large N, the length of the randomized sequence, is. Your
    conclusion that with N=392
    (10^236 different sequences), finding one active sequence among 10^13
    (L=22) implies that there are 10^236/10^13 = 10^223 active sequences of
    length 392 is formally correct but completely irrelevant, as the
    392-22=370 other nucleotide positions add nothing at all to the
    functionality. If L=370, instead, a completely different overall
    probability results. Your insistence on the 10^13 to 10^14 figure is
    entirely arbitrary. That this same figure keeps popping up in different
    experiments may just mean that this amount of RNA is practical to work
    with. Even in RNA selection, probabilities depend very much on the
    length of the RNA sequence selected, WHICH function is being selected,
    as well as other details. So you cannot generalize. And especially, you
    cannot draw conclusions regarding natural selection in a DNA-to-protein
    organism from results of artificial RNA selection.
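The length dependence of the hit probability - 1 in 4^L for a specific functional core of L nucleotides - can be made explicit with the L values discussed above:

```python
import math

def log10_hit_probability(L):
    """log10 of 4**(-L): chance that a random sequence realizes a specific
    functional core of L nucleotides (only the L core positions matter)."""
    return -L * math.log10(4)

for L in (22, 370, 392):
    print(f"L={L}: P ~ 10^{log10_hit_probability(L):.0f}")
```

For L = 22 this gives ~10^(-13), matching the "one active sequence among 10^13" figure; for L = 392 it gives ~10^(-236). The outcome depends entirely on L, not on the randomized length N.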

    C **** In what sense is meaning compatible with randomness?

    > > I fully agree with you that both (a) and (b) are relatively easy, and
    > > certainly successfully doable (although you may be overestimating the
    > > fraction of letter sequences representing a recognizable meaning - but I
    > > don't know). These are the only two types you have been dealing with up
    > > to now. As we don't know anything about the feasibility of an RNA
    > > world, it is too uncertain to speculate about the chances for success of
    > > (c).
    >
    > As I have said at least twice before, I am not discussing the RNA
    >world. I am
    > merely pointing out that the classical anti-evolutionary position
    >which claims
    > (erroneously) that randomness is incompatible with meaning or specificity is
    > clearly false.

    Randomness, entropy, Shannon information deal with statistical
    properties of sequences. From the sequence alone, it is impossible to
    say whether it has meaning, specificity, biological functionality. This
    must be tested in a replicating system or organism. Randomness does NOT
    generate meaning; we need selection to recognize it. If we have a
    mutational path consisting of one or more steps, AND none of the
    intermediate mutants (for paths of >1 steps) represents an improvement
    on the wild type (starting sequence), the increase in meaning or
    functional information corresponds to the improvement observed in the
    final mutant of the path with respect to the wild type. Where does this
    information increment come from? From the information contained in the
    environment? Did it emerge accidentally? From God's guidance? It's
    impossible to be sure as far as science is concerned. All we can do is
    calculate the probability of the random walk mutational path; if it is
    something like 10^(-13) or larger, we hardly care. If it's 10^(-130),
    would you like to say there is no problem about randomness generating
    meaning?!

    > [snip]

    D **** Is darwinian evolution (d) faithfully modelled by ribozyme
    selection (b)?

    > > In the evolutionary process, the only possible natural source of
    > > information is the environment. But the extraction of this information
    > > is extremely slow, probably only a fraction of a bit per generation -
    > > when any useful mutants are available at all. And if they are, they must
    > > penetrate the entire population before being fixed. For small selective
    > > advantages and large populations, the mutation still risks being lost by
    > > random drift.
    >
    > Having looked at informational flow calculations for the genome, like those
    > Spetner published in Nature in 1964, I am not at all impressed with his
    > calculations. There is most assuredly more than 1 bit of
    >information generated
    > per generation. This is especially true in long sequences in which many
    > mutations occur during a generation.

    How do you know? Each intermediate organism must be viable in order to
    contribute to the evolution of its genome. In bacterial evolution
    experiments you sometimes find single-step mutants being selected, but
    double-step mutants through a non-selected intermediate have not been
    documented, to my knowledge. With RNA, viability in a non-selected state
    is not an issue. Multiple mutations in the same RNA molecule between
    selections (in vitro) are easily possible, but whether they are in the
    DNA coding for a bacterium has not been demonstrated. It is just
    assumed.

    > > Furthermore, it's no use having all these bits randomly
    > > distributed in 10 million bags (species), or even further spread out
    > > among the individuals of a species. Biology only works if the right
    > > information is in the right place at the right time. Each individual
    > > must have all the information it requires. That will slow down the
    > > process tremendously. For each bit of information, you must consider
    > > that it can be input into the biosphere almost anywhere on earth. One
    > > bit improves cytochrome c in a fish on an Australian shelf, the next one
    > > improves a kinase in a worm in Canadian soil, the next one improves an
    > > ATPase in a heterotrophic bacterium 1 km below the surface in a Siberian
    > > rock, etc. This may help if each of the functionalities needed is
    > > already in place in each organism and is just made a little bit better.
    > > To make use of the improvements, the other organisms of the same species
    > > would have to trade their genes among themselves, which is not a matter
    > > of seconds, nor even of a few years. And if other species should profit,
    > > the trade between species or even higher taxa is much slower.
    >
    > First off, bacteria have sex with other bacteria of different species all the
    > time. There is a blizzard of genetic material that flows through the
    > biological world, trading genomes and genes. (see La Ronde, Scientific
    > American June 1994 P. 28-29

    This reference is incorrect: I couldn't find it. I am not disputing that
    genes are traded rapidly among bacteria. What I emphasized is that a NEW
    mutant gene representing an improvement, which first is present as only
    ONE molecule in the biosphere, has to spread to all individuals and to
    all species which are to profit from it. We are talking of thousands of
    positive mutations required to build up each of thousands of efficient
    proteins, the set of which is basically the same today in virtually all
    species. Your simple calculation is not realistic, because you assume
    that the moment a helpful mutation is available anywhere on earth it can
    be used immediately as a basis for further improvements anywhere else on
    earth.
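As a back-of-the-envelope illustration of the timescale at issue (this calculation is not in the original post; it uses the textbook deterministic approximation that a beneficial allele with advantage s needs roughly (2/s)·ln(2N) generations to sweep a diploid population of size N):

```python
import math

# Toy illustration (assumption: standard logistic-sweep approximation,
# not a figure from the post): generations for ONE new beneficial allele,
# starting as a single copy, to approach fixation in one species.
def sweep_generations(N, s):
    """Approximate generations for an allele with selective advantage s
    to spread from one copy to near fixation in a diploid population
    of effective size N, via t ~ (2/s) * ln(2N)."""
    return (2.0 / s) * math.log(2 * N)

# Even with a generous 1% advantage, a sweep takes thousands of
# generations per species, before any other species can "profit".
for N, s in [(1e6, 0.01), (1e9, 0.01), (1e9, 0.001)]:
    print(f"N={N:.0e}, s={s}: ~{sweep_generations(N, s):,.0f} generations")
```

The point being illustrated is Ruest's: a mutation that exists as one molecule somewhere on earth is not instantly available biosphere-wide as a basis for further improvement.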

    > > A question which remains, of course, is the amount of semantic
    > > information at the transition point between (e) and (d). If this is just
    > > a few bits, my problem doesn't exist. What we can do is to try to define
    > > an upper and a lower limit for this transition point. Presumably, the
    > > two limits are very far from each other, but this is the best we can do
    > > for the moment. For the upper limit we may look at the amount of
    > > semantic information required for a modern (i.e. a known) enzyme.
    >
    > Oxytocin has only 8 amino acids. Several others have that also. An enzyme
    > does not a priori have to have a long sequence.

    Oxytocin is a biologically active peptide, not an enzyme. There are lots
    of small, but biologically active things, down to ions like Ca++. Active
    peptides usually aren't even translated from an mRNA (I'm not sure about
    oxytocin), but synthesized by rather large enzyme complexes. Enzymes and
    other biologically active proteins usually comprise a few hundred, and
    up to a few thousand, amino acids. They are often composed of domains
    with their own tertiary structure, typically around 100 amino acids
    each. As an enzyme has to fold into a more or less fixed steric
    structure, in order to hold one or more substrates very specifically and
    catalyze a very specific reaction, it cannot be too short.

    > So tell me what exactly is your definition of 'primitive' enzymes? How would
    > you recognize one? What objective criteria would you use? Is Oxytocin
    > primitive because it is short? Or are the enzymes of cyanobacteria primitive
    > because cyanobacteria are so old?

    A "primitive" enzyme (or enzyme of "minimal activity") would be just
    above the transition from process (e) to (d). Such transitions would
    happen anytime during the history of life, whenever a basically novel
    activity was emerging, from the origin of life to the origin of humans.
    If we had such an enzyme, we would detect that it has a small activity,
    but we still would not know whether a precursor was already active
    (apart from a probably impracticable exhaustive mutant search). To find
    out by what mutational random walk it originated would probably be hard.

    E **** Some misunderstandings in the scientific realm:

    > > > "Extrapolating to the rest of the protein indicates that there should be
    > > > about 10^57 different allowed sequences for the entire 92-residue domain."
    > >
    > > This fits in very nicely with Yockey's cytochrome c estimate. Now, using
    > > his "effective number of amino acids" 17.621, we get 17.621^92 = 4.3 x
    > > 10^114 possible sequences, and the probability of finding any one of the
    > > 10^57 [lambda] repressor sequences is 0.23 x 10^(-57), rather low!
    >
    > And once again, it ignores the data found by Szostak and colleagues that a
    > repeat of the same selection experiment yields vastly different sequences to
    > solve the same biological problem.

    You yourself brought in this example (Reidhaar-Olson & Sauer, 1990), in
    order to refute Yockey's result. Szostak's ribozyme results are a
    different case.
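The arithmetic in the quoted exchange is easy to check in a few lines of Python (a sketch only; the inputs, Yockey's "effective number of amino acids" 17.621 per site, the 92-residue domain, and the 10^57 allowed sequences, are taken from the post itself):

```python
import math

# Inputs as quoted in the post (Yockey / Reidhaar-Olson & Sauer figures).
effective_alphabet = 17.621     # effective amino acids per position
residues = 92                   # lambda repressor domain length
functional_sequences = 1e57     # allowed sequences for the domain

# Work in log10 to avoid floating-point overflow.
log_total = residues * math.log10(effective_alphabet)      # total sequence space
log_prob = math.log10(functional_sequences) - log_total    # chance of a hit

print(f"sequence space ~ 10^{log_total:.2f}")   # ~ 10^114.64, i.e. ~4.3 x 10^114
print(f"probability    ~ 10^{log_prob:.2f}")    # ~ 10^-57.64, i.e. ~0.23 x 10^-57
```

This reproduces the figures in the quoted paragraph: 17.621^92 ≈ 4.3 × 10^114 and a hit probability of about 0.23 × 10^(-57).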

    > [snip]

    > > Whatever is
    > > contained therein has a greater chance of being selected than sequences
    > > not in the starting pool, which just might, but need not, be formed by
    > > later mutagenesis. And Lorsch & Szostak (Nature 371 (1994), 31), for
    > > instance, indicate that their starting pool already contained the ATP
    > > binding site required, "which greatly increased the odds of finding
    > > catalytically active sequences". Furthermore, they suggest it would be
    > > better to mix, match and modify small functional domains.
    >
    > The ATP is irrelevant as far as the frequency of the functionality is
    > concerned.

    You are contradicting Lorsch & Szostak concerning their own work!

    > > The don't-knows are Orgel's! (you clipped out his very relevant comments
    > > I quoted.) You don't want to claim he hasn't done anything worth while,
    > > during several decades of work, to solve these questions, do you? It's
    > > not just one "guy's failure", but the failure of a whole field of
    > > research, in ALL research groups having had a try at it. Orgel is one of
    > > the leaders in the field.
    >
    > So we base our position upon other people's failure. Most scientific
    > theories are based upon positive experimental support, not other
    > people's failure. This is the wrong approach for Christians to take.
    > If we depend upon failure, what happens when they finally succeed?

    If the ribozyme selection results constituted any positive experimental
    support for the early evolution of life, do you think Orgel would not
    see it?

    > [snip]

    F **** Some misunderstandings in the theological/philosophical realm:

    > > Your calculation omits some very crucial details about how an organism
    > > functions and how the biosphere communicates. Before you apply natural
    > > selection, you have no semantic or functional information whatever. Your
    > > string of a huge amount of Shannon information (which equals amount of
    > > randomness or entropy) is nothing but raw material for selection, bit by
    > > bit. First you need a functioning organism coded by the string (how do
    > > you get that?), then you can start testing each of the other bits
    > > against the environment in which this organism lives - a rather slow
    > > process.
    >
    > I think you keep trying to mix the problem here. I started this thread
    > merely by pointing out that randomness isn't incompatible with
    > semantical meaning. I think I proved this. Now you want to change it to
    > the origin of life where you think you have a better defense for your
    > case. First off, we don't need a functioning organism to have
    > selection. We merely need reproduction. Now I will freely admit I don't
    > know how the raw molecules would reproduce and right now no one else
    > does either. However, to claim that our lack of knowledge is equivalent
    > to a law of nature seems to rest your case on our continued ignorance.
    > History has shown over and over again that that is a weak place to rest
    > one's case.

    No, I want to focus on case (e), the initial, random-walk search for a
    minimal enzymatic activity in a fully functional DNA-RNA-protein
    organism in which darwinian evolution works. I just have to constantly
    fend off all your linguistic (a) and in vitro ribozyme (b)
    probabilities. Not because I don't like them, but because there really
    are crucial differences between the cases (a) to (e), see at the
    beginning of this post. I never contested that (Shannon) randomness is
    compatible with semantical meaning (phenotypically tested). We need a
    functioning organism for cases (d) and (e), just reproduction for (b).

    > [snip]
    ................................................................................

    -- 
    Dr. Peter Ruest, CH-3148 Lanzenhaeusern, Switzerland
    <pruest@dplanet.ch> - Biochemistry - Creation and evolution
    "..the work which God created to evolve it" (Genesis 2:3)
    



    This archive was generated by hypermail 2b29 : Sat May 18 2002 - 21:36:25 EDT