Re: Random origin of biological information

Date: Sun Sep 24 2000 - 11:17:23 EDT

    Hi Peter,

    On Sun Sep 24 02:20:20 2000, Peter wrote:

    > Glenn:
    > You are right, there IS randomness in all these 21-letter sequences, no
    > matter whether they were generated by encrypting a meaningful phrase or
    > by running a random number generator, and ANY meaningful 21-letter
    > message can be generated from ANY of the 26^21 possible sequences if the
    > right key is found.
    > But this fact does NOT imply that meaning or semantics can arise
    > spontaneously by random processes, without some intelligent input of
    > information. Either this happens when the sender encrypts his message
    > and gives the key to the designated receiver, or when an eavesdropper
    > searches for meaning, using very much intelligence and effort in the
    > process.
    > Do such encrypted messages really tell us anything about the process of
    > evolution? There, we have a random number generator alright, and we have
    > natural selection. But for finding meaning, natural selection isn't as
    > patient and powerful as an intelligent cryptographer with his computer.

    Once again, you are ignoring the fact that when experimenters make random
    strings of RNA and then search for novel functionality, they find strings that
    perform the task with a frequency of 10^-14 or so. While they are not all
    perfectly efficient, they do their task. As for the comparison with
    language, I once calculated that there are over 330,000 ways to convey the
    concept that if you pick your nose you will get warts. I ceased counting
    because I got tired, not because I ran out of ideas. All of these were
    sequences of 28 letters or less. If you add misspellings that don't destroy
    meaning (a technique often used in cryptography to foil frequency analysis), I
    could add a thousand ways to misspell each sequence yet still retain its
    meaning. Such misspellings would look like: waarts ar spred bi playcing thi
    fingur in thi noz, or wurtz arre sbred by plaising da feenger en a nos. The
    meaning is still there, so the sequence performs its function. Thus there are
    at least 330 million sequences for just this one concept.

    For the sake of argument, let us suppose that there are 300,000 different ways
    to express the same concept in 21 letters or less, and let's assume that each
    can be misspelled without loss of meaning in 1,000 different ways (which may be
    a vast underestimate). Then assume that there are a trillion different
    concepts which have the same traits as the one above. (Human language is so
    flexible that a trillion concepts is not impossible at all.) That gives 10^21
    different sequences which will perform a useful function. How does that
    compare to the number of possible sequences? With 21 letters there are 26^21
    sequences, which is roughly 10^29, so useful sequences occur at a rate on the
    order of 10^-8, or one in 100 million. Is that too low a rate for random
    processes to stumble upon a meaningful sequence? No. At one try per second
    (and my computer can do it quicker than this), we should find a meaningful
    sentence on average every 3.2 years. That hardly seems out of the realm of
    possibility, and it certainly is not a rate that would deter evolution over
    millions of years.
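
    To make the arithmetic explicit, here is the estimate redone in a few lines
    of Python. Every input count is an assumption taken from the text above, not
    a measurement, and the text's own rounding is followed:

```python
# Back-of-envelope check of the estimate above. All input figures are
# assumptions from the text, rounded the way the text rounds them.
phrasings = 300_000        # ways to express one concept in <= 21 letters
misspellings = 1_000       # meaning-preserving misspellings per phrasing
concepts = 10**12          # concepts assumed to behave like the example

useful = phrasings * misspellings * concepts   # 3 x 10^20, call it 10^21
total = 26**21                                 # all 21-letter sequences, ~5 x 10^29

# Rounding as the text does: 10^21 useful out of 10^29 possible.
odds = 10**29 / 10**21                 # one hit per 10^8 tries
seconds_per_year = 365.25 * 24 * 3600
years = odds / seconds_per_year        # ~3.2 years at one try per second
# (With the unrounded figures, total / useful gives closer to 55 years;
# the same order of magnitude, and the same conclusion.)
print(round(years, 1))
```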

    > If we compare this process with the huge amount of information in
    > today's biosphere, I'm pretty sure 4 billion years is by far too little
    > time.

    Do you have a calculation, or is this merely an emotional feeling? Upon what do
    you base your estimate of the total information on earth today? I would suggest
    the following. We know that microbes vastly outnumber us, and indeed modern
    research is showing that the vast majority of living matter on earth may
    actually be contained in the rocks below our feet. Let us assume that there
    have been 10 million species on earth, and give each a 3-billion-nucleotide
    genome (a bit generous). Yockey (Information Theory and Molecular Biology,
    p. 377-380) points out that there are a maximum of 6 bits of information per
    codon. Thus we have 20 billion bits of information at most in the genome of a
    species, and so there are 2 x 10^17 bits of information in the biosphere
    today. I have seen suggestions that there might have been as many as a billion
    different species over geologic time, so multiply the above by 100. I will
    assume (but justify below) that the small addition of bits from the
    individuals of a species is too small to worry about. Is there time to
    generate that information? Of course there is. There is more than enough
    time. To show it I need to take a diversion into information theory.
       Consider a sequence of four codons (twelve nucleotides), say

    ACGTTGCAAGTC

    That represents a maximum of 24 bits, as we discussed above from Yockey. If we
    allow polyploidy to occur, and we copy this and attach it to itself, we have
    the sequence

    ACGTTGCAAGTCACGTTGCAAGTC
    Which now represents an increase of only one bit of information. Why one bit?
    Because the sequence is compressible. It is ordered. Copying itself doesn't
    add to the informational content. Only when you mutate it do you add
    information to the system. (REMEMBER: information here is not that ill-defined
    word we use in English and equivocate with the English word 'meaning'.
    Information is defined by a mathematical equation and has nothing to do with
    'meaning' or specificity.) Mutations add information to the system because
    they make the sequence LESS compressible.
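
    The point that a copied-and-appended sequence carries almost no new
    information can be illustrated with any general-purpose compressor. Here is a
    minimal Python sketch; zlib stands in for an ideal compressor, so the byte
    counts are only approximate, but the effect is unmistakable:

```python
import random
import zlib

random.seed(0)
# A random 10,000-base sequence: close to incompressible at ~2 bits per base.
seq = ''.join(random.choice('ACGT') for _ in range(10_000)).encode()
doubled = seq + seq   # "polyploidy": the sequence copied and attached to itself

c_seq = len(zlib.compress(seq, 9))
c_doubled = len(zlib.compress(doubled, 9))

# The doubled string is twice as long, but its compressed size is barely
# larger: the second copy is redundant and adds almost no new information.
print(c_seq, c_doubled)
```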

    Now, because copying adds only 1 bit, you get 1 bit of information for every
    clone on earth, plus 20 billion for the first member of the species. This is
    why the additional bit of information from each individual organism isn't
    enough to worry about.

    So, if the earth has 10^19 bits of information, how rapidly does that have to
    develop? About 100 bits per second, since 10^19 is roughly 100 times the
    number of seconds in 4.5 billion years. This is not a rapid rate.
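
    The same biosphere arithmetic, spelled out (again using the generous round
    figures assumed above, not measured values):

```python
# Checking the biosphere-information arithmetic with the text's own figures.
bits_per_species = 2 * 10**10                # 20 billion bits max per genome
biosphere_today = bits_per_species * 10**7   # 10 million species: 2 x 10^17 bits
all_time = biosphere_today * 100             # ~a billion species ever: 2 x 10^19

seconds = 4.5e9 * 365.25 * 24 * 3600   # seconds in 4.5 billion years, ~1.4 x 10^17
rate = all_time / seconds              # ~140 bits/s: the "about 100" in the text
print(round(rate))
```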

    > It is estimated that about 1000 different protein folds exist in
    > living organisms, comprising about 5000 different protein families (Wolf
    > Y.I., Grishin N.V., Koonin E.V. "Estimating the number of protein folds
    > and families from complete genome data", J.Molec.Biol. 299 (2000),
    > 897-905). When we compare the prebiotic Earth with today's biosphere as
    > a whole, each of these folds, families and individual proteins with
    > their functions had to arise at least once somewhere. There is NO
    > evidence that all or most of them could be derived from one or a few
    > initial sequences through step-by-step mutation, each of the
    > intermediates being positively selected, and this within a few billion
    > years.

    If you are going to say that protein folding is too complex to have just
    happened, I would suggest that you take a look at the following:

    "Clearly, a protein cannot sample all of its conformations (e.g., 3^100 ~ 10^48
    for a 100 residue protein) on an in vivo folding timescale (<1 s). To
    investigate how the conformational dynamics of a protein can accommodate
    subsecond folding time scales, we introduce the concept of the native topomer,
    which is the set of all structures similar to the native structure (obtainable
    from the native structure through local backbone coordinate transformations
    that do not disrupt the covalent bonding of the peptide backbone). We have
    developed a computational procedure for estimating the number of distinct
    topomers required to span all conformations (compact and semicompact) for a
    polypeptide of a given length. For 100 residues, we find 3 x 10^7 distinct
    topomers. Based on the distance calculated between different topomers, we
    estimate that a 100-residue polypeptide diffusively samples one topomer every 3
    ns. Hence, a 100-residue protein can find its native topomer by random sampling
    in just 100 ms. These results suggest that subsecond folding of modest-sized,
    single-domain proteins can be accomplished by a two-stage process of (i)
    topomer diffusion: random, diffusive sampling of the 3 x 10^7 distinct topomers
    to find the native topomer (0.1 s), followed by (ii) intratopomer ordering:
    nonrandom, local conformational rearrangements within the native topomer to
    settle into the precise native state." Derek A. Debe, Matt J. Carlson, and
    William A. Goddard III, "The topomer-sampling model of protein folding" PNAS,
    Vol. 96, Issue 6, 2596-2601, March 16, 1999, p. 2596
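
    The quoted numbers are easy to sanity-check: sampling 3 x 10^7 topomers at
    one every 3 ns does indeed land at about a tenth of a second, well inside the
    sub-second folding window the authors cite:

```python
topomers = 3e7        # distinct topomers for a 100-residue chain (Debe et al.)
sample_time = 3e-9    # seconds per topomer sampled (one every ~3 ns)

search = topomers * sample_time   # ~0.09 s: random sampling finds the native
print(search)                     # topomer well inside the <1 s folding window
```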

    "Our results suggest that an average sized protein domain can find its native
    topology without any mechanisms to simplify the conformational search. Thus the
    topomer-sampling model is fundamentally different from folding models that
    insist that regions of correctly folded structure form during the early stages
    of protein folding, before a structure with the native topology has been
    sampled." Derek A. Debe, Matt J. Carlson, and William A. Goddard III, "The
    topomer-sampling model of protein folding" PNAS, Vol. 96, Issue 6, 2596-2601,
    March 16, 1999, p. 2599

    "Barron and coworkers have recently used Raman optical activity experiments to
    show that residues in disordered regions in molten globule states 'flicker'
    between the allowed regions of the Ramachandran plot at rates of ~10^12 s^-1."
    Derek A. Debe, Matt J. Carlson, and William A. Goddard III, "The
    topomer-sampling model of protein folding" PNAS, Vol. 96, Issue 6, 2596-2601,
    March 16, 1999, p. 2600

    "We will show that as few as N/24 interresidue restraints reduce the number of
    topologies sufficiently so that a simple residue burial score can identify the
    native topology in a very small set of candidates (typically <5)." Derek A.
    Debe et al, "Protein Fold Determination from Sparse Distance Restraints: The
    Restrained Generic Protein Direct Monte Carlo Method," J. Phys. Chem. B. 103
    (1999):3001-3008, p. 3001

    "We present the generate-and-select hierarchy for tertiary protein structure
    prediction. The foundation of this hierarchy is the Restrained Generic Protein
    (RGP) Direct Monte Carlo method. The RGP method is a highly efficient
    off-lattice residue buildup procedure that can quickly generate the complete
    set of topologies that satisfy a very small number of interresidue distance
    restraints. For three restraints uniformly distributed in a 72-residue protein,
    we demonstrate that the size of this set is ~10^4." Derek A. Debe et al,
    "Protein Fold Determination from Sparse Distance Restraints: The Restrained
    Generic Protein Direct Monte Carlo Method," J. Phys. Chem. B. 103
    (1999):3001-3008, p. 3001

    Protein folding is much simpler than we have heretofore thought. And as usual,
    it was the evolutionists who actually went out and studied the issue. The
    anti-evolutionists were content to throw stones rather than do experiments.

    > In my post, I was discussing the evolution of functional proteins in a
    > DNA-RNA-protein world, not evolution in an RNA world. I never talked
    > about ribozymes (I did mention ribonucleases, but these are protein
    > enzymes). I know about the in vitro selection of functional ribozymes,
    > but I do not consider these as valid models of evolution at all. They
    > just are techniques for finding active ribozymes among as many sequences
    > as possible.

    It is always a bit amazing to me how no experiment is ever considered to be
    good evidence of evolution by those who don't like evolution. Why do you think
    that is? The claim that useful variants of long biopolymers are too rare to be
    found is made over and over again by the anti-evolutionary crowd, yet when one
    points them to an example where usefulness is found at a relatively high
    probability, the claim is made that it isn't evidence at all. It most
    assuredly is evidence, if nothing else, that the rate of useful biopolymers
    has been vastly underestimated by the anti-evolutionary crowd.

    But if you want to talk about proteins, as you indicated above consider this:

            "Examination of over 30 residues in the N-terminal domain of [lambda]
    repressor reveals that a surprisingly large number of positions are quite low
    in informational content. Nearly half of the positions examined in helix 1 and
    helix 5 will accept nine or more different residues, and only a few positions
    are absolutely conserved. This suggests that there is a high level of
    degeneracy in the folding process; that is, there are many possible sequences
    that will specify a protein that resembles the N-terminal domain of [lambda]
    repressor. Moreover, if the criterion for neutral mutations were changed from
    the present requirement of 5-10% activity compared to wild type, to the less
    stringent requirement that the protein simply be folded, the level of
    degeneracy would presumably be even higher." p. 315

    "Extrapolating to the rest of the protein indicates that there should be about
    10^57 different allowed sequences for the entire 92-residue domain. Clearly,
    this is an extraordinarily rough calculation, and we do not intend to suggest
    that we can accurately determine how many sequences would actually adopt a
    structure resembling the N-terminal domain of [lambda] repressor. However, the
    calculation does indicate in a qualitative way the tremendous degeneracy in the
    information that specifies a particular protein fold."~John F. Reidhaar-Olson
    and Robert T. Sauer, "Functionally Acceptable Substitutions in Two [alpha]-
    helical Regions of [lambda] Repressor," Proteins: Structure, Function, and
    Genetics, 7:315, 1990 p. 315

    In other words, there are lots and lots of proteins which will perform the
    function they studied also. Why is this never really raised and discussed by
    the anti-evolutionists? The authors continue

    "A method of targeted random mutagenesis has been used to investigate the
    informational content of 25 residue positions in two [alpha]-helical regions of
    the N-terminal domain of [lambda] repressor. Examination of the functionally
    allowed sequences indicates that there is a wide range in tolerance to amino
    acid substitution at these positions. At positions that are buried in the
    structure, there are severe limitations on the number and type of residues
    allowed. At most surface positions, many different residues and residue types
    are tolerated. However, at several surface positions there is a strong
    preference for hydrophilic amino acids, and at one surface position proline is
    absolutely conserved. The results reveal the high level of degeneracy in the
    information that specifies a particular protein fold."~John F. Reidhaar-Olson
    and Robert T. Sauer, "Functionally Acceptable Substitutions in Two [alpha]-
    helical Regions of [lambda] Repressor," Proteins: Structure, Function, and
    Genetics, 7:315, 1990. p. 306

    Degeneracy means lots and lots of different proteins that perform the same
    task. And before you say that there is an invariant region that must be as it
    is in order to assure protein function, have you ruled out the possibility
    that other sequences, in other folded structures, could perform the same
    function?
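
    One hedged way to read the 10^57 figure (this geometric-mean gloss is an
    illustration, not a calculation the authors make): if roughly 10^57 of all
    92-residue sequences fold acceptably, then on average each position tolerates
    about 4 of the 20 amino acids, which fits the paper's picture of restrictive
    buried positions and permissive surface positions.

```python
allowed_log10 = 57    # log10 of allowed sequences (Reidhaar-Olson & Sauer)
length = 92           # residues in the N-terminal domain

# Geometric mean of tolerated residues per position, out of 20 amino acids.
per_position = 10 ** (allowed_log10 / length)
print(round(per_position, 1))   # ~4.2 residues tolerated per position on average
```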

    > Of course, mutagenizing steps generate new diversity, but
    > the selection procedures most certainly are NOT natural.

    Of course they aren't natural; we have had to speed up the process. Or are
    you advocating getting one's Ph.D. at 2 million years old? To study these
    things at the rate they naturally occur would take that long. This seems a
    silly suggestion, one which means that we don't have to draw any conclusions
    until we are 2 million years old. And surprise, since we won't be able to
    live that long, we can always claim that we aren't seeing natural selection
    at work.
    What we can
    > learn from some of these experiments is the frequency of a given
    > ribozyme activity among the pool of RNA sequences supplied (which
    > usually is just a very tiny sample of all possible sequences, and of
    > unknown bias).

    Not unknown bias. The ribozymes were made randomly, and random means no bias.
    If you have a charge of bias against their experimental procedure, then be
    specific and to the point. Vague charges of bias (made more in hope than in
    evidence) are a poor way of avoiding the conclusions required by the data.

    > Further problems of the ribozyme work are: (1) Usually artificial
    > "evolution" tapers off at activities several orders of magnitude lower
    > than natural ribozymes (not to speak of protein enzymes) (cf. Bartel &
    > Szostak, Science 261, 1411). (2) We don't yet know whether there ever
    > was an RNA world. (3) We don't know whether it would be viable at all.
    > (4) We don't know how it could have arisen by natural processes. Leslie
    > E. Orgel, one of the pioneers in this field, wrote (Trends Bioch.Sci. 23
    > (1998), 491):

    These are all arguments from ignorance, all arguments that we will never know
    and therefore can believe what we want. Is there anything positive you can
    offer from your point of view about what data we should observe in some future
    experiment that would prove that evolution is incompatible with the evidence?
    By this I don't mean the other guy's failure. I want to see if you have
    anything you can predict that, if found, would be amazing and would support
    your view that randomness plays no role in living systems.

    I am asking that you cease doing what all antievolutionists do, which is stone
    throwing, and actually propose a workable system that can be verified. Can you
    do this?

    > Against this background, I think it is moot, at present, to speculate
    > about the probabilities of evolutionary steps in an RNA world. We DO
    > know, on the other hand, how the microevolutionary mechanisms work in
    > our world. This is why I chose to deal with this only, rather than with
    > ribozymes.

    If you will go back and look at what I said, rather than what you thought I
    said, I never applied the ribozyme data to the RNA world. In fact, in this
    entire thread that last sentence is the first time I have used the term RNA
    world. What I have said all along is that useful sequences are found at a far
    higher probability than anti-evolutionists have ever admitted. Is that so
    hard to understand?

    > You are right in pointing out that Yockey revised his probability
    > estimate for cytochrome c (now iso-1-cytochrome c) in his book
    > "Information theory and molecular biology" (Cambridge: Cambridge
    > Univ.Press, 1992). On p.254, he gives the probability of accidentally
    > finding any one of the presumably active iso-1-cytochromes c as 2 x
    > 10^(-44), which is 21 orders of magnitude better than his 1977 estimate
    > for cytochrome c.

    The reason I hit you so hard is that I know that you are in the area of biology
    and write as an apologist. I have grown very tired of apologists who insist on
    using 20-, 30- and 40-year-old data as if it were dogma that can't be changed.
    It shows that we are doing sloppy apologetics by not keeping up in the areas
    about which we write. If you and I were 30 years behind our respective fields
    of employment, I can guarantee you that we would both be unemployed. At least I
    know I would be in the oil industry. If we keep up with our fields for the
    sake of our employment, why don't we keep up when we are working for the Lord?

    > One problem which remains is his assumption that there are no
    > interdependencies between the different amino acid occupations within
    > the sequence. On p.141, he even cites one observed case where the
    > equivalence prediction of his procedure fails. We don't know how many
    > more there are. Such interdependencies would reduce the overall
    > probability massively.
    > Furthermore, Yockey deals with modern cytochromes c (and some artificial
    > derivatives) only, which are the result of a few billion years of
    > optimization. A "primitive" enzyme may be more easily accessible. The
    > only reason I quoted him was that we have NO information about ANY
    > "primitive" enzyme.

    Actually that isn't quite true. We find bits and pieces of enzymes in oil. We
    know certain proteins that appear in oil when sponges evolved, others appear
    when diatoms evolved, others when angiosperms evolved, and still others appear
    in oils generated only after grasses appear. We are not totally blind about
    past proteins.

    > By the way, I would still be very interested to hear any comments about
    > the model I calculated, from you, Glenn, or anyone else!

    I thought the earlier response did a good job, so I didn't see any reason to
    respond redundantly.

    > In both of the cases you quote, an initial catalytic activity of the
    > type selected for was present initially (gamma-thiophosphate transfer in
    > Lorsch J.R., Szostak J.W., Nature 371 (1994), 31, and
    > oligoribonucleotide linkage in Bartel D.P., Szostak J.W., Science 261
    > (1993), 1411), and the same applies, as far as I know, to all other in
    > vitro ribozyme selection experiments done to date.

    It is present because it is found in the vat, not because it was introduced by
    the experimenter.

    > Thus, on both counts, random-path mutagenization to generate a
    > previously non-existing activity and natural vs. intelligent selection,
    > in vitro ribozyme selection experiments are NOT valid models of the
    > crucial steps in darwinian evolution, and the artificial ribozyme
    > figures of 10^(-16) or 10^(-13) are irrelevant.

    I think you have misunderstood what the experimenters are doing. They are not
    introducing the solution to the vat.


    This archive was generated by hypermail 2b29 : Sun Sep 24 2000 - 11:17:26 EDT