No one believes proteins are the first form of life.

From: Peter Ruest
Date: Tue May 21 2002 - 12:55:08 EDT

    Hi Glenn, for the context:

    > >[PR:]... But remember that
    > >the whole long argument started (04 May 2002 16:46:30 +0200) with my
    > >simple claim that we have to distinguish between:
    > >(I) Maximum information carrying capacity;
    > >(II) Functional information relevant for biological systems.

    Glenn Morton wrote (19 May 2002 09:01:04 -0700):
    > >> Pay attention to the issue at hand. I am saying that ignoring the data I
    > >> posted above is exactly LIKE, ANALOGOUS, SIMILAR to the way the YECs act.
    > >> And indeed, you skipped right by it without any comment.

    I thought I'd better just ignore such an unfair and ridiculous reproach.
    Come on, Glenn, don't just throw anyone criticising evolutionary just-so
    stories into the YEC bin! I have never been in the YEC corner. Even if
    you say "exactly LIKE, ANALOGOUS, SIMILAR", you are being unreasonably
    offensive. I'm not conscious of having ignored any data you presented
    without at least saying why I'm skipping it. But if you want to come
    back to such a supposed case, tell me what it was.

    > >The issue at hand is random evolution of novel protein functionality,
    > >and, in particular, the first minimal functionality of a novel protein,
    > >before natural selection can set in. This has nothing to do with
    > >artificial selection of RNA in vitro, particularly if some of the
    > >functionality selected is already present. It's you who are evading the
    > >issue, not I.
    > This is a rigging of the roulette wheel. What you seem to be doing is ruling
    > out any experimental evidence which can be collected today.

    Nonsense! I started this thread and have the right to come back to my
    original statement of the problem you criticized. I'm gladly ready to
    consider "any experimental evidence which can be collected today" - _if_
    it has any relevance for my statement you criticized. I have rather the
    impression that you have still not understood what I wanted to say.

    > If it is in
    > vitro, you say it has nothing to do with the origin of life and the
    > emergence of useful proteins. If we put it in a metal dish and do the same,
    > or in a rock dish, does it suddenly become ok? I doubt you would accept any
    > experiments with zeolites which might show interesting effects merely
    > because they took place on a university campus. We have no time machine
    > with which to return to 3.8 Gyr and watch the process. So what you do by
    > ruling out the discussion of any experimental evidence is firmly plant your
    > head below the ground so that one can not see observational data.

    Please stop such unfair talk which has nothing to do with what I wrote!
    If you resent my skipping such remarks, just try to better understand
    what I am saying!

    > >> >>So, given that I am mentioning this work for a second time, will you
    > >> >> respond to its import now?
    > >> >
    > >> >You have not mentioned these papers (if I remember correctly), but
    > >> >similar ones, and I responded in detail. But I may do it again, giving
    > >> >you a new example if you insist. A. Lombardi, et al., "Miniaturized
    > >> >metalloproteins: Application to iron-sulfur proteins", PNAS 97 (2000),
    > >> >11922, attempted to design a minimal redox enzyme, but haven't achieved
    > >> >their goal as yet. Their dimeric undecapeptide can hold an iron atom,
    > >> >but is unstable, being too small to shield off the environmental water.
    > >> >The invariant of their (intelligently designed) construct amounts to at
    > >> >least 5 specific amino acid occupations, which is too much to be
    > >> >attainable by an evolutionary process without selection.
    > >>
    > >> What Lombardi is doing is not at all what Joyce, Szostak and Ellington are
    > >> doing. Lombardi is trying to shrink the proteins down to miniature versions,
    > >> of smaller length.
    > >
    > >This is exactly my point, see above. These miniature proteins are the
    > >ones which may give indications about the origin of semantic or
    > >functional biological information, about which I was talking (case a).
    > We weren't discussing the minimum length proteins can be. We were discussing
    > if other families of proteins could perform a given task than the one we
    > find doing it today. That was the issue. Not how short a protein can be.
    > Tell me how short the sequence for 'be' can be and still have you understand
    > what is meant?

    Right, "if other families of proteins could perform a given task than
    the one we find doing it today" is one of the important questions, not
    the length of the protein. I hope you are talking about "synonymous
    families", as I defined them in my last post, "Polyphyly and the origin
    of life".

    Last time, I paid no attention to your adding "of smaller length". The
    small length of such "miniature proteins" is not at all the critical
    point, either with Lombardi or with me. What is important, instead, is
    the minimal number of specified amino acids (what I called the
    "invariant" above) - just as with Yockey's work. The protein may be
    longer, if the identity of the other amino acids adds nothing to the
    functionality. The question is: at how many positions in a protein do I
    need a particular amino acid (or for less stringent positions, any one
    in a given restricted set of amino acids), in order to get the function
    looked for? This is the "amount of specification" required for the

    In contrast to Yockey, I add the theoretical requirement that this
    sequence is not derivable, by evolution with natural selection, from a
    different one with _less_ specification, but having nevertheless some of
    the function. This is another way of saying that it had to be formed by
    means of a strictly random-walk mutational path. In this way, I hope to
    arrive at an estimate of the amount of information II. I would call such
    a protein a "minimal-functionality protein". This additional
    requirement, by the way, is not necessarily beyond experimental testing.
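
    As a rough illustration of the "amount of specification" idea in the
    paragraphs above, here is a minimal Python sketch. It is only a sketch
    under stated assumptions, not the method used by Yockey or by the
    author: it assumes 20 equiprobable amino acids and independent
    positions, and the function name specification_bits is hypothetical.

```python
import math

ALPHABET_SIZE = 20  # standard amino acids, assumed equiprobable (a simplification)


def specification_bits(allowed_set_sizes):
    """Bits needed to specify a set of positions.

    Each entry is the number of amino acids tolerated at one position:
    1 for a strictly invariant position, 20 for a position where any
    residue will do. Positions are treated as independent (an assumption).
    """
    for n in allowed_set_sizes:
        if not 1 <= n <= ALPHABET_SIZE:
            raise ValueError("allowed set size must be between 1 and 20")
    return sum(math.log2(ALPHABET_SIZE / n) for n in allowed_set_sizes)


# Five positions that each require one particular amino acid.
five_invariant = specification_bits([1, 1, 1, 1, 1])

# Adding unconstrained positions makes the protein longer
# but adds nothing to the specification.
with_free_positions = specification_bits([1, 1, 1, 1, 1] + [20] * 6)

print(round(five_invariant, 2), round(with_free_positions, 2))
```

    Under these assumptions the count depends only on the constrained
    positions, not on the overall length of the chain.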

    > And to assume that the original proteins performed precisely
    > the same task as evolved proteins do today is quite a leap.

    I have never said they did. All I assume for the estimate I am looking
    for is that it is a "minimal-functionality" protein. Subsequently, it
    may have evolved further by means of a normal evolutionary pathway with
    non-negative natural selection at each step. During this evolutionary
    path, its function may have been modified, as well as increased. But
    this further path is no longer easily tractable for a determination of
    information II.

    > >I did not want to deal with improvement of a preexisting functionality
    > >(case b), because there you may just be taking over some "information"
    > >from the environment by means of selection. And I did not doubt there
    > >are some RNA functions (case c) that are not very difficult to find (if
    > >you do have RNA!), just as Joyce, Szostak and others have found, even if
    > >you are looking for a function not yet present in the starting mix.
    > >Again, we have no means of telling whether any information has emerged
    > >de novo. With proteins, there is a way of dealing with semantic
    > >information (II), cf. Yockey's book.
    > All Yockey did was to substitute hydrophobic amino acids for hydrophobic,
    > hydrophilic for hydrophilic in cytochrome c and then count the
    > possibilities. That is not predicting the function. That is merely saying
    > that there are x number of sequences which appear to be able to perform the
    > same task. I think Yockey is correct, or close to it. But it isn't taking a
    > novel sequence and deciding what it does.

    He did not just consider amino acid polarity, but also composition and
    volume. I never said he wanted to predict function. And I never said he
    wanted to find a "minimal-functionality" protein. I cited him for his
    use of what I call functional information (II), as opposed to maximum
    information capacity (I). I do use his way of deriving an amount of
    information (II) from amino acid specificity within an orthologous
    protein family. And I cite him for his estimate of the extremely low
    frequency (2 in 10^44) of functional iso-1-cytochromes c in the
    transastronomical composition space; as he doesn't take into
    consideration amino acid properties other than the three mentioned, nor
    any correlations between different positions, nor any species-specific
    requirements, his estimate likely is still far too high.

    > >With RNA, I know of no similarly
    > >promising way of dealing with functional information, because residue
    > >conservation is much less clearly definable (you have only 4
    > >nucleotides, and there is the additional complication of base pairing).
    > >So, I am looking for examples of case a, but you keep pointing to
    > >examples of case b and/or case c.
    > I disagree here. Yockey is speaking of using pre-existing functionality to
    > predict similarity of function in similar molecules. The importance of what
    > he did was to show that the silly anti-evolutionary argument of past years
    > in which only one sequence is allowed to perform a given function is false.
    > Case c, the RNA shows probability is much much less. And with Case a, the
    > proteins, I posted references to multifunctionality yesterday. That is
    > evidence that proteins will be subject to the same thing.

    Yockey is dealing with a family of orthologous proteins. There is no
    reason to suppose that the members of this family which exist or existed
    did not descend from a common ancestral protein by way of individually
    selected mutations. There is no way of estimating any probabilities
    involved in this process. Notice that Yockey assigns an information
    content (II) to the entire family, not to an individual sequence. It may
    safely be assumed that all precursors of the present cytochromes c
    (cyt.c) back to their most recent common ancestor (MRCA) belong to this
    same family. Therefore this same information content (II) applies to all
    precursors back to the MRCA sequence, as well. All that happened during
    the time since then is not global modifications, but only species-(or
    genus-, etc.)-specific ones which are not taken into consideration in
    Yockey's estimate.

    But what happened before the time of that coalescence in the MRCA? The
    MRCA must have evolved from earlier forms, which probably were simpler
    and less active, back to the minimal-selectable-functionality (MSF)
    cyt.c (about which I wrote in the other post, "Polyphyly and the origin
    of life").

    And before the time of this MSF cyt.c? The emergence of this MSF cyt.c
    is the only process not under natural selection (by definition), so it
    was a random mutational walk through sequence space, whose probability
    can be estimated if the size of the specification for the MSF stage can
    be determined.
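
    How such a probability might be estimated once the size of the
    specification is known can be sketched as follows. This is a
    deliberately crude model and an assumption of this illustration: it
    treats each attempt as an independent random draw from sequence space
    with 20 equiprobable amino acids, which is simpler than a true
    mutational walk. The function names are hypothetical.

```python
import math


def hit_probability(allowed_set_sizes, alphabet_size=20):
    """Chance that one random sequence satisfies every constrained position."""
    p = 1.0
    for n in allowed_set_sizes:
        p *= n / alphabet_size
    return p


def prob_at_least_one_hit(p, trials):
    """Probability of at least one success in `trials` independent draws."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)**trials, written so that very small p does not round to zero.
    return -math.expm1(trials * math.log1p(-p))


# Five positions that each require one particular amino acid.
p_five = hit_probability([1] * 5)
expected_draws = 1 / p_five  # mean number of draws until the first hit

print(p_five, expected_draws, prob_at_least_one_hit(p_five, 3_200_000))
```

    The number of available draws (population size, generations, number
    of sites) is not specified here and would have to be argued
    separately.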

    Of course, all this has nothing to do with the idea that there can be
    only one active cyt.c sequence. I wonder where you get that idea from.
    Do you know of anyone ignorant enough to hold it?

    As for the RNA case c, you probably wanted to say that the improbability
    (rather than probability) is much less. This is correct for artificial
    selection systems. And it may even be correct for an initial natural RNA
    world - although we don't know this. But so what? I already explained
    that the RNA world probably cannot be used to estimate functional
    information content (II).

    I dealt in my last post with the irrelevance of the multifunctional

    > And I have to ask a really dumb question here. NO ONE IN THE ORIGIN OF LIFE
    > Consider this from 1991:
    > Sydney Fox's Experiment
    > "By repeatedly heating amino acids and dissolving them in water,
    > he induced them to coagulate into tiny spheres composed of short
    > protein strands.
    > "Fox argued then - and continues to do so - that these
    > 'proteinoids' represent the first cells, but his work has fallen
    > out of favor among many scientists. Once proteinoids are formed,
    > 'that's it,' says Gerald F. Joyce of the Research Institute of
    > Scripps Clinic. 'They can't reproduce or evolve.'"~John Horgan,
    > "In the Beginning", Scientific American, February, 1991, p. 118-
    > 119.

    Please don't link me with Sydney Fox. Of course, proteinoids are no
    model for proteins, for many reasons, but primarily because of the
    problem of sequence information (II), which cannot emerge without
    reproduction and evolvability, as Joyce says. It's now 40 years ago that
    I became aware of Fox's proteinoids and immediately started criticizing
    them as completely useless for helping to explain the origin of life -
    at a time when everybody was celebrating him. So you are wrong in
    calling me "way behind the times".

    Of course, I know the advantages of the RNA world hypothesis, as far as
    ribozyme functionality and the potential elimination of the
    protein-or-nucleic-acid-first chicken-and-egg problem are concerned.
    But we still don't know of any feasible prebiotic emergence of RNA and
    of replication.

    > This from 1959:
    > "According to Dr. Pirie, the fact that all forms of life known
    > today do use protein 'will have no more relevance (to primitive
    > life being dependent on protein) for a discussion about the
    > origins of life than the now almost universal use of paper has
    > for the origin of writing or the use of matches for the original
    > making of fire.'"~N. W. Pirie, "Chemical Diversity and Origins of
    > Life," The Origin of Life on Earth (New York: MacMillan Co.,
    > 1959), p. 78

    In principle, the same caveat applies to the now-favored RNA world

    > >No, Glenn, they are not at all similar. There are fundamental
    > >differences between proteins and RNAs. Structure-function relationships
    > >are completely different; and with proteins, you need the
    > >genotype-phenotype code translation - to just mention two factors. You
    > >questioned my concept of semantic biological information, but you refuse
    > >to consider my definition of it. I don't see anything relevant to this
    > >type of information in the RNA artificial selection work - although it
    > >certainly is of interest in other respects. It's just not applicable to
    > >what I said and you questioned.
    > I agree that they are different and given that no one but you seems to think
    > that proteins were the first biopolymers to evolve means that this entire
    > discussion about proteins and the origin of information is nothing but a
    > discussion about that which no one believes. It is a strawman set up to
    > appear as if scientists actually believed that information first arose via
    > proteins. They don't any more. Most believe in the RNA world, which is why
    > I am discussing RNA rather than a 50-year outdated and rejected
    > proteinaceous concept.

    Apparently, it's YOU who are the strawman builder! And you can only do
    it because you either haven't read what I wrote or because you ignored
    it or because you forgot it. I hope the last is the case. You are
    constantly upbraiding me for not having read the most recent papers you
    think were relevant, but you are not even up-to-date on what those you
    criticize wrote!

    I never claimed proteins were the first to evolve. It's just that I am
    almost as skeptical about current pet speculations about
    self-organization as I was about those of 40 years ago. Some serious
    thinking about the emergence of biological information (II) is sorely
    needed - both with respect to the origin of life 3.9 billion years ago
    and with respect to the origin of novel molecular functionalities ever

    > >> >> >This only works because you first give me the book, which contains all
    > >> >> >the relevant semantic information. With the signal, you just send me
    > >> >> >ln(3) bits of information, not lots.
    > >> >>
    > >I don't dispute these calculations at all. But again and again, I have
    > >emphasized that we have to distinguish between
    > >(I) Maximum information carrying capacity;
    > >(II) Functional information relevant for biological systems.
    > >Shannon entropy is related to (I), not directly to (II). Meaning
    > >(biological or otherwise) is found in (II) and is a function of a
    > >functional system like a given language or biological system. (I), which
    > >is a function of sequence length and alphabet size, specifies nothing
    > >but a maximum amount of functional information (II) which can be stored
    > >in a given sequence having a maximal capacity (I). Never have I claimed
    > >a 1-to-1 correspondence between a "value" of (I) and a "value" of (II).
    > And over and over, I keep asking you to recognize this II. Even if you can't
    > quantify it, you seem to be unable to recognize it when offered a chance to
    > tell me if a sequence has this II functional information. You never ever
    > try to tell me which sequence has it. If you can't even recognize it, can't
    > quantify it, do you really expect us to believe it is real?

    I'll repeat again what I wrote x times already: A string of any symbols
    has a computable Shannon information, and it has a computable maximum
    information carrying capacity (I). But it is unknown, without any
    further knowledge, whether it contains any semantic or functional
    information (II). Unless you know the appropriate language, you can't
    read it. You may, by statistical analysis, find that, with a certain
    probability, it does contain some information. And if the probability is
    high enough and the text is long enough, you may even be able to learn
    the appropriate language (including its grammar, syntax,...), e.g.
    Sumerian. And proper understanding requires knowledge of the appropriate
    culture and situation. With the human genome, we are now in this stage
    of learning.
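
    The two computable quantities mentioned above can be illustrated with
    a short Python sketch. It is an illustration only: the entropy here is
    the simple single-symbol (zeroth-order) empirical entropy, the example
    strings are made up, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy_per_symbol(seq):
    """Empirical zeroth-order Shannon entropy of a string, in bits per symbol."""
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def max_capacity_bits(length, alphabet_size):
    """Maximum information carrying capacity (I): length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


# A string and a rearrangement of it have the same symbol frequencies,
# hence the same entropy, whether or not either one "means" anything.
original = "ACGTACGT"
rearranged = "AACCGGTT"
h_original = shannon_entropy_per_symbol(original)
h_rearranged = shannon_entropy_per_symbol(rearranged)

# Capacity for a 3:1 length ratio of DNA to protein.
dna_capacity = max_capacity_bits(300, 4)       # 300 nucleotides
protein_capacity = max_capacity_bits(100, 20)  # 100 amino acids

print(h_original, h_rearranged, dna_capacity, round(protein_capacity, 2))
```

    Neither number says anything about whether the string carries
    information (II); that needs knowledge of the system that reads it.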

    > Let's give you another chance.
    > ken quine monie hiv a wyme
    > wyme a monie quine hiv ken
    > ken wyme a monie quine hiv
    > a ken monie quine hiv wyme
    > ken quine a hiv wyme monie
    > hiv a quine wyme ken monie
    > quine monie a ken wyme hiv
    > Which sequence in Doric has functional information? Can you recognize it
    > when you see it? We can't quantify modern art, but we can at least
    > recognize it. So, if you want me to believe that functional information is
    > a real concept then pass 5 tests like this with only 1 failure. Which
    > sequence has meaning?

    It is possible to recognize that this may be a meaningful language. It
    is even possible to guess at an Indo-Germanic language, and to guess at
    the possible meaning of some words (although this is risky). But 6 words
    are definitely insufficient to deduce an unknown syntax, which is a
    requirement for selecting the legal word placement you ask me for. Try
    the same with Latin! There, you may find many possible word sequences in
    a sentence to be legal, particularly with poetry.

    But anyway, this game is a non-starter. You are still in the strawman
    mode. You are presupposing something I never said. Information (II) is
    meaningful with respect to its natural environment only. Reading a
    string is insufficient.

    > >You may compute the Shannon entropy of a given DNA sequence (4-letter
    > >alphabet) or a given protein (20-letter alphabet). You'll get different
    > >values, even for a length ratio of 3:1.
    > So what? Ratio has no place in the definition of Shannon's entropy. What
    > you say shows you don't know much about Shannon entropy.

    I said "length ratio", not "Shannon entropy ratio". And I wasn't giving
    you a definition of Shannon entropy. Please be more careful in how you

    > >Yockey also shows the connection to meaningful biological information:
    > >"Let us consider evolution as a communication system from past to
    > >present. At some time in the history of life the first cytochrome c
    > >appeared. As a result of drift, random walk and natural selection, this
    > >ancestor genetic message was communicated along the dendrites of a
    > >fractal ... representing a phylogenetic tree ... Some dendrites lead to
    > >modern organisms, the sequence having changed with time. Thus the
    > >original genetic message of the common ancestor specifying cytochrome c,
    > >regarded as an input, has many outcomes that nevertheless carry the same
    > >specificity. The evolutionary processes can be considered as random
    > >events along an ergodic Markov chain ... that have introduced
    > >uncertainty in the original genetic message. This uncertainty is
    > >measured by the conditional entropy in the same manner as the
    > >uncertainty of random genetic noise is measured ... Since the
    > >specificity of the modern cytochrome c is preserved, although many
    > >substitutions have been accepted, this conditional entropy may be
    > >subtracted from the source entropy ..., to obtain the mutual entropy or
    > >information content needed to specify at least one cytochrome c sequence
    > >... The information content of the sequence that determines at least one
    > >cytochrome c molecule is the sum of the information content of each
    > >site. The total information content is a measure of the complexity of
    > >cytochrome c" (p.132). For the mathematical formulation, please refer to
    > >Yockey's book.
    > I don't have a problem with Yockey's point. All he is pointing out is that
    > one must account for degeneracy in functionality, i.e. that many sequences
    > will perform the same function, which is what I keep telling you.

    And which is what I keep telling you. In particular, Yockey is dealing
    with a family of orthologous sequences, NOT with independently evolved
    protein folds or "synonymous families". You keep ignoring such vital
    distinctions. See earlier in this post. Here, I was quoting Yockey to
    show you that biological information (II) certainly IS related to
    information capacity (I), which you denied.
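
    The idea in the Yockey quotation, that the information content of a
    family is the sum of per-site contributions obtained by subtracting
    the observed variability from the source entropy, can be sketched
    roughly as below. This is not Yockey's actual formulation (for that,
    see his book): it assumes a uniform source over 20 amino acids and
    independent sites, estimates variability from raw column frequencies,
    and uses a made-up toy alignment. All names are hypothetical.

```python
import math
from collections import Counter


def site_information(column, alphabet_size=20):
    """Bits specified at one aligned site: source entropy minus observed entropy."""
    n = len(column)
    counts = Counter(column)
    observed = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - observed


def family_information(alignment, alphabet_size=20):
    """Sum of per-site information over an alignment of equal-length sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same length")
    return sum(site_information(col, alphabet_size) for col in zip(*alignment))


# Hypothetical toy alignment, for illustration only (not real cytochrome c data):
# sites 1 and 2 are invariant, site 3 tolerates two residues, site 4 four.
toy_alignment = ["CHKA", "CHRG", "CHKL", "CHRV"]
total_bits = family_information(toy_alignment)

print(round(total_bits, 2))
```

    Note that the result is a property of the whole aligned family, not of
    any single sequence in it, and that a small sample will tend to
    overstate how tightly each site is constrained.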

    > >> >Homonyms may be difficult to find in biology! They occasionally occur in
    > >> >our languages, even within the same language.
    > >>
    > >> See Szostak and Ellington above and Joyce. They are finding homonyms in
    > >> biology but you don't seem to want to discuss them.
    > >
    > >This is in vitro RNA chemistry using some biochemical molecules. It may
    > >not have much to do with biology. The RNA world is completely
    > >hypothetical, and we have no idea how it might have emerged. Presumed
    > >natural evolutionary processes in it are completely different from known
    > >evolutionary processes in living organisms.
    > And no one believes, like you seem to, that proteins were the first
    > biopolymers. They weren't, and you are arguing for a 50-year-old rejected
    > idea. At least stay with the program and the current thinking on the topic.
    > Most researchers believe that life evolved through the RNA.

    You are constantly misrepresenting what I wrote. See above. I'll just
    add some short remarks about the homonyms, which I skipped last time.

    A homonymous protein would be a given protein sequence which shows
    completely different functionality in two different contexts. One
    example comes to mind: an enzyme which is alternatively used as an
    optically transparent substance in the eye, but I don't remember if it
    has exactly the same sequence or just a related one. This, however, is
    something of a special case, because in the eye it doesn't function as
    an enzyme, but just by means of its physical properties in a
    concentrated solution. Another example might be prions which are toxic
    versions of normal proteins folded in a different way. But again, this
    is special, in this case because it's not physiological.

    I don't know of any example of a homonymous RNA.

    Something which is quite different from homonymy is multifunctionality,
    in which _different_ regions of a molecule display different
    functionalities. They can perform both functions at the same time if
    this is called for. The analogous situation does not exist in language.
    A pun is different.

    But all this has nothing to do with synonymous protein families or other
    systems which might help us to get at estimates of functional
    information (II).

    > >Of course, today one cannot predict biological function (if any) from a
    > >sequence alone. I never claimed this.
    > You said Yockey did it above.

    Again you are mixing up orthologous sequences and independently evolved
    synonymous families, which have nothing to do with each other.

    > >However, as researchers are
    > >getting better at understanding the biological systems which can "read"
    > >and express such sequences in the appropriate functional environment, a
    > >measure of meaningful prediction will emerge. This is what the new field
    > >of proteomics is all about. This confirms the relationship between
    > >information (I) and information (II).
    > As Shannon pointed out, and I have repeated many times in this exchange,
    > there is NO relationship between Shannon entropy and your information II.
    > For some reason you don't seem to be able to understand what Shannon
    > actually wrote. I repeat it again.

    It seems to me that "you don't seem to be able to understand" what
    Yockey actually wrote. And that you don't really listen to what I say,
    but rather assume something you erroneously think I might have said.

    > "The fundamental problem of communication is that of reproducing
    > at one point either exactly or approximately a message selected
    > at another point. Frequently the messages have _meaning_, that
    > is they refer to or are correlated according to some system with
    > certain physical or conceptual entities. These semantic aspects
    > of communication are irrelevant to the engineering problem." C.
    > E. Shannon, "A Mathematical Theory of Communication," The Bell
    > System Technical Journal, 27(1948):3:379-423, p. 379
    > What part of the term 'irrelevant' do you not understand?
    > glenn

    I understand that you completely misapply this quotation from Shannon to
    our discussion. We may perhaps apply it in a _partially_ meaningful way
    by saying that there is a communication channel from DNA to protein, and
    that the semantic aspects of the biological information transmitted are
    irrelevant to the engineering problem of transcription and translation.
    That is, the irrelevance applies only apart from the fact that the
    transcription and translation machineries themselves are also specified
    by the semantic aspects of the biological information. Thus, unlike
    information technology, the biological system is self-referential.

    But our discussion, that is, what I proposed in the beginning and you
    criticized, was not at all about this problem, but about the question of
    estimating amounts of biological functional information, using genome
    sequence space as a ruler.


    Dr. Peter Ruest, CH-3148 Lanzenhaeusern, Switzerland
    Biochemistry - Creation and evolution
    "..the work which God created to evolve it" (Genesis 2:3)
