RE: Polyphyly and the origin of life

From: Peter Ruest (
Date: Thu May 23 2002 - 12:41:09 EDT

    Glenn Morton wrote (20 May 2002 21:59:02 -0700):
    > >From: Peter Ruest []
    > >Sent: Monday, May 20, 2002 7:54 AM
    > >> >That's artificial selection of RNA function in vitro, rather than
    > >> >spontaneous emergence of minimal protein function without selection in
    > >> >vivo (or in the prebiotic world).
    > >>
    > >> The interesting thing I notice is that in your first statement, the
    > >> statement which I criticized, you have no requirement for me to
    > >show that it
    > >> arose out of nothing and no restriction to non-vitro experiments.
    > >
    > >Do you mean "in vivo", rather than "in vitro"?
    > No, I mean in vitro. In vitro means in glass. One could put it in a coke can
    > I presume.

    Hi Glenn
    Sorry, I miscounted the logical negations in your clause "no restriction
    to non-vitro". My fault! I know what in vitro means. I did this for many

    > >My first statement which you criticized (1 May 2002 20:30:36 -0700) was:
    > >>>The amount of meaningful or semantic information contained in a system
    > >>>may be defined as the minimal length of an algorithm capable of
    > >>>specifying it (M.V. Volkenstein, "Punctualism, non-adaptionism,
    > >>>neutralism and evolution", BioSystems 20 (1987), 289). This would
    > >>>exclude all features irrelevant for meaning or functionality. The
    > >>>meaningful information contained in today's biosphere may be
    > >>>approximated by a (purely theoretical) minimal set of genome parts
    > >>>"streamlined" to include the code for whatever is really required for
    > >>>the organisms represented in the biosphere, but nothing else. Its amount
    > >>>is such that the improbability of its generation by random-variation /
    > >>>natural-selection processes, starting with a prebiotic universe, is
    > >>>vastly transastronomical.
    > >
    > >I clearly was talking about the origin of life (or of the biosphere as a
    > >whole), with its crucial first living system(s) originating out of a
    > >prebiotic environment, in which there is no biological information (thus
    > >"nothing"). As each in vitro experiment using artificial selection
    > >artificially introduces plenty of functional information, it cannot be
    > >used (at least not without an appropriate correction) for estimating the
    > >possible amount of information II originating spontaneously. Thus, the
    > >requirements were clearly stated.
    > No they don't introduce functional information. If you recall, some of the
    > quotes I posted said that they had randomized the sequence so that .25 of
    > the sequences were totally random. That is degrading functional information
    > (assuming such information really exists).

    Functional information can be introduced (1) by starting with some
    biological function already contained in the RNA being subjected to
    evolution. I agree that this is not the case if you completely randomize
    the starting RNA. You concede that not every experiment does so.
    But functional information is also introduced (2) in each
    artificial selection experiment by the selection itself (for each
    selection step this amounts to at least 1 bit, a yes/no decision), and
    (3) in each natural selection of a functional molecule in vivo.

    > >This refers to the general problem of information II. But in order to
    > >find any estimate of amounts of information II, we have to consider much
    > >simpler systems. And the only one I could think of to-date, which might
    > >offer a hope of getting at such values, is the origin of a novel
    > >enzymatic functionality in its minimal form, just before natural
    > >selection sets in. This situation, of course, can only be investigated
    > >in a modern biological system including genetic coding, transcription,
    > >translation, folding, and probably only in the context of a large family
    > >of orthologous proteins, such as the cytochromes c.
    > So I am waiting for you to tell me the 'amounts of information II' any
    > sequence has. To date you can't define how one would estimate such a
    > quantity. Indeed, you have said it isn't possible, but now you say it is.
    > What is the equation for estimating the 'amount of information II'?

    I have described and discussed this problem, in detail, in P. Ruest,
    "How has life and its diversity been produced", PSCF 44/2 (June 1992),
    80-94 (available at, in the chapter
    "Semantic information".

    Yockey has shown how an invariant for an orthologous protein family is
    calculated: this is what I call information II, if it corresponds to the
    first ancestral sequence of the family at the time when the first
    selectable functionality emerged (the minimal-functionality protein).
    The invariants in my example below, transformed into entropies according
    to Yockey's formula, would be information II. Unfortunately, today, no
    minimal-functionality proteins exist in the biosphere. They would have
    to be synthesized artificially.

    Here I give an example calculation for an estimate of the maximum
    feasible length of a random-walk (i.e. before selection can set in)
    mutational path needed, on the average, to reach a specified invariant
    for an enzyme family in vivo (extremely optimistic parameters assumed),
    or the average time needed to find such an invariant of a given size for
    a novel activity:
    n=3 nucleotides/codon
    d=3.05 codons/amino acid [={(4^n)-3}/20, with 3 stop codons]
    j=2.16 mutations per specified amino acid replacement (geometric average)
    m=10^(-8) mutations per nucleotide replicated -->
    r=average number of nucleotide replications required
       per specified amino acid replacement:
    C=10^16 moles carbon/year metabolized in today's biosphere
    B=10^14 bacteria per mole carbon
    N=4.7*10^6 nucleotide pairs/bacterium assumed -->
    R=number of nucleotides replicated per year in the biosphere:
    s=invariant=number of specified amino acid replacements required
       for minimal novel enzymatic activity -->
    t=time required:
    t=(r^s)/R years
    for s=1: t=4*10^(-14) seconds
    for s=2: t=4 minutes
    for s=3: t=40 billion years
    Known invariants:
    s=~30 for simple enzymes (cytochrome c, ribonuclease)
    s=~5 for specific enzyme adaptations
       (e.g. stomach lysozyme for foregut fermenters)
    I am open to any better modelling ideas.
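
    The arithmetic of the estimate above can be sketched in a few lines of
    Python. Two things here are assumptions, not statements from the post:
    the post gives no numerical values for r and R, so R is taken to be the
    product C*B*N, and r is back-calculated from the stated s=1 result
    (t=4*10^(-14) seconds). The names SECONDS_PER_YEAR, T1_SECONDS and
    waiting_time_years are illustrative only.

```python
# Sketch of the waiting-time estimate t = r**s / R (years).
# Values of r and R are NOT given in the post; see the assumptions below.

SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Parameters stated in the post
n = 3                    # nucleotides per codon
d = (4**n - 3) / 20      # codons per amino acid, with 3 stop codons
C = 1e16                 # moles of carbon metabolized per year in the biosphere
B = 1e14                 # bacteria per mole of carbon
N = 4.7e6                # nucleotide pairs per bacterium

# Assumption: R (nucleotides replicated per year) is the product C*B*N.
R = C * B * N

# Assumption: r (replications per specified amino acid replacement) is
# back-calculated from the stated s=1 result, t = 4*10^(-14) seconds.
T1_SECONDS = 4e-14
r = (T1_SECONDS / SECONDS_PER_YEAR) * R


def waiting_time_years(s, r=r, R=R):
    """Average time in years for s specified replacements: t = r**s / R."""
    return r**s / R


if __name__ == "__main__":
    for s in (1, 2, 3):
        t_years = waiting_time_years(s)
        print(f"s={s}: {t_years:.3g} years = {t_years * SECONDS_PER_YEAR:.3g} seconds")
```

    Under these assumptions the s=2 and s=3 results come out on the order
    of minutes and of tens of billions of years respectively, which agrees
    in magnitude with the figures quoted above. Because t grows as r^s,
    each additional specified replacement multiplies the waiting time by r.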

    > >As I told you before, information II cannot be determined from
    > >artificial selection in vitro because we don't know how much information
    > >is artificially introduced into the molecules selected. RNA can only
    > >model the purely hypothetical RNA world, of which we don't even know
    > >whether it is viable at all - even ignoring the problem of its
    > >initiation. And RNA selection experiments are done with the help of
    > >biological protein-enzymes! Thus, again, there is no possibility of
    > >estimating biologically relevant information II. Remember that, as you
    > >have emphasized yourself, information II is absolutely undefined apart
    > >from the (right, biological!) context of a molecule considered.
    > If you can't define 'biologically relevant information II', then you have
    > nothing worth speaking of in science. You have a belief, and nothing more.
    > Science demands definitions which are objective. The only way I can see that
    > you can prove objectivity in your definition of information II is for you to
    > determine which sequence contains it, something you keep avoiding.

    I think I have operationally defined 'biologically relevant information
    II', even though no mathematical definition exists. The problem is that
    proof is available neither for the existence of such information II, NOR
    FOR ITS NON-EXISTENCE. Its non-existence is just ASSUMED by most people,
    including ALL atheists - for obvious reasons. Don't you think that its
    discussion might be a legitimate endeavour among those who don't just
    reject it out of hand - out of a philosophical prejudice?

    For theological reasons, I suspect that the existence of information II,
    as I defined it, might never be strictly provable. Freedom of the will
    with respect to choosing to believe in God would seem to imply that, in
    principle, no stringent proof that God exists is possible. This would
    also imply that we'll NOT be able to PROVE that evolution of some novel
    functionalities is not possible. Of course, the opposite proof, namely
    that spontaneous emergence of all existing functionalities is feasible,
    is also NOT possible. But should we, for this reason, stop and forbid
    any thinking about the feasibility of spontaneous emergence of life and
    its complexity? I think not.

    If you don't agree, there is no point in our continuing this discussion.
    But at least, you should be ready to acknowledge that your way of
    arguing is absolutely one-sided, and therefore unscientific, in
    requiring strict proofs of those proposing the existence of information
    II, but requiring nothing of the sort of those claiming the opposite. If
    we look at the evidence available, I think it points much more towards
    the reality of such information. Trying to evade this issue, by
    dogmatically defining it offside or outside of science, doesn't strike
    me as very openminded.

    > >Under the designation "multiple families", you are mixing up some
    > >fundamentally different concepts (orthologs and paralogs are
    > >subgroupings of homologs (which have significantly similar sequences)):
    > >(1) orthologs in different species are derived by common ancestry from
    > >the same ancestral protein;
    > >(2) paralogs in the same or different species are derived from
    > >independent evolution from a gene duplication in some ancestral species;
    > >(3) xenologs are homologs obtained by lateral gene transfer;
    > >(4) different families are sets of orthologs, where the different sets
    > >are (usually) paralogs of each other (domain shuffling may introduce
    > >additional levels of complexity between paralogous families, and
    > >supersets of families more distantly related may form superfamilies).
    > No, I am not mixing up these concepts. I am speaking of totally different
    > know what I am saying and it isn't what you are trying to make me say.
    > I wrote:
    > >> No, not at all. There is indeed evidence of multiple origins of life and
    > >> then a period of mixing of genomes among the early metazoans.
    > >
    > >You mean protozoans or prokaryotes, rather than metazoans.
    > You are correct, I miswrote there.
    > >It's the
    > >protozoans (unicellular organisms), and in particular the prokaryotes
    > >(without a nucleus: the archaea and bacteria), which exchanged genes,
    > >possibly quite liberally. This became much more difficult with metazoans
    > >(multicellular organisms).
    > >
    > >As far as a possible multiple origin of life is concerned, we don't have
    > >anything beyond speculation. The evidence pointing to multiple lateral
    > >gene transfers is no evidence at all for multiple origins of life.
    > It is evidence that one can't automatically assume a single origin of life.
    > Such a mixing would clearly mask any multiple origins of life.


    > >Mixing of genes of different origin in the same organism (by means of
    > >lateral gene transfer) implies that this organism becomes a hybrid to
    > >some extent, and if you want to trace all genes, the phylogenetic tree
    > >becomes reticulate.
    > Yes, and any multiple origins of life would not be easy to detect under
    > these assumptions.


    > >Yet (apart from domain shuffling) this does not
    > >imply any reticulation of the individual gene trees (as opposed to the
    > >organismal tree): each gene (or more precisely, each functional protein
    > >domain) has its own unique descent and originated in a particular
    > >species at a particular time.
    > I disagree with your use of this data.

    I am not trying to prove monophyly here - although, on the basis of
    other data, I think it is the most likely assumption. Cases of xenologs,
    on the other hand, are quite insufficient to make a strong case for
    polyphyly.

    Each case of xenology contributes to reticulation of the organismal
    tree, but not to reticulation of that particular gene tree, since there
    are, at the point of lateral gene transfer, not two different sources
    for this transferred gene, but only one.

    When domain shuffling comes in, on the other hand, a case might occur
    where a new gene is generated by genetic recombination involving two
    domains, one of which was already resident in that particular organism,
    whereas the other one entered it from some other organism as a xenolog.
    This produces a hybrid gene having a gene tree with a reticulation.

    If the two domains remain intact, i.e. the recombination occurs beyond
    the boundaries of functionality of the two domains, both of them might
    retain their functionalities, which are, as a rule, different from each
    other. Thus, this gives us no indication of the same function being
    executed by two independently evolved protein sequences.

    If, on the other hand, the recombination cuts out some part of one or
    both source domains, a new domain might be formed. Now, the domain from
    which a part was snipped will most probably lose its original function,
    and if both source domains were shortened, both functions will probably
    be lost. Again, this gives us no indication of the same function being
    executed by two independently evolved proteins.

    An extreme case can be imagined, in which two independently evolved
    domains of exactly the same function come together by lateral gene
    transfer and are recombined somewhere in the middle of both source
    domains, producing an entirely new hybrid domain which, in addition,
    reconstitutes the same function each source domain had, to begin with.
    This is the only case I can conceive of, where gene tree reticulation
    could tell us something about the same function being executed by
    independently evolved sequences. Do you think such an almost-miracle is
    likely to happen often? Do you know of any published example?

    > >And if you look back to the definition of "synonymous families" given
    > >above, you see that my claim stands, that "huge numbers" of them are
    > >very unlikely.
    > I don't agree with your assumption. Indeed, experimental evidence wouldn't
    > agree.

    Which assumption? Which evidence? Ok, give me just one published example
    of two protein families, where (1) each family consists of orthologs,
    i.e. clearly has a common ancestor, and (2) the two families execute
    exactly the same function, and (3) the respective common ancestors of
    the two families evolved independently! Instead of families, there could
    theoretically be just two single proteins with properties (2) and (3),
    but this would make it virtually impossible to show that (3) applies.
    But you are not talking of just one example, but of "huge numbers"!

    > >Now, this is confusing, Glenn. The relationship between DNA, RNA and
    > >proteins is (to a first approximation) coding, transcription and
    > >translation. If the code for a multifunctional protein is contained in a
    > >gene, of course the resulting protein is multifunctional. The different
    > >functionalities usually reside in different domains of the protein.
    > >Where did I claim there couldn't be multifunctional proteins? I never
    > >believed that.
    > Well, you erroneously claimed that there were no multifunctional proteins.
    > At least that is what you had written. You were wrong. You wrote Sat
    > 5/18/02 8:55 that:
    > > >Agreed - in principle. Yet, if there are any other families (let alone
    > > >huge numbers) which will perform the same function (in the same
    > > >organismal environment), I find it strange that no such example has been
    > > >found to date, as far as I know.

    Now tell me why (a) a multifunctional protein, which by definition is
    one single protein performing at least two different functions, and (b)
    two synonymous proteins, which by definition are very different, because
    they evolved independently, but perform one and the same function,
    should be the same thing! Case (b) interests us, not case (a). Your
    claim is still confusing.

    > And then I went out to find them.

    You did not at all.

    > > In any case,
    > >multifunctional proteins perform different functions in the same
    > >organism, and if we want to find out anything about the de novo
    > >emergence of any one of these functions (information II), we have to
    > >look at the family of the domain, in whose (simple) function we are
    > >interested, and go back in time to the common ancestor of the domain.
    > This simply isn't true. Multifunctional proteins, as I posted last night,
    > perform different functions IN THE SAME ORGANISM.

    That's exactly what I wrote, word for word (apart from capitalizing), in
    the second line of my paragraph just above your reply! What do you want
    to say?

    > >Multiple-function proteins are irrelevant to the question under
    > >investigation, see above.
    > No they aren't, you said they didn't exist. They are relevant to measure
    > your knowledge of the field, and they are relevant to the measurement of
    > probability. Besides, as I noted, no one believes, save you, that proteins
    > were what arose first.

    I just showed you above that I didn't say they didn't exist, and I
    explained again why they are irrelevant. Besides, as I noted in my last
    post to you, I never claimed proteins arose first.

    > >You keep mixing up multiple families and multifunctional proteins.
    > NO, I am showing that multifunctionality demonstrates that the probability
    > of finding a given function in probability space is less than even Yockey
    > calculated. Multifunctionality is related to multiple families. If protein X
    > does both function A and B and Protein Y does function B and C, then Y is
    > multifunctional and is multifamily.
    > glenn

    Multifunctionality demonstrates nothing of the kind. It just shows that
    multiple functions can sometimes exist peaceably side by side in the
    same protein - and presumably, that this is advantageous in these cases.
    Multifunctionality has nothing to do with synonymous families, i.e.
    independently evolved families of identical functionality.

    Now, in your last sentence, for the first time, you provide a completely
    new case. Does this indicate what exactly you meant all along with your
    combination of multifunctionality and synonymy? I read this as functions
    A, B, C being due to protein domains A, B, C. If not, you would first
    have to give some indications about the sequence-function relationships,
    and of sequence dissimilarity and gene trees indicating different
    origins, before the case could be discussed profitably. In any case, X
    and Y would then not be fully synonymous; the part(s) of the molecules
    performing the common function B could be influenced by A and C,
    respectively, and the relative contributions of the different parts to
    the functional information II would be extremely difficult to

    For the case of domains (one function - one domain): X = A--B, Y = B--C
    (or perhaps better: X = A--B, Y = C--B): these apparently are two
    two-domain proteins sharing a domain B, but differing in the other
    domain, A or C. They presumably arose by domain shuffling. It may be
    presumed (if it is not known) that B has the same (or a similar)
    function in both X and Y, whereas the functions of A and C are probably
    different. We clearly have multifunctionality in both X and Y, but just
    as clearly they are not necessarily synonymous. If X and Y WERE indeed
    synonymous as entire proteins, it would be a valid case. But then the
    interest in synonymy would have to focus on A versus C as domains,
    and it would have to be shown that they do not share a common ancestral
    domain, since B in both proteins clearly does have the same ancestral
    domain. Can you indicate a paper documenting such a case, sufficiently
    well researched to answer the above questions?


    Dr. Peter Ruest, CH-3148 Lanzenhaeusern, Switzerland
    <> - Biochemistry - Creation and evolution
    "..the work which God created to evolve it" (Genesis 2:3)
