Polyphyly and the origin of life

From: Peter Ruest (pruest@pop.mysunrise.ch)
Date: Wed May 22 2002 - 14:58:35 EDT

    Glenn Morton wrote (18 May 2002 20:51:50 -0700):
    > >[PR:]... But remember that
    > >the whole long argument started (04 May 2002 16:46:30 +0200) with my
    > >simple claim that we have to distinguish between:
    > >(I) Maximum information carrying capacity;
    > >(II) Functional information relevant for biological systems.
    > >Your reaction was that Shannon information is the only valid
    > >information, and that (II) has nothing to do with information.
    > I noted that II above had nothing to do with Shannon information and that
    > there is data showing that there are huge numbers of different families of
    > molecules which will perform the same function.

    This only refers to artificially selected RNA of questionable relevance
    for learning about natural evolution.

    > >That's artificial selection of RNA function in vitro, rather than
    > >spontaneous emergence of minimal protein function without selection in
    > >vivo (or in the prebiotic world).
    > The interesting thing I notice is that in your first statement, the
    > statement which I criticized, you have no requirement for me to show that it
    > arose out of nothing and no restriction to non-vitro experiments.

    Do you mean "in vivo", rather than "in vitro"?
    My first statement which you criticized (1 May 2002 20:30:36 -0700) was:
    >>The amount of meaningful or semantic information contained in a system
    >>may be defined as the minimal length of an algorithm capable of
    >>specifying it (M.V. Volkenstein, "Punctualism, non-adaptionism,
    >>neutralism and evolution", BioSystems 20 (1987), 289). This would
    >>exclude all features irrelevant for meaning or functionality. The
    >>meaningful information contained in today's biosphere may be
    >>approximated by a (purely theoretical) minimal set of genome parts
    >>"streamlined" to include the code for whatever is really required for
    >>the organisms represented in the biosphere, but nothing else. Its amount
    >>is such that the improbability of its generation by random-variation /
    >>natural-selection processes, starting with a prebiotic universe, is
    >>vastly transastronomical.

    I clearly was talking about the origin of life (or of the biosphere as a
    whole), with its crucial first living system(s) originating out of a
    prebiotic environment, in which there is no biological information (thus
    "nothing"). As each in vitro experiment using artificial selection
    artificially introduces plenty of functional information, it cannot be
    used (at least not without an appropriate correction) for estimating the
    possible amount of information II originating spontaneously. Thus, the
    requirements were clearly stated.

    This refers to the general problem of information II. But in order to
    find any estimate of amounts of information II, we have to consider much
    simpler systems. And the only one I could think of to date, which might
    offer a hope of getting at such values, is the origin of a novel
    enzymatic functionality in its minimal form, just before natural
    selection sets in. This situation, of course, can only be investigated
    in a modern biological system including genetic coding, transcription,
    translation, folding, and probably only in the context of a large family
    of orthologous proteins, such as the cytochromes c.
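
    As a rough illustration of the distinction between (I) and (II), the
    following sketch computes both quantities for hypothetical inputs. The
    formula used for (II), minus log2 of the fraction of all sequences that
    perform the function, is an assumption of this sketch; it is not the
    minimal-algorithm definition quoted from Volkenstein, and the function
    names and numbers are invented for illustration only.

```python
import math


def capacity_bits(length: int, alphabet_size: int = 20) -> float:
    """(I) Maximum information carrying capacity of a sequence, in bits.

    Assumes every position may hold any of `alphabet_size` symbols
    (20 for amino acids, 4 for nucleotides), regardless of function.
    """
    if length < 0 or alphabet_size < 2:
        raise ValueError("need length >= 0 and alphabet_size >= 2")
    return length * math.log2(alphabet_size)


def functional_bits(functional_fraction: float) -> float:
    """(II) Illustrative measure: -log2 of the fraction of all sequences
    that perform the function in the given biological context.

    This formula is an assumption of the sketch, not the definition
    quoted in the thread.
    """
    if not 0.0 < functional_fraction <= 1.0:
        raise ValueError("functional_fraction must be in (0, 1]")
    return -math.log2(functional_fraction)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show how the numbers behave.
    print(f"(I)  capacity of a 100-residue protein: {capacity_bits(100):.1f} bits")
    one_family = functional_bits(1e-60)
    many_families = functional_bits(1e-57)  # 1000 times as many functional sequences
    print(f"(II) drops by {one_family - many_families:.2f} bits "
          "for a 1000-fold larger functional set")
```

    Under this assumed measure, (II) depends only on how large the set of
    functional sequences is, which is why the number of synonymous
    families matters to the estimate while (I) stays fixed by sequence
    length alone.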

    > And I
    > agree that life has not been generated in the test tube yet. But I did show
    > that there were lots of other sequences which would perform at least some of
    > the functions which means that the probability against life originating on
    > its own is not as bad as anti-evolutionists claim.

    As I told you before, information II cannot be determined from
    artificial selection in vitro because we don't know how much information
    is artificially introduced into the molecules selected. RNA can only
    model the purely hypothetical RNA world, of which we don't even know
    whether it is viable at all - even ignoring the problem of its
    initiation. And RNA selection experiments are done with the help of
    biological protein-enzymes! Thus, again, there is no possibility of
    estimating biologically relevant information II. Remember that, as you
    have emphasized yourself, information II is absolutely undefined apart
    from the (right, biological!) context of a molecule considered.

    > And as to your belief that there are no examples of multiple families found
    > to date, you are wrong. Indeed this evidence also indicates that there were
    > possibly multiple origins of life. And there are multiple families found in
    > nature:
    > "Similarly, the archaeal proteins responsible for several
    > crucial cellular processes have a distinct structure from the
    > proteins that perform the same tasks in bacteria. Gene
    > transcription and translation are two of those processes. ...
    > Biochemists found that archaeal RNA polymerase, the enzyme that
    > carries out gene transcription, more resembles its eukaryotic
    > than its bacterial counterparts in complexity and in the nature
    > of its interactions with DNA. The protein components of the
    > ribosomes that translate archaeal messenger RNAs are also more
    > like the ones in eukaryotes than those in bacteria." W. Ford
    > Doolittle, "Uprooting the Tree of Life," Scientific American,
    > February 2000, p. 90-95, p. 92-93

    Under the designation "multiple families", you are mixing up some
    fundamentally different concepts. Orthologs and paralogs are
    subgroupings of homologs, which have significantly similar sequences:
    (1) orthologs in different species are derived by common ancestry from
    the same ancestral protein;
    (2) paralogs in the same or different species arose by independent
    evolution following a gene duplication in some ancestral species;
    (3) xenologs are homologs obtained by lateral gene transfer;
    (4) different families are sets of orthologs, where the different sets
    are (usually) paralogs of each other (domain shuffling may introduce
    additional levels of complexity between paralogous families, and
    supersets of families more distantly related may form superfamilies).

    (5) In addition, there may be different proteins in different (or
    perhaps even the same?) species that perform the same function in the
    same cellular and organismal environment, but that evolved independently
    from each other (thus are not homologous as protein sequences). They
    have no common ancestral protein of this (or a similar) function, but
    arose by functional convergence. These would be members of two of what I
    called "synonymous families". And this is the only case which is of
    interest in the context of the emergence of function and of estimating
    the amount of information II, because any homologous relationship (of
    types (1) to (4)) indicates that a common ancestor of the same or a
    similar functionality existed, and therefore the functionalities of the
    two proteins compared emerged earlier.
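
    The classification above can be sketched as a small decision function.
    This is only an illustration: the names `Relationship`, `classify` and
    `bears_on_emergence`, and the route labels, are hypothetical. Type (4)
    is omitted because it describes sets of orthologs rather than a
    relationship between two individual proteins.

```python
from enum import Enum
from typing import Optional


class Relationship(Enum):
    ORTHOLOG = "type (1): common ancestry via speciation"
    PARALOG = "type (2): divergence after gene duplication"
    XENOLOG = "type (3): acquired by lateral gene transfer"
    CONVERGENT = "type (5): same function, no common ancestral protein"
    UNRELATED = "no homology and no shared function"


# Hypothetical labels for how two homologous proteins came to differ.
_HOMOLOG_ROUTES = {
    "speciation": Relationship.ORTHOLOG,
    "duplication": Relationship.PARALOG,
    "lateral_transfer": Relationship.XENOLOG,
}


def classify(homologous: bool, route: Optional[str] = None,
             same_function: bool = False) -> Relationship:
    """Classify a pair of proteins following the types listed above."""
    if homologous:
        if route not in _HOMOLOG_ROUTES:
            raise ValueError(
                "homologs need a route: " + ", ".join(sorted(_HOMOLOG_ROUTES))
            )
        return _HOMOLOG_ROUTES[route]
    # Non-homologous proteins: only a shared function makes them "synonymous".
    return Relationship.CONVERGENT if same_function else Relationship.UNRELATED


def bears_on_emergence(relationship: Relationship) -> bool:
    """Only type (5) pairs indicate independent emergence of a function."""
    return relationship is Relationship.CONVERGENT
```

    For example, `bears_on_emergence(classify(True, "lateral_transfer"))`
    is false, while `bears_on_emergence(classify(False, same_function=True))`
    is true, matching the argument that only type (5) relationships are
    relevant to estimating information II.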

    In practice, it is often unknown whether two corresponding proteins are
    orthologs, paralogs, or convergents. In Doolittle's review you quote,
    there are probably various different types of relationships between the
    corresponding proteins in archaea and bacteria. Nobody knows whether
    there is any case of family relationship of type (5) between some
    archaeal and bacterial protein(s). Therefore, your quoting Doolittle and
    Pennisi (below) is gratuitous - or else give me the evidence pointing to
    a type (5) relationship. What they discuss is lateral gene transfer
    (homology of type (3)), without giving any indication of type (5).

    > >Yes, what I wrote underlines that there was one origin of life, not
    > >multiple ones, and having one family is expected. But the point I made
    > >is that this fact strongly argues against your assumption that huge
    > >numbers of synonymous families are possible. If that were the case, you
    > >would expect multiple families, even if there were only one origin of life.
    > No, not at all. There is indeed evidence of multiple origins of life and
    > then a period of mixing of genomes among the early metazoans.

    You mean protozoans or prokaryotes, rather than metazoans. It's the
    protozoans (unicellular organisms), and in particular the prokaryotes
    (without a nucleus: the archaea and bacteria), which exchanged genes,
    possibly quite liberally. This became much more difficult with metazoans
    (multicellular organisms).

    As far as a possible multiple origin of life is concerned, we don't have
    anything beyond speculation. The evidence pointing to multiple lateral
    gene transfers is no evidence at all for multiple origins of life.

    Mixing of genes of different origin in the same organism (by means of
    lateral gene transfer) implies that this organism becomes a hybrid to
    some extent, and if you want to trace all genes, the phylogenetic tree
    becomes reticulate. Yet (apart from domain shuffling) this does not
    imply any reticulation of the individual gene trees (as opposed to the
    organismal tree): each gene (or more precisely, each functional protein
    domain) has its own unique descent and originated in a particular
    species at a particular time.

    And if you look back to the definition of "synonymous families" given
    above, you see that my claim stands, that "huge numbers" of them are
    very unlikely.

    > >> ... it has to do with multiple families of
    > >> biopolymers being able to perform the same function. ... the data I just
    > >> posted but which you failed to give comment.
    > >
    > >Just like in Sept. 2000, when we discussed this last time, you keep
    > >talking about RNA artificial selection in vitro, rather than protein
    > >natural selection in vivo or prebiotic random walk emergence of minimal
    > >function, which is very different, and I explained why. ...
    > And like the last time, you fail to take note of the relationship between
    > RNA, DNA and proteins. Do you seriously think proteins can't perform
    > multiple functions? They are being found all the time. Maybe the fact that
    > you don't seem to know about them indicates you aren't keeping up with the
    > field here.

    Now, this is confusing, Glenn. The relationship between DNA, RNA and
    proteins is (to a first approximation) coding, transcription and
    translation. If the code for a multifunctional protein is contained in a
    gene, of course the resulting protein is multifunctional. The different
    functionalities usually reside in different domains of the protein.
    Where did I claim there couldn't be multifunctional proteins? I never
    believed that.

    But even more confusing is that you apparently think this has anything
    to do with what you call "multiple families of biopolymers being able to
    perform the same function" (among which I want to concentrate on the
    subset of "synonymous families", for the reasons I gave). In any case,
    multifunctional proteins perform different functions in the same
    organism, and if we want to find out anything about the de novo
    emergence of any one of these functions (information II), we have to
    look at the family of the domain, in whose (simple) function we are
    interested, and go back in time to the common ancestor of the domain.
    This is quite a different question from the origin of the domain
    combination producing the multifunctionality. I don't expect such domain
    shuffling to produce any new information II. Therefore, I skip your
    references to multifunctional proteins as irrelevant.

    > "However, some functions (e.g. proteolysis) have evolved
    > multiple times and can not be accounted for by any single set of residues."
    > E. W. Stawiski et al, "Progress in Predicting Protein Function from
    > structure: Unique Features of O-Glycosidases,"
    > http://www.smi.stanford.edu/projects/helix/psb02/stawiski.pdf

    Now this might be interesting, if there is compelling evidence that the
    same function (not just any proteolysis, but proteolysis with the same
    sequence specificity) is found in non-homologs of type (5) above. It is
    not easy to provide evidence for independent evolution of the same
    function; probably, one would have to require different protein folds.
    Has a paper been published by Stawiski about this? Can you give me the
    journal reference?

    > What I have shown here is that proteins have multiple functions, multiple
    > functionality implies other families could do a similar task.

    Multiple-function proteins are irrelevant to the question under
    investigation, see above. And the rest of your sentence is a
    non-sequitur if we are concerned with non-homologous synonymous families.

    > That implies
    > that the probability for producing life is not nearly so impossible as
    > anti-evolutionists would have us believe.

    It implies nothing of the kind.

    > And multiple families are seen in
    > the various kingdoms which implies a possibly ancient polyphyletic origin of
    > life.

    You keep mixing up multiple families and multifunctional proteins. And
    with respect to reticulate phylogenesis and polyphyletic origin of life
    see above. Your argument is rather confused and unconvincing.

    > I am just a geophysicist and know what is happening in this field. Why is it
    > that anti-evolutionists don't seem to be able to keep up with the latest
    > discoveries? This is what bothers me most about Christian apologetics. It is
    > always, always way behind times.

    Do you think this is a fair statement? Think again!

    > glenn


    Dr. Peter Ruest, CH-3148 Lanzenhaeusern, Switzerland
    <pruest@dplanet.ch> - Biochemistry - Creation and evolution
    "..the work which God created to evolve it" (Genesis 2:3)
