Hi Glenn, for the context:
> >[PR:]... But remember that
> >the whole long argument started (04 May 2002 16:46:30 +0200) with my
> >simple claim that we have to distinguish between:
> >(I) Maximum information carrying capacity;
> >(II) Functional information relevant for biological systems.
Glenn Morton wrote (19 May 2002 09:01:04 -0700):
> >> Pay attention to the issue at hand. I am saying that ignoring the data I
> >> posted above is exactly LIKE, ANALOGOUS, SIMILAR to the way the YECs act.
> >> And indeed, you skipped right by it without any comment.
I thought I'd better just ignore such an unfair and ridiculous reproach.
Come on, Glenn, don't just throw anyone criticising evolutionary just-so
stories into the YEC bin! I have never been in the YEC corner. Even if
you say "exactly LIKE, ANALOGOUS, SIMILAR", you are being unreasonably
offensive. I'm not conscious of having ignored any data you presented
without at least saying why I'm skipping it. But if you want to come
back to such a supposed case, tell me what it was.
> >The issue at hand is random evolution of novel protein functionality,
> >and, in particular, the first minimal functionality of a novel protein,
> >before natural selection can set in. This has nothing to do with
> >artificial selection of RNA in vitro, particularly if some of the
> >functionality selected is already present. It's you who are evading the
> >issue, not I.
> This is a rigging of the roulette wheel. What you seem to be doing is ruling
> out any experimental evidence which can be collected today.
Nonsense! I started this thread and have the right to come back to my
original statement of the problem you criticized. I am quite ready to
consider "any experimental evidence which can be collected today" - _if_
it has any relevance to the statement of mine that you criticized. I rather
have the impression that you still have not understood what I wanted to say.
> If it is in
> vitro, you say it has nothing to do with the origin of life and the
> emergence of useful proteins. If we put it in a metal dish and do the same,
> or in a rock dish, does it suddenly become ok? I doubt you would accept any
> experiments with zeolites which might show interesting effects merely
> because they took place on a university campus. We have no time machine
> with which to return to 3.8 Gyr and watch the process. So what you do by
> ruling out the discussion of any experimental evidence is firmly plant your
> head below the ground so that one can not see observational data.
Please stop such unfair talk which has nothing to do with what I wrote!
If you resent my skipping such remarks, just try to better understand
what I am saying!
> >> >>So, given that I am mentioning this work for a second time, will you
> >> >> respond to its import now?
> >> >
> >> >You have not mentioned these papers (if I remember correctly), but
> >> >similar ones, and I responded in detail. But I may do it again, giving
> >> >you a new example if you insist. A. Lombardi, et al., "Miniaturized
> >> >metalloproteins: Application to iron-sulfur proteins", PNAS 97 (2000),
> >> >11922, attempted to design a minimal redox enzyme, but haven't achieved
> >> >their goal as yet. Their dimeric undecapeptide can hold an iron atom,
> >> >but is unstable, being too small to shield off the environmental water.
> >> >The invariant of their (intelligently designed) construct amounts to at
> >> >least 5 specific amino acid occupations, which is too much to be
> >> >attainable by an evolutionary process without selection.
> >> What Lombardi is doing is not at all what Joyce, Szostak and Ellington are
> >> doing. Lombardi is trying to shrink the proteins down to
> >> of smaller length.
> >This is exactly my point, see above. These miniature proteins are the
> >ones which may give indications about the origin of semantic or
> >functional biological information, about which I was talking (case a).
> We weren't discussing the minimum length proteins can be. We were discussing
> if other families of proteins could perform a given task than the one we
> find doing it today. That was the issue. Not how short a protein can be.
> Tell me how short the sequence for 'be' can be and still have you understand
> what is meant?
Right, "if other families of proteins could perform a given task than
the one we find doing it today" is one of the important questions, not
the length of the protein. I hope you are talking about "synonymous
families", as I defined them in my last post, "Polyphyly and the origin
Last time, I paid no attention to your adding "of smaller length". The
small length of such "miniature proteins" is not at all the critical
point, either with Lombardi or with me. What is important, instead, is
the minimal number of specified amino acids (what I called the
"invariant" above) - just as with Yockey's work. The protein may be
longer, if the identity of the other amino acids adds nothing to the
functionality. The question is: at how many positions in a protein do I
need a particular amino acid (or for less stringent positions, any one
in a given restricted set of amino acids), in order to get the function
looked for? This is the "amount of specification" required for the
In contrast to Yockey, I add the theoretical requirement that this
sequence is not derivable, by evolution with natural selection, from a
different one with _less_ specification, but having nevertheless some of
the function. This is another way of saying that it had to be formed by
means of a strictly random-walk mutational path. In this way, I hope to
arrive at an estimate of the amount of information II. I would call such
a protein a "minimal-functionality protein". This additional
requirement, by the way, is not necessarily beyond experimental testing.
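To make this counting concrete, here is a minimal sketch in Python. It is only an illustration under strong simplifying assumptions that are mine for the example: all 20 amino acids are taken as equally probable, positions are treated as independent, and unconstrained positions contribute nothing. The function names are hypothetical. Each constrained position is described by the number k of acceptable amino acids (k = 1 for a fully specified position).

```python
import math

AMINO_ACIDS = 20  # size of the protein alphabet


def specification_bits(allowed_set_sizes):
    """Amount of specification in bits: sum of log2(20 / k) over the
    constrained positions, where k is the number of acceptable residues."""
    return sum(math.log2(AMINO_ACIDS / k) for k in allowed_set_sizes)


def random_hit_probability(allowed_set_sizes):
    """Chance that one random sequence satisfies every constrained position,
    assuming equiprobable, independent residues."""
    p = 1.0
    for k in allowed_set_sizes:
        p *= k / AMINO_ACIDS
    return p


# An invariant of 5 fully specified positions, as in the Lombardi example.
invariant = [1, 1, 1, 1, 1]
bits = specification_bits(invariant)
prob = random_hit_probability(invariant)
print(f"{bits:.2f} bits, probability {prob:.3e} per random trial")
```

Under these assumptions the two quantities are tied together by prob = 2^(-bits), so a less stringent position (k > 1) lowers the specification and raises the probability accordingly.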
> And to assume that the original proteins performed precisely
> the same task as evolved proteins do today is quite a leap.
I have never said they did. All I assume for the estimate I am looking
for is that it is a "minimal-functionality" protein. Subsequently, it
may have evolved further by means of a normal evolutionary pathway with
non-negative natural selection at each step. During this evolutionary
path, its function may have been modified, as well as increased. But
this further path is no longer easily tractable for a determination of
> >I did not want to deal with improvement of a preexisting functionality
> >(case b), because there you may just be taking over some "information"
> >from the environment by means of selection. And I did not doubt there
> >are some RNA functions (case c) that are not very difficult to find (if
> >you do have RNA!), just as Joyce, Szostak and others have found, even if
> >you are looking for a function not yet present in the starting mix.
> >Again, we have no means of telling whether any information has emerged
> >de novo. With proteins, there is a way of dealing with semantic
> >information (II), cf. Yockey's book.
> All Yockey did was to substitute hydrophobic amino acids for hydrophobic,
> hydrophilic for hydrophilic in cytochrome c and then count the
> possibilities. That is not predicting the function. That is merely saying
> that there are x number of sequences which appear to be able to perform the
> same task. I think Yockey is correct, or close to it. But it isn't taking a
> novel sequence and deciding what it does.
He did not just consider amino acid polarity, but also composition and
volume. I never said he wanted to predict function. And I never said he
wanted to find a "minimal-functionality" protein. I cited him for his
use of what I call functional information (II), as opposed to maximum
information capacity (I). I do use his way of deriving an amount of
information (II) from amino acid specificity within an orthologous
protein family. And I cite him for his estimate of the extremely low
frequency (2 in 10^44) of functional iso-1-cytochromes c in the
transastronomical composition space; as he doesn't take into
consideration amino acid properties other than the three mentioned, nor
any correlations between different positions, nor any species-specific
requirements, his estimate likely is still far too high.
> >With RNA, I know of no similarly
> >promising way of dealing with functional information, because residue
> >conservation is much less clearly definable (you have only 4
> >nucleotides, and there is the additional complication of base pairing).
> >So, I am looking for examples of case a, but you keep pointing to
> >examples of case b and/or case c.
> I disagree here. Yockey is speaking of using pre-existing functionality to
> predict similarity of function in similar molecules. The importance of what
> he did was to show that the silly anti-evolutionary argument of past years
> in which only one sequence is allowed to perform a given function is false.
> Case c, the RNA shows probability is much much less. And with Case a, the
> proteins, I posted references to multifunctionality yesterday. That is
> evidence that proteins will be subject to the same thing.
Yockey is dealing with a family of orthologous proteins. There is no
reason to suppose that the members of this family which exist or existed
did not descend from a common ancestral protein by way of individually
selected mutations. There is no way of estimating any probabilities
involved in this process. Notice that Yockey assigns an information
content (II) to the entire family, not to an individual sequence. It may
safely be assumed that all precursors of the present cytochromes c
(cyt.c) back to their most recent common ancestor (MRCA) belong to this
same family. Therefore this same information content (II) applies to all
precursors back to the MRCA sequence, as well. All that happened during
the time since then is not global modifications, but only species-(or
genus-, etc.)-specific ones which are not taken into consideration in
But what happened before the time of that coalescence in the MRCA? The
MRCA must have evolved from earlier forms, which probably were simpler
and less active, back to the minimal-selectable-functionality (MSF)
cyt.c (about which I wrote in the other post, "Polyphyly and the origin
And before the time of this MSF cyt.c? The emergence of this MSF cyt.c
is the only process not under natural selection (by definition), so it
was a random mutational walk through sequence space, whose probability
can be estimated if the size of the specification for the MSF stage can
Of course, all this has nothing to do with the idea that there can be
only one active cyt.c sequence. I wonder where you get that idea from.
Do you know of anyone ignorant enough to hold it?
As for the RNA case c, you probably wanted to say that the improbability
(rather than probability) is much less. This is correct for artificial
selection systems. And it may even be correct for an initial natural RNA
world - although we don't know this. But so what? I already explained
that the RNA world probably cannot be used to estimate functional
information content (II).
I dealt in my last post with the irrelevance of the multifunctional
> And I have to ask a really dumb question here. NO ONE IN THE ORIGIN OF LIFE
> ISSUE BELIEVES THAT LIFE AROSE FROM PROTEINS FIRST. THAT HASN'T BEEN
> BELIEVED FOR A LONG TIME. SO WHY ON EARTH ARE YOU LOOKING FOR THE ORIGIN OF
> INFORMATION AMONG THE PROTEINS? IS THIS ANOTHER CASE OF YOU BEING WAY BEHIND
> THE TIMES ON THE TOPIC YOU WANT TO DISCUSS?
> Consider this from 1991:
> Sydney Fox's Experiment
> "By repeatedly heating amino acids and dissolving them in water,
> he induced them to coagulate into tiny spheres composed of short
> protein strands.
> "Fox argued then - and continues to do so - that these
> 'proteinoids' represent the first cells, but his work has fallen
> out of favor among many scientists. Once proteinoids are formed,
> 'that's it,' says Gerald F. Joyce of the Research Institute of
> Scripps Clinic. 'They can't reproduce or evolve.'"~John Horgan,
> "In the Beginning", Scientific American, February, 1991, p. 118-
Please don't link me with Sydney Fox. Of course, proteinoids are no
model for proteins, for many reasons, but primarily because of the
problem of sequence information (II), which cannot emerge without
reproduction and evolvability, as Joyce says. It is now 40 years since
I became aware of Fox's proteinoids and immediately started criticizing
them as completely useless for helping to explain the origin of life -
at a time when everybody was celebrating him. So you are wrong in
calling me "way behind the times".
Of course, I know the advantages of the RNA world hypothesis, as far as
ribozyme functionality and the potential elimination of the
protein-or-nucleic-acid-first chicken-and-egg problem are concerned.
But we still don't know of any feasible prebiotic emergence of RNA and
> This from 1959:
> "According to Dr. Pirie, the fact that all forms of life known
> today do use protein 'will have no more relevance (to primitive
> life being dependent on protein) for a discussion about the
> origins of life than the now almost universal use of paper has
> for the origin of writing or the use of matches for the original
> making of fire.'"~N. W. Pirie, "Chemical Diversity and Origins of
> Life," The Origin of Life on Earth (New York: MacMillan Co.,
> 1959), p. 78
In principle, the same caveat applies to the now-favored RNA world
> >No, Glenn, they are not at all similar. There are fundamental
> >differences between proteins and RNAs. Structure-function relationships
> >are completely different; and with proteins, you need the
> >genotype-phenotype code translation - to just mention two factors. You
> >questioned my concept of semantic biological information, but you refuse
> >to consider my definition of it. I don't see anything relevant to this
> >type of information in the RNA artificial selection work - although it
> >certainly is of interest in other respects. It's just not applicable to
> >what I said and you questioned.
> I agree that they are different and given that no one but you seems to think
> that proteins were the first biopolymers to evolve means that this entire
> discussion about proteins and the origin of information is nothing but a
> discussion about that which no one believes. It is a strawman set up to
> appear as if scientists actually believed that information first arose via
> proteins. They don't any more. Most believe in the RNA world, which is why
> I am discussing RNA rather than a 50-year outdated and rejected
> proteinaceous concept.
Apparently, it's YOU who are the strawman builder! And you can only do
it because you either haven't read what I wrote or because you ignored
it or because you forgot it. I hope the last is the case. You are
constantly upbraiding me for not having read the most recent papers you
think were relevant, but you are not even up-to-date on what those you
I never claimed proteins were the first to evolve. It's just that I am
almost as skeptical about current pet speculations about
self-organization as I was about those of 40 years ago. Some serious
thinking about the emergence of biological information (II) is sorely
needed - both with respect to the origin of life 3.9 billion years ago
and with respect to the origin of novel molecular functionalities ever
> >> >> >This only works because you first give me the book, which contains all
> >> >> >the relevant semantic information. With the signal, you just send me
> >> >> >log2(3) bits of information, not lots.
> >> >>
> >I don't dispute these calculations at all. But again and again, I have
> >emphasized that we have to distinguish between
> >(I) Maximum information carrying capacity;
> >(II) Functional information relevant for biological systems.
> >Shannon entropy is related to (I), not directly to (II). Meaning
> >(biological or otherwise) is found in (II) and is a function of a
> >functional system like a given language or biological system. (I), which
> >is a function of sequence length and alphabet size, specifies nothing
> >but a maximum amount of functional information (II) which can be stored
> >in a given sequence having a maximal capacity (I). Never have I claimed
> >a 1-to-1 correspondence between a "value" of (I) and a "value" of (II).
> And over and over, I keep asking you to recognize this II. Even if you can't
> quantify it, you seem to be unable to recognize it when offered a chance to
> tell me if a sequence has this II functional information. You never ever
> try to tell me which sequence has it. If you can't even recognize it, can't
> quantify it, do you really expect us to believe it is real?
I'll repeat again what I wrote x times already: A string of any symbols
has a computable Shannon information, and it has a computable maximum
information carrying capacity (I). But it is unknown, without any
further knowledge, whether it contains any semantic or functional
information (II). Unless you know the appropriate language, you can't
read it. You may, by statistical analysis, find that, with a certain
probability, it does contain some information. And if the probability is
high enough and the text is long enough, you may even be able to learn
the appropriate language (including its grammar, syntax,...), e.g.
Sumerian. And proper understanding requires knowledge of the appropriate
culture and situation. With the human genome, we are now in this stage
> Let's give you another chance.
> ken quine monie hiv a wyme
> wyme a monie quine hiv ken
> ken wyme a monie quine hiv
> a ken monie quine hiv wyme
> ken quine a hiv wyme monie
> hiv a quine wyme ken monie
> quine monie a ken wyme hiv
> Which sequence in Doric has functional information? Can you recognize it
> when you see it? We can't quantify modern art, but we can at least
> recognize it. So, if you want me to believe that functional information is
> a real concept then pass 5 tests like this with only 1 failure. Which
> sequence has meaning?
It is possible to recognize that this may be a meaningful language. It
is even possible to guess at an Indo-European language, and to guess at
the possible meaning of some words (although this is risky). But 6 words
are definitely insufficient to deduce an unknown syntax, which is a
requirement for selecting the legal word placement you ask me for. Try
the same with Latin! There, you may find many possible word sequences in
a sentence to be legal, particularly with poetry.
But anyway, this game is a non-starter. You are still in the strawman
mode. You are presupposing something I never said. Information (II) is
meaningful with respect to its natural environment only. Reading a
string is insufficient.
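A small check in Python illustrates why reading the string alone cannot settle the matter. This is only a sketch: the function name is hypothetical, and the entropy is computed from single-character frequencies, which is one simple choice among several. All seven lines above are permutations of the same six words, so any measure based on symbol frequencies assigns them exactly the same value.

```python
import math
from collections import Counter


def char_entropy_bits(text):
    """Shannon entropy per character, from single-character frequencies."""
    counts = Counter(text)
    n = len(text)
    # Sort the counts so the summation order is the same for every permutation.
    return -sum(c / n * math.log2(c / n) for c in sorted(counts.values()))


lines = [
    "ken quine monie hiv a wyme",
    "wyme a monie quine hiv ken",
    "ken wyme a monie quine hiv",
    "a ken monie quine hiv wyme",
    "ken quine a hiv wyme monie",
    "hiv a quine wyme ken monie",
    "quine monie a ken wyme hiv",
]

entropies = {char_entropy_bits(s) for s in lines}
same_words = {tuple(sorted(s.split())) for s in lines}
print(len(entropies), len(same_words))  # one distinct value in each set
```

The frequency statistics are identical for all seven strings; whatever distinguishes the meaningful one lies in the syntax and context of the language, not in anything this computation can see.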
> >You may compute the Shannon entropy of a given DNA sequence (4-letter
> >alphabet) or a given protein (20-letter alphabet). You'll get different
> >values, even for a length ratio of 3:1.
> So what? Ratio has no place in the definition of Shannon's entropy. What
> you say shows you don't know much about Shannon entropy.
I said "length ratio", not "Shannon entropy ratio". And I wasn't giving
you a definition of Shannon entropy. Please be more careful in how you
> >Yockey also shows the connection to meaningful biological information:
> >"Let us consider evolution as a communication system from past to
> >present. At some time in the history of life the first cytochrome c
> >appeared. As a result of drift, random walk and natural selection, this
> >ancestor genetic message was communicated along the dendrites of a
> >fractal ... representing a phylogenetic tree ... Some dendrites lead to
> >modern organisms, the sequence having changed with time. Thus the
> >original genetic message of the common ancestor specifying cytochrome c,
> >regarded as an input, has many outcomes that nevertheless carry the same
> >specificity. The evolutionary processes can be considered as random
> >events along an ergodic Markov chain ... that have introduced
> >uncertainty in the original genetic message. This uncertainty is
> >measured by the conditional entropy in the same manner as the
> >uncertainty of random genetic noise is measured ... Since the
> >specificity of the modern cytochrome c is preserved, although many
> >substitutions have been accepted, this conditional entropy may be
> >subtracted from the source entropy ..., to obtain the mutual entropy or
> >information content needed to specify at least one cytochrome c sequence
> >... The information content of the sequence that determines at least one
> >cytochrome c molecule is the sum of the information content of each
> >site. The total information content is a measure of the complexity of
> >cytochrome c" (p.132). For the mathematical formulation, please refer to
> >Yockey's book.
> I don't have a problem with Yockey's point. All he is pointing out is that
> one must account for degeneracy in functionality, i.e. that many sequences
> will perform the same function, which is what I keep telling you.
And which is what I keep telling you. In particular, Yockey is dealing
with a family of orthologous sequences, NOT with independently evolved
protein folds or "synonymous families". You keep ignoring such vital
distinctions. See earlier in this post. Here, I was quoting Yockey to
show you that biological information (II) certainly IS related to
information capacity (I), which you denied.
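The per-site bookkeeping described in the Yockey quotation can be sketched as follows in Python. This is a deliberately simplified illustration, not Yockey's actual procedure: the source entropy is taken as log2(20) for equiprobable amino acids, the conditional entropy at each site is estimated only from the residues observed in an alignment, and the five-residue alignment is invented for the example. The function names are hypothetical.

```python
import math
from collections import Counter

H_SOURCE = math.log2(20)  # assumed source entropy: 20 equiprobable amino acids


def site_entropy(column):
    """Entropy (bits) of the residues observed at one aligned position."""
    counts = Counter(column)
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def family_information_bits(alignment):
    """Sum over sites of (source entropy - site entropy), i.e. the
    information content assigned to the whole family, not to one sequence."""
    columns = zip(*alignment)
    return sum(H_SOURCE - site_entropy(col) for col in columns)


# Invented toy alignment of four "orthologous" sequences (illustration only).
toy_alignment = ["GDVEK", "GDAEK", "GDVAK", "GDIEK"]
info = family_information_bits(toy_alignment)
print(f"{info:.2f} bits out of a capacity of {5 * H_SOURCE:.2f} bits")
```

Fully conserved sites contribute the full log2(20) bits, variable sites contribute less, and the total can never exceed the capacity (I) of the sequence. With so few sequences the site entropies are badly underestimated, so real estimates need a large family and a treatment of functionally equivalent residues.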
> >> >Homonyms may be difficult to find in biology! They occasionally occur in
> >> >our languages, even within the same language.
> >> See Szostak and Ellington above and Joyce. They are finding homonyms in
> >> biology but you don't seem to want to discuss them.
> >This is in vitro RNA chemistry using some biochemical molecules. It may
> >not have much to do with biology. The RNA world is completely
> >hypothetical, and we have no idea how it might have emerged. Presumed
> >natural evolutionary processes in it are completely different from known
> >evolutionary processes in living organisms.
> And no one believes, like you seem to, that proteins were the first
> biopolymers. They weren't, and you are arguing for a 50-year-old rejected
> idea. At least stay with the program and the current thinking on the topic.
> Most researchers believe that life evolved through the RNA.
You are constantly misrepresenting what I wrote. See above. I'll just
add some short remarks about the homonyms, which I skipped last time.
A homonymous protein would be a given protein sequence which shows
completely different functionality in two different contexts. One
example comes to mind: an enzyme which is alternatively used as an
optically transparent substance in the eye, but I don't remember if it
has exactly the same sequence or just a related one. This, however, is
something of a special case, because in the eye it doesn't function as
an enzyme, but just by means of its physical properties in a
concentrated solution. Another example might be prions which are toxic
versions of normal proteins folded in a different way. But again, this
is special, in this case because it's not physiological.
I don't know of any example of a homonymous RNA.
Something which is quite different from homonymy is multifunctionality,
in which _different_ regions of a molecule display different
functionalities. They can perform both functions at the same time if
this is called for. The analogous situation does not exist in language.
A pun is different.
But all this has nothing to do with synonymous protein families or other
systems which might help us to get at estimates of functional
> >Of course, today one cannot predict biological function (if any) from a
> >sequence alone. I never claimed this.
> You said Yockey did it above.
Again you are mixing up orthologous sequences and independently evolved
synonymous families, which have nothing to do with each other.
> >However, as researchers are
> >getting better at understanding the biological systems which can "read"
> >and express such sequences in the appropriate functional environment, a
> >measure of meaningful prediction will emerge. This is what the new field
> >of proteomics is all about. This confirms the relationship between
> >information (I) and information (II).
> As Shannon pointed out, and I have repeated many times in this exchange,
> there is NO relationship between Shannon entropy and your information II.
> For some reason you don't seem to be able to understand what Shannon
> actually wrote. I repeat it again.
It seems to me that "you don't seem to be able to understand" what
Yockey actually wrote. And that you don't really listen to what I say,
but rather assume something you erroneously think I might have said.
> "The fundamental problem of communication is that of reproducing
> at one point either exactly or approximately a message selected
> at another point. Frequently the messages have _meaning_, that
> is they refer to or are correlated according to some system with
> certain physical or conceptual entities. These semantic aspects
> of communication are irrelevant to the engineering problem." C.
> E. Shannon, "A Mathematical Theory of Communication," The Bell
> System Technical Journal, 27(1948):3:379-423, p. 379
> What part of the term 'irrelevant' do you not understand?
I understand that you completely misapply this quotation from Shannon to
our discussion. We may perhaps apply it in a _partially_ meaningful way
by saying that there is a communication channel from DNA to protein, and
that the semantic aspects of the biological information transmitted are
irrelevant to the engineering problem of transcription and translation.
That is, the irrelevance applies only apart from the fact that the
transcription and translation machineries themselves are also specified
by the semantic aspects of the biological information. Thus, unlike
information technology, the biological system is self-referential.
But our discussion, that is, what I proposed in the beginning and you
criticized, was not at all about this problem, but about the question of
estimating amounts of biological functional information, using genome
sequence space as a ruler.
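As a minimal illustration of sequence space as a ruler, the maximum information carrying capacity (I) depends only on sequence length and alphabet size. The sketch below, in Python, compares a coding DNA stretch with the protein it encodes at the 3:1 length ratio mentioned earlier. The function name and the choice of 100 codons are assumptions made for the example.

```python
import math


def capacity_bits(length, alphabet_size):
    """Maximum information carrying capacity (I): length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


codons = 100                                # arbitrary example length
dna_bits = capacity_bits(3 * codons, 4)     # 300 nucleotides, 4-letter alphabet
protein_bits = capacity_bits(codons, 20)    # 100 residues, 20-letter alphabet
print(f"DNA: {dna_bits:.0f} bits, protein: {protein_bits:.0f} bits")
```

For 100 codons this gives 600 bits for the DNA and about 432 bits for the protein; the difference reflects the degeneracy of the genetic code. Either figure is only an upper bound on the functional information (II) that the sequence can carry.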
--
Dr. Peter Ruest, CH-3148 Lanzenhaeusern, Switzerland
<firstname.lastname@example.org> - Biochemistry - Creation and evolution
"..the work which God created to evolve it" (Genesis 2:3)