> Hi Peter,
> On Sun Sep 24 02:20:20 2000, email@example.com wrote:
> > Glenn:
> > You are right, there IS randomness in all these 21-letter sequences, no
> > matter whether they were generated by encrypting a meaningful phrase or
> > by running a random number generator, and ANY meaningful 21-letter
> > message can be generated from ANY of the 26^21 possible sequences if the
> > right key is found.
> > But this fact does NOT imply that meaning or semantics can arise
> > spontaneously by random processes, without some intelligent input of
> > information. Either this happens when the sender encrypts his message
> > and gives the key to the designated receiver, or when an eavesdropper
> > searches for meaning, using very much intelligence and effort in the
> > process.
> > Do such encrypted messages really tell us anything about the process of
> > evolution? There, we have a random number generator alright, and we have
> > natural selection. But for finding meaning, natural selection isn't as
> > patient and powerful as an intelligent cryptographer with his computer.
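The claim above, that ANY meaningful 21-letter message can be recovered from ANY of the 26^21 sequences given the right key, can be checked with a toy Vigenère-style pad over the 26-letter alphabet (a minimal Python sketch; the chosen plaintext is arbitrary):

```python
import random
import string

A = string.ascii_uppercase

def shift(text, key, sign):
    # Letterwise shift mod 26: sign=+1 encrypts, sign=-1 decrypts.
    return "".join(A[(A.index(t) + sign * A.index(k)) % 26]
                   for t, k in zip(text, key))

random.seed(1)
ciphertext = "".join(random.choice(A) for _ in range(21))  # any random sequence
wanted = "ATTACKATDAWNTOMORROWS"                           # any 21-letter message

# Solve for the key that turns the random string into the wanted message.
key = shift(ciphertext, wanted, -1)
assert shift(ciphertext, key, -1) == wanted
```

For every (ciphertext, plaintext) pair such a key exists, which is exactly why the randomness of the string by itself says nothing about where the meaning came from.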
> Once again, you are ignoring the fact that when experimenters make random
> strings of RNA and then search for novel functionality, they find strings to
> perform the task with a frequency of 10^-14 or so. While they are not all
> perfectly efficient, they do their task. When it comes to the comparison with
> language, I once calculated that there are over 330,000 ways to convey the
> concept that if you pick your nose you will get warts. I ceased counting
> because I got tired, not because I ran out of ideas. All of these were with
> sequences of 28 letters or less. If you add misspellings, which don't destroy
> meaning (a technique often used in cryptography to foil frequency analysis), I
> could add a thousand ways to misspell each sequence yet still retain its
> meaning. Such misspellings would look like: waarts ar spred bi playcing thi
> fingur in thi noz or wurtz arre sbred by plaising da feenger en a nos. The
> meaning is still there, so the sequence performs its function. Thus there are at
> least 330 million sequences for just this concept.
> For the sake of argument, let us suppose that there are 300,000 different ways
> to express the same concept in 21 letters or less. And let's assume that each
> can be misspelled without loss of meaning in 1000 different ways (which may be
> a vast underestimate). And then assume that there are a trillion different
> concepts which have the same traits as what we see. (Human language is so
> flexible that a trillion concepts is not impossible at all.) Then we have 10^21
> different sequences which will perform a useful function. How does that
> compare to the number of possible sequences? With 21 letters there are 26^21,
> which is about 10^29, so we estimate that useful sequences are found in the range of
> 10^-8, or one in 100 million. Is that too low a rate for random processes to
> stumble upon a meaningful sequence? No. At one per second (and my computer
> can do it quicker than this), we should find a meaningful sentence on average
> every 3.2 years. That hardly seems out of the realm of possibility. And it
> certainly is not a rate that would deter evolution over millions of years.
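The arithmetic in that paragraph is easy to check mechanically (a sketch; using 26^21 exactly rather than the rounded 10^29 shifts the mean wait from 3.2 to roughly 16 years, which leaves the conclusion unchanged):

```python
from math import log10

useful = 10**21          # assumed count of meaningful sequences (from the post)
total = 26**21           # all 21-letter sequences: ~5.2e29, rounded to 1e29 above
rate = useful / total    # ~1.9e-9; the post's rounding gives 1e-8

seconds_per_year = 365.25 * 24 * 3600
wait_years = (1 / rate) / seconds_per_year  # mean wait at one trial per second
print(f"hit rate = {rate:.1e}; mean wait = {wait_years:.1f} years")
```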
You keep misunderstanding what I argued. There are (at least) five
different types of search processes that have surfaced in our discussion:
(a) search for a meaningful letter sequence among random ones,
(b) artificial selection of a functional ribozyme from a collection of
random RNA sequences,
(c) evolution of a functional ribozyme in RNA world organisms,
(d) evolution of a protein by mutation of the DNA and natural selection
of the protein,
(e) a random DNA mutational walk finding a minimally active protein.
I fully agree with you that both (a) and (b) are relatively easy, and
certainly successfully doable (although you may be overestimating the
fraction of letter sequences representing a recognizable meaning - but I
don't know). These are the only two types you have been dealing with up
to now. As we don't know anything about the feasibility of an RNA
world, it is too uncertain to speculate about the chances for success of
(c). But suppose there was a viable RNA world, I assume (c) might not
have been much more difficult than (b) - apart from needing more time.
So we may also agree on (c). With (d), there is an additional layer of
complexity between the mutable genotype (DNA) and the selectable
phenotype (protein), namely translation using a triplet code and a 64:21
code table. So, numerical estimates derived from (a) or (b) cannot be
applied immediately. In (a) and (b) each individual string or molecule
has to be considered an "organism", while in (d), an organism is very
much more complex, and consequently, there usually are very much fewer
of them in a population capable of exchanging information. But we know
from experiments that the process, microevolution, works. As expected,
it is much slower than (b), and its progress usually levels off quite
rapidly, because the starting enzymes we can work with are already
pretty well optimized for their job. So, I don't hesitate to concede
that (d) also is workable and has been going on for the past 3.8 billion
years.
Where we part company, for the moment, is with case (e), which you have
never considered in our discussion, although my argument focussed on
this case alone, from the beginning, with the calculated model of the
probability of a random walk leading to a minimal enzyme activity within
the geologically available time. What's so different about case (e)? As
the activity wanted does not yet exist, not even to a minimal degree,
there is nothing to select, and natural selection of intermediates in
the mutational random walk just is not possible - by definition. Both in
(a) and (b), and presumably in (c), some activity or meaning is present
in the sample collection from the beginning, or can be generated
relatively easily by mutagenization. In (d), it is present by
definition, because (e) is its precursor.
A question which remains, of course, is the amount of semantic
information at the transition point between (e) and (d). If this is just
a few bits, my problem doesn't exist. What we can do is to try to define
an upper and a lower limit for this transition point. Presumably, the
two limits are very far from each other, but this is the best we can do
for the moment. For the upper limit we may look at the amount of
semantic information required for a modern (i.e. a known) enzyme. This
is what Yockey did. To find a lower limit, we may estimate how much
semantic (specified) information can be generated in a random walk and
how much time this would take. And that's exactly what I tried to
present for discussion in my first post. But you dismissed my
(tentative) conclusion out of hand, without discussing it, by referring
to cases (a) and (b), which cannot be compared with it at all.
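One way to make the two limits comparable is to convert the probabilities quoted in this thread into bits of specified information via -log2(p) (a sketch; the figures are the ribozyme hit rate cited earlier and Yockey's 1992 cytochrome c estimate cited later in this post):

```python
from math import log2

# Probabilities quoted in the discussion, converted to the number of bits
# of specified information a blind search must account for.
cases = {
    "ribozyme hit rate (~1e-14, from the selection experiments)": 1e-14,
    "Yockey's 1992 iso-1-cytochrome c estimate (2e-44)": 2e-44,
}
for name, p in cases.items():
    print(f"{name}: {-log2(p):.1f} bits")
```

The gap between roughly 47 and 145 bits is a rough measure of how far apart the lower and upper limits for the (e)-to-(d) transition currently sit.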
Here you snipped out what explained the sentence following it, referring
to a combination of processes (e) and (d), as well as any amount of
horizontal gene transfer and exon shuffling you like:
" In the evolutionary process, the only possible natural source of
information is the environment. But the extraction of this information
is extremely slow, probably only a fraction of a bit per generation -
when any useful mutants are available at all. And if they are, they must
penetrate the entire population before being fixed. For small selective
advantages and large populations, the mutation still risks being lost by
drift."
> > If we compare this process with the huge amount of information in
> > today's biosphere, I'm pretty sure 4 billion years is by far too little
> > time.
> Do you have a calculation or is this merely an emotional feeling? Upon what do
> you base your estimate of the total information on earth today? I would suggest
> the following. We know that microbes vastly outnumber us and indeed modern
> research is showing that the vast majority of living matter on earth may
> actually be contained in the rocks below our feet. Let us assume that there
> have been 10 million species on earth and we will give them each a 3 billion
> long nucleotide genome (a bit generous). Yockey (Information Theory and
> Molecular Biology, pp. 377-380) points out that there are a maximum of 6 bits
> of information per codon. Thus, we have 20 billion bits of information max in
> the genome of a species, and thus there are 2 x 10^17 bits of information in
> the biosphere today. I have seen suggestions that there might have been as many
> as a billion different species over geologic time, so multiply the above by
> 100. I will assume (but justify below) that the small addition of bits from the
> individuals of a species is too small to worry about (see below). Is there time
> to generate that info? Of course there is. There is more than enough time. To
> show it I need to take a diversion into info theory.
> Consider the sequence
> That represents a max of 24 bits as we discussed above from Yockey. If we
> allow polyploidy to occur, and we copy this and attach it to itself, we have
> the sequence
> Which now represents an increase of one bit of information. Why one bit?
> because the sequence is compressible. It is ordered. Copying itself doesn't
> add to the informational content. Only when you mutate it do you add
> information to the system. (REMEMBER: Information is not that ill-defined word
> we use in English and equivocate to the English word 'meaning'. Information is
> defined by a mathematical equation and has nothing to do with 'meaning' or
> specificity.) Mutations add information to the system because they make the
> sequence LESS compressible.
All this is just Shannon information. For a string of length L and 4
nucleotides, the maximum amount of information corresponds to 4^L
possibilities. This may be called information potential. But none of
this tells us anything about usable or semantic information, or meaning
in the sense of specification of biological function. Mutations add
nothing to the semantic information until they are tested against the
environment.
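Glenn's compressibility point, and the reply that it concerns Shannon information only, can both be made concrete with a general-purpose compressor standing in for information content (a sketch using zlib; the genome string and the mutation count are arbitrary choices):

```python
import random
import zlib

random.seed(0)
bases = "ACGT"
genome = "".join(random.choice(bases) for _ in range(1000))

def compressed_bits(s):
    # Compressed size as a rough proxy for Shannon/algorithmic information.
    return 8 * len(zlib.compress(s.encode()))

doubled = genome + genome            # "polyploidy": duplicate the whole string

# Point-mutate 50 positions in the second copy.
seq = list(doubled)
for i in random.sample(range(1000, 2000), 50):
    seq[i] = random.choice(bases.replace(seq[i], ""))
mutated = "".join(seq)

# Duplication adds almost nothing; mutations make the string less compressible.
assert compressed_bits(doubled) < 2 * compressed_bits(genome)
assert compressed_bits(mutated) > compressed_bits(doubled)
```

Note that the mutated string gains measurable information whether or not any of it specifies a biological function, which is exactly the distinction drawn in the reply above.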
> Now, because of this fact about copying adding only 1 bit, you get 1 bit of
> information for every clone on earth--plus 20 billion for the first species.
> This is why the additional one bit of information from each individual organism
> isn't enough to worry about.
> So, if the earth has 10^19 bits of information how rapidly does that have to
> develop? 100 bits per second, as 10^19 is 100 times the number of seconds in 4.5
> billion years. This is not a rapid rate.
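The rate claim is simple division (a sketch using the post's round figures; note, as an aside, that a 3-billion-nucleotide genome is 10^9 codons, which at 6 bits/codon gives 6 rather than 20 billion bits per genome, not enough to change the order of magnitude):

```python
biosphere_bits = 1e19                 # the post's round total for all species ever
seconds = 4.5e9 * 365.25 * 24 * 3600  # ~1.4e17 seconds in 4.5 billion years
rate = biosphere_bits / seconds       # ~70, i.e. the "100 bits per second" quoted
print(f"required average rate: {rate:.0f} bits per second")
```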
Your calculation omits some very crucial details about how an organism
functions and how the biosphere communicates. Before you apply natural
selection, you have no semantic or functional information whatever. Your
string of a huge amount of Shannon information (which equals amount of
randomness or entropy) is nothing but raw material for selection, bit by
bit. First you need a functioning organism coded by the string (how do
you get that?), then you can start testing each of the other bits
against the environment in which this organism lives - a rather slow
process. Furthermore, it's no use having all these bits randomly
distributed in 10 million bags (species), or even further spread out
among the individuals of a species. Biology only works if the right
information is in the right place at the right time. Each individual
must have all the information it requires. That will slow down the
process tremendously. For each bit of information, you must consider
that it can be input into the biosphere almost anywhere on earth. One
bit improves cytochrome c in a fish on an Australian shelf, the next one
improves a kinase in a worm in Canadian soil, the next one improves an
ATPase in a heterotrophic bacterium 1 km below the surface in a Siberian
rock, etc. This may help if each of the functionalities needed is
already in place in each organism and is just made a little bit better.
To make use of the improvements, the other organisms of the same species
would have to trade their genes among themselves, which is not a matter
of seconds, nor even of a few years. And if other species should profit,
the trade between species or even higher taxa is much slower. But, most
importantly, how about the origin of new functionalities by process (e)?
This last factor might easily transcend any estimate for process (d) by
a transastronomical magnitude.
> > It is estimated that about 1000 different protein folds exist in
> > living organisms, comprising about 5000 different protein families (Wolf
> > Y.I., Grishin N.V., Koonin E.V. "Estimating the number of protein folds
> > and families from complete genome data", J.Molec.Biol. 299 (2000),
> > 897-905). When we compare the prebiotic Earth with today's biosphere as
> > a whole, each of these folds, families and individual proteins with
> > their functions had to arise at least once somewhere. There is NO
> > evidence that all or most of them could be derived from one or a few
> > initial sequences through step-by-step mutation, each of the
> > intermediates being positively selected, and this within a few billion
> > years.
> If you are going to say that protein folding is too complex to have just
> happened, I would suggest that you take a look at the following:
No, you misunderstood. You may want to read the Wolf et al. paper. Their
1000 protein folds don't concern the problem of folding specific
proteins into their native configurations. Different proteins whose
sequences are somewhat similar and which have somewhat similar functions
are grouped into protein families and these into less similar
superfamilies. Different superfamilies which, despite no recognizable
sequence similarity, fold into (almost) the same 3-dimensional
structure belong to the same "fold". And of these folds, there are an
estimated 1000. How each individual sequence folds into its own specific
native conformation when exiting from the ribosome is an entirely
different question. So I'll just snip out your comments on this.
> > In my post, I was discussing the evolution of functional proteins in a
> > DNA-RNA-protein world, not evolution in an RNA world. I never talked
> > about ribozymes (I did mention ribonucleases, but these are protein
> > enzymes). I know about the in vitro selection of functional ribozymes,
> > but I do not consider these as valid models of evolution at all. They
> > just are techniques for finding active ribozymes among as many sequences
> > as possible.
> It is always a bit amazing to me how no experiment is ever considered to be
> good evidence of evolution by those who don't like evolution. Why do you think
> that is? The claim that useful variants of long biopolymers are too rare to be
> found is one that is claimed over and over and over again by the anti-
> evolutionary crowd, yet when one points them to an example where usefulness is
> found at a relatively high level of probability, the claim is made that it
> isn't evidence at all. It most assuredly is evidence that the rates of useful
> biopolymers have been vastly underestimated by the anti-evolutionary
> crowd, if nothing else.
These objections should be answered by what I wrote above. And if you
think I'm one of those (despised? ;-)) anti-evolutionists, you may read
what I published with Armin Held in PSCF 51 (Dec. 1999), 231. Mainly for
theological reasons, I do believe that God used (and uses) evolution as
(one of) his tool(s) of creating and maintaining the biosphere. But that
doesn't oblige me to uncritically swallow every belief of the
"evolutionary crowd". Are questions about unsolved problems forbidden?
> But if you want to talk about proteins, as you indicated above consider this:
> "Examination of over 30 residues in the N-terminal domain of [lambda]
> repressor reveals that a surprisingly large number of positions are quite low
> in informational content. Nearly half of the positions examined in helix 1 and
> helix 5 will accept nine or more different residues, and only a few positions
> are absolutely conserved. This suggests that there is a high level of
> degeneracy in the folding process; that is, there are many possible sequences
> that will specify a protein that resembles the N-terminal domain of [lambda]
> repressor. Moreover, if the criterion for neutral mutations were changed from
> the present requirement of 5-10% activity compared to wild type, to the less
> stringent requirement that the protein simply be folded, the level of
> degeneracy would presumably be even higher." p. 315
> "Extrapolating to the rest of the protein indicates that there should be about
> 10^57 different allowed sequences for the entire 92-residue domain.
This fits in very nicely with Yockey's cytochrome c estimate. Now, using
his "effective number of amino acids" 17.621, we get 17.621^92 = 4.3 x
10^114 possible sequences, and the probability of finding any one of the
10^57 [lambda] repressor sequences is 0.23 x 10^(-57), rather low!
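This step of the arithmetic can be verified directly (a sketch, using Yockey's effective alphabet of 17.621 amino acids and the 10^57 figure from the passage quoted above):

```python
from math import log10

n_eff = 17.621            # Yockey's "effective number of amino acids"
length = 92               # residues in the lambda repressor N-terminal domain
total = n_eff ** length   # ~4.4e114 possible sequences
functional = 1e57         # Reidhaar-Olson & Sauer's extrapolated allowed count
p = functional / total    # ~2.3e-58, i.e. 0.23 x 10^(-57)
print(f"total = 10^{log10(total):.2f}; p = 10^{log10(p):.2f}")
```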
> "This is an extraordinarily rough calculation, and we do not intend to suggest
> that we can accurately determine how many sequences would actually adopt a
> structure resembling the N-terminal domain of [lambda] repressor. However, the
> calculation does indicate in a qualitative way the tremendous degeneracy in the
> information that specifies a particular protein fold."~John F. Reidhaar-Olson
> and Robert T. Sauer, "Functionally Acceptable Substitutions in Two [alpha]-
> helical Regions of [lambda] Repressor," Proteins: Structure, Function, and
> Genetics, 7:315, 1990 p. 315
> In other words, there are lots and lots of proteins which will perform the
> function they studied also. Why is this never really raised and discussed by
> the anti-evolutionists?
At least for the last 20 years, this has been taken into consideration
by critics of evolution (e.g. in my papers at the 1988 Tacoma, WA,
conference about Sources of Information Content in DNA, and in PSCF 44
(June 1992), 80). But nevertheless, even with this caveat, asking
questions about the feasibility of evolution is not accepted in the
established big journals (in the early 80's, I tried J. of theoretical
Biology, Nature, Origins of Life, Philosophy of Science, and a German
journal, all in vain). It is not politically correct to question the
possibility of evolution. The editors' justifications of refusal were
quite evasive. As you see, even the huge numbers of possibly active
sequences are by far not sufficiently huge.
The authors continue
> "A method of targeted random mutagenesis has been used to investigate the
> informational content of 25 residue positions in two [alpha]-helical regions of
> the N-terminal domain of [lambda] repressor. Examination of the functionally
> allowed sequences indicates that there is a wide range in tolerance to amino
> acid substitution at these positions. At positions that are buried in the
> structure, there are severe limitations on the number and type of residues
> allowed. At most surface positions, many different residues and residue types
> are tolerated. However, at several surface positions there is a strong
> preference for hydrophilic amino acids, and at one surface position proline is
> absolutely conserved. The results reveal a high level of degeneracy in the
> information that specifies a particular protein fold."~John F. Reidhaar-Olson
> and Robert T. Sauer, "Functionally Acceptable Substitutions in Two [alpha]-
> helical Regions of [lambda] Repressor," Proteins: Structure, Function, and
> Genetics, 7:315, 1990. p. 306
These artificial mutations were targeted intelligently to specific small
sequence regions to be tested, which makes it practical to recover
biologically active mutants. Thus, this is not an experimental
simulation of darwinian evolution. If you want to use these results for
probability estimates, you have to factor this in.
> Degeneracy equals lots and lots of different proteins to perform the same task.
> And before you say that there is an invariant region that must be as it is in
> order to assure protein function, have you ruled out that other sequences in
> other protein folded structures can't perform the same thing?
The sequences of the same fold are already taken into consideration in
the 10^57 sequences. Whether there are sequences of different folds with
the same activity is not known. If I remember correctly, cases of
different folds having the same activity are extremely rare, if they
exist at all.
> Of course, mutagenizing steps generate new diversity, but
> > the selection procedures most certainly are NOT natural.
> Of course they aren't natural as we have had to speed up the process, or are
> you advocating getting one's Ph.D. when one is 2 million years old? To study
> things at the rate they naturally occur would require that long in order to do
> the research. This seems to be a silly suggestion that means that we don't
> have to draw any conclusions until we are 2 million years old. And surprise, we
> won't be able to live that long, so we can always claim that we aren't seeing
> it happen.
This objection is already answered above, case (e) against case (b).
> What we can
> > learn from some of these experiments is the frequency of a given
> > ribozyme activity among the pool of RNA sequences supplied (which
> > usually is just a very tiny sample of all possible sequences, and of
> > unknown bias).
> Not unknown bias. The ribozymes were made randomly. Randomly means no bias. If
> you have a charge of bias in their experimental procedure, then be specific and
> to the point. Vague charges of bias (more in hope than in evidence) to avoid
> the conclusions required by the data is a poor way of avoiding the issue.
What I meant by "unknown bias" is this: the starting pool of RNAs was
certainly about random (within the limits of biochemical precision), but
this was only a minute fraction of all possible sequences. Whatever is
contained therein has a greater chance of being selected than sequences
not in the starting pool, which just might, but need not, be formed by
later mutagenesis. And Lorsch & Szostak (Nature 371 (1994), 31), for
instance, indicate that their starting pool already contained the ATP
binding site required, "which greatly increased the odds of finding
catalytically active sequences". Furthermore, they suggest it would be
better to mix, match and modify small functional domains.
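The size of that "minute fraction" is easy to quantify (a sketch; a pool of ~10^15 molecules with a 100-nt random region is an assumed round figure of the kind used in these in vitro selection experiments):

```python
from math import log10

pool = 1e15        # assumed number of distinct molecules in the starting pool
space = 4 ** 100   # all possible 100-nt RNA sequences, ~1.6e60
fraction = pool / space
print(f"sequence space ~ 10^{log10(space):.0f}; "
      f"the pool samples 10^{log10(fraction):.0f} of it")
```

Whatever the exact pool size, the selected winners can only come from (or be mutational neighbors of) this vanishingly small sample, which is the sense in which the sample is of unknown bias with respect to the whole space.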
> > Further problems of the ribozyme work are: (1) Usually artificial
> > "evolution" tapers off at activities several orders of magnitude lower
> > than natural ribozymes (not to speak of protein enzymes) (cf. Bartel &
> > Szostak, Science 261, 1411). (2) We don't yet know whether there ever
> > was an RNA world. (3) We don't know whether it would be viable at all.
> > (4) We don't know how it could have arisen by natural processes. Leslie
> > E. Orgel, one of the pioneers in this field, wrote (Trends Bioch.Sci. 23
> > (1998), 491):
> All arguments from ignorance, and all arguments that we will never know and
> therefore we can believe what we want. Is there anything positive that you can
> offer from your point of view about what data we should observe in some future
> experiment that would prove that evolution is incompatible with the evidence?
> By this, I don't mean the other guy's failure. I want to see if you have
> anything you can predict that if found would be amazing and support your view
> that randomness plays no role in living systems.
The don't-knows are Orgel's! (You clipped out his very relevant comments
I quoted.) You don't want to claim he hasn't done anything worthwhile,
during several decades of work, to solve these questions, do you? It's
not just one "guy's failure", but the failure of a whole field of
research, in ALL research groups having had a try at it. Orgel is one of
the leaders in the field.
> I am asking that you cease doing what all antievolutionists do, which is stone
> throwing, and actually propose a workable system that can be verified. Can you
> do this?
Now, Glenn, if you call my arguments "stone throwing", what are YOU
doing with your harsh criticism of those who try to build bridges across
the awful gulf presently existing between Christians (in the biblical
sense) on the theistic evolution side and Christians (in the biblical
sense) on the YEC or ID side in this area of trying to understand
Creation? Our testimony to the world appears to be severely compromised!
Although I agree with most of your criticism of the YEC position and
some of that of the ID folk, there is certainly some value in their
sincere commitment to biblical doctrines like divine inspiration of the
Bible, and the idea of the ID "wedge" fighting the nihilistic
degeneration of our society is worthy of approval, although some of the
intellectual tools used may be questionable. And aren't some TEs all too
eager to accept without questioning anything "anti-fundamentalists" are
saying, hardly stopping short of ideas like those of Dawkins,
E.O.Wilson, Gould, Teilhard de Chardin, process theology, deism, liberal
theology, destructive Bible criticism, syncretism etc.? It's not just
science against "anti-science", it's also about what we believe
concerning God, revelation, the Bible, Creation and God's "maintenance"
of everything. It's all too easy to call "myths" and "errors of the
ancient near eastern culture" anything in the Bible we don't understand.
> > Against this background, I think it is moot, at present, to speculate
> > about the probabilities of evolutionary steps in an RNA world. We DO
> > know, on the other hand, how the microevolutionary mechanisms work in
> > our world. This is why I chose to deal with this only, rather than with
> > ribozymes.
> If you will go back and look at what I said, rather than what you thought I
> said, I never applied the ribozyme data to the RNA world. In fact, in this
> entire thread that last sentence is the first time I have used the term RNA
> world. What I have said all along is that useful sequences are found at a far
> higher probability than anti-evolutionists have ever admitted. Is that so
> hard to understand?
All this has been dealt with above. I wrote about the RNA world because
I thought you considered the artificial ribozyme selections to be valid
models of evolution in an RNA world, as I think Szostak and others do.
This was before you told me you weren't thinking of the RNA world.
> > You are right in pointing out that Yockey revised his probability
> > estimate for cytochrome c (now iso-1-cytochrome c) in his book
> > "Information theory and molecular biology" (Cambridge: Cambridge
> > Univ.Press, 1992). On p.254, he gives the probability of accidentally
> > finding any one of the presumably active iso-1-cytochromes c as 2 x
> > 10^(-44), which is 21 orders of magnitude better than his 1977 estimate
> > for cytochrome c.
> The reason I hit you so hard is that I know that you are in the area of biology
> and write as an apologist. I have grown very tired of apologists who insist on
> using 20, 30 and 40 year old data as if it is dogma and can't be changed. It
> shows that we are doing sloppy apologetics by not keeping up in the areas about
> which we write. If you and I were 30 years behind our respective fields of
> employment, I can guarantee you that we would both be unemployed. At least I
> know I would be in the oil industry. If we keep up with our fields for the
> sake of our employment, why don't we keep up when we are working for the Lord???
You are being unfair. Check what I have written!
> > One problem which remains is his assumption that there are no
> > interdependencies between the different amino acid occupations within
> > the sequence. On p.141, he even cites one observed case where the
> > equivalence prediction of his procedure fails. We don't know how many
> > more there are. Such interdependencies would reduce the overall
> > probability massively.
> > Furthermore, Yockey deals with modern cytochromes c (and some artificial
> > derivatives) only, which are the result of a few billion years of
> > optimization. A "primitive" enzyme may be more easily accessible. The
> > only reason I quoted him was that we have NO information about ANY
> > "primitive" enzyme.
> Actually that isn't quite true. We find bits and pieces of enzymes in oil. We
> know certain proteins that appear in oil when sponges evolved, others appear
> when diatoms evolved, others when angiosperms evolved, and still others appear
> in oils generated only after grasses appear. We are not totally blind about
> past proteins.
What you find in oil are bits not of primitive, but of highly functional
enzymes. The first fossil bacteria found date back 3.5 billion years,
and they look like cyanobacteria which are highly complex biochemically.
Life had to evolve from a prebiotic world before that, and awfully fast,
starting after the big meteorite bombardment.
> > By the way, I would still be very interested to hear any comments about
> > the model I calculated, from you, Glenn, or anyone else!
> I thought http://www.calvin.edu/archive/asa/200009/0125.html did a good job so
> I didn't see any reason to respond redundantly.
> > In both of the cases you quote, an initial catalytic activity of the
> > type selected for was present initially (gamma-thiophosphate transfer in
> > Lorsch J.R., Szostak J.W., Nature 371 (1994), 31, and
> > oligoribonucleotide linkage in Bartel D.P., Szostak J.W., Science 261
> > (1993), 1411), and the same applies, as far as I know, to all other in
> > vitro ribozyme selection experiments done to date.
> It is present because it is found in the vat not because it was introduced by
> the experimenter.
I didn't say it was. But it made for a faster success.
> > Thus, on both counts, random-path mutagenization to generate a
> > previously non-existing activity and natural vs. intelligent selection,
> > in vitro ribozyme selection experiments are NOT valid models of the
> > crucial steps in darwinian evolution, and the artificial ribozyme
> > figures of 10^(-16) or 10^(-13) are irrelevant.
> I think you have misunderstood what the experimenters are doing. They are not
> introducing the solution to the vat.
No. See above!
This archive was generated by hypermail 2b29 : Wed Sep 27 2000 - 15:02:22 EDT