# Random origin of biological information

From: pruest@pop.dplanet.ch
Date: Fri Sep 22 2000 - 07:51:34 EDT

Glenn Morton wrote (in part):

>Date: Wed, 20 Sep 2000 14:06:26 -0500 (CDT)
>From: mortongr@flash.net
>Subject: Random chance brings meaning
>...
>We are going to test these ideas, that random sequences can't create
>information. And if genes are like words and sentences as Kenyon and Davis
>claim, then I will show that random sequences CAN create information.

Glenn presented the Vigenère cipher, an encryption-decryption method, to
demonstrate that random processes "CAN create information". He shows
that a 21-letter message encoded by a random 21-letter key can be
"decoded" by 9 other random 21-letter keys to yield 9 different
meaningful messages.

In fact, it is quite easy to obtain such solutions: select a meaningful
21-letter phrase, take its first (second, ...) letter, locate it in the
first row of the Vigenère table, go down this column to the first
(second, ...) letter of the coded message, and look up the first letter
in this row: this is the first (second, ...) letter of the new "random"
key. Repeat this for all 21 letters.

For instance, take the message 'RandomOriginOfEnzymes': the procedure
yields the key 'yeslsxvafodpqduiwwqtt'. Apply this "random" key to
decipher Glenn's original coded message 'pefogjjrnulceiyvvucxl' and
you'll obtain 'RandomOriginOfEnzymes'!
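
The table lookup described above amounts to subtracting letter positions
modulo 26. Here is a minimal Python sketch of it, assuming the standard
Vigenère table (a = 0, ..., z = 25) and ignoring letter case; the
function names derive_key and decrypt are my own, chosen only for
illustration.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed standard table: a=0 ... z=25


def derive_key(ciphertext, plaintext):
    """Work backwards: return the key that decrypts ciphertext to plaintext."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(p)) % 26]
        for c, p in zip(ciphertext.lower(), plaintext.lower())
    )


def decrypt(ciphertext, key):
    """Ordinary Vigenère decryption: subtract the key letter from each cipher letter."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
        for c, k in zip(ciphertext.lower(), key.lower())
    )


ciphertext = "pefogjjrnulceiyvvucxl"   # Glenn's coded message
wanted = "RandomOriginOfEnzymes"       # the phrase we want to "find"

key = derive_key(ciphertext, wanted)
print(key)                       # yeslsxvafodpqduiwwqtt
print(decrypt(ciphertext, key))  # randomoriginofenzymes
```

Any 21-letter phrase whatsoever can be put in place of wanted, and a
matching "random" key will always come out.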

But of course, that's cheating, because we worked backwards!

There are 26^21, or about 5.2 x 10^29 (that's 520,000 trillion
trillion), different 21-letter strings of 26 possible letters. How many
meaningful phrases of 21 letters might there be? 1000? A million? A
trillion? I don't know. I haven't written a computer program to try to
get an estimate. The "natural selection" routine required for this
program must be quite involved, including a parser, a dictionary, some
expert system algorithms, as well as a user-friendly interface for a
human to evaluate the tentative solutions proposed by the program. But
maybe Glenn, who certainly did not cheat, can provide us with such an
estimate. What's your hit rate, Glenn?
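
To give a feeling for the orders of magnitude, here is a small Python
sketch of the fraction of all 21-letter strings that would be meaningful
under each of the three guesses above. The counts 1000, a million and a
trillion are only the guesses posed in the question, not estimates, and
the name hit_fraction is hypothetical.

```python
TOTAL_STRINGS = 26 ** 21  # all 21-letter strings over a 26-letter alphabet


def hit_fraction(meaningful_phrases):
    """Chance that one randomly drawn 21-letter string is a meaningful phrase."""
    return meaningful_phrases / TOTAL_STRINGS


print(f"total strings: {TOTAL_STRINGS:.1e}")
for guess in (10**3, 10**6, 10**12):  # purely hypothetical counts
    print(f"{guess:.0e} meaningful phrases -> hit fraction {hit_fraction(guess):.1e}")
```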

Manfred Eigen, Nobelist and originator of the hypercycle concept, also
cheated by working backwards. In popular lectures about the origin of
life, he used to present a computer simulation purporting to show that
information can indeed emerge quite rapidly by means of random
"evolutionary" processes. He generated a random sequence of letters,
which he mutated randomly. Each time a letter happened to equal the
corresponding letter of a meaningful phrase deposited beforehand, it was
fixed and remained so. Of course, the process reproduced the supplied
"information" after not too many generations!
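
A simulation of this kind can be sketched in a few lines of Python. This
is not Eigen's program, whose details I don't have; it assumes that in
every generation each letter not yet matching the target is simply
redrawn at random, and the function name and the fixed seed are my own
choices.

```python
import random
import string


def fixed_letter_search(target, rng):
    """Redraw every non-matching letter each generation; matches stay fixed."""
    alphabet = string.ascii_lowercase
    current = [rng.choice(alphabet) for _ in target]
    generations = 0
    while "".join(current) != target:
        generations += 1
        for i, wanted in enumerate(target):
            if current[i] != wanted:  # the target is consulted at every step
                current[i] = rng.choice(alphabet)
    return "".join(current), generations


rng = random.Random(0)  # fixed seed so that the run is repeatable
phrase, generations = fixed_letter_search("randomoriginofenzymes", rng)
print(phrase, "after", generations, "generations")
```

The comparison against target inside the loop is the whole point: the
"selection" already knows the answer it is supposed to find.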

But let's look more closely at what really happens in evolution! Hubert
P. Yockey ("A calculation of the probability of spontaneous biogenesis
by information theory", J. theoret. Biol. 67 (1977), 377) compared the
then known sequences of the small enzyme cytochrome c from different
organisms. He found that 27 of the 101 amino acid positions were
completely invariant, 2 different amino acids occurred at 14 positions,
3 at 21, etc., and nowhere more than 10. Optimistically assuming that
the 101 positions are mutually independent and that chemically similar
amino acids can replace each other at the variable positions without
harming the enzymatic activity, he calculated that 4 x 10^61 different
sequences of 101 amino acids might have cytochrome c activity. But this
implies that the probability of spontaneous emergence of any one of them
is only 2 x 10^(-65), which is way too low to be considered reasonable
(it is unlikely that these numbers would change appreciably by including
all sequences known today). A similar situation applies to other
enzymes, such as ribonucleases.

Thus, a modern enzyme activity is extremely unlikely to be found by a
random-walk mutational process. But "primitive" enzymes, near the origin
of life, may be expected to have much less activity and to be much less
sensitive to variation. Unfortunately, until someone synthesizes a set
of "primitive" cytochromes c, we have no way of knowing the effects of
these factors.

What we can do, however, is to estimate how many invariant sites can be
expected to be correctly occupied by means of a random walk before a new
enzyme activity becomes selectable by Darwinian evolution (of course,
such an invariant set may be distributed among more sites which are
correspondingly more variable, without affecting the conclusions). So,
let's start with some extremely optimistic assumptions (cf. P. Rüst,
"How has life and its diversity been produced?" PSCF 44 (1992), 80):

Let's assume that all of the Earth's biomass consists of the most
efficient biosynthesis "machines" known, bacteria, and all of them
continually churn out test sequences for a new enzyme function, which
doesn't exist yet in any organism. They start with random sequences or
sequences having a different function. Natural selection starts only
after a minimal enzymatic activity of the type wanted is discernible. In
today's biosphere, t = 10^16 moles of carbon are turned over yearly,
there are n = 10^14 bacteria per mole of carbon, and a bacterium is
taken to have b = 4.7 x 10^6 base pairs in its DNA. This yields
R = tnb = 4.7 x 10^36 nucleotide replications per year on Earth.

In protein biosynthesis, there are c = 61/20 = 3.05 codons per amino
acid and a = 2.16 mutations per amino acid replacement (geometric
average of all possible shortest mutational walks in the modern code
table), and the mutation rate is 1 mutation in m = 10^8 nucleotides
replicated. Therefore, r = 1/(c(3/m)^a) = 5.8 x 10^15 nucleotide
replications are required for 1 specific amino acid replacement (the
factor 3 represents the codon length in the triplet code).

In order to get s specific amino acid replacements, r^s nucleotide
replications are needed, and the average waiting period for 1 hit
anywhere on Earth is W = (r^s)/R years. For s = 1, W = 4 x 10^(-14)
seconds; for s = 2, W = 4 minutes; for s = 3, W = 40 billion years!
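
For anyone who wants to check the arithmetic, here is a short Python
sketch of the estimate, using only the values of t, n, b, c, a and m
given above. The conversion constant (a year of 365.25 days) and the
function name waiting_years are my own additions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year

t = 1e16    # moles of carbon turned over per year
n = 1e14    # bacteria per mole of carbon
b = 4.7e6   # base pairs per bacterial genome
R = t * n * b  # nucleotide replications per year on Earth

c = 61 / 20  # codons per amino acid
a = 2.16     # mutations per amino acid replacement
m = 1e8      # nucleotides replicated per mutation
r = 1 / (c * (3 / m) ** a)  # replications per specific replacement


def waiting_years(s):
    """Average waiting time in years for s specific replacements: W = r^s / R."""
    return r ** s / R


print(f"R = {R:.1e} replications per year, r = {r:.1e} replications")
for s in (1, 2, 3):
    years = waiting_years(s)
    print(f"s = {s}: W = {years:.1e} years = {years * SECONDS_PER_YEAR:.1e} seconds")
```

Each additional required replacement multiplies the waiting time by r,
that is, by nearly 16 orders of magnitude.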

Thus the minimal set for a starting enzymatic activity cannot contain
more than 2 specific amino acid occupations! Of course, for the origin
of life, biosynthesis "machines" like bacteria were not yet available,
and certainly not in an amount equalling today's biomass! Does it still
sound reasonable to assume that biological information is easily
generated by random processes? Or is there something wrong with the
model underlying the above estimate?

If God used only random processes and natural selection when He created
life 3.8 billion years ago, we should be able to successfully simulate
it in a computer. You may even cheat: the genome sequences of various
non-parasitic bacteria and archaea are available. The challenge stands.
By grace alone we proceed, to quote Wayne.

Peter Rüst
