Re: Information: Brad's reply (was Information: a very

Glenn Morton (
Mon, 29 Jun 1998 12:08:40 -0500

On Mon 29 Jun 1998 15:17:15 Brad Jones writes:

I wrote:

> >>>I believe that you are mixing the way memory works in such systems. Zero
> >>>memory usually applies to a Markov chain which doesn't use the previous
> >>>character to determine the next.


Brad replied,
> >
> >>A Markov source is defined as a source where the next symbol is dependent
> >>on one or more previous symbols. A zero-memory source is the opposite of
> >>this. It is therefore impossible by definition to have a Markov
> >>source where the next symbol is not dependent on the previous.
> >
Then I wrote:

> >This is not correct. It is indeed possible to have a Markov chain
> >which does not depend upon the previous symbol. It is a special case of Markov
> >matrices. A Markov chain is described by a transition probability matrix giving
> >the probability with which each state of a system is followed by each other state.

[Examples of such Markov chains snipped after which I continued]

>> Any letter is equally likely to be followed by any letter. That is
>> what Yockey was saying in Brian Harper's post. The above transition
>> probability matrix is a special Markov matrix with absolutely no memory of
>> the previous state: no matter what the previous letter is,
>> the next letter is randomly selected from the 4 possibilities. It is
>> mathematically a Markov matrix.
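(Illustration added for clarity, not from Yockey or from the original examples.) The claim that such a matrix has "no memory" can be checked numerically. A minimal sketch, assuming a 4-letter alphabet: for a stationary Markov chain the entropy rate, sum over i of pi_i times the entropy of row i, equals the entropy of the stationary distribution exactly when every row of the transition matrix is identical, i.e. when the previous letter tells you nothing. The "sticky" matrix is a hypothetical contrast case that does have memory.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability entries."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def stationary(P, steps=1000):
    """Approximate the stationary distribution of transition matrix P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy_rate(P):
    """H(next | previous) for the stationary chain: sum over i of pi_i * H(row i)."""
    pi = stationary(P)
    return sum(pi[i] * entropy(P[i]) for i in range(len(P)))

# Every row identical: the next letter ignores the previous one (zero memory).
uniform = [[0.25] * 4 for _ in range(4)]
# Hypothetical chain WITH memory: 70% chance of repeating the previous letter.
sticky = [[0.7 if i == j else 0.1 for j in range(4)] for i in range(4)]

for name, P in [("uniform", uniform), ("sticky", sticky)]:
    # The chain is memoryless exactly when the two printed numbers are equal.
    print(name, round(entropy(stationary(P)), 3), round(entropy_rate(P), 3))
```

For the uniform matrix both numbers are 2 bits, so knowing the previous letter buys nothing; for the sticky matrix the entropy rate falls below the 2 bits of its stationary distribution, which is what memory looks like in these terms.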
Brad finally replied this morning:

>Ok, I accept that it is possible to construct a matrix that has equal
>probabilities, but this is exactly what is termed a "zero memory source". I
>think it is better terminology as it specifies exactly what is being
>discussed (a more precise definition, if you like).

I appreciate the honesty here. A Markov matrix is different from what you thought and can be memoryless. DNA is memoryless when examined along the sequence axis: one cannot predict the next character from the last character. Thus DNA doesn't remember what the last letters are. In English, by contrast, I can be sure that if I see a 'q' the next letter will be 'u'. That is a system with memory. When DNA is examined along the generational axis, however, it has an exacting memory.
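The difference between a memoryless sequence and the English 'q' case can be estimated directly from data by looking at the entropy of whatever character follows a given one. A minimal sketch, assuming a uniformly random string as a hypothetical stand-in for DNA (real genomes have some local biases this ignores) and a short made-up English phrase:

```python
import math
import random
from collections import Counter

def successor_entropy(seq, symbol):
    """Empirical entropy (bits) of the character that follows `symbol` in `seq`."""
    followers = Counter(b for a, b in zip(seq, seq[1:]) if a == symbol)
    total = sum(followers.values())
    return sum(-(c / total) * math.log2(c / total) for c in followers.values())

# Hypothetical stand-in for DNA: letters drawn independently and uniformly.
rng = random.Random(1998)
dna = "".join(rng.choice("ACGT") for _ in range(100_000))

# A few English words: every 'q' is followed by 'u'.
english = "the queen quietly quoted a quaint question to the quartet"

print(round(successor_entropy(dna, "A"), 3))   # close to 2 bits: 'A' says nothing about the next base
print(successor_entropy(english, "q"))         # 0 bits: 'q' fully determines the next letter
```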

[examples of the two different dimensions of DNA (generational axis and sequence axis) snipped]

Brad replied this morning:

>I can accept what you are saying here but have some questions:

Once again, I appreciate the honesty.

>*Biology question*
>1. If the mutation does not occur in DNA replication then all it would do
>is create a different protein every now and then. How can this lead to
>cumulative mutations that in turn lead to macro evolution?

Because the novel protein has an effect on the organism, either beneficial or harmful. For instance, blood type A (I think it is; this is from memory) gives some type of resistance to syphilis, which is beneficial in some environments. Sickle cell anemia is caused by a protein change to hemoglobin. When you have 1 sickle cell gene and 1 normal gene, you have some resistance to malaria, which is very beneficial to those individuals. But if you have 2 sickle cell genes, you will probably die young. But since the benefit of malaria resistance is so great in malaria areas, the benefit to the population of the sickle cell gene is such that it becomes widespread. The benefit outweighs the harm. And there is a mutation in an Italian family, a change to a blood protein, that protects this family against the arterial damage high cholesterol normally causes. It is a very beneficial mutation which will probably spread around the world in the next few millennia.

Changes in proteins affecting development of the embryo also occur. And these changes also have either beneficial or harmful effects. Those that occur in an environment where the change is beneficial survive and reproduce. Those that don't, die. This is how macro evolution occurs. I would suggest reading the last chapter of Gilbert's Developmental Biology. It is fascinating and explains in some detail how this occurs. I would also say that the entire book is well worth reading.

>For evolution to happen the DNA itself must mutate, not an occasional glitch
>in the creation of proteins.

And that is what happens. DNA mutates along the generational axis.


>*Information Theory*
>A glitch in putting out a symbol on a random basis is exactly what I was
>talking about in my analogy to a CD copy. This is an "information channel"
>and as such any random variation on an information channel ALWAYS reduces
>the channels information carrying capacity.
>If you look at my previous post I definitely stated that I was refuting your
>information theory post on info theory grounds EVEN though I thought it was
>a wrong model as well as a wrong application.

Once again you are equivocating on the term 'information', using it in the sense of knowledge or intelligence rather than as a mathematically defined concept. See Greg Billock's post yesterday, or see Yockey's "Application of Information Theory to the Central Dogma and the Sequence Hypothesis," Journal of Theoretical Biology 46:369-406.

>The sequence axis is just as easily modeled by the information stream coming
>off the CD. It will mostly be the same thing over and over again with the
>occasional random error.

But this is where you misunderstand DNA. DNA does have repeats, but it is not per se 'the same thing over and over again.' The statistical distribution of nucleotides appears random. And you are treating the sequence axis as a source and receiver in your info theory approach below (and as you suggest above). That is not correct. DNA is the source; the protein or the next generation is the receiver.

>The info theory model of this is:
>I(A;B) = H(A) - H(A|B)
>Where I(A;B) is the mutual information of the channel.
>Here it can be seen that the maximum mutual information is obtained when
>H(A|B) is zero which is given only by a noiseless channel.
>ANY channel noise will ALWAYS REDUCE the mutual information.
>The mutual information is defined as the information provided by the

This cannot be modeled in the fashion you are attempting. Along the sequence axis, let us assume we have the sequence


and we want to know the next character in the sequence. Because DNA is memoryless, the sequence above has no bearing on what the next character is. There is no transmission of information to a 'receiver' at the next position. You must run a source/receiver calculation only between the DNA and the protein-manufacturing machinery (ribosomes?) or between the DNA of one generation and the next. But you can't run it along the sequence axis!!!!!
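For comparison, here is what the mutual-information formula looks like when it is applied where a source and receiver actually exist: the DNA of one generation (A) and the DNA of the next (B). A minimal sketch under stated assumptions: equally frequent bases, and a hypothetical copying model in which each base is miscopied with probability mu into one of the other three bases with equal chance. It computes I(A;B) = H(A) - H(A|B) from the joint distribution of parent and offspring base.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def mutual_information(p_a, channel):
    """I(A;B) = H(A) - H(A|B), using H(A|B) = H(A,B) - H(B).

    p_a[i] is the probability of sending symbol i; channel[i][j] is P(B=j | A=i).
    """
    n = len(p_a)
    joint = [[p_a[i] * channel[i][j] for j in range(n)] for i in range(n)]
    p_b = [sum(joint[i][j] for i in range(n)) for j in range(n)]
    h_a_given_b = entropy(x for row in joint for x in row) - entropy(p_b)
    return entropy(p_a) - h_a_given_b

def copying_channel(mu):
    """Hypothetical model: a base is miscopied with probability mu,
    becoming each of the other three bases with probability mu / 3."""
    return [[1 - mu if i == j else mu / 3 for j in range(4)] for i in range(4)]

uniform_bases = [0.25] * 4
for mu in (0.0, 0.01, 0.1, 0.75):
    print(mu, round(mutual_information(uniform_bases, copying_channel(mu)), 4))
```

In this toy model the parent/offspring mutual information falls from 2 bits per base at mu = 0 to zero at mu = 0.75, where the offspring base is independent of the parent's. The calculation is between generations only; nothing in it runs along the sequence axis.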


Brad wrote
>> >I don't know who Yockey is but this is what Engineers are taught
>> >the world over in relation to information theory since Shannon and Wiener
>> >invented it in 1948.
>> >


and I replied:
>> When you have familiarized yourself with Yockey's work, then you will be
>> ready to discuss the issue.

And today Brad wrote:
>Hmm, well I will read some of Yockeys work but I have read some parts of one
>of his books and it seemed to me that he does not necessarily hold your

He believes in evolution. He doesn't believe in the origin of life by means of a warm pond. I have had several e-mail exchanges with him about these issues.

>Maybe you will read some info theory texts?

I would be delighted to, but you must know that there are some significant differences between info theory as practiced by you EEs and what is applicable to biology. Biology has a major difference in its informational system: the genetic code is degenerate, or redundant. Up to 6 different DNA triplets code for the very same amino acid, and only two amino acids are coded for by a single DNA triplet. This degeneracy causes a loss of information between the DNA and the protein, and it changes the mathematics relative to your CD and computer applications. Since you haven't read Yockey, much of the math you are using does not take the degeneracy of the biological code into account.
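One way to put a rough number on that loss, as an illustrative sketch and not Yockey's own calculation: treat the codon as the input and the amino acid as the output of a deterministic, many-to-one mapping, so the information lost is H(codon) - H(amino acid). This assumes the standard genetic code and, unrealistically, that all 61 sense codons occur equally often.

```python
import math
from collections import Counter

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
# '*' marks the three stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def entropy(probs):
    """Shannon entropy in bits."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Assumption: every sense codon is equally likely (real codon usage is not uniform).
sense = [codon for codon, aa in CODE.items() if aa != "*"]
codons_per_aa = Counter(CODE[c] for c in sense)

h_codon = math.log2(len(sense))                                     # H(codon)
h_amino = entropy(n / len(sense) for n in codons_per_aa.values())  # H(amino acid)
print(len(sense), len(codons_per_aa))
print(round(h_codon, 3), round(h_amino, 3), round(h_codon - h_amino, 3))  # last value: bits lost per codon
```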

Brad wrote today:
>> >reasons why DNA is similar to a CD in terms of info theory:
>> >
>> >1. is a channel for encoded information.
I don't believe that the sequence axis is a channel in the sense you are using the term (DNA is not both source and receiver).

Brad wrote today:

>> >2. outputs a set sequence of codes repeatedly.
>> >3. can be replicated.
>> >4. random errors/mutations can occur in replication process.
>> >on what grounds do you object to this comparison?

See above. And there is one VERY important difference between reproduction of digits off a CD and DNA sequence reproduction. When CDs are made, one takes a master database and makes thousands of copies from that single master. The master copy makes thousands of 'babies'. When CDs are played, you use the same CD over and over again to produce the sound in the air. You don't record the sound on a new CD, destroy the first one, play the copy, record the sound, and repeat the foregoing over and over. CDs aren't made by copying copies in succession. But in living things, each generation gets a new copy from the copy its parents were given. Each generation of living creature is given a copy of a copy of a copy of a copy of the original DNA.
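The difference between copying from a master and copying a copy can be made quantitative. A minimal sketch, assuming a hypothetical symmetric substitution model (each copy changes a base to one of the other three with probability mu, in the style of the Jukes-Cantor model) and a made-up error rate: it tracks the chance that a site still matches the original after n serial copies, and checks it against the closed form 1/4 + 3/4 (1 - 4 mu / 3)^n.

```python
def identity_after_serial_copies(mu, generations):
    """Probability that a site still matches the original after `generations`
    copy-of-a-copy steps, under the hypothetical symmetric substitution model."""
    # Distribution of the base at one site; the original base is index 0.
    dist = [1.0, 0.0, 0.0, 0.0]
    for _ in range(generations):
        dist = [sum(dist[i] * ((1 - mu) if i == j else mu / 3) for i in range(4))
                for j in range(4)]
    return dist[0]

def identity_copied_from_master(mu):
    """Every copy is made directly from the master, so errors never accumulate."""
    return 1 - mu

mu = 0.001  # hypothetical per-copy error rate
for n in (1, 10, 100, 1000):
    serial = identity_after_serial_copies(mu, n)
    closed_form = 0.25 + 0.75 * (1 - 4 * mu / 3) ** n
    print(n, round(serial, 4), round(closed_form, 4), identity_copied_from_master(mu))
```

After one copy the two schemes agree; after that, only the copy-of-a-copy lineage keeps drifting away from the original, toward the 1/4 match rate of unrelated sequences.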

Brad wrote today:
>CD's are still like DNA. You can consider a CD in both sequential and
>generational modes, that is why a CD is so good an analogy. You actually
>strengthen the analogy by bringing up another axis in which the CD is
>similar to DNA.
>Still haven't actually given any decent reasons on this one....

Because you use the CD analogy in an equivocation between information used as 'knowledge' or 'intelligence' and information as defined by H = -k sum_i p_i log(p_i). My son told me that in his branch of EE, which is sound synthesis and acoustical engineering, the sequence that has the most information is the sequence with the whitest spectrum, i.e., one with components at all frequencies. But he is quick to add that you probably couldn't sell a lot of CDs with white noise on them. That is the difference between information defined mathematically and information used colloquially, as you are doing.
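To make the mathematical definition concrete, here is a minimal sketch of H with k = 1 and base-2 logarithms, estimated from single-symbol frequencies (that is, assuming a zero-memory model). The sequences are made-up examples: one letter repeated, a skewed random mix, and uniformly random "noise".

```python
import math
import random
from collections import Counter

def entropy_per_symbol(seq):
    """H = -sum_i p_i log2 p_i over the symbol frequencies of seq (k = 1, base-2 logs)."""
    counts = Counter(seq)
    total = len(seq)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

rng = random.Random(0)
monotone = "A" * 10_000                                                 # one symbol repeated: no surprise
skewed = "".join(rng.choices("ACGT", weights=[7, 1, 1, 1], k=10_000))  # mostly 'A'
noise = "".join(rng.choice("ACGT") for _ in range(10_000))             # all symbols equally likely

for name, s in [("monotone", monotone), ("skewed", skewed), ("noise", noise)]:
    print(name, round(entropy_per_symbol(s), 3))
```

The repeated letter scores zero and the uniform noise scores very nearly the maximum of 2 bits per symbol, which is the sense in which the "whitest" sequence carries the most information, regardless of whether anyone would want to listen to it.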

[I just noticed that Greg gave you some more reasons for the CD analogy not working.]

Brad wrote the other day:
>> >If it is only the fact that mutations (in your opinion) add information
>> >where errors in CD replication do not. Well, this is precisely
>> >the point being debated and so it is obviously not a valid argument.

Mutations in a CD drive the CD toward a whiter spectrum, i.e., they add information. They destroy meaning and listening enjoyment, but information theory has nothing to do with meaning, as Greg Billock pointed out yesterday.

I wrote:
>> Actually if you mutate the digits on a CD some of the mutations will add
>> information and some will remove information. Both those that add and
>> those that subtract may remove the message, but the informational content
>> of a sequence is not the same as the message content.
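Both halves of that claim can be illustrated with the same per-symbol entropy measure (a zero-memory estimate, which is an assumption; other models give other numbers). A minimal sketch with made-up sequences: one substitution raises the entropy, another lowers it, and many random substitutions applied to a one-letter sequence push it toward the 2-bit maximum.

```python
import math
import random
from collections import Counter

def entropy_per_symbol(seq):
    """Per-symbol Shannon entropy in bits from symbol frequencies."""
    counts = Counter(seq)
    return sum(-(c / len(seq)) * math.log2(c / len(seq)) for c in counts.values())

def mutate(seq, pos, base):
    """Return seq with the symbol at position pos replaced by base."""
    return seq[:pos] + base + seq[pos + 1:]

# One substitution can raise the per-symbol entropy...
before_up, after_up = "AAAT", mutate("AAAT", 1, "C")        # AAAT -> ACAT
# ...and another can lower it.
before_down, after_down = "ACGT", mutate("ACGT", 1, "A")    # ACGT -> AAGT
print(entropy_per_symbol(before_up), entropy_per_symbol(after_up))
print(entropy_per_symbol(before_down), entropy_per_symbol(after_down))

# Many random substitutions applied to a one-letter sequence push it toward 2 bits per base.
rng = random.Random(42)
seq = "A" * 1000
for _ in range(5000):
    seq = mutate(seq, rng.randrange(len(seq)), rng.choice("ACGT"))
print(round(entropy_per_symbol(seq), 3))
```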

Brad replied today:
>No, it will never increase information.

Once again you are making a fundamental error in information theory. See Greg Billock's post if you won't believe me.

>Do I have to prove this or will you just accept my word and use common sense?

It is your colloquial usage of the word 'information' rather than the mathematical expression of information that is causing you the problem.


I wrote:
>> And you haven't heard of Yockey????????

Brad replied today:
>I am an Engineering Student, we don't study Yockey :P

>I do not claim to be an expert on biology, however I DO claim a good
>knowledge of information theory. I finished exams on this just a few weeks
>ago so this stuff is very fresh in my mind.

If you don't study Yockey, then you would be unaware of how the degeneracy of the genetic code affects the mathematics you use with the simpler, non-degenerate codes widely used in Electrical Engineering. In point of fact, you guys don't like degeneracy because it introduces uncertainty. But living systems use degeneracy as a means of error correction and error avoidance.
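The error-avoidance side of degeneracy can be illustrated with a quick count: of all single-base substitutions in the 61 sense codons, how many leave the amino acid unchanged? A minimal sketch, assuming the standard genetic code and treating every substitution as equally likely (no codon-usage or mutation-bias weighting).

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

total = 0
synonymous_by_position = [0, 0, 0]  # silent changes at codon positions 1, 2, 3
for codon, aa in CODE.items():
    if aa == "*":
        continue  # only consider changes to the 61 sense codons
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            total += 1
            if CODE[mutant] == aa:
                synonymous_by_position[pos] += 1

print(total, synonymous_by_position, round(sum(synonymous_by_position) / total, 3))
```

Roughly a quarter of all possible point changes are silent, and almost all of them fall on the third codon position, which is where the redundancy of the code is concentrated.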

>Yockey states
>> "One must know the language in which a word is being used: 'O singe
>> fort' may be read in French or German with entirely different meanings. The
>> reader may find it amusing to list all the words in languages that he knows
>> that are spelled the same but have different meanings. For example a
>> German-speaking visitor to the United States might have all his suspicions
>> about America confirmed when he finds that there is a Gift shop in every
>> airport, hotel and shopping center." The message 'mayday, mayday' may mean
>> a distress signal, a Bolshevik holiday or a party for children in the
>> spring, all depending on the context.

Brad replied today:
>Sure they have different meanings. But they also have different information
>contents when analysed in terms of the language model.

Actually the sequence 'gift' meaning 'poison' in German and 'gift' meaning 'gift' in English have identical informational content as isolated sequences. Sure if you put them in a sentence the sentences would have different information content. But then you would be measuring the information content of the sentence containing 'gift' NOT the sequence 'gift'.

>Information theory does not ascribe meaning to information. It does however
>ascribe NO MEANING to any randomness or noise. Do you understand this?

Do you understand that information theory doesn't deal with meaning?????? The sequence transmitted by one system to another is all that matters. An astronomer listening to the hiss of a radio frequency in space uses the same equations to govern the processing of his static as the radio engineer down the street uses to broadcast "Inagaddavida" or however you spell that infernal song title. (That song might not have made it to Australia.)

Brad wrote today:
>Did you know it is possible to achieve better compression on a text if you
>know what language it is? This shows that better models lead to more
>accurate analysis of the information content.

Yes, I did. But algorithmic complexity is not the best measure of relative information. While one might find a short algorithm with which to compress a sequence, one can never be sure that that algorithm is the SHORTEST one possible. (Brian might want to make a comment here; he knows more about algorithmic complexity than I do.)
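A practical illustration of that limit, offered as a sketch only: a general-purpose compressor such as zlib (standing in here for "a short algorithm", which is an assumption) gives a compressed length, but that length is only an upper bound on the algorithmic complexity. A cleverer program might always describe the same data in fewer bytes. The sequences below are made-up examples.

```python
import random
import zlib

rng = random.Random(7)
repetitive = ("ACGT" * 2500).encode()                                   # 10,000 bases, one short motif
random_seq = "".join(rng.choice("ACGT") for _ in range(10_000)).encode()

for name, data in [("repetitive", repetitive), ("random", random_seq)]:
    compressed = zlib.compress(data, 9)
    # len(compressed) bounds the algorithmic complexity from above only:
    # nothing here proves that a shorter description does not exist.
    print(name, len(data), len(compressed))
```

The repetitive sequence shrinks to a few dozen bytes while the random one stays near 2 bits per base, but neither number is guaranteed to be the shortest possible description.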


I wrote, quoting Yockey:

>> "Information theory shows that it is fundamentally undecidable whether a
>> given sequence has been generated by a stochastic process or by a highly
>> organized process. This is in contrast with the classical law of the
>> excluded middle (tertium non datur), that is, the doctrine that a
>> statement
>> or theorem must be either true or false. Algorithmic information theory
>> shows that truth or validity may also be indeterminate or fundamentally
>> undecidable."~Hubert Yockey, Information Theory and Molecular Biology,
>> (Cambridge: Cambridge University Press, 1992), p. 81-82.
>> This last means that you cannot possibly tell whether a given sequence has
>> MEANING created by a HUMAN or whether it is random gibberish. Show me the
>> algorithm that will tell these two apart. You can't and no one else can.
>> This is not my opinion but Yockey's.

Brad replied:
>This is true, but I don't see how it applies to your argument.

Because you keep confusing "meaning" with the mathematical definition of information. If mathematically you can't tell random gibberish apart from Marc Antony's "Friends, Romans, countrymen" speech, then mathematically both sequences must look the same on a statistical level. If they didn't look the same, then you could tell them apart. Thus sequences with lots of meaning have lots of information, but random sequences with no meaning whatsoever also have lots of information. The only way humans tell random gibberish from a language is by prior assent of all members of the language group. Look at alien languages spoken in movies. They sound like gibberish and are gibberish unless you know the a priori convention. But the a priori convention is not held in the sequence of 'gibberish' itself, any more than the dictionary definitions of the words I am writing are given in the above paragraph!

That is why you can't tell whether I am writing real Mandarin Chinese (pinyin) below or real gibberish.

Ni xue yao xuexi hen duo!

Brad wrote today:
>As I said above. The better we know the source the closer we can model the
>information content. Example given was if we know what language a text is we
>can achieve better compression on it.
>Therefore if we have no knowledge of a source then yes, we cannot tell them
>apart. HOWEVER this is NOT the TRUE information content, this is just the
>model we are using.
>The better the model the closer it is to the TRUE information content. And
>if we know that one source is gibberish we can just turn it off and ignore
>Therefore it follows that if we know that mutations are caused by RANDOM
>mutations then we can confidently say they do not add information.
>If on the other hand the mutations are not random in origin then you have a
>valid argument. I take this as another indicator of intelligent design.
>Here is an exam question from a 1997 exam paper given by the E&EE department
>at UWA:
>Do you agree or disagree with the following statement?
>"Information theory is pure nonsense! Noise is usually modelled as a random
>source and a random source contains the most information since all symbols
>are equiprobable. Thus the most informative information source is noise."
>Post what you think the correct response to this and your reasons why....

Have you ever heard the term 'fallacy of equivocation'? That is what the above is.

>Hint: read what I have said above.

Hint: read Greg Billock's post from Sunday.

I wrote:
>> I absolutely agree that randomness would korupt thu missage won wunts ta
>> sind. But by making new arbitrary definitions about the meanings
>> of a word,
>> the language evolves. Meaning is not information by the definition of
>> information theory. Why do you think we of English descent don't still
>> speak Latin? Mutations to the spoken language were given new arbitrary
>> definitions. The language mutated but it wasn't destroyed.
>> Meaning is not the same as information.

Brad replied today:

>Language changes, yes. Does it gain more information? NO
>Is modern english more informative than Latin?

Actually, if you measure the information content in a sequence containing all the words of a given language laid end to end, then yes, English, having VASTLY MORE words than Latin, is more informative than Latin. We have many more nouns in modern English than there were in Latin. The Romans didn't have terms like 'information theory,' 'Electrical Engineering,' 'Quantum Mechanics,' etc., ad nauseam.

Brad wrote today:

>I would very much enjoy being able to debate this with someone more
>knowledgeable than you on this topic (info theory), so please invite your son
>to join (or mail me privately if he doesn't want to join the group)

Now, let me get this straight. You didn't know who Yockey was; originally you said you hadn't read any of his articles; you didn't understand that a Markov transition matrix could be constructed which didn't rely on the previous character (you admitted that I was correct); you didn't know how DNA affects proteins and then the survival of the organism, leading to macro evolution; you didn't know what a memoryless system was; you are using mathematics for a non-degenerate code; and you continue to equivocate between information used colloquially and information as defined mathematically (as pointed out to you by two of us). And you think I am not knowledgeable??? There is a word for this, but I would rather not use it.