RE: Information: Brad's reply (was Information: a very

Brad Jones (
Mon, 29 Jun 1998 15:17:15 +0800

Brad Jones writes:

> At 05:06 PM 6/27/98 +0800, Brad Jones wrote:
> >>>------------------------------------------------------------
> >>>Glenn,
> >>>A DNA sequence of AAAAATAAAA will output this each and every
> >>>
> >>>A zero memory source with the probabilities given would produce
> >>
> >>
> >>I believe that you are mixing the way memory works in such systems. Zero
> >>memory usually applies to a Markov chain which doesn't use the previous
> >>character to determine the next.
> >
> >A Markov source is defined as a source where the next symbol is dependent
> >on one or more previous symbols. A zero memory source is the opposite of
> >this. It is therefore impossible by definition to have a Markov source
> >where the next symbol is not dependent on the previous.
> >
> This is not correct. It is indeed possible to have a Markov chain which
> does not depend upon the previous symbol. It is a special case of Markov
> matrices. A Markov chain is a probability matrix in which the state of
> some system is sequentially followed by other states with a given
> probability. For instance in English the letter q is followed by u with a
> probability of 100%. The matrix representing the change from state q to
> the next letter would look like:
> from  \  To state
> state  \  a  b  c ... t  u  v ... z
>   q    |  0  0  0 ... 0  1  0 ... 0
> To simplify things, consider the 4 nucleotides, A, T, C, G. In the DNA
> sequence one can set up a probability matrix
> from  \  To next symbol in sequence
> state  \   A    T    C    G
>   A    |  a1   a2   a3   a4
>   T    |  t1   t2   t3   t4
>   C    |  c1   c2   c3   c4
>   G    |  g1   g2   g3   g4
> Where a1+a2+a3+a4 =1; t1+t2+t3+t4=1; etc. This is because A must be
> followed by some letter so the sum of the probabilities must be 1.
> If you have a transition probability matrix (Markov matrix) with values
> (in percent) like:
> from  \  To next symbol in sequence
> state  \   A    T    C    G
>   A    |   3    7   55   35
>   T    |  15   23   12   50
>   C    |  67   10    3   20
>   G    |  44   32    5   18
> The presence of the state C is highly correlated with the next letter
> being A. Conversely, in this case the presence of A is highly correlated
> with the next letter being C. Such a transition matrix should give lots of
> ACACACAC's. The system has memory and remembers what the last character
> was and behaves accordingly.
> But by definition "The persistence of memory is a function of the
> determinism of the system, because a completely deterministic system
> 'remembers' forever, whereas a random system has no memory at all." John
> C. Davis, Statistics and Data Analysis in Geology (New York: John Wiley,
> 1973), p. 285
> What is a random system referred to by Davis above? It is a Markov
> probability matrix with the following components:
> from  \  To next symbol in sequence
> state  \   A    T    C    G
>   A    |  25   25   25   25
>   T    |  25   25   25   25
>   C    |  25   25   25   25
>   G    |  25   25   25   25
> Any letter is equally likely to be followed by any other letter. That is
> what Yockey was saying in Brian Harper's post. The above transition
> probability matrix is a special Markov matrix with absolutely no memory of
> the previous state because it doesn't matter what the previous letter is,
> the next letter is randomly selected from the 4 possibilities. It is
> mathematically a Markov matrix.

Ok, I accept that it is possible to construct a matrix that has equal
probabilities, but this is exactly what is termed a "zero memory source". I
think that is the better term, as it specifies exactly what is being
discussed (a more precise definition, if you like).
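
As a minimal sketch of the two quoted matrices (Python; the names
CORRELATED, UNIFORM, has_memory and generate are hypothetical, chosen only
for illustration), a source is zero memory exactly when every row of its
transition matrix is the same, so the current symbol tells you nothing
about the next one:

```python
import random

SYMBOLS = "ATCG"

# Rows are "from" states, columns are "to" states (A, T, C, G), in percent,
# copied from the quoted matrices. The G row sums to 99 as given;
# random.choices treats the values as relative weights, so that is harmless.
CORRELATED = {
    "A": [3, 7, 55, 35],
    "T": [15, 23, 12, 50],
    "C": [67, 10, 3, 20],
    "G": [44, 32, 5, 18],
}
UNIFORM = {s: [25, 25, 25, 25] for s in SYMBOLS}


def has_memory(matrix):
    """True if the next-symbol distribution depends on the current symbol."""
    rows = [tuple(matrix[s]) for s in SYMBOLS]
    return any(row != rows[0] for row in rows)


def generate(matrix, length, start="A", seed=0):
    """Walk the chain: each new symbol is drawn from the row of the last one."""
    rng = random.Random(seed)
    seq = [start]
    while len(seq) < length:
        seq.append(rng.choices(SYMBOLS, weights=matrix[seq[-1]])[0])
    return "".join(seq)


if __name__ == "__main__":
    print("correlated has memory:", has_memory(CORRELATED))
    print("uniform has memory:   ", has_memory(UNIFORM))
    print(generate(CORRELATED, 40))
    print(generate(UNIFORM, 40))
```

Whether one calls the uniform case a "Markov matrix" or a "zero memory
source", the check is the same: identical rows.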

> When you stated above in your original post:
> >>>A DNA sequence of AAAAATAAAA will output this each and every
> >>>
> >>>A zero memory source with the probabilities given would produce
> I was puzzled. I think I now know what you are mixing up. A DNA sequence
> in a reproductive system has 2 dimensions. There is the dimension of the
> sequence itself and on the other axis is the dimension of what is passed
> on to the offspring in the next generation. It is the generation axis.
> [Diagram: generation axis (vertical, generations 1 to 5) against the
> sequence axis (horizontal)]
> When you state that the DNA sequence will output this each and every time
> you are referring to the generational axis. What is passed from parent to
> offspring. It would look like:
> [Diagram: generation axis (vertical) against the sequence axis
> (horizontal)]
> But in our original notes on information theory, both Brian and I were
> talking about the Sequence axis. Information is measured along the
> sequence axis, not per se the generational axis.
> Thus when I pointed out that the sequence AAAAAAAAAA had zero information
> content, and the mutation to AAAAATAAAA represented an increase in
> information it does because we are not talking about the generation axis.
> But even putting it into your terminology, the output (generational axis)
> of the DNA sequence AAAAAAAAAA is not always AAAAAAAAAA but occasionally is
> AACAAAAAAA or AAAAATAAAA. There is a generational Markov matrix that is
> something like:
> from  \  To next symbol in next generation
> state  \       A            T            C            G
>   A    |  .9999999997  .0000000001  .0000000001  .0000000001
>   T    |  .0000000001  .9999999997  .0000000001  .0000000001
>   C    |  .0000000001  .0000000001  .9999999997  .0000000001
>   G    |  .0000000001  .0000000001  .0000000001  .9999999997
> This generational transition matrix allows the next generation to avoid
> very many mutations. But along the sequence axis, not the generational
> axis, the Markov matrix is such that each letter is independent of the
> previous choice or:
> from  \  To next symbol in sequence
> state  \   A    T    C    G
>   A    |  25   25   25   25
>   T    |  25   25   25   25
>   C    |  25   25   25   25
>   G    |  25   25   25   25
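
One way to read the quoted claim that AAAAAAAAAA carries zero information
while AAAAATAAAA carries more is as the empirical zero-memory entropy
measured along the sequence axis. A minimal Python sketch under that
assumption (the function name is hypothetical, and this is not necessarily
the exact measure Yockey uses):

```python
from collections import Counter
from math import log2


def entropy_per_symbol(seq):
    """Empirical Shannon entropy in bits per symbol, ignoring symbol order."""
    n = len(seq)
    return sum(-(c / n) * log2(c / n) for c in Counter(seq).values())


if __name__ == "__main__":
    for seq in ("AAAAAAAAAA", "AAAAATAAAA"):
        print(seq, round(entropy_per_symbol(seq), 3), "bits/symbol")
```

On this measure the all-A sequence scores 0 bits per symbol and the single
A-to-T change raises it to about 0.469. Note that this is a property of the
sequence taken as a source; it says nothing about how faithfully the
sequence is copied from one generation to the next.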

I can accept what you are saying here but have some questions:

*Biology question*
1. If the mutation does not occur in DNA replication then all it would do
is create a different protein every now and then. How can this lead to
cumulative mutations that in turn lead to macro evolution?

For evolution to happen the DNA itself must mutate, not an occasional glitch
in the creation of proteins.

This is what I assumed you were talking about as it is what could lead to
change in the organism.

I would like your opinion on this as I am certainly no expert on biology.

*Information Theory*
A glitch in putting out a symbol on a random basis is exactly what I was
talking about in my analogy to a CD copy. This is an "information channel"
and as such any random variation on an information channel ALWAYS reduces
the channel's information carrying capacity.

If you look at my previous post I definitely stated that I was refuting your
information theory post on info theory grounds EVEN though I thought it was
a wrong model as well as a wrong application.

The sequence axis is just as easily modeled by the information stream coming
off the CD. It will mostly be the same thing over and over again with the
occasional random error.

The info theory model of this is:

I(A;B) = H(A) - H(A/B)

Where I(A;B) is the mutual information of the channel.

Here it can be seen that the maximum mutual information is obtained when
H(A/B) is zero which is given only by a noiseless channel.

ANY channel noise will ALWAYS REDUCE the mutual information.
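
A minimal Python sketch of this formula, assuming a symmetric channel over
the four nucleotides with equiprobable input, where a symbol is received
correctly with probability 1 - e and otherwise replaced by one of the other
three with equal probability. The error rates and function names are
hypothetical, for illustration only:

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def mutual_information(error_rate, n_symbols=4):
    """I(A;B) = H(A) - H(A/B) for a symmetric channel with uniform input."""
    h_a = log2(n_symbols)
    # With uniform input on a symmetric channel, the equivocation H(A/B)
    # equals the entropy of a single row of the channel matrix.
    row = [1 - error_rate] + [error_rate / (n_symbols - 1)] * (n_symbols - 1)
    return h_a - entropy(row)


if __name__ == "__main__":
    for e in (0.0, 1e-9, 0.01, 0.1, 0.75):
        print(f"error rate {e:g}: I(A;B) = {mutual_information(e):.6f} bits")
```

With no noise the channel passes the full 2 bits per symbol; any nonzero
error rate gives less, and at an error rate of 0.75 the output is
independent of the input and the mutual information is zero.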

The mutual information is defined as the information provided by the

> >>>
> >>You are using a 10th degree Markov chain and that is not what DNA is.
> >>Brian would you care to comment on this?
> >
> >I just proved that it is at least a 10th order and you claim it isn't
> >with no supporting evidence (let alone mathematical analysis!) ?!?
> >
> Actually you mixed up the generational axis with the sequence axis. You
> haven't proved it yet.

I took account of both in my previous posts, I actually stated that I
thought you were analyzing it using the wrong model (source instead of

When I was discussing the information source I was merely refuting your
incorrect use of the information theory formula.

> >>A zero memory Markov chain is what a random sequence is.
> >
> >There is no such thing as a zero memory Markov chain. But a random
> >sequence is a zero memory source, this has no relevance to the topic
> >though.
> >
> See above. Are you trying to say that equal probabilities for all choices
> is not a matrix? Or are you saying that the name of the matrix isn't
> Markov? If the latter then you are really playing a semantic game.
> >>Maybe you should tell this to Hubert Yockey. But I don't think he would
> >>agree with you. By the way, DNA is not like a CD. There are mutations in
> >>DNA and they can add information.
> >
> >I don't know who Yockey is but this is what Engineers are taught the
> >world over in relation to information theory since Shannon and Wiener
> >invented it in 1948.
> >
> Yockey is only one of the leading figures of information theory as applied
> to biology. Your lack of familiarity with him shows that you haven't
> looked at the biological problem very closely. Here are some of Yockey's
> publications:
> 1956 "An application of information theory to the physics of tissue
> damage," Radiation Research 5:146-155
> 1958 "A study of aging, thermal killing and radiation damage by
> information theory," Symposium on Information Theory in Biology, ed.
> H. P. Yockey, R. Platzman and H. Quastler, pp. 297-316
> 1974 "An application of information theory to the Central Dogma and the
> sequence hypothesis," Journal of Theoretical Biology 46:369-406--a must
> read
> 1977 "A prescription which predicts functionally equivalent residues at
> given sites in protein sequences," Journal of Theoretical Biology
> 67:337-343
> 1977 "On the information content of cytochrome c," Journal of Theoretical
> Biology 67:345-376
> 1977 "A calculation of the probability of spontaneous biogenesis by
> information theory," Journal of Theoretical Biology 67:377-398--This is
> one that will fit into your father's preconceptions.
> 1978 "Can the Central Dogma be derived from information theory?" Journal
> of Theoretical Biology 74:149-152
> 1979 "Do overlapping genes violate molecular biology and the theory of
> evolution?" Journal of Theoretical Biology 80:21-6
> 1981 "Self organization origin of life scenarios and information theory,"
> Journal of Theoretical Biology 91:13-31
> Information Theory and Molecular Biology (New York: Cambridge University
> Press, 1992)
> When you have familiarized yourself with Yockey's work, then you will be
> ready to discuss the issue.

Hmm, well I will read some of Yockey's work but I have read some parts of one
of his books and it seemed to me that he does not necessarily hold your

Maybe you will read some info theory texts?

> >reasons why DNA is similar to a CD in terms of info theory:
> >
> >1. is a channel for encoded information.
> >
> >2. outputs a set sequence of codes repeatedly.
> >
> >3. can be replicated.
> >
> >4. random errors/mutations can occur in replication process.
> >
> >on what grounds do you object to this comparison?
> You are using the generational axis and we were using the sequence axis in
> our calculation of information. That is why your analogy is flawed.

CDs are still like DNA. You can consider a CD in both sequential and
generational modes, that is why a CD is so good an analogy. You actually
strengthen the analogy by bringing up another axis in which the CD is
similar to DNA.

Still haven't actually given any decent reasons on this one....

> >
> >If it is only the fact that mutations (in your opinion) add information
> >where errors in CD replication do not, well, this is precisely the point
> >being debated and so it is obviously not a valid argument.
> Actually if you mutate the digits on a CD some of the mutations will add
> information and some will remove information. Both those that add and
> those that subtract may remove the message, but the informational content
> of a sequence is not the same as the message content.

No, it will never increase information.

Do I have to prove this or will you just accept my word and use common

> >
> >Do you have any objections actually based on why they are different in
> >relation to information theory?
> See my last paragraph. Familiarize yourself with Yockey's work!!!!!

Hmm, that is not a reason.....

Maybe if Yockey gives reasons why, you could post them?

> >
> >Is anyone interested what a realistic analysis of the problem would show
> >if done correctly as an information channel?
> >
> >>>The mutations of DNA seem analogous to the errors encountered
> >>>when copying a CD which is quite easily modeled by a correct
> >>>application of information theory.
> >>>
> >>>By doing it this way it is possible to model the random mutations and
> >>>the effect they have on the information, ie the difference they make to
> >>>the information content as opposed to the actual information content.
> >>>The measure of this is called the mutual information of a channel.
> >>>
> >>>I hope this clears it up somewhat, it is quite difficult to explain
> >>>this in easy terms and I would recommend finding a good textbook if you
> >>>really want to pursue this.
> >>
> >>
> >>I was about to make the same recommendation to you.
> >
> >Sorry Glenn but I know what I am talking about here.
> And you haven't heard of Yockey????????

I am an Engineering Student, we don't study Yockey :P

I do not claim to be an expert on biology, however I DO claim a good
knowledge of information theory. I finished exams on this just a few weeks
ago so this stuff is very fresh in my mind.

> >Further more you have not refuted my calculations with anything but your
> >personal opinion which does not seem to be based on any knowledge of the
> >material you are discussing.
> >
> >You misunderstand the basics of information theory if you think that
> >random noise consists of or can create information in any form whatsoever.
> You are equivocating on the word 'information' as meaning semantics. My
> son is a EE and you guys talk about fidelity of the MESSAGE being conveyed.
> But that is not informational content of a sequence. Yockey states
> "One must know the language in which a word is being used: 'O singe
> fort' may be read in French or German with entirely different meanings.
> The reader may find it amusing to list all the words in languages that he
> knows that are spelled the same but have different meanings. For example a
> German-speaking visitor to the United States might have all his suspicions
> about America confirmed when he finds that there is a Gift shop in every
> airport, hotel and shopping center." The message 'mayday, mayday' may mean
> a distress signal, a Bolshevik holiday or a party for children in the
> spring, all depending on the context.

Sure they have different meanings. But they also have different information
contents when analysed in terms of the language model.

Information theory does not ascribe meaning to information. It does however
ascribe NO MEANING to any randomness or noise. Do you understand this?

Did you know it is possible to achieve better compression on a text if you
know what language it is? This shows that better models lead to more
accurate analysis of the information content.

An example of this is as follows:

If we compress text we can find a general information content of 4.75 bits
per symbol.

BUT if we know that the text is going to be English we can refine this to
3.32 bits per symbol.

see: Abramson N, "Information theory and coding", McGraw Hill 1963.
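
The 4.75 figure matches log2(27), i.e. 27 equiprobable symbols (26 letters
plus space). A minimal Python sketch of how a better model lowers the
estimate: it compares the uniform model, a zero-memory model built from
symbol frequencies, and a model that also uses the previous symbol. The
sample text and function names are hypothetical, and a sample this short
will not reproduce the published figures:

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Shannon entropy in bits of the empirical distribution in `counts`."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def model_entropies(text, alphabet_size=27):
    """Bits per symbol under three increasingly informed source models."""
    pairs = list(zip(text, text[1:]))
    h_uniform = log2(alphabet_size)                   # all symbols equiprobable
    h_zero_memory = entropy(Counter(b for _, b in pairs))  # symbol frequencies
    h_previous = entropy(Counter(a for a, _ in pairs))
    # H(next | previous) = H(previous, next) - H(previous)
    h_markov = entropy(Counter(pairs)) - h_previous
    return h_uniform, h_zero_memory, h_markov


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog and the cat sat on the mat"
    for name, h in zip(("uniform", "zero memory", "first order"),
                       model_entropies(sample)):
        print(f"{name:12s} {h:.3f} bits/symbol")
```

Each step down reflects knowledge about the source built into the model,
not a change in the text itself.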

> ...
> "The examples cited above show that the meaning of a sequence of symbols
> in natural languages is subject to the arbitrary agreement between source
> and receiver. This question of meaning is best left to philosophers,
> linguists and semanticists. The communications engineer, in designing his
> equipment, need not concern himself with the meaning of the sequence of
> symbols. Indeed, no humans may be involved. The message may be, for
> example, a computer communicating data to another machine or a spacecraft
> sending data from which pictures of other planets will be made."
> "A great deal of arbitrariness is also found in the sequences that carry
> specificity in a protein, as we shall see in Chapter 6. In fact, as I
> shall show in Chapter 9, there are 9.737 x 10^93 iso-1-cytochrome c
> sequences that differ in at least one amino acid, each carrying the same
> specificity. Like a good communications system, the genetic information
> storage and transmission apparatus is independent of the specificity of
> the genetic messages. It deals with specificity of the genetic messages
> only through the information those messages carry." Yockey, Information
> Theory and Molecular Biology, p. 59

Information theory ALWAYS works better when we know what is being sent. A
good engineer always finds out the most he can about the source so as to
model it as closely as possible.

As shown above, knowing what language a text is in allows us to design a
better transmission system.

> And
> "Information theory shows that it is fundamentally undecidable whether a
> given sequence has been generated by a stochastic process or by a highly
> organized process. This is in contrast with the classical law of the
> excluded middle (tertium non datur), that is, the doctrine that a
> statement or theorem must be either true or false. Algorithmic information
> theory shows that truth or validity may also be indeterminate or
> fundamentally undecidable."~Hubert Yockey, Information Theory and
> Molecular Biology, (Cambridge: Cambridge University Press, 1992), p. 81-82.
> This last means that you cannot possibly tell whether a given sequence has
> MEANING created by a HUMAN or whether it is random gibberish. Show me the
> algorithm that will tell these two apart. You can't and no one else can.
> This is not my opinion but Yockey's.

This is true, but I don't see how it applies to your argument.

As I said above, the better we know the source, the more closely we can
model the information content. The example given was that if we know what
language a text is in, we can achieve better compression on it.

Therefore if we have no knowledge of a source then yes, we cannot tell them
apart. HOWEVER this is NOT the TRUE information content, this is just the
model we are using.

The better the model the closer it is to the TRUE information content. And
if we know that one source is gibberish we can just turn it off and ignore

Therefore it follows that if we know that mutations are RANDOM in origin
then we can confidently say they do not add information.

If on the other hand the mutations are not random in origin then you have a
valid argument. I take this as another indicator of intelligent design.

Here is an exam question from a 1997 exam paper given by the E&EE department
at UWA:
Do you agree or disagree with the following statements?

"Information theory is pure nonsense! Noise is usually modelled as a random
source and a random source contains the most information since all symbols
are equiprobable. Thus the most informative information source is noise."

Post what you think the correct response to this is and your reasons why....

Hint: read what I have said above.

> >The fact that the next symbol is random does not imply in any way that it
> >is random values that are being produced. You should look into source
> >coding to see what is actually meant by the symbol probabilities of a
> >source.
> >
> >Information theory uses words such as random in a very different way than
> >what a layman means. For example, if you are sending English down your
> >modem then I would model that as a random source, but it is NOT random
> >noise that is being sent, it is English text that any randomness would
> >corrupt not enhance. In fact, as common sense suggests, any random
> >modification of the signal will ALWAYS reduce the information.
> I absolutely agree that randomness would korupt thu missage won wunts ta
> sind. But by making new arbitrary definitions about the meanings of a
> word, the language evolves. Meaning is not information by the definition
> of information theory. Why do you think we of English descent don't still
> speak Latin? Mutations to the spoken language were given new arbitrary
> definitions. The language mutated but it wasn't destroyed. Meaning is not
> the same as information.

Language changes, yes. Does it gain more information? NO

Is modern English more informative than Latin?

> >
> >Additional information on Markov sources can be found in any mathematics
> >book that deals with random processes (this is quite heavy going if you
> >are not up on your probability theory).
> Let's not start the ad hominem attacks again.
> >--------------------------------------------
> >Brad Jones
> >3rd Year BE(IT)
> >Electrical & Electronic Engineering
> >University of Western Australia
> BTW, Stephen, if we want to play your son against mine, my MS electrical
> engineer son has no problem with what I am saying.

I would very much enjoy being able to debate this with someone more
knowledgeable than you on this topic (info theory), so please invite your son
to join (or mail me privately if he doesn't want to join the group)

Brad Jones
3rd Year BE(IT)
Electrical & Electronic Engineering
University of Western Australia