RE: Information: Brad's reply (was Information: a very

Brad Jones (
Tue, 30 Jun 1998 13:19:49 +0800

> On Mon 29 Jun 1998 15:17:15 Brad Jones writes:
> I wrote:
> > >>>I believe that you are mixing the way memory works in such systems. Zero
> > >>>memory usually applies to a Markov chain which doesn't use the previous
> > >>>character to determine the next.
> And
> Brad replied,
> > >
> > >>A Markov source is defined as a source where the next symbol is dependent
> > >>on one or more previous symbols. A zero memory source is the opposite of
> > >>this. It is therefore impossible by definition to have a Markov
> > >>source where the next symbol is not dependent on the previous.
> > >
> Then I wrote:
> > >This is not correct. It is indeed possible to have a Markov chain
> > >which does not depend upon the previous symbol. It is a special case of Markov
> > >matrices. A Markov chain is a probability matrix in which the
> > >state of some system is sequentially followed by other states with a given
> > >probability.
> [Examples of such Markov chains snipped after which I continued]
> >> Any letter is equally likely to be followed by any other letter. That is
> >> what Yockey was saying in Brian Harper's post. The above transition
> >> probability matrix is a special Markov matrix with absolutely no memory of
> >> the previous state because it doesn't matter what the previous letter is,
> >> the next letter is randomly selected from the 4 possibilities. It is
> >> mathematically a Markov matrix.
> >
> Brad finally replied this morning:
> >Ok, I accept that it is possible to construct a matrix that has equal
> >probabilities but this is exactly what is termed a "zero memory source". I
> >think it is better terminology as it specifies exactly what is being
> >discussed. (a more precise definition if you like).<<<
> I appreciate the honesty here. A Markov matrix is different than
> you thought and can be memoryless. DNA is memoryless when
> examined along the sequence axis. One has no ability to predict
> the next character from the last character. Thus DNA doesn't
> remember what the last letters are. In English I can be sure that
> if I see a 'q' the next letter will be 'u'. That is a system with
> memory. When examined along the generational axis, it has an
> exacting memory.

The matrix was not different than I thought, I just use different
terminology. It's like saying a plane is also a car because it has an engine
and wheels. Maybe it's true, but I'd call it a plane to save confusion.
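
To make the terminology point concrete, here is a minimal sketch of the kind of matrix under discussion. The 4-letter alphabet, the example numbers, and the helper names `is_stochastic` and `is_zero_memory` are illustrative assumptions, not anything taken from the earlier posts. A matrix whose rows are all identical is formally a valid transition matrix, yet the next symbol does not depend on the previous one, which is what "zero memory" describes. (The rows only need to be identical for this; they do not have to be uniform.)

```python
# Sketch only: alphabet, numbers and helper names are illustrative assumptions.
ALPHABET = "acgt"

# Row = current symbol, column = next symbol. Every row is the same,
# so knowing the current symbol tells you nothing about the next one.
uniform = [[0.25, 0.25, 0.25, 0.25] for _ in ALPHABET]

# For contrast, a matrix with memory: after 'a', a 'c' is much more likely.
with_memory = [
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
]

def is_stochastic(matrix, tol=1e-9):
    """True if every row is a probability distribution."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) < tol for row in matrix)

def is_zero_memory(matrix, tol=1e-9):
    """True if all rows are identical, i.e. the next symbol ignores the last."""
    first = matrix[0]
    return all(abs(x - y) < tol for row in matrix for x, y in zip(row, first))

print(is_stochastic(uniform), is_zero_memory(uniform))
print(is_stochastic(with_memory), is_zero_memory(with_memory))
```

Both matrices pass the first check; only the first passes the second, which is the distinction the two names are arguing over.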

> [examples of the two different dimensions of DNA (generational
> axis and sequence axis snipped]
> Brad replied this morning:
> >I can accept what you are saying here but have some questions:
> Once again, I appreciate the honesty.
> >
> >*Biology question*
> >1. If the mutation does not occur in DNA replication then all it would do
> >is create a different protein every now and then. How can this lead to
> >cumulative mutations that in turn lead to macro evolution?

Greg answered this quite well I thought.

But I am a bit confused as to what you think creates the information: is it
generational or sequential? First we were arguing generational, then you
changed to sequential, which doesn't seem to have much effect on the whole
organism in terms of major change.

i.e. one cell with a faulty protein isn't going to be passed to the
offspring. In fact wouldn't it have to be DNA replicating in a sex cell
which can create the necessary information that you require?

I would like a clarification of the position you are arguing, it seems to
vary a bit as the discussion continues.

> [snip]
> >*Information Theory*
> >A glitch in putting out a symbol on a random basis is exactly what I was
> >talking about in my analogy to a CD copy. This is an "information channel"
> >and as such any random variation on an information channel ALWAYS reduces
> >the channel's information carrying capacity.
> >
> >If you look at my previous post I definitely stated that I was refuting your
> >information theory post on info theory grounds EVEN though I thought it was
> >a wrong model as well as a wrong application.
> Once again you are equivocating on the terms information as used
> for knowledge or intelligence rather than information as a
> mathematically defined concept. See Greg Billock's post
> yesterday or see Yockey's "Application of Information Theory to
> the Central Dogma and the
> sequence hypothesis" Journal of Theoretical biology 46:369-406

I disagree here. Information is related to and requires meaning. You cannot
have information without meaning. You CAN have information carrying capacity
without meaning, but no information.

> >
> >The sequence axis is just as easily modeled by the information stream coming
> >off the CD. It will mostly be the same thing over and over again with the
> >occasional random error.
> But this is where you misunderstand DNA. DNA does have repeats,
> but it is not per se 'the same thing over and over again'. The
> statistical distribution of nucleotides appears randomly
> distributed. And you are treating the sequence axis as a source
> and receiver in your info theory approach below (and as you
> suggest above). That is not correct. DNA is the source; proteins
> or the next generation is the receiver.

I thought the error rate was pretty small, around 10E-9 if I remember
correctly, that is easily small enough to compare to a CD.

Either DNA stores information or it does not. Which one is it? If it does,
then it can be compared to other information storage devices. I do not know
what you mean about treating it as source and receiver; I was treating it as
neither, but rather as the channel.

> >
> >The info theory model of this is:
> >
> >I(A;B) = H(A) - H(A/B)
> >
> >Where I(A;B) is the mutual information of the channel.
> >
> >Here it can be seen that the maximum mutual information is obtained when
> >H(A/B) is zero which is given only by a noiseless channel.
> >
> >ANY channel noise will ALWAYS REDUCE the mutual information.
> >
> >The mutual information is defined as the information provided by the
> >channel.
> This cannot be modeled in the fashion you are attempting. Along
> the sequence axis, let us assume we have the sequence
> acggtaacctgggtcgatacgtagc
> and we are wanting to know the next character in the sequence.
> Because DNA is memoryless the sequence above has no bearing on
> what the next character in the sequence is. There is no
> transmission of information to a 'receiver' at the next position.
> You must run a source/receiver calculation only between the DNA
> and the protein manufacturing facility (ribosomes?) or between
> the DNA of one generation and the next. But you can't run it
> along the sequence axis!!!!!

ANY information storage device IS a channel. Do not make the mistake of
thinking anything that has an output is a source. This is not correct.
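
For readers who want to check the mutual information formula quoted above, I(A;B) = H(A) - H(A/B), here is a minimal numeric sketch. It assumes a binary symmetric channel with equiprobable inputs, which is a textbook simplification chosen for illustration only; it does not settle whether DNA along the sequence axis should be modeled as a channel at all. The function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(p_a, channel):
    """I(A;B) = H(A) - H(A|B), where channel[a][b] = P(B=b | A=a)."""
    # Joint distribution P(A=a, B=b).
    joint = [[pa * p_ba for p_ba in row] for pa, row in zip(p_a, channel)]
    p_b = [sum(col) for col in zip(*joint)]
    h_a = entropy(p_a)
    # H(A|B) = H(A,B) - H(B)
    h_a_given_b = entropy([p for row in joint for p in row]) - entropy(p_b)
    return h_a - h_a_given_b

def bsc(error_rate):
    """Binary symmetric channel: each bit is flipped with this probability."""
    return [[1 - error_rate, error_rate], [error_rate, 1 - error_rate]]

# Assumed input: two equiprobable symbols.
for e in (0.0, 1e-9, 0.01, 0.1, 0.5):
    print(e, mutual_information([0.5, 0.5], bsc(e)))
```

Under these assumptions, for error rates between 0 and 0.5 the mutual information falls from 1 bit per symbol (noiseless) towards 0 (output independent of input).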

> [snip]
> Brad wrote
> >> >I don't know who Yockey is but this is what Engineers are taught
> >> >the world over in relation to information theory since Shannon and Wiener
> >> >invented it in 1948.
> >> >
> [snip]
> and I replied:
> >> When you have familiarized yourself with Yockey's work, then you will be
> >> ready to discuss the issue.
> >>
> And today Brad wrote:
> >Hmm, well I will read some of Yockey's work but I have read some parts of one
> >of his books and it seemed to me that he does not necessarily hold your
> >viewpoint.
> He believes in evolution. He doesn't believe in the origin of
> life by means of a warm pond. I have had several e-mail
> exchanges with him about these issues.
> >Maybe you will read some info theory texts?
> I would be delighted to but you must know that there are some
> significant differences between info theory as practiced by you
> EE's and what is applicable to biology. Biology has a major
> difference in its informational system. The genetic code is
> degenerate or redundant. Up to 6 different DNA triplets code for
> the very same amino acid. Only two amino acids are coded for by
> a single DNA triplet. This degeneracy causes a loss of
> information between the DNA and the protein and it causes the
> mathematics to be altered from your CD and computer applications.
> Since you haven't read Yockey, then much of the math you are
> using does not take the degeneracy of the biological code into account.

I would like you to read an information theory text to correct your
application of information theory. It was you who started using info theory
(incorrectly) and that is what I am debating.
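
The degeneracy described in the quoted paragraph above can be put into numbers with a short sketch. It uses the codon counts of the standard genetic code and assumes, purely for illustration, that all 64 codons are equally likely (real codon usage is not uniform). The variable names are hypothetical.

```python
import math

# Number of codons per amino acid (or stop signal) in the standard genetic code.
CODONS_PER_PRODUCT = {
    "Leu": 6, "Ser": 6, "Arg": 6,
    "Ala": 4, "Gly": 4, "Pro": 4, "Thr": 4, "Val": 4,
    "Ile": 3, "Stop": 3,
    "Asn": 2, "Asp": 2, "Cys": 2, "Gln": 2, "Glu": 2,
    "His": 2, "Lys": 2, "Phe": 2, "Tyr": 2,
    "Met": 1, "Trp": 1,
}

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

total_codons = sum(CODONS_PER_PRODUCT.values())

# Assumption: every codon is equally likely.
codon_bits = math.log2(total_codons)
product_bits = entropy([n / total_codons for n in CODONS_PER_PRODUCT.values()])

# Bits per codon that cannot be recovered from the amino acid alone.
lost_bits = codon_bits - product_bits

print(total_codons, codon_bits, product_bits, lost_bits)
```

Under the uniform-codon assumption, the difference between the two figures is the information that the many-to-one mapping discards on the way from codon to amino acid.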

> Brad wrote today:
> >>
> >> >reasons why DNA is similar to a CD in terms of info theory:
> >> >
> >> >1. is a channel for encoded information.
> >
> I don't believe that the sequence axis is a channel in the
> sense you are using the term. (DNA is not both source and receiver.)

Channels are independent of source and receiver. A channel is something
which information is stored in. DNA stores information, does it not?

> Brad wrote today:
> >> >2. outputs a set sequence of codes repeatedly.
> >> >3. can be replicated.
> >> >4. random errors/mutations can occur in replication process.
> >> >on what grounds do you object to this comparison?
> See above. And there is one VERY important difference between
> reproduction of digits off a CD and DNA sequence reproduction.
> When CD's are made, one takes a master data base and makes
> thousands of copies from that single master. The master copy
> makes thousands of 'babies'. When CDs are played, you use the
> same CD over and over again to produce the sound in the air. You
> don't record the sound on a new CD then destroy the first one,
> play the copy, record the sound and repeat the foregoing over and
> over. Master CD's aren't copying the copy in succession. But in
> living things, each generation gets a new copy from the copy
> their parents were given. Each generation of living creature is
> given a copy of a copy of a copy of a copy* of the original DNA.

CDs are quite capable of being made from copies which is what I first stated
in my post. I stated that CD can be used as an analogy to DNA, I didn't
state that the CD manufacturing process is like the DNA replication process.

DNA replicates as can CDs (not by themselves I'll grant but that is not the
point in issue)

DNA outputs sequences as do CDs.

DNA can be modeled as similar to a CD, or book if you like.

How about old manuscripts that were copied in turn by each generation of
scribes? Do you like that better? Well, it's exactly the same thing.

> Brad wrote today:
> >CD's are still like DNA. You can consider a CD in both sequential and
> >generational modes, that is why a CD is so good an analogy. You actually
> >strengthen the analogy by bringing up another axis in which the CD is
> >similar to DNA.
> >
> >Still haven't actually given any decent reasons on this one....
> Because you use the CD analogy in a equivocation between
> information used as 'knowledge' or 'intelligence' rather than
> information as defined by H = -k sum(p[i] log p[i]). My son told
This is the most inaccurate method of analysing anything but the most trivial
examples of sources. DNA is certainly not accurately modeled by this. I
showed this at great length in earlier posts and you have never refuted my
arguments (or even acknowledged them).

> me that in his branch of EE, which is sound synthesis and
> acoustical engineering, the sequence that has the most
> information is the sequence with the whitest spectrum, i.e. has
> components in all frequencies. But he is quick to add that you
> probably couldn't sell a lot of CDs with white noise on them. That
> is the difference between information defined mathematically and
> information used colloquially as you are doing.

White noise does not have any information. White noise has the capacity to
carry a great deal of information but it isn't carrying it because nobody
put any into it. Do you get this yet?
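
As a reference point for the entropy formula quoted above, here is a minimal sketch of what that formula actually computes when applied to a single sequence: the zero-order (symbol frequency) entropy, with k = 1 and logs to base 2. The test sequences and the function name `shannon_entropy` are made up for illustration.

```python
import math
import random
from collections import Counter

def shannon_entropy(sequence):
    """Zero-order entropy in bits per symbol: H = -sum(p[i] * log2(p[i]))."""
    counts = Counter(sequence)
    total = len(sequence)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

rng = random.Random(0)  # fixed seed so the run is repeatable

repetitive = "acgt" * 2500              # rigid repeat, equal symbol frequencies
skewed = "a" * 9000 + "c" * 1000        # mostly one symbol
shuffled = "".join(rng.choice("acgt") for _ in range(10000))

for name, seq in (("repetitive", repetitive), ("skewed", skewed), ("shuffled", shuffled)):
    print(name, shannon_entropy(seq))
```

Note that the rigidly repeating sequence and the randomly drawn one score almost the same, because this measure looks only at symbol frequencies and ignores any dependence between neighbouring symbols. What the number should be taken to mean is the point in dispute in this thread; the sketch only shows what is being computed.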

> [I just noticed that Greg gave you some more reasons for the CD
> analogy not working.]
> Brad wrote the other day:
> >>
> >> >If it is only the fact that mutations (in your opinion) add information
> >> >where errors in CD replication do not, well, this is precisely
> >> >the point being debated and so it is obviously not a valid argument.
> Mutations in a CD drive the CD to have a whiter spectrum, i.e.
> adds information. It destroys meaning and listening enjoyment,
> but information theory does not have anything to do with meaning
> as Greg Billock pointed out yesterday.

Now mutations in electronic devices are EXACTLY what I study and you are
totally wrong on this one. If you think that random changes add information
to a CD then you are sadly mistaken and should go ask anyone involved in

Random errors in a CD will ALWAYS reduce the information, and the meaning
and everything else.

> I wrote:
> >>
> >> Actually if you mutate the digits on a CD some of the mutations will add
> >> information and some will remove information. Both those that add and
> >> those that subtract may remove the message, but the informational content
> >> of a sequence is not the same as the message content.
> >>
> Brad replied today:
> >No, it will never increase information.
> Once again you are making a fundamental error in information
> theory. See Greg Billock's post if you won't believe me.

Hmm, amazing how I can make such fundamental errors in info theory yet still
achieve high marks while studying it at university level....

or, maybe it is you who do not quite understand it?

> >Do I have to prove this or will you just accept my word and use common sense?
> It is your colloquial usage of the word 'information' rather than
> the mathematical expression of information that is causing you
> the problem.

No, I think it is you who misunderstands info theory and how randomness
affects it.

> [snip]
> I wrote:
> >> And you haven't heard of Yockey????????
> Brad replied today:
> >I am an Engineering Student, we don't study Yockey :P
> >I do not claim to be an expert on biology, however I DO claim a good
> >knowledge of information theory. I finished exams on this just a few weeks
> >ago so this stuff is very fresh in my mind.
> If you don't study Yockey, then you would be very unaware of how
> the degeneracy of the genetic code affects the mathematics you
> use with the more simple, non-degenerate codes, widely used in
> Electrical Engineering . In point of fact, you guys don't like
> degeneracy because it introduces uncertainty. But living systems
> use degeneracy as a means of error correction and error avoidance.

Information storage and transmission can be analysed without understanding
the actual codes, you have stated this yourself. (but drawn incorrect
conclusions from it)

I do not need to know the biological codes to see the error in your
application of info theory.

> [snip]
> >Yockey states
> >>
> >> "One must know the language in which a word is being used: 'O singe
> >> fort' may be read in French or German with entirely different meanings. The
> >> reader may find it amusing to list all the words in languages that he knows
> >> that are spelled the same but have different meanings. For example a
> >> German-speaking visitor to the United States might have all his suspicions
> >> about America confirmed when he finds that there is a Gift shop in every
> >> airport, hotel and shopping center." The message 'mayday, mayday' may mean
> >> a distress signal, a Bolshevik holiday or a party for children in the
> >> spring, all depending on the context.
> >>
> Brad replied today:
> >Sure they have different meanings. But they also have different information
> >contents when analysed in terms of the language model.
> Actually the sequence 'gift' meaning 'poison' in German and
> 'gift' meaning 'gift' in English have identical informational
> content as isolated sequences. Sure if you put them in a
> sentence the sentences would have different information content.
> But then you would be measuring the information content of the
> sentence containing 'gift' NOT the sequence 'gift'.

No. Information theory is concerned with transmitting the MEANING in as
efficient a manner as possible, therefore it would treat the word
differently depending on what language and context it was used in.

> >Information theory does not ascribe meaning to information. It does however
> >ascribe NO MEANING to any randomness or noise. Do you understand this?
> Do you understand that information theory doesn't deal with
> meaning?????? The sequence transmitted by one system to another,
> is all that matters. An astronomer listening to the hiss of a
> radiofrequency in space uses the same equations to govern the
> processing of his static as the radio engineer down the street
> uses to broadcast "Inagaddavida" or however you spell that
> infernal song title. (That song might not have made it to Australia)

Information theory very much deals with meaning, it tries to store and
transmit the meaning as efficiently and with as little corruption as
possible. We don't sit around all day and make the best method of sending
garbage to each other, we make the best method of conveying MEANING to each

> Brad wrote today:
> >Did you know it is possible to achieve better compression on a text if you
> >know what language it is? This shows that better models lead to more
> >accurate analysis of the information content.
> Yes I did. But algorithmic complexity is not the best measure of
> relative information. While one might find a short algorithm
> with which to compress a sequence, one can never be sure that
> that algorithm is the SHORTEST one possible. (Brian might want
> to make a comment here; he knows more about algorithmic complexity than I)

It is entirely possible to know that you have the best compression possible,
that is what info theory is all about. This has NOTHING to do with
algorithmic complexity at all.

By finding the true information content of a signal you can devise
compression to compress it to as close to 100% efficiency as you like, it
just takes more processing power. This is known as "Shannon's noiseless
coding theorem" and involves taking extensions of a source to find the true
information content and then coding using this.

This was what I did to refute your first post about the addition of
information. You have never shown any error I made in doing this so the
result stands.
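
For anyone who has not seen source extensions before, here is a minimal sketch of the idea, not a reproduction of my earlier working. It assumes a zero-memory binary source with made-up probabilities 0.9 and 0.1, and uses Huffman coding on blocks of n symbols. The function names are hypothetical. The noiseless coding theorem says the average code length per source symbol, L_n / n, is at least H and less than H + 1/n.

```python
import heapq
import itertools
import math

def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def extension(probs, n):
    """Block probabilities of the n-th extension of a zero-memory source."""
    return [math.prod(combo) for combo in itertools.product(probs, repeat=n)]

def huffman_average_length(probs):
    """Average codeword length (bits) of a binary Huffman code.

    Each merge of the two least likely nodes adds one bit to every codeword
    beneath it, so the expected length is the sum of the merged weights.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

def bits_per_symbol(probs, n):
    """Huffman cost per original source symbol when coding blocks of n."""
    return huffman_average_length(extension(probs, n)) / n

source = [0.9, 0.1]  # assumed probabilities, for illustration only
print("entropy", entropy(source))
for n in range(1, 6):
    print(n, bits_per_symbol(source, n))
```

Larger blocks cost more processing (the block alphabet grows as 2^n here), which is the trade-off mentioned above.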

> [snip]
> I wrote, quoting Yockey:
> >> "Information theory shows that it is fundamentally undecidable whether a
> >> given sequence has been generated by a stochastic process or by a highly
> >> organized process. This is in contrast with the classical law of the
> >> excluded middle (tertium non datur), that is, the doctrine that a statement
> >> or theorem must be either true or false. Algorithmic information theory
> >> shows that truth or validity may also be indeterminate or fundamentally
> >> undecidable."~Hubert Yockey, Information Theory and Molecular Biology,
> >> (Cambridge: Cambridge University Press, 1992), p. 81-82.
> >>
> >> This last means that you cannot possibly tell whether a given sequence has
> >> MEANING created by a HUMAN or whether it is random gibberish. Show me the
> >> algorithm that will tell these two apart. You can't and no one else can.
> >> This is not my opinion but Yockey's.
> Brad replied:
> >This is true, but I don't see how it applies to your argument.
> Because you keep confusing "meaning" with the mathematical
> definition of information. If mathematically you can't tell
> random gibberish apart from Marc Antony's 'Friends, Romans and
> Countrymen' speech, then mathematically both sequences must look
> the same on a statistical level. If they didn't look the same
> then you could tell them apart. Thus, sequences with lots of
> meaning, have lots of information, but random sequences with no
> meaning whatsoever, also have lots of information. The only way
> humans tell random gibberish from a language is by prior assent
> by all members of the language group. Look at Alien languages
> spoken in movies. They sound like gibberish and are gibberish
> unless you know the a priori convention. But the a priori
> convention is not held in the sequence of 'gibberish' itself any
> more than the dictionary definitions of the words I am writing
> are given in the above paragraph!

Mathematics shows NOTHING if used in the wrong context. By investigating the
source we can use mathematics more appropriately to model it. I explained
this earlier with the reference to the investigation of a text to gain
better compression.
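
A minimal sketch of what "a better model" means in practice: the same text costs fewer bits per character when the model is allowed to use the previous character. The sample text and the function names are made up for illustration, and estimates from such a short sample are optimistic compared with a real language model.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def model_costs(text):
    """Return (order-0, order-1) estimates in bits per character.

    order-0: H(next), ignoring context.
    order-1: H(next | previous) = H(previous, next) - H(previous).
    Both are taken from the same set of adjacent character pairs.
    """
    pairs = list(zip(text, text[1:]))
    joint = Counter(pairs)
    previous = Counter(a for a, _ in pairs)
    following = Counter(b for _, b in pairs)
    order0 = entropy(following)
    order1 = entropy(joint) - entropy(previous)
    return order0, order1

# Assumed sample: English-like text in which 'q' is always followed by 'u'.
sample = "the quick queen quietly quit the quarrel and the quiz " * 20

order0, order1 = model_costs(sample)
print("order-0 bits/char:", order0)
print("order-1 bits/char:", order1)
```

The order-1 figure can never exceed the order-0 figure when both come from the same pair counts; how much lower it is depends on how much memory the source has.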

> That is why you can't tell whether I am writing real mandarin
> chinese (pinyin) below or real gibberish.
> Ni xue yao xuexi hen duo!

No I cannot tell, but with a bit of investigation I could. Once I know which
one it is I would be able to find the true information content. If you were
speaking Mandarin I would be able to find the information content of that
language; if it was gibberish I would just ignore you. You see, if you speak
gibberish and I ignore you, I gain the same information as if I listen to
you. This is therefore the ultimate compression.

You are not helping your case with examples like this.

> Brad wrote today:
> >As I said above. The better we know the source the closer we can model the
> >information content. Example given was if we know what language a text is we
> >can achieve better compression on it.
> >
> >Therefore if we have no knowledge of a source then yes, we cannot tell them
> >apart. HOWEVER this is NOT the TRUE information content, this is just the
> >model we are using.
> >
> >The better the model the closer it is to the TRUE information content. And
> >if we know that one source is gibberish we can just turn it off and ignore
> >it.
> >
> >Therefore it follows that if we know that mutations are caused by RANDOM
> >mutations then we can confidently say they do not add information.
> >
> >If on the other hand the mutations are not random in origin then you have a
> >valid argument. I take this as another indicator of intelligent design.
> >
> >Here is an exam question from a 1997 exam paper given by the E&EE department
> >at UWA:
> >_____________________________________________________________
> >Do you agree or disagree with the following statements?
> >
> >"Information theory is pure nonsense! Noise is usually modelled as a random
> >source and a random source contains the most information since all symbols
> >are equiprobable. Thus the most informative information source is noise."
> >_____________________________________________________________
> >
> >Post what you think the correct response to this is and your reasons why....
> Have you ever heard the term 'Fallacy of equivocation"? That is
> what the above is.

Hehe, well it is also a real exam question on information theory. Tell us
Glenn, is noise the most informative source?

Just post a yes/no answer, it's not hard.

btw, I have the answer as written by the senior lecturer on info theory at
UWA, Dr Roberto Togneri. If you want to check his credentials then vist.

As an indication of what your answer would be here are some quotes:

"the sequence that has the most information is the sequence with the whitest
-- Glenn

"A randomly generated sequence of messages produces >>> MAXIMAL <<<
-- Greg

I get tired of you telling me I don't understand my own field of study so
just answer this question and let's see what you understand.

> >Hint: read what I have said above.
> Hint: read Greg Billock's post from Sunday.

yep, Greg is wrong also....your point?

> [snip]
> I wrote:
> >>
> >> I absolutely agree that randomness would korupt thu missage won wunts ta
> >> sind. But by making new arbitrary definitions about the meanings of a word,
> >> the language evolves. Meaning is not information by the definition of
> >> information theory. Why do you think we of English descent don't still
> >> speak Latin? Mutations to the spoken language were given new arbitrary
> >> definitions. The language mutated but it wasn't destroyed.
> >> Meaning is not the same as information.
> >
> Brad replied today:
> >Language changes, yes. Does it gain more information? NO
> >Is modern english more informative than Latin?
> Actually if you measure the information content in a sequence
> containing all the words of a given language laid end to end,
> then yes English, having VASTLY MORE words than Latin is more
> informative than Latin. We have many more nouns in modern
> English than were in Latin. They didn't have terms like
> 'information theory' 'Electrical Engineering,' 'Quantum
> Mechanics' etc ad nauseam.

ROFL. I am sorry but the true information content of a language is
significantly harder to find than that.

It would be nice if the world was that simplistic but it just isn't. More
words does NOT mean more information.

> Brad wrote today:
> >I would very much enjoy being able to debate this with someone more
> >knowledgeable than you on this topic (info theory), so please invite your son
> >to join (or mail me privately if he doesn't want to join the group)
> Now, let me get this straight. You didn't know who Yockey was;
> originally you said you hadn't read any of his articles; you
> didn't understand that a Markov transition matrix could be
> constructed which didn't rely on the previous character (you
> admitted that I was correct );

You were correct, but I knew it was possible. I just have never heard of
anyone using it in that way.

> you didn't know how DNA affects
> proteins and then the survival of the organism leading to macro
> evolution; you didn't know what a memoryless system was; you are

I know exactly what a memoryless system is.

> using mathematics for a non-degenerate code;

I know that the code is not important in the maths I used.

> and you continue to
> equivocate on information used colloquially and information as
> defined mathematically (as pointed out to you by two of us).

And both of you are wrong on that. See the exam question. It is a common

Below is the url for the exam paper I took the question from if you wish to
verify it is a genuine exam question. Also you can see it is worth about 1%
(an easy question IF you understand)

> And
> you think I am not knowledgeable??? There is a word for this,
> but I would rather not use it.

I explicitly stated that I would like someone more knowledgeable on info
theory. Yes, I think I know more than you on info theory, I did not comment
on anything else.

I have stated that I am not a biology expert. I am also sure you know more
in your professional field of study than I do, that is to be expected. (and
vice versa)

Incidentally, if you claim to be an expert in info theory then feel free to
download a copy of an Info theory exam paper from:

1995, 1996 and 1997 papers are there. I have full worked solutions to the

Maybe you would like to see how much you really do know??

If you do not claim to be an expert then I don't see how you can get upset
with my statement.

Brad Jones
3rd Year BE(IT)
Electrical & Electronic Engineering
University of Western Australia