Re: Information: Brad's reply (was Information: a very

Greg Billock
Tue, 30 Jun 1998



> >> But with DNA, we have only 4 nucleotides and thus KNOW what we are
> >> restricted to. This should make the information content calculable. What
> >> am I missing here?
> >
> >The other 2,999,999,999 nucleotides in the sequence. :-)
> I think I must have miscommunicated here. The information content of a
> sequence is related to the ENTIRE sequence, all 2,999,999 of them. There
> are only 4 letters in the DNA alphabet. That was what I was meaning with
> the 4 nucleotides. I was NOT referring to a 4 nucleotide long DNA
> molecule. Go back and re read what I said in that light.
> I still don't think we need to know how many of the possible 3 billion long
> DNA chains yield life to calculate information.

I see what you mean, but here's the paragraph from Shannon again:

We can think of a discrete source as generating the message, symbol
by symbol. It will choose successive symbols according to certain
probabilities depending, in general, on preceding choices as well
as the particular symbols in question. A physical system, or a
mathematical model of a system which produces such a sequence of
symbols governed by a set of probabilities, is known as a stochastic
process.

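When the system has dependencies like this (i.e. not every symbol is
independent of every other symbol), calculating the information as
2*length (2 bits per base) overestimates it. That figure is only an
upper bound, reached when all four bases are equally likely and every
position is independent of the others. The genome has long-range
interdependencies, so its actual information content is lower.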
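
To make the point concrete, here is a rough Python sketch (mine, not
Shannon's; the function names are just illustrative). It compares the
per-base estimate you get by treating every base as independent with
the estimate you get once the preceding base is taken into account. It
only looks one symbol back, so it is a toy: real long-range
dependencies would need much longer contexts, and the test sequence is
deliberately artificial.

```python
from collections import Counter
from math import log2

def marginal_entropy(seq):
    """Bits per symbol if each symbol is treated as independent."""
    counts = Counter(seq)
    n = len(seq)
    # H = sum over symbols of p * log2(1/p), with p = count / n
    return sum(c / n * log2(n / c) for c in counts.values())

def conditional_entropy(seq):
    """Bits per symbol given the immediately preceding symbol."""
    pairs = Counter(zip(seq, seq[1:]))   # counts of (previous, current)
    prev = Counter(seq[:-1])             # counts of each previous symbol
    n = len(seq) - 1                     # number of adjacent pairs
    # H = sum over pairs of p(a, b) * log2(1 / p(b | a))
    return sum(c / n * log2(prev[a] / c) for (a, _b), c in pairs.items())

# Artificial example: a strictly periodic sequence of 400 bases.
seq = "ACGT" * 100
h0 = marginal_entropy(seq)      # all four bases equally frequent
h1 = conditional_entropy(seq)   # each base fully determines the next

print("independent-symbol estimate:", h0 * len(seq), "bits")
print("one-symbol-context estimate:", h1 * len(seq), "bits")
```

Counting bases alone gives the full 2 bits per base for this sequence,
yet once you condition on the previous base the estimate drops to
zero, because every base is predictable from the one before it.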