# RE: Design detection and minimum description length

From: Iain Strachan (iain.strachan@eudoramail.com)
Date: Sat Dec 09 2000 - 21:59:57 EST


I wrote:
>
>>I really can't (sorry, Glenn), see any problem with this, and how
>>one can go from this to saying that the method has failed because
>>the researchers had to be told it was designed.
>

Glenn replied:

>Let me try in another manner. If I correlate two sequences and get a
>correlation coefficient of .9, my personal state of knowledge isn't what is
>telling me that the sequences are quite similar. The math is telling me. If
>I run a Fourier transform and find that the highest periodicity of a
>sequence is 50 cycles/second, then it isn't my personal state of knowledge
>which tells me that. It is the math. These two procedures allow conclusions
>to be made without any reference to the personal state of knowledge I have
>prior to running the programs. They are objective.

Two points:

(1) The effectiveness of the Fourier transform analysis arose from
personal knowledge that a periodic sequence can be analysed as a
Fourier series. Personally I don't see the difference conceptually
between spotting a Fourier series (i.e. a periodic function) and
spotting a sequence derived from primes. Both are bits of maths that
you had to know in order to make the deduction.

(2) The point about the correlation coefficient is more interesting,
because it precisely illustrates the point that you do need inside
knowledge, and that you can't rely on some "objective math" formula
that allows you to crank the handle and churn out meaningful results.
First, let me quote from a standard text-book on numerical analysis
(Numerical Recipes in C), talking about the correlation coefficient:

"When a correlation is _known to be significant_ [emphasis mine], R
is one conventional way of summarizing its strength. In fact, the
value of R can be translated into a statement about what residuals
(root mean square deviations) are to be expected if the data are
fitted to a straight line by the least-squares method [ref to
equations skipped] ... Unfortunately, R is a rather poor statistic
for deciding _whether_ [emphasis in the original] an observed
correlation is significant, and/or whether one observed correlation
is significantly stronger than another. The reason is that R is
ignorant of the individual distributions of x and y, so there is no
universal way to compute its distribution in the case of a null
hypothesis" [Press, Teukolsky, Vetterling & Flannery: "Numerical
Recipes in C", Second Edition, Cambridge University Press, 1992,
p636].

So, what are Press et al. saying here? That the correlation
coefficient R is pretty meaningless unless you know that the data are
correlated in the first place (rather like the objection that Dembski
can't detect design unless you tell him it's designed, by the "side
information"). Does this mean that the correlation coefficient is a
totally useless statistic? Not at all. They go on to discuss the
general shape of the distributions of x and y (concerning the
fall-off rate of the tails of the distributions), that allow one to
derive meaningful results and a distribution for R. What it comes
down to is that if your data, when plotted on an X/Y scatter plot,
look a bit like a long thin ellipse, then you've good reason to
suspect that they are correlated, and from that, you can get
meaningful results by comparing values of R. So, you have to use
your intelligence and prior knowledge of what correlated variables
look like, in order to use the correlation coefficient.

To see just how meaningless the results get if you just put the
numbers into the formula and crank out the result, consider the
following experiment that you can easily perform in Microsoft Excel.

Generate 100 pairs of (x,y) points from random numbers in the range
0-1 (this can be done with the Excel RAND() function). Add a 101st
(x,y) pair and make it equal to (100,100). Now compute the
correlation coefficient between the two sequences (using the Excel
CORREL() function). You will get an answer for R that is close to
0.999. So your "objective math" is telling you that the sequences
are highly correlated.

But something tells me that these sequences are not highly
correlated. What do you think it is? It's my inside knowledge of
what correlated data ought to look like. That tells me that the
(100,100) point is a massive outlier, and should be discarded. (Once
it is removed, R drops to around 0.01.)
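
For anyone without Excel to hand, here is a minimal Python sketch of the same experiment. The pearson_r helper, the variable names and the fixed seed are illustrative assumptions; the formula is the ordinary Pearson coefficient, which is what CORREL() computes.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Fixed seed only so that repeated runs agree; any seed would do.
rng = random.Random(0)

# 100 independent (x, y) pairs, each coordinate uniform on [0, 1).
xs = [rng.random() for _ in range(100)]
ys = [rng.random() for _ in range(100)]

r_clean = pearson_r(xs, ys)

# Append the single outlier (100, 100) as the 101st pair.
r_with_outlier = pearson_r(xs + [100.0], ys + [100.0])

print(f"R without the outlier:  {r_clean:+.3f}")
print(f"R with (100,100) added: {r_with_outlier:+.3f}")
```

With the outlier included, R comes out very close to 1. Without it, R for 100 independent uniform pairs is typically within about 0.2 of zero, and the exact value depends on the random draw.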

Is this a silly example that wouldn't occur in real life? I've seen
a lot worse than that. In the first Neural Nets application I worked
on (that ended up as a successfully deployed analysis tool), I was
using a neural net to predict plasma electron density profiles inside
a fusion experiment (the JET vacuum vessel). The electron densities
were of the order of 10^20 per cubic metre. However, the data file I
received had a few electron densities that were of the order of 10^76
per cubic metre. My background knowledge of Physics told me that you
just don't get electron densities of 10^76 per cubic metre in a
vacuum vessel (or anywhere else for that matter ;-). I therefore
concluded that these would be down to a processing error in the
computer program that gave me the file of data, and discarded the
offending items. If I'd naively shoved it all in to the neural net,
it would have ended up predicting everything in the region of 10^75 -
10^76, and the results would have been completely useless.
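
The kind of sanity check described above can be sketched in a few lines of Python. Everything specific here is made up for illustration: the bounds MIN_DENSITY and MAX_DENSITY, the plausible helper and the sample values are assumptions, not figures or code from the JET work.

```python
import math

# Hypothetical plausibility window for electron density, in m^-3.
# These cut-offs are illustrative assumptions only.
MIN_DENSITY = 1e17
MAX_DENSITY = 1e22


def plausible(density):
    """True if the value is finite and inside the assumed physical window."""
    return math.isfinite(density) and MIN_DENSITY <= density <= MAX_DENSITY


# Made-up sample: mostly around 1e20, with two corrupted entries.
raw = [0.8e20, 1.1e20, 3.0e76, 0.9e20, 1.3e20, 7.5e75, 1.0e20]

clean = [d for d in raw if plausible(d)]
rejected = [d for d in raw if not plausible(d)]

mean_raw = sum(raw) / len(raw)
mean_clean = sum(clean) / len(clean)

print(f"rejected {len(rejected)} of {len(raw)} values")
print(f"mean of raw data:      {mean_raw:.2e}")
print(f"mean of filtered data: {mean_clean:.2e}")
```

The raw mean is dominated entirely by the two corrupted entries, which is the same effect that would have dragged a naively trained net's predictions up to absurd values. The bounds themselves come from background knowledge of the physics, not from the data.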

The moral of the story is that you can't make any statistical
inference (whether it's correlation, pattern detection, or "design")
just by blindly plugging your data into some formula, and relying on
the maths to tell you the answer. You have to use your background
knowledge if it's not to be "Lies, damned lies and statistics".

That is why I don't believe your objection to Dembski's use of "side
information" is a valid one. There may be other reasons for
criticizing Dembski, but this isn't one of them.

I have argued, further, that I believe Dembski's methodology does
have a mathematical procedure, that can be understood in terms of the
minimum description length principle, and that this framework can be
applied to numerical and non-numerical data. While this appears to
be separate from what Dembski writes, it is essentially the
same idea as his discussion of the compressibility of bit strings in
terms of the Chaitin/Kolmogorov/Solomonoff "Algorithmic Information
Theory", discussed in detail in Section 2.4 of "No Free Lunch" (p58ff).
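
As a rough illustration of compressibility as a stand-in for description length, here is a small Python sketch. It is not Dembski's procedure and not true Kolmogorov complexity (which is uncomputable); it assumes that the output length of a general-purpose compressor (zlib here) gives a crude upper bound on how short a description can be. The compressed_bits helper and the two test strings are illustrative choices.

```python
import random
import zlib


def compressed_bits(bits):
    """Size in bits of the zlib-compressed text of a '0'/'1' string.

    This is only a crude upper bound on description length, relative to
    one particular compressor.
    """
    return 8 * len(zlib.compress(bits.encode("ascii"), 9))


n = 4096

# A highly regular string: "0101..." repeated.
periodic = "01" * (n // 2)

# A pseudo-random string of the same length (seed fixed for repeatability).
rng = random.Random(0)
noisy = "".join(rng.choice("01") for _ in range(n))

print(f"periodic: {n} symbols -> {compressed_bits(periodic)} bits")
print(f"noisy:    {n} symbols -> {compressed_bits(noisy)} bits")
```

The regular string compresses to a small fraction of the size of the pseudo-random one. Note that the compressor only finds the regularities it was built to look for, so a string derived from primes, say, would look incompressible to zlib unless someone supplied a model that knows about primes.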

Apologies for the long delay in responding to this. Other things intervened.

Regards,
Iain.


This archive was generated by hypermail 2.1.4 : Mon Dec 09 2002 - 17:57:26 EST