RE: Design detection and minimum description length

From: Iain Strachan
Date: Sat Dec 09 2000 - 21:59:57 EST


    I wrote:
    >>I really can't (sorry, Glenn), see any problem with this, and how
    >>one can go from this to saying that the method has failed because
    >>the researchers had to be told it was designed.

    Glenn replied:

    >Let me try in another manner. If I correlate two sequences and get a
    >correlation coefficient of .9, my personal state of knowledge isn't what is
    >telling me that the sequences are quite similar. The math is telling me. If
    >I run a Fourier transform and find that the highest periodicity of a
    >sequence is 50 cycles/second, then it isn't my personal state of knowledge
    >which tells me that. It is the math. These two procedures allow conclusions
    >to be made without any reference to the personal state of knowledge I have
    >prior to running the programs. They are objective.

    Two points:

    (1) The effectiveness of the Fourier transform analysis arose from
    your observation that you had a sequence with periodicity, and your
    personal knowledge that a periodic sequence can be analysed as a
    Fourier series. Personally I don't see the difference conceptually
    between spotting a Fourier series (i.e. a periodic function) and
    spotting a sequence derived from primes. Both are bits of maths that
    you had to know in order to make the deduction.

    (2) The point about the correlation coefficient is more interesting,
    because it precisely illustrates the point that you do need inside
    knowledge, and that you can't rely on some "objective math" formula
    that allows you to crank the handle and churn out meaningful results.
    First, let me quote from a standard text-book on numerical analysis
    (Numerical Recipes in C), talking about the correlation coefficient:

    "When a correlation is _known to be significant_ [emphasis mine], R
    is one conventional way of summarizing its strength. In fact, the
    value of R can be translated into a statement about what residuals
    (root mean square deviations) are to be expected if the data are
    fitted to a straight line by the least-squares method [ref to
    equations skipped] ... Unfortunately, R is a rather poor statistic
    for deciding _whether_ [emphasis in the original] an observed
    correlation is significant, and/or whether one observed correlation
    is significantly stronger than another. The reason is that R is
    ignorant of the individual distributions of x and y, so there is no
    universal way to compute its distribution in the case of a null
    hypothesis" [Press, Teukolsky, Vetterling & Flannery: "Numerical
    Recipes in C", Second Edition, Cambridge University Press, 1992].

    So, what are Press et al saying here? That the correlation
    coefficient R is pretty meaningless unless you know that the data are
    correlated already. This is precisely your objection to Dembski (he
    can't detect design unless you tell him it's designed, by the "side
    information"). Does this mean that the correlation coefficient is a
    totally useless statistic? Not at all. They go on to discuss
    conditions on the general shape of the distributions of x and y
    (concerning the fall-off rate of the tails of the distributions)
    that allow one to derive meaningful results and a distribution for
    R. What it comes down to is that if your data, when plotted on an
    X/Y scatter plot, look a bit like a long thin ellipse, then you've
    good reason to suspect that they are correlated, and from that, you
    can get meaningful results by comparing values of R. So, you have to use
    your intelligence and prior knowledge of what correlated variables
    look like, in order to use the correlation coefficient.

    To see just how meaningless the results get if you just put the
    numbers into the formula and crank out the result, consider the
    following experiment that you can easily perform in Microsoft Excel.

    Generate 100 pairs of (x,y) points from random numbers in the range
    0-1 (this can be done with the Excel RAND() function). Add a 101st
    (x,y) pair and make it equal to (100,100). Now compute the
    correlation coefficient between the two sequences (using the Excel
    CORREL() function). You will get an answer for R that is close to
    0.999. So your "objective math" is telling you that the sequences
    are highly correlated.

    But something tells me that these sequences are not highly
    correlated. What do you think it is? It's my inside knowledge of
    what correlated data ought to look like. That tells me that the
    (100,100) point is a massive outlier, and should be discarded. (With
    it removed, R drops to a value near zero, around 0.01.)
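
    For anyone without Excel to hand, here is a minimal sketch of the
    same experiment in Python. The seed (42) and the hand-written
    pearson_r helper are my own assumptions for illustration; pearson_r
    is meant to compute the same Pearson coefficient that CORREL()
    returns. The value of R without the outlier changes from run to run,
    but for 100 random pairs it should usually sit within a few tenths
    of zero.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Seed chosen arbitrarily (an assumption) so the run is repeatable.
rng = random.Random(42)

# 100 pairs of independent uniform random numbers in the range 0-1.
xs = [rng.random() for _ in range(100)]
ys = [rng.random() for _ in range(100)]

# R for the random pairs alone, then with the 101st pair (100, 100) added.
r_clean = pearson_r(xs, ys)
r_outlier = pearson_r(xs + [100.0], ys + [100.0])

print(f"R without the outlier: {r_clean:.3f}")
print(f"R with the outlier:    {r_outlier:.3f}")
```

    The single (100,100) point dominates both the covariance and the
    variances, which is why R is pushed so close to 1 even though the
    other 100 pairs are unrelated.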

    Is this a silly example that wouldn't occur in real life? I've seen
    a lot worse than that. In the first Neural Nets application I worked
    on (that ended up as a successfully deployed analysis tool), I was
    using a neural net to predict plasma electron density profiles inside
    a fusion experiment (the JET vacuum vessel). The electron densities
    were of the order of 10^20 per cubic metre. However, the data file I
    received had a few electron densities that were of the order of 10^76
    per cubic metre. My background knowledge of Physics told me that you
    just don't get electron densities of 10^76 per cubic metre in a
    vacuum vessel (or anywhere else for that matter ;-). I therefore
    concluded that these would be down to a processing error in the
    computer program that gave me the file of data, and discarded the
    offending items. If I'd naively shoved it all into the neural net,
    it would have ended up predicting everything in the region of 10^75 -
    10^76, and the results would have been completely useless.
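
    The kind of screening I mean can be sketched in a few lines of
    Python. The readings and the MAX_PLAUSIBLE bound below are invented
    for illustration (they are not the JET data, and the bound is just
    an assumed cut-off far above 10^20 and far below 10^75); the point
    is only that the bound comes from physics, not from the formula.

```python
# Hypothetical electron densities per cubic metre, invented for illustration.
densities = [1.2e20, 0.8e20, 3.1e20, 2.4e20, 5.0e76, 1.7e20, 9.9e75]

# Assumed plausibility bound supplied by background knowledge, not by the data.
MAX_PLAUSIBLE = 1e24


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Split the readings into those that pass the physical sanity check and
# those that do not.
plausible = [d for d in densities if d <= MAX_PLAUSIBLE]
rejected = [d for d in densities if d > MAX_PLAUSIBLE]

print(f"mean of all readings:       {mean(densities):.2e}")
print(f"mean of plausible readings: {mean(plausible):.2e}")
print(f"rejected as unphysical:     {rejected}")
```

    Two bad readings out of seven are enough to drag the naive mean up
    to the 10^75 range, which is the same effect that would have swamped
    the neural net.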

    The moral of the story is that you can't make any statistical
    inference (whether it's correlation, pattern detection, or "design")
    just by blindly plugging your data into some formula, and relying on
    the maths to tell you the answer. You have to use your background
    knowledge if it's not to be "Lies, damned lies and statistics".

    That is why I don't believe your objection to Dembski's use of "side
    information" is a valid one. There may be other reasons for
    criticizing Dembski, but this isn't one of them.

    I have argued, further, that I believe Dembski's methodology does
    have a mathematical procedure, that can be understood in terms of the
    minimum description length principle, and that this framework can be
    applied to numerical and non-numerical data. While this appears to
    be separate from what Dembski writes, it is essentially the same
    idea as his discussion of the compressibility of bit strings in
    terms of Chaitin/Kolmogorov/Solomonoff "Algorithmic Information
    Theory", discussed in detail in Section 2.4 of "No Free Lunch" (p58ff).
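
    As a rough illustration of the compressibility idea, here is a
    minimal Python sketch that uses zlib as a stand-in for "shortest
    description". This is my own assumption for the sake of example:
    algorithmic complexity itself is not computable, and a
    general-purpose compressor only gives an upper bound on description
    length. The string length, the "01" pattern and the seed are also
    arbitrary choices.

```python
import random
import zlib


def compressed_size(bits: str) -> int:
    """Bytes zlib needs for the string: a crude upper bound on its description length."""
    return len(zlib.compress(bits.encode("ascii"), 9))


n = 10_000

# A highly regular bit string: "0101...01".
patterned = "01" * (n // 2)

# A pseudo-random bit string of the same length (seed is an arbitrary assumption).
rng = random.Random(0)
random_bits = "".join(rng.choice("01") for _ in range(n))

size_patterned = compressed_size(patterned)
size_random = compressed_size(random_bits)

print(f"patterned string: {n} bits -> {size_patterned} bytes compressed")
print(f"random string:    {n} bits -> {size_random} bytes compressed")
```

    The patterned string has a short description ("repeat 01 five
    thousand times") and compresses to a small fraction of the size of
    the random one, which cannot be squeezed much below one bit per
    symbol.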

    Apologies for the long delay in responding to this. Other things intervened.


