From: Iain Strachan (email@example.com)
Date: Sat May 10 2003 - 08:19:11 EDT
Thanks for your reply. I'll give the questions a shot.
Jim writes: It seems to me that successive approximations solvers might not be quite the same as the model described. In the first case, it is presumably a convergence on a value (or values) that best satisfy the problem described. In the latter, it sounded more like the computational algorithms were being mutated, not just parameters. Isn't that a different critter?
Yes, if you like it was a symbolic manipulation that was taking place, so the "maths" is different, but really it's still just a function that takes two numbers (the AX and BX registers in the "virtual machine") and calculates an output in the CX register, i.e. F(P,AX,BX) = CX. (here P represents the list of the set of program instructions, and it is these parameters that are being optimized). Elsewhere on the paper's web-site, they describe in more detail how each "organism" was evaluated. Essentially, they had a test set of 32-bit input values for AX,BX & they ran the program on all these examples and compared the CX output with the desired result. If all the CX values matched the correct output for a particular logic function, then the organism was credited by the appropriate score (2,4,8,16 or 32). This can be expressed as an equation solving problem:
F(P, AX(n), BX(n)) - CX'(n) = 0 (n varies over the number of test input pairs)
Here CX' represents the desired answer to be found in the CX register, ie the appropriate logic function.
Another point worth noting here is that there has to be an external computer algorithm that runs this test and already knows what the answers (CXdesired) are supposed to be. These desired values are no different in principle (IMO) to the target string "METHINKS IT IS LIKE A WEASEL", but I believe Michael Roberts has already made this point.
I would not think you a crank - I don't know you well enough, and you don't much talk like one! Are all GAs alike and therefore dismissable on the basis of this one that didn't work well?
Sorry about the "crank" phrase. I just get the impression on this group that anyone who leans towards ID is some kind of dangerous lunatic who wants to undermine your faith, your theology and the teaching of your children. I have what I think are legitimate doubts about the theory of Evolution, at least when based on pure mutation and natural selection (and crossover as well for that matter). Now, admittedly I didn't have any doubts like this until a YEC friend lent me a copy of Behe's book; but having read it & comparing with my own scientific experience of working with genetic algorithms, and other optimization techniques, I realised, in all honesty, that I could no longer pretend that I believed evolution as taught by the gradualist school of thought, and as typified by this Nature paper, was a satisfactory explanation of how we came to be. I would be engaging in a kind of doublethink in noticing that it was useless for engineering problems & then pretending vaguely that it must work somehow for the real engineering problems that it has apprently solved in nature. I did spend a lot of time experimenting with GA's in the hope that they could be made more useful, taking other inspirations from the biology (e.g. having "dominant and recessive genes" , or "competing subpopulations"). But they all have much the same problems. There are some successful applications of them, but they are (a) well publicised and given a high profile and used to plug Biological Evolution, and (b) few and far between, and (c) require a great deal of cleverness in designing the fitness evaluation and problem coding if they have a chance of working. (See my other post about Gray codes for example).
As regards the GA's question. They all work on pretty much the same principle, though there are variations on how the selection is done, crossover, etc. The biggest variability, however, is in the decoding of the bit string (genotype) to the problem representation (phenotype). However, this is not actually part of the GA itself; all the GA does is to generate some bit strings, and then ask the user to supply a subroutine (technically called a "callback") to evaluate the string and return a fitness value. The decoding takes place, therefore, in the user's code.
I wouldn't "dismiss" all GA's for all applications, however. There are some applications for which GA's are suited, and these are where single mutations can lead to improvements. A good example of this is the "Travelling Salesman Problem" (TSP), which can be solved by GA's (though I don't know if they work any better than other methods). In the TSP, however, one has again to design the "mutation" operator to be problem-specific - it's not a general purpose algorithm in that case.
Since Nature seems to have done a pretty good job at solving difficult engineering problems, I'm therefore distinctly underwhelmed at the claim that this toy one-variable-at-a-time problem is a demonstration of the evolutionary origin of complex features.
Are not all of our models toys when compared to the reality? But such models are indispensible in understanding our world and how it behaves, enough so to allow us to make our way in it, make use of its resources, and occasionally husband it. I still recall the one- and two-dimensional physics problems that illuminated the way to understanding of higher degree-of-difficulty problems. I am reluctant to be too dismissive of simple models like the one described because the underlying principles and constructs of nature seem to be so few in number and simple in fundamental relationships.
The word "Toy Problem" has a specific meaning in my field. It simply means a problem of low dimension (complexity) that is used to test out whether your software algorithm works. The progression from "Toy" to reality involves giving the technique to a much more complex problem (which itself is of course only an approximation to reality, but at least a usable one, for example by a design engineer). You might test out a Finite Elements stress analysis code for a bridge by having a simple (toy) model with around 12 elements. Once you are happy with the answer, you might run a much more detailed analysis with thousands of elements. Finite Elements analysis scales well with size & the problem remains tractable. But how would the Avida digital organisms fare?
The EQU function is described as a "complex function", but is it really that complex? It requires 5 NAND operations. The fact that it performs bitwise on 32 bits in a register is actually irrelevant, because the one logic primitive the program had (the NAND function ) also operates on 32 bits as well. It would be the same if it was a 1-bit EQU operation. All it does is to test if two binary digits are equal or not. Suppose instead, you wanted to multiply two numbers together, and you only had add subtract and bit shift (which I think are in the Avida instruction set), but no multiply. To code up a routine that multiplied AX and BX together would take literally hundreds of instructions. How would you do it? The paper showed conclusively that evolution didn't take place in their simulations unless you placed intermediate rewards on lesser functions. So you would have to think up a plausible set of intermediate operations that were less complex than multiplying two numbers together, and define these operations as being "useful", even if they gave meaningless answers. I can think of a way of doing this if you wanted to multiply by, say a fixed number, e.g. 5, because you can do it by doing five adds & you could assign credit increasing credits for programs that multiplied by 2, 3 and 4 as intermediates. But a general purpose multiplication routine that multiplies two numbers together does not do it by just doing the required number of additions; it would have loops and so forth. For example just about the simplest way of doing it might be something like:
1: Set CX to 0
2: CX = CX + AX (not sure how many instructions this would require)
3: BX = BX -1
4: If BX greater than 0, goto 2 else stop
(at the end CX should be AX * BX, ignoring overflows for simplicity)
I do not think it would be easy to think of a plausible set of intermediate steps that would arrive at the above program & if you did and assigned increasing orders of credit to each one, you would have effectively told it the answer, wouldn't you?
A further problem arises with the above. I bet no computer does multiplication like this, because the length of time varies linearly with the size of the number in BX. If BX is 2^30, it takes ages to run. I believe one of the criteria for survival in Avida is the length of time it takes to run because the digital organism consumes "resources" that it gained via the rewards with each processor instruction. To do it efficiently would require a much more complex algorithm, like we do long multiplication, involving bit shifting, accessing individual digits & what not. I think it would be almost impossible to dream up a plausible set of intermediate "functions" that led to this from, say the above algorithm, that still gave meaningful and useful results. And again, if you did, you would have effectively "pre-designed" the solution, dare I say it "Intelligently".
Now, I hope this illustrates the difference between a "toy" problem and a "real" one. I know multiplying two integers is a pretty mickey-mouse problem (it happens hundreds of times every time you press a key in Microsoft Word because it has to figure out where to place the character you've typed), it is pretty advanced technology compared with string a fixed number of NAND's together.
Jim: (concerning Parallel processing)
These observations notwithstanding, desktop computer workstations for demanding applications still have multiple processors. And then there are the United Devices computing schemes that split up the computing tasks among millions of computers like mine to use their otherwise unproductive time for parallel computing efficiency in sorting out candidates for combatting cancer or anthrax. But I think you've made the point that this "parallel computer" parallel can be pressed too far.
Yes, I ran the United Devices software for a while on the computer at work for a while until problems with the "Firewall" made it impossible for it to "phone home" with the results. These types of application are common when the problem you are dealing with can be easily split up into a number of independent workpackages (you are testing a set of molecules for activity against a particular protein, right?). The speed-up will be sub-linear on the number of processors, because they don't all run at the same speed, and different users will have different amounts of "idle time". However, they can indeed achieve great things because of the millions of computers used. But they don't interact.
This leads to another observation about evolution being able to sample all possible combinations because of parallelism. Unfortunately this cannot be true because each creature only has sex with a very limited number of other creatures (local to it). Furthermore, it is unfortunately true that the highly promiscuous attitudes towards sex we have these days lead to the wild-fire spread of disadvantageous diseases. (I guess as we're all Christians I can afford to be "politically incorrect" here! Maybe the Bible has a point about sexual morality :-)
Hence there is no interaction between the elements, except in the natural selection process. Even if there were interactions, the fact that exponential outcome possibilities are available is irrelevant. Sure, the number of _favourable_ possibilities increases exponentially, but so does the number of _unfavourable_ possibilities, at a much faster rate! (As they say in the paper, most mutations are damaging; only a very few actually improve the fitness). What this means is that the probability of finding a favourable one _decreases_ exponentially, because you are dividing an exponentially increasing number by another one that is also exponentially increasing at a faster rate; e^x / e^2x = e^(-x).
Again, I'm at a disadvantage here in even responding, but I'm puzzled. What you say here seems to suggest that they could not get the results they demonstrated. Or does the difference between your argument and their results lie in the specifics of "damaging"?
Perhaps I wasn't clear. The authors know very well that very few mutations are beneficial, but once a beneficial one has been found, then Natural Selection acts to preserve the good ones and the bad ones don't survive. That is all standard evolutionary theory & is why their simulation worked. What I'm saying is that if you have a much bigger set of possibilities to sample, though there may be exponentially more "good" mutations, the number of "bad" ones will also be exponentially more, at a greater rate. Given that nature can only sample (at random) a very small sample of all the possibilities (e.g via mutation), then the probability of your getting a good one decreases dramatically with the overall size and complexity (hence number of possible genomes).
Um, hope that makes it all a little clearer.
This archive was generated by hypermail 2.1.4 : Sat May 10 2003 - 08:19:46 EDT