Is evolution just another hill-climbing-like algorithm where the gradient is inefficiently estimated via sampling rather than direct differentiation? No. The usual formulations of evolutionary computation (EC) capture little of the algorithmically interesting aspects of evolution, which has led many researchers to wonder (perhaps correctly) whether there’s really any point to EC compared to more rationalized and efficient approaches to ML. This reaction is a bit like deeming human-powered flight a fruitless endeavor after seeing the first ineffectual ornithopters. Let’s not throw the baby out with the bathwater!
Evolution is a meta-learning algorithm that discovers its own heuristics and continually improves the rate at which it learns (something missing from most learning algorithms). It has mechanisms to induce generalization and strategies for smoothing out fitness landscapes to avoid local optima. In combination, its unique features have produced artifacts of greater complexity than anything humans have engineered (it produced humans, and all the exquisite complexity of the biosphere). Respect.
Let’s have a closer look:
Evolutionary systems not stuck at equilibrium (more on this in a minute) have an important long-term trend: the speed at which species can adapt to change increases over time. The result of this long-term trend is what we now call “intelligence”. Evolution is deceptive: close up, it seems to find highly specialized, even overfit solutions, but in the long-term, it induces the discovery of quite general solutions. Evolution is an “adaptability” discovery machine.
Biological evolution employs several strategies (often missing from our evolutionary algorithms) to decrease the odds of getting stuck in equilibria. Perhaps the most interesting: it discovers and refines its own heuristics as it learns!
Part 1: Evolution improves its own adaptability
Any evolving ecosystem undergoes regular changes in selection pressure, often due to the coevolving behavior of other organisms. For instance, when a predator evolves new behaviors, the selection pressure on its prey changes.
Thus, speed in adapting to change is selected for, as is the scope of adaptations a species can make. A species that can respond only to a limited number of changes over the course of eons is at a disadvantage to one that can respond to more changes, even just a little more quickly. (Example: an organism with only hardwired evolved behaviors cannot respond in time to new behaviors evolved by one of its predators, and is less fit than an organism that can do even a tiny amount of learning to adapt to them.)
Over evolutionary timescales, we therefore see species evolve an increasing ability to adapt to their environments more quickly. This generates a positive feedback loop: when one species develops an ability to adapt more quickly, it benefits that species but also results in an environment which changes even more quickly, generating further selection pressure on other species to gain more rapid and effective means of adapting to changes.
The long-term trend of this positive feedback loop is what we now call “intelligence”. Intelligence isn’t about any particular skill, it’s about the ability to adapt to an ever-expanding scope of possible changes on shorter and shorter timescales. Provided evolution does not get stuck, intelligence is the result.
This argument can be formalized, but that is the gist: adaptability is selected for (because the rules are changed regularly and species have time to adapt to these changes before going extinct), and this generates a positive feedback loop, where speedier adaptation by one species results in more rapid changes to selection pressure, further increasing the selection pressure on adaptability.
Part 2: Necessary features of the biosphere’s evolutionary algorithm
Here’s a likely incomplete list of necessary features to replicate the biosphere’s unparalleled open-ended evolution:
In an evolving ecosystem, there isn’t one fitness function, there are K fitness functions at any given time, one for each “niche”, and these fitness functions are essentially uncorrelated. Organisms are roughly ranked based on the max of these K fitness functions (since species can move between niches assuming they develop the traits enabling them to compete in the new niche). As K grows large, assuming the functions are uncorrelated, the merged fitness function becomes smooth. This same smoothing of merged fitness functions also plays out within a single organism, where evolved structures are regularly repurposed (put to their maximum good use). Without this, the landscape is often too rugged to avoid bad equilibria.
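The niche structure is easy to sketch. Below is a toy model (every detail here, the random targets, the distance-based scoring, the parameter values, is an illustrative assumption, not drawn from biology): each of K niches scores a genome by closeness to a niche-specific target, and an organism’s merged fitness is the max over niches, reflecting that a species can occupy whichever niche suits it best.

```python
import math
import random

random.seed(0)

K = 8     # number of niches (illustrative)
DIM = 16  # genome length (illustrative)

# Each niche has its own, roughly uncorrelated fitness function:
# here, negative distance to a niche-specific random target point.
targets = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(K)]

def niche_scores(genome):
    """Score a genome under each of the K niche fitness functions."""
    return [-math.dist(genome, t) for t in targets]

def merged_fitness(genome):
    """Rank an organism by the best niche available to it (the max)."""
    return max(niche_scores(genome))

genome = [random.gauss(0, 1) for _ in range(DIM)]
```

Because the merged landscape is a max over many uncorrelated functions, a point that is a dead end under one niche’s function can still lie on an uphill path under another, which is the smoothing effect described above.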
The set of fitness functions changes over time, at an increasing rate, a strategy to induce general solutions and adaptability. But this feature works in concert with another: selection does not act by instantly killing lower-fitness organisms or species. Lower-fitness organisms have time to adapt to their environment (learning) before dying, and over generations, species have time to adapt before going extinct. This slack generates selection pressure to develop ways of adapting on shorter timescales, either within the lifetime of a single organism (learning) or over fewer generations of a species.
For instance, crossover is a much more efficient variational operator than mutation, since it generates variation by combining two valid genotypes. When circumstances changed, species that used sexual reproduction were better able to adapt to these changes than counterparts without this means of generating variation.
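As a sketch, here is what the two variational operators look like on bit-string genotypes (a standard genetic-algorithm setup; the representation and rates are illustrative choices, not anything from the text):

```python
import random

random.seed(0)

def mutate(genome, rate=0.05):
    """Point mutation: each bit flips independently with small probability."""
    return [b ^ (random.random() < rate) for b in genome]

def crossover(a, b):
    """One-point crossover: splice two working genotypes together,
    so the variation is built from already-viable parts rather than
    from random damage."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

parent_a = [1] * 8   # imagine: solves one sub-problem
parent_b = [0] * 8   # imagine: solves a different sub-problem
child = crossover(parent_a, parent_b)
```

The contrast is that mutation explores by perturbing one genotype at random, while crossover explores by recombining two genotypes that have both already passed selection.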
If the fitness functions are unchanging, or selection has no slack, there is no selection pressure encouraging adaptability in general; in fact, there are no long-term trends at all, and the system just finds an equilibrium.
The variational operators (encompassing genotype-phenotype mappings) are themselves subject to evolution. We should think of variational operators like heuristics, used to explore the search space. The trick is that these heuristics are wrapped up in the organisms themselves, so when natural selection picks organisms with good survival traits to reproduce, the heuristics which led to those traits are then given further weight (and these heuristics are then further tweaked). In this sense, we can think of evolution as discovering and refining its own heuristics. We can also think of these heuristics as altering the fitness landscape, essentially projecting to a different space that is much smoother.
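One concrete version of this trick from the evolution-strategies literature is self-adaptive mutation: each individual carries its own mutation step size, which is inherited and perturbed along with the genotype, so selecting good solutions also selects the heuristic that produced them. This is a minimal sketch (the (1, λ) scheme, the sphere objective, and all constants are illustrative choices, not taken from the text):

```python
import math
import random

random.seed(1)

def evolve(dim=5, lam=10, generations=200):
    """(1, lambda) evolution strategy with a self-adapted step size."""
    x = [random.uniform(-5, 5) for _ in range(dim)]
    sigma = 1.0
    tau = 1 / math.sqrt(dim)  # self-adaptation learning rate
    loss = sum(v * v for v in x)
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            # Mutate the heuristic first, then use it to mutate the genotype.
            s = sigma * math.exp(tau * random.gauss(0, 1))
            child = [xi + s * random.gauss(0, 1) for xi in x]
            offspring.append((sum(c * c for c in child), child, s))
        # Selection acts only on fitness, but the winning child drags its
        # step size along with it -- the heuristic evolves for free.
        loss, x, sigma = min(offspring)
    return loss, sigma

final_loss, final_sigma = evolve()
```

Notice that nothing ever scores the step size directly: it is refined purely because genotypes produced by good step sizes keep getting selected, which is the sense in which evolution discovers its own heuristics.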
This last point might be the most important. If the variational operators and genotype-phenotype mapping are static, this limitation eventually overwhelms any smoothing effect of multiple niches. Over time, as the products of evolution become more complex, the odds of random change producing anything useful approach zero: we can’t put an airplane in a giant blender, pulse it a few times, and hope the output is a fighter jet. We would be lucky if the output even functioned as an airplane at all! Biological evolution has become increasingly surgical in its variation, matching the complexity of the species it evolves, for whom only a much smaller number of changes are even viable.
Many uses of evolutionary algorithms have a hardcoded genotype-phenotype mapping (or none at all: the mapping is the identity function). Since this is very limited, people often try to engineer the mapping to get better results. But a human engineering this mapping is just “the human doing the learning”, moving the starting line closer to the finish line. Perhaps there’s something to be said for letting an automated procedure handle the “last mile”, but for most interesting domains, coming up with good genotype-phenotype mappings that smooth the search space is exactly the sort of thing we’d like our algorithms to discover for themselves!
The rationalized way of doing something like evolution’s meta-search for heuristics might be to: 1) Generate N heuristics for moving through the search space. 2) Allocate computational resources to each heuristic based on performance (total fitness achieved, for instance) 3) Vary the heuristics, perhaps at a much slower rate than the rest of the system. What isn’t clear to me is how to represent such heuristics in a way that can also be scaled from the very simple to the very complicated. (I wonder: is evolution using the same tricks to evolve “meta-heuristics”, and “meta-meta-heuristics”?)
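A minimal sketch of those three steps might look like the following; the specific heuristics (a few mutation step sizes), the credit-proportional allocation, and the toy objective are all assumptions made for illustration, not a definitive design:

```python
import random

random.seed(0)

def meta_search(steps=2000):
    """Toy meta-search: allocate trials among N heuristics by past
    success, and vary the heuristics themselves at a much slower rate."""
    x = [random.uniform(-5, 5) for _ in range(5)]
    loss = sum(v * v for v in x)
    heuristics = [2.0, 0.5, 0.05]      # step 1: N candidate heuristics
    credit = [1.0] * len(heuristics)
    for t in range(steps):
        # Step 2: pick a heuristic with probability proportional to credit.
        i = random.choices(range(len(heuristics)), weights=credit)[0]
        cand = [v + heuristics[i] * random.gauss(0, 1) for v in x]
        cand_loss = sum(v * v for v in cand)
        if cand_loss < loss:
            credit[i] += 1.0           # reward the heuristic that helped
            x, loss = cand, cand_loss
        # Step 3: occasionally replace the weakest heuristic with a
        # variation, far less often than the inner search moves.
        if t % 500 == 499:
            j = credit.index(min(credit))
            heuristics[j] = max(heuristics) * random.uniform(0.1, 1.0)
            credit[j] = 1.0
    return loss, credit

final_loss, final_credit = meta_search()
```

The open question from the text remains open here: these “heuristics” are just scalar step sizes, and it is not obvious how to represent heuristics so that the same scheme scales from the very simple to the very complicated.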
The features of evolution work in concert, and there are probably other features we are missing. But the cartoon versions of evolution we see in EC, even if they are useful algorithms for some problems, probably aren’t capturing enough to replicate the power of biological evolution.
There’s been a tendency in popular accounts of evolution to give a sense of “we’ve got this all figured out, it’s as simple as Darwin said”, so as not to yield any ground to those who would treat every gap in our understanding as evidence that evolution doesn’t work and that in fact, “God did it”. Unfortunately, responses like these have shaped the common understanding of evolution. The simplistic common understanding is insufficient to explain the biosphere’s learning performance (as anyone who has tried building artificial evolutionary systems can attest), but it’s what many people know, including researchers in ML who often dismiss evolutionary methods as cargo-cult biomimicry.
Perhaps it’s time to give evolutionary methods a second, closer look!