It is often claimed that animals are more sample efficient than artificial neural networks, meaning that they can learn from far fewer examples (sometimes from a single example), and that one reason for this is the useful innate inductive biases sculpted into the brains of animals through adaptations over evolutionary time scales. This claim is then usually followed by prescriptive advice that we do something similar with artificial neural networks, namely, build more innate structure into our models (or at least include an outer optimization loop that would learn such useful inductive biases over longer time scales, as in meta-learning). Tony Zador and Gary Marcus have both made arguments of this sort recently.
In this post, I’d like to take issue with arguments of this sort. My objection is actually already well-known; Zador has a section in his paper addressing precisely this objection (titled “supervised learning or supervised evolution?”). So, what is the objection? It is that arguments of this sort conflate biology and simulation. They assume that the learning that happens in an artificial neural network is comparable to the learning that happens in a biological system over its individual lifespan. But there’s no good reason to think of artificial learning in this way. We should rather think of it as a combination of the learning that happens over an individual lifespan and the adaptations that take place over evolutionary time scales. Seen in this light, the sample efficiency argument in favor of animals falls by the wayside, because biological evolution has been running the most intense optimization algorithm, in the biggest and most detailed simulation environment ever (called “the real world”), for billions of years (so much for “one-shot” learning!).
As I said, Zador is aware of this objection, so what is his response to it? As far as I can tell, he doesn’t really have a convincing one. He correctly points out the differences between biological optimization and learning in artificial networks, but pointing out differences does not show that the two processes cannot generate functionally equivalent networks.
For example, biological optimization runs two nested optimization loops: an inner loop characterizing the learning processes in individual lifespans, and an outer loop characterizing the adaptations over evolutionary time scales. This is similar to a learning paradigm in machine learning called meta-learning, and because of this similarity to biology, Zador is very sympathetic to meta-learning. But to my mind, the jury is still out on whether meta-learning has any significant advantages over standard learning paradigms in machine learning. There are recent results suggesting that in practical problems one doesn’t really need the two separate optimization loops of meta-learning (one loop is all you need!). Moreover, if one trains one’s model on a sufficiently diverse range of problems (but crucially using a standard learning paradigm, such as supervised learning or reinforcement learning), meta-learning-like effects emerge automatically, without any need for two separate optimization loops.
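To make the two nested loops concrete, here is a toy sketch, loosely in the style of the Reptile meta-learning algorithm. Everything in it is my own illustrative assumption (the 1-D linear task family, the learning rates, the function names), not anything from Zador’s paper: the inner loop plays the role of lifetime learning on one task, and the outer loop plays the role of evolution, slowly shaping an initialization that works well across the whole task distribution.

```python
# Toy sketch of meta-learning's two nested loops (Reptile-style).
# Task family (an assumption for illustration): 1-D regression y = slope * x,
# with the slope varying from task to task.

import random

def inner_loop(w, task_slope, steps=50, lr=0.01):
    """'Lifetime' learning: plain SGD on a single task, starting from w."""
    for _ in range(steps):
        x = random.uniform(-1.0, 1.0)
        y = task_slope * x               # target for this particular task
        grad = 2 * (w * x - y) * x       # d/dw of the squared error (w*x - y)^2
        w -= lr * grad
    return w

def outer_loop(meta_steps=2000, meta_lr=0.1):
    """'Evolutionary' optimization: nudge the shared initialization w0
    toward whatever the inner loop converged to on each sampled task."""
    w0 = 0.0
    for _ in range(meta_steps):
        slope = random.uniform(1.0, 3.0)      # sample a new task
        w_adapted = inner_loop(w0, slope)     # run a 'lifetime' of learning
        w0 += meta_lr * (w_adapted - w0)      # Reptile-style meta-update
    return w0

random.seed(0)
w0 = outer_loop()
# The initialization drifts toward the middle of the task distribution
# (slopes drawn from [1, 3]), so each new task needs few inner steps.
print(w0)
```

The point of the sketch is only structural: the outer loop never sees gradients directly, it just observes how inner-loop learners end up, which mirrors how evolution only “sees” the outcome of individual lifetimes.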
The core problem here, I think, is again conflating biology and simulation. Just because we see something in biology doesn’t mean we should emulate it blindly. Biology is constrained in many ways that simulation isn’t (and vice versa). Of course it makes sense for biology to use two separate optimization loops, because individual lifespans are limited, but this constraint doesn’t hold in simulation: we can run our models arbitrarily long on arbitrarily many tasks.
I think this mismatch between biology and simulation is also why naive ways of emulating the brain’s innate inductive biases, such as trying to directly replicate the concept of “cell types” in the brain, are usually not very effective in artificial neural networks. In my opinion, these features are essentially consequences of the brain’s suboptimal learning algorithms (over developmental time scales): the brain has to off-load a significant chunk of the optimization burden to evolution, which must craft these intricate cell types to compensate for the suboptimality of learning within a lifespan. Learning in artificial neural networks, on the other hand, is much more powerful; it is not constrained by the things biological learning is constrained by (for example, locality and limited individual lifespans), so it doesn’t need to resort to these kinds of tricks (like different innate cell types) to learn something functionally equivalent over an individual lifespan.