We just held the 7th (Virtual) LightOn AI Meetup, and it was a blast! For this edition, we had Adam Gaier, Senior AI Research Scientist at Autodesk Research, presenting his work on Weight Agnostic Neural Networks, done while at Google Brain and INRIA. You can check out the slides for this talk while we prepare the video of the meetup for upload. It will be posted soon on the LightOn YouTube channel; subscribe there to get notified, and join our Meetup group to hear about upcoming events: there is more cool stuff coming!
The presentation started with a discussion of innate abilities in animals, such as hatchling marine iguanas able to survive snakes attacking en masse, and of innate abilities in machines, exemplified by the Deep Image Prior and Liquid State Machines.
Not all neural network architectures are created equal: some perform much better than others on certain tasks. But how important are the weight parameters of a neural network compared to its architecture?
To answer this question, a different kind of neural architecture search is needed: one that de-emphasizes training and weights. Ideally, one would repeatedly sample full weight vectors and test the network, but this quickly becomes infeasible because of the curse of dimensionality. The solution is simple: use a single shared weight for all connections, and estimate performance over several roll-outs, each with a different value of that weight. This mechanism produces Weight Agnostic Neural Networks, or WANNs: networks that can perform well with a single shared weight, even when it is assigned randomly!
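To make the idea concrete, here is a minimal sketch of that evaluation loop in Python. It is illustrative only and relies on hypothetical helpers that are not from the paper: `architecture.with_shared_weight(w)` is assumed to instantiate the candidate network with every connection set to the value `w`, and `run_rollout` is assumed to play one episode with the resulting policy and return its reward.

```python
import numpy as np

# Weight values to share across all connections of a candidate architecture
# (the exact set of values is an assumption for illustration).
WEIGHT_VALUES = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

def evaluate_architecture(architecture, run_rollout, n_rollouts=3):
    """Score a candidate by its mean reward over roll-outs and shared-weight values,
    instead of training its weights."""
    scores = []
    for w in WEIGHT_VALUES:
        # Every connection in the candidate gets the same value w.
        policy = architecture.with_shared_weight(w)
        scores.extend(run_rollout(policy) for _ in range(n_rollouts))
    return float(np.mean(scores))
```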
The WANNs are obtained through a search procedure based on NEAT (NeuroEvolution of Augmenting Topologies), a genetic algorithm that evolves network topologies.
The networks are evolved through a set of variation operations: adding a connection, inserting a node on an existing connection, and changing a node's activation function, as in the sketch below.
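As a rough illustration of what such variation operations can look like, the snippet below mutates a minimal genome. The `Genome` layout here is an assumption made for the example, not the representation used in the paper, and it skips checks (e.g. acyclicity) that a real implementation would need.

```python
import random

# A small pool of candidate activation functions (names only, for illustration).
ACTIVATIONS = ["linear", "step", "sin", "gaussian", "tanh",
               "sigmoid", "inverse", "abs", "relu", "cos"]

class Genome:
    def __init__(self, n_inputs, n_outputs):
        self.nodes = {i: "linear" for i in range(n_inputs + n_outputs)}
        self.connections = set()  # (source, target) pairs; all share one weight

    def add_connection(self):
        """Connect two previously unconnected nodes."""
        src, dst = random.sample(sorted(self.nodes), 2)
        self.connections.add((src, dst))

    def insert_node(self):
        """Split an existing connection by inserting a new node on it."""
        if not self.connections:
            return
        src, dst = random.choice(sorted(self.connections))
        new = max(self.nodes) + 1
        self.nodes[new] = random.choice(ACTIVATIONS)
        self.connections.discard((src, dst))
        self.connections.update({(src, new), (new, dst)})

    def change_activation(self):
        """Reassign the activation function of a random node."""
        node = random.choice(sorted(self.nodes))
        self.nodes[node] = random.choice(ACTIVATIONS)
```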
Reinforcement learning, however, is more forgiving than, for example, classification problems: it often deals with low-dimensional inputs and embodied agents that interact with their environment. Furthermore, unlike in reinforcement learning, architecture design for classification has been fine-tuned for decades. These reasons motivated investigating the performance of WANNs on MNIST.
Interestingly, WANNs still perform surprisingly well in this high-dimensional setting, reaching 91.6% test accuracy when ensembled over several shared-weight values, and 94.2% when the weights are trained after the evolution process has finished.
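For intuition, here is one way such an ensemble could be computed: the same evolved network is instantiated with several shared-weight values and their class scores are averaged. The `wann_logits` function below is a hypothetical stand-in for the evolved network's forward pass, not the authors' code.

```python
import numpy as np

def ensemble_predict(wann_logits, images, weight_values=(-2.0, -1.0, 1.0, 2.0)):
    """Average the class scores of one WANN over several shared-weight copies,
    then pick the highest-scoring class for each image."""
    logits = np.mean([wann_logits(images, w) for w in weight_values], axis=0)
    return logits.argmax(axis=-1)
```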
Adam concluded by offering a point of view that contrasts with most recent neural architecture search efforts: while rearranging components for better learning is at the heart of most architecture search methods, choosing the right class of architecture (e.g. convolutional networks, LSTMs) does more than any hyperparameter search. This should light the path forward: the search for innate architectures and building blocks deserves attention too!
We delved more deeply into this topic in the Q&A, coming to the conclusion that this is partly due to the Hardware Lottery: our best architectures right now (CNNs, Transformers) are also the best suited to our current hardware stack, and this is not by chance. Among other things, our mission at LightOn is to develop hardware that gives different algorithms the chance they deserve!
At LightOn we compute dense random projections literally at the speed of light. If you want to try out your latest idea, you can register for the LightOn Cloud or apply to the LightOn Cloud for Research Program!
We would like to thank Victoire Louis for undertaking most of the effort to organize this meetup.