Software is eating the world, machine learning is eating software, and, well, transformers 🤖 are eating machine learning.
We discussed this in our previous meetup: attention mechanisms lend themselves well to parallelization ⛓️ and avoid catastrophic forgetting 😶🌫️; however, their limited scalability is a roadblock 🚧 for certain applications. We have also previously talked about Reservoir Computers, which are fast and lightweight neural networks.
The Reservoir Transformer borrows from a long history of research on the power of 🎲 random transformations: the Johnson-Lindenstrauss Lemma, random kitchen sinks, Cover’s Theorem, and Echo State Networks.
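To get a flavor of what these results say, here is a toy illustration of ours (not from the paper) of the Johnson-Lindenstrauss idea: a fixed random projection approximately preserves pairwise distances, even when it drastically reduces the dimension.

```python
import numpy as np

# Toy Johnson-Lindenstrauss demo: project 100 points from 10,000
# dimensions down to 512 with a fixed random map, and check that a
# pairwise distance is roughly preserved.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 10_000))                     # points in high dimension
proj = rng.normal(size=(10_000, 512)) / np.sqrt(512)   # random projection
y = x @ proj

d_hi = np.linalg.norm(x[0] - x[1])  # distance before projection
d_lo = np.linalg.norm(y[0] - y[1])  # distance after projection
print(d_hi, d_lo)  # close, up to a small distortion
```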
The 🔑 idea of the work of Douwe and collaborators is strikingly simple: 🥶 freeze some layers at their random initialization, and never update them.
They evaluate different freezing schemes with different kinds of “reservoir” layers: an encoder reservoir, a feedforward network without attention, and also CNN and BiGRU layers. All of these seem to work well, and the best freezing scheme alternates trainable and frozen layers, as in the sketch below.
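A minimal PyTorch sketch of the alternating scheme (our own illustration, not the authors’ code; the layer sizes and the use of `nn.TransformerEncoderLayer` are assumptions):

```python
import torch
import torch.nn as nn

# A stack of encoder layers where every other layer is a frozen
# "reservoir", i.e. kept at its random initialization.
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=512, nhead=8) for _ in range(6)]
)
for i, layer in enumerate(layers):
    if i % 2 == 1:  # alternate trainable and frozen layers
        for p in layer.parameters():
            p.requires_grad = False  # 🥶 frozen at initialization

# The optimizer only sees the trainable parameters, so the
# reservoir layers never receive weight updates.
optimizer = torch.optim.Adam(
    (p for p in layers.parameters() if p.requires_grad), lr=1e-4
)
```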
The authors show improvements in 🕓 wall-clock time to convergence, as well as in overall performance 🏆 on various machine translation and language modeling tasks. The pretraining perplexity is similar, the training time is reduced by up to 25%, and, strikingly, the downstream performance is better overall!
Reservoir layers seem to improve efficiency and generalization, acting as “cheap” additional parameters.
The better efficiency stems from 🦘 skipping the weight updates for the frozen weights (this is so simple, and at the same time quite impressive). The paper also presents preliminary results on 🔙🦘 backskipping, where the entire backward pass through the reservoir layers is skipped altogether, leading to further efficiency gains.
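Here is one way such a backskip could look in PyTorch, a hedged sketch of ours rather than the paper’s exact recipe: the reservoir’s backward pass is replaced by a straight-through identity, which assumes the reservoir preserves the input shape (true for transformer layers).

```python
import torch

class Backskip(torch.autograd.Function):
    """Hypothetical sketch: run a frozen reservoir forward without
    building its autograd graph, then pass the gradient straight through."""

    @staticmethod
    def forward(ctx, x, reservoir):
        with torch.no_grad():  # no graph: the reservoir backward never runs
            return reservoir(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through identity: hand the gradient to the layer below
        # unchanged, skipping the reservoir's backward computation entirely.
        return grad_output, None  # no gradient for the reservoir module
```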
This work opens up incredibly exciting perspectives, especially in the field of hardware for machine learning, a topic we also addressed with Sara Hooker last December.
At LightOn we build Optical Processing Units that you can use to compute random transformations at scale. If you want to try out your latest idea, you can sign up for the LightOn Cloud Free Trial or apply to the LightOn Cloud for Research Program!