Beidi Chen, a Postdoc Researcher at Stanford, was the guest of LightOn’s 13th AI Meetup and presented her work on SLIDE&MONGOOSE: LSH Frameworks for Efficient Neural Networks Training⚡ to appear at ICLR 2021.
We have previously talked about co-designing hardware and software ↔️ to unlock the next generation of machine learning models 🤖, and this talk fits perfectly in that narrative.
The underlying idea is to rely on Locality Sensitive Hashing (LSH) to trade accuracy for efficiency in performing the giant matrix multiplications that we find in attention layers and large output layers for extreme classification problems. However, a naive application of LSH introduces high overhead and is therefore not useful in practice.
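To make the idea concrete, here is a minimal sketch of how LSH can replace a full matrix multiplication with a lookup plus a few inner products. It assumes a SimHash-style scheme (random hyperplanes) and a single hash table; the function names and sizes are hypothetical choices for illustration, not the SLIDE or MONGOOSE implementation.

```python
import random

def simhash(vec, planes):
    """Return one sign bit per hyperplane (assumed SimHash-style hashing)."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(p, vec)) >= 0 else 0
        for p in planes
    )

def build_table(weights, planes):
    """Bucket every weight row (one row per neuron) by its hash code."""
    table = {}
    for idx, w in enumerate(weights):
        table.setdefault(simhash(w, planes), []).append(idx)
    return table

def sparse_forward(x, weights, planes, table):
    """Compute inner products only for neurons that share the input's bucket."""
    candidates = table.get(simhash(x, planes), [])
    return {
        i: sum(w_i * x_i for w_i, x_i in zip(weights[i], x))
        for i in candidates
    }

# Hypothetical toy sizes: 200 neurons, 16 dimensions, 4-bit hash codes.
rng = random.Random(0)
dim, n_neurons, n_bits = 16, 200, 4
planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_neurons)]

table = build_table(weights, planes)
x = weights[7]  # an input identical to neuron 7 must land in neuron 7's bucket
out = sparse_forward(x, weights, planes, table)
print(f"computed {len(out)} of {n_neurons} inner products")
```

The overhead mentioned above comes from `build_table`: whenever the weights change during training, the stored hash codes can go stale and the table has to be rebuilt.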
The main insight 🔎 behind MONGOOSE is that most weights do not change enough during training to trigger LSH updates: changes are large at the beginning but plateau early, with only 1–5% of hash codes changing per epoch on average. If we had an oracle 🔱, we could reduce the frequency of LSH updates by a factor of 100.
The solution is to maintain a low-cost structure that acts as a scheduler for the updates: this structure holds two copies of the network parameters that differ in only a few coordinates. When the difference between the two copies exceeds some threshold, or when many coordinates are close to the decision boundary, the hash codes are updated. This last condition is the key trick that allows us to leverage efficient parallelism.
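The two trigger conditions can be sketched as follows. This is only an illustration of the decision rule under stated assumptions: SimHash-style hyperplanes (so the decision boundary is a projection equal to zero), a Euclidean norm for the drift, and hypothetical parameters `drift_tol`, `margin` and `max_near`. It checks the conditions naively and does not reproduce the low-cost data structure presented in the talk.

```python
import math

def projections(w, planes):
    """Signed projections of a weight row onto each hyperplane."""
    return [sum(p_i * w_i for p_i, w_i in zip(p, w)) for p in planes]

class UpdateScheduler:
    """Decides when hash codes should be recomputed (illustrative only)."""

    def __init__(self, weights, planes, drift_tol=0.5, margin=0.05, max_near=0):
        self.planes = planes
        self.snapshot = [list(w) for w in weights]  # weights at the last rehash
        self.drift_tol = drift_tol  # hypothetical bound on accumulated drift
        self.margin = margin        # hypothetical width of the "near boundary" band
        self.max_near = max_near    # hypothetical tolerated near-boundary count

    def should_rehash(self, weights):
        # Condition 1: the current weights drifted too far from the snapshot.
        drift = math.sqrt(sum(
            (a - b) ** 2
            for w, s in zip(weights, self.snapshot)
            for a, b in zip(w, s)
        ))
        # Condition 2: too many projections sit close to the sign boundary.
        near = sum(
            1
            for w in weights
            for z in projections(w, self.planes)
            if abs(z) < self.margin
        )
        return drift > self.drift_tol or near > self.max_near

    def mark_rehashed(self, weights):
        """Record the weights used for the latest hash-code update."""
        self.snapshot = [list(w) for w in weights]

planes = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.2], [-1.0, 2.0]]
sched = UpdateScheduler(weights, planes)

small_step = [[1.01, 0.2], [-1.0, 2.0]]    # tiny drift, nothing near a boundary
large_step = [[2.0, 0.2], [-1.0, 2.0]]     # drift of 1.0 exceeds drift_tol
near_boundary = [[1.0, 0.01], [-1.0, 2.0]] # one projection inside the margin

print(sched.should_rehash(small_step))
print(sched.should_rehash(large_step))
print(sched.should_rehash(near_boundary))
```

In a training loop, `should_rehash` would be called after each optimizer step, and `mark_rehashed` only when the hash tables are actually rebuilt.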
The second trick is to introduce Learnable LSH to better separate the data in the nearest neighbor search. Every time the scheduler triggers an update, a training signal based on a triplet loss is computed to train the LSH parameters.
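A triplet loss of this kind can be sketched in a few lines. The version below is a generic hinge-style triplet loss on the hyperplane projections, with squared Euclidean distance and a hypothetical `margin` parameter. The exact loss and the way triplets are selected in MONGOOSE may differ, so treat this as an assumption-laden illustration rather than the authors' formulation.

```python
def project(planes, v):
    """Project a vector onto each (learnable) hyperplane."""
    return [sum(p_i * v_i for p_i, v_i in zip(p, v)) for p in planes]

def sq_dist(a, b):
    """Squared Euclidean distance between two projected vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(planes, anchor, positive, negative, margin=1.0):
    """Hinge loss: the anchor should project closer to the positive than to
    the negative by at least `margin`."""
    a, p, n = (project(planes, v) for v in (anchor, positive, negative))
    return max(0.0, margin + sq_dist(a, p) - sq_dist(a, n))

anchor, positive, negative = [0.0, 0.0], [0.0, 5.0], [3.0, 0.0]

# A hyperplane that ignores the direction separating anchor and positive.
good_planes = [[1.0, 0.0]]
# A hyperplane that pulls the positive away and collapses the negative.
bad_planes = [[0.0, 1.0]]

print(triplet_loss(good_planes, anchor, positive, negative))  # 0.0
print(triplet_loss(bad_planes, anchor, positive, negative))   # 26.0
```

Minimizing this loss with respect to `planes` pushes the hyperplanes toward the first configuration, where similar items share hash bits and dissimilar ones do not.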
Since the update time is related to the query time, the overall update overhead can be reduced even if a training phase is introduced, provided that the scheduler performs well.
MONGOOSE is 5–20x faster and up to 4x more memory efficient than the baseline, without sacrificing accuracy on extreme classification and NLP tasks.