Summary of LightOn AI meetup #14

WeightWatcher: a Diagnostic Tool for Deep Neural Networks

June 4, 2021

TL;DR

It has been about a year since we started our (virtual) LightOn AI Meetups, and to mark the anniversary 🥳, we had Charles Martin as our guest. Charles is the Chief Scientist at Calculation Consulting, and he presented his work on WeightWatcher: a Diagnostic Tool for Deep Neural Networks, a Python package built around a series of papers.

The 📺 recording of the meetup is on LightOn’s YouTube channel. Subscribe to the channel and join our Meetup group to be notified of upcoming videos and events!

WeightWatcher is a Python package for analyzing trained models and inspecting models that are difficult to train 🏋️. It can be used to gauge improvements in model performance and to predict test accuracies across different models 🔮 (without ever looking at the data!). It can also detect potential problems when compressing or fine-tuning pre-trained models 🗜️.

It is based on ideas from Random Matrix Theory, Statistical Mechanics, and Strongly Correlated Systems. The main idea is to fit a power law to the tail of the empirical spectral density (ESD) of each layer's weight matrix. The fitted power-law exponent α is what helps us detect potential problems.
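To make "fit a power law to the tail of the ESD" concrete, here is a minimal sketch, not WeightWatcher's actual code. It assumes the ESD is the set of eigenvalues of the correlation matrix X = WᵀW / N for an N × M layer weight matrix W, and it estimates α with the standard continuous power-law maximum-likelihood estimator for a fixed, hand-picked cutoff `xmin`. The function names and the "top 10% of eigenvalues" cutoff are hypothetical choices for illustration; the real package handles normalization and cutoff selection in its own way.

```python
import numpy as np


def empirical_spectral_density(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of the correlation matrix X = W^T W / N for an N x M weight matrix.

    The matrix is transposed if needed so that N >= M; the nonzero part of the
    spectrum is the same either way.
    """
    N, M = W.shape
    if N < M:
        W = W.T
        N, M = M, N
    X = W.T @ W / N
    return np.linalg.eigvalsh(X)  # X is symmetric, so eigenvalues are real (ascending)


def tail_alpha_mle(eigs: np.ndarray, xmin: float) -> float:
    """Continuous power-law MLE of the exponent for eigenvalues >= xmin:
    alpha = 1 + n / sum(ln(lambda_i / xmin)).
    """
    tail = eigs[eigs >= xmin]
    log_ratios = np.log(tail / xmin)
    if tail.size < 2 or log_ratios.sum() <= 0:
        raise ValueError("tail too small or degenerate to fit")
    return 1.0 + tail.size / log_ratios.sum()


# Usage on a random stand-in for a layer weight matrix (not a trained layer).
rng = np.random.default_rng(0)
W = rng.standard_normal((500, 200))
eigs = empirical_spectral_density(W)
xmin = float(np.quantile(eigs, 0.9))  # hypothetical choice: fit the top 10% of eigenvalues
alpha = tail_alpha_mle(eigs, xmin)
print(f"{eigs.size} eigenvalues, tail alpha = {alpha:.2f}")
```

In practice the cutoff `xmin` should not be fixed by hand, which is exactly why the fit needs care.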

Fitting a power law to the tail of the ESD on a log-log plot needs to be done carefully!
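One reason for caution: a least-squares line through a log-log histogram is known to give biased exponents, and the result depends heavily on where the "tail" is assumed to start. A common remedy, due to Clauset, Shalizi and Newman, is to fit α by maximum likelihood for every candidate cutoff `xmin` and keep the cutoff whose fitted power law is closest to the data in Kolmogorov-Smirnov (KS) distance. The sketch below is a simple version of that procedure under our own assumptions (hypothetical function name, a minimum tail size of 100 points); it is not claimed to be the exact procedure WeightWatcher uses.

```python
import numpy as np


def fit_power_law_tail(x, min_tail: int = 100):
    """Choose xmin by minimizing the KS distance between the empirical tail
    and the fitted continuous power law (Clauset-Shalizi-Newman style).

    Returns (xmin, alpha, ks_distance).
    """
    x = np.sort(np.asarray(x, dtype=float))
    x = x[x > 0]
    best = None
    for i in range(x.size - min_tail + 1):
        xmin = x[i]
        tail = x[i:]  # already sorted ascending
        log_sum = np.sum(np.log(tail / xmin))
        if log_sum <= 0:
            continue
        alpha = 1.0 + tail.size / log_sum  # MLE of the exponent for this xmin
        # Power-law CDF above xmin: P(X <= t) = 1 - (t / xmin)^(1 - alpha)
        model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
        emp_cdf = np.arange(1, tail.size + 1) / tail.size
        ks = np.max(np.abs(emp_cdf - model_cdf))
        if best is None or ks < best[2]:
            best = (float(xmin), float(alpha), float(ks))
    if best is None:
        raise ValueError("not enough positive samples to fit a tail")
    return best


# Synthetic check: inverse-transform samples from a pure power law with alpha = 2.5, xmin = 1.
rng = np.random.default_rng(1)
sample = (1.0 - rng.random(2000)) ** (-1.0 / (2.5 - 1.0))
xmin_hat, alpha_hat, ks = fit_power_law_tail(sample)
print(f"xmin = {xmin_hat:.3f}, alpha = {alpha_hat:.2f}, KS = {ks:.3f}")
```

The same routine can be applied to a layer's eigenvalues in place of the synthetic sample.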

Poorly trained models tend to have large layer α values, as can be seen, for example, by comparing GPT and GPT-2: the same model trained on dirty versus well-curated data.

GPT is trained on dirtier data than GPT-2, and it shows in the unusually large α values for some of the layers.
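As a rough synthetic analogy (not a reproduction of the GPT versus GPT-2 analysis), compare the tail α of a Gaussian random matrix, whose ESD follows the bounded Marchenko-Pastur law like a layer that has learned little, with that of a matrix whose entries are heavy-tailed (Student-t with 1.5 degrees of freedom), standing in for the heavy-tailed correlations attributed to well-trained layers. The helpers, matrix sizes, and the top-20% tail cutoff are assumptions made for this illustration only.

```python
import numpy as np


def esd(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of X = W^T W / N, with W oriented so that N >= M."""
    N, M = W.shape
    if N < M:
        W = W.T
        N, M = M, N
    return np.linalg.eigvalsh(W.T @ W / N)


def tail_alpha(eigs: np.ndarray, q: float = 0.8) -> float:
    """Power-law MLE exponent over the top (1 - q) fraction of eigenvalues."""
    xmin = np.quantile(eigs, q)
    tail = eigs[eigs >= xmin]
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))


rng = np.random.default_rng(42)
N, M = 1000, 500
W_gauss = rng.standard_normal((N, M))               # bounded, Marchenko-Pastur-like ESD
W_heavy = rng.standard_t(df=1.5, size=(N, M))       # heavy-tailed entries -> heavy-tailed ESD

alpha_gauss = tail_alpha(esd(W_gauss))
alpha_heavy = tail_alpha(esd(W_heavy))
print(f"Gaussian alpha = {alpha_gauss:.2f}, heavy-tailed alpha = {alpha_heavy:.2f}")
```

The bounded Gaussian spectrum produces a steep, large-α "tail", while the heavy-tailed matrix yields a much smaller α, which is the direction of the effect described above.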

In particular, a weighted α can predict the test accuracy of models within the same architecture series across varying depths, as well as across other architectures and regularization parameters 📉.
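As far as we understand Charles Martin's papers with Michael Mahoney, the weighted α averages each layer's α after scaling it by the logarithm of that layer's largest eigenvalue λ_max, so that layers with larger spectral norms count more. The sketch below computes that quantity from per-layer values; the base-10 logarithm and the function name are our assumptions, and the input numbers are hypothetical.

```python
import numpy as np


def weighted_alpha(alphas, lambda_maxes) -> float:
    """Layer-averaged alpha, each layer weighted by log10 of its largest eigenvalue:
    alpha_hat = (1/L) * sum_l alpha_l * log10(lambda_max_l).
    """
    alphas = np.asarray(alphas, dtype=float)
    lambda_maxes = np.asarray(lambda_maxes, dtype=float)
    if alphas.shape != lambda_maxes.shape or alphas.size == 0:
        raise ValueError("need one lambda_max per layer alpha")
    return float(np.mean(alphas * np.log10(lambda_maxes)))


# Hypothetical per-layer values, for illustration only.
print(weighted_alpha([2.0, 4.0], [10.0, 100.0]))  # (2*1 + 4*2) / 2 = 5.0
```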

The correlation between test accuracy and weighted α is remarkable.
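To check such a correlation on your own set of models, one option is to compute the Pearson coefficient and a least-squares line between each model's weighted α and its reported test accuracy. This is a generic sketch with a hypothetical function name; the inputs below are synthetic and exactly linear, chosen only to exercise the code, and are not results from the talk.

```python
import numpy as np


def accuracy_vs_metric(metric, accuracy):
    """Pearson correlation and least-squares line between a per-model metric
    (e.g. weighted alpha) and test accuracy across a set of models.

    Returns (r, slope, intercept).
    """
    metric = np.asarray(metric, dtype=float)
    accuracy = np.asarray(accuracy, dtype=float)
    r = float(np.corrcoef(metric, accuracy)[0, 1])
    slope, intercept = np.polyfit(metric, accuracy, deg=1)
    return r, float(slope), float(intercept)


# Synthetic, exactly linear inputs just to exercise the function (not real results).
wa = [3.0, 3.5, 4.0, 4.5]
acc = [80.0 - 2.0 * a for a in wa]
print(accuracy_vs_metric(wa, acc))
```

Because no test data is needed to compute weighted α, the only labeled quantity here is the accuracy used to validate the relationship.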

Finally, there is some early research on extending this idea to decide when to perform optimal early stopping 🛑, to set per-layer learning rates 🎛️, or to detect overfitting 🔍. Quite a program! We look forward to even more insightful empirical metrics in Charles’ WeightWatcher in the future. The video of the meetup is here.
