TL;DR
Your company's knowledge and expertise are buried under layers of documentation scattered across multiple locations and organized in heterogeneous ways. Large Language Models (LLMs), like ChatGPT or Claude, are great at making sense of vast knowledge bases and synthesizing key information in a digestible format. However, they cannot scale beyond a certain limit due to latency and performance constraints: you would simply wait too long for answers, while distracting the generative engine with too much information that is irrelevant to the task at hand.
To address this challenge, we typically run LLMs in a Retrieval Augmented Generation (RAG) pipeline: when a user query comes in, we first perform a search to find relevant sources, then feed the retrieved information to the LLM together with the query to produce the answer.
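As a rough sketch, the loop looks like this. The encoder model name and the final LLM call below are illustrative placeholders under our assumptions, not a description of any particular production stack:

```python
# Minimal RAG sketch: embed the knowledge base with a BERT-like encoder,
# retrieve the passages closest to the query, and hand them to an LLM
# alongside the question. Model names here are examples only.
from sentence_transformers import SentenceTransformer, util

documents = [
    "Our VPN requires two-factor authentication since March.",
    "Expense reports are filed through the finance portal.",
    "The API gateway is rate-limited to 100 requests per second.",
]

# Any BERT-like sentence encoder works here; this checkpoint is just an example.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(documents, convert_to_tensor=True)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_embedding = encoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, doc_embeddings)[0]
    return [documents[i] for i in scores.topk(top_k).indices.tolist()]

query = "Is there a rate limit on the API?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be sent to the LLM of your choice to produce the final answer.
print(prompt)
```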
Since BERT's release in 2018, encoder models based on its architecture have been the cornerstone of search. Even though LLMs can be used for these same tasks, running them at scale in high-concurrency environments is prohibitive. BERT-like models, by contrast, are much cheaper to serve, offer better latency, and are far more controllable.
Our customers want private, secure, and performant solutions to unearth the know-how submerged in the company knowledge base. In effect, this means deploying on customer infrastructure, and rightsizing our solution to minimize total cost of ownership (TCO).
Stepping back to take a broader view of the challenges ahead of us, it became clear that it was worth investing time and GPUs to finally bring BERT into 2024 and the modern age.
Together with Answer.AI and friends, we announce the release of ModernBERT today.
ModernBERT improves on its ancestor along many axes: it performs better on benchmarks, handles a longer context, handles code, is faster to finetune on your data than any other model, and offers very fast inference on GPU-constrained setups.
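To give an idea of how easy it is to get started, here is a minimal sketch of querying ModernBERT through Hugging Face transformers; it assumes the answerdotai/ModernBERT-base checkpoint name and a transformers version recent enough to support the architecture:

```python
# Sketch of running ModernBERT for masked-language-model inference.
# The checkpoint name is an assumption; adjust it to the published model id,
# and make sure your transformers version includes ModernBERT support.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']}: {prediction['score']:.3f}")
```

The same checkpoint can be loaded with AutoModel for embedding, retrieval, or classification finetuning, just like any other BERT-style encoder.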
We are releasing a long list of artefacts alongside the models, and we encourage the community to build on them.
Paradigm unearths the know-how of your company to accelerate your work, while keeping it safe. Try Paradigm now!
Links
LightOn sponsored the compute for this project on Orange Business Cloud Avenue