TL;DR
Your company's knowledge and expertise are buried under layers of documentation scattered across multiple locations and organized in heterogeneous ways. Large Language Models (LLMs), like ChatGPT or Claude, are great at making sense of vast knowledge bases and synthesizing key information in a digestible format. However, they cannot scale beyond a certain limit due to latency and performance constraints: you would simply wait too long for answers, while distracting the generative engine with too much information that is irrelevant to the task at hand.
To address this challenge, we typically run LLMs in a Retrieval Augmented Generation (RAG) pipeline: when a user query comes in, we first perform a search to find relevant sources, then feed the retrieved information to the LLM together with the query to produce the answer.
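As a rough sketch, the loop looks like this. The encoder model name and the final LLM call below are illustrative placeholders under our assumptions, not a description of any particular production stack:

```python
# Minimal RAG sketch: embed the knowledge base with a BERT-like encoder,
# retrieve the passages closest to the query, and hand them to an LLM
# alongside the question. Model names here are examples only.
from sentence_transformers import SentenceTransformer, util

documents = [
    "Our VPN requires two-factor authentication since March.",
    "Expense reports are filed through the finance portal.",
    "The API gateway is rate-limited to 100 requests per second.",
]

# Any BERT-like sentence encoder works here; this checkpoint is just an example.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(documents, convert_to_tensor=True)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_embedding = encoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, doc_embeddings)[0]
    return [documents[i] for i in scores.topk(top_k).indices.tolist()]

query = "Is there a rate limit on the API?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would then be sent to the LLM of your choice to produce the final answer.
print(prompt)
```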
Since BERT's release in 2018, encoder models based on its architecture have been the cornerstone of search. Even though LLMs can be used for these same tasks, running them at scale in high-concurrency environments is prohibitive. BERT-like models, by contrast, are much cheaper to serve, offer better latency, and are far more controllable.
Our customers want private, secure, and performant solutions to unearth the know-how submerged in the company knowledge base. In effect, this means deploying on customer infrastructure, and rightsizing our solution to minimize total cost of ownership (TCO).
Stepping back to take a broader view of the challenges ahead of us, it became clear that it was worth investing time and GPUs to finally bring BERT into 2024 and the modern age.
Together with Answer.AI and friends, we announce the release of ModernBERT today.
ModernBERT improves on its ancestor along many axes: it performs better on benchmarks, handles a longer context, handles code, is faster to finetune on your data than any other model, and offers very fast inference on GPU-constrained setups.
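To give an idea of how easy it is to get started, here is a minimal sketch of querying ModernBERT through Hugging Face transformers; it assumes the answerdotai/ModernBERT-base checkpoint name and a transformers version recent enough to support the architecture:

```python
# Sketch of running ModernBERT for masked-language-model inference.
# The checkpoint name is an assumption; adjust it to the published model id,
# and make sure your transformers version includes ModernBERT support.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
for prediction in fill_mask("The capital of France is [MASK]."):
    print(f"{prediction['token_str']}: {prediction['score']:.3f}")
```

The same checkpoint can be loaded with AutoModel for embedding, retrieval, or classification finetuning, just like any other BERT-style encoder.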
We are releasing a long list of artefacts alongside the models, and we encourage the community to build on them.
Paradigm unearths the know-how of your company to accelerate your work, while keeping it safe. Try Paradigm now!
Links
LightOn sponsored the compute for this project on Orange Business Cloud Avenue