PyLate-rs: a lightweight tool to compute lightning-fast embeddings

PyLate-rs is a high-performance inference engine for PyLate models, meticulously crafted in Rust for optimal speed and efficiency.

July 8, 2025

TL;DR

pylate-rs is a lightweight Rust rewrite of PyLate that runs ColBERT models 97% faster by dropping the PyTorch and Transformers dependencies. Built with Candle, it works in Python, Rust, and even browsers (WebAssembly).

Key features: ultra-fast ColBERT inference, token pooling for efficiency, and integration with fast-plaid for vector search. It is well suited to production deployments where you need millisecond model initialization without the overhead of a heavy ML framework.

Essentially: ColBERT inference that's fast enough for real-time applications.

PyLate, developed at LightOn, is a powerful tool for research and training with ColBERT, but it carries a heavy set of dependencies. That's fine for most environments, and especially for training state-of-the-art information retrieval models, but it can be a real headache when you just want to run inference in a live application and spawn your model in milliseconds.

That's why we built pylate-rs. The main difference is that we've completely removed the PyTorch and Transformers dependencies. Instead, we built it with Candle, a deep-learning crate written in Rust. The goal was to create a focused, lightweight tool that does one thing well: compute ColBERT embeddings.
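To make "ColBERT embeddings" concrete: a ColBERT model produces one vector per token rather than one vector per text, and a query is scored against a document with the late-interaction (MaxSim) rule. The snippet below is a minimal, dependency-free sketch of that scoring rule on toy data. It is not the pylate-rs API; the function names (`normalize`, `maxsim_score`) and the 2-dimensional vectors are hypothetical and chosen only for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take the best cosine
    similarity against any document token, then sum over query tokens."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Toy per-token embeddings (hypothetical values, 2 dimensions for readability).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]  # has a close match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match for each query token

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)
```

Here `doc_a` scores higher than `doc_b` because every query token finds an exact directional match among its token vectors. In a real deployment, the per-token vectors would come from the model rather than being written by hand.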

To continue, go read this blog entry.
