61.4 M

HuggingFace Downloads

105 models + 69 datasets

3,941

HuggingFace Likes

across HuggingFace

2,449

GitHub Stars

38 repos

1.05 M

PyPI Downloads

All time

11,380

Crates.io Downloads

7 crates

Published on April 21, 2026

Retrieval Performance

BEIR vs. Decontaminated BEIR

BEIR · Metric: NDCG@10

Decontaminated BEIR · Metric: NDCG@10

Higher is better
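
For readers unfamiliar with the metric, the following is a minimal sketch of how NDCG@10 can be computed for a single query. It assumes the linear-gain formulation (relevance divided by log2 of rank + 1); the exact evaluation tooling behind the charts is not stated here, and the function names `dcg_at_k` and `ndcg_at_k` are illustrative only.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels (linear gain)."""
    # enumerate() is 0-based, so the rank-1 item is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by the DCG of an ideal ranking.

    ranked_relevances: relevance labels of retrieved documents, in ranked order.
    all_relevances: relevance labels of every judged document for the query.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents: score defined as 0 in this sketch
    return dcg_at_k(ranked_relevances, k) / ideal


if __name__ == "__main__":
    # Hypothetical query with two relevant documents, retrieved at ranks 1 and 3.
    print(round(ndcg_at_k([1, 0, 1], [1, 1, 0]), 4))
```

A benchmark score is then typically the mean of this per-query value over all queries in a dataset, which is why a higher number indicates that relevant documents appear nearer the top of the ranking.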

Published on April 7, 2026

OCR Table Benchmark

Offenburg University & University of Mannheim

OCR Table · Metric: LLM judge score

Higher is better

Published on January 19, 2026

OCR Model Performance

Accuracy vs. Throughput

olmOCR-bench · Metric: Olmo Score

Published on May 12, 2026

Accuracy vs. search calls

BrowseComp-Plus: very challenging questions over 100,000 documents requiring multi-step reasoning.
Each point is one LLM agent paired with one retriever.

BrowseComp-Plus · Metric: Accuracy

Open-source LLM + LightOn retrieval matches Frontier LLM + LLM embedder.

Frontier LLM + LightOn retrieval is the best at answering highly complex queries with fewer tokens.

Token use: Open-source LLM + LightOn retrieval matches Frontier LLM + LightOn retrieval.

GPT-OSS-120B backbone: BM25 (baseline), Qwen3-Embed-8B, Reason-ModernColBERT (149M), Reason-ModernColBERT (get_document), Agent-ModernColBERT (149M), Agent-ModernColBERT (get_document)

GPT-5 backbone: BM25 (baseline), Qwen3-Embed-8B, Reason-ModernColBERT (149M), Reason-ModernColBERT (get_document)

10 open-source contributions released in 18 months, spanning retrieval, OCR, encoders, and long-document reasoning. All data, weights, and recipes are published openly.