R&D Benchmarks 2024-2026
10 open-source contributions released in 18 months. Retrieval, OCR, encoders, long-document reasoning. All data, weights, and recipes published openly.
51.7M HF Downloads (105 models + 69 datasets)
3,871 HF Likes (across HF)
2,371 GitHub Stars (38 repos)
969K PyPI Downloads (all time)
9,347 Crates.io Downloads (7 crates)
BEIR nDCG@10
Retrieval
Decontaminated · multi-vector (ColBERT)
| # | Model | Submitted by | Size | Score |
|---|---|---|---|---|
| 1 | LateOn | LightOn | 149M | 57.22 |
| 2 | ColBERT-Zero | LightOn | 149M | 55.39 |
| 3 | GTE-ModernColBERT | LightOn | 149M | 54.75 |
| 4 | ColBERT-small | Answer.AI | 33M | 53.79 |
| 5 | Jina-ColBERT-v2 | Jina AI | 600M | 51.85 |
BrowseComp-Plus
Deep Research
Accuracy · all metrics
| # | Model | Submitted by | Score |
|---|---|---|---|
| 1 | Reason-ModernColBERT | LightOn | 87.59% |
| 2 | openJiuwen-deepsearch | openJiuwen | 80.00% |
| 3 | Reason-ModernColBERT | LightOn | 79.52% |
| 4 | Mixedbread Search | Agentica | 78.41% |
| 5 | Mixedbread Search | Mixedbread | 74.10% |
olmOCR-bench
OCR
Overall score
| # | Model | Submitted by | Size | Score |
|---|---|---|---|---|
| 1 | LightOnOCR-2-1B | LightOn | 1B | 83.2 |
| 2 | OlmOCR-7B | Allen AI | 7B | 83.0 |
| 3 | Qwen2.5-VL 7B | Alibaba / Qwen | 7B | 78.5 |
| 4 | LightOnOCR-1B | LightOn | 1B | 76.1 |
Models & contributions
9
| Model | Category | Description | Key metrics | Link |
|---|---|---|---|---|
Benchmark comparisons
4 benchmarks
BEIR nDCG@10: ColBERT (multi-vector) [axis: nDCG@10]
olmOCR-bench: overall score [axis: score /100]
MMLongBenchDoc: long document VQA [axis: score /100]
BRIGHT: reasoning-intensive retrieval [axis: score]
R&D timeline
9 milestones
| Date | Model | Key result |
|---|---|---|
