modernvbert

Public

ModernVBERT is a 250M-parameter vision–language encoder that aligns a text encoder (Ettin-150M) with a vision encoder (SigLIP2-B) through a masked language modeling (MLM) objective. When fine-tuned for document retrieval, ModernVBERT sets a new state of the art among sub-1B-parameter models on the ViDoRe benchmark tasks.
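The description names MLM as the alignment objective but does not spell out the masking recipe. As a rough illustration only, the sketch below shows generic BERT-style input corruption (select about 15% of tokens, then replace 80% of those with a mask token, 10% with a random token, and leave 10% unchanged). This is an assumption, not ModernVBERT's training code: the function name `mask_tokens`, the `-100` ignore label, and the idea of passing image placeholder tokens as `special_ids` so they are never masked are all hypothetical choices.

```python
import random
from typing import List, Optional, Set, Tuple

# Label for positions that should not contribute to the loss.
# -100 is a common convention; it is an assumption here.
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    mask_prob: float = 0.15,
    special_ids: Optional[Set[int]] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical BERT-style 80/10/10 masking; returns (inputs, labels)."""
    rng = rng or random.Random()
    special_ids = special_ids or set()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never corrupt special tokens (e.g. CLS/SEP or image placeholders).
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


if __name__ == "__main__":
    ids = [101, 7, 8, 9, 10, 11, 102]  # toy ids; 101/102 act as special tokens
    inp, lab = mask_tokens(ids, mask_id=103, vocab_size=1000,
                           special_ids={101, 102}, rng=random.Random(0))
    print(inp)
    print(lab)
```

In a vision-language encoder of this kind, the image features would plausibly stay unmasked and serve as extra context for recovering the masked text tokens, which is the intuition behind using MLM for alignment; the exact recipe used in this repository is not stated here.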

MIT License

Updated Jan 29, 2026

Created Sep 30, 2025

12 stars

0 forks

0 watchers

5 open issues

Top Contributors

paultltc

1 commit