Last week was big for open source LLMs. We got:
- Qwen 2.5 VL (72b and 32b)
- Gemma-3 (27b)
- DeepSeek-v3-0324
A couple of weeks earlier, we also got the new mistral-ocr model. We updated our OCR benchmark to include all of these new models.
We evaluated 1,000 documents for JSON extraction accuracy. Major takeaways:
- Qwen 2.5 VL (72b and 32b) are by far the most impressive. Both landed at around 75% accuracy, on par with GPT-4o. Qwen 72b scored only 0.4% above 32b, which is within the margin of error.
- Both Qwen models outperformed mistral-ocr (72.2%), which is specifically trained for OCR.
- Gemma-3 (27B) scored only 42.9%. This is particularly surprising given that its architecture is based on Gemini 2.0, which still tops the accuracy chart.
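To put "within the margin of error" into numbers, here is a back-of-the-envelope sketch. It assumes each of the 1,000 documents is scored simply pass/fail and that documents are independent, which may not match how the benchmark actually scores JSON extraction. It also assumes the 0.4% gap means 0.4 percentage points. The function names are illustrative only and are not taken from the benchmark repo.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) 95% interval for a pass rate p over n documents."""
    return z * math.sqrt(p * (1.0 - p) / n)


def difference_margin_of_error(p1: float, p2: float, n: int, z: float = 1.96) -> float:
    """Half-width of a Wald 95% interval for p1 - p2, treating the two runs as independent samples."""
    return z * math.sqrt(p1 * (1.0 - p1) / n + p2 * (1.0 - p2) / n)


N_DOCUMENTS = 1000  # documents evaluated, from the post
ACCURACY = 0.75     # approximate accuracy of both Qwen 2.5 VL models
GAP = 0.004         # assumption: the reported 0.4% gap read as 0.4 percentage points

single = margin_of_error(ACCURACY, N_DOCUMENTS)
diff = difference_margin_of_error(ACCURACY, ACCURACY, N_DOCUMENTS)

print(f"single-model margin: ±{single * 100:.1f} pp")  # ±2.7 pp
print(f"difference margin:   ±{diff * 100:.1f} pp")    # ±3.8 pp
print("gap within margin:", GAP < diff)                # True
```

Under these assumptions a single model's score is only pinned down to roughly ±2.7 points, and the difference between two models to roughly ±3.8 points, so a 0.4-point gap is not distinguishable from noise. Because both models were run on the same documents, a paired test such as McNemar's would be the more rigorous check.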
The dataset and benchmark runner are fully open source. You can find the code and reproduction steps here:
- https://getomni.ai/blog/benchmarking-open-source-models-for-...
- https://github.com/getomni-ai/benchmark
- https://huggingface.co/datasets/getomni-ai/ocr-benchmark
Comments URL: https://news.ycombinator.com/item?id=43549072
Points: 61
# Comments: 13