Show HN: Qwen-2.5-32B is now the best open source OCR model

Last week was big for open source LLMs. We got:

- Qwen 2.5 VL (72b and 32b)

- Gemma-3 (27b)

- DeepSeek-v3-0324

And a couple weeks ago we got the new mistral-ocr model. We updated our OCR benchmark to include the new models.

We evaluated 1,000 documents for JSON extraction accuracy. Major takeaways:

- Qwen 2.5 VL (72b and 32b) are by far the most impressive. Both landed right around 75% accuracy (equivalent to GPT-4o’s performance). Qwen 72b was only 0.4% above 32b, which is within the margin of error.

- Both Qwen models passed mistral-ocr (72.2%), which is specifically trained for OCR.

- Gemma-3 (27B) only scored 42.9%. That is particularly surprising given that its architecture is based on Gemini 2.0, which still tops the accuracy chart.
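
As a rough back-of-the-envelope check on the "within the margin of error" point, here is a minimal sketch. It assumes each of the 1,000 documents can be treated as an independent pass/fail trial at about 75% accuracy, and that the 0.4% gap means 0.4 percentage points. Both are simplifications on my part (per-document scores are probably fractional, and the two models were run on the same documents), so read it as an order-of-magnitude estimate and not as the benchmark's own statistics.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation ~95% interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


# Assumed inputs taken from the post: ~75% accuracy over 1,000 documents.
moe = margin_of_error(0.75, 1000)

# Assumption: the reported 0.4% gap is 0.4 percentage points.
gap = 0.004

print(f"margin of error: +/-{moe * 100:.1f} points")
print("gap within margin:", gap < moe)
```

Under those assumptions the interval is a couple of percentage points wide on each side, so a 0.4 point gap between the 72b and 32b models is not something this sample size can distinguish.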

The data set and benchmark runner are fully open source. You can check out the code and reproduction steps here:

- https://getomni.ai/blog/benchmarking-open-source-models-for-...

- https://github.com/getomni-ai/benchmark

- https://huggingface.co/datasets/getomni-ai/ocr-benchmark
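
For a feel of what "JSON extraction accuracy" can mean before opening the repo, here is a minimal sketch of one plausible scoring scheme: flatten the ground-truth JSON into leaf fields and count the fraction the model got exactly right. The helper names `flatten` and `field_accuracy` and the sample invoice are hypothetical, and the actual benchmark runner may score differently, so check the linked code for the real method.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"items[0].name": leaf_value} pairs."""
    if isinstance(value, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(value)]
    else:
        return {prefix: value}
    flat: dict[str, Any] = {}
    for path, child in items:
        flat.update(flatten(child, path))
    return flat


def field_accuracy(expected: dict, predicted: dict) -> float:
    """Fraction of ground-truth leaf fields that the prediction matches exactly.

    Extra fields in the prediction are ignored in this sketch.
    """
    truth = flatten(expected)
    guess = flatten(predicted)
    if not truth:
        return 1.0
    correct = sum(1 for path, v in truth.items() if path in guess and guess[path] == v)
    return correct / len(truth)


# Hypothetical example: one of four fields (the total) was misread.
expected = {"invoice_number": "INV-001", "total": 120.5,
            "items": [{"name": "Widget", "qty": 2}]}
predicted = {"invoice_number": "INV-001", "total": 125.5,
             "items": [{"name": "Widget", "qty": 2}]}

score = field_accuracy(expected, predicted)
print(score)  # 0.75
```

Averaging a per-document score like this across the whole data set would give a single headline number comparable in spirit to the percentages above.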

Comments URL: https://news.ycombinator.com/item?id=43549072

Points: 61

# Comments: 13

https://github.com/getomni-ai/benchmark/blob/main/README.md

Created 29d ago | Apr 1, 2025, 21:40:16
