RD-TableBench – Accurately evaluating table extraction

Hey HN!

A ton of document parsing solutions have been coming out lately, each claiming SOTA with little evidence. A lot of these turned out to be LLM or LVM wrappers that hallucinate frequently on complex tables.

We just released RD-TableBench, an open benchmark to help teams evaluate extraction performance on complex tables. It covers a variety of challenging scenarios: scanned tables, handwriting, language detection, merged cells, and more.

We employed an independent team of PhD-level human labelers who manually annotated 1000 complex table images from a diverse set of publicly available documents.

Alongside this, we're releasing a new bioinformatics-inspired algorithm for grading table similarity. Would love to hear any feedback!
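To give a rough feel for what "bioinformatics-inspired" grading can look like, here is a minimal sketch. It assumes a Needleman-Wunsch-style global alignment (the classic sequence alignment technique) applied at two levels: cells within a row, then rows within a table. The gap penalty, the cell scoring via `difflib`, and every function name here are illustrative assumptions, not the actual RD-TableBench implementation; see the blog post for the real method.

```python
from difflib import SequenceMatcher

GAP = -0.5  # assumed penalty for a missing/extra cell or row (illustrative)


def cell_score(a, b):
    """Map string similarity in [0, 1] to a match score in [-1, 1]."""
    return 2.0 * SequenceMatcher(None, a, b).ratio() - 1.0


def align(xs, ys, score, gap=GAP):
    """Needleman-Wunsch global alignment score of sequences xs and ys."""
    n, m = len(xs), len(ys)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(xs[i - 1], ys[j - 1]),  # match/mismatch
                dp[i - 1][j] + gap,  # element of xs left unmatched
                dp[i][j - 1] + gap,  # element of ys left unmatched
            )
    return dp[n][m]


def row_score(r1, r2):
    """Alignment score of two rows of cell strings, normalized by the longer row."""
    longest = max(len(r1), len(r2))
    if longest == 0:
        return 1.0
    return align(r1, r2, cell_score) / longest


def table_similarity(t1, t2):
    """Similarity in [0, 1] between two tables given as lists of rows."""
    longest = max(len(t1), len(t2))
    if longest == 0:
        return 1.0
    return max(0.0, align(t1, t2, row_score) / longest)


if __name__ == "__main__":
    truth = [["Name", "Qty"], ["Apple", "3"], ["Pear", "5"]]
    predicted = [["Name", "Qty"], ["Pear", "5"]]  # one row dropped
    print(table_similarity(truth, predicted))
```

The appeal of alignment over exact cell-by-cell comparison is that a single dropped or inserted row costs one gap penalty instead of shifting every later row out of place.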

-Raunak


Comments URL: https://news.ycombinator.com/item?id=42054144

Points: 25

# Comments: 6

https://reducto.ai/blog/rd-tablebench

Created 6mo | Nov 5, 2024, 11:10:09 PM

