Hey HN!
A lot of document-parsing solutions have come out lately, each claiming SOTA with little evidence. Many of them turned out to be LLM or LVM wrappers that frequently hallucinate on complex tables.
We just released RD-TableBench, an open benchmark to help teams evaluate extraction performance on complex tables. It covers a variety of challenging scenarios, including scanned tables, handwriting, language detection, merged cells, and more.
We employed an independent team of PhD-level human labelers, who manually annotated 1,000 complex table images drawn from a diverse set of publicly available documents.
Alongside this, we're also releasing a new bioinformatics-inspired algorithm for grading table similarity. Would love to hear any feedback!
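
For anyone unfamiliar with what "bioinformatics-inspired" could mean here: sequence alignment is a classic bioinformatics technique, and the rough sketch below applies Needleman-Wunsch global alignment to table rows and cells. This is only an illustration of the general idea, not the RD-TableBench grading algorithm itself. The function names (cell_similarity, align_score, row_similarity, table_similarity), the gap penalty, and the normalization are all assumptions made up for this example.

```python
from difflib import SequenceMatcher


def cell_similarity(a: str, b: str) -> float:
    """Character-level similarity of two cell strings, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def align_score(pred, truth, sim, gap=-0.5):
    """Needleman-Wunsch global alignment score between two sequences.

    sim(x, y) returns a similarity in [0, 1]; it is rescaled to [-1, 1]
    so that poor matches are penalized. gap is the (assumed) penalty for
    skipping an item in either sequence.
    """
    n, m = len(pred), len(truth)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + 2.0 * sim(pred[i - 1], truth[j - 1]) - 1.0
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


def row_similarity(row_a, row_b) -> float:
    """Align the cells of two rows and normalize the score to [0, 1]."""
    longest = max(len(row_a), len(row_b))
    if longest == 0:
        return 1.0
    return max(0.0, align_score(row_a, row_b, cell_similarity)) / longest


def table_similarity(pred_table, truth_table) -> float:
    """Align the rows of two tables (lists of lists of strings), in [0, 1]."""
    longest = max(len(pred_table), len(truth_table))
    if longest == 0:
        return 1.0
    return max(0.0, align_score(pred_table, truth_table, row_similarity)) / longest
```

Under these assumptions, an extraction identical to the ground truth scores 1.0, while dropped, inserted, or garbled rows and cells lower the score gradually instead of zeroing out everything after the first mismatch.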
-Raunak
Comments URL: https://news.ycombinator.com/item?id=42054144
Points: 25
# Comments: 6