Show HN: Data Bonsai: a Python package to clean your data with LLMs

I've been using LLMs to clean data for my fine-tuning projects, and decided to build a package for it as a side project. Check it out here: https://github.com/databonsai/databonsai

Some features:

- categorization (labelling), transformation, and decomposition (text into a structured format), with validation of LLM outputs

- batch mode groups several inputs/outputs into one request, so the prompt (schema, few-shot examples) isn't re-sent for every row of data, which saves a significant number of tokens
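
To give a rough idea of what batching plus output validation can look like, here is a minimal sketch. It is not databonsai's actual API: every name in it (`build_batch_prompt`, `parse_and_validate`, `categorize_batch`, `fake_llm`) is hypothetical, and the "LLM" is a plain function standing in for a real model call.

```python
from typing import Callable, Dict, List


def build_batch_prompt(categories: Dict[str, str], inputs: List[str]) -> str:
    """Build one prompt that carries the schema once, followed by every input row."""
    schema = "\n".join(f"- {name}: {desc}" for name, desc in categories.items())
    rows = "\n".join(f"{i + 1}. {text}" for i, text in enumerate(inputs))
    return (
        "Classify each numbered input into exactly one category.\n"
        f"Categories:\n{schema}\n"
        "Reply with one category name per line, in input order.\n"
        f"Inputs:\n{rows}"
    )


def parse_and_validate(response: str, categories: Dict[str, str], expected: int) -> List[str]:
    """Reject responses with the wrong number of labels or an unknown label."""
    labels = [line.strip() for line in response.splitlines() if line.strip()]
    if len(labels) != expected:
        raise ValueError(f"expected {expected} labels, got {len(labels)}")
    for label in labels:
        if label not in categories:
            raise ValueError(f"unknown category: {label!r}")
    return labels


def categorize_batch(
    llm: Callable[[str], str],
    categories: Dict[str, str],
    inputs: List[str],
    batch_size: int = 5,
) -> List[str]:
    """Send the inputs in chunks, so the schema is paid for once per chunk, not per row."""
    results: List[str] = []
    for start in range(0, len(inputs), batch_size):
        chunk = inputs[start:start + batch_size]
        response = llm(build_batch_prompt(categories, chunk))
        results.extend(parse_and_validate(response, categories, len(chunk)))
    return results


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: labels a row 'Weather' if it mentions rain."""
    rows = prompt.split("Inputs:\n", 1)[1].splitlines()
    return "\n".join("Weather" if "rain" in row.lower() else "Other" for row in rows)


categories = {"Weather": "Talks about the weather", "Other": "Anything else"}
inputs = ["It will rain today", "Stocks fell", "Heavy rain expected"]
labels = categorize_batch(fake_llm, categories, inputs, batch_size=2)
print(labels)
```

With a real model, the validation step is where you would retry or fall back to one-row-at-a-time calls when a batch comes back malformed.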

There are some similarities to the Instructor repo, but this is simpler and made for datasets. Would love any feedback/suggestions (and a star if you like it!)

Comments URL: https://news.ycombinator.com/item?id=40184372

Points: 11

# Comments: 1


Created 1y | Apr 28, 2024, 10:20:04 AM
