Show HN: Data Bonsai: a Python package to clean your data with LLMs

I've been doing some data cleaning for my fine-tuning projects using LLMs, and decided to just build a package for it as a side project. Check it out here: https://github.com/databonsai/databonsai

Some features:

- categorization (labelling), transformation and decomposition (text into structured format)

- validates LLM outputs

- batch mode batches up the inputs/outputs so you don't send the prompt (schema, few-shot examples) for every row of data, saving a significant number of tokens
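The post doesn't show databonsai's API, so what follows is only a minimal, hypothetical sketch of what "validating LLM outputs" can mean for categorization. The model's answer is normalized and checked against the allowed labels, and the prompt is re-sent if the answer doesn't match any of them. `validate_label`, `categorize`, `InvalidLabelError` and the `llm` callable are illustrative names, not part of the package. Here `llm` means any function that takes a prompt string and returns the model's text.

```python
from typing import Callable, Optional, Sequence

# Hypothetical sketch of output validation; not databonsai's actual API.


class InvalidLabelError(ValueError):
    """Raised when the model's answer is not one of the allowed categories."""


def validate_label(raw_output: str, categories: Sequence[str]) -> str:
    """Normalize the model's raw answer and check it against the allowed labels."""
    # Drop surrounding whitespace and a trailing period the model may add.
    answer = raw_output.strip().rstrip(".").strip()
    # Case-insensitive match so "urgent" maps back to the canonical "Urgent".
    canonical = {c.lower(): c for c in categories}
    if answer.lower() not in canonical:
        raise InvalidLabelError(f"{answer!r} is not one of {list(categories)}")
    return canonical[answer.lower()]


def categorize(
    text: str,
    categories: Sequence[str],
    llm: Callable[[str], str],  # hypothetical: prompt in, model text out
    max_retries: int = 2,
) -> str:
    """Ask the model for a label and re-ask if the answer fails validation."""
    prompt = (
        "Classify the text into exactly one of these categories: "
        + ", ".join(categories)
        + f"\nText: {text}\nCategory:"
    )
    attempts = max(max_retries, 0) + 1  # always make at least one attempt
    last_error: Optional[InvalidLabelError] = None
    for _ in range(attempts):
        try:
            return validate_label(llm(prompt), categories)
        except InvalidLabelError as err:
            last_error = err  # invalid label: send the prompt again
    raise InvalidLabelError(f"no valid label after {attempts} attempts") from last_error
```

Consider a model that answers `" urgent.\n"`. With the categories `["Urgent", "Not urgent"]`, `validate_label` maps that answer to the canonical `"Urgent"`. An answer outside the list triggers a retry instead of ending up in the dataset.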
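Batch mode, as described, spreads the fixed part of the prompt across many rows. Below is a minimal sketch of that idea. It assumes a simple numbered-line output format, which may not be what databonsai actually does. The instructions and few-shot examples appear once per call and each row is numbered. The numbered answers are then mapped back to their rows and checked, so a skipped row can't silently shift every later label. All function names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sketch of prompt batching; not databonsai's actual API.


def build_batch_prompt(instructions: str, examples: Sequence[str], rows: Sequence[str]) -> str:
    """Include the shared instructions and few-shot examples once, then number each row."""
    header = instructions + "\n\nExamples:\n" + "\n".join(examples)
    numbered = "\n".join(f"{i}. {row}" for i, row in enumerate(rows, start=1))
    return (
        f"{header}\n\n"
        "Answer every numbered item on its own line as '<number>. <answer>'.\n"
        f"{numbered}"
    )


def parse_batch_response(response: str, n_rows: int) -> List[str]:
    """Map a numbered response back to one answer per input row, in order."""
    answers: Dict[int, str] = {}
    for line in response.strip().splitlines():
        number, sep, answer = line.partition(".")
        if sep and number.strip().isdigit():
            answers[int(number)] = answer.strip()
    # Fail loudly if the model dropped a row instead of silently misaligning.
    missing = [i for i in range(1, n_rows + 1) if i not in answers]
    if missing:
        raise ValueError(f"model skipped rows {missing}")
    return [answers[i] for i in range(1, n_rows + 1)]


def shared_prompt_tokens(n_rows: int, prompt_tokens: int, batch_size: int) -> int:
    """Tokens spent on the shared prompt, which is re-sent once per API call."""
    calls = math.ceil(n_rows / batch_size)
    return calls * prompt_tokens
```

For example, take 1,000 rows and a hypothetical 500-token shared prompt. `shared_prompt_tokens(1000, 500, 1)` is 500,000 tokens, compared with 50,000 for `batch_size=10`. Larger batches save more tokens. However, longer responses give the model more chances to skip or renumber rows, which is why the parser rejects incomplete answers.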

There are some similarities to the Instructor repo, but this is simpler and made for datasets. Would love any feedback/suggestions (and a star if you like it!)


Comments URL: https://news.ycombinator.com/item?id=40184372

Points: 11

# Comments: 1

https://github.com/databonsai/databonsai

Created 1y | Apr 28, 2024, 10:20:04


