I've been using LLMs to clean data for my fine-tuning projects, and decided to build a package for it as a side project. Check it out here: https://github.com/databonsai/databonsai
Some features:
- categorization (labelling), transformation and decomposition (text into a structured format)
- validates LLM outputs
- batch mode: batches up the inputs/outputs so you don't send the prompt (schema, few-shot examples) with every row of data, saving a significant number of tokens
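To make the batch-mode and validation ideas concrete, here is a minimal plain-Python sketch. It is not databonsai's API: the function names (`build_batch_prompt`, `parse_batch_response`, `prompt_overhead_tokens`), the numbered `<n>: <label>` reply format, and the example categories are all hypothetical, and no LLM is actually called. The point is that the shared instructions are sent once per request rather than once per row, so with `n` rows, a batch size `b`, and `P` tokens of shared prompt, the repeated overhead drops from `n * P` to `ceil(n / b) * P`.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical shared prompt; a real tool would put the schema and few-shot examples here.
CATEGORIES = ["Weather", "Sports", "Politics"]
INSTRUCTIONS = (
    "Classify each numbered text into exactly one of: "
    + ", ".join(CATEGORIES)
    + ". Reply with one line per item in the form '<number>: <category>'."
)


def build_batch_prompt(instructions: str, rows: Sequence[str]) -> str:
    """Send the shared instructions once, followed by every numbered row in the batch."""
    numbered = "\n".join(f"{i}: {row}" for i, row in enumerate(rows, start=1))
    return f"{instructions}\n\n{numbered}"


def parse_batch_response(response: str, n_rows: int, allowed: Sequence[str]) -> List[str]:
    """Split a batched reply back into per-row labels, validating each one."""
    labels: Dict[int, str] = {}
    for line in response.strip().splitlines():
        index_text, sep, label = line.partition(":")
        if not sep:
            raise ValueError(f"Malformed line: {line!r}")
        index = int(index_text)  # raises ValueError if the prefix is not a number
        label = label.strip()
        if label not in allowed:
            raise ValueError(f"Unknown category {label!r} in line {line!r}")
        if index in labels:
            raise ValueError(f"Duplicate label for row {index}")
        labels[index] = label
    # Every input row must get exactly one label, no more and no fewer.
    if sorted(labels) != list(range(1, n_rows + 1)):
        raise ValueError("Response does not contain exactly one label per input row")
    return [labels[i] for i in range(1, n_rows + 1)]


def prompt_overhead_tokens(n_rows: int, batch_size: int, overhead: int) -> int:
    """Tokens spent re-sending the shared prompt: once per request, not once per row."""
    return math.ceil(n_rows / batch_size) * overhead


if __name__ == "__main__":
    rows = ["It's raining heavily in London.", "The Lakers won last night."]
    print(build_batch_prompt(INSTRUCTIONS, rows))
    fake_reply = "1: Weather\n2: Sports"  # stand-in for the model's answer
    print(parse_batch_response(fake_reply, len(rows), CATEGORIES))  # ['Weather', 'Sports']
    print(prompt_overhead_tokens(1000, 1, 500))   # 500000 (one row per request)
    print(prompt_overhead_tokens(1000, 10, 500))  # 50000 (ten rows per request)
```

The trade-off is that a bigger batch means a longer reply that the model can get wrong as a whole, which is why the parser refuses any reply that skips, repeats, or mislabels a row rather than guessing.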
It has some similarities to the Instructor repo, but it's simpler and made for datasets. I'd love any feedback or suggestions (and a star if you like it!)
Comments URL: https://news.ycombinator.com/item?id=40184372
Points: 11
# Comments: 1