On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the needs and tradeoffs involved in selecting and fine-tuning LLMs. https://stackoverflow.blog/2024/04/16/how-do-you-evaluate-an-llm-try-an-llm/
Other posts from this group
On this episode: The FTC bans most noncompete agreements, the implications of the TikTok “ban,” why a 2017 law is hitting startups with huge tax bills seven years later, and the return of net neutrali
Dr. Richard Hipp, creator of SQLite, shares how he taught himself to program, the challenges he faced in creating SQLite, and the importance of testing and maintaining the software for long-term suppo
Should a language be easy or comprehensive? https://stackoverflow.blog/2024/04/25/what-language-should-beginning-programmers-choose/
The home team talks about the current state of the software job market, the changing sentiments around AI job opportunities, the impact of big players like Facebook and OpenAI on the space, and the ch
Ben and Ryan explore why configuration is so complicated, the right to repair, the best programming languages for beginners, how AI is grading exams in Texas, Automattic’s $125M acquisition of Beeper,
Ben talks with Shane McAllister, lead developer advocate at MongoDB, Stanimira Vlaeva, senior developer advocate at MongoDB, and Miku Jha, director, AI/ML and generative AI at Google Cloud, about the
The key strategies for building a headache-free data platform https://stackoverflow.blog/2024/04/15/how-to-succeed-as-a-data-engineer-without-the-burnout/