How do you evaluate an LLM? Try an LLM.

On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the needs and tradeoffs involved in selecting and fine-tuning LLMs. https://stackoverflow.blog/2024/04/16/how-do-you-evaluate-an-llm-try-an-llm/

Created 1y ago | Apr 16, 2024, 05:50:02

Other posts in this group

Why call one API when you can use GraphQL to call them all?

Ryan welcomes Matt DeBergalis, CTO at Apollo GraphQL, to discuss the evolution and future of API orchestration, the benefits of GraphQL in managing API complexity, its seamless integration with AI and

Jul 4, 2025, 06:30:08 | StackOverflow blog

Programming problems that seem easy, but aren't, featuring Jon Skeet

Jon Skeet, the first Stack Overflow user with a million reputation, sits down with Ryan to share his wealth of knowledge on all things development: the deceptively simple but actually complicated prob

Jul 1, 2025, 06:40:04 | StackOverflow blog

Reliability for unreliable LLMs

Large language models are non-deterministic by design. Here's how you can inject a little bit of determinism into GenAI workflows. https://stackoverflow.blog/2025/06/30/reliability-for-unreliable-llm

Jun 30, 2025, 14:30:02 | StackOverflow blog

You’ve got 99 problems but data shouldn’t be one

Ryan is joined by Tobiko Data co-founders Toby Mao and Iaroslav Zeigerman to talk about the crucial role of rigorous data practices and tooling, the innovations of Tobiko Data’s SQLMesh and SQLGlot, a

Jun 27, 2025, 05:20:07 | StackOverflow blog

Not an option, but a necessity: How organizations are adopting and implementing AI internally

AI is no longer just a luxury for the most tech-savvy companies; it's now a necessity for organizational transformation. How are real teams successfully leveraging and innovating with these new tools

Jun 25, 2025, 13:50:07 | StackOverflow blog

You've vibe coded an app. Now what?

On this episode, Ryan chats with Vish Abrams, chief architect at Heroku, about all the work that needs to be done after you’ve vibe coded your dream app. https://stackoverflow.blog/2025/06/25/you-ve-

Jun 25, 2025, 06:50:08 | StackOverflow blog

How to build your prototypes without a 35% tariff

Ryan and Ben welcome Alex Malcoci, CEO and founder of MiniProto, to talk innovations in hardware prototyping, the evolving complexities of the global supply chain, the impact of the US-China trade war

Jun 24, 2025, 05:20:11 | StackOverflow blog
