On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben about how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the tradeoffs involved in selecting and fine-tuning LLMs. https://stackoverflow.blog/2024/04/16/how-do-you-evaluate-an-llm-try-an-llm/
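As a rough illustration of the LLM-as-judge pattern the episode discusses (a sketch of the general technique, not code from the episode), here is a minimal Python example in which one model grades another model's answer against a simple rubric. The call_judge_model function is a hypothetical placeholder for whichever LLM API you use.

# Minimal LLM-as-judge sketch: one model grades another model's output.
JUDGE_PROMPT = (
    "You are grading an answer for factual accuracy and helpfulness.\n"
    "Question: {question}\n"
    "Candidate answer: {answer}\n"
    "Reply with a single integer score from 1 (poor) to 5 (excellent)."
)

def call_judge_model(prompt: str) -> str:
    # Hypothetical placeholder: send `prompt` to your judge LLM and return its reply.
    raise NotImplementedError("Wire this to your LLM provider of choice.")

def judge_answer(question: str, answer: str) -> int:
    reply = call_judge_model(JUDGE_PROMPT.format(question=question, answer=answer))
    score = int(reply.strip())  # naive parse; real pipelines validate judge output
    if not 1 <= score <= 5:
        raise ValueError(f"Judge returned out-of-range score: {score}")
    return score

Scores from a judge model are themselves model output, which is one reason the episode also stresses data validation and human raters as checks on automated evaluation.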
More posts in this group

An update on the research the User Experience team will be running over the next quarter. https://stackoverflow.blog/2025/05/19/research-roadmap-update-may-2025/

Christophe Coenraets, SVP of Developer Relations at Salesforce, tells Eira and Ben about building the new Salesforce Developer Edition, which includes access to the company’s agentic AI platform, Agentforce.

Money is pouring into the AI industry. Will software survive the disruption it causes? https://stackoverflow.blog/2025/05/15/whether-ai-is-a-bubble-or-revolution-how-does-software-survive/

On this episode, Ryan chats with Hendrik Rexed, Cloud Native Advocate at Dynatrace, about debugging cloud-based applications like you would a local app. https://stackoverflow.blog/2025/05/13/next-lev

Maryam Ashoori, Head of Product for watsonx.ai at IBM, joins Ryan and Eira to talk about the complexity of enterprise AI, the role of governance, the AI skill gap among developers, how AI coding tools

If velocity is just a tool and not a goal, how do you measure real success for engineering teams? https://stackoverflow.blog/2025/05/12/beyond-speed-measuring-engineering-success-by-impact-not-velocit

Ben Popper chats with CTO Abby Kearns about how Alembic is using composite AI and lessons learned from contact tracing and epidemiology to help companies map customer journeys and understand the ROI