How do you evaluate an LLM? Try an LLM.

On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the other needs and tradeoffs involved in selecting and fine-tuning LLMs. https://stackoverflow.blog/2024/04/16/how-do-you-evaluate-an-llm-try-an-llm/

Created 1y | 16 Apr 2024, 5:50:02

Other posts in this group.

Svelte was built on “slinging code for the sheer love of it”

Rich Harris, creator of Svelte and software engineer at Vercel, joins Ryan on the show to dive into the evolution and future of web frameworks. They discuss the birth and growth of Svelte during the r

26 Aug 2025, 5:10:15 | StackOverflow blog

The server-side rendering equivalent for LLM inference workloads

Ryan is joined by Tuhin Srivastava, CEO and co-founder of Baseten, to explore the evolving landscape of AI infrastructure and inference workloads, how the shift from traditional machine learning model

25 Aug 2025, 15:10:15 | StackOverflow blog

Making continuous learning work at work

The most effective learning doesn’t happen in a classroom. It happens during work. https://stackoverflow.blog/2025/08/25/making-continuous-learning-work-at-work/

25 Aug 2025, 15:10:13 | StackOverflow blog

Robots in the skies (and they use Transformer models)

Ryan welcomes Nathan Michael, CTO at Shield AI, to discuss what AI looks like in defense technologies, both technically and ethically. https://stackoverflow.blog/2025/08/22/robots-in-the-skies-and-the

22 Aug 2025, 17:30:21 | StackOverflow blog

Research roadmap update, August 2025

User research for the next era of Stack Overflow https://stackoverflow.blog/2025/08/21/research-roadmap-update-august-2025/

21 Aug 2025, 16:10:07 | StackOverflow blog

Learning in the flow: Unlocking employee potential through continuous learning

In this episode of Leaders of Code, Stack Overflow CEO Prashanth Chandrasekar and Christina Dacauaziliqua, Senior Learning Specialist at Morgan Stanley, talk about the importance of experiential learn

21 Aug 2025, 6:40:15 | StackOverflow blog

Documents: The architect’s programming language

Senior developers know how to deploy code to systems made of code. Architects know how to deploy ideas to systems made of people. https://stackoverflow.blog/2025/08/20/documents-the-architect-s-progra

20 Aug 2025, 14:30:12 | StackOverflow blog

Tomas_r2