On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the other needs and tradeoffs involved in selecting and fine-tuning LLMs. https://stackoverflow.blog/2024/04/16/how-do-you-evaluate-an-llm-try-an-llm/
Other posts in this group:

Rich Harris, creator of Svelte and software engineer at Vercel, joins Ryan on the show to dive into the evolution and future of web frameworks. They discuss the birth and growth of Svelte during the r

Ryan is joined by Tuhin Srivastava, CEO and co-founder of Baseten, to explore the evolving landscape of AI infrastructure and inference workloads, how the shift from traditional machine learning model

The most effective learning doesn’t happen in a classroom. It happens during work. https://stackoverflow.blog/2025/08/25/making-continuous-learning-work-at-work/

Ryan welcomes Nathan Michael, CTO at Shield AI, to discuss what AI looks like in defense technologies, both technically and ethically. https://stackoverflow.blog/2025/08/22/robots-in-the-skies-and-the

User research for the next era of Stack Overflow https://stackoverflow.blog/2025/08/21/research-roadmap-update-august-2025/

In this episode of Leaders of Code, Stack Overflow CEO Prashanth Chandrasekar and Christina Dacauaziliqua, Senior Learning Specialist at Morgan Stanley, talk about the importance of experiential learn

Senior developers know how to deploy code to systems made of code. Architects know how to deploy ideas to systems made of people. https://stackoverflow.blog/2025/08/20/documents-the-architect-s-progra