On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the needs and tradeoffs involved in selecting and fine-tuning LLMs. https://stackoverflow.blog/2024/04/16/how-do-you-evaluate-an-llm-try-an-llm/
Other posts in this group

Ryan welcomes Evan You, the creator of Vue.js, to explore the origins of Vue.js, the challenges faced during its development, and the project’s growth over a decade. They dive into potential integrati

Wenjing Zhang, VP of Engineering, and Caleb Johnson, Principal Engineer at LinkedIn, sit down with Ryan to discuss how semantic search and AI have transformed LinkedIn’s job search feature. They explo

Improving the place where developers have real conversations and real collaboration https://stackoverflow.blog/2025/08/11/renewing-chat-on-stack-overflow/

Ryan welcomes Paul Everitt, developer advocate at JetBrains and an early adopter of Python, to discuss the history, growth, and future of Python. They cover Python’s pivotal moments and rise alongside

In the age of AI, being able to make applications and create code has never been easier. But is it any good? Here's what vibe coding is like for someone without technical skills. https://stackoverflo

Quinn Slack, CEO and co-founder of Sourcegraph, joins the show to dive into the implications of AI coding tools on the software engineering lifecycle. They explore how AI tools are transforming the wo

Innovation is at the heart of any successful, growing company, and often that culture begins with an engaged, interconnected organization. https://stackoverflow.blog/2025/08/04/cross-pollination-as-a