On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben about how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the tradeoffs involved in selecting and fine-tuning LLMs. https://stackoverflow.blog/2024/04/16/how-do-you-evaluate-an-llm-try-an-llm/
Other posts in this group

Quinn Slack, CEO and co-founder of Sourcegraph, joins the show to dive into the implications of AI coding tools on the software engineering lifecycle. They explore how AI tools are transforming the wo

Innovation is at the heart of any successful, growing company, and often that culture begins with an engaged, interconnected organization. https://stackoverflow.blog/2025/08/04/cross-pollination-as-a

Ryan and Eira welcome Erin Yepis, Senior Analyst at Stack Overflow, to the show to discuss the newly released 2025 Developer Survey results. They explore the decline in trust in AI tools, shifts in po


In this episode of Leaders of Code, Jody Bailey, Stack Overflow’s CPO, Anirudh Kaul, Senior Director of Software Engineering, and Paul Petersen, Cloud Platform Engineering Manager, discuss the U.S. Ba

No need to bury the lede: more developers are using AI tools, but their trust in those tools is falling. https://stackoverflow.blog/2025/07/29/developers-remain-willing-but-reluctant-to-use-ai-the-20

Ryan welcomes Mahir Yavuz, Senior Director of Engineering at Etsy, to the show to explore the unique challenges that Etsy’s marketplace faces and how Etsy’s teams leverage machine learning and AI to m