How do you evaluate an LLM? Try an LLM.

On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges of evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and the other needs and tradeoffs involved in selecting and fine-tuning LLMs. https://stackoverflow.blog/2024/04/16/how-do-you-evaluate-an-llm-try-an-llm/

Created 1y ago | 16 Apr 2024, 05:50:02



Other posts in this group

Research roadmap update, May 2025

An update to the research that the User Experience team is running over the next quarter. https://stackoverflow.blog/2025/05/19/research-roadmap-update-may-2025/

19 May 2025, 14:10:11 | StackOverflow blog
Salesforce wants to do for agentic AI what they did for SaaS

Christophe Coenraets, SVP of Developer Relations at Salesforce, tells Eira and Ben about building the new Salesforce Developer Edition, which includes access to the company’s agentic AI platform, Agen

16 May 2025, 04:50:09 | StackOverflow blog
Next-level observability: live breakpoint debugging

On this episode, Ryan chats with Hendrik Rexed, Cloud Native Advocate at Dynatrace, about debugging cloud-based applications like you would a local app. https://stackoverflow.blog/2025/05/13/next-lev

14 May 2025, 06:40:02 | StackOverflow blog
Is the enterprise (actually) ready for AI?

Maryam Ashoori, Head of Product for watsonx.ai at IBM, joins Ryan and Eira to talk about the complexity of enterprise AI, the role of governance, the AI skill gap among developers, how AI coding tools

13 May 2025, 05:10:07 | StackOverflow blog
Using AI to find patient zero in marketing campaigns

Ben Popper chats with CTO Abby Kearns about how Alembic is using composite AI and lessons learned from contact tracing and epidemiology to help companies map customer journeys and understand the ROI

9 May 2025, 06:10:02 | StackOverflow blog