How do you evaluate an LLM? Try an LLM.

On this episode: Stack Overflow senior data scientist Michael Geden tells Ryan and Ben how data scientists evaluate large language models (LLMs) and their output. They cover the challenges involved in evaluating LLMs, how LLMs are being used to evaluate other LLMs, the importance of data validation, the need for human raters, and other needs and tradeoffs involved in selecting and fine-tuning LLMs. https://stackoverflow.blog/2024/04/16/how-do-you-evaluate-an-llm-try-an-llm/

Created 1y ago | Apr 16, 2024, 5:50:02


Other posts in this group

Research roadmap update, May 2025

An update to the research that the User Experience team is running over the next quarter. https://stackoverflow.blog/2025/05/19/research-roadmap-update-may-2025/

May 19, 2025, 14:10:11 | StackOverflow blog
Salesforce wants to do for agentic AI what they did for SaaS

Christophe Coenraets, SVP of Developer Relations at Salesforce, tells Eira and Ben about building the new Salesforce Developer Edition, which includes access to the company’s agentic AI platform, Agen

May 16, 2025, 4:50:09 | StackOverflow blog
Next-level observability: live breakpoint debugging

On this episode, Ryan chats with Hendrik Rexed, Cloud Native Advocate at Dynatrace, about debugging cloud-based applications like you would a local app. https://stackoverflow.blog/2025/05/13/next-lev

May 14, 2025, 6:40:02 | StackOverflow blog
Is the enterprise (actually) ready for AI?

Maryam Ashoori, Head of Product for watsonx.ai at IBM, joins Ryan and Eira to talk about the complexity of enterprise AI, the role of governance, the AI skill gap among developers, how AI coding tools

May 13, 2025, 5:10:07 | StackOverflow blog
Using AI to find patient zero in marketing campaigns

Ben Popper chats with CTO Abby Kearns about how Alembic is using composite AI and lessons learned from contact tracing and epidemiology to help companies map customer journeys and understand the ROI

May 9, 2025, 6:10:02 | StackOverflow blog