Ben and Ryan are joined by Robin Gupta for a conversation about benchmarking and testing AI systems. They talk through the lack of trust and confidence in AI, the inherent challenges of nondeterministic systems, the role of human verification, and whether we can (or should) expect an AI to be reliable. https://stackoverflow.blog/2024/05/24/would-you-board-a-plane-safety-tested-by-genai/
Other posts in this group

As a generation characterized as "digital natives," the way Gen Z interacts with and consumes knowledge is rooted in their desire for instant gratification and personalization. How will this affect th

Read on to see the latest features coming to Stack Overflow for Teams Business users! https://stackoverflow.blog/2025/06/18/smarter-teams-brighter-insights-stack-overflow-for-teams-business-summer-bun

It’s Java’s 30th anniversary! Ryan welcomes back Georges Saab, Senior VP of Development for the Java Platform Group and Chair of the OpenJDK Governing Board, to reflect on Java’s changes over the las

Ryan Donovan and Ben Popper sit down with Jamie de Guerre, SVP of Product at Together AI, to discuss the evolving landscape of AI and open-source models. They explore the significance of infrastructur

Diverse, high-quality data is a prerequisite for reliable, effective, and ethical AI solutions. https://stackoverflow.blog/2025/06/11/why-you-need-diverse-third-party-data-to-deliver-trusted-ai-soluti

Ryan and Ben welcome Tulsee Doshi and Logan Kilpatrick from Google's DeepMind to discuss the advanced capabilities of the new Gemini 2.5, the importance of feedback loops for model improvement and red

Kathleen Vignos, VP of Software Engineering at Capital One, sits down with Ryan to explore shifting to 100% serverless architecture in enterprise, deploying talent for better customer experience, and