Ben and Ryan are joined by Robin Gupta for a conversation about benchmarking and testing AI systems. They talk through the lack of trust and confidence in AI, the inherent challenges of nondeterministic systems, the role of human verification, and whether we can (or should) expect an AI to be reliable. https://stackoverflow.blog/2024/05/24/would-you-board-a-plane-safety-tested-by-genai/