Ben and Ryan are joined by Robin Gupta for a conversation about benchmarking and testing AI systems. They talk through the lack of trust and confidence in AI, the inherent challenges of nondeterministic systems, the role of human verification, and whether we can (or should) expect an AI to be reliable. https://stackoverflow.blog/2024/05/24/would-you-board-a-plane-safety-tested-by-genai/
Other posts in this group
On today’s episode we chat with Jared Palmer, VP of AI at Vercel, who says the company has three key goals. First, support AI-native web apps like ChatGPT and Claude. Second, use GenAI to make it easier…
In this episode, Alexa Montelibano and Tiago Torre, sales engineers at Stack Overflow, take you behind the scenes to show how customer feedback shapes our products, including OverflowAI. Alexa and Tiago…
It’s easy to generate code, but not so easy to generate good code. https://stackoverflow.blog/2024/06/10/generative-ai-is-not-going-to-build-your-engineering-team-for-you/
In this episode we chat with Saumil Patel, co-founder and CEO of Squire AI. The company uses an agentic workflow to automatically review your code, write your pull requests, and even review and provide…
A look at some of the current thinking on chunking data for retrieval-augmented generation (RAG) systems. https://stackoverflow.blog/2024/06/06/breaking-up-is-hard-to-do-chunking-in-rag-applicatio
Learn about the workflow designed to help new askers improve their questions on Stack Overflow. https://stackoverflow.blog/2024/06/04/introducing-staging-ground-the-private-space-to-get-feedback-on-qu