Ben and Ryan are joined by Robin Gupta for a conversation about benchmarking and testing AI systems. They talk through the lack of trust and confidence in AI, the inherent challenges of nondeterministic systems, the role of human verification, and whether we can (or should) expect an AI to be reliable. https://stackoverflow.blog/2024/05/24/would-you-board-a-plane-safety-tested-by-genai/
Other posts in this group


Avoiding bad data is just as important in AI: bad data can open you up to fines, lawsuits, and lost customers. https://stackoverflow.blog/2025/05/01/without-foundational-governance-every-ai-deployment-is-a-lia

Ryan talks with Greg Fallon, CEO of Geminus, about the intersection of AI and physical infrastructure, the evolution of simulation technology, the role of synthetic data in machine learning, and the i

Self-supervised learning is a key advancement that revolutionized natural language processing and generative AI. Here’s how it works and two examples of how it is used to train language models. https:
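The teaser doesn't show the mechanism. The sketch below is a minimal illustration of the core idea behind self-supervised learning for language: the training labels come from the unlabeled text itself, not from human annotation. It assumes whitespace tokenization and a hypothetical `[MASK]` placeholder token, and it shows two common objectives: next-token prediction, as used by GPT-style models, and masked-token prediction, as used by BERT-style models. These may not be the two examples the linked article discusses.

```python
from typing import Dict, List, Tuple

# Hypothetical mask token; real tokenizers define their own special tokens.
MASK = "[MASK]"


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Next-token objective: each prefix is an input, and the token after it is the label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_pairs(tokens: List[str], positions: List[int]) -> Tuple[List[str], Dict[int, str]]:
    """Masked objective: hide the tokens at `positions`; the hidden tokens become the labels."""
    masked = list(tokens)  # copy so the original sequence is left unchanged
    labels: Dict[int, str] = {}
    for p in positions:
        labels[p] = tokens[p]
        masked[p] = MASK
    return masked, labels


if __name__ == "__main__":
    # Assumption: naive whitespace tokenization, for illustration only.
    tokens = "the cat sat on the mat".split()
    for context, target in next_token_pairs(tokens):
        print(context, "->", target)
    print(masked_pairs(tokens, [1, 5]))
```

Neither function needs a human-written label. Both derive the training target directly from the raw sequence, which is why this approach scales to very large unlabeled text corpora.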

Financial institutions face a balancing act between tech innovation and strict regulations. As customer expectations for improved user experience and demands from those tasked with enhancing features

Today’s episode is a roundup of spontaneous, on-the-ground conversations from HumanX 2025, featuring guests from CodeConductor, DDN, Cloudflare, and Galileo. https://stackoverflow.blog/2025/04/25/grab

In this episode of Leaders of Code, we chat with guests from Lloyds Banking Group about their focus on engineering excellence and the need for organizations to adapt to new technologies while ensuring