The Verification Gap: What Separates LLM Demos from Production Agents
Andriy Batutin
MacPaw
Abstract
Every LLM demo looks magical. Then reality hits: hallucinations, edge cases, user trust erosion. This talk presents case studies on what it actually takes to bring AI agents to production – and how organizations can build reliable agentic workflows without an OpenAI-scale budget.
We’ll cover practical verification patterns, failure modes that only surface at scale, and the architectural decisions that separate impressive prototypes from systems users actually trust.
Bio
Andriy Batutin is a Senior AI Engineer at MacPaw, where he builds production AI agent systems with 200+ tool integrations serving millions of users. With over 10 years in IT and 6 years dedicated to AI/ML, he specializes in the hard problem of making agentic workflows reliable, bridging the gap between demos that impress and systems that actually work.