🦄 Anthropic Post Mortem
In this conversation, Vaibhav Gupta and Aaron discuss AI model performance in production, focusing on Anthropic's recent quality-degradation incidents and their implications for AI systems. They explore the sensitivity of models to context windows, the challenges of output corruption, and the complexities of token selection mechanisms. The discussion also highlights the importance of debugging and observability in AI systems, as well as the role of user-friendly workflows and integrations in making AI accessible to non-technical users. The conversation concludes with thoughts on the future of AI development and the need for effective metrics to monitor product performance.
Project Details
Deep technical analysis of Anthropic's August 2025 incidents, exploring how floating-point precision, context window routing, and distributed token selection can break production AI systems at scale.
Video (1h)
Episode Summary
Vaibhav Gupta and Aaron (co-founder, former AWS EC2/Prime Video engineer) dissect Anthropic's detailed post-mortem of three critical bugs that affected their production systems. They explore the technical intricacies of how models select tokens across distributed GPUs, why longer context windows can degrade performance, and how compiler optimizations mixing 16-bit and 32-bit floating-point math led to incorrect token selection. The discussion extends to practical lessons for AI engineers: building observability into AI systems, using "vibe checks" from social media for anomaly detection, and the critical importance of rollback strategies. They also analyze OpenAI's new Agent Builder and the broader trend of visual workflow tools for non-technical users.
Key Technical Deep Dives
Context Window Routing Bug
- Impact: 30% of Claude Code users affected
- Root Cause: Some short-context requests were misrouted to servers configured for the million-token context window, degrading output quality
- Lesson: Less context often yields better results; models behave differently at different context lengths, and performance suffers when information must bridge long spans of tokens
- Technical Detail: RoPE (Rotary Position Embedding) scaling changes how models perceive token positions when the context window is expanded (see the sketch below)
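
To make the RoPE scaling point concrete, here is a minimal Python sketch (NumPy; the `rope_angles` helper and its defaults are illustrative assumptions, not Anthropic's implementation). Position-interpolation-style scaling divides positions before computing rotation angles, so a context-extended model literally perceives tokens at compressed positions:

```python
import numpy as np

def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    """Rotation angles RoPE applies at a given token position.

    Each pair of embedding dimensions rotates at its own frequency;
    scale > 1 compresses positions (position interpolation), one common
    way to stretch a model's trained context window.
    """
    freqs = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return (position / scale) * freqs

# A token at position 800 under 4x position interpolation is rotated
# exactly as if it sat at position 200 in the original model:
print(np.allclose(rope_angles(800, scale=4.0), rope_angles(200, scale=1.0)))  # True
```

The flip side is the bug's mechanism: relative distances the model learned at its training context lengths get compressed or stretched, so attention patterns tuned for short spans behave differently when a request lands on a server configured for a much larger window.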
Floating Point Precision Bug
- Impact: 0.8% of traffic affected, but critical for temperature=0 use cases
- Root Cause: The TPU compiler inconsistently promoted some operations to FP32 instead of FP16, so the same computation could round differently across runs
- Issue: Floating-point arithmetic is not associative, so evaluating a × b × c in a different order (e.g., c × b × a) can round to a different value, and FP16 and FP32 round onto different grids entirely (see the sketch below)
- Result: Wrong tokens selected when comparing probabilities near decision boundaries (e.g., 0.509 vs 0.501)
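
Both failure modes are easy to reproduce in a few lines (plain Python/NumPy; the probability values below are illustrative, not the actual numbers from Anthropic's post-mortem):

```python
import numpy as np

# 1. Floating-point arithmetic is not associative: regrouping the same
# operands rounds differently, so a kernel that reorders a chain of ops
# can emit a different bit pattern than the reference implementation.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False: 0.6000000000000001 vs 0.6

# 2. Mixing precisions puts values on different grids. Two probabilities
# that are clearly ordered in FP32 can collapse onto the same FP16 value:
p32 = np.array([0.50910, 0.50905], dtype=np.float32)
p16 = p32.astype(np.float16)
print(p32[0] > p32[1])   # True  -- FP32 still sees a winner
print(p16[0] == p16[1])  # True  -- FP16 sees a tie
print(np.argmax(p16))    # 0, but only because argmax breaks ties by index
```

At temperature=0 the sampler is supposed to return the single highest-probability token, so once two candidates tie (or shift) after a precision change, which token wins is decided by rounding and tie-breaking rather than by the model.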