🦄 Anthropic Post Mortem
In this conversation, Vaibhav Gupta and Aaron discuss AI model performance in production, focusing on Anthropic's recent quality-degradation incidents and their implications for AI systems. They explore the sensitivity of models to context windows, the challenges of output corruption, and the complexities of token selection mechanisms. The discussion also highlights the importance of debugging and observability in AI systems, as well as the role of user-friendly workflows and integrations in making AI accessible to non-technical users. The conversation concludes with thoughts on the future of AI development and the need for effective metrics to monitor product performance.
Project Details
Deep technical analysis of Anthropic's August 2025 incidents, exploring how floating-point precision, context window routing, and distributed token selection can break production AI systems at scale.
Video (1h)
Episode Summary
Vaibhav Gupta and Aaron (co-founder, former AWS EC2/Prime Video engineer) dissect Anthropic's detailed post-mortem of three critical bugs that affected their production systems. They explore the technical intricacies of how models select tokens across distributed GPUs, why longer context windows can degrade performance, and how compiler optimizations mixing 16-bit and 32-bit floating-point math led to incorrect token selection. The discussion extends to practical lessons for AI engineers: building observability into AI systems, using "vibe checks" from social media for anomaly detection, and the critical importance of rollback strategies. They also analyze OpenAI's new Agent Builder and the broader trend of visual workflow tools for non-technical users.
Key Technical Deep Dives
Context Window Routing Bug
- Impact: 30% of Claude Code users affected
- Root Cause: Some short-context requests were misrouted to servers configured for the million-token context window, degrading output quality
- Lesson: Less context often yields better results; models behave differently at different context lengths, and performance suffers when information must bridge long spans of tokens
- Technical Detail: RoPE (Rotary Position Embedding) scaling changes how models perceive token positions when the context window is expanded (see the sketch below)
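
To make the RoPE scaling point concrete, here is a minimal Python sketch (NumPy; the `rope_angles` helper and its defaults are illustrative assumptions, not Anthropic's implementation). Position-interpolation-style scaling divides positions before computing rotation angles, so a context-extended model literally perceives tokens at compressed positions:

```python
import numpy as np

def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    """Rotation angles RoPE applies at a given token position.

    Each pair of embedding dimensions rotates at its own frequency;
    scale > 1 compresses positions (position interpolation), one common
    way to stretch a model's trained context window.
    """
    freqs = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return (position / scale) * freqs

# A token at position 800 under 4x position interpolation is rotated
# exactly as if it sat at position 200 in the original model:
print(np.allclose(rope_angles(800, scale=4.0), rope_angles(200, scale=1.0)))  # True
```

The flip side is the bug's mechanism: relative distances the model learned at its training context lengths get compressed or stretched, so attention patterns tuned for short spans behave differently when a request lands on a server configured for a much larger window.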
Floating Point Precision Bug
- Impact: 0.8% of traffic affected, but critical for temperature=0 use cases
- Root Cause: The TPU compiler inconsistently promoted some operations to FP32 instead of FP16, so the same computation could round differently across runs
- Issue: Floating-point arithmetic is not associative, so evaluating a × b × c in a different order (e.g., c × b × a) can round to a different value, and FP16 and FP32 round onto different grids entirely (see the sketch below)
- Result: Wrong tokens selected when comparing probabilities near decision boundaries (e.g., 0.509 vs 0.501)
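
Both failure modes are easy to reproduce in a few lines (plain Python/NumPy; the probability values below are illustrative, not the actual numbers from Anthropic's post-mortem):

```python
import numpy as np

# 1. Floating-point arithmetic is not associative: regrouping the same
# operands rounds differently, so a kernel that reorders a chain of ops
# can emit a different bit pattern than the reference implementation.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False: 0.6000000000000001 vs 0.6

# 2. Mixing precisions puts values on different grids. Two probabilities
# that are clearly ordered in FP32 can collapse onto the same FP16 value:
p32 = np.array([0.50910, 0.50905], dtype=np.float32)
p16 = p32.astype(np.float16)
print(p32[0] > p32[1])   # True  -- FP32 still sees a winner
print(p16[0] == p16[1])  # True  -- FP16 sees a tie
print(np.argmax(p16))    # 0, but only because argmax breaks ties by index
```

At temperature=0 the sampler is supposed to return the single highest-probability token, so once two candidates tie (or shift) after a precision change, which token wins is decided by rounding and tie-breaking rather than by the model.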