Boundary


🦄 ai that works

A weekly conversation about how we can all get the most juice out of today's models with @hellovai & @dexhorthy

Every Tuesday at 10 AM PST

One hour of live coding and Q&A, with some prepped content, to help you take your AI app from demo to production.

Upcoming • in 4 days
#21

Voice Agents and Supervisor Threading

Exploring voice-based AI agents and supervisor threading patterns for managing complex conversational workflows.

Claude for Non-Code Tasks
#20
3 days ago

On #17 we talked about advanced context engineering workflows for using Claude Code to work in complex codebases. This week, we're gonna get a little weird with it and show off a bunch of ways you can use Claude Code as a generic agent to handle non-coding tasks. We'll learn things like: skipping the MCP and having Claude write its own scripts to interact with external systems; creating internal knowledge graphs with markdown files; and how to blend agentic retrieval and search with deterministic context packing.

Interruptible Agents
#19
10 days ago

Anyone can build a chatbot, but the user experience is what truly sets it apart. Can you cancel a message? Can you queue commands while it's busy? How finely can you steer the agent? We'll explore these questions and code a solution together.
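As a rough illustration of the cancel and queue questions above, here is a minimal sketch using Python's `asyncio`. It is not the code from the episode: `fake_agent_step`, `InterruptibleAgent`, and the command strings are hypothetical names made up for this example, and a short sleep stands in for a real model call.

```python
import asyncio


async def fake_agent_step(command):
    """Hypothetical stand-in for a slow model call; no real LLM is used."""
    await asyncio.sleep(0.05)
    return f"done: {command}"


class InterruptibleAgent:
    """Runs queued commands one at a time; the in-flight one can be cancelled."""

    def __init__(self):
        self.queue = asyncio.Queue()
        self.current = None
        self.log = []

    def submit(self, command):
        # Commands can be queued while the agent is busy.
        self.queue.put_nowait(command)

    def cancel_current(self):
        # Cancels only the in-flight command; queued commands still run.
        if self.current is not None and not self.current.done():
            self.current.cancel()

    async def run_until_empty(self):
        while not self.queue.empty():
            command = self.queue.get_nowait()
            self.current = asyncio.create_task(fake_agent_step(command))
            try:
                self.log.append(await self.current)
            except asyncio.CancelledError:
                self.log.append(f"cancelled: {command}")


async def demo():
    agent = InterruptibleAgent()
    agent.submit("summarize inbox")
    agent.submit("draft reply")
    runner = asyncio.create_task(agent.run_until_empty())
    await asyncio.sleep(0.01)  # the first command is now in flight
    agent.cancel_current()     # the user hits "stop"
    await runner
    return agent.log


print(asyncio.run(demo()))
# ['cancelled: summarize inbox', 'done: draft reply']
```

The sketch keeps cancellation scoped to a single command, so stopping one message does not throw away whatever the user queued behind it.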

Decoding Context Engineering Lessons from Manus
#18
17 days ago

A few weeks ago, the Manus team published an excellent article on context engineering. It covered KV cache, hot-swapping tools with custom samplers, and a ton of other cool techniques. On this week's episode, we'll dive deep on the Manus article and put some of the advice into practice, exploring how a deep understanding of models and inference can help you get the most out of today's LLMs.

Context Engineering for Coding Agents
#17
24 days ago

By popular demand, AI That Works #17 will dive deep on a new kind of context engineering: managing research, specs, and planning to get the most out of coding agents and coding CLIs. You've heard people bragging about spending thousands/mo on Claude Code, maxing out Amp limits, and much more. Now Dex and Vaibhav are gonna share some tips and tricks for pushing AI coding tools to their absolute limits, while still shipping well-tested, bug-free code. This isn't vibe-coding, this is something completely different.

Evaluating Prompts Across Models
#16
about 1 month ago

AI That Works #16 will be a super-practical deep dive into real-world examples and techniques for evaluating a single prompt against multiple models. While this is a commonly heralded use case for evals, e.g. 'how do we know if the new model is better' / 'how do we know if the new model breaks anything', there aren't a ton of practical examples out there for real-world use cases.
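To make the idea concrete, here is a minimal sketch of running one prompt and one set of test cases against several models and comparing pass rates. Everything in it is an assumption for illustration: `model_a` and `model_b` are hypothetical stand-ins for real model clients, and the prompt, cases, and exact-match scoring are made up rather than taken from the episode.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for model clients: each maps a prompt to a reply.
def model_a(prompt: str) -> str:
    return "positive" if "love" in prompt else "negative"


def model_b(prompt: str) -> str:
    return "positive"  # deliberately sloppy: always gives the same answer


PROMPT_TEMPLATE = "Classify the sentiment as positive or negative: {text}"

# (input text, expected label) pairs shared by every model under test.
CASES = [
    ("I love this", "positive"),
    ("This is awful", "negative"),
]


def pass_rates(models: Dict[str, Callable[[str], str]]) -> Dict[str, float]:
    """Run the same prompt and cases against every model; return fraction passed."""
    rates = {}
    for name, call in models.items():
        passed = sum(
            call(PROMPT_TEMPLATE.format(text=text)).strip().lower() == expected
            for text, expected in CASES
        )
        rates[name] = passed / len(CASES)
    return rates


print(pass_rates({"model_a": model_a, "model_b": model_b}))
# {'model_a': 1.0, 'model_b': 0.5}
```

Holding the prompt and cases fixed while swapping only the model is what lets a drop in pass rate be attributed to the model change rather than to the test setup.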

PDFs, Multimodality, Vision Models
#15
about 1 month ago

Dive deep into practical PDF processing techniques for AI applications. We'll explore how to extract, parse, and leverage PDF content effectively in your AI workflows, tackling common challenges like layout preservation, table extraction, and multi-modal content handling.

Implementing Decaying-Resolution Memory
#14
about 2 months ago

Last week on #13, we did a conceptual deep dive on context engineering and memory. This week, we're going to jump right into the weeds and implement a version of Decaying-Resolution Memory that you can pick up and apply to your AI agents today. For this episode, you'll probably want to check out episode #13 in the session listing to get caught up on DRM and why it's worth building from scratch.

Building AI with Memory & Context
#13
about 2 months ago

How do we build agents that can remember past conversations and learn over time? We'll explore memory and context engineering techniques to create AI systems that maintain state across interactions.

Boosting AI Output Quality
#12
about 2 months ago

This week's session was a bit meta! We explored 'Boosting AI Output Quality' by building the very AI pipeline that generated this email from our Zoom recording. The real breakthrough: separating extraction from polishing for high-quality AI generation.

Building an AI Content Pipeline
#11
2 months ago

Content creation involves a lot of manual work: uploading videos, sending emails, and other follow-up tasks that are easy to drop. We'll build an agent that integrates YouTube, email, GitHub, and human-in-the-loop steps to fully automate the AI That Works content pipeline, handling all the repetitive work while maintaining quality.

Entity Resolution: Extraction, Deduping, and Enriching
#10
2 months ago

Disambiguating many ways of naming the same thing (companies, skills, etc.) - from entity extraction to resolution to deduping. We'll explore breaking problems into extraction → resolution → enrichment stages, scaling with two-stage designs, and building async workflows with human-in-loop patterns for production entity resolution systems.
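The extraction → resolution → enrichment split can be sketched in a few lines. This is only an illustration under stated assumptions: the `ALIASES` and `PROFILES` tables, the function names, and the sample sentence are hypothetical, and a regex stands in for what would normally be an LLM extraction call.

```python
import re

# Hypothetical lookup tables; a real system would use a database or search index.
ALIASES = {
    "ibm": "IBM",
    "international business machines": "IBM",
}
PROFILES = {"IBM": {"industry": "technology"}}


def extract(text):
    """Stage 1: pull raw mentions out of free text (regex stands in for an LLM)."""
    return re.findall(r"IBM|International Business Machines", text)


def resolve(mention):
    """Stage 2: map a raw mention to a canonical name, or None if unknown."""
    return ALIASES.get(mention.lower().strip())


def enrich(name):
    """Stage 3: attach known attributes to a canonical entity."""
    return {"name": name, **PROFILES.get(name, {})}


def run_pipeline(text):
    # A set dedupes mentions that resolve to the same canonical entity.
    names = {resolve(mention) for mention in extract(text)}
    return [enrich(name) for name in sorted(n for n in names if n is not None)]


text = "She left International Business Machines in 2010; IBM rehired her later."
print(run_pipeline(text))
# [{'name': 'IBM', 'industry': 'technology'}]
```

Keeping the stages as separate functions means each one can be tested, swapped, or routed to a human reviewer on its own, for example when `resolve` returns `None`.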

Cracking the Prompting Interview
#9
3 months ago

Ready to level up your prompting skills? Join us for a deep dive into advanced prompting techniques that separate good prompt engineers from great ones. We'll cover systematic prompt design, testing tools / inner loops, and tackle real-world prompting challenges. Perfect prep for becoming a more effective AI engineer.

Humans as Tools: Async Agents and Durable Execution
#8
3 months ago

Agents are great, but for the most accuracy-sensitive scenarios, we sometimes want a human in the loop. Today we'll discuss techniques for making this possible. We'll dive deep into concepts from our 4/22 session on 12-factor agents and extend them to handle asynchronous operations where agents need to contact humans for help, feedback, or approvals across a variety of channels.

12-factor agents: selecting from thousands of MCP tools
#7
3 months ago

MCP is only as great as your ability to pick the right tools. We'll show how to leverage MCP servers and accurately pick the right ones when only a few actually have relevant tools.

Policy to Prompt: Evaluating w/ the Enron Emails Dataset
#6
3 months ago

One of the most common problems in AI engineering is looking at a set of policies/rules and evaluating evidence to determine if the rules were followed. In this session we'll explore turning policies into prompts and pipelines to evaluate which emails in the massive Enron email dataset violated SEC and Sarbanes-Oxley regulations.

Designing Evals
#5
4 months ago

Minimalist and high-performance testing/evals for LLM applications. Stay tuned for our season 2 kickoff topic on testing and evaluation strategies.

Twelve Factor Agents
#4
4 months ago

Learn how to build production-ready AI agents using the twelve-factor methodology. We'll cover the core concepts and build a real agent from scratch.

Code Generation with Small Models
#3
5 months ago

Large models can do a lot, but so can small models. We'll discuss techniques for how to leverage extremely small models for generating diffs and making changes in complete codebases.

Reasoning Models vs Reasoning Prompts
#2
5 months ago

Models can reason but you can also reason within a prompt. Which technique wins out when and why? We'll find out by adding reasoning to an existing movie chat agent.

Large Scale Classification
#1
5 months ago

LLMs are great at classification from 5, 10, maybe even 50 categories. But how do we deal with situations where we have over 1,000? Or where the list of categories is ever-changing?

Never Miss an Episode

Join our weekly sessions and learn how to build AI that actually works in production.

