In this conversation, Vaibhav Gupta and Dex explore the intricacies of building an Agentic Retrieval-Augmented Generation (RAG) system. They discuss the differences between traditional RAG and Agentic RAG, emphasizing the flexibility and decision-making capabilities of the latter. The conversation includes a live demo of a coding agent, insights into the coding architecture, challenges faced during tool implementation, and the iterative process of refining the system. They also touch on the integration of web search functionalities and the evaluation of tool effectiveness, providing a comprehensive overview of the development process and the underlying principles of Agentic RAG systems. In this conversation, Vaibhav Gupta and Dex discuss the intricacies of building dynamic AI systems, focusing on tool implementation, user interface optimization, and model performance. They explore the importance of reinforcement learning in training models, the challenges of debugging AI systems, and the significance of writing code to enhance understanding and efficiency in AI development. The dialogue emphasizes the balance between different AI approaches and the necessity of real use cases in building effective solutions.
Exploring the intricacies of building an Agentic Retrieval-Augmented Generation (RAG) system, emphasizing the flexibility and decision-making capabilities that distinguish it from traditional RAG approaches.
Episode Summary
Vaibhav Gupta demonstrates building a complete Agentic RAG system from scratch in just 3 hours, showing the crucial difference between deterministic RAG pipelines and agent-driven context assembly. The live coding session reveals that while the agent loop itself is straightforward, the real complexity lies in tool implementation details - from handling relative paths in grep results to managing working directories and truncation notices.
Key Insights
Agentic RAG vs Traditional RAG
"RAG is a system that takes in a user query and looks into some database... Traditional RAG uses vector search 100% of the time. Agentic RAG lets the model decide if it needs to RAG anything at all."
Traditional RAG: Deterministic code fetches context based on vector similarity every time
Agentic RAG: Model decides which tools to use and what context to retrieve
Trade-off: Flexibility and capability vs speed and predictability
"Most of the time actually came from UI time, not from anything else."
Critical Tool Implementation Details
What Actually Mattered:
Using relative paths instead of absolute paths in grep results
Tracking and injecting current working directory into prompts
Adding clear truncation notices with line numbers
Implementing proper timeouts for all subprocess calls
Using ripgrep (rg) instead of standard grep
What Didn't Matter:
Tool definition prompts (never changed after initial write)
Complex retry logic
Structured tool response formats
The Architecture Pattern
User Query β [Agent Loop with Tools] β Response
β
Iterations (tool calls)
β
Tool Results β Back to Agent
Each iteration can call tools or respond to user. Sub-agents spawn fresh contexts but can't spawn more sub-agents (preventing infinite recursion).
Context Engineering Lessons
"That's context engineering. How do you make it more context efficient? Every single token counts. When you save 20 tokens per call and you're gonna grep 30 times, that makes a huge difference."
Key Optimizations:
Render tools in simplified format, not full JSON
Use [Dir] and [File] prefixes in ls output
Truncate file reads at 20K chars or 5K lines with clear instructions
Strip HTML to text in web fetch, save full content to file if needed
Error Handling Philosophy
Instead of forcing models to retry on parse errors, detect intent:
If response starts with backticks but not JSON β probably meant for user
Keep error corrections temporary, don't pollute context history
Add "invalid response" feedback but remove after correction
When to Use Agentic RAG
Use Agentic RAG When:
Problem scope is unbounded
User queries vary widely
You need web search + code search + docs
Flexibility matters more than speed
Avoid Agentic RAG When:
Problem scope is well-defined
Speed is critical
Most queries follow similar patterns
You can predict needed context
"Most people should not build agentic RAG systems for their workflows. If you're building a software stack, most problems are not so wide that you need an agentic rag system."
Model Considerations
GPT-4o works well out of the box
Smaller models (GPT-4o-mini) struggle with complex tool orchestration
Line numbers in file reads work without special training
RL/fine-tuning helps but isn't required for basic functionality
The Build Philosophy
"Writing the code helps me understand how it works. If I use a Claude Agent SDK, I don't actually understand what a system is doing."
Build from first principles to understand:
System design trade-offs
Where complexity actually lives
What optimizations matter
How to debug when things fail
Practical Implementation Tips
Start with a baseline: Build deterministic RAG first, then add agent capabilities
Invest in UI early: Good debugging UI is crucial for iteration
Test with consistent queries: Use the same 10 queries repeatedly during development
Track tool sequences: Focus on which tools are called in what order, not just final output
Handle state carefully: Preserve working directory, track file modifications
Optimize for context: Every token saved in tool responses compounds across iterations
The Bottom Line
Agentic RAG isn't technically complex - you can build one in 3 hours. The challenge is making tools context-efficient and deciding if you actually need the flexibility versus a faster, deterministic pipeline. As Dex summarizes: "The answer is what solves your user's problem."
Running the Code
Quick Start
# Clone the repository
git clone https://github.com/ai-that-works/ai-that-works.git
cd ai-that-works/2025-10-21-agentic-rag-context-engineering
# Install dependencies
uv sync
# Generate BAML client
uv run baml-cli generate
# Set your API keys
export OPENAI_API_KEY="your-key-here"
export EXA_API_KEY="your-exa-key-here" # Optional, for WebSearch tool
# Run the agent (CLI mode)
uv run python main.py "What files are in this directory?"
# Run in beautiful TUI mode (recommended)
uv run python main.py "Start" --tui
# Interactive mode
uv run python main.py "Start" --interactive
Available Commands
# Single query
uv run python main.py "What does the fern folder do?"
# Work in a specific directory
uv run python main.py "Find all Python files" --dir /path/to/project
# TUI with specific directory
uv run python main.py "Start" --tui --dir ~/myproject
WebFetchTool - Fetch and process web content (requires requests + beautifulsoup4)
TodoReadTool - Read todo list (in-memory storage)
TodoWriteTool - Write todo list (in-memory storage)
WebSearchTool - Search the web (stub - requires search API)
ExitPlanModeTool - Exit planning mode
Tool Handler Pattern
All tools are handled through a single async execute_tool() function using Python 3.10+ match statements on the action field:
async def execute_tool(tool: types.AgentTools) -> str:
"""Execute a tool based on its type using match statement"""
match tool.action:
case "Bash":
return execute_bash(tool)
case "Glob":
return execute_glob(tool)
case "Agent":
return await execute_agent(tool) # Async for recursive calls
# ... etc for all 16 tools
case other:
return f"Unknown tool type: {other}"
Setup
Prerequisites
Python 3.10+ (required for match statements)
OpenAI API key (set as OPENAI_API_KEY environment variable)
Exa API key (set as EXA_API_KEY environment variable) - for WebSearch tool
# Set your API keys
export OPENAI_API_KEY="your-key-here"
export EXA_API_KEY="your-exa-key-here" # Optional, for WebSearch tool
# Run a single command (uses current directory)
uv run python main.py "What files are in this directory?"
# Interactive mode - keeps asking for commands
uv run python main.py "List files" --interactive
# TUI mode - beautiful text user interface π¨
uv run python main.py "Start" --tui
# TUI with specific directory
uv run python main.py "Start" --tui --dir ~/myproject
# Specify a working directory (CLI mode)
uv run python main.py "Find all Python files" --dir /path/to/project
# View all options
uv run python main.py --help
User Interfaces
TUI Mode (Recommended) π¨
Beautiful text user interface with real-time updates:
uv run python main.py "Start" --tui
Features:
Status Bar: Shows working directory, iteration count, and agent status
Main Log: Pretty-formatted output with panels for tools, results, and agent replies
Auto-scrolls to latest content
Real-time updates as tools execute
Todo Panel: Live view of the todo list on the right side
Input Box: Command input at the bottom
Conversation History: Maintained across commands for context continuity
Responsive UI: Agent runs in background thread, UI stays snappy
Keyboard Shortcuts:
Enter: Submit command
Enter (empty): Continue agent execution after text replies
Ctrl+R: Reset conversation history
Ctrl+X: Interrupt agent execution
Ctrl+C: Quit application
CLI Modes
Single Command Mode:
uv run python main.py "What files are in this directory?"
Interactive Mode:
uv run python main.py "Start" --interactive
Prompts for commands via input() after each task
Type exit, quit, or q to exit
Ctrl+C returns to prompt instead of exiting
Agent Loop
The agent loop:
Takes a user message
Calls the BAML AgentLoop function
Executes any tools the LLM requests
Feeds tool results back to the LLM
Repeats until the LLM replies to the user (default max: 999 iterations)
Key Features
Type-Safe Tool Calling
BAML generates Pydantic models for all tools, ensuring type safety:
Each tool includes its full usage documentation in the @description annotation, providing the LLM with comprehensive context about when and how to use each tool.
Modular Tool Handlers
Each tool has its own handler function that can be tested and maintained independently:
def execute_bash(tool: types.BashTool) -> str:
"""Execute a bash command and return the output"""
try:
result = subprocess.run(
tool.command,
shell=True,
capture_output=True,
text=True,
timeout=tool.timeout / 1000 if tool.timeout else 120,
cwd=os.getcwd()
)
return result.stdout
except Exception as e:
return f"Error: {str(e)}"
Dependencies
Core dependencies:
baml-py - BAML Python SDK
pydantic - Data validation
typing-extensions - Type hints support
python-dotenv - Environment variable management
textual - Beautiful TUI framework
rich - Rich text formatting
Optional dependencies for specific tools:
requests + beautifulsoup4 - For WebFetch tool (install with uv add requests beautifulsoup4)
exa-py - For WebSearch tool (install with uv add exa-py)
ripgrep (system package) - For Grep tool (usually pre-installed)
In-Memory State
The agent maintains in-memory state for:
Todo list - Stored in _todo_store global variable, persists for the lifetime of the process
Agent loop - Supports recursive sub-agent calls with reduced max iterations
Sub-Agents
The agent can launch sub-agents to handle focused tasks. Sub-agents use a different BAML function (SubAgentLoop) that doesn't include the AgentTool, preventing infinite recursion.
Architecture:
Main agent uses AgentLoop - has access to all tools including AgentTool
Sub-agents use SubAgentLoop - has all tools EXCEPT AgentTool
Sub-agents run in isolated message contexts
Results are returned to the main agent
In the TUI, sub-agents are visualized with indentation and depth indicators:
Important Note: Sub-agents cannot spawn additional sub-agents (by design). The main agent uses AgentLoop which includes the AgentTool, while sub-agents use SubAgentLoop which excludes it. This prevents infinite recursion and keeps sub-agents focused on their specific goals.
Example Usage
Command Line
# Find package.json files
uv run python main.py "What directory contains the file 'package.json'?"
# Work in a specific directory
uv run python main.py "List all JavaScript files" --dir ~/my-project
# Interactive mode for multiple tasks
uv run python main.py "List files" --interactive
Programmatic Usage
# Find package.json files
user_query = 'What directory contains the file "package.json"?'
result = asyncio.run(agent_loop(user_query))
The agent will:
Use GlobTool to find all package.json files
Analyze the results
Reply to the user with the answer
Command Line Options
usage: main.py [-h] [--dir DIR] [--interactive] [--tui] [--verbose] query
positional arguments:
query The query or task for the agent to perform
options:
-h, --help show this help message and exit
--dir DIR, -d DIR Working directory for the agent (defaults to current directory)
--interactive, -i Run in interactive mode (keep asking for commands)
--tui, -t Run in TUI mode (beautiful text user interface)
--verbose, -v Enable verbose output
Note: The agent runs with a very high iteration limit (999) by default, allowing it to complete complex tasks. Sub-agents get a limit of 50 iterations.
TUI Layout
The TUI provides a beautiful interface with:
Color-coded tool executions (magenta panels)
Success results in green panels
User queries in blue panels
Live todo list updates on the right
Real-time status bar at the top
See TUI_LAYOUT.md for a detailed visual layout diagram.