Effective Context Engineering for AI Agents


Apply six context engineering principles to make AI agents faster, cheaper, and more reliable in production.

Context Engineering vs. Prompt Engineering

| Prompt Engineering | Context Engineering |
| --- | --- |
| Methods for writing and organizing LLM instructions for optimal outcomes | Strategies for curating and maintaining the optimal set of tokens during LLM inference, including all information beyond prompts |
| Primary focus: how to write effective system prompts | Managing the entire context state: system instructions, tools, MCP, external data, message history |
| Suited to one-shot classification or text generation tasks | Suited to agents operating over multiple turns and longer time horizons |
| Static prompt optimization | Cyclically refining a constantly evolving universe of possible information into a limited context window |

Why Context Is a Finite Resource

As tokens in the context window increase, the model's ability to accurately recall information decreases (context rot)

The transformer architecture requires every token to attend to every other token, creating n² pairwise relationships for n tokens

As context length increases, the model's ability to capture pairwise relationships gets stretched thin

Models develop attention patterns from training data in which shorter sequences are more common, leaving them with fewer specialized parameters for context-wide dependencies

Position encoding interpolation allows handling longer sequences but with some degradation in token position understanding

Result: a performance gradient (not a hard cliff) where longer contexts show reduced precision for information retrieval and long-range reasoning
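A quick sketch makes the quadratic scaling above concrete: doubling the context length quadruples the number of pairwise relationships the attention mechanism must cover. This is illustrative arithmetic only, not a model internal.

```python
# Self-attention relates every token to every other token, so a
# length-n sequence creates n * n query-key pairs.
def attention_pairs(n: int) -> int:
    return n * n

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_pairs(n):,} pairwise relationships")
```

A 10× longer context means 100× more pairwise relationships competing for the same attention capacity, which is why degradation shows up as a gradient rather than a cliff.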

Components of Effective Context

| Component | Guidance | Common Failure Mode |
| --- | --- | --- |
| System prompts | Use clear, direct language at the right altitude: specific enough to guide behavior, flexible enough to provide strong heuristics. Organize into distinct sections (background_information, instructions, tool guidance, output description) using XML tags or Markdown headers. Start with a minimal prompt on the best model, then add instructions based on observed failures. | Two extremes: hardcoded, complex, brittle logic (creates fragility), or vague high-level guidance that fails to give concrete signals |
| Tools | Self-contained, robust to error, and extremely clear about intended use. Input parameters should be descriptive and unambiguous; returned information should be token-efficient. | Bloated tool sets that cover too much functionality or create ambiguous decision points about which tool to use |
| Examples (few-shot) | Curate diverse, canonical examples that portray expected behavior: examples are the "pictures" worth a thousand words. | Stuffing a laundry list of edge cases into a prompt to articulate every possible rule |
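The sectioned system-prompt layout from the table can be sketched as follows. The section names mirror the guidance above; the task content and wording are hypothetical.

```python
# A minimal sketch of a system prompt organized into distinct sections
# with XML tags. The ticketing-agent scenario is invented for illustration.
SYSTEM_PROMPT = """\
<background_information>
You are a support agent for an internal ticketing system.
</background_information>

<instructions>
Triage each ticket, then either resolve it or escalate it.
Prefer the smallest action that addresses the user's request.
</instructions>

<tool_guidance>
Use search_tickets before creating a new ticket, to avoid duplicates.
</tool_guidance>

<output_description>
Reply with a one-paragraph summary followed by the chosen action.
</output_description>
"""
```

Starting from a skeleton like this on the best available model, and adding instructions only in response to observed failures, keeps the prompt at the right altitude.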

Context Retrieval: Pre-computed vs. Just-in-Time

| Pre-inference Retrieval | Just-in-Time Agentic Retrieval |
| --- | --- |
| Embedding-based retrieval surfaces context before inference | Agents maintain lightweight identifiers (file paths, stored queries, web links) and dynamically load data at runtime using tools |
| Pre-processes all relevant data up front | Progressive disclosure: agents incrementally discover context through exploration, assembling understanding layer by layer |
| Faster retrieval from pre-computed data | Slower, but avoids stale indexing; metadata (folder hierarchies, naming conventions, timestamps) provides signals for navigation |
| Risk of loading irrelevant information into context | Self-managed context window keeps the focus on relevant subsets |
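A minimal sketch of the just-in-time column: the context carries only lightweight identifiers, and a hypothetical `read_file` tool loads (and truncates) the actual content at runtime.

```python
from pathlib import Path

def read_file(path: str, max_chars: int = 4_000) -> str:
    """Hypothetical tool: load a file only when the agent asks for it,
    truncated so the result stays token-efficient."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return text[:max_chars]

# The context window holds references like these, not the data itself;
# the full content enters context only after an explicit tool call.
lightweight_identifiers = ["notes/NOTES.md", "src/main.py"]
```

The same pattern applies to stored queries and web links: keep the pointer in context, defer the payload until it is needed.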

Hybrid Retrieval Strategy

  • Claude Code uses a hybrid model: CLAUDE.md files are naively dropped into context up front, while primitives like glob and grep allow just-in-time navigation, bypassing stale indexing and complex syntax trees
  • The hybrid strategy may suit contexts with less dynamic content, such as legal or finance work
  • As model capabilities improve, agentic design will trend towards letting intelligent models act intelligently with progressively less human curation
  • Current best advice: 'do the simplest thing that works'

Long-Horizon Context Techniques

| Technique | How It Works | Best For |
| --- | --- | --- |
| Compaction | Summarize a conversation nearing the context limit, then reinitiate with the summary. Claude Code passes the message history to the model to compress critical details (architectural decisions, unresolved bugs, implementation details), discards redundant tool outputs, and continues with the compressed context plus the five most recently accessed files. | Tasks requiring extensive back-and-forth conversational flow |
| Structured note-taking | The agent writes notes persisted to memory outside the context window and pulls them back in later, like maintaining a to-do list or NOTES.md file to track progress across complex tasks. | Iterative development with clear milestones |
| Sub-agent architectures | Specialized sub-agents handle focused tasks with clean context windows. Each sub-agent may use tens of thousands of tokens but returns only a condensed summary (often 1,000–2,000 tokens); a lead agent coordinates with a high-level plan. | Complex research and analysis where parallel exploration pays dividends |
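The sub-agent pattern from the table can be sketched as below. `run_subagent` is a stub standing in for a fresh model conversation; in the real pattern, only its condensed summary ever reaches the lead agent's context.

```python
def run_subagent(task: str) -> str:
    # A real implementation would start a fresh conversation with its
    # own clean context here; this stub just returns a short summary.
    return f"summary of findings for: {task}"

def lead_agent(plan: list[str]) -> list[str]:
    # The lead agent's context holds only the high-level plan and the
    # condensed summaries, never the sub-agents' full working traces.
    return [run_subagent(task) for task in plan]

summaries = lead_agent(["survey the codebase", "check test coverage"])
```

Each sub-agent may burn tens of thousands of tokens exploring, but the lead agent pays only for the summaries it receives.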

Compaction Implementation

Tuning compaction prompts
☐ Start by maximizing recall to capture every relevant piece of information from complex agent traces
☐ Iterate to improve precision by eliminating superfluous content
☐ Clear tool calls and results deep in message history (safest, lightest-touch form of compaction, launched as a feature on the Claude Developer Platform)
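The last item, clearing tool results deep in the message history, can be sketched as follows. The message format here is a simplified assumption, not any specific API.

```python
def clear_old_tool_results(messages: list[dict], keep_recent: int = 5) -> list[dict]:
    """Lightest-touch compaction: replace bulky tool results that sit
    deep in the history with a small placeholder, leaving the most
    recent turns untouched."""
    cutoff = len(messages) - keep_recent
    compacted = []
    for i, msg in enumerate(messages):
        if i < cutoff and msg.get("role") == "tool":
            compacted.append({**msg, "content": "[tool result cleared]"})
        else:
            compacted.append(msg)
    return compacted
```

Because only stale tool payloads are touched, this preserves the conversational thread while reclaiming most of the context budget.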

Structured Note-Taking in Practice

  • Claude playing Pokemon demonstrates memory transforming agent capabilities: the agent maintains precise tallies across thousands of game steps (e.g., 'for the last 1,234 steps I've been training my Pokemon in Route 1, Pikachu has gained 8 levels toward the target of 10'), develops maps, remembers key achievements, maintains combat strategy notes
  • After context resets, the agent reads its own notes and continues multi-hour sequences, enabling long-horizon strategies impossible when keeping all information in context alone
  • Anthropic released a memory tool in public beta alongside Sonnet 4.5 on the Claude Developer Platform: a file-based system for storing and consulting information outside the context window
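A file-based note store in the spirit of the examples above can be sketched with two small helpers; the names `append_note` and `load_notes` are hypothetical, and the NOTES.md filename mirrors the earlier example.

```python
from pathlib import Path

def append_note(notes_path: Path, note: str) -> None:
    """Persist a note outside the context window."""
    with notes_path.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

def load_notes(notes_path: Path) -> str:
    """After a context reset, the agent re-reads its own notes."""
    return notes_path.read_text(encoding="utf-8") if notes_path.exists() else ""
```

The notes survive any number of context resets, which is what lets an agent resume a multi-hour strategy it could never hold in context directly.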

Core Principle

Context engineering is about finding the smallest set of high-signal tokens that maximize the likelihood of your desired outcome

Smarter models require less prescriptive engineering, allowing more agent autonomy

But even as capabilities scale, treating context as a precious finite resource remains central to building reliable agents
