Apply six context engineering principles to make AI agents faster, cheaper, and more reliable in production.
| Prompt Engineering | Context Engineering |
|---|---|
| Methods for writing and organizing LLM instructions for optimal outcomes | Strategies for curating and maintaining the optimal set of tokens during LLM inference, covering all available information, not just the prompt |
| Primary focus: how to write effective system prompts | Managing entire context state: system instructions, tools, MCP, external data, message history |
| Suited to one-shot classification or text generation tasks | Suited to agents operating over multiple turns and longer time horizons |
| Static prompt optimization | Cyclically refining a constantly evolving universe of possible information into a limited context window |
As tokens in the context window increase, the model's ability to accurately recall information decreases (context rot)
↓
Transformer architecture requires every token to attend to every other token, creating n-squared pairwise relationships for n tokens
↓
As context length increases, the model's ability to capture pairwise relationships gets stretched thin
↓
Models learn attention patterns from training data in which shorter sequences dominate, so they have fewer specialized parameters for context-wide dependencies
↓
Position encoding interpolation allows handling longer sequences but with some degradation in token position understanding
↓
Result: a performance gradient (not a hard cliff) where longer contexts show reduced precision for information retrieval and long-range reasoning
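The quadratic scaling behind this chain is easy to see directly. A minimal sketch (the function name is illustrative, not from any library):

```python
# Pairwise attention relationships grow quadratically with context length:
# for n tokens, every token attends to every other token, so the model
# must resolve an n x n matrix of attention scores.
def attention_pairs(n_tokens: int) -> int:
    """Number of pairwise token relationships in full self-attention."""
    return n_tokens * n_tokens

# Doubling the context quadruples the attention matrix to resolve.
print(attention_pairs(1_000))  # 1000000
print(attention_pairs(2_000))  # 4000000
```

This is why longer contexts stretch the model's attention budget thin rather than failing at a hard cutoff: the relationships to track grow far faster than the parameters trained to capture them.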
| Component | Guidance | Common Failure Mode |
|---|---|---|
| System prompts | Use clear, direct language at the right altitude: specific enough to guide behavior, flexible enough to provide strong heuristics. Organize into distinct sections (background_information, instructions, tool guidance, output description) using XML tags or Markdown headers. Start with a minimal prompt on the best model, then add instructions based on observed failures. | Two extremes: hardcoding complex brittle logic (creates fragility), or vague high-level guidance that fails to give concrete signals |
| Tools | Self-contained, robust to error, extremely clear on intended use. Input parameters should be descriptive and unambiguous. Return token-efficient information. | Bloated tool sets covering too much functionality or creating ambiguous decision points about which tool to use |
| Examples (few-shot) | Curate diverse, canonical examples that portray expected behavior. Examples are the 'pictures' worth a thousand words. | Stuffing a laundry list of edge cases into a prompt to articulate every possible rule |
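The system prompt guidance above can be sketched concretely. A hypothetical prompt organized into distinct XML-tagged sections; the section names follow the table's suggestion but are illustrative, not a required schema:

```python
# Hypothetical system prompt organized into distinct sections with XML
# tags, as suggested above. Content and tool names are made up for
# illustration.
SYSTEM_PROMPT = """\
<background_information>
You are a support agent for an internal ticketing system.
</background_information>

<instructions>
- Answer from retrieved tickets; say explicitly when information is missing.
- Keep responses under three paragraphs.
</instructions>

<tool_guidance>
Use search_tickets for lookups; prefer narrow queries over broad ones.
</tool_guidance>

<output_description>
Reply in plain prose and cite ticket IDs inline.
</output_description>
"""
```

Starting from a minimal version of such a prompt and adding sections only in response to observed failures keeps it at the right altitude: concrete signals without brittle hardcoded logic.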
| Pre-inference Retrieval | Just-in-Time Agentic Retrieval |
|---|---|
| Embedding-based retrieval surfaces context before inference | Agents maintain lightweight identifiers (file paths, stored queries, web links) and dynamically load data at runtime using tools |
| Pre-processes all relevant data up front | Progressive disclosure: agents incrementally discover context through exploration, assembling understanding layer by layer |
| Faster retrieval from pre-computed data | Slower but avoids stale indexing; metadata (folder hierarchies, naming conventions, timestamps) provides signals for navigation |
| Risk of loading irrelevant information into context | Self-managed context window keeps focus on relevant subsets |
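The just-in-time column can be sketched as a single tool: the agent's context holds only lightweight identifiers, and content is loaded at runtime. `read_reference` is a hypothetical tool name, and the truncation limit is an assumed token-efficiency measure:

```python
from pathlib import Path

# Minimal sketch of just-in-time retrieval: the agent carries lightweight
# identifiers (here, file paths) and pulls content into context on demand.
# read_reference is a hypothetical tool, not a real library call.
def read_reference(identifier: str, max_chars: int = 2_000) -> str:
    """Load a referenced file at runtime, truncated to stay token-efficient."""
    path = Path(identifier)
    if not path.exists():
        return f"[missing: {identifier}]"
    return path.read_text()[:max_chars]

# The context window stores the identifiers, not the documents themselves;
# metadata such as paths and naming conventions guides which to open.
references = ["docs/architecture.md", "src/main.py"]
```

The trade-off from the table shows up directly: each call costs runtime latency, but the content is never stale and only the relevant subset ever enters the context window.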
| Technique | How It Works | Best For |
|---|---|---|
| Compaction | Summarize conversation nearing context limit, reinitiate with summary. Claude Code passes message history to the model to compress critical details (architectural decisions, unresolved bugs, implementation details), discards redundant tool outputs, continues with compressed context plus the five most recently accessed files. | Tasks requiring extensive back-and-forth conversational flow |
| Structured note-taking | Agent writes notes persisted to memory outside the context window, pulled back in later. Like maintaining a to-do list or NOTES.md file to track progress across complex tasks. | Iterative development with clear milestones |
| Sub-agent architectures | Specialized sub-agents handle focused tasks with clean context windows. Each subagent may use tens of thousands of tokens but returns only a condensed summary (often 1,000-2,000 tokens). Lead agent coordinates with a high-level plan. | Complex research and analysis where parallel exploration pays dividends |
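Compaction, the first technique above, reduces to a simple loop shape. A hedged sketch assuming a word-count token proxy and a placeholder `summarize` standing in for the model call that compresses critical details:

```python
# Sketch of compaction: when history nears a token budget, replace older
# turns with a summary and keep the most recent messages verbatim.
# summarize() is a placeholder for a real model call.
def summarize(messages: list[str]) -> str:
    return f"[summary of {len(messages)} earlier messages]"

def compact(history: list[str], budget: int, keep_recent: int = 5) -> list[str]:
    # Crude token proxy: whitespace-separated words.
    if sum(len(m.split()) for m in history) <= budget:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent
```

The real decisions live inside `summarize`: what to preserve (architectural decisions, unresolved bugs) and what to discard (redundant tool outputs), with the most recent items kept uncompressed, much as Claude Code keeps the five most recently accessed files.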
Context engineering is about finding the smallest set of high-signal tokens that maximize the likelihood of your desired outcome
↓
Smarter models require less prescriptive engineering, allowing more agent autonomy
↓
But even as capabilities scale, treating context as a precious finite resource remains central to building reliable agents