Apply six context engineering principles to make AI agents faster, cheaper, and more reliable in production.
| Prompt Engineering | Context Engineering |
|---|---|
| Methods for writing and organizing LLM instructions for optimal outcomes | Strategies for curating and maintaining the optimal set of tokens during LLM inference, covering all available information, not just the prompt |
| Primary focus: how to write effective system prompts | Managing entire context state: system instructions, tools, MCP, external data, message history |
| Suited to one-shot classification or text generation tasks | Suited to agents operating over multiple turns and longer time horizons |
| Static prompt optimization | Cyclically refining a constantly evolving universe of possible information into a limited context window |
As tokens in the context window increase, the model's ability to accurately recall information decreases (context rot)
↓
Transformer architecture requires every token to attend to every other token, creating n-squared pairwise relationships for n tokens
↓
As context length increases, the model's ability to capture pairwise relationships gets stretched thin
↓
Models learn attention patterns from training data in which shorter sequences dominate, so they have fewer specialized parameters for context-wide dependencies
↓
Position encoding interpolation allows handling longer sequences but with some degradation in token position understanding
↓
Result: a performance gradient (not a hard cliff) where longer contexts show reduced precision for information retrieval and long-range reasoning
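The quadratic scaling behind this chain is easy to see directly. A minimal sketch (the function name is illustrative, not from any library):

```python
# Pairwise attention relationships grow quadratically with context length:
# for n tokens, every token attends to every other token, so the model
# must resolve an n x n matrix of attention scores.
def attention_pairs(n_tokens: int) -> int:
    """Number of pairwise token relationships in full self-attention."""
    return n_tokens * n_tokens

# Doubling the context quadruples the attention matrix to resolve.
print(attention_pairs(1_000))  # 1000000
print(attention_pairs(2_000))  # 4000000
```

This is why longer contexts stretch the model's attention budget thin rather than failing at a hard cutoff: the relationships to track grow far faster than the parameters trained to capture them.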
| Component | Guidance | Common Failure Mode |
|---|---|---|
| System prompts | Use clear, direct language at the right altitude: specific enough to guide behavior, flexible enough to provide strong heuristics. Organize into distinct sections (background_information, instructions, tool guidance, output description) using XML tags or Markdown headers. Start with a minimal prompt on the best model, then add instructions based on observed failures. | Two extremes: hardcoding complex brittle logic (creates fragility), or vague high-level guidance that fails to give concrete signals |
| Tools | Self-contained, robust to error, extremely clear on intended use. Input parameters should be descriptive and unambiguous. Return token-efficient information. | Bloated tool sets covering too much functionality or creating ambiguous decision points about which tool to use |
| Examples (few-shot) | Curate diverse, canonical examples that portray expected behavior. Examples are the 'pictures' worth a thousand words. | Stuffing a laundry list of edge cases into a prompt to articulate every possible rule |
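The system prompt guidance above can be sketched concretely. A hypothetical prompt organized into distinct XML-tagged sections; the section names follow the table's suggestion but are illustrative, not a required schema:

```python
# Hypothetical system prompt organized into distinct sections with XML
# tags, as suggested above. Content and tool names are made up for
# illustration.
SYSTEM_PROMPT = """\
<background_information>
You are a support agent for an internal ticketing system.
</background_information>

<instructions>
- Answer from retrieved tickets; say explicitly when information is missing.
- Keep responses under three paragraphs.
</instructions>

<tool_guidance>
Use search_tickets for lookups; prefer narrow queries over broad ones.
</tool_guidance>

<output_description>
Reply in plain prose and cite ticket IDs inline.
</output_description>
"""
```

Starting from a minimal version of such a prompt and adding sections only in response to observed failures keeps it at the right altitude: concrete signals without brittle hardcoded logic.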
| Pre-inference Retrieval | Just-in-Time Agentic Retrieval |
|---|---|
| Embedding-based retrieval surfaces context before inference | Agents maintain lightweight identifiers (file paths, stored queries, web links) and dynamically load data at runtime using tools |
| Pre-processes all relevant data up front | Progressive disclosure: agents incrementally discover context through exploration, assembling understanding layer by layer |
| Faster retrieval from pre-computed data | Slower but avoids stale indexing; metadata (folder hierarchies, naming conventions, timestamps) provides signals for navigation |
| Risk of loading irrelevant information into context | Self-managed context window keeps focus on relevant subsets |
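The just-in-time column can be sketched as a single tool: the agent's context holds only lightweight identifiers, and content is loaded at runtime. `read_reference` is a hypothetical tool name, and the truncation limit is an assumed token-efficiency measure:

```python
from pathlib import Path

# Minimal sketch of just-in-time retrieval: the agent carries lightweight
# identifiers (here, file paths) and pulls content into context on demand.
# read_reference is a hypothetical tool, not a real library call.
def read_reference(identifier: str, max_chars: int = 2_000) -> str:
    """Load a referenced file at runtime, truncated to stay token-efficient."""
    path = Path(identifier)
    if not path.exists():
        return f"[missing: {identifier}]"
    return path.read_text()[:max_chars]

# The context window stores the identifiers, not the documents themselves;
# metadata such as paths and naming conventions guides which to open.
references = ["docs/architecture.md", "src/main.py"]
```

The trade-off from the table shows up directly: each call costs runtime latency, but the content is never stale and only the relevant subset ever enters the context window.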
| Technique | How It Works | Best For |
|---|---|---|
| Compaction | Summarize conversation nearing context limit, reinitiate with summary. Claude Code passes message history to the model to compress critical details (architectural decisions, unresolved bugs, implementation details), discards redundant tool outputs, continues with compressed context plus the five most recently accessed files. | Tasks requiring extensive back-and-forth conversational flow |
| Structured note-taking | Agent writes notes persisted to memory outside the context window, pulled back in later. Like maintaining a to-do list or NOTES.md file to track progress across complex tasks. | Iterative development with clear milestones |
| Sub-agent architectures | Specialized sub-agents handle focused tasks with clean context windows. Each subagent may use tens of thousands of tokens but returns only a condensed summary (often 1,000-2,000 tokens). Lead agent coordinates with a high-level plan. | Complex research and analysis where parallel exploration pays dividends |
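Compaction, the first technique above, reduces to a simple loop shape. A hedged sketch assuming a word-count token proxy and a placeholder `summarize` standing in for the model call that compresses critical details:

```python
# Sketch of compaction: when history nears a token budget, replace older
# turns with a summary and keep the most recent messages verbatim.
# summarize() is a placeholder for a real model call.
def summarize(messages: list[str]) -> str:
    return f"[summary of {len(messages)} earlier messages]"

def compact(history: list[str], budget: int, keep_recent: int = 5) -> list[str]:
    # Crude token proxy: whitespace-separated words.
    if sum(len(m.split()) for m in history) <= budget:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent
```

The real decisions live inside `summarize`: what to preserve (architectural decisions, unresolved bugs) and what to discard (redundant tool outputs), with the most recent items kept uncompressed, much as Claude Code keeps the five most recently accessed files.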
Context engineering is about finding the smallest set of high-signal tokens that maximize the likelihood of your desired outcome
↓
Smarter models require less prescriptive engineering, allowing more agent autonomy
↓
But even as capabilities scale, treating context as a precious finite resource remains central to building reliable agents