{ "title": "Session Pruning", "content": "Session pruning trims **old tool results** from the in-memory context right before each LLM call. It does **not** rewrite the on-disk session history (`*.jsonl`).\n\n## When it runs\n\n* When `mode: \"cache-ttl\"` is enabled and the last Anthropic call for the session is older than `ttl`.\n* Only affects the messages sent to the model for that request.\n* Only active for Anthropic API calls (and OpenRouter Anthropic models).\n* For best results, match `ttl` to your model's `cacheControlTtl`.\n* After a prune, the TTL window resets, so subsequent requests keep the cache until `ttl` expires again.\n\n## Smart defaults (Anthropic)\n\n* **OAuth or setup-token** profiles: enable `cache-ttl` pruning and set the heartbeat to `1h`.\n* **API key** profiles: enable `cache-ttl` pruning, set the heartbeat to `30m`, and default `cacheControlTtl` to `1h` on Anthropic models.\n* If you set any of these values explicitly, OpenClaw does **not** override them.\n\n## What this improves (cost + cache behavior)\n\n* **Why prune:** Anthropic prompt caching only applies within the TTL. 
If a session goes idle past the TTL, the next request re-caches the full prompt unless you trim it first.\n* **What gets cheaper:** pruning reduces the **cacheWrite** size for that first request after the TTL expires.\n* **Why the TTL reset matters:** once pruning runs, the cache window resets, so follow‑up requests can reuse the freshly cached prompt instead of re-caching the full history.\n* **What it does not do:** pruning doesn’t add tokens or “double” costs; it only changes what gets cached on that first post‑TTL request.\n\n## What can be pruned\n\n* Only `toolResult` messages.\n* User + assistant messages are **never** modified.\n* The last `keepLastAssistants` assistant messages are protected; tool results after that cutoff are not pruned.\n* If there aren’t enough assistant messages to establish the cutoff, pruning is skipped.\n* Tool results containing **image blocks** are skipped (never trimmed or cleared).\n\n## Context window estimation\n\nPruning uses an estimated context window (chars ≈ tokens × 4). The base window is resolved in this order:\n\n1. `models.providers.*.models[].contextWindow` override.\n2. Model definition `contextWindow` (from the model registry).\n3. 
Default `200000` tokens.\n\nIf `agents.defaults.contextTokens` is set, it is treated as a cap on the resolved window (the smaller of the two values is used).\n\n## Mode\n\n### cache-ttl\n\n* Pruning only runs if the last Anthropic call is older than `ttl` (default `5m`).\n* When it runs, it applies the soft-trim and hard-clear behavior described below.\n\n## Soft vs hard pruning\n\n* **Soft-trim**: only for oversized tool results.\n * Keeps head + tail, inserts `...`, and appends a note with the original size.\n * Skips results with image blocks.\n* **Hard-clear**: replaces the entire tool result with `hardClear.placeholder`.\n\n## Tool selection\n\n* `tools.allow` / `tools.deny` support `*` wildcards.\n* Deny wins over allow.\n* Matching is case-insensitive.\n* An empty allow list means all tools are allowed.\n\n## Interaction with other limits\n\n* Built-in tools already truncate their own output; session pruning is an extra layer that prevents long-running chats from accumulating too much tool output in the model context.\n* Compaction is separate: compaction summarizes and persists; pruning is transient and per-request. 
See [/concepts/compaction](/concepts/compaction).\n\n## Defaults (when enabled)\n\n* `ttl`: `\"5m\"`\n* `keepLastAssistants`: `3`\n* `softTrimRatio`: `0.3`\n* `hardClearRatio`: `0.5`\n* `minPrunableToolChars`: `50000`\n* `softTrim`: `{ maxChars: 4000, headChars: 1500, tailChars: 1500 }`\n* `hardClear`: `{ enabled: true, placeholder: \"[Old tool result content cleared]\" }`\n\n## Examples\n\nEnable TTL-aware pruning (a sketch using the documented keys; the `sessionPruning` block name and its placement are assumptions — check the config reference below for the exact shape):\n\n```jsonc\n{\n  \"sessionPruning\": {\n    \"mode\": \"cache-ttl\",\n    \"ttl\": \"5m\",\n    \"keepLastAssistants\": 3\n  }\n}\n```\n\nRestrict pruning to specific tools (the tool pattern is a placeholder; deny wins over allow, and matching is case-insensitive):\n\n```jsonc\n{\n  \"sessionPruning\": {\n    \"mode\": \"cache-ttl\",\n    \"tools\": {\n      \"allow\": [\"*\"],\n      \"deny\": [\"browser*\"]\n    }\n  }\n}\n```\n\nSee the config reference: [Gateway Configuration](/gateway/configuration)", "code_samples": [ { "code": "{\n  \"sessionPruning\": {\n    \"mode\": \"cache-ttl\",\n    \"ttl\": \"5m\",\n    \"keepLastAssistants\": 3\n  }\n}", "language": "jsonc" }, { "code": "{\n  \"sessionPruning\": {\n    \"mode\": \"cache-ttl\",\n    \"tools\": {\n      \"allow\": [\"*\"],\n      \"deny\": [\"browser*\"]\n    }\n  }\n}", "language": "jsonc" } ], "headings": [ { "level": "h2", "text": "When it runs", "id": "when-it-runs" }, { "level": "h2", "text": "Smart defaults (Anthropic)", "id": "smart-defaults-(anthropic)" }, { "level": "h2", "text": "What this improves (cost + cache behavior)", "id": "what-this-improves-(cost-+-cache-behavior)" }, { "level": "h2", "text": "What can be pruned", "id": "what-can-be-pruned" }, { "level": "h2", "text": "Context window estimation", "id": "context-window-estimation" }, { "level": "h2", "text": "Mode", "id": "mode" }, { "level": "h3", "text": "cache-ttl", "id": "cache-ttl" }, { "level": "h2", "text": "Soft vs hard pruning", "id": "soft-vs-hard-pruning" }, { "level": "h2", "text": "Tool selection", "id": "tool-selection" }, { "level": "h2", "text": "Interaction with other limits", "id": "interaction-with-other-limits" }, { "level": "h2", "text": "Defaults (when enabled)", "id": "defaults-(when-enabled)" }, { "level": "h2", "text": "Examples", "id": "examples" } ], "url": "llms-txt#session-pruning", "links": [] }