openclaw-skill/openclaw-knowhow-skill/output/openclaw-docs_data/pages/Streaming_chunking_7e9d8c9cab.json
Selig 4c966a3ad2 Initial commit: OpenClaw Skill Collection
6 custom skills (assign-task, dispatch-webhook, daily-briefing,
task-capture, qmd-brain, tts-voice) with technical documentation.
Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
2026-03-13 10:58:30 +08:00

{
"title": "Streaming + chunking",
"content": "OpenClaw has two separate “streaming” layers:\n\n* **Block streaming (channels):** emit completed **blocks** as the assistant writes. These are normal channel messages (not token deltas).\n* **Token-ish streaming (Telegram only):** update a **draft bubble** with partial text while generating; final message is sent at the end.\n\nThere is **no real token streaming** to external channel messages today. Telegram draft streaming is the only partial-stream surface.\n\n## Block streaming (channel messages)\n\nBlock streaming sends assistant output in coarse chunks as it becomes available.\n\n* `text_delta/events`: model stream events (may be sparse for non-streaming models).\n* `chunker`: `EmbeddedBlockChunker` applying min/max bounds + break preference.\n* `channel send`: actual outbound messages (block replies).\n\n* `agents.defaults.blockStreamingDefault`: `\"on\"`/`\"off\"` (default off).\n* Channel overrides: `*.blockStreaming` (and per-account variants) to force `\"on\"`/`\"off\"` per channel.\n* `agents.defaults.blockStreamingBreak`: `\"text_end\"` or `\"message_end\"`.\n* `agents.defaults.blockStreamingChunk`: `{ minChars, maxChars, breakPreference? }`.\n* `agents.defaults.blockStreamingCoalesce`: `{ minChars?, maxChars?, idleMs? 
}` (merge streamed blocks before send).\n* Channel hard cap: `*.textChunkLimit` (e.g., `channels.whatsapp.textChunkLimit`).\n* Channel chunk mode: `*.chunkMode` (`length` default, `newline` splits on blank lines (paragraph boundaries) before length chunking).\n* Discord soft cap: `channels.discord.maxLinesPerMessage` (default 17) splits tall replies to avoid UI clipping.\n\n**Boundary semantics:**\n\n* `text_end`: stream blocks as soon as chunker emits; flush on each `text_end`.\n* `message_end`: wait until assistant message finishes, then flush buffered output.\n\n`message_end` still uses the chunker if the buffered text exceeds `maxChars`, so it can emit multiple chunks at the end.\n\n## Chunking algorithm (low/high bounds)\n\nBlock chunking is implemented by `EmbeddedBlockChunker`:\n\n* **Low bound:** don't emit until buffer >= `minChars` (unless forced).\n* **High bound:** prefer splits before `maxChars`; if forced, split at `maxChars`.\n* **Break preference:** `paragraph` → `newline` → `sentence` → `whitespace` → hard break.\n* **Code fences:** never split inside fences; when forced at `maxChars`, close + reopen the fence to keep Markdown valid.\n\n`maxChars` is clamped to the channel `textChunkLimit`, so you can't exceed per-channel caps.\n\n## Coalescing (merge streamed blocks)\n\nWhen block streaming is enabled, OpenClaw can **merge consecutive block chunks**\nbefore sending them out. 
This reduces “single-line spam” while still providing\nprogressive output.\n\n* Coalescing waits for **idle gaps** (`idleMs`) before flushing.\n* Buffers are capped by `maxChars` and will flush if they exceed it.\n* `minChars` prevents tiny fragments from sending until enough text accumulates\n (final flush always sends remaining text).\n* Joiner is derived from `blockStreamingChunk.breakPreference`\n (`paragraph` → `\\n\\n`, `newline` → `\\n`, `sentence` → space).\n* Channel overrides are available via `*.blockStreamingCoalesce` (including per-account configs).\n* Default coalesce `minChars` is bumped to 1500 for Signal/Slack/Discord unless overridden.\n\n## Human-like pacing between blocks\n\nWhen block streaming is enabled, you can add a **randomized pause** between\nblock replies (after the first block). This makes multi-bubble responses feel\nmore natural.\n\n* Config: `agents.defaults.humanDelay` (override per agent via `agents.list[].humanDelay`).\n* Modes: `off` (default), `natural` (800–2500ms), `custom` (`minMs`/`maxMs`).\n* Applies only to **block replies**, not final replies or tool summaries.\n\n## “Stream chunks or everything”\n\n* **Stream chunks:** `blockStreamingDefault: \"on\"` + `blockStreamingBreak: \"text_end\"` (emit as you go). Non-Telegram channels also need `*.blockStreaming: true`.\n* **Stream everything at end:** `blockStreamingBreak: \"message_end\"` (flush once, possibly multiple chunks if very long).\n* **No block streaming:** `blockStreamingDefault: \"off\"` (only final reply).\n\n**Channel note:** For non-Telegram channels, block streaming is **off unless**\n`*.blockStreaming` is explicitly set to `true`. 
Telegram can stream drafts\n(`channels.telegram.streamMode`) without block replies.\n\nConfig location reminder: the `blockStreaming*` defaults live under\n`agents.defaults`, not the root config.\n\n## Telegram draft streaming (token-ish)\n\nTelegram is the only channel with draft streaming:\n\n* Uses Bot API `sendMessageDraft` in **private chats with topics**.\n* `channels.telegram.streamMode: \"partial\" | \"block\" | \"off\"`.\n * `partial`: draft updates with the latest stream text.\n * `block`: draft updates in chunked blocks (same chunker rules).\n * `off`: no draft streaming.\n* Draft chunk config (only for `streamMode: \"block\"`): `channels.telegram.draftChunk` (defaults: `minChars: 200`, `maxChars: 800`).\n* Draft streaming is separate from block streaming; block replies are off by default and only enabled by `*.blockStreaming: true` on non-Telegram channels.\n* Final reply is still a normal message.\n* `/reasoning stream` writes reasoning into the draft bubble (Telegram only).\n\nWhen draft streaming is active, OpenClaw disables block streaming for that reply to avoid double-streaming.\n\n* `sendMessageDraft`: Telegram draft bubble (not a real message).\n* `final reply`: normal Telegram message send.",
"code_samples": [
{
"code": "Model output\n └─ text_delta/events\n ├─ (blockStreamingBreak=text_end)\n │ └─ chunker emits blocks as buffer grows\n └─ (blockStreamingBreak=message_end)\n └─ chunker flushes at message_end\n └─ channel send (block replies)",
"language": "unknown"
},
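{
"code": "// Illustrative sketch only: the keys are the block-streaming options documented\n// above; the values and surrounding file layout are assumptions, not defaults.\n{\n  agents: {\n    defaults: {\n      blockStreamingDefault: \"on\",\n      blockStreamingBreak: \"text_end\",\n      blockStreamingChunk: { minChars: 200, maxChars: 800, breakPreference: \"paragraph\" },\n      blockStreamingCoalesce: { minChars: 400, maxChars: 1200, idleMs: 1000 },\n    },\n  },\n  channels: {\n    // Non-Telegram channels need an explicit opt-in for block replies:\n    whatsapp: { blockStreaming: true },\n  },\n}",
"language": "json5"
},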
{
"code": "Telegram (private + topics)\n └─ sendMessageDraft (draft bubble)\n ├─ streamMode=partial → update latest text\n └─ streamMode=block → chunker updates draft\n └─ final reply → normal message",
"language": "unknown"
}
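,
{
"code": "// Illustrative sketch only: keys are from the Telegram draft-streaming section\n// above; the draftChunk values shown are the documented defaults.\n{\n  channels: {\n    telegram: {\n      streamMode: \"block\",\n      draftChunk: { minChars: 200, maxChars: 800 },\n    },\n  },\n}",
"language": "json5"
}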
],
"headings": [
{
"level": "h2",
"text": "Block streaming (channel messages)",
"id": "block-streaming-(channel-messages)"
},
{
"level": "h2",
"text": "Chunking algorithm (low/high bounds)",
"id": "chunking-algorithm-(low/high-bounds)"
},
{
"level": "h2",
"text": "Coalescing (merge streamed blocks)",
"id": "coalescing-(merge-streamed-blocks)"
},
{
"level": "h2",
"text": "Human-like pacing between blocks",
"id": "human-like-pacing-between-blocks"
},
{
"level": "h2",
"text": "“Stream chunks or everything”",
"id": "“stream-chunks-or-everything”"
},
{
"level": "h2",
"text": "Telegram draft streaming (token-ish)",
"id": "telegram-draft-streaming-(token-ish)"
}
],
"url": "llms-txt#streaming-+-chunking",
"links": []
}