forked from Selig/openclaw-skill
6 custom skills (assign-task, dispatch-webhook, daily-briefing, task-capture, qmd-brain, tts-voice) with technical documentation. Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
39 lines
3.6 KiB
JSON
Executable File
{
"title": "Talk Mode",
"content": "Talk mode is a continuous voice conversation loop:\n\n1. Listen for speech\n2. Send transcript to the model (main session, chat.send)\n3. Wait for the response\n4. Speak it via ElevenLabs (streaming playback)\n\n* **Always-on overlay** while Talk mode is enabled.\n* **Listening → Thinking → Speaking** phase transitions.\n* On a **short pause** (silence window), the current transcript is sent.\n* Replies are **written to WebChat** (same as typing).\n* **Interrupt on speech** (default on): if the user starts talking while the assistant is speaking, we stop playback and note the interruption timestamp for the next prompt.\n\n## Voice directives in replies\n\nThe assistant may prefix its reply with a **single JSON line** to control voice:\n\n* First non-empty line only.\n* Unknown keys are ignored.\n* `once: true` applies to the current reply only.\n* Without `once`, the voice becomes the new default for Talk mode.\n* The JSON line is stripped before TTS playback.\n\n* `voice` / `voice_id` / `voiceId`\n* `model` / `model_id` / `modelId`\n* `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`\n* `seed`, `normalize`, `lang`, `output_format`, `latency_tier`\n* `once`\n\n## Config (`~/.openclaw/openclaw.json`)\n\n* `interruptOnSpeech`: true\n* `voiceId`: falls back to `ELEVENLABS_VOICE_ID` / `SAG_VOICE_ID` (or first ElevenLabs voice when API key is available)\n* `modelId`: defaults to `eleven_v3` when unset\n* `apiKey`: falls back to `ELEVENLABS_API_KEY` (or gateway shell profile if available)\n* `outputFormat`: defaults to `pcm_44100` on macOS/iOS and `pcm_24000` on Android (set `mp3_*` to force MP3 streaming)\n\n* Menu bar toggle: **Talk**\n* Config tab: **Talk Mode** group (voice id + interrupt toggle)\n* Overlay:\n * **Listening**: cloud pulses with mic level\n * **Thinking**: sinking animation\n * **Speaking**: radiating rings\n * Click cloud: stop speaking\n * Click X: exit Talk mode\n\n* Requires Speech + Microphone permissions.\n* 
Uses `chat.send` against session key `main`.\n* TTS uses ElevenLabs streaming API with `ELEVENLABS_API_KEY` and incremental playback on macOS/iOS/Android for lower latency.\n* `stability` for `eleven_v3` is validated to `0.0`, `0.5`, or `1.0`; other models accept `0..1`.\n* `latency_tier` is validated to `0..4` when set.\n* Android supports `pcm_16000`, `pcm_22050`, `pcm_24000`, and `pcm_44100` output formats for low-latency AudioTrack streaming.",
|
|
"code_samples": [
{
"code": "Rules:\n\n* First non-empty line only.\n* Unknown keys are ignored.\n* `once: true` applies to the current reply only.\n* Without `once`, the voice becomes the new default for Talk mode.\n* The JSON line is stripped before TTS playback.\n\nSupported keys:\n\n* `voice` / `voice_id` / `voiceId`\n* `model` / `model_id` / `modelId`\n* `speed`, `rate` (WPM), `stability`, `similarity`, `style`, `speakerBoost`\n* `seed`, `normalize`, `lang`, `output_format`, `latency_tier`\n* `once`\n\n## Config (`~/.openclaw/openclaw.json`)",
"language": "markdown"
}
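,
{
"code": "Illustrative directive (hypothetical voice name and values; keys from the supported list above):\n\n{\"voice\": \"Rachel\", \"speed\": 1.1, \"once\": true}\nHere is the spoken reply text.\n\nThe JSON line is stripped before TTS playback, so only the text after it is spoken; `once: true` keeps the Talk mode default voice unchanged.",
"language": "json"
}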
],
"headings": [
{
"level": "h2",
"text": "Behavior (macOS)",
"id": "behavior-(macos)"
},
{
"level": "h2",
"text": "Voice directives in replies",
"id": "voice-directives-in-replies"
},
{
"level": "h2",
"text": "Config (`~/.openclaw/openclaw.json`)",
"id": "config-(`~/.openclaw/openclaw.json`)"
},
{
"level": "h2",
"text": "macOS UI",
"id": "macos-ui"
},
{
"level": "h2",
"text": "Notes",
"id": "notes"
}
],
"url": "llms-txt#talk-mode",
"links": []
}