# Talk Mode Documentation

## Overview

Talk Mode enables continuous voice conversations through a cycle of listening, transcription, model processing, and text-to-speech playback.

## Core Functionality

The system operates in three phases: Listening, Thinking, and Speaking. Upon detecting a brief silence, the transcript is sent to the model via the main session, and responses are both displayed in WebChat and spoken aloud using ElevenLabs.

## Voice Control

Responses can include a JSON directive as their first line to customize voice settings:

```json
{ "voice": "", "once": true }
```

Supported parameters include voice selection, model specification, speed, stability, and various ElevenLabs-specific options. The `once` flag limits the changes to the current reply only.

## Configuration

Settings are managed in `~/.openclaw/openclaw.json`, with options for voice ID, model selection, output format, and API credentials. The system defaults to the `eleven_v3` model with `interruptOnSpeech` enabled.

## Platform-Specific Behavior

On **macOS**, Talk Mode displays an always-on overlay with visual indicators for each phase and allows interruption when the user speaks during an assistant response. The UI includes a menu bar toggle, a configuration tab, and cloud icon controls.

## Technical Requirements

The feature requires Speech and Microphone permissions and supports various PCM and MP3 output formats across macOS, iOS, and Android platforms for optimized latency.
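As an illustrative sketch of the configuration described above, a Talk Mode section in `~/.openclaw/openclaw.json` might look like the following. The exact key names (`talk`, `voiceId`, `outputFormat`, `apiKey`) are assumptions for illustration and may differ from the actual schema; `eleven_v3` and `interruptOnSpeech` are the documented defaults.

```json
{
  "talk": {
    "voiceId": "your-voice-id",
    "model": "eleven_v3",
    "outputFormat": "pcm_24000",
    "interruptOnSpeech": true,
    "apiKey": "YOUR_ELEVENLABS_API_KEY"
  }
}
```

Replace the placeholder values with your own ElevenLabs voice ID and API key; consult the actual configuration reference for the authoritative key names.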