6 custom skills (assign-task, dispatch-webhook, daily-briefing, task-capture, qmd-brain, tts-voice) with technical documentation. Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
Talk Mode Documentation
Overview
Talk Mode enables continuous voice conversations through a cycle of listening, transcription, model processing, and text-to-speech playback.
Core Functionality
The system cycles through three phases: Listening, Thinking, and Speaking. When it detects a brief silence, it sends the transcript to the model via the main session. The response is then displayed in WebChat and spoken aloud using ElevenLabs.
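The phase cycle can be pictured as a small state machine. The sketch below is illustrative only: the three phase names come from the text, while the event names (silence_detected, reply_ready, playback_finished) and the next_phase helper are hypothetical and not part of Talk Mode.

```python
from enum import Enum


class Phase(Enum):
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# Hypothetical event names; the documentation only names the three phases.
TRANSITIONS = {
    (Phase.LISTENING, "silence_detected"): Phase.THINKING,
    (Phase.THINKING, "reply_ready"): Phase.SPEAKING,
    (Phase.SPEAKING, "playback_finished"): Phase.LISTENING,
}


def next_phase(current: Phase, event: str) -> Phase:
    """Return the phase that follows `event`; stay in place if the event does not apply."""
    return TRANSITIONS.get((current, event), current)


# Walk one full conversational turn.
phase = Phase.LISTENING
for event in ["silence_detected", "reply_ready", "playback_finished"]:
    phase = next_phase(phase, event)
    print(event, "->", phase.value)
```

Keeping the transitions in a lookup table means an event that arrives in the wrong phase is simply ignored rather than raising an error, which is one reasonable choice for a loop driven by audio events.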
Voice Control
Responses can include a JSON directive as the first line to customize voice settings:
{ "voice": "<voice-id>", "once": true }
Supported parameters include voice selection, model specification, speed, stability, and various ElevenLabs-specific options. The once flag limits changes to the current reply only.
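One way a client might handle the directive is sketched below. This is not the actual implementation: split_directive and apply_directive are hypothetical helpers, and the sketch assumes that a directive without once persists for later replies, which the text implies but does not state outright.

```python
import json


def split_directive(reply: str) -> tuple[dict, str]:
    """Split a reply into (directive, spoken_text).

    If the first line is not a JSON object, the directive is empty and the
    whole reply is treated as text to speak.
    """
    first_line, _, rest = reply.partition("\n")
    try:
        directive = json.loads(first_line)
    except json.JSONDecodeError:
        return {}, reply
    if not isinstance(directive, dict):
        return {}, reply
    return directive, rest


def apply_directive(session_voice: dict, directive: dict) -> tuple[dict, dict]:
    """Return (settings for this reply, settings for later replies)."""
    changes = {k: v for k, v in directive.items() if k != "once"}
    this_reply = {**session_voice, **changes}
    if directive.get("once"):
        # Assumption: a "once" override is dropped after the current reply.
        return this_reply, session_voice
    # Assumption: without "once", the override carries over to later replies.
    return this_reply, this_reply


reply = '{ "voice": "<voice-id>", "once": true }\nHello there.'
directive, text = split_directive(reply)
current, later = apply_directive({"voice": "default-voice"}, directive)
```

Here current carries the overridden voice for this reply, while later still holds the session's previous voice because once is set.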
Configuration
Settings are managed in ~/.openclaw/openclaw.json, with options for voice ID, model selection, output format, and API credentials. By default, the system uses the eleven_v3 model and has interruptOnSpeech enabled.
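The way defaults combine with user settings can be sketched as follows. Only eleven_v3 and interruptOnSpeech come from the text. The "talk" section name, the modelId and voiceId keys, and the load_talk_settings helper are assumptions made for illustration, and the sketch assumes the file is strict JSON. It reads a temporary file rather than the real configuration.

```python
import json
import tempfile
from pathlib import Path

# Default values are stated in the text; the key name "modelId" is an assumption.
DEFAULTS = {"modelId": "eleven_v3", "interruptOnSpeech": True}


def load_talk_settings(path: Path) -> dict:
    """Read a hypothetical "talk" section and fill in defaults for missing keys."""
    if not path.exists():
        return dict(DEFAULTS)
    config = json.loads(path.read_text(encoding="utf-8"))
    return {**DEFAULTS, **config.get("talk", {})}


# Demonstrate with a throwaway file instead of ~/.openclaw/openclaw.json.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "openclaw.json"
    path.write_text(json.dumps({"talk": {"voiceId": "<voice-id>"}}), encoding="utf-8")
    settings = load_talk_settings(path)

print(settings)
```

User-supplied keys override the defaults, and any key the user leaves out falls back to the default value.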
Platform-Specific Behavior
On macOS, an always-on overlay shows a visual indicator for each phase, and users can interrupt the assistant by speaking while it is responding. The UI includes a menu bar toggle, a configuration tab, and cloud icon controls.
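The interruption rule can be expressed as a single check. The sketch below is a guess at the logic, not the macOS implementation: phase_after_user_speech is a hypothetical helper, and it assumes that an interruption returns the system to the listening phase.

```python
def phase_after_user_speech(phase: str, interrupt_on_speech: bool = True) -> str:
    """Return the phase after the microphone picks up user speech."""
    if phase == "speaking" and interrupt_on_speech:
        # Assumption: playback stops and the turn goes back to the user.
        return "listening"
    return phase


print(phase_after_user_speech("speaking"))
print(phase_after_user_speech("speaking", interrupt_on_speech=False))
```

With the setting disabled, speech detected during playback leaves the phase unchanged, so the assistant finishes its reply.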
Technical Requirements
The feature requires Speech and Microphone permissions. It supports several PCM and MP3 output formats on macOS, iOS, and Android, so that each platform can use a format suited to low-latency playback.