openclaw-skill/openclaw-knowhow-skill/docs/infrastructure/nodes/talk.md
Selig 4c966a3ad2 Initial commit: OpenClaw Skill Collection
6 custom skills (assign-task, dispatch-webhook, daily-briefing,
task-capture, qmd-brain, tts-voice) with technical documentation.
Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
2026-03-13 10:58:30 +08:00


Talk Mode Documentation

Overview

Talk Mode enables continuous voice conversations through a cycle of listening, transcription, model processing, and text-to-speech playback.

Core Functionality

The system operates in three phases: Listening, Thinking, and Speaking. When a brief silence is detected, the captured transcript is sent to the model via the main session; responses are both displayed in WebChat and spoken aloud using ElevenLabs.
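The three-phase cycle described above can be sketched as a simple loop. This is an illustrative sketch only, not the actual implementation; the `listen`, `ask_model`, and `speak` callables are hypothetical stand-ins for the real capture, session, and TTS components.

```python
from enum import Enum, auto

class Phase(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()

def talk_cycle(listen, ask_model, speak):
    """Run one Listening -> Thinking -> Speaking cycle.

    listen()     -> transcript captured up to the brief-silence gap
    ask_model(t) -> model reply for transcript t (via the main session)
    speak(r)     -> plays reply r via TTS (and it is shown in WebChat)
    Returns the ordered phase list and the reply, for inspection.
    """
    phases = [Phase.LISTENING]
    transcript = listen()          # blocks until a brief silence is detected
    phases.append(Phase.THINKING)
    reply = ask_model(transcript)  # transcript goes to the model
    phases.append(Phase.SPEAKING)
    speak(reply)                   # reply is spoken aloud
    return phases, reply
```

In continuous Talk Mode this cycle would repeat until the user exits.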

Voice Control

Responses can include a JSON directive as the first line to customize voice settings:

{ "voice": "<voice-id>", "once": true }

Supported parameters include voice selection, model specification, speed, stability, and various ElevenLabs-specific options. The once flag limits the changes to the current reply only; without it, the new settings persist for subsequent replies.
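Separating the optional first-line directive from the spoken text might look like the sketch below. Only the "voice" and "once" fields come from the documented example; the function name and fallback behavior are assumptions.

```python
import json

def split_voice_directive(reply_text):
    """Split an optional first-line JSON voice directive from a reply.

    Returns (directive, remaining_text). If the first line is not a
    JSON object, the whole reply is treated as spoken text and the
    directive is None (assumed fallback behavior).
    """
    first_line, _, rest = reply_text.partition("\n")
    try:
        directive = json.loads(first_line)
    except json.JSONDecodeError:
        return None, reply_text
    if not isinstance(directive, dict):
        return None, reply_text
    return directive, rest
```

A reply beginning with { "voice": "<voice-id>", "once": true } would then be spoken with the requested voice, while the directive line itself is never read aloud.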

Configuration

Settings are managed in ~/.openclaw/openclaw.json, with options for voice ID, model selection, output format, and API credentials. The system defaults to the eleven_v3 model with interruptOnSpeech enabled.
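Loading that file and merging it over the documented defaults could be sketched as follows. Only the defaults (eleven_v3, interruptOnSpeech) are stated in this document; the key names and flat merge are assumptions for illustration.

```python
import json
import os

# Documented defaults; key names beyond these two are assumed.
DEFAULTS = {
    "model": "eleven_v3",
    "interruptOnSpeech": True,
}

def load_talk_config(path="~/.openclaw/openclaw.json"):
    """Return Talk Mode settings: user config merged over defaults."""
    cfg = dict(DEFAULTS)
    full_path = os.path.expanduser(path)
    if os.path.exists(full_path):
        with open(full_path) as f:
            cfg.update(json.load(f))  # user values override defaults
    return cfg
```

If the file is absent, the defaults alone apply.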

Platform-Specific Behavior

macOS displays an always-on overlay with visual indicators for each phase and allows interruption when the user speaks during an assistant response. The UI includes a menu bar toggle, a configuration tab, and cloud icon controls.

Technical Requirements

The feature requires Speech and Microphone permissions and supports various PCM and MP3 output formats on macOS, iOS, and Android, selected to minimize playback latency.