forked from Selig/openclaw-skill
6 custom skills (assign-task, dispatch-webhook, daily-briefing, task-capture, qmd-brain, tts-voice) with technical documentation. Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
# Talk Mode Documentation
## Overview
Talk Mode enables continuous voice conversations through a cycle of listening, transcription, model processing, and text-to-speech playback.
## Core Functionality
The system cycles through three phases: Listening, Thinking, and Speaking. When it detects a brief silence, the transcript is sent to the model via the main session; responses are both displayed in WebChat and spoken aloud using ElevenLabs.
## Voice Control
Responses can include a JSON directive as the first line to customize voice settings:
```json
{ "voice": "<voice-id>", "once": true }
```
Supported parameters include voice selection, model specification, speed, stability, and various ElevenLabs-specific options. The `once` flag limits changes to the current reply only.
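As an illustration, a fuller directive combining several of these settings might look like the sketch below. Only `voice` and `once` are confirmed by the example above; the `model`, `speed`, and `stability` keys are assumptions based on the parameter list, and the actual key names may differ.

```json
{
  "voice": "<voice-id>",
  "model": "eleven_v3",
  "speed": 1.1,
  "stability": 0.5,
  "once": true
}
```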
## Configuration
Settings are managed in `~/.openclaw/openclaw.json` with options for voice ID, model selection, output format, and API credentials. The system defaults to `eleven_v3` model with `interruptOnSpeech` enabled.
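A minimal sketch of what the relevant section of `~/.openclaw/openclaw.json` could contain. The field names here are illustrative (only the `eleven_v3` default and `interruptOnSpeech` are stated above); consult the actual schema for the exact keys.

```json
{
  "talk": {
    "voiceId": "<voice-id>",
    "model": "eleven_v3",
    "outputFormat": "<pcm-or-mp3-format>",
    "apiKey": "<elevenlabs-api-key>",
    "interruptOnSpeech": true
  }
}
```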
## Platform-Specific Behavior
On **macOS**, Talk Mode displays an always-on overlay with visual indicators for each phase and lets users interrupt by speaking while the assistant is responding. The UI includes a menu bar toggle, a configuration tab, and cloud-icon controls.
## Technical Requirements
The feature requires Speech and Microphone permissions and supports a range of PCM and MP3 output formats on macOS, iOS, and Android, chosen to minimize latency.