forked from Selig/openclaw-skill
6 custom skills (assign-task, dispatch-webhook, daily-briefing, task-capture, qmd-brain, tts-voice) with technical documentation. Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
# Talk Mode Documentation
## Overview
Talk Mode enables continuous voice conversations through a cycle of listening, transcription, model processing, and text-to-speech playback.
## Core Functionality
The system cycles through three phases: Listening, Thinking, and Speaking. When it detects a brief silence, the transcript is sent to the model via the main session; responses are both displayed in WebChat and spoken aloud using ElevenLabs.
## Voice Control
Responses can include a JSON directive as the first line to customize voice settings:
```json
{ "voice": "<voice-id>", "once": true }
```
Supported parameters include voice selection, model specification, speed, stability, and various ElevenLabs-specific options. The `once` flag limits changes to the current reply only.
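As an illustration, a fuller directive combining several of these settings might look like the sketch below. Only `voice` and `once` are confirmed by the example above; the `model`, `speed`, and `stability` keys are assumptions based on the parameter list, and the actual key names may differ.

```json
{
  "voice": "<voice-id>",
  "model": "eleven_v3",
  "speed": 1.1,
  "stability": 0.5,
  "once": true
}
```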
## Configuration
Settings are managed in `~/.openclaw/openclaw.json` with options for voice ID, model selection, output format, and API credentials. The system defaults to `eleven_v3` model with `interruptOnSpeech` enabled.
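A minimal sketch of what the relevant section of `~/.openclaw/openclaw.json` could contain. The field names here are illustrative (only the `eleven_v3` default and `interruptOnSpeech` are stated above); consult the actual schema for the exact keys.

```json
{
  "talk": {
    "voiceId": "<voice-id>",
    "model": "eleven_v3",
    "outputFormat": "<pcm-or-mp3-format>",
    "apiKey": "<elevenlabs-api-key>",
    "interruptOnSpeech": true
  }
}
```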
## Platform-Specific Behavior
On **macOS**, Talk Mode displays an always-on overlay with visual indicators for each phase and lets users interrupt by speaking while the assistant is responding. The UI includes a menu bar toggle, a configuration tab, and cloud-icon controls.
## Technical Requirements
The feature requires Speech and Microphone permissions and supports a range of PCM and MP3 output formats on macOS, iOS, and Android, chosen to minimize latency.