openclaw-skill/openclaw-knowhow-skill/docs/infrastructure/platforms/mac/voicewake.md
Selig 4c966a3ad2 Initial commit: OpenClaw Skill Collection
6 custom skills (assign-task, dispatch-webhook, daily-briefing,
task-capture, qmd-brain, tts-voice) with technical documentation.
Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
2026-03-13 10:58:30 +08:00

Voice Wake & Push-to-Talk

Overview

The Voice Wake feature operates in two modes: wake-word (always-on with trigger detection) and push-to-talk (immediate capture via right Option key hold).

Key Operating Modes

Wake-word Mode

Wake-word mode is the default. The speech recognizer listens continuously for the configured trigger tokens. When it detects one, it:

  1. Initiates capture
  2. Displays an overlay with partial transcription
  3. Automatically sends after detecting silence
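The three steps above can be read as a small state machine. The following is a minimal sketch in Python, not the app's actual implementation: the `Phase` names, the event strings, and the `restart` transition back to listening are all assumptions made for illustration.

```python
from enum import Enum, auto


class Phase(Enum):
    LISTENING = auto()  # recognizer waits for a trigger token
    CAPTURING = auto()  # overlay shows the partial transcription
    SENT = auto()       # transcript was sent after silence


def next_phase(phase: Phase, event: str) -> Phase:
    """Return the phase that follows `event`; unknown events change nothing."""
    transitions = {
        (Phase.LISTENING, "trigger_detected"): Phase.CAPTURING,
        (Phase.CAPTURING, "silence_detected"): Phase.SENT,
        # Assumption: the always-on recognizer goes back to listening.
        (Phase.SENT, "restart"): Phase.LISTENING,
    }
    return transitions.get((phase, event), phase)
```

Keeping the transitions in one table makes it easy to see that silence has no effect until a trigger has been detected.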

Push-to-talk Mode

Push-to-talk activates immediately when the user holds the right Option key; no trigger word is needed. The overlay stays visible during the hold, and the audio is processed after release.
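The hold-and-release behavior can be sketched as a tiny session object. This is an illustrative model only: `PushToTalkSession` and its method names are hypothetical, and the sketch does not model what happens to the overlay after release, since the text does not specify it.

```python
class PushToTalkSession:
    """Hypothetical model of one hold-to-talk capture."""

    def __init__(self) -> None:
        self.held = False
        self.chunks = []

    @property
    def overlay_visible(self) -> bool:
        # The overlay is shown for as long as the key is held.
        return self.held

    def key_down(self) -> None:
        # Capture starts immediately; no trigger word is required.
        self.held = True
        self.chunks = []

    def heard(self, text: str) -> None:
        # Speech is collected only while the key is held.
        if self.held:
            self.chunks.append(text)

    def key_up(self) -> str:
        # Releasing the key ends capture and hands back the transcript.
        self.held = False
        return " ".join(self.chunks)
```

A caller would invoke `key_down()` and `key_up()` from whatever key monitor it uses; speech that arrives outside the hold is ignored.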

Technical Architecture

VoiceWakeRuntime

VoiceWakeRuntime manages the speech recognizer. The trigger fires only when a meaningful pause of roughly 0.55 seconds separates the trigger word from the command.
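The pause requirement amounts to comparing two timestamps. The sketch below assumes the recognizer reports when the trigger word ended and when the next word began; the function name, parameter names, and the choice of an inclusive comparison are assumptions for illustration.

```python
MIN_TRIGGER_GAP_S = 0.55  # approximate pause quoted in the text


def trigger_fires(trigger_end_s: float, next_word_start_s: float,
                  min_gap_s: float = MIN_TRIGGER_GAP_S) -> bool:
    """True when the pause after the trigger word is long enough to count."""
    return (next_word_start_s - trigger_end_s) >= min_gap_s
```

Requiring a pause like this keeps the trigger word from firing when it merely appears in the middle of ordinary speech.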

Silence Detection

  • 2.0-second silence window while speech is flowing
  • 5.0-second silence window if only the trigger word was heard
  • Hard 120-second limit per session
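Taken together, the three limits reduce to one decision: has the session run too long, or has it been quiet for longer than the applicable window? A minimal sketch follows, with hypothetical names and the assumption that the caller tracks elapsed time and current silence duration in seconds.

```python
SPEECH_SILENCE_S = 2.0        # window while speech is flowing
TRIGGER_ONLY_SILENCE_S = 5.0  # window when only the trigger was heard
HARD_LIMIT_S = 120.0          # cap on a single session


def should_end_capture(elapsed_s: float, silence_s: float,
                       heard_command: bool) -> bool:
    """Decide whether the current capture session should stop."""
    if elapsed_s >= HARD_LIMIT_S:
        return True  # the hard cap applies regardless of silence
    window = SPEECH_SILENCE_S if heard_command else TRIGGER_ONLY_SILENCE_S
    return silence_s >= window
```

The longer trigger-only window gives the speaker time to start the command after saying the wake word.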

Overlay Implementation

The overlay uses VoiceWakeOverlayController with committed and volatile text states. A critical improvement prevents the "sticky overlay" failure mode, in which manually dismissing the overlay could halt listening: the runtime no longer blocks on overlay visibility, and closing the overlay triggers an automatic restart.
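One way to picture the committed and volatile states, and the restart on dismissal, is sketched below. This is not VoiceWakeOverlayController itself: `OverlayText`, `Listener`, and every method name here are hypothetical, and the sketch assumes that committed text is final while volatile text is the recognizer's latest, still-changing hypothesis.

```python
class OverlayText:
    """Hypothetical model: committed text is final, volatile text may change."""

    def __init__(self) -> None:
        self.committed = ""
        self.volatile = ""

    def update_partial(self, text: str) -> None:
        # A newer partial hypothesis replaces the previous one.
        self.volatile = text

    def commit(self) -> None:
        # Fold the current hypothesis into the final text.
        self.committed = (self.committed + " " + self.volatile).strip()
        self.volatile = ""

    def display(self) -> str:
        return (self.committed + " " + self.volatile).strip()


class Listener:
    """Hypothetical stand-in for the runtime's listening state."""

    def __init__(self) -> None:
        self.overlay = OverlayText()
        self.overlay_visible = False
        self.restarts = 0

    def overlay_dismissed(self) -> None:
        # Dismissal clears the overlay and restarts listening
        # instead of leaving the recognizer halted.
        self.overlay_visible = False
        self.overlay = OverlayText()
        self.restarts += 1
```

The point of the design is that listening never waits on the overlay: dismissal is just another event that leads back to the listening state.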

User Configuration

Available settings include:

  • Toggle Voice Wake on/off
  • Enable push-to-talk (Cmd+Fn hold, macOS 26+)
  • Language selection
  • Microphone selection with persistent preferences
  • Customizable audio cues (Glass sound by default, or any NSSound-compatible file)
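For illustration, the settings above could be grouped into a single record. The field names and every default except the Glass cue are assumptions, and the version check reflects only the "macOS 26+" note in the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceWakeSettings:
    """Hypothetical grouping of the user-facing settings."""
    enabled: bool = False                # Voice Wake toggle
    push_to_talk: bool = False           # hold-to-talk toggle
    language: str = "en-US"              # assumed default locale
    microphone_id: Optional[str] = None  # None: use the system default input
    cue_sound: str = "Glass"             # default cue named in the text


def push_to_talk_available(macos_major: int) -> bool:
    """Push-to-talk is listed as requiring macOS 26 or later."""
    return macos_major >= 26
```

Storing the microphone as an optional identifier is one way to keep the preference persistent while still falling back to the default input when the device is missing.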

Message Routing

Transcripts are forwarded to the active gateway, using the app's configured local or remote mode. Replies are delivered to the most recently used primary provider:

  • WhatsApp
  • Telegram
  • Discord
  • WebChat
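The reply-routing rule can be sketched as a lookup against the provider list. The function name is hypothetical, and the WebChat fallback for an unknown or missing provider is an assumption; the text does not say what happens in that case.

```python
from typing import Optional

PROVIDERS = ("WhatsApp", "Telegram", "Discord", "WebChat")


def reply_provider(last_used: Optional[str], fallback: str = "WebChat") -> str:
    """Pick the provider for a reply: the last one used, if it is known."""
    if last_used in PROVIDERS:
        return last_used
    # Assumption: fall back to a fixed provider when there is no usable history.
    return fallback
```

For example, `reply_provider("Telegram")` returns `"Telegram"`, while `reply_provider(None)` returns the fallback.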