Voice Wake & Push-to-Talk
Overview
The Voice Wake feature operates in two modes: wake-word (always-on with trigger detection) and push-to-talk (immediate capture via right Option key hold).
Key Operating Modes
Wake-word Mode
Wake-word mode is the default. The speech recognizer listens continuously for the configured trigger tokens. When it detects one, it:
- Initiates capture
- Displays an overlay with partial transcription
- Automatically sends after detecting silence
Push-to-talk Mode
Push-to-talk activates immediately when the user holds the right Option key; no trigger word is needed. The overlay stays visible during the hold, and the audio is processed after release.
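The hold-and-release flow can be sketched as a small state holder. This is an illustrative model only: the class name, method names, and the idea of feeding text chunks are assumptions, not the app's actual API.

```python
from typing import List, Optional


class PushToTalkSession:
    """Hypothetical model of one push-to-talk hold (not the app's real type)."""

    def __init__(self) -> None:
        self.capturing = False
        self.overlay_visible = False
        self._chunks: List[str] = []

    def key_down(self) -> None:
        # Holding the key starts capture at once; no trigger word is checked.
        self.capturing = True
        self.overlay_visible = True
        self._chunks = []

    def feed(self, text: str) -> None:
        # Recognized text is only kept while the key is held.
        if self.capturing:
            self._chunks.append(text)

    def key_up(self) -> Optional[str]:
        # Releasing the key hides the overlay and hands the text off for processing.
        if not self.capturing:
            return None
        self.capturing = False
        self.overlay_visible = False
        return " ".join(self._chunks)
```

Calling `key_down()`, feeding a few chunks, then calling `key_up()` returns the joined text. Text fed outside a hold is ignored.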
Technical Architecture
VoiceWakeRuntime
VoiceWakeRuntime manages the speech recognizer. It requires a meaningful pause of approximately 0.55 seconds between the trigger word and the command.
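As a rough illustration of the pause requirement, the sketch below checks word timings for a trigger followed by a gap of at least 0.55 seconds. The `Word` type, the function name, and the constant name are hypothetical. The real runtime may measure the pause differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Value taken from the text above; the constant name is an assumption.
MIN_TRIGGER_GAP_S = 0.55


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the audio
    end: float


def command_after_trigger(words: list[Word], triggers: set[str]) -> Optional[str]:
    """Return the command text if a trigger is followed by a long enough pause."""
    for i, word in enumerate(words):
        if word.text.lower() not in triggers:
            continue
        rest = words[i + 1:]
        if not rest:
            return None  # trigger heard, but no command yet
        gap = rest[0].start - word.end
        if gap >= MIN_TRIGGER_GAP_S:
            return " ".join(w.text for w in rest)
    return None
```

With a trigger that ends at 0.4 s and a next word that starts at 1.0 s, the gap is about 0.6 s and the command is accepted. A next word at 0.5 s is rejected.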
Silence Detection
- 2.0 seconds of silence ends capture once speech is flowing
- 5.0 seconds of silence ends capture if only the trigger was detected
- A hard 120-second limit applies to each session
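The three timing rules above can be combined into a single check. This is a minimal sketch using the values from the list. The function and constant names are assumptions, and the real runtime may structure the check differently.

```python
# Values from the list above; the names are hypothetical.
SILENCE_WINDOW_SPEECH_S = 2.0
SILENCE_WINDOW_TRIGGER_ONLY_S = 5.0
HARD_STOP_S = 120.0


def should_finalize(elapsed_s: float, silence_s: float, heard_speech: bool) -> bool:
    """Decide whether a capture session should end.

    elapsed_s:    seconds since capture started
    silence_s:    seconds since the last recognized speech
    heard_speech: True once anything beyond the trigger word was recognized
    """
    if elapsed_s >= HARD_STOP_S:
        return True  # hard limit applies regardless of silence
    window = SILENCE_WINDOW_SPEECH_S if heard_speech else SILENCE_WINDOW_TRIGGER_ONLY_S
    return silence_s >= window
```

The longer window after a bare trigger gives the user time to start speaking, while the shorter window keeps the send snappy once a command is under way.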
Overlay Implementation
The overlay uses VoiceWakeOverlayController, which tracks committed and volatile text states. A critical improvement prevents the "sticky overlay" failure mode, in which manually dismissing the overlay could halt listening. The runtime no longer blocks on overlay visibility, and closing the overlay triggers an automatic restart.
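The sketch below illustrates one way the two text states and the dismissal behavior could fit together. It assumes that "committed" means finalized segments and "volatile" means the latest partial hypothesis. Both class names and all method names are hypothetical and do not reflect VoiceWakeOverlayController's real interface.

```python
from dataclasses import dataclass


@dataclass
class OverlayText:
    """Hypothetical holder for the overlay's two text states."""

    committed: str = ""  # finalized text, no longer revised
    volatile: str = ""   # latest partial hypothesis, may still change

    def update_partial(self, text: str) -> None:
        self.volatile = text

    def commit(self) -> None:
        # Fold the current partial into the committed text.
        self.committed = " ".join(p for p in (self.committed, self.volatile) if p)
        self.volatile = ""

    @property
    def display(self) -> str:
        return " ".join(p for p in (self.committed, self.volatile) if p)


class ListenerStub:
    """Stand-in for the runtime: listening never depends on overlay visibility."""

    def __init__(self) -> None:
        self.listening = True
        self.restarts = 0

    def on_overlay_dismissed(self) -> None:
        # Dismissal restarts the recognizer instead of leaving it halted.
        self.restarts += 1
        self.listening = True
```

Keeping the listener's state independent of the overlay is what avoids the sticky-overlay failure: closing the overlay can only trigger a restart, never a stop.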
User Configuration
Available settings include:
- Toggle Voice Wake on/off
- Enable push-to-talk (Cmd+Fn hold, macOS 26+)
- Language selection
- Microphone selection with persistent preferences
- Customizable audio cues (Glass sound by default, or any NSSound-compatible file)
Message Routing
Transcripts are forwarded to the active gateway using the app's configured local or remote mode. Replies are delivered to the last-used primary provider:
- Telegram
- Discord
- WebChat
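The routing rule can be modeled roughly as follows. This is a sketch under stated assumptions: the `RoutingConfig` type, the `route` function, the lowercase provider identifiers, and the returned dictionary shape are all invented for illustration and do not describe the gateway's real protocol.

```python
from dataclasses import dataclass

# Providers named in the list above; identifiers are hypothetical.
PROVIDERS = ("telegram", "discord", "webchat")
MODES = ("local", "remote")


@dataclass
class RoutingConfig:
    mode: str           # "local" or "remote", as configured in the app
    last_provider: str  # primary provider used most recently


def route(transcript: str, config: RoutingConfig) -> dict:
    """Build a hypothetical routing envelope for a finished transcript."""
    if config.mode not in MODES:
        raise ValueError(f"unknown gateway mode: {config.mode}")
    if config.last_provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {config.last_provider}")
    return {
        "gateway": config.mode,
        "reply_to": config.last_provider,
        "text": transcript.strip(),
    }
```

For example, `route(" open mail ", RoutingConfig("remote", "telegram"))` targets the remote gateway and marks Telegram as the reply destination.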