Initial commit: OpenClaw Skill Collection
6 custom skills (assign-task, dispatch-webhook, daily-briefing, task-capture, qmd-brain, tts-voice) with technical documentation. Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
@@ -0,0 +1,54 @@
# Voice Wake & Push-to-Talk
## Overview
The Voice Wake feature operates in two modes: wake-word (always-on with trigger detection) and push-to-talk (immediate capture via right Option key hold).
## Key Operating Modes
### Wake-word Mode
This is the default mode. The speech recognizer listens continuously for the configured trigger tokens. When it detects one, it:
1. Initiates capture
2. Displays an overlay with partial transcription
3. Automatically sends after detecting silence
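
The three steps above can be sketched as a small state machine. This is only an illustration: `WakeWordSession`, its method names, and the trigger word `claw` are hypothetical, and the sketch assumes the recognizer delivers cumulative partial transcripts as plain strings.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()  # waiting for a trigger token
    CAPTURING = auto()  # trigger heard; overlay shows partial text
    SENT = auto()       # transcript forwarded after silence


class WakeWordSession:
    """Hypothetical model of the wake-word flow, not the app's real class."""

    def __init__(self, triggers):
        self.triggers = {t.lower() for t in triggers}
        self.state = State.LISTENING
        self.overlay_text = ""

    def _after_trigger(self, text):
        # Return the words following the first trigger token, or None.
        words = text.split()
        for i, word in enumerate(words):
            if word.lower().strip(",.!?") in self.triggers:
                return " ".join(words[i + 1:])
        return None

    def on_partial(self, text):
        # Assumption: each call carries the full transcript so far.
        if self.state is State.SENT:
            return
        command = self._after_trigger(text)
        if command is None:
            return
        self.state = State.CAPTURING   # step 1: initiate capture
        self.overlay_text = command    # step 2: overlay shows partial text

    def on_silence(self):
        # Step 3: send automatically once silence is detected.
        if self.state is State.CAPTURING:
            self.state = State.SENT
            return self.overlay_text
        return None
```

Calling `on_partial` with successive transcripts updates the overlay text, and `on_silence` returns the command to send only if a trigger was heard first.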
### Push-to-talk Mode
Capture starts as soon as the user holds the right Option key; no trigger word is needed. The overlay stays visible while the key is held, and the audio is processed after the key is released.
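
A minimal sketch of the hold-and-release behavior, assuming key events and audio chunks arrive as method calls. `PushToTalk` and its methods are hypothetical names for illustration. The text does not say when the overlay is hidden after release, so the sketch leaves that out.

```python
class PushToTalk:
    """Hypothetical hold-to-capture model, not the app's real class."""

    def __init__(self):
        self.capturing = False
        self.overlay_visible = False
        self._chunks = []

    def key_down(self):
        # Holding the key starts capture immediately, with no trigger word.
        self.capturing = True
        self.overlay_visible = True
        self._chunks = []

    def on_audio(self, chunk):
        # Audio is only buffered while the key is held.
        if self.capturing:
            self._chunks.append(chunk)

    def key_up(self):
        # Releasing the key ends capture and hands the audio off for processing.
        if not self.capturing:
            return b""
        self.capturing = False
        return b"".join(self._chunks)
```

Audio that arrives before `key_down` or after `key_up` is ignored, which mirrors the "no trigger word" design: the key hold itself marks the start and end of the utterance.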
## Technical Architecture
### VoiceWakeRuntime
Manages the speech recognizer. It requires a meaningful pause of approximately 0.55 seconds between the trigger word and the command.
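
The pause requirement can be pictured as a simple timing gate. This sketch assumes the recognizer exposes per-word timestamps in seconds; the function and parameter names are hypothetical, and only the 0.55-second figure comes from the text.

```python
MIN_TRIGGER_GAP_S = 0.55  # approximate pause stated in the text


def passes_trigger_gap(trigger_end_s, command_start_s, min_gap_s=MIN_TRIGGER_GAP_S):
    """Return True if the pause between trigger and command is long enough.

    trigger_end_s: time at which the trigger word ended (assumed available).
    command_start_s: time at which the first command word began.
    """
    return (command_start_s - trigger_end_s) >= min_gap_s
```

A gate like this would presumably help avoid firing when the trigger word appears in the middle of ordinary speech, though the text does not state the motivation.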
### Silence Detection
- 2.0-second windows during active speech
- 5.0-second windows if only the trigger was detected
- Hard 120-second limit per session
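
The three limits above combine into one stop condition. The sketch below is an assumption about how they interact (hard limit first, then the applicable silence window); the function name and parameters are hypothetical.

```python
SPEECH_SILENCE_S = 2.0        # silence window during active speech
TRIGGER_ONLY_SILENCE_S = 5.0  # silence window if only the trigger was detected
HARD_LIMIT_S = 120.0          # hard limit per session


def should_stop_capture(elapsed_s, silence_s, heard_command):
    """Decide whether a capture session should end.

    elapsed_s: seconds since capture started.
    silence_s: seconds of continuous silence so far.
    heard_command: True once speech beyond the trigger word was detected.
    """
    if elapsed_s >= HARD_LIMIT_S:
        return True
    window = SPEECH_SILENCE_S if heard_command else TRIGGER_ONLY_SILENCE_S
    return silence_s >= window
```

The longer trigger-only window gives the speaker time to start the command after saying the trigger, while the shorter window ends capture promptly once they have finished speaking.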
### Overlay Implementation
Uses `VoiceWakeOverlayController` with committed and volatile text states. The runtime no longer blocks on overlay visibility, and closing the overlay triggers an automatic restart of listening. This prevents the "sticky overlay" failure mode, in which manually dismissing the overlay could halt listening.
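
One way to picture the two text states is sketched below. It assumes "committed" means finalized transcript segments and "volatile" means the current partial hypothesis that may still be revised; the text does not define them, and `OverlayText` is a hypothetical stand-in rather than the real `VoiceWakeOverlayController`.

```python
from dataclasses import dataclass


@dataclass
class OverlayText:
    """Hypothetical model of the overlay's two text states."""

    committed: str = ""  # assumed: finalized text that will not change
    volatile: str = ""   # assumed: latest partial result, may be replaced

    def update_volatile(self, text):
        # Each new partial result replaces the previous one.
        self.volatile = text

    def commit(self):
        # Fold the current partial result into the committed text.
        self.committed = f"{self.committed} {self.volatile}".strip()
        self.volatile = ""

    @property
    def display(self):
        # What the overlay would show: committed text followed by the partial.
        return f"{self.committed} {self.volatile}".strip()
```

Keeping the two states separate lets the overlay render revisable text differently from finalized text without losing what has already been confirmed.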
## User Configuration
Available settings include:
- Toggle Voice Wake on/off
- Enable push-to-talk (Cmd+Fn hold, macOS 26+)
- Language selection
- Microphone selection with persistent preferences
- Customizable audio cues (Glass sound by default, or any NSSound-compatible file)
## Message Routing
Transcripts are forwarded to the active gateway using the app's configured local or remote mode. Replies are delivered to the last-used primary provider:
- WhatsApp
- Telegram
- Discord
- WebChat