Voice Overlay
Overview
This document describes the macOS voice overlay lifecycle, which coordinates wake-word detection and push-to-talk input so the two paths interact predictably.
Key Design Principle
The system ensures predictable behavior when wake-word and push-to-talk overlap. If the overlay is already visible from wake-word and the user presses the hotkey, the hotkey session adopts the existing text instead of resetting it.
Core Implementation (as of Dec 9, 2025)
The architecture uses three main components:
1. VoiceSessionCoordinator
Acts as a single-session owner managing token-based API calls:
- beginWakeCapture
- beginPushToTalk
- endCapture
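A minimal sketch of the coordinator's token-based ownership, including the adoption rule described above. Everything beyond the three listed method names (the `VoiceSource` enum, property names, UUID tokens) is an assumption for illustration, not the actual implementation:

```swift
import Foundation

// Hypothetical sketch: only beginWakeCapture/beginPushToTalk/endCapture
// come from the documented API; the rest is illustrative.
enum VoiceSource { case wakeWord, pushToTalk }

final class VoiceSessionCoordinator {
    private(set) var activeToken: UUID?
    private(set) var source: VoiceSource?
    private(set) var committedText: String = ""

    // Wake word starts a fresh session and returns its token.
    func beginWakeCapture() -> UUID {
        let token = UUID()
        activeToken = token
        source = .wakeWord
        committedText = ""
        return token
    }

    // Push-to-talk adopts an in-flight wake session (keeping its token
    // and text) instead of resetting the overlay.
    func beginPushToTalk() -> UUID {
        if let token = activeToken, source == .wakeWord {
            source = .pushToTalk   // adopt: keep token and committed text
            return token
        }
        let token = UUID()
        activeToken = token
        source = .pushToTalk
        committedText = ""
        return token
    }

    // Ends the session only if the caller still owns the token,
    // guarding against stale timers firing after adoption.
    func endCapture(token: UUID) {
        guard token == activeToken else { return }
        activeToken = nil
        source = nil
    }
}
```

The token guard in `endCapture` is what makes adoption safe: a wake-word auto-send timer that fires after the hotkey took over still holds the shared token, but the source check has already moved on, so stale callers with old tokens are ignored.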
2. VoiceSession
Model carrying session metadata including:
- Token
- Source (wakeWord | pushToTalk)
- Committed/volatile text
- Chime flags
- Timers (auto-send, idle)
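The metadata above can be sketched as a value type; field names here are assumptions derived from the list, not the real model:

```swift
import Foundation

// Illustrative model only; property names are inferred from the
// documented metadata list.
enum VoiceSource { case wakeWord, pushToTalk }

struct VoiceSession {
    let token: UUID
    var source: VoiceSource
    var committedText: String = ""   // finalized transcript segments
    var volatileText: String = ""    // in-flight partial transcript
    var hasPlayedChime = false       // chime flag
    var autoSendDeadline: Date?      // wake-word silence auto-send timer
    var idleDeadline: Date?          // idle-dismissal timer

    // Text shown in the overlay: committed plus the live partial.
    var displayText: String {
        committedText.isEmpty ? volatileText
                              : committedText + " " + volatileText
    }
}
```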
3. VoiceSessionPublisher
Mirrors the active session into SwiftUI-observable state, so views never mutate the coordinator singleton directly.
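A sketch of that mirroring, assuming Combine's `ObservableObject`; the property and method names are illustrative:

```swift
import Combine
import Foundation

// Sketch: the coordinator pushes read-only snapshots here, so SwiftUI
// views observe published values instead of reaching into the
// coordinator singleton.
final class VoiceSessionPublisher: ObservableObject {
    @Published private(set) var displayText: String = ""
    @Published private(set) var isOverlayVisible: Bool = false

    // Assumes callers are on the main thread; a production version
    // would hop to the main actor before publishing.
    func mirror(text: String, visible: Bool) {
        displayText = text
        isOverlayVisible = visible
    }
}
```

Keeping setters `private(set)` enforces the one-way flow: only the coordinator writes, views only read.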
Behavior Details
- Wake-word alone: Auto-sends on silence
- Push-to-talk: Sends on hotkey release, waiting up to 1.5s for a final transcript before falling back to the current text
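The release grace window can be sketched as a cancellable fallback; only the 1.5s figure comes from the text above, and all names here are hypothetical:

```swift
import Foundation

// Sketch of the push-to-talk release path: schedule a fallback send,
// cancel it if the final transcript arrives within the grace window.
final class ReleaseFinalizer {
    static let finalTranscriptGrace: TimeInterval = 1.5  // from the doc
    private var pendingFallback: DispatchWorkItem?

    // On hotkey release: wait briefly for the recognizer's final
    // transcript, then fall back to whatever text is currently shown.
    func hotkeyReleased(currentText: @escaping () -> String,
                        send: @escaping (String) -> Void) {
        let fallback = DispatchWorkItem { send(currentText()) }
        pendingFallback = fallback
        DispatchQueue.main.asyncAfter(
            deadline: .now() + Self.finalTranscriptGrace,
            execute: fallback)
    }

    // Final transcript arrived in time: cancel the fallback, send it.
    func finalTranscript(_ text: String, send: (String) -> Void) {
        pendingFallback?.cancel()
        pendingFallback = nil
        send(text)
    }
}
```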
Debugging Support
Stream logs using:
sudo log stream --predicate 'subsystem == "bot.molt" AND category CONTAINS "voicewake"'
Migration Path
Implementation follows five sequential steps:
- Add core components
- Wire VoiceSessionCoordinator
- Integrate VoiceSession model
- Connect VoiceSessionPublisher to SwiftUI
- Integration testing for session adoption and cooldown behavior