forked from Selig/openclaw-skill
Initial commit: OpenClaw Skill Collection
6 custom skills (assign-task, dispatch-webhook, daily-briefing, task-capture, qmd-brain, tts-voice) with technical documentation. Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
# Voice Overlay
## Overview
This document describes the macOS voice overlay lifecycle, which coordinates wake-word detection and push-to-talk input over a single shared overlay.
## Key Design Principle
The system ensures predictable behavior when wake-word and push-to-talk overlap. If the overlay is already visible from wake-word and the user presses the hotkey, the hotkey session *adopts* the existing text instead of resetting it.
## Core Implementation (as of Dec 9, 2025)
The architecture uses three main components:
### 1. VoiceSessionCoordinator
Acts as a single-session owner managing token-based API calls:
- `beginWakeCapture`
- `beginPushToTalk`
- `endCapture`
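
A minimal Swift sketch of how these token-based calls might fit together, including the adoption rule from the key design principle above. Only the three call names are from the document; every other type, property, and method here is hypothetical:

```swift
import Foundation

// Hypothetical sketch: only beginWakeCapture / beginPushToTalk / endCapture
// are named in the document; all other names are illustrative.
struct SessionToken: Equatable {
    let id: UUID
}

enum VoiceSource { case wakeWord, pushToTalk }

final class VoiceSessionCoordinator {
    private(set) var activeToken: SessionToken?
    private(set) var source: VoiceSource?
    private(set) var text = ""

    /// Starts a wake-word capture and returns the token identifying it.
    func beginWakeCapture() -> SessionToken {
        let token = SessionToken(id: UUID())
        activeToken = token
        source = .wakeWord
        text = ""
        return token
    }

    /// Starts push-to-talk. If a wake-word session is already live, the
    /// hotkey session adopts it: same token, existing text preserved.
    func beginPushToTalk() -> SessionToken {
        if let token = activeToken, source == .wakeWord {
            source = .pushToTalk  // adopt; do not reset text
            return token
        }
        let token = SessionToken(id: UUID())
        activeToken = token
        source = .pushToTalk
        text = ""
        return token
    }

    /// Ends the session only if the caller still holds the live token,
    /// which guards against stale callbacks after adoption.
    func endCapture(_ token: SessionToken) {
        guard token == activeToken else { return }
        activeToken = nil
        source = nil
    }

    /// Transcript text accumulates here (recognizer callback in the app).
    func append(_ chunk: String) {
        text += chunk
    }
}
```

Token checks in `endCapture` are what make single-session ownership safe: a hotkey release that fires after the session was adopted or replaced simply becomes a no-op.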
### 2. VoiceSession
Model carrying session metadata including:
- Token
- Source (wakeWord | pushToTalk)
- Committed/volatile text
- Chime flags
- Timers (auto-send, idle)
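
As a sketch, the model could carry that metadata roughly as follows; the field names are assumed, since the document lists only the categories:

```swift
import Foundation

// Hypothetical field names; the document lists the metadata categories only.
enum VoiceSource { case wakeWord, pushToTalk }

struct VoiceSession {
    let token: UUID
    let source: VoiceSource

    // Text that has been finalized vs. text still subject to revision
    // by the recognizer.
    var committedText = ""
    var volatileText = ""

    // Whether the start/end chimes have already been played.
    var playedStartChime = false
    var playedEndChime = false

    // Deadlines driving auto-send (wake word) and idle teardown.
    var autoSendDeadline: Date?
    var idleDeadline: Date?

    // The text shown in the overlay: committed plus the volatile tail.
    var displayText: String { committedText + volatileText }
}
```

Splitting committed from volatile text lets the overlay render live recognizer revisions without ever losing text that was already finalized.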
### 3. VoiceSessionPublisher
Mirrors the active session into SwiftUI state, so views observe the publisher instead of mutating the coordinator singleton directly.
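
In the app this would presumably be an `ObservableObject` with an `@Published` session; the simplified, platform-independent stand-in below shows only the mirroring idea, with all names hypothetical:

```swift
import Foundation

// Simplified stand-in for the SwiftUI bridge: in the real app this would be
// an ObservableObject with an @Published session; a closure-based observer
// keeps the sketch platform-independent.
final class VoiceSessionPublisher {
    private(set) var sessionText = ""
    var onChange: ((String) -> Void)?

    /// The coordinator pushes updates here; views never mutate the
    /// coordinator singleton directly. Unchanged values are dropped
    /// to avoid redundant view updates.
    func mirror(text: String) {
        guard text != sessionText else { return }
        sessionText = text
        onChange?(text)
    }
}
```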
## Behavior Details
- **Wake-word alone**: auto-sends after a period of silence
- **Push-to-talk**: sends upon release; waits up to 1.5 s for a final transcript before falling back to the current text
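
The push-to-talk fallback can be sketched as a small, timing-injected decision rule. The 1.5 s grace period comes from the text above; the type and its fields are illustrative:

```swift
import Foundation

// Sketch of the release rule: wait up to a grace period (1.5 s per the
// document) for a final transcript, then fall back to the current text.
// Timing is injected so the logic is testable without real waits.
struct ReleaseArbiter {
    let grace: TimeInterval = 1.5
    let releasedAt: Date
    var currentText: String
    var finalTranscript: (text: String, at: Date)?

    /// Returns the text to send, or nil while still waiting.
    func textToSend(now: Date) -> String? {
        // Final transcript arrived within the grace window: prefer it.
        if let final = finalTranscript,
           final.at.timeIntervalSince(releasedAt) <= grace {
            return final.text
        }
        // Grace period expired: fall back to whatever we currently have.
        if now.timeIntervalSince(releasedAt) >= grace {
            return currentText
        }
        return nil
    }
}
```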
## Debugging Support
Stream logs using:
```bash
sudo log stream --predicate 'subsystem == "bot.molt" AND category CONTAINS "voicewake"'
```
## Migration Path
Implementation follows five sequential steps:
1. Add core components
2. Wire VoiceSessionCoordinator
3. Integrate VoiceSession model
4. Connect VoiceSessionPublisher to SwiftUI
5. Integration testing for session adoption and cooldown behavior
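
The cooldown behavior exercised in step 5 could be gated by a check like the following sketch; the document names the behavior but not its parameters, so the 2-second window here is an assumed value:

```swift
import Foundation

// Sketch of a wake-word cooldown gate: after a session ends, wake triggers
// are ignored for a short window so trailing audio does not immediately
// reopen the overlay. The 2 s window is an assumption, not from the document.
struct WakeCooldown {
    var lastSessionEnd: Date?
    let window: TimeInterval

    /// True when enough time has passed since the last session ended.
    func allowsWake(at now: Date) -> Bool {
        guard let end = lastSessionEnd else { return true }
        return now.timeIntervalSince(end) >= window
    }
}
```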