Selig 4c966a3ad2 Initial commit: OpenClaw Skill Collection
6 custom skills (assign-task, dispatch-webhook, daily-briefing,
task-capture, qmd-brain, tts-voice) with technical documentation.
Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
2026-03-13 10:58:30 +08:00

# Voice Overlay
## Overview
This document describes the lifecycle of the macOS voice overlay, which coordinates wake-word detection and push-to-talk.
## Key Design Principle
The system ensures predictable behavior when wake-word and push-to-talk overlap. If the overlay is already visible from wake-word and the user presses the hotkey, the hotkey session *adopts* the existing text instead of resetting it.
## Core Implementation (as of Dec 9, 2025)
The architecture uses three main components:
### 1. VoiceSessionCoordinator
The single owner of the active session. It exposes a token-based API:
- `beginWakeCapture`
- `beginPushToTalk`
- `endCapture`
### 2. VoiceSession
Model carrying session metadata including:
- Token
- Source (wakeWord | pushToTalk)
- Committed/volatile text
- Chime flags
- Timers (auto-send, idle)
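
The coordinator and session model described above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the source names the types and the three coordinator methods but not their signatures, so the `UUID` token, the `displayText` property, the `updateVolatile` helper, and the choice to issue a fresh token on adoption are all assumptions. Chime flags and timers are omitted for brevity.

```swift
import Foundation

// Hypothetical shapes: only the type and method names come from the document.
enum VoiceSource { case wakeWord, pushToTalk }

struct VoiceSession {
    let token: UUID
    var source: VoiceSource
    var committedText = ""
    var volatileText = ""

    /// Assumed: the overlay shows committed text followed by volatile text.
    var displayText: String { committedText + volatileText }
}

final class VoiceSessionCoordinator {
    private(set) var session: VoiceSession?

    /// Starts a wake-word session, replacing any previous session.
    func beginWakeCapture() -> UUID {
        let new = VoiceSession(token: UUID(), source: .wakeWord)
        session = new
        return new.token
    }

    /// Starts a push-to-talk session. If a session is already active,
    /// its visible text is adopted instead of being reset.
    func beginPushToTalk() -> UUID {
        var new = VoiceSession(token: UUID(), source: .pushToTalk)
        if let existing = session {
            new.committedText = existing.displayText
        }
        session = new
        return new.token
    }

    /// Hypothetical helper: applies a partial transcript only if the token
    /// still identifies the active session. Returns false for stale tokens.
    func updateVolatile(_ text: String, token: UUID) -> Bool {
        guard var current = session, current.token == token else { return false }
        current.volatileText = text
        session = current
        return true
    }

    /// Ends the session identified by `token` and returns its final text,
    /// or nil if the token is stale.
    func endCapture(token: UUID) -> String? {
        guard let current = session, current.token == token else { return nil }
        session = nil
        return current.displayText
    }
}
```

In this sketch, a callback that still holds the wake-word token after the hotkey is pressed is ignored, because `beginPushToTalk` replaces the active token while carrying the text forward.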
### 3. VoiceSessionPublisher
Mirrors the active session into SwiftUI without direct singleton mutations.
## Behavior Details
- **Wake-word alone**: Auto-sends on silence
- **Push-to-talk**: Sends on release, waiting up to 1.5 s for a final transcript before falling back to the current text
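
The push-to-talk fallback could look something like the sketch below. `FinalTranscriptBox` and `textToSendOnRelease` are hypothetical names introduced for illustration; only the 1.5 s limit and the fall-back-to-current-text rule come from the behavior described above. The wait blocks the calling thread, so a real app would presumably run it off the main thread or use an asynchronous equivalent.

```swift
import Foundation
import Dispatch

/// Hypothetical one-shot container for the recognizer's final transcript.
final class FinalTranscriptBox {
    private let lock = NSLock()
    private let ready = DispatchSemaphore(value: 0)
    private var value: String?

    /// Called by the recognizer when the final transcript is available.
    func deliver(_ text: String) {
        lock.lock()
        value = text
        lock.unlock()
        ready.signal()
    }

    /// Blocks for up to `timeout` seconds; returns nil if nothing arrived.
    func wait(timeout: TimeInterval) -> String? {
        guard ready.wait(timeout: .now() + timeout) == .success else { return nil }
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

/// Returns the final transcript if it arrives within the grace period,
/// otherwise the text currently shown in the overlay.
func textToSendOnRelease(currentText: String,
                         finalTranscript: FinalTranscriptBox,
                         grace: TimeInterval = 1.5) -> String {
    finalTranscript.wait(timeout: grace) ?? currentText
}
```

If the recognizer has already delivered a final transcript when the hotkey is released, the call returns immediately; otherwise it returns the current text once the grace period expires.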
## Debugging Support
Stream logs using:
```bash
sudo log stream --predicate 'subsystem == "bot.molt" AND category CONTAINS "voicewake"'
```
## Migration Path
Implementation follows five sequential steps:
1. Add core components
2. Wire VoiceSessionCoordinator
3. Integrate VoiceSession model
4. Connect VoiceSessionPublisher to SwiftUI
5. Run integration tests for session adoption and cooldown behavior