# Voice Overlay

## Overview

This document describes the macOS voice overlay lifecycle, which manages the interaction between wake-word detection and push-to-talk.

## Key Design Principle

The system ensures predictable behavior when wake-word and push-to-talk overlap. If the overlay is already visible from a wake-word session and the user presses the hotkey, the hotkey session *adopts* the existing text instead of resetting it.

## Core Implementation (as of Dec 9, 2025)

The architecture uses three main components:

### 1. VoiceSessionCoordinator

Acts as the single-session owner, managing token-based API calls:

- `beginWakeCapture`
- `beginPushToTalk`
- `endCapture`

### 2. VoiceSession

Model carrying session metadata, including:

- Token
- Source (wakeWord | pushToTalk)
- Committed/volatile text
- Chime flags
- Timers (auto-send, idle)

### 3. VoiceSessionPublisher

SwiftUI integration that mirrors the active session into SwiftUI state without direct singleton mutations.

## Behavior Details

- **Wake-word alone**: auto-sends on silence.
- **Push-to-talk**: sends on hotkey release, waiting up to 1.5 s for a final transcript before falling back to the current text.

## Debugging Support

Stream logs with:

```bash
sudo log stream --predicate 'subsystem == "bot.molt" AND category CONTAINS "voicewake"'
```

## Migration Path

Implementation follows five sequential steps:

1. Add core components
2. Wire VoiceSessionCoordinator
3. Integrate VoiceSession model
4. Connect VoiceSessionPublisher to SwiftUI
5. Integration testing for session adoption and cooldown behavior
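## Appendix: Session Adoption Sketch

The token-based coordinator API and the adoption rule can be sketched as below. This is a minimal illustrative sketch, not the shipped implementation: the type and method names mirror the document (`VoiceSessionCoordinator`, `VoiceSession`, `beginWakeCapture`, `beginPushToTalk`, `endCapture`), while the internals, the `commit` helper, and the field layout are assumptions.

```swift
import Foundation

// Illustrative sketch of the token-based session API described above.
// Internals are assumptions, not the shipped implementation.

enum VoiceSource { case wakeWord, pushToTalk }

struct VoiceSession {
    let token: UUID
    var source: VoiceSource
    var committedText: String
    var volatileText: String
}

final class VoiceSessionCoordinator {
    private(set) var active: VoiceSession?

    /// Starts a wake-word session only if no session is already active.
    @discardableResult
    func beginWakeCapture() -> UUID? {
        guard active == nil else { return nil }
        let session = VoiceSession(token: UUID(), source: .wakeWord,
                                   committedText: "", volatileText: "")
        active = session
        return session.token
    }

    /// Starts push-to-talk. If a wake-word session is live, the hotkey
    /// session *adopts* it: the source flips but the text (and token)
    /// are preserved instead of being reset.
    @discardableResult
    func beginPushToTalk() -> UUID {
        if var session = active {
            session.source = .pushToTalk   // adopt, keep existing text
            active = session
            return session.token
        }
        let session = VoiceSession(token: UUID(), source: .pushToTalk,
                                   committedText: "", volatileText: "")
        active = session
        return session.token
    }

    /// Appends finalized transcript text to the active session,
    /// ignoring calls that carry a stale token. (Hypothetical helper.)
    func commit(_ text: String, token: UUID) {
        guard var session = active, session.token == token else { return }
        session.committedText += text
        active = session
    }

    /// Ends the session only if the caller's token matches, returning
    /// the final text; stale tokens are ignored.
    func endCapture(token: UUID) -> String? {
        guard let session = active, session.token == token else { return nil }
        active = nil
        return session.committedText + session.volatileText
    }
}
```

Because every call carries a token, a stale caller (for example, a wake-word timer firing after push-to-talk adopted the session) cannot tear down or mutate a session it no longer owns.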
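## Appendix: Release Fallback Sketch

The push-to-talk release behavior (wait up to 1.5 s for a final transcript, then fall back to the current text) is a small race between a timer and the recognizer. A minimal sketch, assuming a caller-supplied `send` closure and GCD scheduling; the type name `ReleaseFinalizer` and its methods are hypothetical:

```swift
import Foundation

// Illustrative sketch of the push-to-talk release path: after the hotkey
// is released, wait up to 1.5 s for a final transcript before falling
// back to whatever text is currently on screen.

final class ReleaseFinalizer {
    private let queue = DispatchQueue(label: "voice.finalize")
    private var pending: DispatchWorkItem?

    /// Called on hotkey release with the text visible at that moment.
    func hotkeyReleased(currentText: String, send: @escaping (String) -> Void) {
        let fallback = DispatchWorkItem { send(currentText) }
        pending = fallback
        // Fall back to the current text if no final transcript arrives in time.
        queue.asyncAfter(deadline: .now() + 1.5, execute: fallback)
    }

    /// Called when the recognizer delivers the final transcript.
    func finalTranscriptArrived(_ text: String, send: @escaping (String) -> Void) {
        pending?.cancel()   // the better text won the race
        pending = nil
        send(text)
    }
}
```

Whichever side fires first wins: the fallback work item is cancelled as soon as a final transcript arrives, so the user never receives both the partial and the final text.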