
# Voice Overlay

## Overview

This document describes the lifecycle of the macOS voice overlay and how it coordinates wake-word detection with push-to-talk.

## Key Design Principle

The system must behave predictably when wake-word and push-to-talk overlap. If the overlay is already visible because of a wake-word trigger and the user presses the hotkey, the hotkey session *adopts* the existing text instead of resetting it.
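The adoption rule can be illustrated with a minimal sketch. `OverlayState`, `Source`, and `pressHotkey` are hypothetical names chosen for this example and are not taken from the actual implementation; the sketch only models the "keep the text, change the owner" decision.

```swift
// Hypothetical types for illustration only; not the shipped API.
enum Source { case wakeWord, pushToTalk }

struct OverlayState {
    var source: Source
    var text: String
    var isVisible: Bool
}

/// Returns the overlay state after the push-to-talk hotkey is pressed.
func pressHotkey(current: OverlayState?) -> OverlayState {
    if let current, current.isVisible, current.source == .wakeWord {
        // Adopt: keep the text already on screen, hand ownership to push-to-talk.
        return OverlayState(source: .pushToTalk, text: current.text, isVisible: true)
    }
    // Nothing to adopt: start a fresh push-to-talk session with empty text.
    return OverlayState(source: .pushToTalk, text: "", isVisible: true)
}
```

Modelling the transition as a pure function keeps the overlap case easy to test in isolation from audio capture.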

## Core Implementation (as of Dec 9, 2025)

The architecture uses three main components:

### 1. VoiceSessionCoordinator

Owns the single active session and exposes a token-based API:

- `beginWakeCapture`
- `beginPushToTalk`
- `endCapture`
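One way to read "token-based" is that each `begin…` call hands back a token and later calls must present it, so a stale caller cannot end a session it no longer owns. The sketch below assumes that reading. The method names come from the list above, but the signatures, the use of `UUID` as the token type, and the class name `SketchCoordinator` are assumptions.

```swift
import Foundation

enum VoiceSource { case wakeWord, pushToTalk }

// Illustrative single-session owner; signatures are assumed, not confirmed.
final class SketchCoordinator {
    private(set) var activeToken: UUID?
    private(set) var source: VoiceSource?

    /// Starts a wake-word session and returns the token that identifies it.
    func beginWakeCapture() -> UUID {
        let token = UUID()
        activeToken = token
        source = .wakeWord
        return token
    }

    /// Starts a push-to-talk session, replacing any session already active.
    func beginPushToTalk() -> UUID {
        let token = UUID()
        activeToken = token
        source = .pushToTalk
        return token
    }

    /// Ends the session only if `token` still identifies the active one.
    @discardableResult
    func endCapture(token: UUID) -> Bool {
        guard token == activeToken else { return false } // stale token: ignore
        activeToken = nil
        source = nil
        return true
    }
}
```

With this shape, a late `endCapture` from the wake-word path is a no-op once push-to-talk has taken over.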

### 2. VoiceSession

Model carrying session metadata including:

- Token
- Source (wakeWord | pushToTalk)
- Committed/volatile text
- Chime flags
- Timers (auto-send, idle)
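The fields above could be carried by a plain value type along these lines. This is a sketch: the property names, the two specific chime flags, and the representation of timers as deadlines are assumptions, as is the interpretation of "committed" as finalized text and "volatile" as the in-progress partial transcript.

```swift
import Foundation

// Illustrative model; field names and types are assumptions.
struct VoiceSessionSketch {
    enum Source { case wakeWord, pushToTalk }

    let token: UUID
    var source: Source
    var committedText = ""          // assumed: text already finalized
    var volatileText = ""           // assumed: partial transcript still changing
    var playedStartChime = false    // hypothetical chime flags
    var playedSendChime = false
    var autoSendDeadline: Date?     // timers modelled as optional deadlines
    var idleDeadline: Date?

    /// Text the overlay would show: committed text followed by the volatile tail.
    var displayText: String {
        [committedText, volatileText].filter { !$0.isEmpty }.joined(separator: " ")
    }

    /// Folds the volatile tail into the committed text.
    mutating func commitVolatile() {
        committedText = displayText
        volatileText = ""
    }
}
```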

### 3. VoiceSessionPublisher

Mirrors the active session into SwiftUI so that views observe session state without mutating a singleton directly.
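A common way to achieve this kind of mirroring is an `ObservableObject` with a read-only `@Published` property that only the coordinator updates. The sketch below assumes that approach; `SessionSnapshot`, `OverlayPublisherSketch`, and `publish(_:)` are hypothetical names, and the real publisher may be structured differently.

```swift
import Combine

// Immutable view of the session that views are allowed to read.
struct SessionSnapshot: Equatable {
    var text: String
    var isVisible: Bool
}

// Illustrative publisher: views observe `snapshot` but cannot set it.
final class OverlayPublisherSketch: ObservableObject {
    @Published private(set) var snapshot: SessionSnapshot?

    /// Called by the session owner whenever the active session changes.
    func publish(_ newSnapshot: SessionSnapshot?) {
        snapshot = newSnapshot
    }
}
```

A SwiftUI view would hold this object via `@ObservedObject` or `@StateObject` and render from `snapshot`, leaving all mutation to the coordinator.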

## Behavior Details

- **Wake-word alone**: Auto-sends on silence
- **Push-to-talk**: Sends on release, waiting up to 1.5s for a final transcript before falling back to the current text
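The push-to-talk fallback can be expressed as a small decision function. This is a sketch under stated assumptions: only the 1.5s limit comes from the text above, while `FinalTranscript`, `textToSend`, and the idea of measuring the delay from key release are illustrative.

```swift
import Foundation

/// Maximum time to wait for a final transcript after release (from the text above).
let finalTranscriptGrace: TimeInterval = 1.5

// Hypothetical record of a final transcript and when it arrived.
struct FinalTranscript {
    var text: String
    var secondsAfterRelease: TimeInterval
}

/// Chooses what to send once the hotkey is released.
func textToSend(currentText: String, final: FinalTranscript?) -> String {
    if let final, final.secondsAfterRelease <= finalTranscriptGrace {
        return final.text   // final transcript arrived in time
    }
    return currentText      // none arrived, or it arrived too late
}
```

In a live system the wait would be a timer or an async race rather than a recorded delay; the pure function only captures which text wins.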

## Debugging Support

Stream logs using:

```bash
sudo log stream --predicate 'subsystem == "bot.molt" AND category CONTAINS "voicewake"'
```

## Migration Path

Implementation follows five sequential steps:

1. Add core components
2. Wire VoiceSessionCoordinator
3. Integrate VoiceSession model
4. Connect VoiceSessionPublisher to SwiftUI
5. Run integration tests for session adoption and cooldown behavior