Selig 4c966a3ad2 Initial commit: OpenClaw Skill Collection
6 custom skills (assign-task, dispatch-webhook, daily-briefing,
task-capture, qmd-brain, tts-voice) with technical documentation.
Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
2026-03-13 10:58:30 +08:00

Voice Overlay

Overview

This document describes the lifecycle of the macOS voice overlay, which coordinates wake-word detection and push-to-talk input.

Key Design Principle

The system ensures predictable behavior when wake-word and push-to-talk overlap. If the overlay is already visible from wake-word and the user presses the hotkey, the hotkey session adopts the existing text instead of resetting it.
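The adoption rule above can be sketched as a small decision function. This is a minimal illustration, not the actual implementation; the `VoiceSource` and `ActiveSession` names are hypothetical stand-ins.

```swift
// Sketch of the adoption rule: if push-to-talk starts while a wake-word
// session is already on screen, reuse its text instead of resetting it.
enum VoiceSource { case wakeWord, pushToTalk }

struct ActiveSession {
    var source: VoiceSource
    var text: String
}

func beginPushToTalk(current: ActiveSession?) -> ActiveSession {
    if let existing = current, existing.source == .wakeWord {
        // Adopt: keep the text the wake-word overlay has already captured.
        return ActiveSession(source: .pushToTalk, text: existing.text)
    }
    // No overlap: start a fresh push-to-talk session.
    return ActiveSession(source: .pushToTalk, text: "")
}
```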

Core Implementation (as of Dec 9, 2025)

The architecture uses three main components:

1. VoiceSessionCoordinator

Acts as the single owner of the active session, exposing a token-based API:

  • beginWakeCapture
  • beginPushToTalk
  • endCapture
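A token-based single-session owner might look like the following sketch. The method names come from the list above; the token mechanics are an assumption about how stale callers are fenced off.

```swift
// Minimal sketch of a token-gated session owner: each begin* call issues a
// fresh token, and endCapture is a no-op unless the caller still holds the
// current token, so callbacks from a superseded session cannot interfere.
final class VoiceSessionCoordinator {
    struct Token: Equatable { let id: Int }

    private var nextID = 0
    private(set) var activeToken: Token?

    private func issueToken() -> Token {
        nextID += 1
        let token = Token(id: nextID)
        activeToken = token
        return token
    }

    func beginWakeCapture() -> Token { issueToken() }
    func beginPushToTalk() -> Token { issueToken() }

    /// Ends the session only if the caller still owns it.
    @discardableResult
    func endCapture(_ token: Token) -> Bool {
        guard token == activeToken else { return false } // stale token: ignore
        activeToken = nil
        return true
    }
}
```

A push-to-talk begin supersedes an earlier wake-word token, so a late `endCapture` from the wake-word path cannot tear down the hotkey session.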

2. VoiceSession

Model carrying session metadata including:

  • Token
  • Source (wakeWord | pushToTalk)
  • Committed/volatile text
  • Chime flags
  • Timers (auto-send, idle)
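The fields above suggest a model along these lines. Field names beyond those listed (e.g. `displayText`, the deadline properties) are illustrative assumptions.

```swift
import Foundation

// Sketch of the session model carrying the metadata listed above.
struct VoiceSession {
    enum Source { case wakeWord, pushToTalk }

    let token: UUID
    let source: Source
    var committedText: String   // finalized transcript segments
    var volatileText: String    // in-flight partial transcript
    var startChimePlayed: Bool  // chime flag
    var autoSendDeadline: Date? // auto-send timer (wake-word silence)
    var idleDeadline: Date?     // idle timeout

    // What the overlay renders: committed text plus the live partial.
    var displayText: String { committedText + volatileText }
}
```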

3. VoiceSessionPublisher

Mirrors the active session into SwiftUI state so that views observe updates without mutating the coordinator singleton directly.
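The mirroring idea can be sketched without the SwiftUI framework itself. In the real app an `ObservableObject` with `@Published` properties would play this role; this self-contained stand-in uses a change callback instead.

```swift
// Closure-based stand-in for the SwiftUI mirror: the coordinator pushes the
// active session's text in, and observers react to changes, so no view ever
// mutates the coordinator singleton directly.
final class VoiceSessionPublisher {
    private(set) var sessionText: String = "" {
        didSet { onChange?(sessionText) }
    }
    var onChange: ((String) -> Void)?

    // Called by the coordinator whenever the active session's text changes.
    func mirror(text: String) { sessionText = text }
}
```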

Behavior Details

  • Wake-word alone: Auto-sends on silence
  • Push-to-talk: Sends on hotkey release, waiting up to 1.5 s for a final transcript before falling back to the text captured so far
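The push-to-talk fallback can be sketched with a bounded wait. This is an assumed mechanism (`ReleaseFinalizer`, `deliverFinal`, `resolve` are hypothetical names), not the actual code.

```swift
import Foundation

// Sketch: on hotkey release, wait up to 1.5 s for the recognizer to deliver
// a final transcript; if none arrives in time, send the fallback text.
final class ReleaseFinalizer {
    private let semaphore = DispatchSemaphore(value: 0)
    private let lock = NSLock()
    private var finalText: String?

    // Called from the recognizer when the final transcript arrives.
    func deliverFinal(_ text: String) {
        lock.lock(); finalText = text; lock.unlock()
        semaphore.signal()
    }

    // Called on hotkey release; blocks briefly, then resolves one way or the other.
    func resolve(fallback: String, timeout: TimeInterval = 1.5) -> String {
        _ = semaphore.wait(timeout: .now() + timeout)
        lock.lock(); defer { lock.unlock() }
        return finalText ?? fallback
    }
}
```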

Debugging Support

Stream logs using:

sudo log stream --predicate 'subsystem == "bot.molt" AND category CONTAINS "voicewake"'

Migration Path

Implementation follows five sequential steps:

  1. Add core components
  2. Wire VoiceSessionCoordinator
  3. Integrate VoiceSession model
  4. Connect VoiceSessionPublisher to SwiftUI
  5. Integration testing for session adoption and cooldown behavior