Selig 4c966a3ad2 Initial commit: OpenClaw Skill Collection
6 custom skills (assign-task, dispatch-webhook, daily-briefing,
task-capture, qmd-brain, tts-voice) with technical documentation.
Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
2026-03-13 10:58:30 +08:00

Voice Overlay

Overview

This document describes the lifecycle of the macOS voice overlay, which coordinates wake-word detection and push-to-talk input.

Key Design Principle

The system ensures predictable behavior when wake-word and push-to-talk overlap. If the overlay is already visible from wake-word and the user presses the hotkey, the hotkey session adopts the existing text instead of resetting it.
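The adoption rule above can be sketched as a small decision function. This is a minimal illustration, not the actual implementation; the `VoiceSource` and `ActiveSession` names are hypothetical stand-ins.

```swift
// Sketch of the adoption rule: if push-to-talk starts while a wake-word
// session is already on screen, reuse its text instead of resetting it.
enum VoiceSource { case wakeWord, pushToTalk }

struct ActiveSession {
    var source: VoiceSource
    var text: String
}

func beginPushToTalk(current: ActiveSession?) -> ActiveSession {
    if let existing = current, existing.source == .wakeWord {
        // Adopt: keep the text the wake-word overlay has already captured.
        return ActiveSession(source: .pushToTalk, text: existing.text)
    }
    // No overlap: start a fresh push-to-talk session.
    return ActiveSession(source: .pushToTalk, text: "")
}
```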

Core Implementation (as of Dec 9, 2025)

The architecture uses three main components:

1. VoiceSessionCoordinator

Acts as the single owner of the active session, exposing a token-based API:

  • beginWakeCapture
  • beginPushToTalk
  • endCapture
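A token-based single-session owner might look like the following sketch. The method names come from the list above; the token mechanics are an assumption about how stale callers are fenced off.

```swift
// Minimal sketch of a token-gated session owner: each begin* call issues a
// fresh token, and endCapture is a no-op unless the caller still holds the
// current token, so callbacks from a superseded session cannot interfere.
final class VoiceSessionCoordinator {
    struct Token: Equatable { let id: Int }

    private var nextID = 0
    private(set) var activeToken: Token?

    private func issueToken() -> Token {
        nextID += 1
        let token = Token(id: nextID)
        activeToken = token
        return token
    }

    func beginWakeCapture() -> Token { issueToken() }
    func beginPushToTalk() -> Token { issueToken() }

    /// Ends the session only if the caller still owns it.
    @discardableResult
    func endCapture(_ token: Token) -> Bool {
        guard token == activeToken else { return false } // stale token: ignore
        activeToken = nil
        return true
    }
}
```

A push-to-talk begin supersedes an earlier wake-word token, so a late `endCapture` from the wake-word path cannot tear down the hotkey session.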

2. VoiceSession

Model carrying session metadata including:

  • Token
  • Source (wakeWord | pushToTalk)
  • Committed/volatile text
  • Chime flags
  • Timers (auto-send, idle)
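The fields above suggest a model along these lines. Field names beyond those listed (e.g. `displayText`, the deadline properties) are illustrative assumptions.

```swift
import Foundation

// Sketch of the session model carrying the metadata listed above.
struct VoiceSession {
    enum Source { case wakeWord, pushToTalk }

    let token: UUID
    let source: Source
    var committedText: String   // finalized transcript segments
    var volatileText: String    // in-flight partial transcript
    var startChimePlayed: Bool  // chime flag
    var autoSendDeadline: Date? // auto-send timer (wake-word silence)
    var idleDeadline: Date?     // idle timeout

    // What the overlay renders: committed text plus the live partial.
    var displayText: String { committedText + volatileText }
}
```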

3. VoiceSessionPublisher

Mirrors the active session into SwiftUI state so that views observe updates without mutating the coordinator singleton directly.
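The mirroring idea can be sketched without the SwiftUI framework itself. In the real app an `ObservableObject` with `@Published` properties would play this role; this self-contained stand-in uses a change callback instead.

```swift
// Closure-based stand-in for the SwiftUI mirror: the coordinator pushes the
// active session's text in, and observers react to changes, so no view ever
// mutates the coordinator singleton directly.
final class VoiceSessionPublisher {
    private(set) var sessionText: String = "" {
        didSet { onChange?(sessionText) }
    }
    var onChange: ((String) -> Void)?

    // Called by the coordinator whenever the active session's text changes.
    func mirror(text: String) { sessionText = text }
}
```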

Behavior Details

  • Wake-word alone: Auto-sends on silence
  • Push-to-talk: Sends on hotkey release, waiting up to 1.5 s for a final transcript before falling back to the text captured so far
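The push-to-talk fallback can be sketched with a bounded wait. This is an assumed mechanism (`ReleaseFinalizer`, `deliverFinal`, `resolve` are hypothetical names), not the actual code.

```swift
import Foundation

// Sketch: on hotkey release, wait up to 1.5 s for the recognizer to deliver
// a final transcript; if none arrives in time, send the fallback text.
final class ReleaseFinalizer {
    private let semaphore = DispatchSemaphore(value: 0)
    private let lock = NSLock()
    private var finalText: String?

    // Called from the recognizer when the final transcript arrives.
    func deliverFinal(_ text: String) {
        lock.lock(); finalText = text; lock.unlock()
        semaphore.signal()
    }

    // Called on hotkey release; blocks briefly, then resolves one way or the other.
    func resolve(fallback: String, timeout: TimeInterval = 1.5) -> String {
        _ = semaphore.wait(timeout: .now() + timeout)
        lock.lock(); defer { lock.unlock() }
        return finalText ?? fallback
    }
}
```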

Debugging Support

Stream logs using:

sudo log stream --predicate 'subsystem == "bot.molt" AND category CONTAINS "voicewake"'

Migration Path

Implementation follows five sequential steps:

  1. Add core components
  2. Wire VoiceSessionCoordinator
  3. Integrate VoiceSession model
  4. Connect VoiceSessionPublisher to SwiftUI
  5. Integration testing for session adoption and cooldown behavior