openclaw-skill/openclaw-knowhow-skill/docs/infrastructure/nodes/audio.md
Selig 4c966a3ad2 Initial commit: OpenClaw Skill Collection
6 custom skills (assign-task, dispatch-webhook, daily-briefing,
task-capture, qmd-brain, tts-voice) with technical documentation.
Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
2026-03-13 10:58:30 +08:00

# Audio and Voice Notes Documentation
## Overview
OpenClaw can transcribe audio attachments and voice notes. It either auto-detects the transcription tools available on the host or uses an explicitly configured provider or CLI.
## Key Capabilities
When audio understanding is enabled, OpenClaw locates the first audio attachment (local path or URL), downloads it if needed, and passes it to the configured models in sequence until one succeeds.
## Auto-Detection Hierarchy
Without custom configuration, the system attempts transcription in this order:
1. Local CLI tools (sherpa-onnx-offline, whisper-cli, whisper Python CLI)
2. Gemini CLI
3. Provider APIs (OpenAI, Groq, Deepgram, Google)
## Configuration Options
Three configuration patterns are provided:
1. **Provider with CLI fallback:** Uses OpenAI with Whisper CLI as backup
2. **Provider-only with scope gating:** Restricts to specific chat contexts (e.g., denying group chats)
3. **Single provider:** Deepgram example for dedicated service use
## Important Constraints
The default size cap is 20 MB (`tools.media.audio.maxBytes`). Audio over the cap is skipped for that model, and the next entry is tried.
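
A minimal Python sketch of this skip-and-fall-through behavior is shown below. `ModelEntry` and `transcribe_with_fallback` are hypothetical names introduced for illustration, and treating 20 MB as `20 * 1024 * 1024` bytes is an assumption; the real cap is whatever `tools.media.audio.maxBytes` resolves to.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

DEFAULT_MAX_BYTES = 20 * 1024 * 1024  # assumed byte value for the 20 MB default


@dataclass
class ModelEntry:
    """One configured transcription model (hypothetical structure)."""
    name: str
    transcribe: Callable[[bytes], str]
    max_bytes: int = DEFAULT_MAX_BYTES


def transcribe_with_fallback(audio: bytes, entries: Iterable[ModelEntry]) -> Optional[str]:
    """Try each entry in order; return the first non-empty transcript."""
    for entry in entries:
        # Oversize audio is skipped for this entry only; later entries still run.
        if len(audio) > entry.max_bytes:
            continue
        try:
            text = entry.transcribe(audio)
        except Exception:
            # A failing model is not fatal; fall through to the next entry.
            continue
        if text.strip():
            return text.strip()
    return None
```

Because the size check is made per entry, a later entry with a larger cap can still handle a file that an earlier entry skipped.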
Authentication follows standard model auth patterns. The transcript output is available as `{{Transcript}}` for downstream processing, with optional character trimming via `maxChars`.
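
To illustrate how a trimmed transcript might reach a downstream template, here is a small Python sketch. `render_prompt` is a hypothetical helper, and applying the `maxChars` trim before substitution is an assumption about ordering rather than documented behavior.

```python
from typing import Optional


def render_prompt(template: str, transcript: str, max_chars: Optional[int] = None) -> str:
    """Substitute {{Transcript}} in a template, optionally trimming first."""
    if max_chars is not None:
        # Keep only the first max_chars characters of the transcript.
        transcript = transcript[:max_chars]
    return template.replace("{{Transcript}}", transcript)
```

For example, `render_prompt("User said: {{Transcript}}", "hello world", max_chars=5)` yields `"User said: hello"`.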
## Notable Gotchas
- Scope rules use first-match evaluation.
- CLI commands must exit cleanly and emit plain-text output.
- Timeouts should be kept reasonable so that transcription does not block the reply queue.
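
The first two gotchas can be made concrete with a short Python sketch. The rule shape (`action`, `match`, `chatType`) and both function names are hypothetical, chosen only to show why rule order matters and what "exit cleanly" implies for a CLI wrapper; the 30-second timeout is an arbitrary example value.

```python
import subprocess
from typing import Mapping, Optional, Sequence


def scope_allows(rules: Sequence[Mapping], context: Mapping[str, str], default: str = "allow") -> bool:
    """First-match evaluation: the first rule whose match fits the context decides."""
    for rule in rules:
        match = rule.get("match", {})
        if all(context.get(key) == value for key, value in match.items()):
            return rule["action"] == "allow"
    # No rule matched, so the default applies.
    return default == "allow"


def run_cli_transcriber(argv: Sequence[str], timeout_seconds: float = 30.0) -> Optional[str]:
    """Run a transcription CLI and return its stdout, or None on any failure."""
    try:
        result = subprocess.run(
            list(argv), capture_output=True, text=True, timeout=timeout_seconds
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out or could not start: give up rather than block the caller.
        return None
    if result.returncode != 0:
        # A non-zero exit is treated as a failed transcription.
        return None
    text = result.stdout.strip()
    return text or None
```

With first-match evaluation, placing a catch-all allow rule ahead of a group-chat deny rule means the deny rule is never reached, so more specific rules belong first.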