Initial commit: OpenClaw Skill Collection
6 custom skills (assign-task, dispatch-webhook, daily-briefing, task-capture, qmd-brain, tts-voice) with technical documentation. Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
34
openclaw-knowhow-skill/docs/infrastructure/nodes/audio.md
Normal file
@@ -0,0 +1,34 @@
# Audio and Voice Notes Documentation

## Overview

OpenClaw can transcribe audio attachments and voice notes. By default it auto-detects an available transcription tool; alternatively, you can configure a provider or CLI explicitly.

## Key Capabilities

When audio understanding is enabled, OpenClaw locates the first audio attachment (a local path or a URL), downloads it if needed, and then tries the configured models in order until one succeeds.
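
The try-in-order behaviour described above can be sketched as a simple fallback loop. This is an illustration only, not OpenClaw's implementation: the `Transcriber` type, `transcribe_with_fallback`, and the two stand-in functions are hypothetical names, and treating empty output as a failure is an assumption.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a transcriber takes a local file path and returns text,
# or raises an exception when it cannot handle the file.
Transcriber = Callable[[str], str]


def transcribe_with_fallback(
    audio_path: str, transcribers: Sequence[Transcriber]
) -> Optional[str]:
    """Try each transcriber in order and return the first successful result."""
    for transcribe in transcribers:
        try:
            text = transcribe(audio_path)
        except Exception:
            # This entry failed; fall through to the next one.
            continue
        if text.strip():  # assumption: empty output counts as a failure
            return text
    # Every entry failed or returned empty output.
    return None


def failing(path: str) -> str:
    raise RuntimeError("provider unavailable")


def working(path: str) -> str:
    return f"transcript of {path}"


print(transcribe_with_fallback("note.ogg", [failing, working]))
# prints "transcript of note.ogg"
```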

## Auto-Detection Hierarchy

Without custom configuration, OpenClaw attempts transcription in this order:
1. Local CLI tools (`sherpa-onnx-offline`, `whisper-cli`, the `whisper` Python CLI)
2. Gemini CLI
3. Provider APIs (OpenAI, Groq, Deepgram, Google)
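
A rough sketch of that detection order follows, assuming availability is checked by looking for binaries on `PATH` and for API keys in the environment. The local binary names come from the list above; the `gemini` binary name, the environment variable names, and the `detect_transcriber` function are assumptions made for illustration, and OpenClaw's actual detection logic may differ.

```python
import os
import shutil
from typing import Callable, Mapping, Optional

# Binary names for the local tools come from the list above; "gemini" as the
# Gemini CLI binary name and the environment variable names are assumptions.
LOCAL_CLIS = ["sherpa-onnx-offline", "whisper-cli", "whisper"]
GEMINI_CLI = "gemini"
PROVIDER_ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
    "google": "GEMINI_API_KEY",
}


def detect_transcriber(
    which: Callable[[str], Optional[str]] = shutil.which,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return a label for the first available option, or None if none is found."""
    # 1. Local CLI tools, in the listed order.
    for binary in LOCAL_CLIS:
        if which(binary):
            return f"cli:{binary}"
    # 2. Gemini CLI.
    if which(GEMINI_CLI):
        return f"cli:{GEMINI_CLI}"
    # 3. Provider APIs, in the listed order.
    for provider, key in PROVIDER_ENV_KEYS.items():
        if env.get(key):
            return f"provider:{provider}"
    return None


# The result depends on what is installed on the machine running this.
print(detect_transcriber())
```

Passing `which` and `env` as parameters keeps the function easy to exercise without installing any of the tools.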

## Configuration Options

Three configuration patterns are provided:

1. **Provider with CLI fallback** – Uses OpenAI first, with the Whisper CLI as a backup
2. **Provider-only with scope gating** – Restricts transcription to specific chat contexts (e.g., denying group chats)
3. **Single provider** – Uses one dedicated service (Deepgram in the example)
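
To make the three patterns more concrete, here is a sketch of their possible shapes written as Python dictionaries. Only the `tools.media.audio` path appears in this document; every other key and value (`models`, `provider`, `type`, `command`, `scope`, `rules`, `chatType`) and the `wrap` helper are assumptions for illustration, so check the OpenClaw configuration reference for the real names.

```python
# Illustrative shapes only. Apart from the tools.media.audio path, every key
# and value below (models, provider, type, command, scope, ...) is an
# assumption; consult the OpenClaw configuration reference for real names.

# 1. Provider with CLI fallback: entries are tried top to bottom.
provider_with_cli_fallback = {
    "models": [
        {"provider": "openai"},
        {"type": "cli", "command": "whisper"},
    ]
}

# 2. Provider-only with scope gating: allow by default, deny group chats.
provider_with_scope = {
    "models": [{"provider": "openai"}],
    "scope": {
        "default": "allow",
        "rules": [{"action": "deny", "match": {"chatType": "group"}}],
    },
}

# 3. Single provider: one dedicated service, no fallback.
single_provider = {"models": [{"provider": "deepgram"}]}


def wrap(audio_settings: dict) -> dict:
    """Nest audio settings under the tools.media.audio path."""
    return {"tools": {"media": {"audio": audio_settings}}}


config = wrap(provider_with_cli_fallback)
print(config["tools"]["media"]["audio"]["models"][0]["provider"])  # prints "openai"
```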

## Important Constraints

The default size cap is 20MB (`tools.media.audio.maxBytes`). Audio that exceeds the cap is skipped for that model, and the next entry is tried.
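
The skip-and-continue rule can be sketched as below. The per-entry `maxBytes` override, the `name` field, and the `first_entry_within_cap` function are hypothetical, and reading "20MB" as 20 × 1024 × 1024 bytes is an assumption.

```python
from typing import Optional, Sequence

# 20MB default from the text; reading "MB" as 1024 * 1024 bytes is an assumption.
DEFAULT_MAX_BYTES = 20 * 1024 * 1024


def first_entry_within_cap(
    size_bytes: int, entries: Sequence[dict]
) -> Optional[dict]:
    """Return the first entry whose cap admits the file, skipping the rest."""
    for entry in entries:
        # Assumption: an entry may override the cap; otherwise the default applies.
        cap = entry.get("maxBytes", DEFAULT_MAX_BYTES)
        if size_bytes <= cap:
            return entry
        # Oversize for this entry: skip it and try the next one.
    return None


entries = [
    {"name": "small-cap", "maxBytes": 5 * 1024 * 1024},
    {"name": "default-cap"},
]
chosen = first_entry_within_cap(12 * 1024 * 1024, entries)
print(chosen["name"])  # prints "default-cap"
```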

Authentication follows the standard model auth patterns. The transcript is available as `{{Transcript}}` for downstream processing and can optionally be trimmed to a character limit via `maxChars`.
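
As a minimal sketch of how the `{{Transcript}}` placeholder and `maxChars` might interact, the function below trims first and substitutes second. The `render_with_transcript` name is hypothetical, and treating the trim as plain truncation is an assumption; OpenClaw may trim differently.

```python
from typing import Optional


def render_with_transcript(
    template: str, transcript: str, max_chars: Optional[int] = None
) -> str:
    """Trim the transcript to max_chars (if set) and fill in {{Transcript}}."""
    if max_chars is not None:
        # Assumption: trimming is a simple cut at max_chars characters.
        transcript = transcript[:max_chars]
    return template.replace("{{Transcript}}", transcript)


message = render_with_transcript(
    "Voice note says: {{Transcript}}", "remind me to call Sam at noon", max_chars=14
)
print(message)  # prints "Voice note says: remind me to c"
```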

## Notable Gotchas
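
- Scope rules use first-match evaluation: the first rule that matches decides the outcome, and later rules are ignored.
- CLI commands must exit cleanly (status 0) and print the transcript as plain text.
- Keep timeouts short enough that a slow transcription does not block the reply queue.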
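
The gotchas above can be illustrated with two small helpers: one for first-match scope evaluation and one for running a CLI under a timeout. Both are sketches under stated assumptions; the rule shape (`chatType`, `action`), the `scope_allows` and `run_cli` names, and the use of the Python interpreter as a stand-in transcription command are all hypothetical.

```python
import subprocess
import sys
from typing import Optional, Sequence


def scope_allows(chat_type: str, rules: Sequence[dict], default: str = "allow") -> bool:
    """First-match evaluation: the first matching rule decides the outcome."""
    for rule in rules:
        if rule["chatType"] == chat_type:
            return rule["action"] == "allow"
    # No rule matched, so the default applies.
    return default == "allow"


def run_cli(command: Sequence[str], timeout_seconds: float) -> Optional[str]:
    """Run a transcription CLI; return its stdout, or None on failure or timeout."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the binary could not be started.
        return None
    if result.returncode != 0:
        # A non-zero exit is treated as a failed transcription.
        return None
    return result.stdout.strip() or None


rules = [
    {"chatType": "group", "action": "deny"},
    {"chatType": "group", "action": "allow"},  # never reached for group chats
]
print(scope_allows("group", rules))   # prints False
print(scope_allows("direct", rules))  # prints True

# Stand-in for a real transcription CLI: the Python interpreter echoing text.
print(run_cli([sys.executable, "-c", "print('hello world')"], timeout_seconds=10))
# prints "hello world"
```

Returning `None` on a timeout lets a caller move on to the next configured entry instead of holding up the reply queue.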