Initial commit: OpenClaw Skill Collection

6 custom skills (assign-task, dispatch-webhook, daily-briefing,
task-capture, qmd-brain, tts-voice) with technical documentation.
Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
2026-03-13 10:58:30 +08:00
commit 4c966a3ad2
884 changed files with 140761 additions and 0 deletions

@@ -0,0 +1,34 @@
# Audio and Voice Notes Documentation
## Overview
OpenClaw supports audio transcription with flexible configuration options. The system automatically detects available transcription tools or allows explicit provider/CLI setup.
## Key Capabilities
When audio understanding is enabled, OpenClaw locates the first audio attachment (local path or URL), downloads it if needed, and passes it through the configured models in order until one succeeds.
## Auto-Detection Hierarchy
Without custom configuration, the system attempts transcription in this order:
- Local CLI tools (sherpa-onnx-offline, whisper-cli, whisper Python CLI)
- Gemini CLI
- Provider APIs (OpenAI, Groq, Deepgram, Google)
## Configuration Options
Three configuration patterns are provided:
1. **Provider with CLI fallback:** uses OpenAI with the Whisper CLI as backup
2. **Provider-only with scope gating:** restricts transcription to specific chat contexts (e.g., denying group chats)
3. **Single provider:** a Deepgram example for dedicated service use
## Important Constraints
The default size cap is 20 MB (`tools.media.audio.maxBytes`). Audio over the cap is skipped for that model and the next entry is tried.
Authentication follows standard model auth patterns. The transcript output is available as `{{Transcript}}` for downstream processing, with optional character trimming via `maxChars`.
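The fallback behavior described above can be sketched in Python. This is a minimal illustration under stated assumptions, not OpenClaw's implementation: `ModelEntry` and `transcribe_with_fallback` are hypothetical names, and each entry is modeled as a plain callable. Only the 20 MB default, the skip-on-oversize rule, and the `maxChars` trimming come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

DEFAULT_MAX_BYTES = 20 * 1024 * 1024  # assumes "20 MB" means 20 MiB


@dataclass
class ModelEntry:
    """Hypothetical stand-in for one configured transcription entry."""
    name: str
    transcribe: Callable[[bytes], str]
    max_bytes: int = DEFAULT_MAX_BYTES


def transcribe_with_fallback(
    audio: bytes,
    entries: Sequence[ModelEntry],
    max_chars: Optional[int] = None,
) -> Optional[str]:
    """Try each entry in order and return the first successful transcript."""
    for entry in entries:
        if len(audio) > entry.max_bytes:
            continue  # oversize for this entry: skip it and try the next
        try:
            text = entry.transcribe(audio)
        except Exception:
            continue  # this entry failed: fall through to the next one
        return text[:max_chars] if max_chars is not None else text
    return None  # no entry succeeded
```

Returning `None` when every entry is skipped or fails is a choice made for this sketch; the source does not say what happens in that case.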
## Notable Gotchas
- Scope rules use first-match evaluation.
- CLI commands must exit cleanly and print plain text.
- Timeouts should be kept reasonable so transcription does not block the reply queue.
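First-match evaluation means rule order matters: the first rule that matches decides the outcome and later rules are never consulted. The sketch below illustrates that idea only. The rule shape (a criteria mapping plus an action) and the `chatType` key are hypothetical and are not taken from the OpenClaw config schema.

```python
from typing import Mapping, Sequence, Tuple

Rule = Tuple[Mapping[str, str], str]  # (match criteria, "allow" or "deny")


def evaluate_scope(
    rules: Sequence[Rule],
    context: Mapping[str, str],
    default: str = "allow",
) -> str:
    """Return the action of the first rule whose criteria all match."""
    for criteria, action in rules:
        if all(context.get(key) == value for key, value in criteria.items()):
            return action  # first match wins; later rules are ignored
    return default


rules = [
    ({"chatType": "group"}, "deny"),
    ({}, "allow"),  # empty criteria match every context
]
print(evaluate_scope(rules, {"chatType": "group"}))   # deny
print(evaluate_scope(rules, {"chatType": "direct"}))  # allow
```

If the catch-all rule were listed first, the group-chat rule would never be reached, which is the usual way first-match rule sets go wrong.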

@@ -0,0 +1,29 @@
# Camera Capture Documentation
## Overview
OpenClaw enables camera functionality across multiple platforms through agent workflows. The feature supports photo capture (JPG) and video clips (MP4 with optional audio) on iOS, Android, and macOS devices.
## Key Features by Platform
**iOS & Android nodes** offer identical capabilities:
- Photo capture via `camera.snap` command
- Video recording via `camera.clip` command
- User-controlled setting (enabled by default)
- Foreground-only operation
- Payload guard (base64-encoded output kept under 5 MB)
**macOS app** includes:
- Same camera commands as mobile platforms
- Camera disabled by default in settings
- Additional screen recording capability (separate from camera)
## Important Constraints
Video clips are capped (currently `<= 60s`) to avoid oversized node payloads. Photos are automatically recompressed to maintain payload limits.
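To see why recompression is needed, it helps to work out how much raw media fits in a base64 payload, since base64 emits 4 characters for every 3 input bytes. The arithmetic below is a sketch that assumes the 5 MB guard on mobile nodes means 5 MiB of padded base64 text; the exact unit is not stated in the source.

```python
BASE64_LIMIT = 5 * 1024 * 1024  # assumption: "5 MB" means 5 MiB of base64 text


def base64_len(raw_bytes: int) -> int:
    """Length of standard padded base64 for raw_bytes bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def max_raw_bytes(limit: int = BASE64_LIMIT) -> int:
    """Largest raw size whose padded base64 encoding fits within limit."""
    return (limit // 4) * 3


print(max_raw_bytes())              # 3932160 bytes, i.e. 3.75 MiB of raw media
print(base64_len(max_raw_bytes()))  # 5242880, exactly at the limit
```

Under that assumption a photo or clip has roughly 3.75 MiB of raw budget before encoding.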
Camera and microphone access require standard OS permission prompts. Android requires explicit runtime permissions for `CAMERA` and `RECORD_AUDIO` (when applicable).
## Usage
CLI helpers simplify media capture, automatically writing decoded files to temporary locations and printing `MEDIA:<path>` for agent integration.
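A caller that wraps these helpers can recover the written files by scanning the output for `MEDIA:` lines. The parser below is a minimal sketch; it assumes one `MEDIA:<path>` token per line at the start of the line, and `extract_media_paths` is a hypothetical helper name rather than part of OpenClaw.

```python
from typing import List

MEDIA_PREFIX = "MEDIA:"


def extract_media_paths(cli_output: str) -> List[str]:
    """Collect the path from every output line that starts with MEDIA:."""
    paths = []
    for line in cli_output.splitlines():
        line = line.strip()
        if line.startswith(MEDIA_PREFIX):
            paths.append(line[len(MEDIA_PREFIX):].strip())
    return paths


sample = "capturing...\nMEDIA:/tmp/front.jpg\nMEDIA:/tmp/back.jpg\n"
print(extract_media_paths(sample))  # ['/tmp/front.jpg', '/tmp/back.jpg']
```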

@@ -0,0 +1,29 @@
# Image and Media Support
## Overview
The WhatsApp channel via Baileys Web supports media handling with specific rules for sending, gateway processing, and agent replies.
## Key Features
**CLI Command Structure**
Send media with `openclaw message send --media <path-or-url> [--message <caption>]`; the caption is optional.
**Media Processing Pipeline**
The system handles various file types differently:
- Images undergo resizing and recompression to JPEG format with a maximum dimension of 2048 pixels
- Audio files are converted to voice notes with the `ptt` flag enabled
- Documents preserve filenames and support larger file sizes
- MP4 files can enable looped playback on mobile clients using the `gifPlayback` parameter
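The 2048-pixel image limit can be illustrated with a small dimension calculation. This sketch assumes the limit applies to the longer side, that aspect ratio is preserved, and that small images are never upscaled; none of these details are confirmed by the text, and `fit_within` is a hypothetical helper.

```python
from typing import Tuple

MAX_DIMENSION = 2048


def fit_within(width: int, height: int, max_dim: int = MAX_DIMENSION) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_dim."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height  # already small enough; never upscale
    scale = max_dim / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4096, 3072))  # (2048, 1536)
print(fit_within(1000, 500))   # (1000, 500)
```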
## Size Constraints
Outbound limits vary by media category:
- Images are capped at approximately 6 MB following recompression
- Audio and video files max out at 16 MB
- Documents can reach up to 100 MB
- Media understanding operations have separate thresholds (10 MB for images, 20 MB for audio, 50 MB for video)
## Inbound Processing
When messages arrive with attachments, the system downloads media to temporary storage and exposes templating variables for command processing. Audio transcription enables slash command functionality, while image and video descriptions preserve caption text for parsing.

@@ -0,0 +1,332 @@
# Nodes
A **node** is a companion device (macOS/iOS/Android/headless) that connects to the Gateway **WebSocket** (same port as operators) with `role: "node"` and exposes a command surface (e.g. `canvas.*`, `camera.*`, `system.*`) via `node.invoke`. Protocol details: [Gateway protocol](/gateway/protocol).
Legacy transport: [Bridge protocol](/gateway/bridge-protocol) (TCP JSONL; deprecated/removed for current nodes).
macOS can also run in **node mode**: the menubar app connects to the Gateway's WS server and exposes its local canvas/camera commands as a node (so `openclaw nodes …` works against this Mac).
Notes:
* Nodes are **peripherals**, not gateways. They don't run the gateway service.
* Telegram/WhatsApp/etc. messages land on the **gateway**, not on nodes.
## Pairing + status
**WS nodes use device pairing.** Nodes present a device identity during `connect`; the Gateway
creates a device pairing request for `role: node`. Approve via the devices CLI (or UI).
Quick CLI:
```bash
openclaw devices list
openclaw devices approve <requestId>
openclaw devices reject <requestId>
openclaw nodes status
openclaw nodes describe --node <idOrNameOrIp>
```
Notes:
* `nodes status` marks a node as **paired** when its device pairing role includes `node`.
* `node.pair.*` (CLI: `openclaw nodes pending/approve/reject`) is a separate gateway-owned
node pairing store; it does **not** gate the WS `connect` handshake.
## Remote node host (system.run)
Use a **node host** when your Gateway runs on one machine and you want commands
to execute on another. The model still talks to the **gateway**; the gateway
forwards `exec` calls to the **node host** when `host=node` is selected.
### What runs where
* **Gateway host**: receives messages, runs the model, routes tool calls.
* **Node host**: executes `system.run`/`system.which` on the node machine.
* **Approvals**: enforced on the node host via `~/.openclaw/exec-approvals.json`.
### Start a node host (foreground)
On the node machine:
```bash
openclaw node run --host <gateway-host> --port 18789 --display-name "Build Node"
```
### Remote gateway via SSH tunnel (loopback bind)
If the Gateway binds to loopback (`gateway.bind=loopback`, default in local mode),
remote node hosts cannot connect directly. Create an SSH tunnel and point the
node host at the local end of the tunnel.
Example (node host -> gateway host):
```bash
# Terminal A (keep running): forward local 18790 -> gateway 127.0.0.1:18789
ssh -N -L 18790:127.0.0.1:18789 user@gateway-host
# Terminal B: export the gateway token and connect through the tunnel
export OPENCLAW_GATEWAY_TOKEN="<gateway-token>"
openclaw node run --host 127.0.0.1 --port 18790 --display-name "Build Node"
```
Notes:
* The token is `gateway.auth.token` from the gateway config (`~/.openclaw/openclaw.json` on the gateway host).
* `openclaw node run` reads `OPENCLAW_GATEWAY_TOKEN` for auth.
### Start a node host (service)
```bash
openclaw node install --host <gateway-host> --port 18789 --display-name "Build Node"
openclaw node restart
```
### Pair + name
On the gateway host:
```bash
openclaw nodes pending
openclaw nodes approve <requestId>
openclaw nodes list
```
Naming options:
* `--display-name` on `openclaw node run` / `openclaw node install` (persists in `~/.openclaw/node.json` on the node).
* `openclaw nodes rename --node <id|name|ip> --name "Build Node"` (gateway override).
### Allowlist the commands
Exec approvals are **per node host**. Add allowlist entries from the gateway:
```bash
openclaw approvals allowlist add --node <id|name|ip> "/usr/bin/uname"
openclaw approvals allowlist add --node <id|name|ip> "/usr/bin/sw_vers"
```
Approvals live on the node host at `~/.openclaw/exec-approvals.json`.
### Point exec at the node
Configure defaults (gateway config):
```bash
openclaw config set tools.exec.host node
openclaw config set tools.exec.security allowlist
openclaw config set tools.exec.node "<id-or-name>"
```
Or per session:
```
/exec host=node security=allowlist node=<id-or-name>
```
Once set, any `exec` call with `host=node` runs on the node host (subject to the
node allowlist/approvals).
Related:
* [Node host CLI](/cli/node)
* [Exec tool](/tools/exec)
* [Exec approvals](/tools/exec-approvals)
## Invoking commands
Low-level (raw RPC):
```bash
openclaw nodes invoke --node <idOrNameOrIp> --command canvas.eval --params '{"javaScript":"location.href"}'
```
Higher-level helpers exist for the common "give the agent a MEDIA attachment" workflows.
## Screenshots (canvas snapshots)
If the node is showing the Canvas (WebView), `canvas.snapshot` returns `{ format, base64 }`.
CLI helper (writes to a temp file and prints `MEDIA:<path>`):
```bash
openclaw nodes canvas snapshot --node <idOrNameOrIp> --format png
openclaw nodes canvas snapshot --node <idOrNameOrIp> --format jpg --max-width 1200 --quality 0.9
```
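What the CLI helper does with the `{ format, base64 }` payload can be approximated in a few lines. This is a sketch of the decode-and-write step only, using a fabricated payload; `write_snapshot` is a hypothetical name and the real helper may choose file names and locations differently.

```python
import base64
import os
import tempfile


def write_snapshot(payload: dict) -> str:
    """Decode a {format, base64} payload to a temp file and return its path."""
    data = base64.b64decode(payload["base64"])
    fd, path = tempfile.mkstemp(suffix="." + payload["format"])
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


# Fabricated payload for illustration; a real one comes from canvas.snapshot.
payload = {"format": "png", "base64": base64.b64encode(b"fake-image-bytes").decode()}
print("MEDIA:" + write_snapshot(payload))
```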
### Canvas controls
```bash
openclaw nodes canvas present --node <idOrNameOrIp> --target https://example.com
openclaw nodes canvas hide --node <idOrNameOrIp>
openclaw nodes canvas navigate https://example.com --node <idOrNameOrIp>
openclaw nodes canvas eval --node <idOrNameOrIp> --js "document.title"
```
Notes:
* `canvas present` accepts URLs or local file paths (`--target`), plus optional `--x/--y/--width/--height` for positioning.
* `canvas eval` accepts inline JS (`--js`) or a positional arg.
### A2UI (Canvas)
```bash
openclaw nodes canvas a2ui push --node <idOrNameOrIp> --text "Hello"
openclaw nodes canvas a2ui push --node <idOrNameOrIp> --jsonl ./payload.jsonl
openclaw nodes canvas a2ui reset --node <idOrNameOrIp>
```
Notes:
* Only A2UI v0.8 JSONL is supported (v0.9/createSurface is rejected).
## Photos + videos (node camera)
Photos (`jpg`):
```bash
openclaw nodes camera list --node <idOrNameOrIp>
openclaw nodes camera snap --node <idOrNameOrIp> # default: both facings (2 MEDIA lines)
openclaw nodes camera snap --node <idOrNameOrIp> --facing front
```
Video clips (`mp4`):
```bash
openclaw nodes camera clip --node <idOrNameOrIp> --duration 10s
openclaw nodes camera clip --node <idOrNameOrIp> --duration 3000 --no-audio
```
Notes:
* The node must be **foregrounded** for `canvas.*` and `camera.*` (background calls return `NODE_BACKGROUND_UNAVAILABLE`).
* Clip duration is clamped (currently `<= 60s`) to avoid oversized base64 payloads.
* Android will prompt for `CAMERA`/`RECORD_AUDIO` permissions when possible; denied permissions fail with `*_PERMISSION_REQUIRED`.
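The two `--duration` forms in the examples (`10s` and `3000`) and the 60-second clamp can be modeled as follows. Treating a bare number as milliseconds is an assumption inferred from the examples, and `parse_duration_ms` is a hypothetical helper, not the CLI's actual parser.

```python
import re

MAX_CLIP_MS = 60_000  # the "<= 60s" clamp from the notes above


def parse_duration_ms(value: str) -> int:
    """Parse '10s', '1500ms', or a bare number (assumed to be milliseconds)."""
    match = re.fullmatch(r"(\d+)(ms|s)?", value.strip())
    if not match:
        raise ValueError(f"unrecognized duration: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    millis = amount * 1000 if unit == "s" else amount
    return min(millis, MAX_CLIP_MS)  # clamp rather than reject


print(parse_duration_ms("10s"))   # 10000
print(parse_duration_ms("3000"))  # 3000
print(parse_duration_ms("90s"))   # 60000
```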
## Screen recordings (nodes)
Nodes expose `screen.record` (mp4). Example:
```bash
openclaw nodes screen record --node <idOrNameOrIp> --duration 10s --fps 10
openclaw nodes screen record --node <idOrNameOrIp> --duration 10s --fps 10 --no-audio
```
Notes:
* `screen.record` requires the node app to be foregrounded.
* Android will show the system screen-capture prompt before recording.
* Screen recordings are clamped to `<= 60s`.
* `--no-audio` disables microphone capture (supported on iOS/Android; macOS uses system capture audio).
* Use `--screen <index>` to select a display when multiple screens are available.
## Location (nodes)
Nodes expose `location.get` when Location is enabled in settings.
CLI helper:
```bash
openclaw nodes location get --node <idOrNameOrIp>
openclaw nodes location get --node <idOrNameOrIp> --accuracy precise --max-age 15000 --location-timeout 10000
```
Notes:
* Location is **off by default**.
* "Always" requires system permission; background fetch is best-effort.
* The response includes lat/lon, accuracy (meters), and timestamp.
## SMS (Android nodes)
Android nodes can expose `sms.send` when the user grants **SMS** permission and the device supports telephony.
Low-level invoke:
```bash
openclaw nodes invoke --node <idOrNameOrIp> --command sms.send --params '{"to":"+15555550123","message":"Hello from OpenClaw"}'
```
Notes:
* The permission prompt must be accepted on the Android device before the capability is advertised.
* Wi-Fi-only devices without telephony will not advertise `sms.send`.
## System commands (node host / mac node)
The macOS node exposes `system.run`, `system.notify`, and `system.execApprovals.get/set`.
The headless node host exposes `system.run`, `system.which`, and `system.execApprovals.get/set`.
Examples:
```bash
openclaw nodes run --node <idOrNameOrIp> -- echo "Hello from mac node"
openclaw nodes notify --node <idOrNameOrIp> --title "Ping" --body "Gateway ready"
```
Notes:
* `system.run` returns stdout/stderr/exit code in the payload.
* `system.notify` respects notification permission state on the macOS app.
* `system.run` supports `--cwd`, `--env KEY=VAL`, `--command-timeout`, and `--needs-screen-recording`.
* `system.notify` supports `--priority <passive|active|timeSensitive>` and `--delivery <system|overlay|auto>`.
* macOS nodes drop `PATH` overrides; headless node hosts only accept `PATH` when it prepends the node host PATH.
* On macOS node mode, `system.run` is gated by exec approvals in the macOS app (Settings → Exec approvals).
Ask/allowlist/full behave the same as the headless node host; denied prompts return `SYSTEM_RUN_DENIED`.
* On headless node host, `system.run` is gated by exec approvals (`~/.openclaw/exec-approvals.json`).
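The `PATH` rule for headless node hosts (an override is accepted only when it prepends the host's own `PATH`) can be expressed as a suffix check. The function below is one reading of that note, not the node host's actual validation; it treats an override identical to the host `PATH` as "nothing prepended" and rejects it, which is a choice made for this sketch.

```python
import os


def accepts_path_override(override: str, host_path: str, sep: str = os.pathsep) -> bool:
    """True when override has the form '<extra dirs><sep><host_path>'."""
    suffix = sep + host_path
    return override.endswith(suffix) and len(override) > len(suffix)


host = "/usr/bin:/bin"
print(accepts_path_override("/opt/tools/bin:/usr/bin:/bin", host, sep=":"))  # True
print(accepts_path_override("/opt/tools/bin", host, sep=":"))                # False
```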
## Exec node binding
When multiple nodes are available, you can bind exec to a specific node.
This sets the default node for `exec host=node` (and can be overridden per agent).
Global default:
```bash
openclaw config set tools.exec.node "node-id-or-name"
```
Per-agent override:
```bash
openclaw config get agents.list
openclaw config set agents.list[0].tools.exec.node "node-id-or-name"
```
Unset to allow any node:
```bash
openclaw config unset tools.exec.node
openclaw config unset agents.list[0].tools.exec.node
```
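The precedence between the two settings (per-agent override first, then the global default, otherwise any node) can be sketched as a lookup. The example assumes the dotted config keys map onto nested JSON objects in the obvious way; `resolve_exec_node` is a hypothetical helper, not an OpenClaw API.

```python
from typing import Optional


def resolve_exec_node(config: dict, agent_index: int) -> Optional[str]:
    """Per-agent tools.exec.node wins over the global default; None means any node."""
    agents = config.get("agents", {}).get("list", [])
    if 0 <= agent_index < len(agents):
        agent_node = agents[agent_index].get("tools", {}).get("exec", {}).get("node")
        if agent_node:
            return agent_node
    return config.get("tools", {}).get("exec", {}).get("node")


config = {
    "tools": {"exec": {"node": "build-node"}},
    "agents": {"list": [{"tools": {"exec": {"node": "mac-mini"}}}, {}]},
}
print(resolve_exec_node(config, 0))  # mac-mini
print(resolve_exec_node(config, 1))  # build-node
```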
## Permissions map
Nodes may include a `permissions` map in `node.list` / `node.describe`, keyed by permission name (e.g. `screenRecording`, `accessibility`) with boolean values (`true` = granted).
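A caller can use this map to check prerequisites before invoking a command. The sketch below assumes a node description shaped like `{"permissions": {...}}` and treats a missing key as not granted; `missing_permissions` is a hypothetical helper.

```python
from typing import Iterable, List, Mapping


def missing_permissions(permissions: Mapping[str, bool], required: Iterable[str]) -> List[str]:
    """Names in required that are absent from the map or not granted."""
    return [name for name in required if not permissions.get(name, False)]


node = {"permissions": {"screenRecording": True, "accessibility": False}}
print(missing_permissions(node["permissions"], ["screenRecording", "accessibility"]))
# prints ['accessibility']
```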
## Headless node host (cross-platform)
OpenClaw can run a **headless node host** (no UI) that connects to the Gateway
WebSocket and exposes `system.run` / `system.which`. This is useful on Linux/Windows
or for running a minimal node alongside a server.
Start it:
```bash
openclaw node run --host <gateway-host> --port 18789
```
Notes:
* Pairing is still required (the Gateway will show a node approval prompt).
* The node host stores its node id, token, display name, and gateway connection info in `~/.openclaw/node.json`.
* Exec approvals are enforced locally via `~/.openclaw/exec-approvals.json`
(see [Exec approvals](/tools/exec-approvals)).
* On macOS, the headless node host prefers the companion app exec host when reachable and falls
back to local execution if the app is unavailable. Set `OPENCLAW_NODE_EXEC_HOST=app` to require
the app, or `OPENCLAW_NODE_EXEC_FALLBACK=0` to disable fallback.
* Add `--tls` / `--tls-fingerprint` when the Gateway WS uses TLS.
## Mac node mode
* The macOS menubar app connects to the Gateway WS server as a node (so `openclaw nodes …` works against this Mac).
* In remote mode, the app opens an SSH tunnel for the Gateway port and connects to `localhost`.

@@ -0,0 +1,28 @@
# Location Command Documentation
## Core Functionality
The `location.get` node command retrieves device location data. It operates through a three-tier permission model rather than a simple on/off switch, reflecting how modern operating systems handle location access.
## Permission Levels
The system uses three modes: disabled, foreground-only ("While Using"), and background-enabled ("Always"). Because OS permissions are themselves multi-level, the app can expose a selector, but the OS still decides the actual grant.
## Command Parameters & Response
When invoked, the command accepts timeout, cache age, and accuracy preferences. The response includes latitude, longitude, accuracy in meters, altitude, speed, heading, timestamp, precision status, and location source (GPS, WiFi, cellular, or unknown).
## Error Handling
The implementation provides five stable error codes: `LOCATION_DISABLED`, `LOCATION_PERMISSION_REQUIRED`, `LOCATION_BACKGROUND_UNAVAILABLE`, `LOCATION_TIMEOUT`, and `LOCATION_UNAVAILABLE`.
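Because the codes are stable, client code can match on them directly. The enum below lists exactly the five codes named above; the member names and the `parse_location_error` helper are illustrative choices for this sketch, not part of OpenClaw.

```python
from enum import Enum
from typing import Optional


class LocationError(str, Enum):
    DISABLED = "LOCATION_DISABLED"
    PERMISSION_REQUIRED = "LOCATION_PERMISSION_REQUIRED"
    BACKGROUND_UNAVAILABLE = "LOCATION_BACKGROUND_UNAVAILABLE"
    TIMEOUT = "LOCATION_TIMEOUT"
    UNAVAILABLE = "LOCATION_UNAVAILABLE"


def parse_location_error(code: str) -> Optional[LocationError]:
    """Map a raw error code to the enum, or None for codes outside the stable set."""
    try:
        return LocationError(code)
    except ValueError:
        return None
```

Returning `None` for unknown codes lets a caller handle non-location failures (for example, a backgrounded node) through a separate path.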
## Implementation Details
- Precise location is a separate toggle from the enablement mode
- iOS/macOS users configure through system settings
- Android distinguishes between standard and background location permissions
- Future background support requires push-triggered workflows
## Integration
The feature integrates via the `nodes` tool (`location_get` action) and CLI command (`openclaw nodes location get`), with recommended UX copy provided for each permission level.

@@ -0,0 +1,31 @@
# Talk Mode Documentation
## Overview
Talk Mode enables continuous voice conversations through a cycle of listening, transcription, model processing, and text-to-speech playback.
## Core Functionality
The system operates in three phases: Listening, Thinking, Speaking. When a brief silence is detected, the transcript is sent to the model via the main session; the response is displayed in WebChat and spoken aloud using ElevenLabs.
## Voice Control
Responses can include a JSON directive as the first line to customize voice settings:
```json
{ "voice": "<voice-id>", "once": true }
```
Supported parameters include voice selection, model specification, speed, stability, and various ElevenLabs-specific options. The `once` flag limits changes to the current reply only.
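Separating the directive from the spoken text might look like the following. This is a sketch of the parsing idea only: it assumes the directive occupies the whole first line and is a JSON object, and it falls back to speaking the full reply when that line does not parse. `split_directive` is a hypothetical name.

```python
import json
from typing import Optional, Tuple


def split_directive(reply: str) -> Tuple[Optional[dict], str]:
    """Return (directive, spoken_text); directive is None if line one is not a JSON object."""
    first, _, rest = reply.partition("\n")
    candidate = first.strip()
    if candidate.startswith("{"):
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            return None, reply  # not valid JSON: treat the whole reply as speech
        if isinstance(parsed, dict):
            return parsed, rest
    return None, reply


directive, text = split_directive('{ "voice": "<voice-id>", "once": true }\nHello there.')
print(directive)  # {'voice': '<voice-id>', 'once': True}
print(text)       # Hello there.
```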
## Configuration
Settings are managed in `~/.openclaw/openclaw.json` with options for voice ID, model selection, output format, and API credentials. The system defaults to `eleven_v3` model with `interruptOnSpeech` enabled.
## Platform-Specific Behavior
**macOS** displays an always-on overlay with visual indicators for each phase and lets users interrupt the assistant by speaking while it responds. The UI includes a menu bar toggle, a configuration tab, and cloud icon controls.
## Technical Requirements
Talk Mode requires Speech and Microphone permissions. It supports several PCM and MP3 output formats on macOS, iOS, and Android to keep playback latency low.

@@ -0,0 +1,28 @@
# Voice Wake Documentation
## Overview
OpenClaw implements a centralized approach to voice wake words. The system treats wake words as a single global list managed by the Gateway rather than allowing per-node customization.
## Key Architecture Details
**Storage Location:** Wake word configurations are maintained on the gateway host at `~/.openclaw/settings/voicewake.json`
**Data Structure:** The file is JSON containing the active trigger words and a timestamp of when they were last modified.
## API Protocol
The implementation provides two primary methods:
- Retrieval: `voicewake.get` returns the current trigger list
- Updates: `voicewake.set` modifies triggers with validation and broadcasts changes
A `voicewake.changed` event notifies all connected clients (WebSocket connections, iOS/Android nodes) whenever modifications occur.
## Client Implementations
**macOS:** The native app integrates the global trigger list with voice recognition and allows settings-based editing.
**iOS/Android:** Both mobile platforms expose wake word editors in their settings interfaces. Changes propagate through the Gateway's WebSocket connection to maintain consistency across all devices.
## Operational Constraints
The system normalizes input by trimming whitespace and removing empty values. Empty lists default to system presets, and safety limits enforce caps on trigger count and length.
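The normalization rules can be sketched as a single function. The numeric caps and the default trigger below are placeholders, since the source gives neither the real limits nor the system presets, and truncating over-limit input (rather than rejecting it) is also an assumption of this sketch.

```python
from typing import Iterable, List

DEFAULT_TRIGGERS = ["openclaw"]  # placeholder; the real presets are not given in the text
MAX_TRIGGERS = 32                # hypothetical cap on trigger count
MAX_TRIGGER_CHARS = 64           # hypothetical cap on trigger length


def normalize_triggers(raw: Iterable[str]) -> List[str]:
    """Trim whitespace, drop empties, apply caps, and fall back to defaults."""
    cleaned = [trigger.strip() for trigger in raw]
    cleaned = [trigger[:MAX_TRIGGER_CHARS] for trigger in cleaned if trigger]
    cleaned = cleaned[:MAX_TRIGGERS]
    return cleaned or list(DEFAULT_TRIGGERS)


print(normalize_triggers(["  hey claw ", "", "computer"]))  # ['hey claw', 'computer']
print(normalize_triggers(["   ", ""]))                      # ['openclaw']
```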