{ "title": "Testing", "content": "OpenClaw has three Vitest suites (unit/integration, e2e, live) and a small set of Docker runners.\n\nThis doc is a “how we test” guide:\n\n* What each suite covers (and what it deliberately does *not* cover)\n* Which commands to run for common workflows (local, pre-push, debugging)\n* How live tests discover credentials and select models/providers\n* How to add regressions for real-world model/provider issues\n\n## Quick start\n\n* Full gate (expected before push): `pnpm build && pnpm check && pnpm test`\n\nWhen you touch tests or want extra confidence:\n\n* Coverage gate: `pnpm test:coverage`\n* E2E suite: `pnpm test:e2e`\n\nWhen debugging real providers/models (requires real creds):\n\n* Live suite (models + gateway tool/image probes): `pnpm test:live`\n\nTip: when you only need one failing case, prefer narrowing live tests via the allowlist env vars described below.\n\n## Test suites (what runs where)\n\nThink of the suites as “increasing realism” (and increasing flakiness/cost):\n\n### Unit / integration (default)\n\n* Command: `pnpm test`\n* Config: `vitest.config.ts`\n* Files: `src/**/*.test.ts`\n* Scope:\n * Pure unit tests\n * In-process integration tests (gateway auth, routing, tooling, parsing, config)\n * Deterministic regressions for known bugs\n* Expectations:\n * Runs in CI\n * No real keys required\n * Should be fast and stable\n\n### E2E (gateway smoke)\n\n* Command: `pnpm test:e2e`\n* Config: `vitest.e2e.config.ts`\n* Files: `src/**/*.e2e.test.ts`\n* Scope:\n * Multi-instance gateway end-to-end behavior\n * WebSocket/HTTP surfaces, node pairing, and heavier networking\n* Expectations:\n * Runs in CI (when enabled in the pipeline)\n * No real keys required\n * More moving parts than unit tests (can be slower)\n\n### Live (real providers + real models)\n\n* Command: `pnpm test:live`\n* Config: `vitest.live.config.ts`\n* Files: `src/**/*.live.test.ts`\n* Default: **enabled** by `pnpm test:live` (sets `OPENCLAW_LIVE_TEST=1`)\n* Scope:\n 
* “Does this provider/model actually work *today* with real creds?”\n * Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior\n* Expectations:\n * Not CI-stable by design (real networks, real provider policies, quotas, outages)\n * Costs money / uses rate limits\n * Prefer running narrowed subsets instead of “everything”\n * Live runs will source `~/.profile` to pick up missing API keys\n * Anthropic key rotation: set `OPENCLAW_LIVE_ANTHROPIC_KEYS=\"sk-...,sk-...\"` (or `OPENCLAW_LIVE_ANTHROPIC_KEY=sk-...`) or multiple `ANTHROPIC_API_KEY*` vars; tests will retry on rate limits\n\n## Which suite should I run?\n\nUse this decision table:\n\n* Editing logic/tests: run `pnpm test` (and `pnpm test:coverage` if you changed a lot)\n* Touching gateway networking / WS protocol / pairing: add `pnpm test:e2e`\n* Debugging “my bot is down” / provider-specific failures / tool calling: run a narrowed `pnpm test:live`\n\n## Live: model smoke (profile keys)\n\nLive tests are split into two layers so we can isolate failures:\n\n* “Direct model” tells us the provider/model can answer at all with the given key.\n* “Gateway smoke” tells us the full gateway+agent pipeline works for that model (sessions, history, tools, sandbox policy, etc.).\n\n### Layer 1: Direct model completion (no gateway)\n\n* Test: `src/agents/models.profiles.live.test.ts`\n* Goal:\n * Enumerate discovered models\n * Use `getApiKeyForModel` to select models you have creds for\n * Run a small completion per model (and targeted regressions where needed)\n* How to enable:\n * `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)\n* Set `OPENCLAW_LIVE_MODELS=modern` (or `all`, alias for modern) to actually run this suite; otherwise it skips to keep `pnpm test:live` focused on gateway smoke\n* How to select models:\n * `OPENCLAW_LIVE_MODELS=modern` to run the modern allowlist (Opus/Sonnet/Haiku 4.5, GPT-5.x + Codex, Gemini 3, GLM 4.7, MiniMax M2.1, Grok 4)\n * 
`OPENCLAW_LIVE_MODELS=all` is an alias for the modern allowlist\n * or `OPENCLAW_LIVE_MODELS=\"openai/gpt-5.2,anthropic/claude-opus-4-5,...\"` (comma allowlist)\n* How to select providers:\n * `OPENCLAW_LIVE_PROVIDERS=\"google,google-antigravity,google-gemini-cli\"` (comma allowlist)\n* Where keys come from:\n * By default: profile store and env fallbacks\n * Set `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to enforce **profile store** only\n* Why this exists:\n * Separates “provider API is broken / key is invalid” from “gateway agent pipeline is broken”\n * Contains small, isolated regressions (example: OpenAI Responses/Codex Responses reasoning replay + tool-call flows)\n\n### Layer 2: Gateway + dev agent smoke (what “@openclaw” actually does)\n\n* Test: `src/gateway/gateway-models.profiles.live.test.ts`\n* Goal:\n * Spin up an in-process gateway\n * Create/patch an `agent:dev:*` session (model override per run)\n * Iterate models-with-keys and assert:\n * “meaningful” response (no tools)\n * a real tool invocation works (read probe)\n * optional extra tool probes (exec+read probe)\n * OpenAI regression paths (tool-call-only → follow-up) keep working\n* Probe details (so you can explain failures quickly):\n * `read` probe: the test writes a nonce file in the workspace and asks the agent to `read` it and echo the nonce back.\n * `exec+read` probe: the test asks the agent to `exec`-write a nonce into a temp file, then `read` it back.\n * image probe: the test attaches a generated PNG (cat + randomized code) and expects the reply to contain `cat` plus the code.\n * Implementation reference: `src/gateway/gateway-models.profiles.live.test.ts` and `src/gateway/live-image-probe.ts`.\n* How to enable:\n * `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)\n* How to select models:\n * Default: modern allowlist (Opus/Sonnet/Haiku 4.5, GPT-5.x + Codex, Gemini 3, GLM 4.7, MiniMax M2.1, Grok 4)\n * `OPENCLAW_LIVE_GATEWAY_MODELS=all` is an alias for the modern allowlist\n * 
Or set `OPENCLAW_LIVE_GATEWAY_MODELS=\"provider/model\"` (or comma list) to narrow\n* How to select providers (avoid “OpenRouter everything”):\n * `OPENCLAW_LIVE_GATEWAY_PROVIDERS=\"google,google-antigravity,google-gemini-cli,openai,anthropic,zai,minimax\"` (comma allowlist)\n* Tool + image probes are always on in this live test:\n * `read` probe + `exec+read` probe (tool stress)\n * image probe runs when the model advertises image input support\n * Flow (high level):\n * Test generates a tiny PNG with “CAT” + random code (`src/gateway/live-image-probe.ts`)\n * Sends it via `agent` `attachments: [{ mimeType: \"image/png\", content: \"\" }]`\n * Gateway parses attachments into `images[]` (`src/gateway/server-methods/agent.ts` + `src/gateway/chat-attachments.ts`)\n * Embedded agent forwards a multimodal user message to the model\n * Assertion: reply contains `cat` + the code (OCR tolerance: minor mistakes allowed)\n\nTip: to see what you can test on your machine (and the exact `provider/model` ids), run `openclaw models list`.\n\n## Live: Anthropic setup-token smoke\n\n* Test: `src/agents/anthropic.setup-token.live.test.ts`\n* Goal: verify Claude Code CLI setup-token (or a pasted setup-token profile) can complete an Anthropic prompt.\n* Enable:\n * `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)\n * `OPENCLAW_LIVE_SETUP_TOKEN=1`\n* Token sources (pick one):\n * Profile: `OPENCLAW_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test`\n * Raw token: `OPENCLAW_LIVE_SETUP_TOKEN_VALUE=sk-ant-oat01-...`\n* Model override (optional):\n * `OPENCLAW_LIVE_SETUP_TOKEN_MODEL=anthropic/claude-opus-4-5`\n\n## Live: CLI backend smoke (Claude Code CLI or other local CLIs)\n\n* Test: `src/gateway/gateway-cli-backend.live.test.ts`\n* Goal: validate the Gateway + agent pipeline using a local CLI backend, without touching your default config.\n* Enable:\n * `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)\n * `OPENCLAW_LIVE_CLI_BACKEND=1`\n* Defaults:\n * Model: 
`claude-cli/claude-sonnet-4-5`\n * Command: `claude`\n * Args: `[\"-p\",\"--output-format\",\"json\",\"--dangerously-skip-permissions\"]`\n* Overrides (optional):\n * `OPENCLAW_LIVE_CLI_BACKEND_MODEL=\"claude-cli/claude-opus-4-5\"`\n * `OPENCLAW_LIVE_CLI_BACKEND_MODEL=\"codex-cli/gpt-5.2-codex\"`\n * `OPENCLAW_LIVE_CLI_BACKEND_COMMAND=\"/full/path/to/claude\"`\n * `OPENCLAW_LIVE_CLI_BACKEND_ARGS='[\"-p\",\"--output-format\",\"json\",\"--permission-mode\",\"bypassPermissions\"]'`\n * `OPENCLAW_LIVE_CLI_BACKEND_CLEAR_ENV='[\"ANTHROPIC_API_KEY\",\"ANTHROPIC_API_KEY_OLD\"]'`\n * `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_PROBE=1` to send a real image attachment (paths are injected into the prompt).\n * `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_ARG=\"--image\"` to pass image file paths as CLI args instead of prompt injection.\n * `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_MODE=\"repeat\"` (or `\"list\"`) to control how image args are passed when `IMAGE_ARG` is set.\n * `OPENCLAW_LIVE_CLI_BACKEND_RESUME_PROBE=1` to send a second turn and validate resume flow.\n * `OPENCLAW_LIVE_CLI_BACKEND_DISABLE_MCP_CONFIG=0` to keep Claude Code CLI MCP config enabled (default disables MCP config with a temporary empty file).\n\n### Recommended live recipes\n\nNarrow, explicit allowlists are fastest and least flaky:\n\n* Single model, direct (no gateway):\n * `OPENCLAW_LIVE_MODELS=\"openai/gpt-5.2\" pnpm test:live src/agents/models.profiles.live.test.ts`\n\n* Single model, gateway smoke:\n * `OPENCLAW_LIVE_GATEWAY_MODELS=\"openai/gpt-5.2\" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`\n\n* Tool calling across several providers:\n * `OPENCLAW_LIVE_GATEWAY_MODELS=\"openai/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-3-flash-preview,zai/glm-4.7,minimax/minimax-m2.1\" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`\n\n* Google focus (Gemini API key + Antigravity):\n * Gemini (API key): `OPENCLAW_LIVE_GATEWAY_MODELS=\"google/gemini-3-flash-preview\" pnpm test:live 
src/gateway/gateway-models.profiles.live.test.ts`\n * Antigravity (OAuth): `OPENCLAW_LIVE_GATEWAY_MODELS=\"google-antigravity/claude-opus-4-5-thinking,google-antigravity/gemini-3-pro-high\" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`\n\n* `google/...` uses the Gemini API (API key).\n* `google-antigravity/...` uses the Antigravity OAuth bridge (Cloud Code Assist-style agent endpoint).\n* `google-gemini-cli/...` uses the local Gemini CLI on your machine (separate auth + tooling quirks).\n* Gemini API vs Gemini CLI:\n * API: OpenClaw calls Google’s hosted Gemini API over HTTP (API key / profile auth); this is what most users mean by “Gemini”.\n * CLI: OpenClaw shells out to a local `gemini` binary; it has its own auth and can behave differently (streaming/tool support/version skew).\n\n## Live: model matrix (what we cover)\n\nThere is no fixed “CI model list” (live is opt-in), but these are the **recommended** models to cover regularly on a dev machine with keys.\n\n### Modern smoke set (tool calling + image)\n\nThis is the “common models” run we expect to keep working:\n\n* OpenAI (non-Codex): `openai/gpt-5.2` (optional: `openai/gpt-5.1`)\n* OpenAI Codex: `openai-codex/gpt-5.2` (optional: `openai-codex/gpt-5.2-codex`)\n* Anthropic: `anthropic/claude-opus-4-5` (or `anthropic/claude-sonnet-4-5`)\n* Google (Gemini API): `google/gemini-3-pro-preview` and `google/gemini-3-flash-preview` (avoid older Gemini 2.x models)\n* Google (Antigravity): `google-antigravity/claude-opus-4-5-thinking` and `google-antigravity/gemini-3-flash`\n* Z.AI (GLM): `zai/glm-4.7`\n* MiniMax: `minimax/minimax-m2.1`\n\nRun gateway smoke with tools + image:\n`OPENCLAW_LIVE_GATEWAY_MODELS=\"openai/gpt-5.2,openai-codex/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-3-pro-preview,google/gemini-3-flash-preview,google-antigravity/claude-opus-4-5-thinking,google-antigravity/gemini-3-flash,zai/glm-4.7,minimax/minimax-m2.1\" pnpm test:live 
src/gateway/gateway-models.profiles.live.test.ts`\n\n### Baseline: tool calling (Read + optional Exec)\n\nPick at least one per provider family:\n\n* OpenAI: `openai/gpt-5.2` (or `openai/gpt-5-mini`)\n* Anthropic: `anthropic/claude-opus-4-5` (or `anthropic/claude-sonnet-4-5`)\n* Google: `google/gemini-3-flash-preview` (or `google/gemini-3-pro-preview`)\n* Z.AI (GLM): `zai/glm-4.7`\n* MiniMax: `minimax/minimax-m2.1`\n\nOptional additional coverage (nice to have):\n\n* xAI: `xai/grok-4` (or latest available)\n* Mistral: `mistral/`… (pick one “tools” capable model you have enabled)\n* Cerebras: `cerebras/`… (if you have access)\n* LM Studio: `lmstudio/`… (local; tool calling depends on API mode)\n\n### Vision: image send (attachment → multimodal message)\n\nInclude at least one image-capable model in `OPENCLAW_LIVE_GATEWAY_MODELS` (Claude/Gemini/OpenAI vision-capable variants, etc.) to exercise the image probe.\n\n### Aggregators / alternate gateways\n\nIf you have keys enabled, we also support testing via:\n\n* OpenRouter: `openrouter/...` (hundreds of models; use `openclaw models scan` to find tool+image capable candidates)\n* OpenCode Zen: `opencode/...` (auth via `OPENCODE_API_KEY` / `OPENCODE_ZEN_API_KEY`)\n\nMore providers you can include in the live matrix (if you have creds/config):\n\n* Built-in: `openai`, `openai-codex`, `anthropic`, `google`, `google-vertex`, `google-antigravity`, `google-gemini-cli`, `zai`, `openrouter`, `opencode`, `xai`, `groq`, `cerebras`, `mistral`, `github-copilot`\n* Via `models.providers` (custom endpoints): `minimax` (cloud/API), plus any OpenAI/Anthropic-compatible proxy (LM Studio, vLLM, LiteLLM, etc.)\n\nTip: don’t try to hardcode “all models” in docs. The authoritative list is whatever `discoverModels(...)` returns on your machine + whatever keys are available.\n\n## Credentials (never commit)\n\nLive tests discover credentials the same way the CLI does. 
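As a rough illustration of the env-fallback side, the Anthropic key-rotation lookup documented above (comma list, single var, then any `ANTHROPIC_API_KEY*` variable) could be sketched like this; `resolveAnthropicKeys` is a hypothetical name for illustration, not OpenClaw's actual code, and a rate-limited live run would simply retry with the next key in the returned list:

```typescript
// Hedged sketch of the documented lookup order (illustrative names, not OpenClaw internals):
// 1. OPENCLAW_LIVE_ANTHROPIC_KEYS — comma-separated rotation list
// 2. OPENCLAW_LIVE_ANTHROPIC_KEY  — a single key
// 3. any ANTHROPIC_API_KEY* variable (sorted for a stable rotation order)
function resolveAnthropicKeys(env: Record<string, string | undefined>): string[] {
  const list = env["OPENCLAW_LIVE_ANTHROPIC_KEYS"];
  if (list) {
    // Split the comma list, dropping blanks from trailing commas or stray spaces.
    return list.split(",").map((key) => key.trim()).filter((key) => key.length > 0);
  }
  const single = env["OPENCLAW_LIVE_ANTHROPIC_KEY"];
  if (single) return [single];
  return Object.keys(env)
    .filter((name) => name.startsWith("ANTHROPIC_API_KEY"))
    .sort()
    .map((name) => env[name])
    .filter((value): value is string => typeof value === "string" && value.length > 0);
}
```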
Practical implications:\n\n* If the CLI works, live tests should find the same keys.\n\n* If a live test says “no creds”, debug the same way you’d debug `openclaw models list` / model selection.\n\n* Profile store: `~/.openclaw/credentials/` (preferred; what “profile keys” means in the tests)\n\n* Config: `~/.openclaw/openclaw.json` (or `OPENCLAW_CONFIG_PATH`)\n\nIf you want to rely on env keys (e.g. exported in your `~/.profile`), run local tests after `source ~/.profile`, or use the Docker runners below (they can mount `~/.profile` into the container).\n\n## Deepgram live (audio transcription)\n\n* Test: `src/media-understanding/providers/deepgram/audio.live.test.ts`\n* Enable: `DEEPGRAM_API_KEY=... DEEPGRAM_LIVE_TEST=1 pnpm test:live src/media-understanding/providers/deepgram/audio.live.test.ts`\n\n## Docker runners (optional “works in Linux” checks)\n\nThese run `pnpm test:live` inside the repo Docker image, mounting your local config dir and workspace (and sourcing `~/.profile` if mounted):\n\n* Direct models: `pnpm test:docker:live-models` (script: `scripts/test-live-models-docker.sh`)\n* Gateway + dev agent: `pnpm test:docker:live-gateway` (script: `scripts/test-live-gateway-models-docker.sh`)\n* Onboarding wizard (TTY, full scaffolding): `pnpm test:docker:onboard` (script: `scripts/e2e/onboard-docker.sh`)\n* Gateway networking (two containers, WS auth + health): `pnpm test:docker:gateway-network` (script: `scripts/e2e/gateway-network-docker.sh`)\n* Plugins (custom extension load + registry smoke): `pnpm test:docker:plugins` (script: `scripts/e2e/plugins-docker.sh`)\n\nEnvironment overrides:\n\n* `OPENCLAW_CONFIG_DIR=...` (default: `~/.openclaw`) mounted to `/home/node/.openclaw`\n* `OPENCLAW_WORKSPACE_DIR=...` (default: `~/.openclaw/workspace`) mounted to `/home/node/.openclaw/workspace`\n* `OPENCLAW_PROFILE_FILE=...` (default: `~/.profile`) mounted to `/home/node/.profile` and sourced before running tests\n* `OPENCLAW_LIVE_GATEWAY_MODELS=...` / `OPENCLAW_LIVE_MODELS=...` to narrow 
the run\n* `OPENCLAW_LIVE_REQUIRE_PROFILE_KEYS=1` to ensure creds come from the profile store (not env)\n\n## Docs sanity\n\nRun docs checks after doc edits: `pnpm docs:list`.\n\n## Offline regression (CI-safe)\n\nThese are “real pipeline” regressions without real providers:\n\n* Gateway tool calling (mock OpenAI, real gateway + agent loop): `src/gateway/gateway.tool-calling.mock-openai.test.ts`\n* Gateway wizard (WS `wizard.start`/`wizard.next`, writes config + auth enforced): `src/gateway/gateway.wizard.e2e.test.ts`\n\n## Agent reliability evals (skills)\n\nWe already have a few CI-safe tests that behave like “agent reliability evals”:\n\n* Mock tool-calling through the real gateway + agent loop (`src/gateway/gateway.tool-calling.mock-openai.test.ts`).\n* End-to-end wizard flows that validate session wiring and config effects (`src/gateway/gateway.wizard.e2e.test.ts`).\n\nWhat’s still missing for skills (see [Skills](/tools/skills)):\n\n* **Decisioning:** when skills are listed in the prompt, does the agent pick the right skill (or avoid irrelevant ones)?\n* **Compliance:** does the agent read `SKILL.md` before use and follow required steps/args?\n* **Workflow contracts:** multi-turn scenarios that assert tool order, session history carryover, and sandbox boundaries.\n\nFuture evals should stay deterministic first:\n\n* A scenario runner using mock providers to assert tool calls + order, skill file reads, and session wiring.\n* A small suite of skill-focused scenarios (use vs avoid, gating, prompt injection).\n* Optional live evals (opt-in, env-gated) only after the CI-safe suite is in place.\n\n## Adding regressions (guidance)\n\nWhen you fix a provider/model issue discovered in live:\n\n* Add a CI-safe regression if possible (mock/stub provider, or capture the exact request-shape transformation)\n* If it’s inherently live-only (rate limits, auth policies), keep the live test narrow and opt-in via env vars\n* Prefer targeting the smallest layer that catches the bug:\n * provider 
request conversion/replay bug → direct models test\n * gateway session/history/tool pipeline bug → gateway live smoke or CI-safe gateway mock test", "code_samples": [ { "code": "## Live: Anthropic setup-token smoke\n\n* Test: `src/agents/anthropic.setup-token.live.test.ts`\n* Goal: verify Claude Code CLI setup-token (or a pasted setup-token profile) can complete an Anthropic prompt.\n* Enable:\n * `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)\n * `OPENCLAW_LIVE_SETUP_TOKEN=1`\n* Token sources (pick one):\n * Profile: `OPENCLAW_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test`\n * Raw token: `OPENCLAW_LIVE_SETUP_TOKEN_VALUE=sk-ant-oat01-...`\n* Model override (optional):\n * `OPENCLAW_LIVE_SETUP_TOKEN_MODEL=anthropic/claude-opus-4-5`\n\nSetup example:", "language": "unknown" }, { "code": "## Live: CLI backend smoke (Claude Code CLI or other local CLIs)\n\n* Test: `src/gateway/gateway-cli-backend.live.test.ts`\n* Goal: validate the Gateway + agent pipeline using a local CLI backend, without touching your default config.\n* Enable:\n * `pnpm test:live` (or `OPENCLAW_LIVE_TEST=1` if invoking Vitest directly)\n * `OPENCLAW_LIVE_CLI_BACKEND=1`\n* Defaults:\n * Model: `claude-cli/claude-sonnet-4-5`\n * Command: `claude`\n * Args: `[\"-p\",\"--output-format\",\"json\",\"--dangerously-skip-permissions\"]`\n* Overrides (optional):\n * `OPENCLAW_LIVE_CLI_BACKEND_MODEL=\"claude-cli/claude-opus-4-5\"`\n * `OPENCLAW_LIVE_CLI_BACKEND_MODEL=\"codex-cli/gpt-5.2-codex\"`\n * `OPENCLAW_LIVE_CLI_BACKEND_COMMAND=\"/full/path/to/claude\"`\n * `OPENCLAW_LIVE_CLI_BACKEND_ARGS='[\"-p\",\"--output-format\",\"json\",\"--permission-mode\",\"bypassPermissions\"]'`\n * `OPENCLAW_LIVE_CLI_BACKEND_CLEAR_ENV='[\"ANTHROPIC_API_KEY\",\"ANTHROPIC_API_KEY_OLD\"]'`\n * `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_PROBE=1` to send a real image attachment (paths are injected into the prompt).\n * `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_ARG=\"--image\"` to pass image file paths as CLI args 
instead of prompt injection.\n * `OPENCLAW_LIVE_CLI_BACKEND_IMAGE_MODE=\"repeat\"` (or `\"list\"`) to control how image args are passed when `IMAGE_ARG` is set.\n * `OPENCLAW_LIVE_CLI_BACKEND_RESUME_PROBE=1` to send a second turn and validate resume flow.\n* `OPENCLAW_LIVE_CLI_BACKEND_DISABLE_MCP_CONFIG=0` to keep Claude Code CLI MCP config enabled (default disables MCP config with a temporary empty file).\n\nExample:", "language": "unknown" } ], "headings": [ { "level": "h2", "text": "Quick start", "id": "quick-start" }, { "level": "h2", "text": "Test suites (what runs where)", "id": "test-suites-(what-runs-where)" }, { "level": "h3", "text": "Unit / integration (default)", "id": "unit-/-integration-(default)" }, { "level": "h3", "text": "E2E (gateway smoke)", "id": "e2e-(gateway-smoke)" }, { "level": "h3", "text": "Live (real providers + real models)", "id": "live-(real-providers-+-real-models)" }, { "level": "h2", "text": "Which suite should I run?", "id": "which-suite-should-i-run?" }, { "level": "h2", "text": "Live: model smoke (profile keys)", "id": "live:-model-smoke-(profile-keys)" }, { "level": "h3", "text": "Layer 1: Direct model completion (no gateway)", "id": "layer-1:-direct-model-completion-(no-gateway)" }, { "level": "h3", "text": "Layer 2: Gateway + dev agent smoke (what “@openclaw” actually does)", "id": "layer-2:-gateway-+-dev-agent-smoke-(what-“@openclaw”-actually-does)" }, { "level": "h2", "text": "Live: Anthropic setup-token smoke", "id": "live:-anthropic-setup-token-smoke" }, { "level": "h2", "text": "Live: CLI backend smoke (Claude Code CLI or other local CLIs)", "id": "live:-cli-backend-smoke-(claude-code-cli-or-other-local-clis)" }, { "level": "h3", "text": "Recommended live recipes", "id": "recommended-live-recipes" }, { "level": "h2", "text": "Live: model matrix (what we cover)", "id": "live:-model-matrix-(what-we-cover)" }, { "level": "h3", "text": "Modern smoke set (tool calling + image)", "id": "modern-smoke-set-(tool-calling-+-image)" 
}, { "level": "h3", "text": "Baseline: tool calling (Read + optional Exec)", "id": "baseline:-tool-calling-(read-+-optional-exec)" }, { "level": "h3", "text": "Vision: image send (attachment → multimodal message)", "id": "vision:-image-send-(attachment-→-multimodal-message)" }, { "level": "h3", "text": "Aggregators / alternate gateways", "id": "aggregators-/-alternate-gateways" }, { "level": "h2", "text": "Credentials (never commit)", "id": "credentials-(never-commit)" }, { "level": "h2", "text": "Deepgram live (audio transcription)", "id": "deepgram-live-(audio-transcription)" }, { "level": "h2", "text": "Docker runners (optional “works in Linux” checks)", "id": "docker-runners-(optional-“works-in-linux”-checks)" }, { "level": "h2", "text": "Docs sanity", "id": "docs-sanity" }, { "level": "h2", "text": "Offline regression (CI-safe)", "id": "offline-regression-(ci-safe)" }, { "level": "h2", "text": "Agent reliability evals (skills)", "id": "agent-reliability-evals-(skills)" }, { "level": "h2", "text": "Adding regressions (guidance)", "id": "adding-regressions-(guidance)" } ], "url": "llms-txt#testing", "links": [] }