forked from Selig/openclaw-skill
6 custom skills (assign-task, dispatch-webhook, daily-briefing, task-capture, qmd-brain, tts-voice) with technical documentation. Compatible with Claude Code, OpenClaw, Codex CLI, and OpenCode.
48 lines
5.2 KiB
JSON
Executable File
{
  "title": "Local models",
  "content": "Local is doable, but OpenClaw expects large context + strong defenses against prompt injection. Small cards truncate context and leak safety. Aim high: **≥2 maxed-out Mac Studios or equivalent GPU rig (\\~\\$30k+)**. A single **24 GB** GPU works only for lighter prompts with higher latency. Use the **largest / full-size model variant you can run**; aggressively quantized or “small” checkpoints raise prompt-injection risk (see [Security](/gateway/security)).\n\n## Recommended: LM Studio + MiniMax M2.1 (Responses API, full-size)\n\nBest current local stack. Load MiniMax M2.1 in LM Studio, enable the local server (default `http://127.0.0.1:1234`), and use the Responses API to keep reasoning separate from final text.\n\n* Install LM Studio: [https://lmstudio.ai](https://lmstudio.ai)\n* In LM Studio, download the **largest MiniMax M2.1 build available** (avoid “small”/heavily quantized variants), start the server, and confirm `http://127.0.0.1:1234/v1/models` lists it.\n* Keep the model loaded; cold-load adds startup latency.\n* Adjust `contextWindow`/`maxTokens` if your LM Studio build differs.\n* For WhatsApp, stick to the Responses API so only final text is sent.\n\nKeep hosted models configured even when running local; use `models.mode: \"merge\"` so fallbacks stay available.\n\n### Hybrid config: hosted primary, local fallback\n\n### Local-first with hosted safety net\n\nSwap the primary and fallback order; keep the same providers block and `models.mode: \"merge\"` so you can fall back to Sonnet or Opus when the local box is down.\n\n### Regional hosting / data routing\n\n* Hosted MiniMax/Kimi/GLM variants also exist on OpenRouter with region-pinned endpoints (e.g., US-hosted). Pick the regional variant there to keep traffic in your chosen jurisdiction while still using `models.mode: \"merge\"` for Anthropic/OpenAI fallbacks.\n* Local-only remains the strongest privacy path; hosted regional routing is the middle ground when you need provider features but want control over data flow.\n\n## Other OpenAI-compatible local proxies\n\nvLLM, LiteLLM, OAI-proxy, or custom gateways work if they expose an OpenAI-style `/v1` endpoint. Replace the provider block above with your endpoint and model ID.\n\nKeep `models.mode: \"merge\"` so hosted models stay available as fallbacks.\n\n## Troubleshooting\n\n* Gateway can't reach the proxy? Test with `curl http://127.0.0.1:1234/v1/models`.\n* LM Studio model unloaded? Reload; cold start is a common “hanging” cause.\n* Context errors? Lower `contextWindow` or raise your server limit.\n* Safety: local models skip provider-side filters; keep agents narrow and compaction on to limit prompt-injection blast radius.",
  "code_samples": [
    {
      "code": "**Setup checklist**\n\n* Install LM Studio: [https://lmstudio.ai](https://lmstudio.ai)\n* In LM Studio, download the **largest MiniMax M2.1 build available** (avoid “small”/heavily quantized variants), start the server, and confirm `http://127.0.0.1:1234/v1/models` lists it.\n* Keep the model loaded; cold-load adds startup latency.\n* Adjust `contextWindow`/`maxTokens` if your LM Studio build differs.\n* For WhatsApp, stick to the Responses API so only final text is sent.\n\nKeep hosted models configured even when running local; use `models.mode: \"merge\"` so fallbacks stay available.\n\n### Hybrid config: hosted primary, local fallback",
      "language": "unknown"
    },
    {
      "code": "### Local-first with hosted safety net\n\nSwap the primary and fallback order; keep the same providers block and `models.mode: \"merge\"` so you can fall back to Sonnet or Opus when the local box is down.\n\n### Regional hosting / data routing\n\n* Hosted MiniMax/Kimi/GLM variants also exist on OpenRouter with region-pinned endpoints (e.g., US-hosted). Pick the regional variant there to keep traffic in your chosen jurisdiction while still using `models.mode: \"merge\"` for Anthropic/OpenAI fallbacks.\n* Local-only remains the strongest privacy path; hosted regional routing is the middle ground when you need provider features but want control over data flow.\n\n## Other OpenAI-compatible local proxies\n\nvLLM, LiteLLM, OAI-proxy, or custom gateways work if they expose an OpenAI-style `/v1` endpoint. Replace the provider block above with your endpoint and model ID:",
      "language": "unknown"
    }
  ],
  "headings": [
    {
      "level": "h2",
      "text": "Recommended: LM Studio + MiniMax M2.1 (Responses API, full-size)",
      "id": "recommended:-lm-studio-+-minimax-m2.1-(responses-api,-full-size)"
    },
    {
      "level": "h3",
      "text": "Hybrid config: hosted primary, local fallback",
      "id": "hybrid-config:-hosted-primary,-local-fallback"
    },
    {
      "level": "h3",
      "text": "Local-first with hosted safety net",
      "id": "local-first-with-hosted-safety-net"
    },
    {
      "level": "h3",
      "text": "Regional hosting / data routing",
      "id": "regional-hosting-/-data-routing"
    },
    {
      "level": "h2",
      "text": "Other OpenAI-compatible local proxies",
      "id": "other-openai-compatible-local-proxies"
    },
    {
      "level": "h2",
      "text": "Troubleshooting",
      "id": "troubleshooting"
    }
  ],
  "url": "llms-txt#local-models",
  "links": []
}
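
The page's "Hybrid config" and proxy sections refer to a providers block that did not survive extraction (both code samples carry only markdown headings). A minimal sketch of the hybrid setup the text describes, assuming an OpenAI-compatible provider entry and the `models.mode` / `contextWindow` / `maxTokens` keys named in the content; the exact key names and model IDs are assumptions, not OpenClaw's confirmed schema:

```json5
// Hypothetical sketch — field names and model IDs are illustrative only.
{
  "models": {
    "mode": "merge",                          // keep hosted fallbacks available (per the text)
    "primary": "anthropic/claude-sonnet",     // hosted primary (assumed ID)
    "fallbacks": ["lmstudio/minimax-m2.1"]    // local fallback via LM Studio
  },
  "providers": {
    "lmstudio": {
      "baseUrl": "http://127.0.0.1:1234/v1",  // LM Studio's default local server
      "api": "responses",                     // Responses API, so only final text is sent
      "contextWindow": 131072,                // adjust if your LM Studio build differs
      "maxTokens": 8192
    }
  }
}
```

For the local-first variant, swap `primary` and `fallbacks`; for vLLM, LiteLLM, or another proxy, point `baseUrl` at that gateway's `/v1` endpoint and change the model ID.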