add 6 skills to repo + update skill-review for xiaoming

- Add code-interpreter, kokoro-tts, remotion-best-practices,
  research-to-paper-slides, summarize, tavily-tool to source repo
- skill-review: add main/xiaoming agent mapping in handler.ts + SKILL.md
- tts-voice: handler.ts updates from agent workspace

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Date: 2026-03-13 22:59:31 +08:00
Parent: da6e932d51
Commit: f1a6df4ca4
24 changed files with 1690 additions and 0 deletions

@@ -0,0 +1,150 @@
---
name: code-interpreter
description: Local Python code execution for calculations, tabular data inspection, CSV/JSON processing, simple plotting, text transformation, quick experiments, and reproducible analysis inside the OpenClaw workspace. Use when the user wants ChatGPT-style code interpreter behavior locally: run Python, analyze files, compute exact answers, transform data, inspect tables, or generate output files/artifacts. Prefer this for low-risk local analysis; do not use it for untrusted code, secrets handling, privileged actions, or network-dependent tasks.
---
# Code Interpreter
Run local Python code through the bundled runner.
## Safety boundary
This is **local execution**, not a hardened container. Treat it as a convenience tool for trusted, low-risk tasks.
Always:
- Keep work inside the OpenClaw workspace when possible.
- Prefer reading/writing files under the current task directory or an explicit artifact directory.
- Keep timeouts short by default.
- Avoid network access unless the user explicitly asks and the task truly needs it.
- Do not execute untrusted code copied from the web or other people.
- Do not expose secrets, tokens, SSH keys, browser cookies, or system files to the script.
Do not use this skill for:
- system administration
- package installation loops
- long-running servers
- privileged operations
- destructive file changes outside the workspace
- executing arbitrary third-party code verbatim
## Runner
Run from the OpenClaw workspace:
```bash
python3 {baseDir}/scripts/run_code.py --code 'print(2 + 2)'
```
Or pass a script file:
```bash
python3 {baseDir}/scripts/run_code.py --file path/to/script.py
```
Or pipe code via stdin:
```bash
cat my_script.py | python3 {baseDir}/scripts/run_code.py --stdin
```
## Useful options
```bash
# set timeout seconds (default 20)
python3 {baseDir}/scripts/run_code.py --code '...' --timeout 10
# run from a specific working directory inside workspace
python3 {baseDir}/scripts/run_code.py --file script.py --cwd /home/selig/.openclaw/workspace/project
# keep outputs in a known artifact directory inside workspace
python3 {baseDir}/scripts/run_code.py --file script.py --artifact-dir /home/selig/.openclaw/workspace/.tmp/my-analysis
# save full stdout / stderr
python3 {baseDir}/scripts/run_code.py --code '...' --stdout-file out.txt --stderr-file err.txt
```
## Built-in environment
The runner uses the dedicated interpreter at:
- `/home/selig/.openclaw/workspace/.venv-code-interpreter/bin/python` (use the venv path directly; do not resolve the symlink to system Python)
This keeps plotting/data-analysis dependencies stable without touching the system Python.
The runner exposes these variables to the script:
- `OPENCLAW_WORKSPACE`
- `CODE_INTERPRETER_RUN_DIR`
- `CODE_INTERPRETER_ARTIFACT_DIR`
It also writes a helper file in the run directory:
```python
from ci_helpers import save_text, save_json
```
Use those helpers to save artifacts into `CODE_INTERPRETER_ARTIFACT_DIR`.
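The helpers are small wrappers over `Path.write_text`. A self-contained sketch of what they do (a temp dir stands in for the real `CODE_INTERPRETER_ARTIFACT_DIR`, which the runner sets via the environment):

```python
import json
import tempfile
from pathlib import Path

# Stand-in for CODE_INTERPRETER_ARTIFACT_DIR, normally provided by the runner.
ARTIFACT_DIR = Path(tempfile.mkdtemp())

def save_text(name: str, text: str) -> str:
    """Mirror of the bundled ci_helpers.save_text: write UTF-8 text under the artifact dir."""
    path = ARTIFACT_DIR / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding='utf-8')
    return str(path)

def save_json(name: str, data) -> str:
    """Mirror of ci_helpers.save_json: pretty-printed UTF-8 JSON."""
    path = ARTIFACT_DIR / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding='utf-8')
    return str(path)

txt_path = save_text('notes/summary.txt', 'hello')
json_path = save_json('stats.json', {'rows': 3})
```

Both return the absolute path so the script can echo it for the final reply.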
## V4 automatic data analysis
For automatic profiling/report generation from a local data file, use:
- `scripts/analyze_data.py`
- Reference: `references/v4-usage.md`
This flow is ideal when the user wants a fast "analyze this CSV/JSON/Excel and give me a report + plots" result.
## Output
The runner prints compact JSON:
```json
{
  "ok": true,
  "exitCode": 0,
  "timeout": false,
  "durationSec": 0.42,
  "cwd": "...",
  "runDir": "...",
  "artifactDir": "...",
  "packageStatus": {"pandas": true, "numpy": true, "matplotlib": false},
  "artifacts": [{"path": "...", "bytes": 123}],
  "stdout": "...",
  "stderr": "..."
}
```
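A caller can branch on this JSON directly. A minimal sketch (the payload below is an illustrative sample, not real runner output):

```python
import json

# Sample runner output in the documented shape; values are made up.
raw = '''{
  "ok": true,
  "exitCode": 0,
  "timeout": false,
  "runDir": "/tmp/run-abc",
  "artifactDir": "/tmp/run-abc/artifacts",
  "packageStatus": {"pandas": true, "numpy": true, "matplotlib": false},
  "artifacts": [{"path": "/tmp/run-abc/artifacts/out.csv", "bytes": 123}],
  "stdout": "42\\n",
  "stderr": ""
}'''

result = json.loads(raw)
# A run is usable only if it exited cleanly and did not hit the timeout.
success = result["ok"] and not result["timeout"]
artifact_paths = [a["path"] for a in result["artifacts"]]
```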
## Workflow
1. Decide whether the task is a good fit for local trusted execution.
2. Write the smallest script that solves the problem.
3. Use `--artifact-dir` when the user may want generated files preserved.
4. Run with a short timeout.
5. Inspect `stdout`, `stderr`, and `artifacts`.
6. If producing files, mention their exact paths in the reply.
## Patterns
### Exact calculation
Use a one-liner with `--code`.
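For example, the kind of snippet that fits here (shown as plain Python; pass it to the runner via `--code`):

```python
from fractions import Fraction

# Exact rational arithmetic instead of lossy floating point.
result = Fraction(1, 3) + Fraction(1, 6)
print(result)
```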
### File analysis
Read input files from workspace, then write summaries/derived files back to `artifactDir`.
### Automatic report bundle
When the user wants a quick profiling pass, run `scripts/analyze_data.py` against the file and return the generated `summary.json`, `report.md`, `preview.csv`, and any PNG plots.
### Table inspection
Prefer pandas when available; otherwise fall back to csv/json stdlib.
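One way to sketch that fallback (the sample CSV text is hypothetical):

```python
import csv
import io

CSV_TEXT = "name,qty\napple,3\nbanana,5\n"  # hypothetical sample data

try:
    import pandas as pd
    # Preferred path: pandas parses types for us.
    df = pd.read_csv(io.StringIO(CSV_TEXT))
    rows = df.to_dict(orient="records")
except ImportError:
    # Stdlib fallback: DictReader yields strings, so coerce digits manually.
    rows = [
        {k: int(v) if v.isdigit() else v for k, v in row.items()}
        for row in csv.DictReader(io.StringIO(CSV_TEXT))
    ]

total = sum(int(r["qty"]) for r in rows)
```

Either branch yields the same row dicts, so downstream code stays identical.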
### Plotting
If `matplotlib` is available, write PNG files to `artifactDir`. For charts with Chinese text, force a CJK font: prefer the bundled Google Noto Sans CJK TC under `assets/fonts/` when present, then system fallbacks. Apply the chosen font not only via rcParams but also directly to titles, axis labels, tick labels, and legend text through `FontProperties`; this reliably avoids tofu/garbled Chinese and suppresses missing-glyph warnings. If plotting is unavailable, continue with tabular/text output.
### Reusable logic
Write a small `.py` file in the current task area, run with `--file`, then keep it if it may be reused.
## Notes
- The runner launches `python3 -B` with a minimal environment.
- It creates an isolated temp run directory under `workspace/.tmp/code-interpreter-runs/`.
- `stdout` / `stderr` are truncated in the JSON preview if very large; save to files when needed.
- `MPLBACKEND=Agg` is set so headless plotting works when matplotlib is installed.
- If a task needs stronger isolation than this local runner provides, do not force it—use a real sandbox/container approach instead.


@@ -0,0 +1,29 @@
# V4 Usage
## Purpose
Generate an automatic data analysis bundle from a local data file.
## Command
```bash
/home/selig/.openclaw/workspace/.venv-code-interpreter/bin/python \
  /home/selig/.openclaw/workspace/skills/code-interpreter/scripts/analyze_data.py \
  /path/to/input.csv \
  --artifact-dir /home/selig/.openclaw/workspace/.tmp/my-analysis
```
## Outputs
- `summary.json` — machine-readable profile
- `report.md` — human-readable summary
- `preview.csv` — first 50 rows after parsing
- `*.png` — generated plots when matplotlib is available
## Supported inputs
- `.csv`
- `.tsv`
- `.json`
- `.xlsx`
- `.xls`
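A caller can turn `summary.json` into a one-line digest. A sketch (the keys mirror what `analyze_data.py` emits; the sample values are made up):

```python
import json

# Illustrative summary.json content, matching the keys analyze_data.py writes.
summary = json.loads('''{
  "source": "/data/sales.csv",
  "rows": 120,
  "columns": 3,
  "parsedDateColumns": ["date"],
  "columnProfiles": {"amount": {"dtype": "float64", "nonNull": 118, "nulls": 2, "unique": 97}},
  "plots": ["/tmp/my-analysis/hist_amount.png"]
}''')

digest = (
    f"{summary['rows']} rows x {summary['columns']} cols, "
    f"{len(summary['plots'])} plot(s), "
    f"nulls in amount: {summary['columnProfiles']['amount']['nulls']}"
)
```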


@@ -0,0 +1,285 @@
#!/usr/bin/env python3
import argparse
import json
import math
import os
from pathlib import Path

try:
    import pandas as pd
except ImportError:
    raise SystemExit(
        'pandas is required. Run with the code-interpreter venv:\n'
        '  ~/.openclaw/workspace/.venv-code-interpreter/bin/python analyze_data.py ...'
    )

try:
    import matplotlib
    import matplotlib.pyplot as plt
    HAS_MPL = True
except Exception:
    HAS_MPL = False

ZH_FONT_CANDIDATES = [
    '/home/selig/.openclaw/workspace/skills/code-interpreter/assets/fonts/NotoSansCJKtc-Regular.otf',
    '/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf',
]


def configure_matplotlib_fonts() -> tuple[str | None, object | None]:
    if not HAS_MPL:
        return None, None
    chosen = None
    chosen_prop = None
    for path in ZH_FONT_CANDIDATES:
        if Path(path).exists():
            try:
                from matplotlib import font_manager
                font_manager.fontManager.addfont(path)
                font_prop = font_manager.FontProperties(fname=path)
                font_name = font_prop.get_name()
                matplotlib.rcParams['font.family'] = [font_name]
                matplotlib.rcParams['axes.unicode_minus'] = False
                chosen = font_name
                chosen_prop = font_prop
                break
            except Exception:
                continue
    return chosen, chosen_prop


def apply_font(ax, font_prop) -> None:
    if not font_prop:
        return
    title = ax.title
    if title:
        title.set_fontproperties(font_prop)
    ax.xaxis.label.set_fontproperties(font_prop)
    ax.yaxis.label.set_fontproperties(font_prop)
    for label in ax.get_xticklabels():
        label.set_fontproperties(font_prop)
    for label in ax.get_yticklabels():
        label.set_fontproperties(font_prop)
    legend = ax.get_legend()
    if legend:
        for text in legend.get_texts():
            text.set_fontproperties(font_prop)
        legend.get_title().set_fontproperties(font_prop)


def detect_format(path: Path) -> str:
    ext = path.suffix.lower()
    if ext in {'.csv', '.tsv', '.txt'}:
        return 'delimited'
    if ext == '.json':
        return 'json'
    if ext in {'.xlsx', '.xls'}:
        return 'excel'
    raise SystemExit(f'Unsupported file type: {ext}')


def load_df(path: Path) -> pd.DataFrame:
    fmt = detect_format(path)
    if fmt == 'delimited':
        sep = '\t' if path.suffix.lower() == '.tsv' else ','
        return pd.read_csv(path, sep=sep)
    if fmt == 'json':
        try:
            return pd.read_json(path)
        except ValueError:
            return pd.DataFrame(json.loads(path.read_text(encoding='utf-8')))
    if fmt == 'excel':
        return pd.read_excel(path)
    raise SystemExit('Unsupported format')


def safe_name(s: str) -> str:
    keep = []
    for ch in s:
        if ch.isalnum() or ch in ('-', '_'):
            keep.append(ch)
        elif ch in (' ', '/'):
            keep.append('_')
    out = ''.join(keep).strip('_')
    return out[:80] or 'column'


def series_stats(s: pd.Series) -> dict:
    non_null = s.dropna()
    result = {
        'dtype': str(s.dtype),
        'nonNull': int(non_null.shape[0]),
        'nulls': int(s.isna().sum()),
        'unique': int(non_null.nunique()) if len(non_null) else 0,
    }
    if pd.api.types.is_numeric_dtype(s):
        result.update({
            'min': None if non_null.empty else float(non_null.min()),
            'max': None if non_null.empty else float(non_null.max()),
            'mean': None if non_null.empty else float(non_null.mean()),
            'sum': None if non_null.empty else float(non_null.sum()),
        })
    else:
        top = non_null.astype(str).value_counts().head(5)
        result['topValues'] = [{
            'value': str(idx),
            'count': int(val),
        } for idx, val in top.items()]
    return result


def maybe_parse_dates(df: pd.DataFrame) -> tuple[pd.DataFrame, list[str]]:
    parsed = []
    out = df.copy()
    for col in out.columns:
        if out[col].dtype == 'object':
            sample = out[col].dropna().astype(str).head(20)
            if sample.empty:
                continue
            parsed_col = pd.to_datetime(out[col], errors='coerce')
            success_ratio = float(parsed_col.notna().mean()) if len(out[col]) else 0.0
            if success_ratio >= 0.6:
                out[col] = parsed_col
                parsed.append(str(col))
    return out, parsed


def write_report(df: pd.DataFrame, summary: dict, out_dir: Path) -> Path:
    lines = []
    lines.append('# Data Analysis Report')
    lines.append('')
    lines.append(f"- Source: `{summary['source']}`")
    lines.append(f"- Rows: **{summary['rows']}**")
    lines.append(f"- Columns: **{summary['columns']}**")
    lines.append(f"- Generated plots: **{len(summary['plots'])}**")
    if summary['parsedDateColumns']:
        lines.append(f"- Parsed date columns: {', '.join(summary['parsedDateColumns'])}")
    lines.append('')
    lines.append('## Columns')
    lines.append('')
    for name, meta in summary['columnProfiles'].items():
        lines.append(f"### {name}")
        lines.append(f"- dtype: `{meta['dtype']}`")
        lines.append(f"- non-null: {meta['nonNull']}")
        lines.append(f"- nulls: {meta['nulls']}")
        lines.append(f"- unique: {meta['unique']}")
        if 'mean' in meta:
            lines.append(f"- min / max: {meta['min']} / {meta['max']}")
            lines.append(f"- mean / sum: {meta['mean']} / {meta['sum']}")
        elif meta.get('topValues'):
            preview = ', '.join([f"{x['value']} ({x['count']})" for x in meta['topValues'][:5]])
            lines.append(f"- top values: {preview}")
        lines.append('')
    report = out_dir / 'report.md'
    report.write_text('\n'.join(lines).strip() + '\n', encoding='utf-8')
    return report


def generate_plots(df: pd.DataFrame, out_dir: Path, font_prop=None) -> list[str]:
    if not HAS_MPL:
        return []
    plots = []
    numeric_cols = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
    date_cols = [c for c in df.columns if pd.api.types.is_datetime64_any_dtype(df[c])]
    cat_cols = [
        c for c in df.columns
        if not pd.api.types.is_numeric_dtype(df[c]) and not pd.api.types.is_datetime64_any_dtype(df[c])
    ]
    if numeric_cols:
        col = numeric_cols[0]
        plt.figure(figsize=(7, 4))
        bins = min(20, max(5, int(math.sqrt(max(1, df[col].dropna().shape[0])))))
        df[col].dropna().hist(bins=bins)
        plt.title(f'Histogram of {col}', fontproperties=font_prop)
        plt.xlabel(str(col), fontproperties=font_prop)
        plt.ylabel('Count', fontproperties=font_prop)
        apply_font(plt.gca(), font_prop)
        path = out_dir / f'hist_{safe_name(str(col))}.png'
        plt.tight_layout()
        plt.savefig(path, dpi=160)
        plt.close()
        plots.append(str(path))
    if cat_cols and numeric_cols:
        cat, num = cat_cols[0], numeric_cols[0]
        grp = df.groupby(cat, dropna=False)[num].sum().sort_values(ascending=False).head(12)
        if not grp.empty:
            plt.figure(figsize=(8, 4.5))
            grp.plot(kind='bar')
            plt.title(f'{num} by {cat}', fontproperties=font_prop)
            plt.xlabel(str(cat), fontproperties=font_prop)
            plt.ylabel(f'Sum of {num}', fontproperties=font_prop)
            apply_font(plt.gca(), font_prop)
            plt.tight_layout()
            path = out_dir / f'bar_{safe_name(str(num))}_by_{safe_name(str(cat))}.png'
            plt.savefig(path, dpi=160)
            plt.close()
            plots.append(str(path))
    if date_cols and numeric_cols:
        date_col, num = date_cols[0], numeric_cols[0]
        grp = df[[date_col, num]].dropna().sort_values(date_col)
        if not grp.empty:
            plt.figure(figsize=(8, 4.5))
            plt.plot(grp[date_col], grp[num], marker='o')
            plt.title(f'{num} over time', fontproperties=font_prop)
            plt.xlabel(str(date_col), fontproperties=font_prop)
            plt.ylabel(str(num), fontproperties=font_prop)
            apply_font(plt.gca(), font_prop)
            plt.tight_layout()
            path = out_dir / f'line_{safe_name(str(num))}_over_time.png'
            plt.savefig(path, dpi=160)
            plt.close()
            plots.append(str(path))
    return plots


def main() -> int:
    parser = argparse.ArgumentParser(description='Automatic data analysis report generator')
    parser.add_argument('input', help='Input data file (csv/json/xlsx)')
    parser.add_argument('--artifact-dir', required=True, help='Output artifact directory')
    args = parser.parse_args()
    input_path = Path(args.input).expanduser().resolve()
    artifact_dir = Path(args.artifact_dir).expanduser().resolve()
    artifact_dir.mkdir(parents=True, exist_ok=True)
    df = load_df(input_path)
    original_columns = [str(c) for c in df.columns]
    df, parsed_dates = maybe_parse_dates(df)
    chosen_font, chosen_font_prop = configure_matplotlib_fonts()
    preview_path = artifact_dir / 'preview.csv'
    df.head(50).to_csv(preview_path, index=False)
    summary = {
        'source': str(input_path),
        'rows': int(df.shape[0]),
        'columns': int(df.shape[1]),
        'columnNames': original_columns,
        'parsedDateColumns': parsed_dates,
        'columnProfiles': {str(c): series_stats(df[c]) for c in df.columns},
        'plots': [],
        'plotFont': chosen_font,
    }
    summary['plots'] = generate_plots(df, artifact_dir, chosen_font_prop)
    summary_path = artifact_dir / 'summary.json'
    summary_path.write_text(json.dumps(summary, ensure_ascii=False, indent=2), encoding='utf-8')
    report_path = write_report(df, summary, artifact_dir)
    result = {
        'ok': True,
        'input': str(input_path),
        'artifactDir': str(artifact_dir),
        'summary': str(summary_path),
        'report': str(report_path),
        'preview': str(preview_path),
        'plots': summary['plots'],
    }
    print(json.dumps(result, ensure_ascii=False, indent=2))
    return 0


if __name__ == '__main__':
    raise SystemExit(main())


@@ -0,0 +1,241 @@
#!/usr/bin/env python3
import argparse
import json
import os
import pathlib
import shutil
import subprocess
import sys
import tempfile
import time
from typing import Optional

WORKSPACE = pathlib.Path('/home/selig/.openclaw/workspace').resolve()
RUNS_DIR = WORKSPACE / '.tmp' / 'code-interpreter-runs'
MAX_PREVIEW = 12000
ARTIFACT_SCAN_LIMIT = 100
PACKAGE_PROBES = ['pandas', 'numpy', 'matplotlib']
PYTHON_BIN = str(WORKSPACE / '.venv-code-interpreter' / 'bin' / 'python')


def current_python_paths(run_dir_path: pathlib.Path) -> str:
    """Build PYTHONPATH: run_dir (for ci_helpers) only.

    Venv site-packages are already on sys.path when using PYTHON_BIN.
    """
    return str(run_dir_path)


def read_code(args: argparse.Namespace) -> str:
    sources = [bool(args.code), bool(args.file), bool(args.stdin)]
    if sum(sources) != 1:
        raise SystemExit('Provide exactly one of --code, --file, or --stdin')
    if args.code:
        return args.code
    if args.file:
        return pathlib.Path(args.file).read_text(encoding='utf-8')
    return sys.stdin.read()


def ensure_within_workspace(path_str: Optional[str], must_exist: bool = True) -> pathlib.Path:
    if not path_str:
        return WORKSPACE
    p = pathlib.Path(path_str).expanduser().resolve()
    if p != WORKSPACE and WORKSPACE not in p.parents:
        raise SystemExit(f'Path must stay inside workspace: {WORKSPACE}')
    if must_exist and (not p.exists() or not p.is_dir()):
        raise SystemExit(f'Path not found or not a directory: {p}')
    return p


def ensure_output_path(path_str: Optional[str]) -> Optional[pathlib.Path]:
    if not path_str:
        return None
    p = pathlib.Path(path_str).expanduser().resolve()
    p.parent.mkdir(parents=True, exist_ok=True)
    return p


def write_text(path_str: Optional[str], text: str) -> None:
    p = ensure_output_path(path_str)
    if not p:
        return
    p.write_text(text, encoding='utf-8')


def truncate(text: str) -> str:
    if len(text) <= MAX_PREVIEW:
        return text
    extra = len(text) - MAX_PREVIEW
    return text[:MAX_PREVIEW] + f'\n...[truncated {extra} chars]'


def package_status() -> dict:
    out: dict[str, bool] = {}
    for name in PACKAGE_PROBES:
        proc = subprocess.run(
            [PYTHON_BIN, '-c', f"import importlib.util; print('1' if importlib.util.find_spec('{name}') else '0')"],
            capture_output=True,
            text=True,
            encoding='utf-8',
            errors='replace',
        )
        out[name] = proc.stdout.strip() == '1'
    return out


def rel_to(path: pathlib.Path, base: pathlib.Path) -> str:
    try:
        return str(path.relative_to(base))
    except Exception:
        return str(path)


def scan_artifacts(base_dir: pathlib.Path, root_label: str) -> list[dict]:
    if not base_dir.exists():
        return []
    items: list[dict] = []
    for p in sorted(base_dir.rglob('*')):
        if len(items) >= ARTIFACT_SCAN_LIMIT:
            break
        if p.is_file():
            try:
                size = p.stat().st_size
            except Exception:
                size = None
            items.append({
                'root': root_label,
                'path': str(p),
                'relative': rel_to(p, base_dir),
                'bytes': size,
            })
    return items


def write_helper(run_dir_path: pathlib.Path, artifact_dir: pathlib.Path) -> None:
    helper = run_dir_path / 'ci_helpers.py'
    helper.write_text(
        """
from pathlib import Path
import json
import os

WORKSPACE = Path(os.environ['OPENCLAW_WORKSPACE'])
RUN_DIR = Path(os.environ['CODE_INTERPRETER_RUN_DIR'])
ARTIFACT_DIR = Path(os.environ['CODE_INTERPRETER_ARTIFACT_DIR'])


def save_text(name: str, text: str) -> str:
    path = ARTIFACT_DIR / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding='utf-8')
    return str(path)


def save_json(name: str, data) -> str:
    path = ARTIFACT_DIR / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding='utf-8')
    return str(path)
""".lstrip(),
        encoding='utf-8',
    )


def main() -> int:
    parser = argparse.ArgumentParser(description='Local Python runner for OpenClaw code-interpreter skill')
    parser.add_argument('--code', help='Python code to execute')
    parser.add_argument('--file', help='Path to a Python file to execute')
    parser.add_argument('--stdin', action='store_true', help='Read Python code from stdin')
    parser.add_argument('--cwd', help='Working directory inside workspace')
    parser.add_argument('--artifact-dir', help='Artifact directory inside workspace to keep outputs')
    parser.add_argument('--timeout', type=int, default=20, help='Timeout seconds (default: 20)')
    parser.add_argument('--stdout-file', help='Optional file path to save full stdout')
    parser.add_argument('--stderr-file', help='Optional file path to save full stderr')
    parser.add_argument('--keep-run-dir', action='store_true', help='Keep generated temp run directory even on success')
    args = parser.parse_args()
    code = read_code(args)
    cwd = ensure_within_workspace(args.cwd)
    RUNS_DIR.mkdir(parents=True, exist_ok=True)
    run_dir_path = pathlib.Path(tempfile.mkdtemp(prefix='run-', dir=str(RUNS_DIR))).resolve()
    default_artifact_dir = run_dir_path / 'artifacts'
    if args.artifact_dir:
        artifact_dir = ensure_within_workspace(args.artifact_dir, must_exist=False)
    else:
        artifact_dir = default_artifact_dir
    artifact_dir.mkdir(parents=True, exist_ok=True)
    script_path = run_dir_path / 'main.py'
    script_path.write_text(code, encoding='utf-8')
    write_helper(run_dir_path, artifact_dir)
    env = {
        'PATH': os.environ.get('PATH', '/usr/bin:/bin'),
        'HOME': str(run_dir_path),
        'PYTHONPATH': current_python_paths(run_dir_path),
        'PYTHONIOENCODING': 'utf-8',
        'PYTHONUNBUFFERED': '1',
        'OPENCLAW_WORKSPACE': str(WORKSPACE),
        'CODE_INTERPRETER_RUN_DIR': str(run_dir_path),
        'CODE_INTERPRETER_ARTIFACT_DIR': str(artifact_dir),
        'MPLBACKEND': 'Agg',
    }
    started = time.time()
    timed_out = False
    exit_code = None
    stdout = ''
    stderr = ''
    try:
        proc = subprocess.run(
            [PYTHON_BIN, '-B', str(script_path)],
            cwd=str(cwd),
            env=env,
            capture_output=True,
            text=True,
            encoding='utf-8',
            errors='replace',
            timeout=max(1, args.timeout),
        )
        exit_code = proc.returncode
        stdout = proc.stdout
        stderr = proc.stderr
    except subprocess.TimeoutExpired as exc:
        timed_out = True
        exit_code = 124
        raw_out = exc.stdout or ''
        raw_err = exc.stderr or ''
        stdout = raw_out if isinstance(raw_out, str) else raw_out.decode('utf-8', errors='replace')
        stderr = raw_err if isinstance(raw_err, str) else raw_err.decode('utf-8', errors='replace')
        stderr += f'\nExecution timed out after {args.timeout}s.'
    duration = round(time.time() - started, 3)
    write_text(args.stdout_file, stdout)
    write_text(args.stderr_file, stderr)
    artifacts = scan_artifacts(artifact_dir, 'artifactDir')
    # Only scan the default run artifacts dir when it is a different directory,
    # otherwise the same files would be listed twice.
    if artifact_dir != default_artifact_dir:
        artifacts.extend(scan_artifacts(default_artifact_dir, 'runArtifacts'))
    result = {
        'ok': (exit_code == 0 and not timed_out),
        'exitCode': exit_code,
        'timeout': timed_out,
        'durationSec': duration,
        'cwd': str(cwd),
        'runDir': str(run_dir_path),
        'artifactDir': str(artifact_dir),
        'packageStatus': package_status(),
        'artifacts': artifacts,
        'stdout': truncate(stdout),
        'stderr': truncate(stderr),
    }
    print(json.dumps(result, ensure_ascii=False, indent=2))
    # Clean up the temp run dir only when the artifacts live outside it;
    # removing it otherwise would delete the files just reported.
    artifacts_inside_run_dir = artifact_dir == run_dir_path or run_dir_path in artifact_dir.parents
    if not args.keep_run_dir and result['ok'] and not artifacts_inside_run_dir:
        shutil.rmtree(run_dir_path, ignore_errors=True)
    return 0 if result['ok'] else 1


if __name__ == '__main__':
    raise SystemExit(main())