
ExpertPack Eval

Measure ExpertPack EK (Esoteric Knowledge) ratio and run automated quality evals. Use when: (1) Measuring what percentage of a pack's content frontier LLMs c...
brianhearn
Uncategorized clawhub v1.1.0 1 version 100000 Key: required
Stars: 1 · Downloads: 644 · Installs: 0 · Versions: 1 · Tag: #latest

Overview

ExpertPack Eval

Measure and evaluate ExpertPack quality. Companion to the core expertpack skill.

Note: This skill makes external API calls to OpenRouter for blind probing and LLM-as-judge scoring. Requires an API key.

1. Measure EK Ratio

Blind-probe frontier models to measure what percentage of a pack's propositions they cannot answer without the pack loaded:

python3 {skill_dir}/scripts/eval-ek.py <pack-path> [--models model1,model2] [--sample N] [--output FILE]
  • Default models: GPT-4.1-mini, Claude Sonnet 4.6, Gemini 2.0 Flash (via OpenRouter)
  • API key: Auto-resolves from OpenClaw auth profiles or OPENROUTER_API_KEY env var
  • Judge model: Claude Sonnet (GPT-4.1-mini is unreliable as judge — defaults to "partial")
  • Output: YAML with per-proposition scores and aggregate ratio
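As a rough illustration of how per-proposition judge scores could roll up into an aggregate ratio, here is a minimal sketch. It is not the logic of eval-ek.py: the verdict labels "correct" and "incorrect", the 0.5 weight for "partial", and the function name `ek_ratio` are all assumptions made for this example (only "partial" appears in the text above).

```python
# Hypothetical weights: how "esoteric" a proposition looks given one judge verdict.
# Only the "partial" label is named in the skill text; the rest are assumed.
ESOTERIC_WEIGHT = {"correct": 0.0, "partial": 0.5, "incorrect": 1.0}


def ek_ratio(verdicts_by_proposition):
    """Return (aggregate_ratio, per_proposition_scores).

    verdicts_by_proposition maps a proposition id to the list of judge
    verdicts it received, one per blind-probed model.
    """
    if not verdicts_by_proposition:
        raise ValueError("no propositions were scored")

    per_proposition = {}
    for prop_id, verdicts in verdicts_by_proposition.items():
        if not verdicts:
            raise ValueError(f"proposition {prop_id!r} has no verdicts")
        # Average across models: 1.0 means no model could answer it blind.
        total = sum(ESOTERIC_WEIGHT[v] for v in verdicts)
        per_proposition[prop_id] = total / len(verdicts)

    # Aggregate ratio: mean of the per-proposition scores.
    aggregate = sum(per_proposition.values()) / len(per_proposition)
    return aggregate, per_proposition


sample = {
    "p1": ["incorrect", "incorrect", "incorrect"],  # fully esoteric
    "p2": ["correct", "partial", "incorrect"],      # mixed
    "p3": ["correct", "correct", "correct"],        # already in model weights
}
ratio, scores = ek_ratio(sample)
print(f"EK ratio: {ratio:.2f}")
```

Averaging per proposition first keeps a proposition probed by fewer models from being under-weighted; a real implementation may aggregate differently.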

Interpretation:

EK Ratio   | Meaning
-----------|---------------------------------------
0.80+      | Exceptional: almost entirely esoteric
0.60–0.79  | Strong: majority esoteric
0.40–0.59  | Mixed: significant GK padding
0.20–0.39  | Weak: most content already in weights
< 0.20     | Minimal value-add
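The interpretation bands can be applied programmatically, for example when gating a release on a minimum ratio. The helper below is a small sketch; the name `interpret_ek_ratio` is hypothetical and not part of the skill's scripts.

```python
def interpret_ek_ratio(ratio):
    """Map an EK ratio in [0, 1] to the band label from the interpretation table."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError(f"EK ratio must be between 0 and 1, got {ratio}")

    # Lower bound of each band, checked from highest to lowest.
    bands = [
        (0.80, "Exceptional"),
        (0.60, "Strong"),
        (0.40, "Mixed"),
        (0.20, "Weak"),
    ]
    for floor, label in bands:
        if ratio >= floor:
            return label
    return "Minimal value-add"


print(interpret_ek_ratio(0.72))
```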

Add measured ratio to manifest.yaml:

ek_ratio:
  value: 0.72
  measured: "2026-03-12"
  models: ["gpt-4.1-mini", "claude-sonnet-4-6", "gemini-2.0-flash"]
  propositions_tested: 142
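If the manifest block is generated rather than typed by hand, something like the following sketch could render it. The function name `render_ek_ratio_block` is an assumption for illustration; the field names mirror the example above. It relies on the fact that a JSON list of strings is also a valid YAML flow sequence, so no YAML library is needed.

```python
import json
from datetime import date


def render_ek_ratio_block(value, models, propositions_tested, measured=None):
    """Render an ek_ratio block as YAML text, ready to paste into manifest.yaml."""
    # Default the measurement date to today in ISO format (YYYY-MM-DD).
    measured = measured or date.today().isoformat()
    lines = [
        "ek_ratio:",
        f"  value: {value:.2f}",
        f'  measured: "{measured}"',
        # json.dumps yields a double-quoted list, which YAML also accepts.
        f"  models: {json.dumps(list(models))}",
        f"  propositions_tested: {propositions_tested}",
    ]
    return "\n".join(lines) + "\n"


print(render_ek_ratio_block(
    0.72,
    ["gpt-4.1-mini", "claude-sonnet-4-6", "gemini-2.0-flash"],
    142,
    measured="2026-03-12",
))
```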

2. Run Quality Eval

Automated eval against a pack-powered agent endpoint:

python3 {skill_dir}/scripts/run-eval.py \
  --questions <eval-set.yaml> \
  --endpoint <ws://host:port/path> \
  --output <results.yaml> \
  --label "baseline"
  • Build eval set: 30+ questions (basic, intermediate, advanced, out-of-scope)
  • Fix one dimension at a time: structure → agent training → model
  • Re-run after each change to verify improvement
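Before running an eval, it can help to sanity-check that the question set meets the guidance above (30+ questions spanning all four categories). The sketch below assumes each question is a dict with a "category" key; that schema, the category spellings, and the name `check_eval_set` are hypothetical, since the actual eval-set format is not shown here.

```python
from collections import Counter

# Category names taken from the guidance above; the exact spellings the
# eval-set file uses are an assumption.
REQUIRED_CATEGORIES = ("basic", "intermediate", "advanced", "out-of-scope")
MIN_QUESTIONS = 30


def check_eval_set(questions):
    """Return a list of human-readable problems; an empty list means the set looks fine."""
    problems = []
    if len(questions) < MIN_QUESTIONS:
        problems.append(
            f"only {len(questions)} questions; expected at least {MIN_QUESTIONS}"
        )

    counts = Counter(q.get("category") for q in questions)
    for category in REQUIRED_CATEGORIES:
        if counts[category] == 0:
            problems.append(f"no questions in category '{category}'")

    # Flag categories outside the expected four (including a missing category).
    unknown = sorted({str(c) for c in counts if c not in REQUIRED_CATEGORIES})
    if unknown:
        problems.append(f"unexpected categories: {', '.join(unknown)}")
    return problems


# Usage: a balanced set of 32 questions, 8 per category, passes cleanly.
balanced = [
    {"question": f"q{i}", "category": REQUIRED_CATEGORIES[i % 4]}
    for i in range(32)
]
print(check_eval_set(balanced))
```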

Learn more: expertpack.ai · GitHub

Version history

1 version in total

  • v1.1.0 (current)
    2026-05-01 21:28 · Safe · Safe

Security checks

Tencent Cloud Security (Keen): Safe, no risk

Tencent Cloud Security (Sanbu): Safe, no risk
