
semantic-model-router

Smart LLM Router — routes every query to the cheapest capable model. Supports 17 models across Anthropic, OpenAI, Google, DeepSeek & xAI (Grok). Uses a pre-t...
rayray1218
AI Intelligence · clawhub · v1.0.3 · 1 version · 100000 · Key: not required
★ 0 stars · 📥 688 downloads · 💾 30 installs · #latest

Overview

Semantic Model Router

Smart LLM router that saves up to 99% on inference costs by routing each request to the cheapest model that can handle it. Powered by a pre-trained ML classifier and semantic embeddings — no external calls, no API keys needed.

Install

openclaw plugins install @rayray1218/semantic-model-router

Quick Start

from scripts.model_router import ModelRouter

router = ModelRouter()
res = router.route("Design a distributed caching layer for a fintech platform.")
print(res["report"])
# [ClawRouter] anthropic/claude-sonnet-4-6 (ELITE, ml, conf=0.97)
#              Cost: $3.0/M | Baseline: $10.0/M | Saved: 70.0%

How Routing Works

Queries are classified into three tiers through a 3-stage pipeline:

  1. ML Classifier (primary): A Logistic Regression model trained on 6,000+ labeled queries. Runs in <1ms from embedded weights in model_weights.py.
  2. Semantic Embeddings (fallback): Cosine similarity to tier intent vectors via sentence-transformers.
  3. Keyword Rules (last resort): Pattern matching with no dependencies.

| Tier | Default Model | Typical Workload | Cost/1M | vs Baseline |
|---|---|---|---|---|
| BASIC | deepseek/deepseek-chat | Greetings, simple Q&A, chit-chat | $0.14 | 99% saved |
| BALANCED | openai/gpt-4o-mini | Summaries, translations, explanations | $0.15 | 99% saved |
| ELITE | anthropic/claude-sonnet-4-6 | Complex coding, architecture, security | $3.00 | 70% saved |
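The fallback order described above can be pictured as a chain of stages, each of which either returns a tier with a confidence or defers to the next one. The following is a minimal sketch under stated assumptions, not the plugin's code: the stage functions are stand-ins, the `min_conf` threshold of 0.6 is invented, the keyword lists are illustrative, and the rule for when a stage defers is a guess.

```python
from typing import Callable, List, Optional, Tuple

Prediction = Optional[Tuple[str, float]]  # (tier, confidence), or None to defer

def ml_stage(query: str) -> Prediction:
    # Stand-in for the logistic regression classifier. It reports a
    # low-confidence guess here so the fallback path is exercised.
    return ("BALANCED", 0.40)

def embedding_stage(query: str) -> Prediction:
    # Stand-in for cosine similarity against tier intent vectors. Returning
    # None models the case where sentence-transformers is not installed.
    return None

def keyword_stage(query: str) -> Prediction:
    # Illustrative keyword lists, not the plugin's real rules.
    q = query.lower()
    if any(k in q for k in ("implement", "architecture", "encryption")):
        return ("ELITE", 1.0)
    if any(k in q for k in ("summarize", "translate", "explain")):
        return ("BALANCED", 1.0)
    return ("BASIC", 1.0)

STAGES: List[Tuple[str, Callable[[str], Prediction]]] = [
    ("ml", ml_stage),
    ("embedding", embedding_stage),
    ("keyword", keyword_stage),
]

def classify(query: str, min_conf: float = 0.6) -> Tuple[str, str]:
    """Return (tier, stage_name) from the first stage that is confident enough."""
    for name, stage in STAGES:
        result = stage(query)
        if result is not None and result[1] >= min_conf:
            return result[0], name
    return "BASIC", "default"  # assumed default when every stage defers

for q in ("How are you doing today?", "Implement AES encryption from scratch"):
    print(q, "->", classify(q))
```

Returning the stage name alongside the tier mirrors the `ml` field in the Quick Start report, which makes it easy to see which stage made the decision.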

Supported Models (17 total, verified Feb 2026)

Anthropic

| Model | Input /1M | Output /1M | Notes |
|---|---|---|---|
| anthropic/claude-sonnet-4-6 | $3.00 | $15.00 | ★ ELITE default |
| anthropic/claude-opus-4-5 | $5.00 | $25.00 | |
| anthropic/claude-haiku-4-5 | $0.80 | $4.00 | |

OpenAI

| Model | Input /1M | Output /1M | Notes |
|---|---|---|---|
| openai/gpt-5 | $1.25 | $10.00 | |
| openai/gpt-4o | $2.50 | $10.00 | |
| openai/gpt-4o-mini | $0.15 | $0.60 | ★ BALANCED default |
| openai/o3 | $2.00 | $8.00 | |
| openai/o4-mini | $1.10 | $4.40 | |

Google

| Model | Input /1M | Output /1M |
|---|---|---|
| google/gemini-3.0-pro | $1.25 | $10.00 |
| google/gemini-2.5-pro | $1.25 | $10.00 |
| google/gemini-2.5-flash | $0.30 | $2.50 |
| google/gemini-2.5-flash-lite | $0.10 | $0.40 |

DeepSeek

| Model | Input /1M | Output /1M | Notes |
|---|---|---|---|
| deepseek/deepseek-chat (V3.2) | $0.28 | $0.42 | ★ BASIC default |
| deepseek/deepseek-reasoner (V3.2) | $0.28 | $0.42 | |

xAI (Grok)

| Model | Input /1M | Output /1M |
|---|---|---|
| xai/grok-3 | $3.00 | $15.00 |
| xai/grok-3-mini | $0.30 | $0.50 |

> Pricing source: Official API docs of each provider, verified Feb 2026.
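Because input and output tokens are priced separately, the cost of a single request depends on both counts. A rough sketch of that arithmetic follows, using prices copied from the tables above; the `request_cost` helper and the token counts are hypothetical and not part of the plugin.

```python
# Prices in USD per 1M tokens, copied from the tables above: (input, output).
PRICES = {
    "anthropic/claude-sonnet-4-6": (3.00, 15.00),
    "openai/gpt-4o-mini": (0.15, 0.60),
    "google/gemini-2.5-flash-lite": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Made-up request size: 2,000 input tokens and 500 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.6f}")
```

For output-heavy workloads the output price dominates, so comparing models on input price alone can understate the gap between tiers.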

Override Models at Runtime

# Use GPT-5.2 for ELITE, Gemini 2.5 Flash for BALANCED, Gemini 2.5 Flash Lite for BASIC
router = ModelRouter(
    elite_model="openai/gpt-5.2",
    balanced_model="google/gemini-2.5-flash",
    basic_model="google/gemini-2.5-flash-lite",
)
# Swap a tier's model without recreating the router
router.set_model("ELITE", "anthropic/claude-opus-4-5")

List All Available Models (CLI)

python3 scripts/model_router.py --list-models

CLI Usage

# Route a single query
python3 scripts/model_router.py "Implement AES encryption from scratch"

# Override ELITE model
python3 scripts/model_router.py --elite openai/gpt-5.2 "Write a compiler"

# Run full smoke-test
python3 scripts/model_router.py

Dynamic Keyword Expansion

router.add_keywords("ELITE", ["cryptographic proof", "zero-knowledge"])
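How `add_keywords` stores and matches its entries is not documented here. One plausible shape is a per-tier keyword set checked with a case-insensitive substring match, sketched below. The `KeywordRules` class, its `match` method, and the tier check order are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Set

class KeywordRules:
    """Hypothetical per-tier keyword store; not the plugin's implementation."""

    # Check the most capable tier first so a strong signal wins (assumed order).
    ORDER = ("ELITE", "BALANCED", "BASIC")

    def __init__(self) -> None:
        self._keywords: Dict[str, Set[str]] = {tier: set() for tier in self.ORDER}

    def add_keywords(self, tier: str, keywords: Iterable[str]) -> None:
        """Register extra keywords for a tier, normalized to lowercase."""
        if tier not in self._keywords:
            raise ValueError(f"unknown tier: {tier}")
        self._keywords[tier].update(k.lower() for k in keywords)

    def match(self, query: str) -> Optional[str]:
        """Return the first tier with a keyword in the query, else None."""
        q = query.lower()
        for tier in self.ORDER:
            if any(k in q for k in self._keywords[tier]):
                return tier
        return None

rules = KeywordRules()
rules.add_keywords("ELITE", ["cryptographic proof", "zero-knowledge"])
print(rules.match("Explain a Zero-Knowledge rollup"))
print(rules.match("Hello there"))
```

Substring matching is cheap and dependency-free, at the price of false positives on short keywords, so multi-word phrases like the ones above tend to be safer additions.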

Example Output

Query                                              Predicted  Expected   ✓  Cost Info
────────────────────────────────────────────────────────────────────────────────────
How are you doing today?                           BASIC      BASIC      ✓  $0.14/M  saved 98.6%
Summarize this article in three bullet points.     BALANCED   BALANCED   ✓  $0.15/M  saved 98.5%
Implement a thread-safe LRU cache in Python.       ELITE      ELITE      ✓  $3.0/M   saved 70.0%
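The "saved" percentages follow from comparing each tier's cost with the $10.0/M baseline shown in the Quick Start report. A small sketch of that arithmetic (the `percent_saved` helper is a hypothetical name, not part of the plugin):

```python
def percent_saved(cost_per_m: float, baseline_per_m: float = 10.0) -> float:
    """Percentage saved relative to the baseline price, rounded to one decimal."""
    return round((1 - cost_per_m / baseline_per_m) * 100, 1)

# Tier costs taken from the example output above.
for cost in (0.14, 0.15, 3.0):
    print(f"${cost}/M -> saved {percent_saved(cost)}%")
```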

Security & Privacy

  • Zero external calls: All classification runs locally.
  • No API keys: The router itself needs none.
  • Transparent weights: All model parameters live in scripts/model_weights.py — fully auditable.
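To make the auditability point concrete: a multinomial logistic regression stored as plain Python lists can be evaluated with the standard library alone. The sketch below uses invented toy weights, a toy three-feature input, and hypothetical names; it does not reproduce the contents of scripts/model_weights.py or the plugin's feature extraction.

```python
import math
from typing import List, Tuple

TIERS = ["BASIC", "BALANCED", "ELITE"]

# Toy parameters (assumed): one weight row per tier, three input features.
WEIGHTS = [
    [1.5, -0.5, -1.0],
    [-0.2, 1.2, -0.3],
    [-1.0, -0.4, 2.0],
]
BIAS = [0.1, 0.0, -0.1]

def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores to probabilities, subtracting the max for stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(features: List[float]) -> Tuple[str, float]:
    """Return (tier, probability) for the highest-scoring tier."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(WEIGHTS, BIAS)
    ]
    probs = softmax(scores)
    best = max(range(len(TIERS)), key=probs.__getitem__)
    return TIERS[best], probs[best]

tier, conf = predict([0.0, 0.0, 1.0])
print(f"{tier} (conf={conf:.2f})")
```

Since inference is only a dot product per tier followed by a softmax, anyone reading the weights file can recompute a prediction by hand.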

Save costs, route smarter. Built for the OpenClaw community.

Version History

1 version in total

  • v1.0.3 (current)
    2026-03-29 21:09 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
