
Model Router

Intelligent cost-aware model routing that classifies task complexity and selects the optimal AI model. Automatically routes simple tasks to cheap models and...
jsgoecke
AI Intelligence · clawhub v0.1.0 · 1 version · 99781.9 · Key: not required
★ 0 stars · 📥 915 downloads · 💾 4 installs · #latest

Overview

Smart Model Router

Intelligent cost-aware model routing for OpenClaw agents.

Before executing any task via sessions_spawn or delegating it to a sub-agent, classify its complexity using the rules below and route it to the optimal model. Using cheap models for simple work and reserving premium models for tasks that genuinely need them can cut LLM costs by an estimated 60-90%.

Core Principle

Route every request to the cheapest model that can handle it well.

Step 1: Classify Task Complexity

Score the task against the indicator groups below, counting how many COMPLEX and REASONING indicators are present:

SIMPLE indicators (route to Tier 1)

  • Greetings, small talk, status checks, heartbeats
  • Single factual questions ("What is X?", "Define Y")
  • Simple translations, format conversions
  • File lookups, directory listings, basic shell commands
  • Calendar checks, weather queries
  • Tasks under 50 tokens with no technical depth
  • Keywords: "what is", "define", "translate", "list", "check", "hello", "status"

MODERATE indicators (route to Tier 2)

  • Summarization of documents or conversations
  • Single-file code edits, bug fixes, simple refactors
  • Writing emails, messages, short-form content
  • Data extraction, parsing, formatting
  • Explaining concepts, answering "how to" questions
  • Research requiring synthesis of a few sources
  • Keywords: "summarize", "explain", "write", "fix this", "how to", "extract"

COMPLEX indicators (route to Tier 3)

  • Multi-file code generation or refactoring
  • Architecture design, system design
  • Creative writing (stories, long-form, nuanced tone)
  • Debugging complex issues across multiple systems
  • Analysis requiring multiple perspectives
  • Tasks with constraints ("optimize for X while maintaining Y")
  • Keywords: "build", "design", "architect", "refactor", "create", "implement", "analyze"

REASONING indicators (route to Tier 4)

  • Mathematical proofs, formal logic
  • Multi-step reasoning chains ("first X, then Y, therefore Z")
  • Security vulnerability analysis
  • Performance optimization with tradeoffs
  • Scientific analysis, hypothesis testing
  • Any task with 2+ of: "prove", "derive", "why does", "compare and contrast", "evaluate tradeoffs", "step by step"
  • Keywords: "prove", "derive", "reason", "why does", "evaluate", "theorem"
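The keyword lists above can be turned into a first-pass classifier. The sketch below is hypothetical (the skill itself is a set of instructions, not code): it matches each group's keywords as whole words, checks the most demanding group first so the highest matching tier wins, and falls back to Tier 2 when nothing matches, per the "when uncertain" rule under Special Rules. Whole-word matching and the function names are assumptions, and the sketch ignores the non-keyword indicators such as task length or number of files.

```python
import re

# Keyword groups copied from the indicator lists above, keyed by tier.
TIER_KEYWORDS = {
    4: ["prove", "derive", "reason", "why does", "evaluate", "theorem"],
    3: ["build", "design", "architect", "refactor", "create", "implement", "analyze"],
    2: ["summarize", "explain", "write", "fix this", "how to", "extract"],
    1: ["what is", "define", "translate", "list", "check", "hello", "status"],
}

def _has_keyword(text: str, keyword: str) -> bool:
    """Whole-word match (an assumption), so "list" does not fire on "listen"."""
    return re.search(r"\b" + re.escape(keyword) + r"\b", text) is not None

def classify_by_keywords(task: str) -> int:
    """Return the highest tier whose keywords appear in the task; default to Tier 2."""
    text = task.lower()
    # Check the most demanding group first so the highest matching tier wins.
    for tier in (4, 3, 2, 1):
        if any(_has_keyword(text, kw) for kw in TIER_KEYWORDS[tier]):
            return tier
    return 2  # "When uncertain, default to Tier 2"
```

For example, `classify_by_keywords("Build a React auth component with tests")` picks Tier 3 because "build" is a COMPLEX keyword and no REASONING keyword is present.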

Special Rules

  • 2+ reasoning keywords → always Tier 4 (high confidence)
  • Code blocks or multi-file references → minimum Tier 2
  • "Debug" + stack traces → Tier 3
  • Heartbeats and /status → always Tier 1
  • When uncertain, default to Tier 2 (fast, cheap, good enough)
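The Special Rules act as overrides on top of a keyword-based tier. A minimal sketch, assuming the tier from a keyword scan is passed in as `base_tier`; the stack-trace markers and the file-reference pattern are hypothetical heuristics, since the rules do not say how to detect either. Where a rule names a single tier ("Debug" + stack traces → Tier 3), the sketch treats it as a minimum so it never lowers a higher tier; that is an interpretation, not something the rules state.

```python
import re

# The "2+ of" reasoning terms listed under REASONING indicators.
TWO_PLUS_TERMS = ["prove", "derive", "why does", "compare and contrast",
                  "evaluate tradeoffs", "step by step"]
# Hypothetical stack-trace markers; the rules do not define how to spot a trace.
STACK_TRACE_MARKERS = ["traceback (most recent call last)", "exception in thread", "panicked at"]
# Hypothetical file-reference pattern covering a few common source extensions.
FILE_REF = re.compile(r"\b[\w./-]+\.(?:py|js|ts|go|rs|java|cpp|c|h)\b")

def apply_special_rules(task: str, base_tier: int) -> int:
    """Adjust a keyword-based tier using the Special Rules (deliberately crude)."""
    text = task.lower()
    if text.strip() == "/status" or "heartbeat" in text:
        return 1  # heartbeats and /status are always Tier 1
    if sum(term in text for term in TWO_PLUS_TERMS) >= 2:
        return 4  # 2+ reasoning keywords: Tier 4 with high confidence
    tier = base_tier
    if "debug" in text and any(m in text for m in STACK_TRACE_MARKERS):
        tier = max(tier, 3)  # debug + stack trace: at least Tier 3
    if "```" in task or len(set(FILE_REF.findall(task))) >= 2:
        tier = max(tier, 2)  # code blocks or multi-file references: minimum Tier 2
    return tier
```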

Step 2: Select Model from Tier

Tier 0 — FREE (OpenRouter free tier)

Model | Cost | Best For
--- | --- | ---
Gemini 2.5 Flash (free) | $0.00 | High-volume simple tasks, translation
Gemini 2.5 Flash-Lite (free) | $0.00 | Translation, marketing
Gemini 3 Flash Preview (free) | $0.00 | Technology, health, science
DeepSeek V3.2 (free) | $0.00 | Roleplay, creative writing
Moonshot Kimi K2.5 (free) | $0.00 | Technology, programming
Arcee Trinity Large Preview (free) | $0.00 | Creative writing, storytelling, agents

Default Tier 0 model: openrouter/free (auto-selects from available free models)

Access these models via OpenRouter using model IDs such as google/gemini-2.5-flash, deepseek/deepseek-v3.2-20251201, and moonshotai/kimi-k2.5-0127, or use openrouter/free to auto-route across all free models.

Note: Free models have rate limits and may have variable availability. Use for non-critical tasks only.

Tier 1 — SIMPLE (near-zero cost)

Model | Input $/MTok | Output $/MTok | Best For
--- | --- | --- | ---
Gemini 2.0 Flash | $0.10 | $0.40 | Default simple tier: fast, multimodal, 1M context
GPT-4o-mini | $0.15 | $0.60 | Simple tasks, multimodal
GPT-5 Nano | $0.05 | $0.40 | Cheapest OpenAI option
DeepSeek V3 | $0.27 | $1.10 | Budget general-purpose
Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Most economical Google model

Default Tier 1 model: gemini-2.0-flash (best cost/reliability balance)

Tier 2 — MODERATE (balanced)

Model | Input $/MTok | Output $/MTok | Best For
--- | --- | --- | ---
Claude Haiku 4.5 | $1.00 | $5.00 | Near-frontier, fast, great coding
GPT-4o | $2.50 | $10.00 | Multimodal, tool use, solid all-rounder
Gemini 2.5 Flash | $0.15 | $0.60 | Thinking-enabled, fast reasoning
GPT-5 Mini | $0.25 | $2.00 | Balanced performance, 400K context
Mistral Medium 3 | $0.40 | $2.00 | European languages, balanced

Default Tier 2 model: claude-haiku-4-5 (best quality-to-price at this tier)

Tier 3 — COMPLEX (premium)

Model | Input $/MTok | Output $/MTok | Best For
--- | --- | --- | ---
Claude Sonnet 4.5 | $3.00 | $15.00 | Best coding-to-cost ratio, most popular
GPT-5 | $1.25 | $10.00 | Flagship coding and agentic tasks
GPT-5.3 Codex | $1.75* | $14.00* | Most capable agentic coding model
Gemini 2.5 Pro | $1.25 | $10.00 | Coding, reasoning, 1M context
Claude Opus 4.5 | $5.00 | $25.00 | Maximum intelligence, agentic tasks
Grok 4 | $3.00 | $15.00 | Frontier reasoning, real-time data

*GPT-5.3 Codex API pricing not yet officially released; estimated from GPT-5.2 Codex rates.

Default Tier 3 model: claude-sonnet-4-5 (best balance of quality, coding, and cost)

Tier 4 — REASONING (maximum capability)

Model | Input $/MTok | Output $/MTok | Best For
--- | --- | --- | ---
Claude Opus 4.6 | $5.00 | $25.00 | Latest frontier reasoning, extended thinking, 1M context (beta)
Claude Opus 4.5 | $5.00 | $25.00 | Extended thinking, frontier reasoning
o3 | $2.00 | $8.00 | Deep STEM reasoning
DeepSeek R1 | $0.55 | $2.19 | Budget reasoning (20-50x cheaper than o1)
o4-mini | $1.10 | $4.40 | Efficient reasoning

Default Tier 4 model: claude-opus-4-6 with extended thinking enabled
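To make the per-request effect of the tier choice concrete, the sketch below holds each tier's default model with its list price from the tables above and estimates the cost of one call. `request_cost` is a hypothetical helper and the 2,000 input / 500 output token figures are illustrative assumptions; real bills also depend on caching, batching, and any extended-thinking tokens the model generates.

```python
# Balanced-mode default per tier: (model, input $/MTok, output $/MTok),
# copied from the tier tables above (February 2026 prices).
TIER_DEFAULTS = {
    0: ("openrouter/free", 0.00, 0.00),
    1: ("gemini-2.0-flash", 0.10, 0.40),
    2: ("claude-haiku-4-5", 1.00, 5.00),
    3: ("claude-sonnet-4-5", 3.00, 15.00),
    4: ("claude-opus-4-6", 5.00, 25.00),
}

def request_cost(tier: int, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request on the tier's default model."""
    _model, in_price, out_price = TIER_DEFAULTS[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

if __name__ == "__main__":
    # Illustrative request size: 2,000 input tokens, 500 output tokens.
    for tier in sorted(TIER_DEFAULTS):
        model = TIER_DEFAULTS[tier][0]
        print(f"Tier {tier} ({model}): ${request_cost(tier, 2_000, 500):.4f}")
```

At that request size, Tier 1 works out to $0.0004 and Tier 4 to $0.0225, roughly a 56x difference per call.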

Step 3: Apply Optimization Mode

🟢 Balanced Mode (DEFAULT)

Use the default model for each tier as listed above. Escalate to the next tier if the model fails or produces low-quality output.

🔵 Aggressive Mode (Maximum Savings)

Override the tier defaults with the cheapest options:

  • Tier 0-1: openrouter/free ($0.00) for simple tasks, fall back to gemini-2.0-flash ($0.10/$0.40)
  • Tier 2: gemini-2.5-flash ($0.15/$0.60)
  • Tier 3: gemini-2.5-pro ($1.25/$10.00)
  • Tier 4: deepseek-r1 ($0.55/$2.19)

Savings: 70-99% vs always using Opus

🟡 Quality Mode (Maximum Quality)

Override the tier defaults with best-in-class models:

  • Tier 1: claude-haiku-4-5 ($1.00/$5.00)
  • Tier 2: claude-sonnet-4-5 ($3.00/$15.00)
  • Tier 3: claude-opus-4-6 ($5.00/$25.00) or gpt-5.3-codex for coding
  • Tier 4: claude-opus-4-6 ($5.00/$25.00) with extended thinking
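The three modes amount to three tier-to-model tables. A minimal sketch, assuming the modes are selected by the plain strings "balanced", "aggressive", and "quality", plus a hypothetical `free_available` flag for when OpenRouter's free models are rate-limited. Quality mode names no Tier 0 model, so the sketch falls back to that mode's Tier 1 choice; that fallback is an assumption.

```python
# Per-mode model for each tier, copied from the mode descriptions above.
# Balanced uses the Step 2 defaults; Quality lists no Tier 0 entry.
MODE_MODELS = {
    "balanced": {0: "openrouter/free", 1: "gemini-2.0-flash", 2: "claude-haiku-4-5",
                 3: "claude-sonnet-4-5", 4: "claude-opus-4-6"},
    "aggressive": {0: "openrouter/free", 1: "openrouter/free", 2: "gemini-2.5-flash",
                   3: "gemini-2.5-pro", 4: "deepseek-r1"},
    "quality": {1: "claude-haiku-4-5", 2: "claude-sonnet-4-5",
                3: "claude-opus-4-6", 4: "claude-opus-4-6"},
}
FREE_FALLBACK = "gemini-2.0-flash"  # used when the free tier is rate-limited or down

def select_model(tier: int, mode: str = "balanced", *, coding: bool = False,
                 free_available: bool = True) -> str:
    """Pick the model for a tier under the given optimization mode."""
    models = MODE_MODELS[mode]
    if mode == "quality" and tier == 3 and coding:
        return "gpt-5.3-codex"  # Quality mode's coding alternative at Tier 3
    model = models.get(tier, models[1])  # no entry for this tier: use the mode's Tier 1
    if model == "openrouter/free" and not free_available:
        return FREE_FALLBACK
    return model
```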

Step 4: Execute with sessions_spawn

# Simple task — Tier 1
sessions_spawn --task "What's on my calendar today?" --model gemini-2.0-flash

# Moderate task — Tier 2
sessions_spawn --task "Summarize this document" --model claude-haiku-4-5

# Complex task — Tier 3
sessions_spawn --task "Build a React auth component with tests" --model claude-sonnet-4-5

# Reasoning task — Tier 4
sessions_spawn --task "Prove this algorithm is O(n log n)" --model claude-opus-4-6
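When routing is driven from a script rather than by the agent itself, the commands above can be generated from the tier. This sketch only builds the command string and runs nothing; it assumes `sessions_spawn` is invocable in the command-line form shown above with exactly the `--task`, `--model`, and `--runTimeoutSeconds` flags this document uses. `spawn_command` is a hypothetical helper.

```python
import shlex
from typing import Optional

# Balanced-mode default per tier (Step 2).
TIER_DEFAULT = {1: "gemini-2.0-flash", 2: "claude-haiku-4-5",
                3: "claude-sonnet-4-5", 4: "claude-opus-4-6"}

def spawn_command(task: str, tier: int, timeout_s: Optional[int] = None) -> str:
    """Build a shell-quoted sessions_spawn line using the flags shown in this document."""
    argv = ["sessions_spawn", "--task", task, "--model", TIER_DEFAULT[tier]]
    if timeout_s is not None:
        argv += ["--runTimeoutSeconds", str(timeout_s)]
    # shlex.join (Python 3.8+) quotes each argument, so apostrophes in tasks are safe.
    return shlex.join(argv)

print(spawn_command("What's on my calendar today?", 1))
```

Quoting through `shlex.join` matters here because task text is user-supplied; a task containing an apostrophe would otherwise break the command line.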

Progressive Escalation Pattern

When uncertain about complexity, start cheap and escalate:

# 1. Try Tier 1 with timeout
sessions_spawn --task "Fix this bug" --model gemini-2.0-flash --runTimeoutSeconds 60

# 2. If output is poor or times out, escalate to Tier 2
sessions_spawn --task "Fix this bug" --model claude-haiku-4-5

# 3. If still failing, escalate to Tier 3
sessions_spawn --task "Fix this complex bug" --model claude-sonnet-4-5

Maximum escalation chain: 3 attempts. If Tier 3 fails, surface the error to the user rather than burning tokens.
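The escalation chain can be written as a bounded loop. In this sketch, `run` and `is_good` are hypothetical callables supplied by the caller: `run(task, model)` returns the model's output, or `None` if the attempt failed or timed out (for example, after `--runTimeoutSeconds` expired), and `is_good` decides whether the output is acceptable. After the third model fails, the loop raises instead of escalating further.

```python
from typing import Callable, Optional, Tuple

# Tier 1 -> Tier 2 -> Tier 3 defaults; at most three attempts per task.
ESCALATION_CHAIN = ["gemini-2.0-flash", "claude-haiku-4-5", "claude-sonnet-4-5"]

def run_with_escalation(
    task: str,
    run: Callable[[str, str], Optional[str]],
    is_good: Callable[[str], bool],
) -> Tuple[str, str]:
    """Return (model, output) from the first model whose output passes is_good."""
    for model in ESCALATION_CHAIN:
        output = run(task, model)  # None means the attempt failed or timed out
        if output is not None and is_good(output):
            return model, output
    # Tier 3 also failed: surface the error instead of burning more tokens.
    raise RuntimeError(f"{len(ESCALATION_CHAIN)} attempts failed for task: {task!r}")
```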

Parallel Processing for Batch Tasks

Route batch/parallel tasks to Tier 1 models to keep per-item costs low, and reserve a premium model for the final synthesis step:

# Batch summaries in parallel with cheap model
sessions_spawn --task "Summarize doc A" --model gemini-2.0-flash &
sessions_spawn --task "Summarize doc B" --model gemini-2.0-flash &
sessions_spawn --task "Summarize doc C" --model gemini-2.0-flash &
wait

# Then analyze results with premium model
sessions_spawn --task "Synthesize findings from all summaries" --model claude-sonnet-4-5
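The `&`/`wait` pattern above maps onto a thread pool. A sketch, assuming a hypothetical `run(task, model)` callable that makes one blocking model call and returns text: the per-document summaries fan out to the Tier 1 model, and a single Tier 3 call synthesizes them. `max_workers=4` is an arbitrary choice; keep it under the provider's rate limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

CHEAP_MODEL = "gemini-2.0-flash"     # Tier 1, used for every per-document summary
PREMIUM_MODEL = "claude-sonnet-4-5"  # Tier 3, used once for the synthesis

def summarize_then_synthesize(docs: dict[str, str],
                              run: Callable[[str, str], str],
                              max_workers: int = 4) -> str:
    """Summarize each document in parallel on the cheap model, then synthesize once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run, f"Summarize doc {name}:\n{text}", CHEAP_MODEL)
                   for name, text in docs.items()}
        # result() blocks until each summary is done, like `wait` in the shell version.
        summaries = {name: fut.result() for name, fut in futures.items()}
    combined = "\n\n".join(f"[{name}] {summary}" for name, summary in summaries.items())
    return run(f"Synthesize findings from all summaries:\n{combined}", PREMIUM_MODEL)
```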

Special Routing Rules

Scenario | Route To | Why
--- | --- | ---
Heartbeat / status check | Tier 0 (openrouter/free) or Tier 1 | No intelligence needed; save every cent
Vision / image analysis | gemini-2.5-pro | Best multimodal + huge context
Long context (>100K tokens) | gemini-2.5-pro or gpt-5 | Large context windows (1M for Gemini 2.5 Pro, 400K for GPT-5)
Chinese language tasks | deepseek-v3 or glm-4.7 | Optimized for Chinese
Real-time web data needed | grok-4.1-fast | Built-in X/web search, 2M context
Agentic coding tasks | gpt-5.3-codex or claude-sonnet-4-5 | Purpose-built for agentic code workflows
Code generation | claude-sonnet-4-5 minimum | Best code quality per dollar
Math / formal proofs | o3 or claude-opus-4-6 with thinking | Specialized reasoning
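Scenario routes override the tier's model when they apply. The sketch below detects only three scenarios cheaply (an attached image, very long input, and mostly Chinese text) and leaves the rest for the caller to flag. The scenario names, the rough 4-characters-per-token estimate, and the 30% CJK threshold are all assumptions, not part of the skill.

```python
import re
from typing import Optional

# Preferred models per scenario, first choice first, from the table above.
SCENARIO_ROUTES = {
    "heartbeat": ["openrouter/free", "gemini-2.0-flash"],
    "vision": ["gemini-2.5-pro"],
    "long_context": ["gemini-2.5-pro", "gpt-5"],
    "chinese": ["deepseek-v3", "glm-4.7"],
    "realtime_web": ["grok-4.1-fast"],
    "agentic_coding": ["gpt-5.3-codex", "claude-sonnet-4-5"],
    "code_generation": ["claude-sonnet-4-5"],
    "math_proof": ["o3", "claude-opus-4-6"],
}
CJK = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs block

def detect_scenario(task: str, has_image: bool = False) -> Optional[str]:
    """Detect a few scenarios with cheap heuristics (thresholds are assumptions)."""
    if has_image:
        return "vision"
    if len(task) / 4 > 100_000:  # ~4 characters per token, a rough estimate
        return "long_context"
    chars = [c for c in task if not c.isspace()]
    if chars and len(CJK.findall(task)) / len(chars) > 0.3:
        return "chinese"
    return None

def route_scenario(scenario: Optional[str], tier_model: str) -> str:
    """Use the scenario's first-choice model if one applies, else the tier's model."""
    candidates = SCENARIO_ROUTES.get(scenario) if scenario else None
    return candidates[0] if candidates else tier_model
```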

Cost Comparison (Typical Workload)

For a typical OpenClaw day (24 heartbeats + 20 sub-agent tasks + 10 user queries), estimated monthly costs are:

Strategy | Monthly Cost | Savings
--- | --- | ---
All Opus 4.6 | ~$200 | baseline
Smart routing (balanced) | ~$45 | 78%
Smart routing (aggressive) | ~$15 | 92%
Smart routing (aggressive + free tier) | ~$5 | 97%
All free models (OpenRouter) | ~$0 | 100% (but rate-limited & unreliable)
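The savings column follows from savings = 1 - cost / baseline against the all-Opus figure. The sketch below reproduces it and also converts a monthly estimate into an average cost per request, assuming a 30-day month of the stated daily workload (24 + 20 + 10 = 54 requests per day, 1,620 per month). The constants come from the table; the 30-day month is an assumption.

```python
DAILY_REQUESTS = 24 + 20 + 10  # heartbeats + sub-agent tasks + user queries
DAYS_PER_MONTH = 30            # assumption: 30-day month
BASELINE_MONTHLY = 200.0       # "All Opus 4.6" estimate from the table, USD

def savings_pct(monthly_cost: float, baseline: float = BASELINE_MONTHLY) -> float:
    """Percentage saved relative to the all-Opus baseline."""
    return 100.0 * (1.0 - monthly_cost / baseline)

def cost_per_request(monthly_cost: float) -> float:
    """Average USD per request implied by a monthly estimate."""
    return monthly_cost / (DAILY_REQUESTS * DAYS_PER_MONTH)

for label, cost in [("balanced", 45.0), ("aggressive", 15.0), ("aggressive + free", 5.0)]:
    print(f"{label}: {savings_pct(cost):.1f}% saved, ${cost_per_request(cost):.4f}/request")
```

This gives 77.5%, 92.5%, and 97.5%, in line with the rounded percentages in the table, and implies roughly $0.12 per request for the all-Opus baseline.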

When NOT to Route Down

Always use Tier 3+ for:

  • Security-sensitive code review
  • Financial calculations where errors are costly
  • Architecture decisions that affect the whole codebase
  • Any task for which the user explicitly requests premium quality
  • Tasks where the user says "be thorough" or "take your time"
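A routing pipeline can apply this list as a final floor after classification. The trigger phrases below are hypothetical stand-ins for the listed cases; reliably recognizing "security-sensitive" or "costly if wrong" work takes more than keyword matching, so treat this as a sketch.

```python
# Hypothetical trigger phrases for the cases listed above; tune for your workload.
FLOOR_TRIGGERS = [
    "security", "vulnerability",                         # security-sensitive review
    "financial",                                         # costly-if-wrong calculations
    "architecture",                                      # codebase-wide decisions
    "premium quality", "be thorough", "take your time",  # explicit user requests
]

def enforce_floor(task: str, tier: int, floor: int = 3) -> int:
    """Never route below the floor tier when any trigger phrase appears in the task."""
    text = task.lower()
    return max(tier, floor) if any(t in text for t in FLOOR_TRIGGERS) else tier
```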

Mode Switching

Users can switch modes mid-conversation:

  • "Use aggressive routing" → Switch to cheapest models per tier
  • "Use quality mode" → Switch to best models per tier
  • "Use balanced routing" → Return to defaults
  • "Use [specific model] for this" → Override routing for one task
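Mode switches can be recognized with simple phrase matching. This hypothetical `parse_mode_command` checks for the three mode phrases listed above and, failing that, for a "use X for this" override. It assumes the model is named as a single token such as `o3` or `claude-opus-4-6`; multi-word names like "Claude Opus" would need a lookup table, and casual phrasing ("I use Python for this") can produce false matches.

```python
import re
from typing import Optional, Tuple

# Phrases from the list above mapped to the mode they select.
MODE_PHRASES = {
    "use aggressive routing": "aggressive",
    "use quality mode": "quality",
    "use balanced routing": "balanced",
}
# "Use [specific model] for this": assumes a single-token model ID.
ONE_OFF_OVERRIDE = re.compile(r"\buse\s+([\w./-]+)\s+for this\b", re.IGNORECASE)

def parse_mode_command(message: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (new_mode, one_off_model); both are None if the message is not a command."""
    text = message.lower()
    for phrase, mode in MODE_PHRASES.items():
        if phrase in text:
            return mode, None  # persistent switch for the rest of the conversation
    match = ONE_OFF_OVERRIDE.search(message)
    if match:
        return None, match.group(1)  # applies to the next task only
    return None, None
```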

Pricing Reference (February 2026)

All prices are per million tokens. Models are listed from cheapest to most expensive input price:

Model | Input $/MTok | Output $/MTok | Context | Provider
--- | --- | --- | --- | ---
OpenRouter Free Models | $0.00 | $0.00 | Varies | OpenRouter
GPT-5 Nano | $0.05 | $0.40 | 400K | OpenAI
Gemini 2.0 Flash | $0.10 | $0.40 | 1M | Google
Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M | Google
GPT-4o-mini | $0.15 | $0.60 | 128K | OpenAI
Gemini 2.5 Flash | $0.15 | $0.60 | 1M | Google
Grok 4.1 Fast | $0.20 | $0.50 | 2M | xAI
GPT-5 Mini | $0.25 | $2.00 | 400K | OpenAI
DeepSeek V3 | $0.27 | $1.10 | 64K | DeepSeek
DeepSeek R1 | $0.55 | $2.19 | 64K | DeepSeek
Claude Haiku 4.5 | $1.00 | $5.00 | 200K | Anthropic
o4-mini | $1.10 | $4.40 | 200K | OpenAI
Gemini 2.5 Pro | $1.25 | $10.00 | 1M | Google
GPT-5 | $1.25 | $10.00 | 400K | OpenAI
GPT-5.3 Codex | $1.75* | $14.00* | 400K | OpenAI
o3 | $2.00 | $8.00 | 200K | OpenAI
GPT-4o | $2.50 | $10.00 | 128K | OpenAI
Claude Sonnet 4.5 | $3.00 | $15.00 | 200K | Anthropic
Grok 4 | $3.00 | $15.00 | 256K | xAI
Claude Opus 4.5 | $5.00 | $25.00 | 200K | Anthropic
Claude Opus 4.6 | $5.00 | $25.00 | 200K (1M beta) | Anthropic

*GPT-5.3 Codex pricing estimated from GPT-5.2 Codex; official API pricing pending.

Note: Prices change. Check provider pricing pages for current rates. Batch API discounts (50% off) and prompt caching (50-90% off) can reduce costs further. OpenRouter free models have rate limits — see openrouter.ai/collections/free-models for current availability.
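To see how the discounts in the note above change a bill, the sketch below prices one request with a share of input tokens served from the prompt cache and an optional 50% batch discount. It is a simplification under stated assumptions: cached input is billed at `(1 - cache_discount)` of the input price, cache-write surcharges are ignored, and the batch discount is applied to the whole request. Whether the two discounts stack, and how cache writes are billed, varies by provider, so check the pricing pages before relying on the numbers.

```python
def discounted_cost(input_tokens: int, output_tokens: int,
                    in_price: float, out_price: float, *,
                    cached_fraction: float = 0.0, cache_discount: float = 0.9,
                    batch: bool = False) -> float:
    """USD cost of one request with cached input and an optional batch discount.

    Assumptions: cached input costs (1 - cache_discount) x the input price,
    cache-write surcharges are ignored, and batch halves the whole request.
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    cost = (fresh * in_price
            + cached * in_price * (1.0 - cache_discount)
            + output_tokens * out_price) / 1_000_000
    return cost * 0.5 if batch else cost
```

With Claude Sonnet 4.5 list prices ($3.00 / $15.00), 1M input and 100K output tokens cost $4.50 undiscounted, $2.25 with batch, and $3.15 when half the input is a cache hit at a 90% discount.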

Version History

1 version in total

  • v0.1.0 (current)
    2026-03-29 07:36 · Security: safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related Recommendations

ai-intelligence

self-improving agent

pskoett
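Captures lessons learned, errors, and corrections to enable continuous improvement. Use when: (1) a command or operation fails unexpectedly; (2) the user corrects……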
★ 4,058 📥 797,759
ai-intelligence

Proactive Agent

halthelobster
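Upgrades an AI agent from a task executor into an intelligent partner that proactively anticipates needs and continuously optimizes. Integrates the WAL protocol, working buffers, autonomous scheduled tasks, and battle-tested patterns. A core component of the Hal Stack 🦞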
★ 834 📥 212,947
ai-intelligence

ontology

oswalpalash
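Typed knowledge graph for structured agent memory and composable skills. Supports creating/querying entities (people, projects, tasks, events, documents) and relations...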
★ 710 📥 243,666