Routes tasks between a local Ollama instance (fast, cheap) and a remote/cloud Ollama instance (more capable) based on task complexity classification and system capabilities.
# 1. Profile your system
python scripts/system_profiler.py
# 2. Check endpoints are healthy
python scripts/health_check.py
# 3. Route a task
python scripts/route.py "What is quantum computing?"
User Request
↓
System Profiler (detects compatible models)
↓
Health Check (verifies endpoints are up)
↓
Classify Task (1-5 complexity score)
↓
├─ Score 1-2 → Local Ollama (fast, cheap)
├─ Score 3-5 → Cloud Ollama (powerful)
└─ Match specialist → Dedicated model
↓
Verify model available (fallback if not)
↓
Stream Response
| Score | Complexity | Examples | Routed To |
|---|---|---|---|
| ------- | ------------ | ---------- | ----------- |
| 1 | Simple | "What is 2+2?", "Define entropy" | Local |
| 2 | Basic | "Write hello world in Python" | Local |
| 3 | Complex | "Debug this error", "Compare X vs Y" | Cloud |
| 4 | Deep | "Design a system", "Research topic" | Cloud |
| 5 | Expert | "Build from scratch", "Multi-file project" | Cloud |
smart-router/
├── SKILL.md # This file
├── __init__.py # Python package interface
├── requirements.txt # Dependencies
│
├── config/
│ ├── router.yaml # Main configuration
│ └── system_profile.json # Auto-generated system specs
│
├── scripts/
│ ├── classify.py # Task complexity classifier
│ ├── execute.py # Ollama API client
│ ├── route.py # Main routing logic
│ ├── system_profiler.py # Hardware detection
│ └── health_check.py # Endpoint health verification
│
├── tests/
│ └── test_classifier.py # Test suite
│
└── references/
└── classifier-prompt.txt # LLM fallback prompt
Edit config/router.yaml:
# Local Ollama (your machine)
local:
model: "llama3.2"
base_url: "http://localhost:11434"
# Cloud Ollama (remote server)
cloud:
model: "qwen2.5:14b"
base_url: "http://192.168.1.100:11434"
# Tasks scoring >= this go to cloud
threshold: 3
# Domain specialists (checked first)
specialists:
code:
model: "codellama:34b"
base_url: "http://192.168.1.100:11434"
triggers: ["code review", "refactor"]
# Performance settings
performance:
timeout_seconds: 60
stream_responses: true
retry_attempts: 2
# Caching
cache:
enabled: true
db_path: "cache/router.db"
ttl_seconds: 86400
# Basic routing
python scripts/route.py "What is the capital of France?"
# With profiling (updates system profile)
python scripts/route.py "Debug this error" --profile
# Custom config
python scripts/route.py "Design a system" --config config/my-router.yaml
# No streaming (wait for full response)
python scripts/route.py "Summarize this" --no-stream
# Health check all endpoints
python scripts/health_check.py
# Manual classification
python scripts/classify.py "Write a function"
# Output: "2:basic-task"
from smart_router import SmartRouter
# Initialize
router = SmartRouter()
# Route with streaming
for chunk in router.route("Explain quantum computing"):
print(chunk, end='')
# Classify only
score, reason = router.classify("Debug this code")
print(f"Complexity: {score}/5, Reason: {reason}")
# Get configuration
config = router.get_config()
print(f"Local model: {config['local']['model']}")
Run once (or when hardware changes):
python scripts/system_profiler.py
This creates config/system_profile.json with:
Verify endpoints before use:
python scripts/health_check.py
Checks:
When you submit a task:
Uses regex patterns (not LLM calls) for speed:
Automatically detects what your system can run:
Pre-flight checks prevent routing to dead endpoints:
✓ local | Status: healthy | Latency: 45ms | Models: 5
✗ cloud | Status: unreachable | Error: Connection refused
Even though Ollama is free, logs track latency:
[2024-01-15T10:30:00] task: '...' -> local | model: llama3.2 | latency: 0.85s
[2024-01-15T10:30:45] task: '...' -> cloud | model: qwen2.5:14b | latency: 3.2s
# Run classifier tests
python tests/test_classifier.py
# Expected output:
# ✓ PASS [1] Simple factual question
# ✓ PASS [1] Zip code (not code)
# ✓ PASS [3] Debugging
# ...
# Results: X passed, Y failed
# Check if Ollama is running
ollama serve
# Verify endpoint
curl http://localhost:11434/api/tags
# Pull the model
ollama pull llama3.2
# Or let router auto-fallback to available model
Check pattern in scripts/classify.py:
# Add new pattern
COMPLEXITY_PATTERNS[2].append(r'your\s+pattern\s+here')
# In config/router.yaml
performance:
timeout_seconds: 30 # Reduce timeout
pip install -r requirements.txt| Approach | Latency | Cost | Accuracy | Verdict |
|---|---|---|---|---|
| ---------- | --------- | ------ | ---------- | --------- |
| Pattern matching | 30ms | 0 tokens | 90% | ✅ Used |
| LLM classification | 500ms | 50 tokens | 95% | Optional (--llm) |
Pattern matching wins on speed/cost. Accuracy is good enough for routing.
Ollama-only keeps everything:
共 1 个版本