
Token Guard

Prevents LLM API 429 errors by estimating tokens, tracking quotas, throttling requests, detecting duplicates, caching responses, and automatically falling back to another model.
Publisher: edmonddantesj
Developer tools · clawhub · v1.5.0 · 1 version · API key: not required

Overview

TokenGuard — LLM API 429 Prevention Engine

Version: 1.5.0

Author: Aoineco & Co.

License: MIT

Tags: rate-limit, 429, token-management, cost-optimization, llm-guard, high-performance

Description

Prevents LLM API 429 (Rate Limit / Resource Exhausted) errors by intercepting requests before they're sent. Designed for users on free/low-cost API plans who need maximum intelligence per dollar.

Core philosophy: "Intelligence is measured not by how much you spend, but by how little you need."

Problem

When using LLM APIs (especially Google Gemini Flash, with its 1M tokens-per-minute (TPM) limit):

  • Large documents (docx, PDFs) can consume the entire minute quota in one request
  • Failed requests still count toward token usage
  • Retry loops after 429 errors waste more tokens → death spiral
  • No built-in way to detect runaway/duplicate requests
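
The runaway/duplicate problem in the last bullet can be illustrated with a small standalone detector. This is a minimal sketch, not TokenGuard's actual implementation: the class name `DuplicateDetector`, its method `observe`, and the returned labels are hypothetical, and the 60-second window and "three or more sightings means runaway" rule are taken from the Features list, read here as "at least three identical prompts inside the window".

```python
import hashlib
import time
from collections import defaultdict, deque


class DuplicateDetector:
    """Flags identical prompts repeated inside a short time window (hypothetical helper)."""

    def __init__(self, window_seconds=60.0, runaway_threshold=3):
        self.window_seconds = window_seconds
        self.runaway_threshold = runaway_threshold
        # Prompt fingerprint -> timestamps of recent sightings.
        self._seen = defaultdict(deque)

    def observe(self, prompt, now=None):
        """Record one request and return 'new', 'duplicate', or 'runaway'."""
        now = time.monotonic() if now is None else now
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        stamps = self._seen[key]
        # Drop sightings that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        stamps.append(now)
        if len(stamps) >= self.runaway_threshold:
            return "runaway"
        return "duplicate" if len(stamps) > 1 else "new"


detector = DuplicateDetector()
first = detector.observe("summarize report.pdf", now=0.0)
second = detector.observe("summarize report.pdf", now=5.0)
third = detector.observe("summarize report.pdf", now=10.0)
later = detector.observe("summarize report.pdf", now=120.0)
```

Hashing the prompt keeps memory use flat even for very large documents, and passing `now` explicitly makes the window logic easy to test without sleeping.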

Features

| Feature | Description |
|---|---|
| Pre-flight Token Estimation | Estimates token count before API call (CJK-aware, no tiktoken dependency) |
| Real-time Quota Tracking | Tracks per-model per-minute token usage with sliding window |
| Smart Throttle | Auto-waits when quota > 80%, blocks at > 95% |
| Duplicate Detection | Blocks identical requests within 60s window (3+ = runaway) |
| Response Caching | Caches successful responses for duplicate requests |
| Auto Model Fallback | Switches to cheaper/available model when primary is exhausted |
| 429 Error Parser | Extracts exact retry delay from Google/Anthropic error responses |
| Batch vs Mistake Detection | Distinguishes intentional bulk processing from error loops |
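
To make the first three features concrete, here is a minimal sketch of a CJK-aware estimate feeding a sliding-window quota check. Everything in it is an assumption for illustration: the names `estimate_tokens` and `SlidingWindowQuota` are hypothetical, the heuristic (one token per CJK character, roughly one per four other characters) is a common rule of thumb rather than the skill's own estimator, and the 80% / 95% thresholds are applied to projected usage (current usage plus the new request), which the source does not specify.

```python
import time
from collections import deque


def estimate_tokens(text):
    """Rough estimate: 1 token per CJK character, ~1 per 4 other characters."""
    cjk = sum(
        1 for ch in text
        if "\u4e00" <= ch <= "\u9fff"      # CJK Unified Ideographs
        or "\u3040" <= ch <= "\u30ff"      # Hiragana and Katakana
        or "\uac00" <= ch <= "\ud7a3"      # Hangul syllables
    )
    other = len(text) - cjk
    return cjk + (other + 3) // 4  # ceiling division for the non-CJK part


class SlidingWindowQuota:
    """Tracks tokens used in the last `window_seconds` for one model."""

    def __init__(self, tpm_limit, window_seconds=60.0):
        self.tpm_limit = tpm_limit
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, tokens), oldest first
        self._used = 0

    def _expire(self, now):
        # Forget usage that has slid out of the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def record(self, tokens, now=None):
        now = time.monotonic() if now is None else now
        self._expire(now)
        self._events.append((now, tokens))
        self._used += tokens

    def decide(self, tokens, now=None):
        """Return 'proceed', 'wait', or 'block' for a request of `tokens`."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        projected = (self._used + tokens) / self.tpm_limit
        if projected > 0.95:
            return "block"
        if projected > 0.80:
            return "wait"
        return "proceed"


quota = SlidingWindowQuota(tpm_limit=1_000_000)
quota.record(750_000, now=0.0)
small = quota.decide(10_000, now=1.0)
medium = quota.decide(100_000, now=1.0)
large = quota.decide(250_000, now=1.0)
after_window = quota.decide(250_000, now=61.0)
```

With 750K of a 1M budget already used, a 10K request proceeds, a 100K request waits, and a 250K request is blocked; the same 250K request proceeds once the earlier usage has aged out of the window.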

Supported Models

Pre-configured quotas for:

  • gemini-3-flash (1M TPM)
  • gemini-3-pro (2M TPM)
  • claude-haiku (50K TPM)
  • claude-sonnet (200K TPM)
  • claude-opus (200K TPM)
  • gpt-4o (800K TPM)
  • deepseek (1M TPM)

Custom quotas can be added for any model.
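
The source does not show how custom quotas are registered or how a fallback is chosen, so the following is only a sketch under stated assumptions. It keeps the listed TPM limits in a plain dictionary, adds one hypothetical custom entry (`my-local-model`), and picks the first model in a caller-supplied order that still has enough headroom. `QUOTAS` and `pick_model` are illustrative names, not part of the TokenGuard API.

```python
# TPM limits copied from the list above.
QUOTAS = {
    "gemini-3-flash": 1_000_000,
    "gemini-3-pro": 2_000_000,
    "claude-haiku": 50_000,
    "claude-sonnet": 200_000,
    "claude-opus": 200_000,
    "gpt-4o": 800_000,
    "deepseek": 1_000_000,
}

# Hypothetical custom entry; the real registration mechanism is not documented here.
QUOTAS["my-local-model"] = 500_000


def pick_model(preferred, needed_tokens, used, fallback_order):
    """Return the first model with enough remaining quota, or None."""
    for model in [preferred, *fallback_order]:
        if model not in QUOTAS:
            continue  # Unknown models are skipped rather than guessed at.
        remaining = QUOTAS[model] - used.get(model, 0)
        if needed_tokens <= remaining:
            return model
    return None


used = {"gemini-3-flash": 990_000}
chosen = pick_model("gemini-3-flash", 50_000, used, ["deepseek", "claude-haiku"])
none_fit = pick_model("claude-haiku", 60_000, {}, [])
```

Leaving the fallback order to the caller avoids baking in any assumption about which model is cheaper.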

Usage

import time

from token_guard import TokenGuard

guard = TokenGuard()

# Before every API call:
decision = guard.check(prompt_text, model="gemini-3-flash")

if decision.action == "proceed":
    response = call_your_api(prompt_text)
    guard.record_usage(decision.estimated_tokens, model="gemini-3-flash")
    guard.cache_response(prompt_text, response)

elif decision.action == "wait":
    time.sleep(decision.wait_seconds)
    # retry

elif decision.action == "fallback":
    response = call_your_api(prompt_text, model=decision.fallback_model)

elif decision.action == "block":
    print(f"Blocked: {decision.reason}")

# If you get a 429 error:
guard.record_429("gemini-3-flash", retry_delay=53.0)
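
The `retry_delay` passed to `record_429` has to come from the provider's error response. A minimal sketch of that extraction follows, assuming the delay arrives either as a `retry-after` header in seconds or as a `retryDelay` string such as `"53s"` inside the `error.details` list of a Google-style JSON body. The function name `parse_retry_delay` and the 60-second default are hypothetical, and the skill's own parser may handle more formats.

```python
import re


def parse_retry_delay(body=None, headers=None, default=60.0):
    """Best-effort retry delay in seconds from a 429 response (hypothetical helper)."""
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    if "retry-after" in headers:
        try:
            return float(headers["retry-after"])
        except ValueError:
            pass  # Retry-After can also be an HTTP date; not handled in this sketch.

    details = ((body or {}).get("error") or {}).get("details") or []
    for detail in details:
        if not isinstance(detail, dict):
            continue
        # Accept duration strings such as "53s" or "1.5s".
        match = re.fullmatch(r"(\d+(?:\.\d+)?)s", str(detail.get("retryDelay", "")))
        if match:
            return float(match.group(1))
    return default


google_style_body = {
    "error": {
        "code": 429,
        "status": "RESOURCE_EXHAUSTED",
        "details": [
            {"@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "53s"}
        ],
    }
}
from_body = parse_retry_delay(body=google_style_body)
from_header = parse_retry_delay(headers={"Retry-After": "12"})
fallback = parse_retry_delay()
```

The result can then be handed to `guard.record_429(model, retry_delay=...)` as in the usage example above.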

Integration with OpenClaw

Add to your agent's config or use as a middleware:

skills:
  - token-guard

The agent can invoke TokenGuard before any LLM API call to prevent quota exhaustion.

File Structure

token-guard/
├── SKILL.md          # This file
└── scripts/
    └── token_guard.py  # Main engine (zero external dependencies)

Status Output Example

{
  "models": {
    "gemini-3-flash": {
      "tpm_limit": 1000000,
      "used_this_minute": 750000,
      "remaining": 250000,
      "usage_pct": "75.0%",
      "status": "🟢 OK"
    }
  },
  "stats": {
    "total_checks": 42,
    "tokens_saved": 128000,
    "blocks": 3,
    "fallbacks": 2
  }
}
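
The per-model fields in this output follow directly from the limit and the tokens used. A small sketch of that arithmetic, assuming the status label flips at the same 80% and 95% thresholds as the throttle; the function name `model_status` and the plain-text labels `THROTTLED` and `BLOCKED` are hypothetical, since only the OK case appears in the example.

```python
def model_status(tpm_limit, used_this_minute):
    """Build one per-model status entry from the limit and current usage."""
    usage = used_this_minute / tpm_limit
    if usage > 0.95:
        label = "BLOCKED"      # hypothetical label
    elif usage > 0.80:
        label = "THROTTLED"    # hypothetical label
    else:
        label = "OK"
    return {
        "tpm_limit": tpm_limit,
        "used_this_minute": used_this_minute,
        "remaining": max(tpm_limit - used_this_minute, 0),
        "usage_pct": f"{usage * 100:.1f}%",
        "status": label,
    }


entry = model_status(1_000_000, 750_000)
```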

Zero Dependencies

Pure Python 3.10+. No pip install needed. No tiktoken, no external API calls.

Designed for the $7 Bootstrap Protocol — every byte counts.

Version History

1 version in total

  • v1.5.0 (current)
    2026-03-29 03:44, scan result: safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
