
Failover Gateway Pub

Set up an active-passive OpenClaw failover gateway with health monitoring, automatic promotion/demotion, channel splitting, and git workspace sync for seamless switchover.
Author: ember-claw
Category: Developer tools · Source: clawhub · Version: v1.0.0 (1 version) · Key: required

Overview

Failover Gateway for OpenClaw

Deploy a standby OpenClaw gateway that automatically takes over when your primary goes down. Active-passive design with auto-promotion and auto-demotion.

What You Get

  • ~30 second failure detection: the health monitor sees the primary is down and promotes the standby, so you are reachable again in about 60 seconds
  • Auto-recovery — when primary comes back, standby demotes itself
  • Zero split-brain — primary and standby use different channels (no duplicate messages)
  • Git-synced workspace — standby pulls latest workspace on promotion
  • $6-12/month: runs on a minimal VPS

Architecture

PRIMARY (your main VPS)          STANDBY (failover VPS)
├─ Full stack (all channels)     ├─ Single channel only (e.g., Discord DM)
├─ All cron jobs                 ├─ No crons (recovery mode)
├─ Gateway active ✅              ├─ Gateway stopped 💤
└─ Pushes workspace to git       └─ Health monitor watches primary
                                      │
                                      ├─ Primary healthy → sleep
                                      ├─ Primary down 30s → PROMOTE
                                      └─ Primary back → DEMOTE

The key insight: split your channels between primary and standby. Don't share credentials — give each node exclusive ownership of different channels. This eliminates split-brain entirely.

Channel Split Examples

Setup                Primary              Standby
-------------------  -------------------  ----------------
RC + Discord         Rocket.Chat (full)   Discord DM only
Discord + Telegram   Discord (full)       Telegram DM only
Slack + Discord      Slack (full)         Discord DM only

Your primary handles everything. The standby is minimal recovery — just enough to stay reachable.
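
Because the design depends on each node owning its channels exclusively, it can help to check the two channel lists for overlap before going live. The following is a minimal sketch, not part of the skill: channel_overlap is a hypothetical helper, and the channel names passed to it are labels you choose yourself rather than values read from the OpenClaw config.

```shell
# Hypothetical helper: report any channel name that appears in both lists.
# $1 = space-separated channels enabled on the primary
# $2 = space-separated channels enabled on the standby
# Prints the overlapping names and returns 1 if any exist, else returns 0.
channel_overlap() {
  co_primary=$1
  co_standby=$2
  co_found=""
  for co_s in $co_standby; do
    for co_p in $co_primary; do
      if [ "$co_s" = "$co_p" ]; then
        co_found="$co_found $co_s"
      fi
    done
  done
  co_found=${co_found# }   # trim the leading space
  if [ -n "$co_found" ]; then
    echo "$co_found"
    return 1
  fi
  return 0
}
```

For example, channel_overlap "rocketchat" "discord-dm" succeeds silently, while channel_overlap "slack discord-dm" "discord-dm" prints discord-dm and returns a non-zero status.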

Prerequisites

  • Primary OpenClaw instance running on a VPS
  • A second VPS for the standby ($6-12/mo, any provider)
  • Tailscale mesh network (or any VPN/private network)
  • Git repository for workspace sync (GitHub, GitLab, etc.)
  • A second messaging channel for the standby (different from primary)

Step-by-Step Deployment

Phase 1: Provision the Standby VPS

Any cheap VPS works. Recommended: 2GB RAM, Ubuntu 24.04.

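# Harden the box (run as root)
ufw allow 22/tcp
ufw enable
apt update
apt install -y fail2ban unattended-upgrades

# Create the openclaw user (--gecos "" skips the interactive name prompts)
adduser --disabled-password --gecos "" openclaw
usermod -aG sudo openclaw
# Note: with no password set, sudo needs a NOPASSWD sudoers rule for this user;
# otherwise run the sudo commands in Phase 4 from your admin account.
# Copy your SSH public key to /home/openclaw/.ssh/authorized_keys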

# Install Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up --hostname=your-failover-name

Phase 2: Install OpenClaw

# As openclaw user
curl -fsSL https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
source ~/.bashrc
nvm install --lts
npm install -g openclaw
# Note where the binary landed; the systemd unit in Phase 4 needs this path
command -v openclaw

# Clone workspace
git clone <your-workspace-repo> ~/.openclaw/workspace

Phase 3: Failover Config

Create a minimal OpenClaw config on the standby. Only enable the standby channel:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["anthropic/claude-sonnet-4-5"]
      },
      "workspace": "/home/openclaw/.openclaw/workspace"
    },
    "list": [{ "id": "main", "default": true }]
  },
  "channels": {
    "discord": {
      "enabled": true,
      "token": "<YOUR_DISCORD_BOT_TOKEN>",
      "dm": {
        "policy": "allowlist",
        "allowFrom": ["<YOUR_DISCORD_USER_ID>"]
      }
    }
  },
  "gateway": {
    "port": 18789,
    "mode": "local",
    "bind": "tailnet"
  }
}

Important: Disable this channel on your primary to avoid conflicts.

Test it works: openclaw gateway run — verify the bot connects and responds, then stop it.

Phase 4: Deploy Health Monitor

Copy the included scripts/health-monitor.sh to the standby:

sudo cp health-monitor.sh /usr/local/bin/openclaw-health-monitor.sh
sudo chmod +x /usr/local/bin/openclaw-health-monitor.sh

Edit the variables at the top:

  • PRIMARY_IP — your primary's Tailscale IP
  • PRIMARY_PORT — your primary's gateway port (default: 18789)
  • SECRETS_HOST — (optional) host to rsync secrets from on promotion
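
The bundled script is not reproduced here, so the following is only a sketch of the promote/demote logic described above, not the included health-monitor.sh. It assumes a plain TCP connect counts as "healthy", a 10 second check interval, and a hypothetical WORKSPACE variable. The real script may check health differently and also handles the optional secrets sync.

```shell
# Hypothetical sketch of a failover monitor; NOT the bundled script.
PRIMARY_IP="${PRIMARY_IP:-100.64.0.1}"        # assumption: placeholder Tailscale IP
PRIMARY_PORT="${PRIMARY_PORT:-18789}"
FAIL_THRESHOLD="${FAIL_THRESHOLD:-3}"
CHECK_INTERVAL="${CHECK_INTERVAL:-10}"        # assumption: seconds between checks
WORKSPACE="${WORKSPACE:-/home/openclaw/.openclaw/workspace}"

# Return 0 if a TCP connection to the primary gateway opens within 5 seconds.
primary_healthy() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$PRIMARY_IP/$PRIMARY_PORT" 2>/dev/null
}

# Pure decision step. $1 = role (standby|active), $2 = consecutive failures,
# $3 = latest check (ok|fail). Prints "<action> <new failure count>".
decide() {
  d_role=$1
  d_fails=$2
  d_result=$3
  if [ "$d_result" = ok ]; then
    if [ "$d_role" = active ]; then echo "demote 0"; else echo "none 0"; fi
    return 0
  fi
  d_fails=$((d_fails + 1))
  if [ "$d_role" = standby ] && [ "$d_fails" -ge "$FAIL_THRESHOLD" ]; then
    echo "promote $d_fails"
  else
    echo "none $d_fails"
  fi
}

# Main loop: check, decide, act, sleep. Must run as root to control systemd.
monitor_loop() {
  role=standby
  fails=0
  while true; do
    if primary_healthy; then result=ok; else result=fail; fi
    set -- $(decide "$role" "$fails" "$result")
    action=$1
    fails=$2
    case $action in
      promote)
        # Refresh the workspace as the openclaw user, then start the gateway.
        runuser -u openclaw -- git -C "$WORKSPACE" pull --ff-only || true
        systemctl start openclaw && role=active
        ;;
      demote)
        systemctl stop openclaw && role=standby
        ;;
    esac
    sleep "$CHECK_INTERVAL"
  done
}
```

To run the sketch as a service, append a call to monitor_loop at the end of the file. Keeping decide free of side effects makes the threshold behaviour easy to test without touching systemd.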

Create the systemd services:

/etc/systemd/system/openclaw-health-monitor.service

[Unit]
Description=OpenClaw Failover Health Monitor
After=network-online.target tailscaled.service
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/openclaw-health-monitor.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

/etc/systemd/system/openclaw.service

[Unit]
Description=OpenClaw Gateway (Failover)
After=network-online.target tailscaled.service
Wants=network-online.target

[Service]
Type=simple
User=openclaw
Group=openclaw
WorkingDirectory=/home/openclaw/.openclaw/workspace
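# /usr/bin/openclaw assumes a system-wide install. With the nvm install from
# Phase 2 the binary lives under /home/openclaw/.nvm/, so use the path printed
# by `command -v openclaw` and add that directory to PATH so node is found.
ExecStart=/usr/bin/openclaw gateway run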
Restart=on-failure
RestartSec=5
Environment=HOME=/home/openclaw
Environment=NODE_ENV=production

[Install]
WantedBy=multi-user.target

Enable the monitor (but NOT the gateway — the monitor starts it on promotion):

sudo systemctl daemon-reload
sudo systemctl enable openclaw-health-monitor
sudo systemctl start openclaw-health-monitor
# Do NOT enable openclaw.service — the monitor controls it

Phase 5: Disable Standby Channel on Primary

This is critical. Remove or disable the standby's channel from your primary config:

{
  "channels": {
    "discord": { "enabled": false }
  }
}

Each node owns its channels exclusively. No sharing, no conflicts.

Phase 6: Test

# On primary: simulate failure (substitute the unit name your primary actually uses)
sudo systemctl stop openclaw-gateway  # or kill the process

# Watch the standby logs
journalctl -u openclaw-health-monitor -f

# Expected: 3 failed checks → PROMOTE → gateway starts → standby channel live

# On primary — recover
sudo systemctl start openclaw-gateway

# Expected: standby detects primary → DEMOTE → gateway stops

Failover Timeline

Time   Event
-----  ------------------------------
0s     Primary goes down
10s    First health check fails
20s    Second check fails
30s    Third check fails → PROMOTE
35s    Git pull, secrets sync
40s    Gateway starting
45s    Standby channel active
~60s   You're reachable again
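
The timeline follows from two numbers: the failure threshold multiplied by the check interval gives the detection time, and the promotion overhead (git pull, secrets sync, gateway start, channel connect) is added on top. The sketch below reproduces that arithmetic. The function names are hypothetical, and the 10 second interval and roughly 30 second overhead are read off the timeline rather than taken from the script.

```shell
# Hypothetical helpers for estimating failover timing.
# detection_seconds: $1 = failed checks required, $2 = seconds between checks
detection_seconds() {
  echo $(( $1 * $2 ))
}

# reachable_seconds: $1 = threshold, $2 = interval, $3 = promotion overhead (s)
reachable_seconds() {
  echo $(( $(detection_seconds "$1" "$2") + $3 ))
}
```

With 3 checks at 10 second intervals, detection_seconds 3 10 gives 30, and reachable_seconds 3 10 30 gives 60, matching the timeline. Raising the threshold to 5 pushes detection to 50 seconds.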

Edge Cases

Scenario               Result
---------------------  ------------------------------------------------------------
Primary dies           Standby promotes in ~30-60s
Primary + standby die  You're offline (add a third node?)
Network partition      Standby may promote while primary is still running, but since
                       they use different channels, no conflicts
Standby reboots        Health monitor auto-restarts (systemd), resumes watching
Primary flaps          Promote/demote cycles: health monitor handles it, but consider
                       increasing FAIL_THRESHOLD
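
Besides raising FAIL_THRESHOLD, one common way to dampen flapping is to require several consecutive healthy checks before demoting. The sketch below illustrates that idea only. RECOVER_THRESHOLD and both function names are hypothetical and are not settings of the bundled script.

```shell
# Hypothetical knob: healthy checks in a row required before demoting.
RECOVER_THRESHOLD="${RECOVER_THRESHOLD:-3}"

# next_streak: $1 = current healthy streak, $2 = latest check (ok|fail).
# A failure resets the streak to zero.
next_streak() {
  if [ "$2" = ok ]; then
    echo $(( $1 + 1 ))
  else
    echo 0
  fi
}

# should_demote: $1 = current healthy streak. Prints "demote" or "hold".
should_demote() {
  if [ "$1" -ge "$RECOVER_THRESHOLD" ]; then
    echo demote
  else
    echo hold
  fi
}
```

With the default of 3, a primary that answers twice and then drops again never triggers a demotion. The cost is that failback takes a few check intervals longer.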

Failback

Recovery is automatic. When the primary comes back:

  1. Health monitor detects primary healthy
  2. Stops the standby gateway
  3. Primary resumes serving its own channels
  4. Standby returns to watching

No manual intervention needed.

Cost

Component      Cost
-------------  ---------------
VPS (2GB RAM)  $6-12/mo
Tailscale      Free (personal)
Git repo       Free
Total          $6-12/mo

Tips

  • Test monthly. Kill your primary, verify failover works. Trust but verify.
  • Keep the standby minimal. No crons, no extra channels. It's recovery mode.
  • Git push frequently. The standby's workspace is only as fresh as your last push.
  • Use Tailscale. It makes cross-VPS networking trivial. No firewall rules, no port forwarding.
  • Different bot tokens. If using Discord on both, you need two bot applications. Same bot token = last-connect-wins.
  • Monitor the monitor. Check journalctl -u openclaw-health-monitor occasionally to make sure it's running.
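
Since the standby's workspace is only as fresh as the last push, the push on the primary can be automated. The function below is a minimal sketch: sync_workspace is a hypothetical helper, and it assumes the remote is named origin and that a git identity (user.name and user.email) is already configured for the account running it.

```shell
# Hypothetical helper: commit and push any workspace changes.
# $1 = path to the workspace git checkout.
# Prints "clean" if there was nothing to commit, "pushed" after a push.
sync_workspace() {
  sw_dir=$1
  git -C "$sw_dir" add -A || return 1
  # --quiet makes diff exit 0 when nothing is staged.
  if git -C "$sw_dir" diff --cached --quiet; then
    echo "clean"
    return 0
  fi
  git -C "$sw_dir" commit -q -m "workspace sync $(date -u +%Y-%m-%dT%H:%M:%SZ)" || return 1
  git -C "$sw_dir" push -q origin HEAD || return 1
  echo "pushed"
}
```

Calling sync_workspace /home/openclaw/.openclaw/workspace from one of the primary's scheduled jobs keeps the repository current without manual pushes. How often to run it is a trade-off between commit noise and how stale the standby is allowed to be.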

Version History

1 version total

  • v1.0.0 (current)
    2026-03-29 06:42 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
