Gemini Live Phone

Bridge Twilio phone calls to Google Gemini Live API for real-time AI voice conversations. No STT/TTS middleware required. Includes VAD and echo suppression.
quantdeveloperusa
Category: AI · Source: clawhub · Version: v1.0.1 · API key: required

Overview

Gemini Live Phone Bridge

Real-time voice AI over phone calls using Google Gemini's native audio capabilities.

Architecture

Phone ↔ Twilio ↔ WebSocket (μ-law 8kHz) ↔ Bridge (PCM transcoding) ↔ Gemini Live API (24kHz PCM)
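The transcoding step in the diagram can be sketched as follows. This is a minimal illustration, not the bridge's actual code: the function names are hypothetical, the μ-law decoder follows the standard G.711 expansion, and the resampler is a plain linear interpolator with no filtering. The message shape assumed here is Twilio's Media Streams "media" event, which carries base64-encoded μ-law audio in `media.payload`.

```python
import base64
import json
import struct
from typing import List, Optional


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_twilio_media(message: str) -> Optional[List[int]]:
    """Return PCM16 samples for a Twilio "media" event, else None."""
    event = json.loads(message)
    if event.get("event") != "media":
        return None
    ulaw = base64.b64decode(event["media"]["payload"])
    return [ulaw_byte_to_pcm16(b) for b in ulaw]


def upsample_linear(samples: List[int], factor: int) -> List[int]:
    """Raise the sample rate by an integer factor via linear interpolation."""
    out = []
    for i, a in enumerate(samples):
        # Hold the last sample flat, since there is no next sample to blend with.
        b = samples[i + 1] if i + 1 < len(samples) else a
        for k in range(factor):
            out.append(a + (b - a) * k // factor)
    return out


def to_pcm16_bytes(samples: List[int]) -> bytes:
    """Pack samples as little-endian signed 16-bit PCM."""
    return struct.pack("<%dh" % len(samples), *samples)
```

With 8kHz input, `factor=3` would yield the 24kHz rate shown in the diagram; which rate the bridge actually sends to Gemini is not stated here, so treat the factor as an assumption.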

Quick Start

# Set required env vars
export GOOGLE_API_KEY="your-key"
export TWILIO_AUTH_TOKEN="your-token"

# Run the bridge
python scripts/bridge.py --port 3335

Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /gemini-live/status | GET | Health check + active calls |
| /gemini-live/incoming | POST | TwiML for inbound calls (Twilio webhook) |
| /gemini-live/stream | WS | Twilio Media Stream WebSocket |
| /gemini-live/call | POST | Initiate outbound call |
| /gemini-live/twiml | POST | TwiML for outbound calls |
| /gemini-live/call-status | POST | Twilio call status webhook |
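For orientation, the TwiML endpoints presumably answer Twilio with a document that connects the call to the stream WebSocket. The exact TwiML the bridge returns is not shown in this README; the sketch below assumes the usual `<Connect><Stream>` form that Twilio uses for bidirectional Media Streams, and `build_stream_twiml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(ws_url: str) -> str:
    """Build TwiML that connects the call to a Media Stream WebSocket."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # Twilio opens a WebSocket to this URL and exchanges audio over it.
    ET.SubElement(connect, "Stream", url=ws_url)
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


twiml = build_stream_twiml("wss://your-domain/gemini-live/stream")
```

Such a response would be served with an XML content type (for example `text/xml`) from the inbound and outbound TwiML routes.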

Outbound Call API

curl -X POST https://your-domain/gemini-live/call \
  -H 'Content-Type: application/json' \
  -d '{"to": "+1234567890", "greeting": "Hello! This is Marcia."}'
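The same request can be issued from Python using only the standard library. This is a sketch mirroring the curl command above: the `to` and `greeting` fields come from that example, while the helper names and the base URL are placeholders. The response format is not documented here, so the body is returned as raw text.

```python
import json
import urllib.request


def build_call_request(base_url: str, to: str, greeting: str) -> urllib.request.Request:
    """Prepare the POST request for /gemini-live/call without sending it."""
    payload = json.dumps({"to": to, "greeting": greeting}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/gemini-live/call",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def place_call(base_url: str, to: str, greeting: str) -> str:
    """Send the request and return the raw response body."""
    req = build_call_request(base_url, to, greeting)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


req = build_call_request("https://your-domain", "+1234567890", "Hello! This is Marcia.")
```

Calling `place_call("https://your-domain", "+1234567890", "Hello! This is Marcia.")` would actually trigger the outbound call, so it is left out of the snippet.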

Configuration

All settings can be supplied as CLI arguments or environment variables:

Core

  • --model — Gemini model (default: gemini-2.5-flash-native-audio-latest)
  • --voice — Gemini voice: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr (default: Kore)
  • --from-number — Twilio outbound number (default: env TWILIO_FROM)
  • --system-prompt — AI persona system prompt
  • --max-duration — Max call seconds (default: 300)

VAD (Voice Activity Detection)

  • --vad-enabled / --no-vad — Toggle server-side VAD (default: on)
  • --vad-silence-ms — Silence duration to trigger activityEnd (default: 500)
  • --vad-energy-threshold — RMS energy threshold (default: 0.01)
  • --vad-speech-min-ms — Min speech duration before activityStart (default: 100)
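How the three VAD parameters interact can be illustrated with a small energy-based state machine. This is a sketch under stated assumptions rather than the bridge's implementation: the class name is hypothetical, RMS is computed on 16-bit PCM normalised to the range 0..1, and frames are assumed to be 20ms long (the size Twilio typically sends). The defaults match the values listed above.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def rms(samples: Sequence[int]) -> float:
    """RMS energy of PCM16 samples, normalised to the range 0..1."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0


@dataclass
class EnergyVad:
    energy_threshold: float = 0.01   # --vad-energy-threshold
    silence_ms: int = 500            # --vad-silence-ms
    speech_min_ms: int = 100         # --vad-speech-min-ms
    frame_ms: int = 20               # assumed frame duration
    speaking: bool = False
    _speech_ms: int = 0
    _silence_ms: int = 0

    def process(self, samples: Sequence[int]) -> Optional[str]:
        """Feed one frame; return "activityStart", "activityEnd", or None."""
        loud = rms(samples) >= self.energy_threshold
        if not self.speaking:
            # Count consecutive loud frames; any quiet frame resets the run.
            self._speech_ms = self._speech_ms + self.frame_ms if loud else 0
            if self._speech_ms >= self.speech_min_ms:
                self.speaking = True
                self._silence_ms = 0
                return "activityStart"
        else:
            # Count consecutive quiet frames; any loud frame resets the run.
            self._silence_ms = 0 if loud else self._silence_ms + self.frame_ms
            if self._silence_ms >= self.silence_ms:
                self.speaking = False
                self._speech_ms = 0
                return "activityEnd"
        return None
```

With the defaults, five consecutive loud 20ms frames produce `activityStart`, and 25 consecutive quiet frames afterwards produce `activityEnd`.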

Echo Suppression

  • --echo-multiplier — VAD threshold multiplier during agent speech (default: 3.0)
  • --echo-decay-ms — Decay time after agent stops speaking (default: 300)
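One plausible reading of these two settings is that the VAD energy threshold is raised by the multiplier while the agent is speaking, then relaxes back to its base value over the decay window. The sketch below assumes a linear decay, which this README does not specify; `effective_threshold` is a hypothetical helper.

```python
from typing import Optional


def effective_threshold(
    base: float,
    ms_since_agent_audio: Optional[float],
    multiplier: float = 3.0,   # --echo-multiplier
    decay_ms: float = 300.0,   # --echo-decay-ms
) -> float:
    """VAD threshold adjusted for echo of the agent's own speech.

    ms_since_agent_audio is None if the agent has not spoken yet,
    0 while the agent is speaking, and otherwise the elapsed time
    since the agent's last audio frame.
    """
    if ms_since_agent_audio is None:
        return base
    if ms_since_agent_audio <= 0:
        return base * multiplier
    if ms_since_agent_audio >= decay_ms:
        return base
    # Assumed linear decay from `multiplier` back down to 1.0.
    remaining = 1.0 - ms_since_agent_audio / decay_ms
    return base * (1.0 + (multiplier - 1.0) * remaining)
```

With the defaults and a base threshold of 0.01, the caller must exceed roughly 0.03 RMS to interrupt while the agent talks, about 0.02 halfway through the decay window, and 0.01 again after 300ms.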

Twilio Setup

  1. Buy a phone number on Twilio
  2. Set Voice webhook: https://your-domain/gemini-live/incoming (HTTP POST)
  3. Set Call status URL: https://your-domain/gemini-live/call-status (HTTP POST)
  4. Ensure geo-permissions are enabled for target countries

Network Requirements

The bridge must be accessible from the internet (Twilio connects to it).

Recommended: Caddy reverse proxy with WebSocket support.

# Caddy config example
handle /gemini-live/* {
    reverse_proxy localhost:3335 {
        flush_interval -1
        transport http {
            read_timeout 0
            write_timeout 0
        }
    }
}

Performance

Latency benchmarks (Gemini 2.5 Flash Native Audio):

| Config | Median | Min | Max |
|---|---|---|---|
| No VAD, 200ms buffer | 3,660ms | 2,360ms | 5,180ms |
| Server VAD, 50ms buffer | 2,500ms | 2,080ms | 6,980ms |

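Server-side VAD with the 50ms buffer reduces median latency by ~32% (3,660ms to 2,500ms), although its worst case (6,980ms) is higher than without VAD.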
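The ~32% figure follows directly from the two medians in the table, as this short check shows (variable names are illustrative only):

```python
# Median latencies from the benchmark table, in milliseconds.
median_no_vad_ms = 3660
median_server_vad_ms = 2500

# Relative reduction in median latency.
reduction = (median_no_vad_ms - median_server_vad_ms) / median_no_vad_ms
reduction_percent = round(reduction * 100, 1)
```

Note that the two configurations also differ in buffer size (200ms versus 50ms), so the improvement reflects both changes together.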

Version History

1 version in total

  • v1.0.1 (current)
    2026-03-19 08:55 · Security: safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
