Transcribe audio files via OpenRouter using audio-capable models (Gemini, GPT-4o-audio, etc.).
obviyus

Overview

OpenRouter Audio Transcription

Transcribe audio files using OpenRouter's chat completions API with input_audio content type. Works with any audio-capable model.

Quick start

{baseDir}/scripts/transcribe.sh /path/to/audio.m4a

Output goes to stdout.

Useful flags

# Custom model (default: google/gemini-2.5-flash)
{baseDir}/scripts/transcribe.sh audio.ogg --model openai/gpt-4o-audio-preview

# Custom instructions
{baseDir}/scripts/transcribe.sh audio.m4a --prompt "Transcribe with speaker labels"

# Save to file
{baseDir}/scripts/transcribe.sh audio.m4a --out /tmp/transcript.txt

# Custom caller identifier (for OpenRouter dashboard)
{baseDir}/scripts/transcribe.sh audio.m4a --title "MyApp"

How it works

  1. Converts the audio to WAV (mono, 16 kHz) using ffmpeg
  2. Base64-encodes the WAV data
  3. Sends it to OpenRouter chat completions as input_audio content
  4. Extracts the transcript from the response
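
The four steps above can be sketched roughly as follows. This is not the contents of transcribe.sh, only a minimal illustration that assumes ffmpeg, jq, and curl are installed. The function names, the fallback prompt text, and the `.choices[0].message.content` response path (the usual OpenAI-compatible shape) are assumptions made for the example.

```shell
# Illustrative sketch only; all function names here are hypothetical.
API_URL="https://openrouter.ai/api/v1/chat/completions"

# Step 1: convert any input to mono 16 kHz WAV.
to_wav() {  # to_wav <input> <output.wav>
  ffmpeg -loglevel error -y -i "$1" -ac 1 -ar 16000 -f wav "$2"
}

# Step 2: base64-encode into a file, stripping line wraps.
encode_b64() {  # encode_b64 <input> <output>
  base64 < "$1" | tr -d '\n' > "$2"
}

# Step 3a: build the JSON body, reading the audio from a file.
build_payload() {  # build_payload <b64 file> <model> <prompt> <out.json>
  jq -n --rawfile audio "$1" --arg model "$2" --arg prompt "$3" '{
    model: $model,
    messages: [{role: "user", content: [
      {type: "text", text: $prompt},
      {type: "input_audio", input_audio: {data: $audio, format: "wav"}}
    ]}]
  }' > "$4"
}

# Step 3b: POST the body; the payload is passed by file, not by argument.
send_request() {  # send_request <payload.json> <response.json>
  curl -sS "$API_URL" \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary "@$1" -o "$2"
}

# Step 4: print the transcript text (prints nothing if the field is missing).
extract_text() {  # extract_text <response.json>
  jq -r '.choices[0].message.content // empty' "$1"
}

transcribe() {  # transcribe <audio file> [model]
  local work status=0
  work=$(mktemp -d) || return 1
  to_wav "$1" "$work/audio.wav" &&
    encode_b64 "$work/audio.wav" "$work/audio.b64" &&
    build_payload "$work/audio.b64" "${2:-google/gemini-2.5-flash}" \
      "Transcribe this audio." "$work/req.json" &&
    send_request "$work/req.json" "$work/resp.json" &&
    extract_text "$work/resp.json" || status=$?
  rm -rf "$work"
  return "$status"
}
```

With OPENROUTER_API_KEY exported, `transcribe /path/to/audio.m4a` would print the transcript to stdout, mirroring the quick-start command.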

API key

Set the OPENROUTER_API_KEY environment variable, or configure the key in ~/.clawdbot/clawdbot.json:

{
  skills: {
    "openrouter-transcribe": {
      apiKey: "YOUR_OPENROUTER_KEY"
    }
  }
}
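
For the environment-variable route, a wrapper can fail early when no key is present. The helper below is a hypothetical sketch, not part of the skill. It checks only the environment and does not read ~/.clawdbot/clawdbot.json.

```shell
# Hypothetical helper: stop early when no key is available in the environment.
require_api_key() {
  if [ -z "${OPENROUTER_API_KEY:-}" ]; then
    echo "OPENROUTER_API_KEY is not set" >&2
    return 1
  fi
}
```

A caller might run `require_api_key || exit 1` before invoking the transcription script.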

Headers

The script sends identification headers to OpenRouter:

  • X-Title: Caller name (default: "Peanut/Clawdbot")
  • HTTP-Referer: Reference URL (default: "https://clawdbot.com")

These show up in your OpenRouter dashboard for tracking.
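
As a rough illustration of how these two headers could be attached to a request, the sketch below prints them one per line so they can be handed to curl from a file. The helper name `id_headers` and the file name `headers.txt` are made up for this example. Reading headers with `-H @file` assumes curl 7.55.0 or newer.

```shell
# Hypothetical helper: print the identification headers, one per line.
# Defaults mirror the values listed above.
id_headers() {  # id_headers [title] [referer]
  printf 'X-Title: %s\n' "${1:-Peanut/Clawdbot}"
  printf 'HTTP-Referer: %s\n' "${2:-https://clawdbot.com}"
}

# Example use (not executed here):
#   id_headers "MyApp" > headers.txt
#   curl -sS https://openrouter.ai/api/v1/chat/completions \
#     -H "Authorization: Bearer $OPENROUTER_API_KEY" \
#     -H "Content-Type: application/json" \
#     -H @headers.txt --data-binary @payload.json
```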

Troubleshooting

ffmpeg format errors: The script uses a temp directory (not mktemp -t file.wav) because macOS's mktemp adds random suffixes after the extension, breaking format detection.
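
The temp-directory approach can be sketched like this: create a private directory with `mktemp -d`, then choose a fixed file name inside it so the path still ends in .wav. The helper name and the `audio.wav` file name are assumptions for illustration.

```shell
# Hypothetical helper: make a private temp directory and print a WAV path in it.
# The random part lives in the directory name, so the file keeps its .wav extension.
new_wav_path() {
  local dir
  dir=$(mktemp -d) || return 1
  printf '%s/audio.wav\n' "$dir"
}

# Example use (not executed here):
#   wav=$(new_wav_path)
#   ffmpeg -loglevel error -y -i input.m4a -ac 1 -ar 16000 "$wav"
#   rm -rf "$(dirname "$wav")"
```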

Argument list too long: Large audio files produce huge base64 strings that exceed shell argument limits. The script writes to temp files (--rawfile for jq, @file for curl) instead of passing data as arguments.
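
To see why file-based passing matters, the hypothetical check below compares a file's size with the system's argument limit as reported by `getconf ARG_MAX`. It is only a diagnostic sketch. Some systems also enforce a smaller limit per argument, so a file that passes this check is not guaranteed to fit.

```shell
# Hypothetical diagnostic: succeed if <file> is at least as large as the
# argument-size limit (defaults to `getconf ARG_MAX`).
exceeds_arg_limit() {  # exceeds_arg_limit <file> [limit_bytes]
  local size limit
  size=$(( $(wc -c < "$1") ))
  limit=${2:-$(getconf ARG_MAX)}
  [ "$size" -ge "$limit" ]
}

# Passing data by file sidesteps the limit entirely, for example:
#   jq -n --rawfile audio audio.b64 '{data: $audio}' > payload.json
#   curl ... --data-binary @payload.json
```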

Empty response: If you get "Empty response from API", the script will dump the raw response for debugging. Common causes:

  • Invalid API key
  • Model doesn't support audio input
  • Audio file too large or corrupted
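
The empty-response check could look roughly like the sketch below. It assumes jq is installed and that the transcript sits at `.choices[0].message.content`, the usual OpenAI-compatible location. The function name is made up, and the real script may structure this differently.

```shell
# Hypothetical sketch: print the transcript, or dump the raw response on failure.
extract_transcript() {  # extract_transcript <response.json>
  local text
  text=$(jq -r '.choices[0].message.content // empty' "$1" 2>/dev/null)
  if [ -z "$text" ]; then
    echo "Empty response from API" >&2
    cat "$1" >&2   # raw body, which often contains an "error" object
    return 1
  fi
  printf '%s\n' "$text"
}
```

When the body holds an error object instead of choices, the dumped response usually points at which of the causes above applies.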

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-28 10:57, marked safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
