Whisper STT

Free local speech-to-text transcription using OpenAI Whisper. Transcribe audio files (mp3, wav, m4a, ogg, etc.) to text without API costs. Use when: (1) User...
Author: nickylin
Category: Developer Tools · clawhub v1.0.0 (1 version, tagged #latest) · API key: not required

Overview

Whisper STT Skill

Free, local speech-to-text using OpenAI Whisper.

Prerequisites

Install dependencies (one-time setup):

pip install openai-whisper torch

Recommended: install ffmpeg. openai-whisper's built-in audio loader calls the ffmpeg command-line tool to decode input files:

  • macOS: brew install ffmpeg
  • Ubuntu: sudo apt install ffmpeg
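Before the first run, it can help to confirm that the prerequisites are visible to the Python interpreter you will use. The following is a minimal sketch, not part of the skill itself; `check_prerequisites` is a hypothetical helper name, and it only checks that the modules and the ffmpeg executable can be located, not that they work.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Return a dict mapping each prerequisite name to True (found) or False."""
    return {
        # find_spec returns None when a top-level module cannot be located
        "whisper": importlib.util.find_spec("whisper") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        # shutil.which returns None when the executable is not on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the same `python` you use for the skill, since packages installed into a different environment will be reported as missing.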

Usage

Transcribe an audio file

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py <audio_file>
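The source of transcribe.py is not shown on this page, so the following is only a sketch of what such a script presumably does with the openai-whisper Python API. `whisper.load_model` and `model.transcribe` are the library's public calls; `load_model` and `transcribe_file` are hypothetical helper names introduced for illustration.

```python
def load_model(name="base"):
    """Load a Whisper model by name (weights are downloaded on first use)."""
    import whisper  # imported lazily so this module loads without whisper installed

    return whisper.load_model(name)


def transcribe_file(model, audio_path, language=None):
    """Transcribe one file and return the text, detected language, and segments.

    language=None lets Whisper auto-detect the spoken language.
    """
    result = model.transcribe(audio_path, language=language)
    return {
        "text": result["text"].strip(),
        "language": result.get("language"),
        # each segment is a dict with at least "start", "end", and "text"
        "segments": result.get("segments", []),
    }


if __name__ == "__main__":
    import sys

    output = transcribe_file(load_model("base"), sys.argv[1])
    print(output["text"])
```

Loading the model once and reusing it across several files avoids repeating the load time for every call.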

Options

Option           Description
---------------  ----------------------------------------------------------------
--model          Model size: tiny, base, small, medium, large, large-v3-turbo (default: base)
--language, -l   Language code: zh, en, ja, etc. (auto-detected if not specified)
--output, -o     Output format: json, txt, srt, vtt (default: json)
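The options above map naturally onto an `argparse` definition. This is a sketch of how such a command line could be declared, not the skill's actual parser; in particular, whether the real script restricts `--model` and `--output` to fixed choices is an assumption, and `build_parser` is a hypothetical name.

```python
import argparse

# Values taken from the options listed above
MODELS = ["tiny", "base", "small", "medium", "large", "large-v3-turbo"]
FORMATS = ["json", "txt", "srt", "vtt"]


def build_parser():
    """Build a parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(description="Local speech-to-text with Whisper")
    parser.add_argument("audio_file", help="Path to the audio file to transcribe")
    parser.add_argument("--model", choices=MODELS, default="base",
                        help="Model size (default: base)")
    parser.add_argument("--language", "-l", default=None,
                        help="Language code such as zh, en, ja (default: auto-detect)")
    parser.add_argument("--output", "-o", choices=FORMATS, default="json",
                        help="Output format (default: json)")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```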

Examples

Chinese audio to text:

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py recording.m4a --language zh --output txt

Generate subtitles (SRT):

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py video.mp4 --output srt > subtitles.srt

Use a faster model:

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model tiny --output txt

High accuracy (slower):

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model large --output txt
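The same commands can be driven from Python when transcription is one step in a larger workflow. The sketch below assumes the script prints its result to standard output, as the SRT example's shell redirection suggests; `build_command` and `transcribe_to_text` are hypothetical helper names.

```python
import os
import subprocess
import sys

# Script location as given in the usage examples
SCRIPT = os.path.expanduser("~/.openclaw/skills/whisper-stt/scripts/transcribe.py")


def build_command(audio_file, model=None, language=None, output=None):
    """Assemble the argument list; options left as None fall back to script defaults."""
    cmd = [sys.executable, SCRIPT, audio_file]
    if model:
        cmd += ["--model", model]
    if language:
        cmd += ["--language", language]
    if output:
        cmd += ["--output", output]
    return cmd


def transcribe_to_text(audio_file, model=None, language=None):
    """Run the script with --output txt and return whatever it wrote to stdout."""
    cmd = build_command(audio_file, model=model, language=language, output="txt")
    # check=True raises CalledProcessError if the script exits with an error
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


if __name__ == "__main__":
    print(transcribe_to_text(sys.argv[1]))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.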

Model Selection Guide

Model            Relative Speed   Accuracy    VRAM/RAM   Best For
---------------  ---------------  ----------  ---------  ------------------------------
tiny             ~32x             Basic       ~1GB       Quick tests, low resource
base             ~16x             Good        ~1GB       Balanced speed/accuracy
small            ~6x              Better      ~2GB       Better accuracy
medium           ~2x              Very Good   ~5GB       High accuracy
large            1x               Excellent   ~10GB      Best quality
large-v3-turbo   ~8x              Excellent   ~6GB       Fast + accurate (recommended)
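The guide above can be turned into a simple selection rule: among the models that fit in the available memory, prefer the highest accuracy tier and break ties by speed. This is only an illustrative sketch; the figures are the approximate values from the guide, and `pick_model` and the numeric accuracy ranking are assumptions made for the example.

```python
# (model, approx. memory in GB, approx. relative speed), copied from the guide
MODEL_TABLE = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 6),
    ("medium", 5, 2),
    ("large", 10, 1),
    ("large-v3-turbo", 6, 8),
]

# Accuracy labels mapped to ranks: Basic < Good < Better < Very Good < Excellent
ACCURACY_RANK = {
    "tiny": 0, "base": 1, "small": 2, "medium": 3, "large": 4, "large-v3-turbo": 4,
}


def pick_model(available_gb):
    """Return the most accurate model that fits, or None if nothing fits."""
    candidates = [row for row in MODEL_TABLE if row[1] <= available_gb]
    if not candidates:
        return None
    # Highest accuracy rank first; among equals, the faster model wins
    best = max(candidates, key=lambda row: (ACCURACY_RANK[row[0]], row[2]))
    return best[0]


if __name__ == "__main__":
    for gb in (1, 2, 8, 12):
        print(gb, "GB ->", pick_model(gb))
```

Under this rule, large-v3-turbo is chosen whenever about 6GB is available, which matches the guide's recommendation.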

Troubleshooting

"ModuleNotFoundError: No module named 'whisper'"

→ Run: pip install openai-whisper torch

"ffmpeg not found"

→ Install ffmpeg or convert audio to WAV format first

Slow transcription

→ Use a smaller model (tiny/base) or make sure a GPU is available (Apple Silicon MPS, NVIDIA CUDA)

Poor accuracy on Chinese

→ Pass --language zh explicitly and consider a larger model (medium/large)
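When transcription is slow, a quick check of which accelerator PyTorch can see often explains why. The sketch below is a hypothetical helper (`pick_device`), not part of the skill; it falls back to "cpu" when torch is missing. Whether transcribe.py exposes a device option is not documented here, and MPS support in Whisper may vary by version.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # Without torch there is nothing to accelerate with
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds, so guard the lookup
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("PyTorch device:", pick_device())
```

If you call the openai-whisper library directly, `whisper.load_model` accepts a `device` argument that can take this value.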

Output Formats

  • json: Full result with segments, timestamps, and metadata
  • txt: Plain text transcription only
  • srt: SubRip subtitle format with timing
  • vtt: WebVTT subtitle format for web players
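To make the relationship between the json and srt formats concrete, here is a sketch that converts Whisper-style segments (dicts with "start", "end", and "text", times in seconds) into SubRip text. It illustrates the format only and is not the skill's own writer; `format_srt_timestamp` and `segments_to_srt` are hypothetical names.

```python
def format_srt_timestamp(seconds):
    """Convert seconds to the SRT form HH:MM:SS,mmm (comma before milliseconds)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello"},
        {"start": 2.5, "end": 5.0, "text": " world"},
    ]
    print(segments_to_srt(demo))
```

WebVTT differs mainly in starting with a `WEBVTT` header line and using a period instead of a comma before the milliseconds.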

Credits

Powered by OpenAI Whisper, an open-source speech recognition model.

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-29 12:28, security status: safe

Security Checks

  • Tencent Cloud Security (Keen): safe, no risk found
  • Tencent Cloud Security (Sanbu): safe, no risk found
