
Whisper STT Skill

Free, local speech-to-text using OpenAI Whisper.

Prerequisites

Install dependencies (one-time setup):

pip install openai-whisper torch

Install ffmpeg, the command-line tool that openai-whisper calls to decode audio files:

macOS: brew install ffmpeg
Ubuntu: sudo apt install ffmpeg
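To confirm the setup before the first run, a quick check like the following can help. It is a minimal sketch, not part of the skill: `check_prerequisites` is a hypothetical helper name, and it only tests what the Python interpreter running it can see (the `whisper` and `torch` packages, and an `ffmpeg` executable on `PATH`).

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report which Whisper STT prerequisites are visible to this interpreter."""
    return {
        # pip package "openai-whisper" installs the importable module "whisper"
        "whisper": importlib.util.find_spec("whisper") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        # ffmpeg is a command-line tool, so look for it on PATH instead
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:8} {'OK' if ok else 'missing'}")
```

Run it with the same `python` you use for transcribe.py; a `missing` entry points to the matching install command above.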

Usage

Transcribe an audio file

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py <audio_file>

Options

Option	Description
--------	-------------
`--model`	Model size: tiny, base, small, medium, large, large-v3-turbo (default: base)
`--language, -l`	Language code: zh, en, ja, etc. (auto-detect if not specified)
`--output, -o`	Output format: json, txt, srt, vtt (default: json)
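If you are wrapping the script in other tooling, the option table corresponds to a command-line parser roughly like the one below. This is a hypothetical reconstruction from the table alone (`build_parser`, `MODELS`, and `FORMATS` are names made up here), not the actual contents of transcribe.py, which may accept more model names or flags.

```python
import argparse

# Values copied from the Options table; the real script may allow more.
MODELS = ["tiny", "base", "small", "medium", "large", "large-v3-turbo"]
FORMATS = ["json", "txt", "srt", "vtt"]


def build_parser():
    """Hypothetical parser mirroring the documented transcribe.py options."""
    parser = argparse.ArgumentParser(description="Transcribe audio with OpenAI Whisper")
    parser.add_argument("audio_file", help="path to the audio or video file")
    parser.add_argument("--model", choices=MODELS, default="base",
                        help="Whisper model size")
    parser.add_argument("--language", "-l", default=None,
                        help="language code such as zh, en, ja; omit to auto-detect")
    parser.add_argument("--output", "-o", choices=FORMATS, default="json",
                        help="output format")
    return parser


# Equivalent of: transcribe.py recording.m4a -l zh -o txt
args = build_parser().parse_args(["recording.m4a", "-l", "zh", "-o", "txt"])
print(args.model, args.language, args.output)
```

Using `choices` makes argparse reject a misspelled model or format up front, before any model is downloaded.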

Examples

Chinese audio to text:

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py recording.m4a --language zh --output txt

Generate subtitles (SRT):

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py video.mp4 --output srt > subtitles.srt

Use a faster model:

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model tiny --output txt

High accuracy (slower):

python ~/.openclaw/skills/whisper-stt/scripts/transcribe.py audio.mp3 --model large --output txt
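To run the same command over a folder of recordings, you can drive the script from Python. This is a minimal sketch that assumes two things: the script lives at the path used in the examples above, and it prints its result to standard output (as the `> subtitles.srt` example implies). `build_command` and `transcribe_folder` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path

# Assumed install location, taken from the examples above.
SCRIPT = Path.home() / ".openclaw/skills/whisper-stt/scripts/transcribe.py"


def build_command(audio, model="base", language=None, output="txt"):
    """Build the argument list for one transcribe.py run (no shell quoting needed)."""
    cmd = [sys.executable, str(SCRIPT), str(audio),
           "--model", model, "--output", output]
    if language:
        cmd += ["--language", language]
    return cmd


def transcribe_folder(folder, pattern="*.m4a", **kwargs):
    """Run transcribe.py on each matching file and save stdout next to it."""
    ext = kwargs.get("output", "txt")
    for audio in sorted(Path(folder).glob(pattern)):
        result = subprocess.run(build_command(audio, **kwargs),
                                capture_output=True, text=True, check=True)
        # e.g. meeting.m4a -> meeting.txt
        audio.with_suffix(f".{ext}").write_text(result.stdout, encoding="utf-8")
```

For example, `transcribe_folder("recordings", language="zh", output="srt")` would write one `.srt` file per `.m4a` file. Passing a list to `subprocess.run` avoids quoting problems with spaces in file names, and `check=True` stops at the first failed file instead of silently writing an empty transcript.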

Model Selection Guide

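Model	Relative speed (vs. large)	Accuracy	VRAM/RAM	Best For
-------	--------------------------	----------	----------	----------
tiny	~10x	Basic	~1GB	Quick tests, low resource
base	~7x	Good	~1GB	Balanced speed/accuracy
small	~4x	Better	~2GB	Better accuracy
medium	~2x	Very Good	~5GB	High accuracy
large	1x	Excellent	~10GB	Best quality
large-v3-turbo	~8x	Excellent	~6GB	Fast + accurate (recommended)

Speed and memory figures are the approximate values published in the openai-whisper README (speed measured on English speech with an NVIDIA A100); actual numbers depend on your hardware.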
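The memory column can be turned into a rough rule of thumb for choosing `--model`. This is a sketch under a simple assumption, namely that the biggest model that fits your memory budget is the one you want; `pick_model` is a hypothetical helper, and the figures are the approximate values from the table, not measured requirements.

```python
# Approximate memory needs (GB) from the Model Selection Guide, smallest first.
MODEL_MEMORY_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3-turbo", 6),
    ("large", 10),
]


def pick_model(available_gb):
    """Return the largest model whose approximate memory need fits, or None."""
    best = None
    for name, need in MODEL_MEMORY_GB:
        if need <= available_gb:
            best = name  # keep overwriting: later entries are larger
    return best


print(pick_model(8))
print(pick_model(4))
```

With 8 GB this picks `large-v3-turbo` and with 4 GB it picks `small`. If speed matters more than the last bit of accuracy, you could stop at `large-v3-turbo` even when `large` would fit.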

Troubleshooting

"ModuleNotFoundError: No module named 'whisper'"

→ Run: pip install openai-whisper torch

"ffmpeg not found"

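→ Install ffmpeg (see Prerequisites). openai-whisper's own audio loader runs ffmpeg for every input, WAV included, so converting to WAV first only helps if the script decodes WAV files by other means.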

Slow transcription

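→ Use a smaller model (tiny or base), or run on a GPU. openai-whisper uses NVIDIA CUDA automatically when PyTorch can see it; Apple Silicon (MPS) acceleration depends on whether the script selects that device.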

Poor accuracy on Chinese

→ Pass --language zh explicitly and consider a larger model (medium or large)

Output Formats

json: Full result with segments, timestamps, and metadata
txt: Plain text transcription only
srt: SubRip subtitle format with timing
vtt: WebVTT subtitle format for web players
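The subtitle formats differ mainly in framing and timestamp punctuation. openai-whisper's transcription result contains a `segments` list whose items carry `start` and `end` (in seconds) and `text`; the sketch below shows how such segments map to SRT and WebVTT. It illustrates the formats and is not the script's own code; the function names are hypothetical.

```python
def format_timestamp(seconds, sep=","):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT, sep='.')."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments):
    """SRT: numbered cues, comma before milliseconds, blank line between cues."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments):
    """WebVTT: 'WEBVTT' header, period before milliseconds, cue numbers optional."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], sep=".")
        end = format_timestamp(seg["end"], sep=".")
        lines += [f"{start} --> {end}", seg["text"].strip(), ""]
    return "\n".join(lines)


# Sample segments shaped like Whisper output (text often has a leading space).
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 65.25, "text": " Second line."},
]
print(to_srt(segments))
print(to_vtt(segments))
```

The second cue renders as `00:00:02,500 --> 00:01:05,250` in SRT and `00:00:02.500 --> 00:01:05.250` in WebVTT. Rounding to whole milliseconds once, then splitting with `divmod`, avoids the off-by-one-millisecond errors that repeated float arithmetic can introduce.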

Credits

Powered by OpenAI Whisper, an open-source speech recognition model.

Version History

1 version in total

v1.0.0 (current)

2026-03-29 12:28 · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk


Tencent Cloud Security (Sanbu)