
FunASR Transcribe Skill

Use when the user needs local speech-to-text transcription for audio files, especially Chinese or mixed Chinese-English audio, without relying on cloud transcription services.
Author: limboinf
Category: Developer tools · Source: clawhub · Version: v1.0.1 (latest, 2 versions) · API key: not required
Stars: 0 · Downloads: 1,173 · Installs: 19

Overview

FunASR Transcribe

Local speech-to-text for audio files using FunASR. It is best suited to Chinese and mixed Chinese-English audio, runs on the local machine, and does not require a paid transcription API.

When to Use

  • The user wants to transcribe .wav, .ogg, .mp3, .flac, or .m4a files into text.
  • The user prefers local ASR over cloud speech APIs for privacy, cost, or offline-friendly workflows.
  • The audio is primarily Chinese, dialect-heavy Chinese, or mixed Chinese-English.
  • The user is okay with installing Python dependencies and downloading models on first use.

Do not use this skill when the user explicitly forbids local dependency installation or any network access for dependency/model download.
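
The criteria above can be sketched as a small pre-flight check. This is only an illustration: the function name `should_use_skill` and its two flags are hypothetical and are not part of the skill's scripts; only the extension list comes from the text above.

```python
from pathlib import Path

# Extensions copied from the "When to Use" list.
SUPPORTED_EXTENSIONS = {".wav", ".ogg", ".mp3", ".flac", ".m4a"}


def should_use_skill(audio_path: str,
                     allow_install: bool = True,
                     allow_network: bool = True) -> bool:
    """Hypothetical helper: True when the file type is supported and the
    user has not ruled out dependency installation or network access."""
    if not (allow_install and allow_network):
        return False
    # Compare case-insensitively so "CLIP.OGG" is treated like "clip.ogg".
    return Path(audio_path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `should_use_skill("meeting.m4a")` is true, while `should_use_skill("meeting.m4a", allow_network=False)` is false.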

Quick Start

# Install dependencies and create a virtual environment
bash ~/.openclaw/workspace/skills/funasr-transcribe/scripts/install.sh

# Transcribe an audio file
bash ~/.openclaw/workspace/skills/funasr-transcribe/scripts/transcribe.sh /path/to/audio.ogg
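
If the helper needs to be driven from Python rather than a shell, the Quick Start command could be wrapped roughly as follows. The script path is the one shown above; the function names `transcribe_command` and `run_transcribe` are assumptions made for this sketch, as is the expectation that the script's stdout contains only the transcript.

```python
import subprocess
from pathlib import Path
from typing import List

# Skill location as given in Quick Start.
SKILL_DIR = Path.home() / ".openclaw/workspace/skills/funasr-transcribe"


def transcribe_command(audio_path: str) -> List[str]:
    """Build the argv for scripts/transcribe.sh (hypothetical helper)."""
    return ["bash", str(SKILL_DIR / "scripts" / "transcribe.sh"), audio_path]


def run_transcribe(audio_path: str) -> str:
    """Run the helper script and return whatever it prints to stdout.

    Raises subprocess.CalledProcessError if the script exits non-zero.
    """
    result = subprocess.run(
        transcribe_command(audio_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```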

What It Does

  • Creates a Python virtual environment at ~/.openclaw/workspace/funasr_env by default.
  • Installs funasr, torch, torchaudio, modelscope, and related dependencies.
  • Loads FunASR models locally and writes the transcript to a sibling .txt file.
  • Prints the transcript to stdout for direct CLI use.
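
The last two points can be illustrated with a short sketch. It assumes that "sibling .txt file" means the audio file's extension is replaced by `.txt` (for example `clip.ogg` becomes `clip.txt`); the actual naming used by `scripts/transcribe.py` may differ, and both function names here are hypothetical.

```python
import sys
from pathlib import Path


def transcript_path(audio_path: str) -> Path:
    """Assumed naming: /data/clip.ogg -> /data/clip.txt."""
    return Path(audio_path).with_suffix(".txt")


def save_and_print(audio_path: str, text: str) -> Path:
    """Write the transcript next to the audio file, then echo it to stdout."""
    out = transcript_path(audio_path)
    try:
        out.write_text(text, encoding="utf-8")
    except OSError as exc:
        # A failed write should not hide the transcript from the CLI user.
        print(f"warning: could not write {out}: {exc}", file=sys.stderr)
    print(text)
    return out
```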

Models

  • ASR: damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
  • VAD: damo/speech_fsmn_vad_zh-cn-16k-common-pytorch
  • Punctuation: damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch
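
As a rough sketch of how these three models fit together, FunASR's `AutoModel` accepts an ASR model plus optional VAD and punctuation models. The snippet below is not the skill's `scripts/transcribe.py`; the helper names are hypothetical, and it assumes the `damo/...` identifiers above resolve as-is in the installed FunASR and ModelScope versions.

```python
# Model identifiers copied from the list above.
ASR_MODEL = "damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
VAD_MODEL = "damo/speech_fsmn_vad_zh-cn-16k-common-pytorch"
PUNC_MODEL = "damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"


def model_kwargs(device: str = "cpu") -> dict:
    """Keyword arguments for funasr.AutoModel (hypothetical helper)."""
    return {
        "model": ASR_MODEL,
        "vad_model": VAD_MODEL,
        "punc_model": PUNC_MODEL,
        "device": device,
    }


def transcribe(audio_path: str, device: str = "cpu") -> str:
    """Load the models and return the recognized text for one file."""
    # Imported lazily so this module loads even where funasr is not installed.
    from funasr import AutoModel

    model = AutoModel(**model_kwargs(device))  # may download models on first use
    results = model.generate(input=audio_path)
    # generate() returns a list of dicts; each carries a "text" field.
    return "".join(item["text"] for item in results)
```

The VAD model splits long audio into speech segments before recognition, and the punctuation model restores punctuation in the recognized text.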

External Endpoints

| Endpoint | Purpose | Data sent |
| --- | --- | --- |
| https://pypi.tuna.tsinghua.edu.cn/simple | Install Python packages during setup | Package names and installer metadata requested by pip |
| ModelScope and/or Hugging Face endpoints used by FunASR dependencies | Download model files on first run | Model identifiers and standard HTTP request metadata |
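
To make the first endpoint concrete, the setup step presumably runs pip against the mirror listed above. The sketch below only builds such a command; it is an assumption about what `scripts/install.sh` does, not a copy of it. The package list is taken from "What It Does", and `pip_install_command` is a hypothetical name.

```python
import sys
from typing import List

# Mirror URL from the endpoint table; packages from "What It Does".
MIRROR_URL = "https://pypi.tuna.tsinghua.edu.cn/simple"
PACKAGES = ["funasr", "torch", "torchaudio", "modelscope"]


def pip_install_command(python: str = sys.executable) -> List[str]:
    """Build a pip invocation that resolves packages from the mirror."""
    return [python, "-m", "pip", "install", "--index-url", MIRROR_URL, *PACKAGES]
```

Passing the virtual environment's interpreter as `python` would install into that environment rather than the system Python.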

Security & Privacy

  • Audio files are read from the local machine and processed locally by FunASR.
  • The transcription flow does not intentionally upload audio content to a cloud ASR API.
  • Network access is still required during setup and first-run model download.
  • The generated transcript is written to a local .txt file next to the source audio unless the write step fails.
  • This skill does not require API keys or other secrets by default.

Model Invocation Note

Autonomous invocation is normal for this skill. If a user asks to transcribe local audio, an agent may install dependencies and run the helper scripts unless the user explicitly opts out of dependency installation or network access.

Trust Statement

When you use this skill, packages and models may be downloaded from third-party upstream sources such as the configured PyPI mirror and model hosting providers. Only install and use this skill if you trust those upstream sources.

Troubleshooting

  • python3 not found: install Python 3.7+ and rerun scripts/install.sh.
  • Install fails in the existing environment: rerun scripts/install.sh --force to recreate the virtual environment.
  • First transcription is slow: initial model downloads can take several minutes.
  • GPU is desired: edit scripts/transcribe.py and change device="cpu" to a CUDA device after installing the correct CUDA build.
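
For the GPU case, one alternative to hard-coding the device is to pick it at runtime. This is a minimal sketch assuming a CUDA-enabled torch build is installed; `pick_device` is a hypothetical helper and is not present in `scripts/transcribe.py`.

```python
def pick_device() -> str:
    """Return "cuda:0" when a CUDA device is usable, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without torch there is nothing to accelerate; stay on CPU.
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"
```

The returned string could then be used as the `device` value in place of the literal `"cpu"`.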

Version history

2 versions in total

  • v1.0.1 (current)
    2026-03-29 08:28 · Security: safe
  • v1.0.0
    2026-03-26 22:17

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
