
Douyin Video Transcribe

Douyin video transcription suite. Extract audio from Douyin/TikTok China videos, transcribe with Whisper, and analyze content. Supports video links and local files.
don068589
AI Intelligence · clawhub · v2.0.0 · 2 versions · 99769.3 · Key: not required

Overview

Douyin Transcribe - Video Transcription Suite

A complete solution for transcribing Douyin (抖音/TikTok China) videos. Extracts audio, transcribes speech to text, and generates structured summaries.

Version History

| Version | Changes |
|---------|---------|
| 2.0.0 | Modular architecture, improved workflow, browser DOM extraction |
| 1.0.0 | Initial release, basic transcription |

Architecture

```
User Input (Douyin Link/File)

┌─────────────────────────────────────────┐
│          Workflow Orchestrator          │
├─────────────────────────────────────────┤
│  Step 1: Fetcher     → Get video file   │
│  Step 2: Transcriber → Extract & convert│
│  Step 3: Analyzer    → Structure output │
│  Step 4: Output      → Save results     │
└─────────────────────────────────────────┘
```

Core Features

  • Video Fetching: Browser-based DOM extraction for CDN URLs
  • Audio Extraction: ffmpeg-powered audio conversion
  • Speech-to-Text: Whisper ASR with multiple model options
  • Content Analysis: Auto-structured transcripts with key points
  • Multi-format Support: Video links, local files, image notes

Prerequisites

| Tool | Purpose | Install |
|------|---------|---------|
| curl | Download files | Built-in (Windows: `curl.exe`) |
| ffmpeg | Audio extraction/merge | `winget install Gyan.FFmpeg` |
| Whisper | Transcription | `pip install openai-whisper` or Docker |
| Browser | Video extraction | OpenClaw profile required |

Docker Whisper (Recommended):

```bash
docker run -d -p 9000:9000 --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest
```

Workflow

Step 0: Input Classification

| Input Type | Detection | Action |
|------------|-----------|--------|
| Video link (`/video/`) | URL pattern | Full workflow |
| Image note (`/note/`) | URL pattern | Snapshot only |
| Local video file | File path | Start from Step 2 |
| Text input | Plain text | Start from Step 3 |
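
The classification table can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the label strings, the `classify_input` name, and the list of video extensions are assumptions introduced here. Short links (`v.douyin.com`) carry neither `/video/` nor `/note/`, so the sketch flags them as needing resolution first.

```python
import re
from pathlib import PurePath

# Hypothetical labels for the input types; the source does not name them.
VIDEO_LINK = "video_link"
IMAGE_NOTE = "image_note"
UNRESOLVED_LINK = "unresolved_link"  # e.g. a short link that must be resolved first
LOCAL_FILE = "local_file"
TEXT = "text"

# Assumed set of video file extensions; the source does not list any.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm"}


def classify_input(value: str) -> str:
    """Return the workflow entry point that applies to a raw user input."""
    text = value.strip()
    if re.match(r"https?://", text):
        if "/video/" in text:
            return VIDEO_LINK       # full workflow
        if "/note/" in text:
            return IMAGE_NOTE       # snapshot only
        return UNRESOLVED_LINK      # resolve the URL, then classify again
    if PurePath(text).suffix.lower() in VIDEO_EXTENSIONS:
        return LOCAL_FILE           # start from Step 2
    return TEXT                     # start from Step 3
```

Checking the URL patterns before the file extension keeps a link that happens to end in `.mp4` from being treated as a local file.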

Step 1: Fetch Video

1.1 Resolve Short URL

```bash
# Windows PowerShell
curl.exe -sL -o NUL -w "%{url_effective}" "https://v.douyin.com/xxx/"

# macOS/Linux
curl -sL -o /dev/null -w '%{url_effective}' "https://v.douyin.com/xxx/"
```

Output: `https://www.douyin.com/video/7616020798351871284`
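
The `{VIDEO_ID}` used in later steps can be pulled out of the resolved URL. A minimal sketch, assuming the ID is the run of digits after `/video/` as in the example output; `extract_video_id` is a hypothetical helper name.

```python
import re
from typing import Optional


def extract_video_id(resolved_url: str) -> Optional[str]:
    """Return the numeric ID that follows /video/ in a resolved URL, or None."""
    match = re.search(r"/video/(\d+)", resolved_url)
    return match.group(1) if match else None
```

A `None` result signals that the link did not resolve to a video page, for example when it points to an image note.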

1.2 Open Video Page

```
browser(action='open', profile='openclaw', url='https://www.douyin.com/video/{VIDEO_ID}')
```

Wait 10-15 seconds for the page to load completely.

1.3 Extract Video URL (Browser DOM Method)

```javascript
browser(action='act', targetId='PAGE_ID', request={
  "kind": "evaluate",
  "fn": "(() => {
    const entries = performance.getEntriesByType('resource');
    const videoEntries = entries.filter(e => {
      const name = e.name.toLowerCase();
      return name.includes('douyinvod') &&
        (name.includes('.mp4') || name.includes('video'));
    });
    if (videoEntries.length > 0) {
      const video = videoEntries[videoEntries.length - 1];
      return {
        url: video.name,
        type: video.name.includes('.mp4') ? 'mp4' : 'dash'
      };
    }
    return null;
  })()"
})
```

Important Notes:

  • The `act` action requires a nested `request` object with `kind` and `fn`
  • Wrong: `browser(action='act', fn='...')`
  • Correct: `browser(action='act', request={"kind": "evaluate", "fn": "..."})`
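
The selection rule inside the `fn` string (keep resource entries whose name contains `douyinvod` plus either `.mp4` or `video`, then take the last one) can be mirrored in Python to test it offline against a list of captured resource names. This is only a sketch of that rule; `pick_video_resource` is a hypothetical name and the URLs in the usage note are made up.

```python
from typing import Dict, List, Optional


def pick_video_resource(resource_names: List[str]) -> Optional[Dict[str, str]]:
    """Mirror the browser-side filter: return the last matching entry, or None."""
    matches = []
    for name in resource_names:
        lowered = name.lower()
        if "douyinvod" in lowered and (".mp4" in lowered or "video" in lowered):
            matches.append(name)
    if not matches:
        return None
    url = matches[-1]  # the most recently loaded matching resource
    return {"url": url, "type": "mp4" if ".mp4" in url else "dash"}
```

For example, `pick_video_resource(["https://cdn.douyinvod.example/video/clip.mp4"])` yields an entry of type `mp4`, while an empty list yields `None`, which corresponds to the "JS returns null" case.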

1.4 Download Video

```bash
# {VIDEO_URL} is the URL returned by the extraction step in 1.3
curl.exe -L -H "Referer: https://www.douyin.com/" -o video.mp4 "{VIDEO_URL}"
```

The Referer header is required; without it the download fails with HTTP 403.
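
For scripted pipelines, the same download can be sketched with Python's standard library. The helper names and the 60-second timeout are assumptions made for illustration; only the Referer value comes from the command above. Whether the CDN checks other headers (such as User-Agent) is not stated in the source.

```python
import shutil
import urllib.request

REFERER = "https://www.douyin.com/"


def build_download_request(video_url: str) -> urllib.request.Request:
    """Attach the Referer header that the download step requires."""
    return urllib.request.Request(video_url, headers={"Referer": REFERER})


def download_video(video_url: str, dest: str = "video.mp4") -> str:
    """Stream the video to disk; urlopen raises HTTPError on a 403 response."""
    request = build_download_request(video_url)
    with urllib.request.urlopen(request, timeout=60) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

Splitting request construction from the network call makes the header logic testable without contacting the CDN.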

Step 2: Transcribe Audio

2.1 Extract Audio

```bash
# For MP4 videos
ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y

# For DASH videos (need merge)
ffmpeg -i video.mp4 -i audio.mp4 -c copy merged.mp4 -y
ffmpeg -i merged.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y
```

Parameters:

  • `-ar 16000`: 16 kHz sample rate (the rate Whisper expects)
  • `-ac 1`: Mono channel
  • `-c:a pcm_s16le`: 16-bit PCM
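
When the extraction is driven from a script, the same flags can be passed as an argument list, which avoids shell quoting issues with file names. A minimal sketch; the function names are hypothetical and `ffmpeg` is assumed to be on the PATH.

```python
import subprocess
from typing import List


def build_extract_command(video_path: str, wav_path: str = "audio.wav") -> List[str]:
    """Build the ffmpeg invocation: 16 kHz, mono, 16-bit PCM, overwrite output."""
    return [
        "ffmpeg", "-i", video_path,
        "-ar", "16000",        # sample rate
        "-ac", "1",            # mono
        "-c:a", "pcm_s16le",   # 16-bit PCM
        wav_path, "-y",
    ]


def extract_audio(video_path: str, wav_path: str = "audio.wav") -> str:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_extract_command(video_path, wav_path), check=True)
    return wav_path
```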

2.2 Transcribe with Docker Whisper

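```bash
# Port 9000 matches the -p 9000:9000 mapping in the docker run command; adjust if you published a different port
curl.exe -X POST "http://localhost:9000/asr" -F "audio_file=@audio.wav"
```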
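
The same upload can be sketched without curl by building the multipart form by hand with the standard library. This assumes the service is reachable on port 9000 (the port published by the `docker run` command in Prerequisites) and that the form field is named `audio_file` as in the curl call; the helper names and the timeout are hypothetical, and the response is returned as-is because its format is not described in the source.

```python
import urllib.request
import uuid


def build_asr_request(wav_bytes: bytes,
                      endpoint: str = "http://localhost:9000/asr",
                      filename: str = "audio.wav") -> urllib.request.Request:
    """Wrap the WAV bytes in a multipart/form-data POST with one file field."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=head + wav_bytes + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def transcribe(wav_path: str, endpoint: str = "http://localhost:9000/asr") -> str:
    """Send the file to the ASR service and return the raw response body."""
    with open(wav_path, "rb") as handle:
        request = build_asr_request(handle.read(), endpoint)
    with urllib.request.urlopen(request, timeout=600) as response:
        return response.read().decode("utf-8")
```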

2.3 Alternative: Local Whisper

```bash
python -m whisper audio.wav --model small --language zh
```

Model Selection:

| Model | Size | 5-min Video (CPU) | Accuracy | Use Case |
|-------|------|-------------------|----------|----------|
| tiny | 75MB | ~30s | Fair | Quick preview |
| base | 142MB | ~1min | Good | Daily use |
| small | 466MB | ~3min | Better | Recommended |
| medium | 1.5GB | ~8min | Best | High accuracy |
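
The table's timings can be turned into a rough estimate for other video lengths. The sketch below assumes processing time scales linearly with duration, which the source does not state; the figures are the approximate 5-minute CPU timings from the table, and `estimate_cpu_seconds` is a hypothetical helper.

```python
# Approximate CPU seconds needed for a 5-minute (300 s) video, from the table.
CPU_SECONDS_PER_5_MIN = {"tiny": 30, "base": 60, "small": 180, "medium": 480}


def estimate_cpu_seconds(model: str, video_seconds: float) -> float:
    """Scale the table's 5-minute timing linearly to another duration (assumption)."""
    return CPU_SECONDS_PER_5_MIN[model] * video_seconds / 300
```

Under this assumption a 10-minute video on `medium` comes out at 960 seconds (16 minutes), which falls inside the 15-20 minute range quoted in Troubleshooting.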

Step 3: Analyze Content

The agent processes the transcript in four passes:

  1. Fix transcription errors
    • Correct homophones
    • Fix speaker names
    • Remove filler words
  2. Structure content
    • Add paragraph breaks
    • Create sections
  3. Extract key points
    • Main ideas
    • Important quotes
  4. Generate tags
    • 3-5 topic tags

Step 4: Save Output

Transcript Format

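```markdown
{Title}

作者: {Author}
来源: 抖音
日期: {Date}
转录时间: {Transcription Date}

摘要
{Summary}

正文
{Transcript content with paragraphs}

要点
  • {Key point 1}
  • {Key point 2}
  • {Key point 3}

标签
#{tag1} #{tag2} #{tag3}
```

The Chinese labels are literal output text: 作者 (author), 来源: 抖音 (source: Douyin), 日期 (date), 转录时间 (transcription time), 摘要 (summary), 正文 (body), 要点 (key points), 标签 (tags).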

File Naming Convention

```
{VIDEO_ID}-抖音转录.md
```

The suffix 抖音转录 means "Douyin transcript".
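
Filling the template and naming the file can be sketched as below. The function names and parameters are hypothetical, and the exact line spacing of the rendered file is an assumption; only the labels, their order, and the naming pattern come from the format above.

```python
from typing import List


def transcript_filename(video_id: str) -> str:
    """Apply the naming convention {VIDEO_ID}-抖音转录.md."""
    return f"{video_id}-抖音转录.md"


def render_transcript(title: str, author: str, date: str, transcribed_on: str,
                      summary: str, body: str,
                      key_points: List[str], tags: List[str]) -> str:
    """Fill the transcript template; the Chinese labels are literal output text."""
    lines = [
        title,
        "",
        f"作者: {author}",
        "来源: 抖音",
        f"日期: {date}",
        f"转录时间: {transcribed_on}",
        "",
        "摘要",
        summary,
        "",
        "正文",
        body,
        "",
        "要点",
        *[f"  • {point}" for point in key_points],
        "",
        "标签",
        " ".join(f"#{tag}" for tag in tags),
    ]
    return "\n".join(lines) + "\n"
```

Writing the result with `encoding="utf-8"` keeps the Chinese labels and file name intact on Windows.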

Troubleshooting

| Stage | Issue | Solution |
|-------|-------|----------|
| Step 1 | Short URL fails | Check link completeness, remove share text |
| Step 1 | JS returns null | Wait 15-20s and retry, increase timeout |
| Step 1 | Download 403 | URL expired, re-fetch from browser |
| Step 1 | DASH no audio | Merge with `ffmpeg -i video -i audio -c copy` |
| Step 2 | ffmpeg not installed | `winget install Gyan.FFmpeg` |
| Step 2 | Whisper service down | `docker start whisper-asr` |
| Step 2 | Transcription slow | 10-min video takes 15-20 min on CPU |
| Step 2 | Poor quality | Use larger model (medium) |
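
For the "Short URL fails" case, the share text copied from the app usually wraps the link in extra words. A minimal sketch for pulling out just the link, assuming short links have the form `https://v.douyin.com/<code>/` where the code is made of letters, digits, `_` or `-`; that character set and the `extract_short_link` name are assumptions.

```python
import re
from typing import Optional

# Assumed shape of a Douyin short link; the code's character set is a guess.
SHORT_LINK = re.compile(r"https://v\.douyin\.com/[A-Za-z0-9_-]+/?")


def extract_short_link(share_text: str) -> Optional[str]:
    """Return the first short link found in pasted share text, or None."""
    match = SHORT_LINK.search(share_text)
    return match.group(0) if match else None
```

The extracted link can then be passed to the curl command in step 1.1.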

Image Note Handling

Image notes (`/note/`) don't need transcription:

```
1. browser(action='open', profile='openclaw', url='IMAGE_NOTE_URL')
2. browser(action='snapshot')
3. Extract content from snapshot
4. Save to output directory
```

Edge Cases

  • Article links (`/article/`): Use browser snapshot, no transcription
  • Douyin AI summary: Extract from page as supplement
  • Other platforms: Use yt-dlp for YouTube/Bilibili
  • Live streams: Not supported

Related Modules

This skill can be extended with standalone modules:

| Module | Purpose |
|--------|---------|
| douyin-fetcher | Video fetching only |
| douyin-transcriber | Audio transcription only |
| douyin-analyzer | Content analysis only |
| douyin-orchestrator | Workflow coordination |

License

MIT-0 License - Free to use, modify, and redistribute.

Version history

2 versions in total

  • v2.0.0 (current)
    2026-05-01 03:23 · Safe · Safe
  • v1.0.0
    2026-03-19 12:17 · Safe · Safe

Security scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
