Overview

Douyin Transcribe - Video Transcription Suite

A complete solution for transcribing Douyin (抖音/TikTok China) videos. Extracts audio, transcribes speech to text, and generates structured summaries.

Version History

| Version | Changes |
|---------|---------|
| 2.0.0 | Modular architecture, improved workflow, browser DOM extraction |
| 1.0.0 | Initial release, basic transcription |

Architecture

```
User Input (Douyin Link/File)
              │
              ▼
┌─────────────────────────────────────────┐
│          Workflow Orchestrator          │
├─────────────────────────────────────────┤
│ Step 1: Fetcher     → Get video file    │
│ Step 2: Transcriber → Extract & convert │
│ Step 3: Analyzer    → Structure output  │
│ Step 4: Output      → Save results      │
└─────────────────────────────────────────┘
```

Core Features

Video Fetching: Browser-based DOM extraction for CDN URLs
Audio Extraction: ffmpeg-powered audio conversion
Speech-to-Text: Whisper ASR with multiple model options
Content Analysis: Auto-structured transcripts with key points
Multi-format Support: Video links, local files, image notes

Prerequisites

| Tool | Purpose | Install |
|------|---------|---------|
| curl | Download files | Built-in (Windows: `curl.exe`) |
| ffmpeg | Audio extraction/merge | `winget install Gyan.FFmpeg` |
| Whisper | Transcription | `pip install openai-whisper` or Docker |
| Browser | Video extraction | OpenClaw profile required |

Docker Whisper (Recommended):

```bash
docker run -d -p 9000:9000 --name whisper-asr onerahmet/openai-whisper-asr-webservice:latest
```

Workflow

Step 0: Input Classification

| Input Type | Detection | Action |
|------------|-----------|--------|
| Video link (`/video/`) | URL pattern | Full workflow |
| Image note (`/note/`) | URL pattern | Snapshot only |
| Local video file | File path | Start from Step 2 |
| Text input | Plain text | Start from Step 3 |
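The classification table can be expressed as a small routing function. This is a minimal sketch: the function name `classify_input`, the returned labels, and the list of video file extensions are illustrative assumptions, not part of the skill itself.

```python
import re
from pathlib import PurePath

# Hypothetical set of extensions treated as local video files.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm", ".flv"}


def classify_input(text: str) -> str:
    """Map raw user input to a workflow entry point from the table."""
    value = text.strip()
    if re.match(r"https?://", value):
        if "/video/" in value:
            return "full_workflow"   # video link: run Steps 1-4
        if "/note/" in value:
            return "snapshot_only"   # image note: no transcription
        return "unknown_url"         # e.g. a short link not yet resolved
    if PurePath(value).suffix.lower() in VIDEO_EXTENSIONS:
        return "start_step_2"        # local video file
    return "start_step_3"            # plain text goes straight to analysis
```

A short link such as `https://v.douyin.com/xxx/` falls into `unknown_url` here, so it would need to be resolved (Step 1.1) before being classified again.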

Step 1: Fetch Video

1.1 Resolve Short URL

```bash
# Windows PowerShell
curl.exe -sL -o NUL -w "%{url_effective}" "https://v.douyin.com/xxx/"

# macOS/Linux
curl -sL -o /dev/null -w '%{url_effective}' "https://v.douyin.com/xxx/"
```

Output: `https://www.douyin.com/video/7616020798351871284`
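For scripted use, the same redirect resolution can be sketched in Python with the standard library. The helper names `resolve_short_url` and `extract_video_id` are hypothetical, and the `User-Agent` header is an assumption (the curl command does not set one); whether Douyin serves the redirect to this client is not confirmed by the source.

```python
import re
import urllib.request
from typing import Optional


def resolve_short_url(short_url: str, timeout: float = 10.0) -> str:
    """Follow redirects and return the final URL, like curl -sL -w '%{url_effective}'."""
    # Assumption: a browser-like User-Agent; not part of the original command.
    request = urllib.request.Request(short_url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl()


def extract_video_id(url: str) -> Optional[str]:
    """Return the numeric ID from a /video/<id> URL, or None if there is none."""
    match = re.search(r"/video/(\d+)", url)
    return match.group(1) if match else None
```

The extracted ID is what fills the `{VIDEO_ID}` placeholder when opening the video page.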

1.2 Open Video Page

```
browser(action='open', profile='openclaw', url='https://www.douyin.com/video/{VIDEO_ID}')
```

Wait 10-15 seconds for the page to finish loading.

1.3 Extract Video URL (Browser DOM Method)

```javascript
browser(action='act', targetId='PAGE_ID', request={
  "kind": "evaluate",
  "fn": "(() => {
    // Resources the page has loaded so far, including CDN video requests
    const entries = performance.getEntriesByType('resource');
    const videoEntries = entries.filter(e => {
      const name = e.name.toLowerCase();
      return name.includes('douyinvod') &&
        (name.includes('.mp4') || name.includes('video'));
    });
    if (videoEntries.length > 0) {
      // Use the most recently loaded matching resource
      const video = videoEntries[videoEntries.length - 1];
      return {
        url: video.name,
        type: video.name.includes('.mp4') ? 'mp4' : 'dash'
      };
    }
    return null;
  })()"
})
```

Important Notes:

The `act` action requires a nested `request` object with `kind` and `fn`.
Wrong: `browser(action='act', fn='...')`
Correct: `browser(action='act', request={"kind": "evaluate", "fn": "..."})`
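The selection logic inside the JavaScript snippet can be checked offline by mirroring it in Python. This is only a sketch for testing the filter against sample resource names; `pick_video_entry` is a hypothetical helper and is not how the skill runs the extraction.

```python
from typing import Optional


def pick_video_entry(resource_names: list[str]) -> Optional[dict]:
    """Keep 'douyinvod' resources that look like video and return the last one."""
    matches = [
        name for name in resource_names
        if "douyinvod" in name.lower()
        and (".mp4" in name.lower() or "video" in name.lower())
    ]
    if not matches:
        return None
    url = matches[-1]  # the most recently loaded matching resource
    # As in the snippet, the type check uses the original (not lowercased) name.
    return {"url": url, "type": "mp4" if ".mp4" in url else "dash"}
```

A `dash` result means video and audio arrive as separate streams, which is the case the merge command in Step 2.1 handles.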

1.4 Download Video

```bash
curl.exe -L -H "Referer: https://www.douyin.com/" -o video.mp4 ""
```

The Referer header is required; without it the request fails with HTTP 403. The final quoted argument is the video URL returned in step 1.3.
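The download step can also be sketched with Python's standard library. The helper names and the 60-second timeout are assumptions; only the Referer value and the output file name come from the curl command.

```python
import shutil
import urllib.request

DOUYIN_REFERER = "https://www.douyin.com/"


def build_download_request(video_url: str) -> urllib.request.Request:
    """Attach the same Referer header that the curl command sends."""
    return urllib.request.Request(video_url, headers={"Referer": DOUYIN_REFERER})


def download_video(video_url: str, destination: str = "video.mp4") -> str:
    """Stream the response body to disk; urlopen follows redirects like curl -L."""
    request = build_download_request(video_url)
    # Assumption: 60 s timeout. A 403 surfaces as urllib.error.HTTPError.
    with urllib.request.urlopen(request, timeout=60) as response, \
            open(destination, "wb") as output:
        shutil.copyfileobj(response, output)
    return destination
```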

Step 2: Transcribe Audio

2.1 Extract Audio

```bash
# For MP4 videos
ffmpeg -i video.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y

# For DASH videos (need merge)
ffmpeg -i video.mp4 -i audio.mp4 -c copy merged.mp4 -y
ffmpeg -i merged.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav -y
```

Parameters:

`-ar 16000`: 16 kHz sample rate (the rate Whisper expects)
`-ac 1`: Mono (single channel)
`-c:a pcm_s16le`: 16-bit PCM audio codec
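When the extraction runs from a script, the same ffmpeg arguments can be assembled as a list and passed to `subprocess`. This is a minimal sketch assuming `ffmpeg` is on the `PATH`; the function names are hypothetical.

```python
import subprocess


def build_extract_command(source: str, target: str = "audio.wav") -> list[str]:
    """Build the ffmpeg argument list for 16 kHz mono 16-bit PCM output."""
    return [
        "ffmpeg", "-y",          # overwrite the output without prompting
        "-i", source,
        "-ar", "16000",          # sample rate: 16 kHz
        "-ac", "1",              # channels: mono
        "-c:a", "pcm_s16le",     # codec: signed 16-bit little-endian PCM
        target,
    ]


def extract_audio(source: str, target: str = "audio.wav") -> str:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_extract_command(source, target), check=True)
    return target
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.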

2.2 Transcribe with Docker Whisper

```bash
curl.exe -X POST "http://localhost:PORT/asr" -F "audio_file=@audio.wav"
```
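The same request can be sketched in Python without third-party packages. The default base URL assumes the `-p 9000:9000` mapping from the `docker run` command in Prerequisites; the helper names and the 30-minute timeout are assumptions, and because the source does not specify the response format, the body is returned as raw text.

```python
import urllib.request
import uuid


def build_asr_request(audio_bytes: bytes,
                      base_url: str = "http://localhost:9000") -> urllib.request.Request:
    """Build a multipart/form-data POST with one file field named 'audio_file'."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="audio_file"; filename="audio.wav"\r\n',
        b"Content-Type: audio/wav\r\n\r\n",
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(f"{base_url}/asr", data=body,
                                  headers=headers, method="POST")


def transcribe(path: str, base_url: str = "http://localhost:9000") -> str:
    """Send the WAV file to the service and return the response body as text."""
    with open(path, "rb") as audio:
        request = build_asr_request(audio.read(), base_url)
    # Assumption: generous timeout, since CPU transcription can take many minutes.
    with urllib.request.urlopen(request, timeout=1800) as response:
        return response.read().decode("utf-8")
```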

2.3 Alternative: Local Whisper

```bash
python -m whisper audio.wav --model small --language zh
```

Model Selection:

| Model | Size | 5-min Video (CPU) | Accuracy | Use Case |
|-------|------|-------------------|----------|----------|
| tiny | 75MB | ~30s | Fair | Quick preview |
| base | 142MB | ~1min | Good | Daily use |
| small | 466MB | ~3min | Better | Recommended |
| medium | 1.5GB | ~8min | Best | High accuracy |
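The timings in the table can drive a rough model choice for a given time budget. This sketch assumes processing time scales linearly with video length, which the source does not state; `estimate_minutes` and `pick_model` are hypothetical helpers.

```python
# (model, approximate CPU minutes per 5-minute video), taken from the table.
MODEL_TIMINGS = [("tiny", 0.5), ("base", 1.0), ("small", 3.0), ("medium", 8.0)]


def estimate_minutes(model: str, video_minutes: float) -> float:
    """Scale the table's 5-minute figure linearly (an assumption)."""
    per_five = dict(MODEL_TIMINGS)[model]
    return per_five * video_minutes / 5.0


def pick_model(video_minutes: float, budget_minutes: float) -> str:
    """Return the largest model whose estimate fits the budget, else 'tiny'."""
    choice = "tiny"
    for model, _ in MODEL_TIMINGS:  # ordered from smallest to largest
        if estimate_minutes(model, video_minutes) <= budget_minutes:
            choice = model
    return choice
```

For example, `pick_model(10, 20)` selects `medium`, since the linear estimate for a 10-minute video is about 16 minutes.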

Step 3: Analyze Content

The agent processes the transcript in four passes:

1. Fix transcription errors
   - Correct homophones
   - Fix speaker names
   - Remove filler words
2. Structure content
   - Add paragraph breaks
   - Create sections
3. Extract key points
   - Main ideas
   - Important quotes
4. Generate tags
   - 3-5 topic tags

Step 4: Save Output

Transcript Format

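```markdown
# {Title}

作者: {Author}

来源: 抖音

日期: {Date}

转录时间: {Transcription Date}

## 摘要

{Summary}

## 正文

{Transcript content with paragraphs}

## 要点

- {Key point 1}
- {Key point 2}
- {Key point 3}
```

The template labels are written in Chinese: 作者 (author), 来源 (source; 抖音 is Douyin), 日期 (date), 转录时间 (transcription time), 摘要 (summary), 正文 (body), 要点 (key points).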
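The template above can be filled programmatically. The following is a minimal sketch: `render_transcript` is a hypothetical helper, and the `#`/`##` heading markers and `-` bullets it emits are assumptions about the intended Markdown.

```python
from datetime import date
from typing import Optional, Sequence


def render_transcript(title: str, author: str, published: str, summary: str,
                      body: str, key_points: Sequence[str],
                      transcribed: Optional[str] = None) -> str:
    """Fill the template placeholders and return the document text."""
    transcribed = transcribed or date.today().isoformat()
    blocks = [
        f"# {title}",                # heading markers are assumed
        f"作者: {author}",            # author
        "来源: 抖音",                  # source: Douyin
        f"日期: {published}",         # publication date
        f"转录时间: {transcribed}",    # transcription date
        "## 摘要",                    # summary
        summary,
        "## 正文",                    # body
        body,
        "## 要点",                    # key points
        "\n".join(f"- {point}" for point in key_points),
    ]
    return "\n\n".join(blocks) + "\n"
```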

Troubleshooting

| Stage | Issue | Solution |
|-------|-------|----------|
| Step 1 | Short URL fails | Check link completeness, remove share text |
| Step 1 | JS returns null | Wait 15-20s and retry, increase timeout |
| Step 1 | Download 403 | URL expired, re-fetch from browser |
| Step 1 | DASH no audio | Merge with `ffmpeg -i video -i audio -c copy` |
| Step 2 | ffmpeg not installed | `winget install Gyan.FFmpeg` |
| Step 2 | Whisper service down | `docker start whisper-asr` |
| Step 2 | Transcription slow | 10-min video takes 15-20 min on CPU |
| Step 2 | Poor quality | Use larger model (medium) |
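The "wait and retry" advice for a null extraction result can be sketched as a small helper. The function name, the default of three attempts, and the injectable `sleep` parameter are assumptions made for illustration; `fetch` stands in for whatever call runs the extraction.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_result(fetch: Callable[[], Optional[T]],
                       attempts: int = 3,
                       delay_seconds: float = 15.0,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Call fetch() until it returns a non-None value or attempts run out."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(delay_seconds)  # give the page time to load more resources
    return None
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.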

Image Note Handling

Image notes (`/note/`) don't need transcription:

```
1. browser(action='open', profile='openclaw', url='IMAGE_NOTE_URL')
2. browser(action='snapshot')
3. Extract content from snapshot
4. Save to output directory
```

Edge Cases

Article links (`/article/`): Use browser snapshot, no transcription
Douyin AI summary: Extract from page as supplement
Other platforms: Use yt-dlp for YouTube/Bilibili
Live streams: Not supported

Related Modules

This skill can be extended with standalone modules:

| Module | Purpose |
|--------|---------|
| douyin-fetcher | Video fetching only |
| douyin-transcriber | Audio transcription only |
| douyin-analyzer | Content analysis only |
| douyin-orchestrator | Workflow coordination |

License

MIT-0 License - Free to use, modify, and redistribute.

Release History

2 versions in total

v2.0.0 (current): 2026-05-01 03:23, safe
v1.0.0: 2026-03-19 12:17, safe

Security Check

Tencent Cloud Security (Keen): safe, no risk
Tencent Cloud Security (Sanbu): safe, no risk
