← 返回
数据分析

TikTok Crawling (yt-dlp)

Use for TikTok crawling, content retrieval, and analysis
用于 TikTok 爬取、内容检索与分析
romneyda
数据分析 clawhub v1.0.0 1 版本 98683.5 Key: 无需
★ 16
Stars
📥 5,227
下载
💾 765
安装
1
版本
#latest

概述

TikTok Scraping with yt-dlp

yt-dlp is a CLI for downloading video/audio from TikTok and many other sites.

Setup

# macOS
brew install yt-dlp ffmpeg

# pip (any platform)
pip install yt-dlp
# Also install ffmpeg separately for merging/post-processing

Download Patterns

Single Video

yt-dlp "https://www.tiktok.com/@handle/video/1234567890"

Entire Profile

yt-dlp "https://www.tiktok.com/@handle" \
  -P "./tiktok/data" \
  -o "%(uploader)s/%(upload_date)s-%(id)s/video.%(ext)s" \
  --write-info-json

Creates:

tiktok/data/
  handle/
    20260220-7331234567890/
      video.mp4
      video.info.json

Multiple Profiles

for handle in handle1 handle2 handle3; do
  yt-dlp "https://www.tiktok.com/@$handle" \
    -P "./tiktok/data" \
    -o "%(uploader)s/%(upload_date)s-%(id)s/video.%(ext)s" \
    --write-info-json \
    --download-archive "./tiktok/downloaded.txt"
done

Search, Hashtags & Sounds

# Search by keyword
yt-dlp "tiktoksearch:cooking recipes" --playlist-end 20

# Hashtag page
yt-dlp "https://www.tiktok.com/tag/booktok" --playlist-end 50

# Videos using a specific sound
yt-dlp "https://www.tiktok.com/music/original-sound-1234567890" --playlist-end 30

Format Selection

# List available formats
yt-dlp -F "https://www.tiktok.com/@handle/video/1234567890"

# Download specific format (e.g., best video without watermark if available)
yt-dlp -f "best" "https://www.tiktok.com/@handle/video/1234567890"

Filtering

By Date

# On or after a date
--dateafter 20260215

# Before a date
--datebefore 20260220

# Exact date
--date 20260215

# Date range
--dateafter 20260210 --datebefore 20260220

# Relative dates (macOS / Linux)
--dateafter "$(date -u -v-7d +%Y%m%d)"           # macOS: last 7 days
--dateafter "$(date -u -d '7 days ago' +%Y%m%d)" # Linux: last 7 days

By Metrics & Content

# 100k+ views
--match-filters "view_count >= 100000"

# Duration between 30-60 seconds
--match-filters "duration >= 30 & duration <= 60"

# Title contains "recipe" (case-insensitive)
--match-filters "title ~= (?i)recipe"

# Combine: 50k+ views from Feb 2026
yt-dlp "https://www.tiktok.com/@handle" \
  --match-filters "view_count >= 50000" \
  --dateafter 20260201

Metadata Only (No Download)

Preview What Would Download

yt-dlp "https://www.tiktok.com/@handle" \
  --simulate \
  --print "%(upload_date)s | %(view_count)s views | %(title)s"

Export to JSON

# Single JSON array
yt-dlp "https://www.tiktok.com/@handle" --simulate --dump-json > handle_videos.json

# JSONL (one object per line, better for large datasets)
yt-dlp "https://www.tiktok.com/@handle" --simulate -j > handle_videos.jsonl

Export to CSV

yt-dlp "https://www.tiktok.com/@handle" \
  --simulate \
  --print-to-file "%(uploader)s,%(id)s,%(upload_date)s,%(view_count)s,%(like_count)s,%(webpage_url)s" \
  "./tiktok/analysis/metadata.csv"

Analyze with jq

# Top 10 videos by views from downloaded .info.json files
jq -s 'sort_by(.view_count) | reverse | .[:10] | .[] | {title, view_count, url: .webpage_url}' \
  tiktok/data/*/*.info.json

# Total views across all videos
jq -s 'map(.view_count) | add' tiktok/data/*/*.info.json

# Videos grouped by upload date
jq -s 'group_by(.upload_date) | map({date: .[0].upload_date, count: length})' \
  tiktok/data/*/*.info.json

> Tip: For deeper analysis and visualization, load JSONL/CSV exports into Python with pandas. Useful for engagement scatter plots, posting frequency charts, or comparing metrics across creators.


Ongoing Scraping

Archive (Skip Already Downloaded)

The --download-archive flag tracks downloaded videos, enabling incremental updates:

yt-dlp "https://www.tiktok.com/@handle" \
  -P "./tiktok/data" \
  -o "%(uploader)s/%(upload_date)s-%(id)s/video.%(ext)s" \
  --write-info-json \
  --download-archive "./tiktok/downloaded.txt"

Run the same command later—it skips videos already in downloaded.txt.

Authentication (Private/Restricted Content)

# Use cookies from browser (recommended)
yt-dlp --cookies-from-browser chrome "https://www.tiktok.com/@handle"

# Or export cookies to a file first
yt-dlp --cookies tiktok_cookies.txt "https://www.tiktok.com/@handle"

Scheduled Scraping (Cron)

# crontab -e
# Run daily at 2 AM, log output
0 2 * * * cd /path/to/project && ./scripts/scrape-tiktok.sh >> ./tiktok/logs/cron.log 2>&1

Example scripts/scrape-tiktok.sh:

#!/bin/bash
set -e

HANDLES="handle1 handle2 handle3"
DATA_DIR="./tiktok/data"
ARCHIVE="./tiktok/downloaded.txt"

for handle in $HANDLES; do
  echo "[$(date)] Scraping @$handle"
  yt-dlp "https://www.tiktok.com/@$handle" \
    -P "$DATA_DIR" \
    -o "%(uploader)s/%(upload_date)s-%(id)s/video.%(ext)s" \
    --write-info-json \
    --download-archive "$ARCHIVE" \
    --cookies-from-browser chrome \
    --dateafter "$(date -u -v-7d +%Y%m%d)" \
    --sleep-interval 2 \
    --max-sleep-interval 5
done
echo "[$(date)] Done"

Troubleshooting

ProblemSolution
-------------------------------------------------------------------------------------------------------------------
Empty results / no videos foundAdd --cookies-from-browser chrome — TikTok rate-limits anonymous requests
403 Forbidden errorsRate limited. Wait 10-15 min, or use cookies/different IP
"Video unavailable"Region-locked. Try --geo-bypass or a VPN
Watermarked videosCheck -F for alternative formats; some may lack watermark
Slow downloadsAdd --concurrent-fragments 4 for faster downloads
Profile shows fewer videos than expectedTikTok API limits. Use --playlist-end N explicitly, try with cookies

Debug Mode

# Verbose output to diagnose issues
yt-dlp -v "https://www.tiktok.com/@handle" 2>&1 | tee debug.log

Reference

Key Options

OptionDescription
------------------------------------------------------------------------
-o TEMPLATEOutput filename template
-P PATHBase download directory
--dateafter DATEVideos on/after date (YYYYMMDD)
--datebefore DATEVideos on/before date
--playlist-end NStop after N videos
--match-filters EXPRFilter by metadata (views, duration, title)
--write-info-jsonSave metadata JSON per video
--download-archive FILETrack downloads, skip duplicates
--simulate / -sDry run, no download
-j / --dump-jsonOutput metadata as JSON
--cookies-from-browser NAMEUse cookies from browser
--sleep-interval SECWait between downloads (avoid rate limits)

Output Template Variables

VariableExample Output
----------------------------------------
%(id)s7331234567890
%(uploader)shandle
%(upload_date)s20260215
%(title).50sFirst 50 chars of title
%(view_count)s1500000
%(like_count)s250000
%(ext)smp4

Full template reference →

版本历史

共 1 个版本

  • v1.0.0 当前
    2026-03-28 11:33 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

data-analysis

Stock Analysis

udiedrichsen
{"answer":"基于雅虎财经数据,分析股票与加密货币。支持投资组合管理、自选股预警、股息分析、8维评分、热门趋势扫描及传闻/早期信号探测。适用于股票分析、持仓追踪、财报异动、加密监控、热门股追踪或提前发掘非主流传闻。"}
★ 269 📥 56,890
data-analysis

Excel / XLSX

ivangdavila
创建、检查和编辑 Microsoft Excel 工作簿及 XLSX 文件,支持可靠的公式、日期、类型、格式、重算及模板保留功能。
★ 366 📥 139,960
data-analysis

Data Analysis

ivangdavila
{"answer":"数据分析与可视化。查询数据库、生成报告、自动化电子表格,将原始数据转化为清晰可行的见解。适用于:(1) 您……"}
★ 198 📥 64,857