← 返回
效率工具

Baoyu Url To Markdown

Fetch any URL and convert to markdown using baoyu-fetch CLI (Chrome CDP with site-specific adapters). Built-in adapters for X/Twitter, YouTube transcripts, H...
使用 baoyu-fetch CLI 获取任意 URL 并转换为 Markdown(Chrome CDP 与站点专属适配器)。内置适配器支持 X/Twitter、YouTube 字幕、HTML 等。
jimliu
效率工具 clawhub v1.117.2 3 版本 100000 Key: 无需
★ 0
Stars
📥 3,153
下载
💾 61
安装
3
版本
#latest

概述

URL to Markdown

Fetches any URL via baoyu-fetch CLI (Chrome CDP + site-specific adapters) and converts it to clean markdown.

User Input Tools

When this skill prompts the user, follow this tool-selection rule (priority order):

  1. Prefer built-in user-input tools exposed by the current agent runtime — e.g., AskUserQuestion, request_user_input, clarify, ask_user, or any equivalent.
  2. Fallback: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
  3. Batching: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.

Concrete AskUserQuestion references below are examples — substitute the local equivalent in other runtimes.

CLI Setup

Important: The CLI source is vendored in {baseDir}/scripts/lib. scripts/package.json installs only third-party runtime dependencies.

Agent Execution Instructions:

  1. Determine this SKILL.md file's directory path as {baseDir}
  2. Resolve ${BUN} runtime: if bun installed → bun; else suggest installing Bun
  3. If {baseDir}/scripts/node_modules does not exist, run ${BUN} install --cwd {baseDir}/scripts
  4. ${READER} = {baseDir}/scripts/baoyu-fetch
  5. Replace all ${READER} in this document with the resolved value

Preferences (EXTEND.md)

Check EXTEND.md in priority order — the first one found wins:

PriorityPathScope
-----------------------
1.baoyu-skills/baoyu-url-to-markdown/EXTEND.mdProject
2${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.mdXDG
3$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.mdUser home
ResultAction
----------------
FoundRead, parse, apply settings
Not foundMUST run first-time setup (see below) — do NOT silently create defaults

EXTEND.md supports: download media by default, default output directory.

First-Time Setup ⛔ BLOCKING

When EXTEND.md is not found, you MUST use AskUserQuestion to gather preferences before creating EXTEND.md. NEVER create EXTEND.md with silent defaults. Generation is BLOCKED until setup completes. Batch all three questions into a single call:

  • Q1 — Media (header "Media"): "How to handle images and videos in pages?"
  • "Ask each time (Recommended)" — Prompt after each save
  • "Always download" — Download to local imgs/ and videos/
  • "Never download" — Keep remote URLs
  • Q2 — Output (header "Output"): "Default output directory?"
  • "url-to-markdown (Recommended)" — Save to ./url-to-markdown/{domain}/{slug}.md
  • User may pick "Other" and type a custom path
  • Q3 — Save (header "Save"): "Where to save preferences?"
  • "User (Recommended)" — ~/.baoyu-skills/ (all projects)
  • "Project" — .baoyu-skills/ (this project only)

After answers, write EXTEND.md, confirm "Preferences saved to [path]", then continue.

Full template: references/config/first-time-setup.md.

Supported Keys

KeyDefaultValuesDescription
-----------------------------------
download_mediaaskask / 1 / 0ask = prompt each time, 1 = always, 0 = never
default_output_diremptypath or emptyDefault output directory (empty = ./url-to-markdown/)

EXTEND.md → CLI mapping:

EXTEND.md keyCLI argumentNotes
-----------------------------------
download_media: 1--download-mediaRequires --output to be set
default_output_dir: ./posts/Agent constructs --output ./posts/{domain}/{slug}.mdAgent generates path, not a direct flag

Value priority: CLI arguments → EXTEND.md → skill defaults.

Usage

# Default: headless capture, markdown to stdout
${READER} <url>

# Save to file
${READER} <url> --output article.md

# Save with media download
${READER} <url> --output article.md --download-media

# Wait for interaction (login/CAPTCHA) — auto-detect and continue
${READER} <url> --wait-for interaction --output article.md

# Wait for interaction — manual control (Enter to continue)
${READER} <url> --wait-for force --output article.md

# JSON output
${READER} <url> --format json --output article.json

# Force specific adapter
${READER} <url> --adapter youtube --output transcript.md

Options

OptionDescription
---------------------
URL to fetch
--output Output file path (default: stdout)
--format Output format: markdown (default) or json
--jsonShorthand for --format json
--adapter Force adapter: x, youtube, hn, or generic (default: auto-detect)
--headlessForce headless Chrome (no visible window)
--wait-for Interaction wait mode: none (default), interaction, or force
--wait-for-interactionAlias for --wait-for interaction
--wait-for-loginAlias for --wait-for interaction
--timeout Page load timeout (default: 30000)
--interaction-timeout Login/CAPTCHA wait timeout (default: 600000 = 10 min)
--interaction-poll-interval Poll interval for interaction checks (default: 1500)
--download-mediaDownload images/videos to local imgs/ and videos/, rewrite markdown links. Requires --output
--media-dir Base directory for downloaded media (default: same as --output directory)
--cdp-url Reuse existing Chrome DevTools Protocol endpoint
--browser-path Custom Chrome/Chromium binary path
--chrome-profile-dir Chrome user data directory (default: BAOYU_CHROME_PROFILE_DIR env or ./baoyu-skills/chrome-profile)
--debug-dir Write debug artifacts (document.json, markdown.md, page.html, network.json)

Agent Quality Gate

CRITICAL: treat default headless capture as provisional. Some sites render differently in headless mode and can silently return low-quality content without failing the CLI.

After every headless run, inspect the saved markdown. See references/quality-gate.md for the full checklist, recovery workflow, and capture-mode table. Read it whenever a run looks suspicious or the user asks about login/CAPTCHA handling.

Output Path Generation

The agent must construct the output file path — baoyu-fetch does not auto-generate paths.

Algorithm:

  1. Determine base directory from EXTEND.md default_output_dir or default ./url-to-markdown/
  2. Extract domain from URL (e.g., example.com)
  3. Generate slug from URL path or page title (kebab-case, 2-6 words)
  4. Construct: {base_dir}/{domain}/{slug}/{slug}.md — each URL gets its own directory so media files stay isolated
  5. Conflict resolution: append timestamp {slug}-YYYYMMDD-HHMMSS/{slug}-YYYYMMDD-HHMMSS.md

Pass the constructed path to --output. Media files (--download-media) are saved into subdirectories next to the markdown file, keeping each URL's assets self-contained.

Adapters & Media

See references/adapters.md for the adapter catalog (X, YouTube, Hacker News, generic), per-adapter notes, the media download flow (ask / always / never), and the JSON output schema. Read it before answering adapter-specific questions or handling media prompts.

Environment Variables

VariableDescription
-----------------------
BAOYU_CHROME_PROFILE_DIRChrome user data directory (can also use --chrome-profile-dir)

Troubleshooting: Chrome not found → use --browser-path. Timeout → increase --timeout. Login/CAPTCHA → --wait-for interaction. Debug → --debug-dir to inspect captured HTML and network logs.

Extension Support

Custom configurations via EXTEND.md. See Preferences section above for paths and supported keys.

版本历史

共 3 个版本

  • v1.117.2 当前
    2026-05-19 10:19 安全 安全
  • v1.109.0
    2026-04-30 10:35 安全 安全
  • v1.63.0
    2026-03-18 17:28

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

productivity

Nano Pdf

steipete
使用nano-pdf CLI通过自然语言指令编辑PDF
★ 275 📥 114,752
productivity

Word / DOCX

ivangdavila
创建、检查和编辑 Microsoft Word 文档及 DOCX 文件,支持样式、编号、修订记录、表格、分节符及兼容性检查等功能。
★ 438 📥 147,265
productivity

Weather

steipete
获取当前天气和预报(无需API密钥)
★ 445 📥 226,137