
Pandoc

Convert documents between formats using pandoc. Supports HTML, Markdown, DOCX, PDF, EPUB, LaTeX, ODT, RST, Org, MediaWiki, JIRA, CSV, Jupyter notebooks, and...
oliver-hrkltz
Productivity clawhub v1.0.2 1 version 99216.2 Key: not required

Overview

Pandoc Document Converter

Convert documents between any formats pandoc supports, with full control over styling, templates, table of contents, metadata, and PDF engine selection.

Quick Start

For most conversions, use the helper script at scripts/convert.sh:

bash <skill-dir>/scripts/convert.sh <input-file> <output-file> [options...]

The script auto-detects formats from file extensions and applies sensible defaults (standalone output, appropriate PDF engine, default LaTeX margins for LaTeX-based PDF engines). It also checks that pandoc, the input file, the output directory, and any requested PDF engine are available.

Any extra arguments are passed through to pandoc.
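The availability checks described above could look roughly like the sketch below. This is not the actual contents of scripts/convert.sh; the function name preflight, the PANDOC_BIN override, and the return codes are assumptions made for illustration.

```shell
# Hypothetical preflight helper; NOT the real scripts/convert.sh.
# Usage: preflight <input-file> <output-file>
# PANDOC_BIN is an assumed override so the check can target another binary.
preflight() {
    input=$1
    output=$2
    bin=${PANDOC_BIN:-pandoc}

    # 1. The converter itself must be on PATH.
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "error: $bin not found on PATH" >&2
        return 1
    fi

    # 2. The input must be an existing regular file.
    if [ ! -f "$input" ]; then
        echo "error: input file not found: $input" >&2
        return 2
    fi

    # 3. The directory that will hold the output must already exist.
    out_dir=$(dirname "$output")
    if [ ! -d "$out_dir" ]; then
        echo "error: output directory does not exist: $out_dir" >&2
        return 3
    fi

    return 0
}
```

Distinct return codes let a caller tell a missing tool apart from a bad path before pandoc is ever invoked.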

How Conversions Work

Pandoc reads a source format into an internal AST, then writes it out in the target format. This means you can go from nearly any supported input to any supported output. The key decision points are:

  1. Input format: usually auto-detected from the file extension
  2. Output format: auto-detected from the output file extension
  3. PDF engine: for PDF output, choose between xelatex (best Unicode/font support), lualatex (strong Unicode/fonts), tectonic (self-contained TeX), pdflatex (fastest, good for ASCII-heavy docs), or HTML/CSS engines like weasyprint, wkhtmltopdf, or prince
  4. Styling: CSS for HTML-based outputs, LaTeX templates for PDF, reference docs for DOCX/ODT
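One way to handle the PDF engine decision is to probe PATH and take the first engine that is installed. The sketch below assumes a preference order (xelatex first) that is only illustrative; it is not pandoc's default, and the helper script may choose differently. The function name first_available is hypothetical.

```shell
# Print the first command from the argument list that exists on PATH.
# Returns non-zero if none of the candidates is installed.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# The preference order below is an assumption, not a pandoc default.
if engine=$(first_available xelatex lualatex tectonic pdflatex weasyprint wkhtmltopdf prince); then
    echo "would run: pandoc input.md -o output.pdf --pdf-engine=$engine -s"
else
    echo "no PDF engine found on PATH" >&2
fi
```

Reordering the argument list changes the preference, for example putting weasyprint first when CSS styling matters more than LaTeX typography.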

Common Conversion Patterns

HTML → PDF

pandoc input.html -o output.pdf --pdf-engine=weasyprint -s

If the HTML uses external CSS, include it:

pandoc input.html -o output.pdf --pdf-engine=weasyprint -s --css=style.css

Markdown → PDF

pandoc input.md -o output.pdf --pdf-engine=xelatex -s --toc --toc-depth=3

Markdown → DOCX

pandoc input.md -o output.docx -s

To use a reference (template) document for styling:

pandoc input.md -o output.docx --reference-doc=template.docx

Markdown → HTML

pandoc input.md -o output.html -s --css=style.css --toc

DOCX → Markdown

pandoc input.docx -o output.md --extract-media=./media

Markdown → EPUB

pandoc input.md -o output.epub -s --toc --epub-cover-image=cover.jpg

LaTeX → PDF

pandoc input.tex -o output.pdf --pdf-engine=xelatex

CSV → HTML table

pandoc input.csv -o output.html -s

Styling and Appearance

CSS for HTML-based outputs

Create or use a CSS file and pass it with --css=path/to/style.css. For PDF output via weasyprint, wkhtmltopdf, or prince, CSS is respected directly. For PDF via LaTeX engines, CSS is usually ignored; use LaTeX variables or templates instead.

A sensible default stylesheet is provided at assets/default.css. Use it when the user wants a clean, readable output without specifying their own styles:

pandoc input.md -o output.html -s --css=<skill-dir>/assets/default.css
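Because CSS only takes effect with the HTML-based PDF engines, it can help to pick the styling flag from the engine name. The following is a minimal sketch; engine_family and style_flags are hypothetical helper names, the engine lists cover only the engines named in this document, and the 1in margin is just a placeholder value.

```shell
# Classify a --pdf-engine value by the toolchain it drives.
# Only the engines mentioned in this document are listed.
engine_family() {
    case "$1" in
        xelatex|lualatex|pdflatex|tectonic) echo latex ;;
        weasyprint|wkhtmltopdf|prince)      echo html ;;
        *)                                  echo unknown ;;
    esac
}

# Print a styling flag suited to the engine: CSS for HTML-based engines,
# a LaTeX variable for LaTeX-based ones.
style_flags() {
    engine=$1
    css=$2
    case "$(engine_family "$engine")" in
        html)  printf '%s\n' "--css=$css" ;;
        latex) printf '%s\n' "-V geometry:margin=1in" ;;
        *)     return 1 ;;
    esac
}

style_flags weasyprint style.css
style_flags xelatex style.css
```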

LaTeX variables for PDF styling

Control margins, fonts, and paper size without a full template:

pandoc input.md -o output.pdf --pdf-engine=xelatex \
  -V geometry:margin=1in \
  -V fontsize=12pt \
  -V mainfont="DejaVu Serif" \
  -V documentclass=article

Reference documents for DOCX/ODT

To match a corporate style, provide a reference document:

pandoc input.md -o output.docx --reference-doc=brand-template.docx

Advanced Features

Table of Contents

Add --toc and optionally --toc-depth=N (default 3):

pandoc input.md -o output.pdf --pdf-engine=xelatex -s --toc --toc-depth=2

Metadata

Set title, author, date via YAML frontmatter in the source file or via -M:

pandoc input.md -o output.pdf --pdf-engine=xelatex -s \
  -M title="My Report" -M author="Jane Doe" -M date="2026-03-15"
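For the frontmatter route, the same three fields can live in a YAML block at the top of the source file. The sketch below writes such a file to a temporary directory; the file name report.md and its body text are placeholders.

```shell
# Write a sample Markdown file that carries its metadata in a YAML block.
# The file name and body text are illustrative only.
workdir=$(mktemp -d)
cat > "$workdir/report.md" <<'EOF'
---
title: "My Report"
author: "Jane Doe"
date: "2026-03-15"
---

# Introduction

Body text goes here.
EOF

# With the metadata in the file, the -M flags are no longer needed.
echo "pandoc $workdir/report.md -o report.pdf --pdf-engine=xelatex -s"
```

If the same field is set in both places, the value given with -M on the command line takes precedence over the one in the YAML block.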

Filters and Lua filters

Pandoc supports filters that transform the AST. Lua filters run in pandoc's embedded Lua interpreter, so they need no external dependencies:

pandoc input.md -o output.pdf --lua-filter=my-filter.lua
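As a rough illustration of what a Lua filter file might contain, the sketch below writes a small filter that shifts every heading down one level. The file name demote-headings.lua is made up for this example, and the filter is a toy rather than a recommended transformation.

```shell
# Write a small Lua filter to a temporary directory (name is illustrative).
filter_dir=$(mktemp -d)
cat > "$filter_dir/demote-headings.lua" <<'EOF'
-- Shift every heading down one level (e.g. "#" becomes "##").
function Header(el)
  el.level = el.level + 1
  return el
end
EOF

# Pass the filter to pandoc with --lua-filter.
echo "pandoc input.md -o output.html -s --lua-filter=$filter_dir/demote-headings.lua"
```

Pandoc calls the Header function once for each heading element in the AST and replaces the element with the returned value.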

Multiple input files

Pandoc concatenates multiple inputs:

pandoc chapter1.md chapter2.md chapter3.md -o book.pdf --pdf-engine=xelatex -s --toc
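Since inputs are concatenated in the order given, a glob such as chapter*.md needs some care: shell globs sort lexically, so chapter10.md comes before chapter2.md. The sketch below demonstrates this with empty placeholder files; ordered_inputs is a hypothetical helper name.

```shell
# Print the chapter files in a directory, one per line, in glob order.
ordered_inputs() {
    for f in "$1"/chapter*.md; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Demonstration with empty placeholder files.
demo_dir=$(mktemp -d)
for n in 1 2 10; do
    : > "$demo_dir/chapter$n.md"
done

# Lexical order: chapter1.md, chapter10.md, chapter2.md.
ordered_inputs "$demo_dir"
```

Zero-padding the names (chapter01.md, chapter02.md, chapter10.md) makes glob order match reading order; listing the files explicitly, as in the command above, avoids the issue altogether.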

Extracting media from DOCX/EPUB

pandoc input.docx -o output.md --extract-media=./media

Troubleshooting

| Problem | Likely cause | Fix |
|---|---|---|
| PDF has missing characters | Font doesn't support the glyphs | Use --pdf-engine=xelatex with -V mainfont="DejaVu Serif" |
| PDF conversion fails | No compatible PDF engine installed | Run which xelatex lualatex tectonic pdflatex weasyprint wkhtmltopdf prince to see what is installed, then install an engine that matches your output needs |
| DOCX looks unstyled | No reference doc | Create a styled DOCX template and pass --reference-doc |
| HTML images missing | Relative paths broken | Use --embed-resources -s to embed images as base64 (--self-contained on pandoc older than 2.19, where it is the only spelling) |
| CSS has no effect on PDF | LaTeX PDF engine selected | Use --pdf-engine=weasyprint, --pdf-engine=wkhtmltopdf, or --pdf-engine=prince |
| Table of contents empty | No headings in source | Ensure the source uses # headings (Markdown) or <h1> to <h6> elements (HTML) |

Format Reference

For a full list of supported input and output formats, see references/formats.md.

Choosing the Right Approach

When a user asks to convert a document, think about:

  1. What's the source format? Check the file extension or ask. If it's ambiguous (e.g., a .txt that's actually Markdown), specify -f markdown explicitly.
  2. What's the target format? Map the user's intent to a file extension.
  3. Does it need styling? If the user wants it to "look nice" or "be professional," add CSS (for HTML), LaTeX variables (for PDF), or a reference doc (for DOCX).
  4. Does it need structure? Add a TOC, numbered sections, and metadata when the document is long or formal.
  5. Are there images or media? Use --embed-resources (spelled --self-contained before pandoc 2.19) for HTML, and --extract-media when converting from DOCX/EPUB to text formats.

Always use the helper script scripts/convert.sh as the starting point: it handles the most common gotchas automatically, picks a reasonable PDF engine, and prints recovery hints when PDF conversion fails. Add extra pandoc flags as needed for the specific use case.
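The checklist in this section can be sketched as a small function that turns an input/output pair into a pandoc command line. This is a hypothetical illustration and not part of scripts/convert.sh: plan_conversion, style.css, and template.docx are placeholder names, and treating every .txt input as Markdown is an assumption.

```shell
# Hypothetical helper: print a pandoc command for an input/output pair.
plan_conversion() {
    input=$1
    output=$2
    flags=""

    # Ambiguous source: ASSUME a .txt input actually holds Markdown.
    case "$input" in
        *.txt) flags="$flags -f markdown" ;;
    esac

    # Styling and media handling depend on the target extension.
    case "$output" in
        *.html) flags="$flags -s --css=style.css" ;;
        *.pdf)  flags="$flags -s --pdf-engine=xelatex -V geometry:margin=1in" ;;
        *.docx) flags="$flags --reference-doc=template.docx" ;;
        *.md)   flags="$flags --extract-media=./media" ;;
    esac

    printf 'pandoc %s -o %s%s\n' "$input" "$output" "$flags"
}

plan_conversion notes.txt notes.html
plan_conversion report.docx report.md
```

The function only prints the command, so the result can be reviewed or adjusted before anything is run.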

Version History

1 version in total

  • v1.0.2 (current)
    2026-03-29 09:41 Safe Safe

Security Check

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
