Use this skill only for format conversion into Markdown. Do not perform requirement analysis, product scoping, hypothesis extraction, summaries, recommendations, handoff generation, or team coordination.
- converted/ folder near the source or project deliverable folder.
- .raw.md for OCR/ASR/LLM transcription outputs.

| Input | Default Handling |
|---|---|
| .md, .txt | Read directly and normalize if needed |
| .docx | Convert with MarkItDown |
| .pdf | Convert with MarkItDown; use Tencent OCR for scanned pages |
| .pptx | Convert slide text with MarkItDown |
| .xlsx, .xls | Convert tables with MarkItDown |
| .csv | Convert table export with MarkItDown |
| .html, .htm | Convert saved web/API documentation with MarkItDown |
| .json | Convert structured JSON dump with MarkItDown |
| .xml | Convert structured XML/config dump with MarkItDown |
| .png, .jpg, .jpeg, .webp, .gif, .bmp | Default Tencent OCR; visual LLM fallback |
| .wav, .pcm, .ogg, .speex, .silk, .mp3, .m4a, .aac, .amr | Default Tencent ASR |
| .mp3, .wav, .m4a, .flac, .ogg | LLM audio fallback when Tencent ASR is unsuitable |
| .zip | Batch route contained files and write manifest.md |
Unsupported by default: video, YouTube URLs, EPUB, and arbitrary binary dumps. Ask for a common-format export when needed.
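The table above amounts to a lookup from file extension to a default route. The following is a minimal sketch of that lookup, not the router's actual code: the route labels and the `default_route` helper are hypothetical names chosen for illustration, and sending `.flac` to the LLM audio route is an assumption based on it appearing only in the fallback row.

```python
from pathlib import PurePath

# Hypothetical route labels; the real router's internal names may differ.
ROUTES = {
    "read": {".md", ".txt"},
    "markitdown": {".docx", ".pdf", ".pptx", ".xlsx", ".xls", ".csv",
                   ".html", ".htm", ".json", ".xml"},
    "ocr": {".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp"},
    "asr": {".wav", ".pcm", ".ogg", ".speex", ".silk", ".mp3",
            ".m4a", ".aac", ".amr"},
    # Assumption: .flac is listed only on the LLM audio fallback row.
    "llm_audio": {".flac"},
    "zip": {".zip"},
}


def default_route(filename: str) -> str:
    """Return the default route label for a file name, by extension only."""
    suffix = PurePath(filename).suffix.lower()
    for route, extensions in ROUTES.items():
        if suffix in extensions:
            return route
    # Video, EPUB, and arbitrary binaries fall through to here.
    return "unsupported"
```

A caller would treat `"unsupported"` as the cue to ask for a common-format export rather than attempt a conversion.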
API credentials, provider options, and local runtime paths are configured in providers.json at the skill root. Provider value precedence is:
providers.json > environment variable > script default
Read references/providers.md only when you need the full JSON template, environment fallback table, or provider fields. providers.json may contain real local secrets; never paste its values into chat, converted Markdown, logs, or user-visible output.
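The precedence rule can be read as a three-step lookup. The sketch below illustrates that order under stated assumptions: `resolve_setting` is a hypothetical helper, the field and environment variable names are placeholders, and treating an empty string in providers.json as "unset" is a guess rather than documented behavior.

```python
import os


def resolve_setting(providers: dict, section: str, field: str,
                    env_name: str, default=None):
    """Resolve one provider value: providers.json > environment > default."""
    # 1. Value from the parsed providers.json, if present and non-empty
    #    (assumption: empty strings count as unset).
    value = providers.get(section, {}).get(field)
    if value not in (None, ""):
        return value
    # 2. Environment variable fallback.
    env_value = os.environ.get(env_name)
    if env_value:
        return env_value
    # 3. Script default.
    return default
```

Whatever this returns may be a real secret, so the result should be passed to the provider client only and never printed or logged.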
| Source | Default | Fallback |
|---|---|---|
| Office/text documents | MarkItDown | Ask for cleaner .md, .pdf, .docx, or .csv export |
| CSV/HTML/JSON/XML structured exports | MarkItDown | Ask for cleaner .csv, .md, or source-system export |
| Image text OCR | scripts/tencent_ocr_to_markdown.py | scripts/vision_to_markdown.py |
| Visual layout/context transcription | scripts/vision_to_markdown.py | Ask for text/PDF/DOCX export |
| Audio transcript | scripts/tencent_asr_to_markdown.py | scripts/llm_audio_to_markdown.py |
| ZIP material package | scripts/source_to_markdown.py batch router | Convert files one by one and record failures |
Prefer the unified router for normal use. It routes by file extension, writes single-file outputs, expands ZIP packages safely, and writes a batch manifest.md.
Single file:
python skills/source-to-markdown/scripts/source_to_markdown.py `
"input.docx" `
"converted/input.md"
Structured exports supported through MarkItDown:
python skills/source-to-markdown/scripts/source_to_markdown.py `
"api-response.json" `
"converted/api-response.md"
python skills/source-to-markdown/scripts/source_to_markdown.py `
"export.csv" `
"converted/export.md"
python skills/source-to-markdown/scripts/source_to_markdown.py `
"saved-page.html" `
"converted/saved-page.md"
python skills/source-to-markdown/scripts/source_to_markdown.py `
"device-config.xml" `
"converted/device-config.md"
Image route selection:
python skills/source-to-markdown/scripts/source_to_markdown.py `
"input.png" `
"converted/input.raw.md" `
--image-route ocr
python skills/source-to-markdown/scripts/source_to_markdown.py `
"input.png" `
"converted/input.raw.md" `
--image-route vision
Audio route selection:
python skills/source-to-markdown/scripts/source_to_markdown.py `
"meeting.m4a" `
"converted/meeting.raw.md" `
--audio-route asr
python skills/source-to-markdown/scripts/source_to_markdown.py `
"meeting.mp3" `
"converted/meeting.raw.md" `
--audio-route llm `
--request-timeout 600
ZIP material package:
python skills/source-to-markdown/scripts/source_to_markdown.py `
"materials.zip" `
"converted/materials"
ZIP output layout:
converted/materials/
├── manifest.md
├── source-a.md
├── table-export.md
├── screenshot.raw.md
└── nested/path/spec.md
manifest.md records every contained file, route, output path, status, and failure/skipped reason. Do not treat failed or skipped files as converted evidence.
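The router is described as expanding ZIP packages safely. One common guard is to reject archive entries whose paths would escape the output folder. This is an assumption about what "safely" could involve, not the script's confirmed behavior. `safe_member_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def safe_member_path(member_name: str):
    """Return a relative path for a ZIP entry, or None if it should be skipped."""
    # ZIP entries normally use forward slashes; normalize stray backslashes.
    path = PurePosixPath(member_name.replace("\\", "/"))
    if not path.parts:
        return None
    # Reject absolute paths and parent-directory traversal.
    if path.is_absolute() or ".." in path.parts:
        return None
    # Reject Windows drive prefixes such as "C:/...".
    if ":" in path.parts[0]:
        return None
    return path
```

An entry rejected this way would be recorded in manifest.md as skipped, with the reason, rather than silently dropped.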
Use MarkItDown for normal document sources:
markitdown "input.docx" -o "converted/input.md"
markitdown "input.pdf" -o "converted/input.md"
markitdown "input.pptx" -o "converted/input.md"
markitdown "input.xlsx" -o "converted/input.md"
markitdown "input.csv" -o "converted/input.md"
markitdown "input.html" -o "converted/input.md"
markitdown "input.json" -o "converted/input.md"
markitdown "input.xml" -o "converted/input.md"
For .md or .txt, read directly and preserve the original text unless normalization is explicitly requested.
For scanned PDFs, use Tencent OCR page by page when MarkItDown cannot extract text:
python skills/source-to-markdown/scripts/tencent_ocr_to_markdown.py `
"scanned.pdf" `
"converted/scanned-page-1.raw.md" `
--pdf-page-number 1
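For a multi-page scan, the same command can be repeated once per page. The sketch below only builds the command lines, using the script path and `--pdf-page-number` flag shown above. `build_page_commands` is a hypothetical helper, and the page count is assumed to be supplied by the caller, since this document does not say how to obtain it.

```python
import sys
from pathlib import Path

SCRIPT = "skills/source-to-markdown/scripts/tencent_ocr_to_markdown.py"


def build_page_commands(pdf_path: str, out_dir: str, page_count: int) -> list[list[str]]:
    """Build one OCR command per page, numbered from 1."""
    stem = Path(pdf_path).stem
    commands = []
    for page in range(1, page_count + 1):
        # Follows the "<name>-page-N.raw.md" pattern from the example above.
        output = (Path(out_dir) / f"{stem}-page-{page}.raw.md").as_posix()
        commands.append([sys.executable, SCRIPT, pdf_path, output,
                         "--pdf-page-number", str(page)])
    return commands
```

Each command can then be passed to `subprocess.run(cmd, check=False)` so that one failed page does not stop the remaining pages, and failures can be reported per page.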
On Windows, set UTF-8 output if Chinese text prints incorrectly:
$env:PYTHONIOENCODING='utf-8'
Use Tencent OCR first for images, screenshots, scanned notes, and scanned PDF pages:
python skills/source-to-markdown/scripts/tencent_ocr_to_markdown.py `
"input.png" `
"converted/input.raw.md" `
--request-timeout 300
For large images with small text, add:
--enable-detect-split
Use the visual LLM route only when deterministic OCR is insufficient and the image needs visual layout/context transcription:
python skills/source-to-markdown/scripts/vision_to_markdown.py `
"input.png" `
"converted/input.raw.md" `
--request-timeout 300
Image output must be raw visible text or faithful visual transcription. Do not add interpretation, analysis, or conclusions.
Use Tencent ASR first for recordings:
python skills/source-to-markdown/scripts/tencent_asr_to_markdown.py `
"meeting.m4a" `
"converted/meeting.raw.md" `
--request-timeout 300
Tencent ASR uses tencent_asr in providers.json, supports common recording formats, and outputs only recognized transcript text from flash_result.
Use the LLM audio route only when Tencent ASR is unavailable, unsuitable, or explicitly requested:
python skills/source-to-markdown/scripts/llm_audio_to_markdown.py `
"meeting.mp3" `
"converted/meeting.raw.md" `
--request-timeout 300
For long recordings, increase timeout:
--request-timeout 600
For known-good audio on the LLM fallback route, bypass normalization only when needed:
--normalize-audio never
Audio output must be raw transcript text only. Do not add generated headings, summaries, action items, analysis, “识别不确定处”, or invented “无” sections.
- .md for document conversions.
- .md for MarkItDown-routed structured exports such as CSV, HTML, JSON, and XML.
- .raw.md for OCR, ASR, and LLM transcription outputs.
- manifest.md and one output file per converted source file.
- [无法识别] or [不确定: ...].
- [听不清] or [不确定: ...].

If conversion fails:

- .docx -> ask for .pdf or .md
- .pdf -> ask for text PDF or Word source if scanned/poorly extracted
- .pptx -> ask for speaker notes or exported .pdf
- .xlsx -> ask for .csv only if spreadsheet parsing fails
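The output naming rules above reduce to a choice between two suffixes. This is a minimal sketch: `output_name` is a hypothetical helper, the route values `ocr`, `vision`, `asr`, and `llm` are taken from the `--image-route` and `--audio-route` examples, and any other label (such as `markitdown`) is illustrative.

```python
from pathlib import PurePath

# Routes that produce transcription output and therefore use .raw.md.
RAW_ROUTES = {"ocr", "vision", "asr", "llm"}


def output_name(source_name: str, route: str) -> str:
    """Return the converted file name for a source file and its route."""
    stem = PurePath(source_name).stem
    suffix = ".raw.md" if route in RAW_ROUTES else ".md"
    return stem + suffix
```

Two sources that share a stem, such as input.docx and input.pdf, map to the same name under this sketch, so a batch caller would need to disambiguate them before writing.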