Arxiv Paper Processor

A tool for manually processing ArXiv papers one at a time: download source or PDF artifacts (optionally in batch), then have the model read the full text and write summary.md in the chosen language.
Author: xukp20
Category: Productivity · clawhub · v0.1.1 · 1 version · API key: not required
Stars: 1 · Downloads: 2,796 · Installs: 104
Tag: #latest

Overview

ArXiv Paper Processor

Use this skill for per-paper manual summarization, with optional batch artifact download.

  • Single-paper mode: process one paper directory (e.g. //).
  • Batch predownload mode: process many paper directories under one run dir before writing summaries.

Language Parameter

  • Use a workflow language parameter (for example English or Chinese) and apply it manually.
  • The per-paper summary.md must be written in the selected language.
  • If download scripts are called directly, pass --language for traceability.

Core Principle

Scripts only fetch artifacts. The model performs reading and writing.

Non-negotiable Constraint

  • Do not generate summary.md by script-based snippet extraction, regex harvesting, or template autofill.
  • Do not use Python/shell scripts to auto-compose section text from abstract/introduction fragments.
  • Scripts in this skill are only for artifact download (source/pdf) and trace logs.
  • The final summary.md must come from model-side reading and synthesis of the paper content.

Optional Batch Artifact Download (Many Papers)

Use this first when Stage B has many papers:

python3 scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language English

Key behavior:

  • Supports --artifact source, --artifact pdf, or --artifact source_then_pdf (default).
  • Supports concurrency (--max-workers) and safe throttling/retry (--min-interval-sec, retry args).
  • Uses run-local throttle state by default (/.runtime/arxiv_download_state.json) to reduce 429 risk.
  • Skips papers that already have usable source/source_extract/*.tex or existing source/paper.pdf (unless --force).
  • Resume-friendly: if a paper already has a completed summary.md, you can skip that paper's summary-writing step.
  • Writes batch log to /download_batch_log.json by default.
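
The skip rule above can be sketched in a few lines. This is only an illustration of the documented behavior, not the batch script's actual code: the function names has_usable_artifacts and should_download are hypothetical, and treating any *.tex file directly under source/source_extract/ as "usable" is an assumption.

```python
from pathlib import Path


def has_usable_artifacts(paper_dir: Path) -> bool:
    """Return True if the paper directory already holds extracted .tex files or a PDF."""
    source_dir = paper_dir / "source"
    # Assumption: any .tex file directly under source/source_extract/ counts as usable.
    has_tex = any((source_dir / "source_extract").glob("*.tex"))
    has_pdf = (source_dir / "paper.pdf").is_file()
    return has_tex or has_pdf


def should_download(paper_dir: Path, force: bool = False) -> bool:
    """Download when forced, or when no usable artifact exists yet."""
    return force or not has_usable_artifacts(paper_dir)
```

A caller would evaluate should_download for each paper directory under the run dir and only queue the ones that return True.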

Step 1: Download Source (Preferred)

python3 scripts/download_arxiv_source.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English

This writes:

  • source/source_bundle.bin
  • source/source_extract/
  • source/download_source_log.json

If usable source already exists and --force is not set, the script reuses local artifacts.

Step 2: If Needed, Download PDF

python3 scripts/download_arxiv_pdf.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English

This writes:

  • source/paper.pdf
  • source/download_pdf_log.json

If PDF already exists and --force is not set, the script reuses local artifacts.
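
Steps 1 and 2 can be chained from Python when a wrapper is convenient. The sketch below uses only the flags shown above (--paper-dir and --language). The helper names are hypothetical, and it assumes the source script exits with a non-zero status when it cannot produce usable source, which this document does not confirm.

```python
import subprocess
import sys
from pathlib import Path


def build_download_command(script: str, paper_dir: Path, language: str) -> list[str]:
    """Build the argument list for one of the single-paper download scripts."""
    return [sys.executable, script, "--paper-dir", str(paper_dir), "--language", language]


def fetch_artifacts(paper_dir: Path, language: str = "English") -> str:
    """Try the source download first and fall back to the PDF download.

    Returns "source" or "pdf" depending on which script succeeded.
    Assumption: a non-zero exit code signals that the source download failed.
    """
    source_cmd = build_download_command("scripts/download_arxiv_source.py", paper_dir, language)
    if subprocess.run(source_cmd).returncode == 0:
        return "source"
    pdf_cmd = build_download_command("scripts/download_arxiv_pdf.py", paper_dir, language)
    # check=True raises CalledProcessError if the PDF download also fails.
    subprocess.run(pdf_cmd, check=True)
    return "pdf"
```

The wrapper only fetches artifacts; reading and summary writing remain with the model.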

Step 3: Model Reads and Summarizes

  1. If summary.md already exists and follows the required format, skip this paper and mark it complete.
  2. Read metadata.md first.
  3. If source/source_extract/ already exists with readable .tex files, use it directly.
  4. Otherwise, if source/paper.pdf already exists, use PDF directly.
  5. If neither exists, run download scripts (single-paper scripts or batch script) first.
  6. Manually write summary.md in the same paper directory, in the selected language.

Do not rely on rule-based auto summarization.

Do not rely on auto-extracted snippets as the primary writing basis.
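
The decision order in Step 3 can be written down as a small helper that only picks the next action and never writes any summary text. The function name, the returned labels, and the summary_is_valid flag are hypothetical; whether an existing summary.md follows the required format is left to the caller.

```python
from pathlib import Path


def choose_next_action(paper_dir: Path, summary_is_valid: bool = False) -> str:
    """Return "skip", "read_source", "read_pdf", or "download" for one paper directory."""
    # 1. A finished, correctly formatted summary means the paper is complete.
    if (paper_dir / "summary.md").is_file() and summary_is_valid:
        return "skip"
    # 2. Prefer extracted LaTeX source when .tex files are present.
    if any((paper_dir / "source" / "source_extract").glob("*.tex")):
        return "read_source"
    # 3. Otherwise fall back to the PDF.
    if (paper_dir / "source" / "paper.pdf").is_file():
        return "read_pdf"
    # 4. Nothing local yet: run the download scripts first.
    return "download"
```

The helper does not cover reading metadata.md or writing summary.md; both stay manual, model-side steps.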

Quality Requirement

  • Every section should include paper-specific details that are traceable to full-text reading.
  • Sections 4, 5, and 10 should reflect concrete method and evaluation details, not generic wording.
  • If key details are unclear in the source, explicitly note uncertainty instead of guessing.
  • Match the detail level shown in references/summary-example-en.md and references/summary-example-zh.md.
  • If your draft is clearly shorter or less specific than the examples, expand it before finishing.

Required Output

  • /summary.md in fixed section format.
  • Pay special attention to section ## 10. Brief Conclusion: write a 3-4 sentence mini-conclusion that covers contribution, method, evaluation setup, and results with paper-specific details.
  • In section ## 1. Paper Snapshot, use exact keys: ArXiv ID, Title, Authors, Publish date, Primary category, Reading basis.
  • Do not use key variants such as Reading source, Author list, Published on, or lowercase key names.

See references/summary-format.md for exact section requirements.
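
A finished summary.md can be checked for the exact Paper Snapshot keys after the model has written it. The sketch below is a validator only and generates no summary text. It assumes each key appears at the start of a line followed by a colon, optionally with a list marker or bold markers; the real layout is defined in references/summary-format.md and may differ. It also scans the whole file rather than only section 1. The function name missing_snapshot_keys is hypothetical.

```python
import re

REQUIRED_KEYS = [
    "ArXiv ID",
    "Title",
    "Authors",
    "Publish date",
    "Primary category",
    "Reading basis",
]


def missing_snapshot_keys(summary_text: str) -> list[str]:
    """Return the required keys that never appear as an exact, case-sensitive `Key:` label."""
    missing = []
    for key in REQUIRED_KEYS:
        # Optional list or bold markers before the key, then the key, then a colon.
        pattern = rf"^[\s\-*•]*\**{re.escape(key)}\**\s*:"
        if not re.search(pattern, summary_text, flags=re.MULTILINE):
            missing.append(key)
    return missing
```

An empty result means all six keys are present; variants such as Reading source or Author list are reported as missing keys.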

Related Skills

This skill is a sub-skill of arxiv-summarizer-orchestrator.

Pipeline position:

  1. Step 1 (upstream): arxiv-search-collector produces the selected paper directories and metadata.
  2. Step 2 (this skill): arxiv-paper-processor downloads artifacts and writes one summary.md per paper.
  3. Step 3 (downstream): arxiv-batch-reporter uses these per-paper summaries to generate the final collection report.

Use this skill together with Step 1 and Step 3 for full end-to-end execution.

Version History

1 version in total

  • v0.1.1 (current)
    2026-03-28 23:20, security status: safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
