
Document Workflow

Search and download academic papers in one step, extract the text in chunks, and produce structured summaries, with filtering by year and citation count.
yjr-123456
Productivity · clawhub · v1.3.1 · 2 versions · 100000 · Key: not required
Stars: 1 · Downloads: 1,408 · Installs: 311 · Versions: 2
#arxiv #latest #paper #pdf #research

Overview

Document Workflow

Academic paper research: Search → Download LaTeX → Read & Summarize


Quick Start

1. Search Papers

python -m skills.document-workflow.scripts.search_papers --query "world model" --max_results 5 --year_from 2024
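
The year and citation filters can be pictured as a small post-processing step over search results. The sketch below is not taken from search_papers.py; it assumes each result is a dict with `year` and `citationCount` fields (the names the Semantic Scholar Graph API uses), and `filter_papers` and `min_citations` are hypothetical names introduced only for illustration.

```python
from typing import Optional


def filter_papers(papers: list, year_from: Optional[int] = None,
                  min_citations: int = 0) -> list:
    """Keep papers that pass both filters, most cited first.

    Assumption: each paper is a dict with optional "year" and
    "citationCount" keys. Missing values are treated as unknown year
    and zero citations.
    """
    kept = []
    for paper in papers:
        year = paper.get("year")
        citations = paper.get("citationCount") or 0
        # A paper with an unknown year cannot satisfy a year filter.
        if year_from is not None and (year is None or year < year_from):
            continue
        if citations < min_citations:
            continue
        kept.append(paper)
    return sorted(kept, key=lambda p: p.get("citationCount") or 0, reverse=True)
```

Sorting by citation count after filtering is a design choice of this sketch, so that the most cited papers are read first.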

2. Download LaTeX Source

python -m skills.document-workflow.scripts.latex_reader "2301.07088" --keep

3. Read & Summarize

Read the LaTeX source files and summarize following the reading guide below.

Reading Guide

After downloading LaTeX source to arxiv_{id}/, read the .tex files in this order:

Step 1: Get Metadata

Read the main .tex file (usually main.tex, root.tex, or {paper-id}.tex) for:

  • \title{} - Paper title
  • \author{} - Authors
  • \begin{abstract}...\end{abstract} - Abstract
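
If the metadata needs to be pulled out programmatically rather than by reading, a minimal sketch could look like the following. It is an illustration, not part of the skill's scripts: `read_braced` and `extract_metadata` are hypothetical helpers, and the approach assumes the title and author appear as plain `\title{...}` and `\author{...}` commands in the main file.

```python
import re
from typing import Optional


def read_braced(text: str, command: str) -> Optional[str]:
    """Return the argument of the first \\command{...}, honoring nested braces."""
    pattern = r"\\" + re.escape(command) + r"\s*(?:\[[^\]]*\])?\s*\{"
    match = re.search(pattern, text)
    if match is None:
        return None
    depth = 1
    start = match.end()
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i].strip()
        i += 1
    return None  # unbalanced braces


def extract_metadata(tex: str) -> dict:
    """Collect title, raw author string, and abstract from a main .tex file."""
    abstract = re.search(
        r"\\begin\{abstract\}(.*?)\\end\{abstract\}", tex, re.DOTALL
    )
    return {
        "title": read_braced(tex, "title"),
        "authors": read_braced(tex, "author"),
        "abstract": abstract.group(1).strip() if abstract else None,
    }
```

The author string is returned raw (including `\and` and affiliation macros), since author markup varies widely between document classes.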

Step 2: Understand the Problem

Read the Introduction section (usually intro.tex, 1-introduction.tex, or first \section):

  • What problem does this paper solve?
  • What are the key contributions?
  • How does it relate to prior work?
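
When the introduction is not in its own file, it can be located by splitting the source on `\section` commands. The sketch below assumes section titles contain no nested braces; `split_sections` and `find_introduction` are hypothetical helper names, not functions provided by the skill.

```python
import re
from typing import Optional

SECTION = re.compile(r"\\section\*?\{([^}]*)\}")


def split_sections(tex: str) -> dict[str, str]:
    """Map each \\section title to the text that follows it."""
    matches = list(SECTION.finditer(tex))
    sections = {}
    for index, match in enumerate(matches):
        # A section body runs until the next \section or the end of the text.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(tex)
        sections[match.group(1).strip()] = tex[match.end():end].strip()
    return sections


def find_introduction(tex: str) -> Optional[str]:
    """Return the body of the first section whose title mentions 'intro'."""
    for title, body in split_sections(tex).items():
        if "intro" in title.lower():
            return body
    return None
```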

Step 3: Understand the Method

Read the Method/Approach section:

  • What is the proposed approach?
  • Key equations in \begin{equation}...\end{equation} or \begin{align}...\end{align}
  • Algorithm pseudocode in \begin{algorithm}...\end{algorithm}
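
The environments listed above can be collected with a single regular expression. This is a rough sketch under the assumption that environments of the same name are not nested inside each other; `extract_environments` is a hypothetical helper name.

```python
import re


def extract_environments(tex: str, names: tuple[str, ...]) -> list[tuple[str, str]]:
    """Return (environment name, body) pairs in document order.

    Starred variants such as align* are matched as well.
    """
    alternatives = "|".join(re.escape(name) for name in names)
    pattern = re.compile(
        r"\\begin\{((?:" + alternatives + r")\*?)\}(.*?)\\end\{\1\}",
        re.DOTALL,
    )
    return [(m.group(1), m.group(2).strip()) for m in pattern.finditer(tex)]
```

Calling `extract_environments(tex, ("equation", "align", "algorithm"))` yields the key equations and pseudocode in the order they appear, which keeps them aligned with the surrounding explanation.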

Step 4: Check Experiments

Read the Experiments section:

  • Datasets used
  • Baselines compared
  • Metrics in \begin{table}...\end{table} with results
  • Key findings
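
Result tables are usually written as a `tabular` body with `&` between cells and `\\` at the end of each row. The sketch below turns such a body into rows of strings; it assumes simple tables without `\multicolumn`, `\multirow`, or escaped ampersands, and `parse_tabular_rows` is a hypothetical helper name.

```python
import re


def parse_tabular_rows(tabular_body: str) -> list[list[str]]:
    """Split the body of a tabular environment into rows of cell strings."""
    # Drop horizontal rule commands; they carry no data.
    body = re.sub(r"\\(hline|toprule|midrule|bottomrule)\b", "", tabular_body)
    rows = []
    for raw_row in body.split(r"\\"):  # a LaTeX row ends with two backslashes
        cells = [cell.strip() for cell in raw_row.split("&")]
        if any(cells):  # skip rows that are empty after stripping
            rows.append(cells)
    return rows
```

The first returned row is typically the header, which maps naturally onto the `setting` and `metric` fields of the output schema.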

Step 5: Check References

Read the .bib or .bbl file for:

  • Related work citations
  • Key papers in the field
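
One rough way to spot key papers is to count how often each citation key is used in the text and then look those keys up in the .bib file. The sketch below is an illustration with hypothetical helper names (`count_citations`, `list_bib_entries`); it covers `\cite`, natbib-style variants such as `\citep` and `\citet`, and standard `@type{key, ...}` entries.

```python
import re
from collections import Counter

CITE = re.compile(r"\\cite[a-zA-Z]*\*?\s*(?:\[[^\]]*\])*\s*\{([^}]*)\}")
BIB_ENTRY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def count_citations(tex: str) -> Counter:
    """Count how many times each citation key is cited in the text."""
    counts = Counter()
    for match in CITE.finditer(tex):
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                counts[key] += 1
    return counts


def list_bib_entries(bib_text: str) -> list[tuple[str, str]]:
    """Return (entry type, citation key) pairs from a .bib file."""
    return [(m.group(1).lower(), m.group(2)) for m in BIB_ENTRY.finditer(bib_text)]
```

`count_citations(tex).most_common(5)` then gives the five most frequently cited keys, which is a heuristic for importance rather than a guarantee.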

Output Schema

Summarize the paper in this JSON format (see ./references/output_schema.json for more details):

{
  "paper_title": "Full title",
  "authors": ["Author 1", "Author 2"],
  "source": "arXiv:XXXX.XXXXX",
  "task_definition": {
    "domain": "Research domain",
    "task": "Specific task",
    "problem_statement": "What problem this paper solves",
    "key_contributions": ["Contribution 1", "Contribution 2"]
  },
  "experiments": {
    "datasets": ["Dataset 1", "Dataset 2"],
    "baselines": ["Baseline 1", "Baseline 2"],
    "metrics": [
      {"name": "Metric name", "description": "What it measures", "definition": "Mathematical definition or formula for the metric"}
    ],
    "results": [
      {"setting": "Dataset", "metric": "Metric", "proposed_method": "Score", "best_baseline": "Score"}
    ],
    "key_findings": ["Finding 1", "Finding 2"]
  }
}
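
Before saving a summary, it can help to check that it has the shape shown above. The sketch below derives its key lists only from the example on this page, not from ./references/output_schema.json, so treat the required keys as an assumption; `validate_summary` is a hypothetical helper name.

```python
# Key lists are copied from the example above; the real schema file
# may require more or fewer fields.
TOP_LEVEL = {
    "paper_title": str,
    "authors": list,
    "source": str,
    "task_definition": dict,
    "experiments": dict,
}
NESTED = {
    "task_definition": ("domain", "task", "problem_statement", "key_contributions"),
    "experiments": ("datasets", "baselines", "metrics", "results", "key_findings"),
}


def validate_summary(summary: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for key, expected_type in TOP_LEVEL.items():
        if key not in summary:
            problems.append(f"missing key: {key}")
        elif not isinstance(summary[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    for parent, children in NESTED.items():
        section = summary.get(parent)
        if isinstance(section, dict):
            for child in children:
                if child not in section:
                    problems.append(f"missing key: {parent}.{child}")
    return problems
```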

Scripts

Script | Function
------ | --------
search_papers.py | Search papers (Tavily + Semantic Scholar)
download_paper.py | Download PDF (for human reading)
latex_reader.py | Download LaTeX source (for AI reading)

Tips for Reading LaTeX

LaTeX Command | Meaning
------------- | -------
\section{Title} | Section heading
\subsection{Title} | Subsection heading
\textbf{text} | Bold text (often important)
\cite{key} | Citation reference
\begin{equation}...\end{equation} | Numbered equation
\begin{table}...\end{table} | Table
\begin{figure}...\end{figure} | Figure
\input{file} or \subfile{file} | Include another .tex file
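
Because papers are often split across files with `\input` or `\subfile`, it can be convenient to flatten them into one string before reading. The sketch below is an assumption-laden illustration, not the behavior of latex_reader.py: `expand_inputs` is a hypothetical helper, included paths are resolved relative to the directory of the top-level file, and commented-out `\input` lines are expanded as well.

```python
import re
from pathlib import Path
from typing import Optional

INCLUDE = re.compile(r"\\(?:input|subfile|include)\{([^}]+)\}")


def expand_inputs(tex_path: Path, root: Optional[Path] = None,
                  seen: Optional[set] = None) -> str:
    """Return the text of tex_path with included files inlined recursively."""
    tex_path = tex_path.resolve()
    root = root if root is not None else tex_path.parent
    seen = seen if seen is not None else set()
    # Guard against missing files and circular includes.
    if tex_path in seen or not tex_path.is_file():
        return ""
    seen.add(tex_path)
    text = tex_path.read_text(encoding="utf-8", errors="ignore")

    def inline(match: re.Match) -> str:
        child = root / match.group(1).strip()
        if child.suffix != ".tex":
            # \input{sections/intro} refers to sections/intro.tex
            child = child.with_name(child.name + ".tex")
        return expand_inputs(child, root, seen)

    return INCLUDE.sub(inline, text)
```

A missing file expands to an empty string rather than raising, so a partially downloaded source tree can still be read.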

Config

# Optional: Semantic Scholar API key
export SEMANTIC_SCHOLAR_API_KEY="your-key"

# Default download path
C:\Users\Lenovo\Desktop\papers
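
For reference, a script could pick up the optional key from the environment roughly as follows. How the skill's own scripts read the variable is not shown on this page, so this is only a sketch; `semantic_scholar_headers` is a hypothetical helper, and `x-api-key` is the header name the Semantic Scholar API expects.

```python
import os


def semantic_scholar_headers() -> dict:
    """Build request headers, adding the API key only when one is set."""
    api_key = os.environ.get("SEMANTIC_SCHOLAR_API_KEY", "").strip()
    return {"x-api-key": api_key} if api_key else {}
```

Returning an empty dict when the variable is unset keeps the key optional, matching the comment in the config above.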

Version History

2 versions in total

  • v1.3.1 (current)
    2026-05-01 02:36 Safe · Safe
  • v1.0.3
    2026-03-30 05:59 Safe

Security Checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
