Upload a data file → get interactive charts and analysis reports, automatically.
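Generate charts in 3 steps:
Example:
User: Help me analyze this sales data [uploads sales_2024.csv]
AI: Loaded sales_2024.csv (120 rows × 8 columns)
Key fields: date, region, product, revenue, profit, quantity
Recommended analyses:
1. Revenue by region → bar chart
2. Monthly revenue trend → line chart
3. Profit share by product → pie chart
Start generating after you confirm?
User: Confirm
AI: [generates interactive charts] [generates analysis report]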
Load this skill when any of the following is met:
- The user mentions "analyze data", "generate charts", or "data visualization"
| Category | Details |
|---|---|
| File formats | CSV (.csv/.tsv), Excel (.xlsx/.xls), JSON (.json), Plain text (.txt) |
| Chart types | 16 types: line, bar, pie, scatter, area, radar, heatmap, treemap, graph, boxplot, waterfall, gauge, sankey, funnel, sunburst, wordcloud |
| Multi-file | Up to ~10 files; auto-merge when schemas match |
| Data size | Single file ≤ 100 MB; recommended ≤ 50 MB for smooth rendering |
| Encoding | Auto-detects UTF-8, GBK, GB2312 |
| Report templates | Markdown, Word (.docx), PDF, Plain text |
| Limitation | Details | Workaround |
|---|---|---|
| No database support | Cannot connect to MySQL/PostgreSQL etc. | Export to CSV first, then upload |
| No real-time data | Cannot fetch live APIs or streaming data | Prepare a static data file |
| No geo maps | No map-based visualization (choropleth, etc.) | Use external tools for geo data |
| Large datasets | >100 MB files are rejected; >50 MB may render slowly | Filter/split data before uploading |
| Complex joins | Auto-merge requires ≥50% column overlap | Specify join keys manually |
| Nested JSON | Only 1-level nested objects supported | Flatten nested structures before upload |
| Non-tabular data | Images, audio, video not supported | Convert to tabular format first |
| Transform code | LLM-generated code is sandboxed; no file I/O, imports, or network access | Pre-process data externally if needed |
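The "Nested JSON" workaround above can be done with pandas before uploading. The following is a minimal sketch, assuming the data is a list of records; the sample records and the output file name are made up for illustration.

```python
import pandas as pd

# Hypothetical records with two levels of nesting (deeper than the 1 level supported).
records = [
    {"id": 1, "customer": {"name": "Ann", "address": {"city": "Hangzhou"}}},
    {"id": 2, "customer": {"name": "Bo", "address": {"city": "Shenzhen"}}},
]

# json_normalize expands nested dicts into flat columns whose names are joined by `sep`.
flat = pd.json_normalize(records, sep="_")
print(list(flat.columns))

# Save as CSV for upload (illustrative file name):
# flat.to_csv("flattened.csv", index=False)
```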
# Standard install (with hash verification to ensure package integrity)
pip install -r requirements.txt --require-hashes
# To update dependency versions, regenerate the hashes first
python {skill_base}/core/generate_hashes.py
| Package | Version | Required | Description |
|---|---|---|---|
| pandas | ==3.0.1 | Yes | Data parsing (CSV, Excel, JSON) |
| numpy | ==2.4.3 | Yes | Numerical computations |
| openpyxl | ==3.1.5 | Yes | Excel file engine |
| PyPDF2 | ==3.0.1 | Optional | PDF template extraction |
| python-docx | ==1.1.2 | Optional | Word template processing |
> All versions are pinned with == and verified with SHA256 hashes to prevent supply-chain attacks.
> ECharts is loaded via CDN (jsdelivr) — no local installation required.
LLM-generated transform code is executed with multiple safety layers:
| Layer | What it does | Why it matters |
|---|---|---|
| Keyword blacklist | Scans for dangerous keywords (exec, eval, open, import, os.system, etc.) before execution | Prevents file I/O, network access, and system commands that could harm your machine |
| AST whitelist | Parses code into an Abstract Syntax Tree; only allows safe node types (assignments, calls, loops, comprehensions) | Blocks code that tries to define classes, import modules, or use advanced Python features that aren't needed for data transformation |
| User confirmation | Shows code preview and asks for your approval before executing | In interactive mode, no code runs without your consent; in programmatic mode (auto_confirm=True, e.g. batch generation), confirmation is skipped but blacklist and AST checks still apply |
| Sandbox builtins | Only safe built-in functions (len, range, sorted, etc.) are available; open/exec/eval/__import__ are removed | Even if blacklist/AST checks are bypassed, the sandbox prevents access to dangerous functions |
What happens when code is blocked:
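CodeValidationError: Code contains a dangerous keyword; execution blocked: open(
Reason: These keywords may be used for file operations, which are not needed for data transformation.
If you really need them, check whether the data should be pre-processed first.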
The error message explains why the code was blocked and how to resolve it.
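To make the layering concrete, here is a minimal sketch of how a keyword blacklist, an AST whitelist, and restricted builtins could be combined. It is not the skill's actual implementation: the names `BLACKLIST`, `ALLOWED_NODES`, `SAFE_BUILTINS`, `validate`, and `run_sandboxed` are assumptions for illustration, the lists are deliberately short, and Python 3.9+ is assumed.

```python
import ast

class CodeValidationError(Exception):
    """Raised when transform code fails a safety check."""

# ASSUMPTION: illustrative lists only; the real blacklist/whitelist may differ.
BLACKLIST = ("exec", "eval", "open(", "import", "os.system", "subprocess")
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.AugAssign, ast.Name, ast.Call,
    ast.Attribute, ast.Subscript, ast.Slice, ast.Constant, ast.keyword,
    ast.List, ast.Tuple, ast.Dict, ast.BinOp, ast.UnaryOp, ast.BoolOp,
    ast.Compare, ast.For, ast.If, ast.ListComp, ast.comprehension,
    ast.operator, ast.unaryop, ast.boolop, ast.cmpop, ast.expr_context,
)
SAFE_BUILTINS = {"len": len, "range": range, "sorted": sorted,
                 "min": min, "max": max, "sum": sum}

def validate(code: str) -> None:
    # Layer 1: plain substring scan for dangerous keywords.
    for keyword in BLACKLIST:
        if keyword in code:
            raise CodeValidationError(f"dangerous keyword: {keyword}")
    # Layer 2: every AST node must be of a whitelisted type.
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ALLOWED_NODES):
            raise CodeValidationError(f"disallowed syntax: {type(node).__name__}")

def run_sandboxed(code: str, env: dict):
    """Validate, then execute with only SAFE_BUILTINS visible; return `result`."""
    validate(code)
    scope = {"__builtins__": SAFE_BUILTINS, **env}
    exec(code, scope)  # Layer 4: open/exec/eval/__import__ are not in scope
    return scope.get("result")

print(run_sandboxed("result = sorted([3, 1, 2])", {}))
```

A substring blacklist is blunt: it also rejects harmless code that merely contains a listed word, which is one reason the AST check runs as a separate layer. Restricting `__builtins__` alone should not be treated as a hardened sandbox.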
Prompt the user:
> Please upload the data file(s) you want to analyze. Supported formats:
> - CSV (.csv / .tsv / .txt)
> - Excel (.xlsx / .xls)
> - JSON (.json)
>
> You can drag files directly into the chat box. Multiple files are supported.
Step 1 — Parse and display a unified summary:
> Files loaded: 3
>
> | File | Rows | Cols | Key Fields |
> |------|------|------|------------|
> | east_sales.csv | 120 | 8 | date, revenue, profit… |
> | south_sales.csv | 98 | 8 | date, revenue, profit… |
> | products.xlsx | 45 | 5 | name, category, price… |
Step 2 — Infer file relationships and recommend an analysis strategy:
| Situation | Recommendation |
|---|---|
| Same schema across files | Merge and compare |
| Shared common column(s) | Join on the common key |
| Unrelated schemas | Analyze each file separately |
| Single file | Analyze directly |
Step 3 — Execute after user confirmation.
Each error message explains why it happened and what to do about it:
| Error | User message | Why it happens | How to fix |
|---|---|---|---|
| File not found | "File not found. Please verify the path or drag the file into the chat." | The file path doesn't point to an existing file. | Check for typos in the path, or drag the file directly into the chat. |
| Unsupported format | "Unsupported file format (.abc). Only CSV, Excel, and JSON are supported. Please convert your file and retry." | The file extension is not in the supported list. | Open the file in its original application and "Save As" CSV or Excel. |
| File > 100 MB | "File too large (150 MB). The limit is 100 MB because large files cause slow parsing and rendering. Try filtering rows or splitting into smaller files." | Large files exceed memory/time limits for in-browser rendering. | Filter to relevant rows/columns, or split by date/category before uploading. |
| Empty file | "The file appears to be empty (0 rows). This usually means the file has headers but no data rows. Please check that it contains valid data." | The file was parsed successfully but yielded 0 data rows. | Open the file and verify it has data rows below the header. |
| Encoding error | "Encoding issue detected (not UTF-8/GBK/GB2312). Try re-saving the file as CSV with UTF-8 encoding: in Excel, use Save As → CSV UTF-8." | The file uses an encoding that the parser cannot auto-detect. | Re-save the file with UTF-8 encoding. Most spreadsheet apps have a "CSV UTF-8" export option. |
| Cannot auto-merge | "Files have different column structures and cannot be auto-merged. For example, file A has [date, revenue] but file B has [name, score]. You can: (1) analyze them separately, or (2) specify a common column to join on." | The files share less than 50% of their columns, making automatic joining unreliable. | Either analyze each file separately, or tell us which column to use as the join key. |
| Code blocked | "Transform code was blocked for security: contains 'open('. This keyword is used for file operations, which aren't needed for data transformation. If your data needs pre-processing that requires file access, please do that step before uploading." | The LLM-generated code contains a dangerous keyword or unsupported syntax. | Pre-process the data externally, or simplify the transform to use only pandas operations. |
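For the "Encoding error" row above, a file can also be re-saved as UTF-8 with a few lines of Python instead of a spreadsheet app. This is a minimal sketch that assumes you already know the file's real encoding; `resave_as_utf8` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path

def resave_as_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Read `src` using its known encoding (e.g. 'utf-16' or 'big5') and write UTF-8 to `dst`."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")

# Example call (illustrative file names):
# resave_as_utf8("sales_raw.csv", "sales_utf8.csv", "utf-16")
```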
1. Obtain data file(s)
└─ User uploads file(s) directly (primary method)
└─ Or user provides file path(s)
2. Parse data
└─ Call data_parser.py on all files
└─ Single file → parse directly
└─ Multiple files → parse each, assess merge feasibility
3. Confirm & recommend
└─ Display a summary table for all files
└─ Recommend: merge / separate / join
└─ Recommend chart type(s) based on data characteristics
3.5 Data transform (when needed)
└─ Compare original data structure with the target chart's input format
└─ If they match → skip transform, proceed to Step 4
└─ If they don't match → LLM generates pandas transform code
└─ Security check: keyword blacklist + AST whitelist validation
└─ User confirmation: show code preview, wait for approval (skipped when auto_confirm=True in programmatic calls)
└─ Execute in sandbox → producing a standardized DataFrame
└─ If transform fails → feed error back to LLM for retry (max 2 attempts)
└─ If retry still fails → fall back to original data + _prepare_axes auto-detection
4. Generate charts
└─ Call chart_generator.py → produces ECharts HTML
└─ Merged data → cross-group comparison charts
└─ Separate data → independent charts per file
└─ Chart type is chosen by the LLM based on data shape
5. Check for a report template
└─ Scan the templates/ subdirectory under the skill base
└─ Read each meta.json; let the LLM judge relevance
└─ No matching template → skip to free-form generation
6. Generate analysis report
└─ Matching template found → fill template.md with data insights
└─ No matching template → LLM generates report freely
7. Present results
└─ Interactive charts: use preview_url (HTML)
└─ Markdown report: use open_result_view
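The retry-and-fallback logic of step 3.5 can be sketched as a small loop. This is an illustration only: `generate_code` and `execute` are hypothetical callables standing in for the LLM call and the sandboxed run, and "max 2 attempts" is read here as one initial attempt plus up to two retries, which is an assumption.

```python
from typing import Any, Callable, Optional

def transform_with_retry(
    generate_code: Callable[[Optional[str]], str],
    execute: Callable[[str], Any],
    max_retries: int = 2,
) -> Any:
    """Return the transform result, or None to signal 'fall back to the original data'."""
    last_error: Optional[str] = None
    for _attempt in range(1 + max_retries):
        # The previous error (None on the first attempt) is fed back to the generator.
        code = generate_code(last_error)
        try:
            return execute(code)
        except Exception as exc:
            last_error = str(exc)
    return None  # caller falls back to the original data and axis auto-detection
```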
output_dir: output directory (optional; default: ./smart_charts_output)
templates_dir: report template directory (optional; default: ./templates)
> Important: Never hard-code absolute paths. All paths must be provided by the user or resolved dynamically from the working directory.
> Note: {skill_base} refers to the root directory of this skill (the directory containing SKILL.md). Replace it with the actual path when running commands manually.
# Single file
python {skill_base}/core/data_parser.py <file_path> [--summary]
# Multiple files
python {skill_base}/core/data_parser.py <file1> <file2> ... [--summary]
# Multiple files with auto-merge
python {skill_base}/core/data_parser.py <file1> <file2> ... [--merge] [--summary]
Merge behavior:
| Condition | Result |
|---|---|
| Identical column names | Vertical concat; a source_file column is added |
| Shared columns exist (≥50% overlap) | Horizontal join on shared key |
| No common structure | Error — advise analyzing separately |
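The three merge outcomes in the table can be sketched with pandas as follows. This is not the code in data_parser.py: `merge_frames` is a hypothetical helper, and both the way overlap is measured (shared columns divided by the column count of the narrowest file) and the inner join on all shared columns are assumptions.

```python
from functools import reduce
import pandas as pd

def merge_frames(frames: dict, min_overlap: float = 0.5) -> pd.DataFrame:
    """`frames` maps file name -> DataFrame. Returns one merged frame or raises ValueError."""
    names = list(frames)
    column_sets = [set(frames[n].columns) for n in names]

    # Identical column names: stack rows and record where each row came from.
    if all(cols == column_sets[0] for cols in column_sets):
        tagged = [frames[n].assign(source_file=n) for n in names]
        return pd.concat(tagged, ignore_index=True)

    # Shared columns. ASSUMPTION: overlap is measured against the narrowest file.
    shared = set.intersection(*column_sets)
    narrowest = min(len(cols) for cols in column_sets)
    if shared and len(shared) / narrowest >= min_overlap:
        keys = sorted(shared)
        # ASSUMPTION: inner join on every shared column.
        return reduce(lambda left, right: left.merge(right, on=keys), frames.values())

    raise ValueError("No common structure; analyze the files separately.")
```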
| Format | Extensions | Notes |
|---|---|---|
| CSV | .csv, .tsv | Auto-detects delimiter and encoding (UTF-8 / GBK / GB2312) |
| Plain text | .txt | Auto-detects delimiter (comma / tab / semicolon / pipe) |
| Excel | .xlsx, .xls | Reads first non-empty sheet |
| JSON | .json | Supports array format and 1-level nested objects |
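Delimiter and encoding auto-detection, as described in the table, could look roughly like the sketch below. It is an assumption about one reasonable approach, not the parser's actual logic; `decode_with_fallback` and `detect_delimiter` are hypothetical names.

```python
CANDIDATE_ENCODINGS = ("utf-8", "gbk", "gb2312")
CANDIDATE_DELIMITERS = (",", "\t", ";", "|")

def decode_with_fallback(raw: bytes) -> tuple:
    """Try each encoding in order; return (text, encoding_used)."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("Encoding is not UTF-8/GBK/GB2312")

def detect_delimiter(sample: str) -> str:
    """Pick the candidate that occurs the same non-zero number of times on every line."""
    lines = [line for line in sample.splitlines() if line.strip()][:20]
    best, best_count = ",", 0  # default to comma when nothing is conclusive
    for delimiter in CANDIDATE_DELIMITERS:
        counts = {line.count(delimiter) for line in lines}
        if len(counts) == 1:
            count = counts.pop()
            if count > best_count:
                best, best_count = delimiter, count
    return best
```

A count-based check like this is easy to reason about, but quoted fields that contain the delimiter would need a real CSV parser.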
python {skill_base}/core/chart_generator.py \
<file_path> <chart_type> \
--title "Chart Title" \
--x-axis "date" \
--y-axis "revenue profit" \
--output-dir "./output"
| Parameter | Required | Description |
|---|---|---|
| file_path | Yes | Path to the data file |
| chart_type | Yes | Chart type identifier (see table below) |
| --title | No | Chart title; default: "Data Chart" |
| --x-axis | No | X-axis field; auto-detected if omitted |
| --y-axis | No | Y-axis field(s), space-separated; defaults to first 5 numeric columns |
| --transform-code | No | LLM-generated pandas transform code string; validated and executed before chart rendering |
| --output-dir | No | Output directory; default: ./smart_charts_output |
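The axis defaults in the table can be illustrated with a short pandas sketch. The y-axis rule (first 5 numeric columns) comes from the table; the x-axis rule used here (first non-numeric column, else the first column) is an assumption, since the auto-detection logic is not documented. `default_axes` is a hypothetical helper.

```python
import pandas as pd

def default_axes(df: pd.DataFrame, x_axis=None, y_axis=None, max_series: int = 5):
    """Fill in omitted axes: returns (x_axis, list_of_y_columns)."""
    numeric = list(df.select_dtypes(include="number").columns)
    if x_axis is None:
        # ASSUMPTION: prefer the first non-numeric column as the category axis.
        non_numeric = [c for c in df.columns if c not in numeric]
        x_axis = non_numeric[0] if non_numeric else df.columns[0]
    if y_axis is None:
        # First `max_series` numeric columns, skipping the one used as x.
        y_axis = [c for c in numeric if c != x_axis][:max_series]
    return x_axis, y_axis
```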
| ID | Name | Best For | Data Shape |
|---|---|---|---|
| line | Line chart | Time-series trends, continuous data | 1 category + 1~N numeric columns |
| bar | Bar chart | Category comparison, ranked data | 1 category + 1~N numeric columns |
| pie | Pie chart | Composition, share distribution | 1 name + 1 value column |
| scatter | Scatter plot | Correlation, density | 2 numeric columns |
| area | Area chart | Cumulative change, trend | 1 category + 1~N numeric columns |
| radar | Radar chart | Multi-dimension comparison | 1 indicator + N numeric columns |
| heatmap | Heatmap | Density, cross-tabulation | 2 category + 1 numeric column |
| treemap | Treemap | Hierarchical proportion | 1 name + 1 value column |
| graph | Network graph | Entity relationships | source + target (+ value) |
| boxplot | Box plot | Distribution, outliers | N numeric columns |
| waterfall | Waterfall chart | Incremental change | 1 category + 1 numeric column |
| gauge | Gauge chart | KPI progress | 1 numeric column |
| sankey | Sankey diagram | Flow transfer | source + target + value |
| funnel | Funnel chart | Conversion rate | 1 name + 1 value column |
| sunburst | Sunburst chart | Multi-level composition | 1 name + 1 value column |
| wordcloud | Word cloud | Frequency, keywords | 1 name + 1 value column |
Users can store custom report templates under the templates_dir directory.
templates/
├── _template_index.json # Auto-generated metadata index
└── <template_id>/ # Each template has its own directory
├── meta.json # Template metadata card
├── template.md # Template content
└── original.docx # Source file (optional)
{
"id": "<auto_generated>",
"name": "Monthly Sales Report",
"description": "For monthly sales summaries: revenue trend, top products, regional breakdown.",
"scenarios": ["monthly sales report", "sales performance review", "quarterly comparison"],
"variables": ["period", "revenue", "profit", "order_count", "mom_growth", "yoy_growth"],
"categories": ["sales", "finance", "business analysis"],
"format": "markdown",
"created_time": "<auto_generated>",
"modified_time": "<auto_generated>"
}
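Given the directory layout and meta.json card above, collecting template metadata for the LLM to judge could be sketched like this. `collect_template_summaries` is a hypothetical stand-in and is not the real `template_manager.get_all_templates_summary()`, whose signature and output are not shown here.

```python
import json
from pathlib import Path

def collect_template_summaries(templates_dir) -> list:
    """Read every <template_id>/meta.json and keep the fields useful for matching."""
    summaries = []
    for meta_path in sorted(Path(templates_dir).glob("*/meta.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        summaries.append({
            "id": meta.get("id"),
            "name": meta.get("name"),
            "description": meta.get("description", ""),
            "scenarios": meta.get("scenarios", []),
            "variables": meta.get("variables", []),
        })
    return summaries
```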
Template matching is performed by the LLM, not by hard-coded algorithms.
template_manager.get_all_templates_summary() collects metadata
Fallback behavior:
| Scenario | Behavior |
|---|---|
| No suitable template | LLM generates report freely |
| Partial match | LLM uses template structure as reference, generates the rest |
| Empty template library | LLM creates a professional report from scratch |
| Format | Example |
|---|---|
| Single braces | {variable_name} |
| Double braces | {{variable_name}} |
| Square brackets | [variable_name] |
| Percent signs | %variable_name% |
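The four placeholder styles in the table can be recognized with one regular expression. The sketch below is an assumption about how detection might work, not the skill's parser; `extract_variables` is a hypothetical helper, and it deliberately skips Markdown links such as `[text](url)`.

```python
import re

PLACEHOLDER = re.compile(
    r"\{\{\s*(\w+)\s*\}\}"        # {{variable_name}}
    r"|(?<!\{)\{(\w+)\}(?!\})"    # {variable_name}, not part of double braces
    r"|\[(\w+)\](?!\()"           # [variable_name], but not a Markdown link
    r"|%(\w+)%"                   # %variable_name%
)

def extract_variables(text: str) -> list:
    """Return placeholder names in order of first appearance, without duplicates."""
    found = []
    for match in PLACEHOLDER.finditer(text):
        name = next(group for group in match.groups() if group)
        if name not in found:
            found.append(name)
    return found

sample = "Report for {period}: {{revenue}} in [region], up %mom_growth%. See [docs](http://example.com)."
print(extract_variables(sample))
```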
| Format | Extension | Processing |
|---|---|---|
| Markdown | .md, .markdown | Native support |
| Word | .docx | Extracts text and preserves formatting |
| PDF | .pdf | Extracts text and structure |
| Plain text | .txt | Simple template parsing |
Triggers: upload template, add template, save template
User: Save this sales report as a template.
AI: Template saved: "Sales Report" (Markdown, 8 variables detected)
Triggers: my templates, template list, show templates
User: Show my templates.
AI: Your templates (3):
1. Monthly Sales Report (Markdown) — monthly sales analysis
2. Project Progress (Word) — project tracking
3. Financial Report (PDF) — financial analysis
User: Analyze this month's sales data.
AI: Matched template: "Monthly Sales Report"
Auto-filling variables: revenue, profit, growth rate
Generating professional report…
| Error | User message | Why | How to fix |
|---|---|---|---|
| Unsupported format | "Template format '.abc' is not supported. Only PDF, Word (.docx), and Markdown are accepted." | The file extension is not in the supported list. | Convert the file to PDF, DOCX, or Markdown and retry. |
| Template already exists | "Template 'Sales Report' already exists. This happens when a template with the same name was saved earlier." | Duplicate template name. | Choose: overwrite the existing one, rename the new template, or cancel. |
| No match found | "No template matches your task. This is normal if you haven't saved any relevant templates yet." | No template's scenarios/variables align with the current task. | The LLM will generate a report from scratch, or you can save a template for future use. |
| Missing variables | "Data missing for variables: revenue, profit. The template expects these fields but they weren't found in your data." | The template requires variables that don't exist in the current dataset. | Check that your data file contains the expected columns, or use a different template. |
Interactive charts are presented via preview_url; Markdown reports via open_result_view.
Each chart type expects the DataFrame in a specific shape. The LLM must check whether the raw data matches; if not, it generates transform code.
| Chart Type | Required DataFrame Format | Example Columns |
|---|---|---|
| line | 1 category/time column + 1~N numeric columns | month, productA, productB |
| bar | 1 category column + 1~N numeric columns | city, revenue, profit |
| area | 1 category/time column + 1~N numeric columns | date, uv, pv |
| pie | 1 name column + 1 value column | category, share |
| scatter | 2 numeric columns, or 1 category + 1 numeric | height, weight |
| radar | 1 indicator column + N numeric columns | metric, productA, productB |
| heatmap | 2 category columns + 1 numeric column | row, col, value |
| treemap | 1 name column + 1 value column | category, sales |
| graph | source + target columns (+ optional value) | from, to, weight |
| boxplot | N numeric columns | math, chinese, english |
| waterfall | 1 category column + 1 numeric column (increments) | month, profit_delta |
| gauge | 1 numeric column (mean used) | completion_rate |
| sankey | source + target + value columns | origin, destination, amount |
| funnel | 1 name column + 1 value column | stage, count |
| sunburst | 1 name column + 1 value column | category, value |
| wordcloud | 1 name column + 1 value column | word, frequency |
When the raw data structure doesn't match the target chart's input format, use the following prompt template to generate transform code:
Known information:
- Raw data columns: {columns_with_dtypes}
- Data sample (first 5 rows): {sample}
- Target chart type: {chart_type}
- Required format for this chart: {chart_input_spec}
Generate a pandas code snippet that transforms df into a result DataFrame matching the chart's input format.
Rules:
1. Only use variables: df, pd, np
2. Must produce a variable named result (pd.DataFrame)
3. Do not modify df in-place; use df.copy() or chain operations
4. Keep code concise; prefer pandas built-in methods (pivot_table, melt, groupby, rename, etc.)
5. If raw data already matches the required format, output an empty string
6. Do NOT use: import, open, exec, eval, os, sys, subprocess, file I/O, network calls
Output format:
{transform_code}
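Filling the "Known information" slots of the prompt template from a DataFrame could be sketched as follows. `build_transform_prompt` is a hypothetical helper, and only the first part of the template is reproduced; the rules and output format would be appended unchanged.

```python
import pandas as pd

PROMPT_HEAD = (
    "Known information:\n"
    "- Raw data columns: {columns_with_dtypes}\n"
    "- Data sample (first 5 rows): {sample}\n"
    "- Target chart type: {chart_type}\n"
    "- Required format for this chart: {chart_input_spec}\n"
)

def build_transform_prompt(df: pd.DataFrame, chart_type: str, chart_input_spec: str) -> str:
    """Render the 'Known information' block for one DataFrame and target chart."""
    columns_with_dtypes = ", ".join(f"{col} ({dtype})" for col, dtype in df.dtypes.items())
    sample = df.head(5).to_dict(orient="records")
    return PROMPT_HEAD.format(
        columns_with_dtypes=columns_with_dtypes,
        sample=sample,
        chart_type=chart_type,
        chart_input_spec=chart_input_spec,
    )

# Illustrative data, not from a real file.
demo = pd.DataFrame({"month": ["Jan", "Feb"], "region": ["East", "East"], "sales": [100, 120]})
print(build_transform_prompt(demo, "line", "1 category/time column + 1~N numeric columns"))
```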
| Scenario | Raw Data | Chart | Transform Code |
|---|---|---|---|
| Long→multi-series line | month, region, sales | line | result = df.pivot_table(index='month', columns='region', values='sales', aggfunc='sum').reset_index() |
| Long→radar | city, metric, value | radar | result = df.pivot_table(index='city', columns='metric', values='value').reset_index() |
| Long→pie (filter) | category, metric_name, metric_value | pie | result = df[df['metric_name']=='revenue'][['category','metric_value']].rename(columns={'category':'name','metric_value':'value'}) |
| Rename→sankey | 来源, 去向, 金额 | sankey | result = df.rename(columns={'来源':'source','去向':'target','金额':'value'}) |
| Wide→long | date, productA, productB | pie | result = df.melt(id_vars=['date'], var_name='name', value_name='value').groupby('name', as_index=False)['value'].sum() |
| Compute delta→waterfall | month, profit | waterfall | tmp = df.copy(); tmp['delta'] = tmp['profit'].diff().fillna(tmp['profit'].iloc[0]); result = tmp[['month','delta']] |
| Aggregate→bar | date, product, sales | bar | result = df.groupby('product')['sales'].sum().reset_index() |
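As a worked instance of the first row (long → multi-series line), the snippet below runs the transform on a small made-up dataset so the resulting shape can be inspected. The data values are invented for illustration.

```python
import pandas as pd

# Long-format input: one row per (month, region) pair. Values are illustrative.
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "region": ["East", "South", "East", "South"],
    "sales": [100, 80, 120, 90],
})

# Same transform as in the table: one column per region, one row per month.
result = df.pivot_table(
    index="month", columns="region", values="sales", aggfunc="sum"
).reset_index()
result.columns.name = None  # drop the leftover "region" label on the column index

print(result)
```

The result has one category column (`month`) followed by one numeric column per region, which is the shape the line chart expects.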