← 返回
未分类 Key 中文

XCrawl

Use this skill as the default XCrawl entry point for direct XCrawl requests, including single-URL fetch, format selection, sync or async execution, and JSON...
此技能作为XCrawl的默认入口,用于直接发起请求,支持单URL抓取、格式选择、同步或异步执行及JSON处理。
wykings
未分类 clawhub v1.0.0 1 版本 99960.8 Key: 需要
★ 1
Stars
📥 2,532
下载
💾 56
安装
1
版本
#latest

概述

XCrawl

Overview

This skill is the default XCrawl entry point when the user asks for XCrawl directly without naming a specific API or sub-skill.

It currently targets single-page extraction through XCrawl Scrape APIs.

Default behavior is raw passthrough: return upstream API response bodies as-is.

Routing Guidance

  • If the user wants to extract one or more specific URLs, use this skill and default to XCrawl Scrape.
  • If the user wants site URL discovery, prefer XCrawl Map APIs.
  • If the user wants multi-page or site-wide crawling, prefer XCrawl Crawl APIs.
  • If the user wants keyword-based discovery, prefer XCrawl Search APIs.

Required Local Config

Before using this skill, the user must create a local config file and write XCRAWL_API_KEY into it.

Path: ~/.xcrawl/config.json

{
  "XCRAWL_API_KEY": "<your_api_key>"
}

Read API key from local config file only. Do not require global environment variables.

Credits and Account Setup

Using XCrawl APIs consumes credits.

If the user does not have an account or available credits, guide them to register at https://dash.xcrawl.com/.

After registration, they can activate the free 1000 credits plan before running requests.

Tool Permission Policy

Request runtime permissions for curl and node only.

Do not request Python, shell helper scripts, or other runtime permissions.

API Surface

  • Start scrape: POST /v1/scrape
  • Read async result: GET /v1/scrape/{scrape_id}
  • Base URL: https://run.xcrawl.com
  • Required header: Authorization: Bearer

Usage Examples

cURL (sync)

API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com","mode":"sync","output":{"formats":["markdown","links"]}}'

cURL (async create + result)

API_KEY="$(node -e "const fs=require('fs');const p=process.env.HOME+'/.xcrawl/config.json';const k=JSON.parse(fs.readFileSync(p,'utf8')).XCRAWL_API_KEY||'';process.stdout.write(k)")"

CREATE_RESP="$(curl -sS -X POST "https://run.xcrawl.com/v1/scrape" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{"url":"https://example.com/product/1","mode":"async","output":{"formats":["json"]},"json":{"prompt":"Extract title and price."}}')"

echo "$CREATE_RESP"

SCRAPE_ID="$(node -e 'const s=process.argv[1];const j=JSON.parse(s);process.stdout.write(j.scrape_id||"")' "$CREATE_RESP")"

curl -sS -X GET "https://run.xcrawl.com/v1/scrape/${SCRAPE_ID}" \
  -H "Authorization: Bearer ${API_KEY}"

Node

node -e '
const fs=require("fs");
const apiKey=JSON.parse(fs.readFileSync(process.env.HOME+"/.xcrawl/config.json","utf8")).XCRAWL_API_KEY;
const body={url:"https://example.com",mode:"sync",output:{formats:["markdown","json"]},json:{prompt:"Extract title and publish date."}};
fetch("https://run.xcrawl.com/v1/scrape",{
  method:"POST",
  headers:{"Content-Type":"application/json",Authorization:`Bearer ${apiKey}`},
  body:JSON.stringify(body)
}).then(async r=>{console.log(await r.text());});
'

Request Parameters

Request endpoint and headers

  • Endpoint: POST https://run.xcrawl.com/v1/scrape
  • Headers:
  • Content-Type: application/json
  • Authorization: Bearer

Request body: top-level fields

FieldTypeRequiredDefaultDescription
---------:------
urlstringYes-Target URL
modestringNosyncsync or async
proxyobjectNo-Proxy config
requestobjectNo-Request config
js_renderobjectNo-JS rendering config
outputobjectNo-Output config
webhookobjectNo-Async webhook config (mode=async)

proxy

FieldTypeRequiredDefaultDescription
---------:------
locationstringNoUSISO-3166-1 alpha-2 country code, e.g. US / JP / SG
sticky_sessionstringNoAuto-generatedSticky session ID; same ID attempts to reuse exit

request

FieldTypeRequiredDefaultDescription
---------:------
localestringNoen-US,en;q=0.9Affects Accept-Language
devicestringNodesktopdesktop / mobile; affects UA and viewport
cookiesobject mapNo-Cookie key/value pairs
headersobject mapNo-Header key/value pairs
only_main_contentbooleanNotrueReturn main content only
block_adsbooleanNotrueAttempt to block ad resources
skip_tls_verificationbooleanNotrueSkip TLS verification

js_render

FieldTypeRequiredDefaultDescription
---------:------
enabledbooleanNotrueEnable browser rendering
wait_untilstringNoloadload / domcontentloaded / networkidle
viewport.widthintegerNo-Viewport width (desktop 1920, mobile 402)
viewport.heightintegerNo-Viewport height (desktop 1080, mobile 874)

output

FieldTypeRequiredDefaultDescription
---------:------
formatsstring[]No["markdown"]Output formats
screenshotstringNoviewportfull_page / viewport (only if formats includes screenshot)
json.promptstringNo-Extraction prompt
json.json_schemaobjectNo-JSON Schema

output.formats enum:

  • html
  • raw_html
  • markdown
  • links
  • summary
  • screenshot
  • json

webhook

FieldTypeRequiredDefaultDescription
---------:------
urlstringNo-Callback URL
headersobject mapNo-Custom callback headers
eventsstring[]No["started","completed","failed"]Events: started / completed / failed

Response Parameters

Sync create response (mode=sync)

FieldTypeDescription
---------
scrape_idstringTask ID
endpointstringAlways scrape
versionstringVersion
statusstringcompleted / failed
urlstringTarget URL
dataobjectResult data
started_atstringStart time (ISO 8601)
ended_atstringEnd time (ISO 8601)
total_credits_usedintegerTotal credits used

data fields (based on output.formats):

  • html, raw_html, markdown, links, summary, screenshot, json
  • metadata (page metadata)
  • traffic_bytes
  • credits_used
  • credits_detail

credits_detail fields:

FieldTypeDescription
---------
base_costintegerBase scrape cost
traffic_costintegerTraffic cost
json_extract_costintegerJSON extraction cost

Async create response (mode=async)

FieldTypeDescription
---------
scrape_idstringTask ID
endpointstringAlways scrape
versionstringVersion
statusstringAlways pending

Async result response (GET /v1/scrape/{scrape_id})

FieldTypeDescription
---------
scrape_idstringTask ID
endpointstringAlways scrape
versionstringVersion
statusstringpending / crawling / completed / failed
urlstringTarget URL
dataobjectSame shape as sync data
started_atstringStart time (ISO 8601)
ended_atstringEnd time (ISO 8601)

Workflow

  1. Classify the request through the default XCrawl entry behavior.
    • If the user provides specific URLs for extraction, default to XCrawl Scrape.
    • If the user clearly asks for map, crawl, or search behavior, route to the dedicated XCrawl API instead of pretending this endpoint covers it.
  1. Restate the user goal as an extraction contract.
    • URL scope, required fields, accepted nulls, and precision expectations.
  1. Build the scrape request body.
    • Keep only necessary options.
    • Prefer explicit output.formats.
  1. Execute scrape and capture task metadata.
    • Track scrape_id, status, and timestamps.
    • If async, poll until completed or failed.
  1. Return raw API responses directly.
    • Do not synthesize or compress fields by default.

Output Contract

Return:

  • Endpoint(s) used and mode (sync or async)
  • request_payload used for the request
  • Raw response body from each API call
  • Error details when request fails

Do not generate summaries unless the user explicitly requests a summary.

Guardrails

  • Do not present XCrawl Scrape as if it also covers map, crawl, or search semantics.
  • Default to scrape only when user intent is URL extraction.
  • Do not invent unsupported output fields.
  • Do not hardcode provider-specific tool schemas in core logic.
  • Call out uncertainty when page structure is unstable.

版本历史

共 1 个版本

  • v1.0.0 当前
    2026-03-30 04:20 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

productivity

XCrawl Scrape

wykings
本技能用于XCrawl抓取任务,包括单URL抓取、格式选择、同步或异步执行,以及通过提示词或_schema提取JSON。
★ 0 📥 2,003
developer-tools

XCrawl Map

wykings
用于XCrawl地图任务,包括全站爬取前的站点URL发现、正则过滤、范围估算和爬取规划。
★ 0 📥 1,423
data-analysis

XCrawl Search

wykings
用于XCrawl搜索任务,包括关键词搜索请求设计、位置和语言控制、结果分析以及后续爬取。
★ 0 📥 1,634