
Scrape Emails By URL

Crawl websites locally with crawl4ai to extract contact emails. Accepts multiple URLs and outputs domain-grouped results for clear attribution. Uses deep cra...
lukem121
Communication & Collaboration · clawhub · v0.1.5 · 1 version · 99864.9 · Key: not required

Overview

Find Emails

CLI for crawling websites locally via crawl4ai and extracting contact emails from pages likely to contain them (contact, about, support, team, etc.).
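
As a rough illustration of the extraction step, the sketch below pulls email addresses out of page text with a regular expression. The pattern and the function name `extract_emails` are assumptions made for this example; the actual script may use a different pattern or extra filtering.

```python
import re

# Assumption: a deliberately simple pattern. It does not cover every
# address form permitted by the email RFCs.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return unique, lowercased addresses in order of first appearance."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


if __name__ == "__main__":
    sample = "Write to Contact@Example.com or support@example.com."
    print(extract_emails(sample))
```

Lowercasing before deduplication treats `Contact@Example.com` and `contact@example.com` as the same address, which is usually what a contact list needs.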

Setup

  1. Install dependencies: pip install crawl4ai, then playwright install (see Dependencies)
  2. Run the script:
python scripts/find_emails.py https://example.com

Quick Start


# Crawl a site
python scripts/find_emails.py https://example.com

# Multiple URLs
python scripts/find_emails.py https://example.com https://other.com

# JSON output
python scripts/find_emails.py https://example.com -j

# Save to file
python scripts/find_emails.py https://example.com -o emails.txt

Script

find_emails.py — Crawl and Extract Emails

python scripts/find_emails.py <url> [url ...]
python scripts/find_emails.py https://example.com
python scripts/find_emails.py https://example.com -j -o results.json
python scripts/find_emails.py --from-file page.md

Arguments:

Argument         Description
---------------------------------------------------------------------
urls             One or more URLs to crawl (positional)
-o, --output     Write results to file
-j, --json       JSON output (see "Output format (JSON)")
-q, --quiet      Minimal output (no header, just email lines)
--max-depth      Max crawl depth (default: 2)
--max-pages      Max pages to crawl (default: 25)
--from-file      Extract from local markdown file (skip crawl)
-v, --verbose    Verbose crawl output
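
For readers who want to see how the documented flags fit together, here is a minimal `argparse` sketch that declares them. It is an assumption about how such a parser could look, not the script's actual source; `build_parser` is a hypothetical name, and only the flag names and defaults come from the table above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags listed in the arguments table (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="find_emails.py",
        description="Crawl websites and extract contact emails.",
    )
    # nargs="*" so that --from-file can be used without any URL.
    parser.add_argument("urls", nargs="*", help="One or more URLs to crawl")
    parser.add_argument("-o", "--output", help="Write results to file")
    parser.add_argument("-j", "--json", action="store_true", help="JSON output")
    parser.add_argument("-q", "--quiet", action="store_true", help="Minimal output")
    parser.add_argument("--max-depth", type=int, default=2, help="Max crawl depth")
    parser.add_argument("--max-pages", type=int, default=25, help="Max pages to crawl")
    parser.add_argument("--from-file", help="Extract from local markdown file")
    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose crawl output")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["https://example.com", "-j", "-o", "results.json"])
    print(args)
```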

Output format (human-readable):

Emails are grouped by domain, which keeps multi-URL runs easy to read:

Found 3 unique email(s) across 2 domain(s)

## example.com

  • contact@example.com
    Found on: /contact, /about
  • support@example.com
    Found on: /support

## other.com

  • info@other.com
    Found on: /contact-us
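
The layout above can be reproduced from a nested mapping of domain to email to paths. The following is a sketch under that assumption; `format_report` and the shape of its input are hypothetical and chosen only to mirror the sample output.

```python
def format_report(found: dict[str, dict[str, list[str]]]) -> str:
    """Render {domain: {email: [paths]}} in the human-readable layout."""
    unique = {email for emails in found.values() for email in emails}
    lines = [f"Found {len(unique)} unique email(s) across {len(found)} domain(s)", ""]
    for domain, emails in found.items():
        lines += [f"## {domain}", ""]
        for email, paths in emails.items():
            lines.append(f"  • {email}")
            lines.append(f"    Found on: {', '.join(paths)}")
        lines.append("")  # blank line between domain sections
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(format_report({
        "example.com": {
            "contact@example.com": ["/contact", "/about"],
            "support@example.com": ["/support"],
        },
        "other.com": {"info@other.com": ["/contact-us"]},
    }))
```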

Output format (JSON):

LLM-friendly structure with summary and per-domain breakdown:

{
  "summary": {
    "domains_crawled": 2,
    "total_unique_emails": 3
  },
  "emails_by_domain": {
    "example.com": {
      "emails": {
        "contact@example.com": ["/contact", "/about"],
        "support@example.com": ["/support"]
      },
      "count": 2
    },
    "other.com": {
      "emails": {
        "info@other.com": ["/contact-us"]
      },
      "count": 1
    }
  }
}
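
To show how the summary fields relate to the per-domain data, the sketch below assembles the same structure from a mapping of domain to email to paths. `build_report` is a hypothetical helper, and treating `domains_crawled` as the number of domains in the mapping is an assumption; the script may count domains differently (for example, including domains where no email was found).

```python
import json


def build_report(found: dict[str, dict[str, list[str]]]) -> dict:
    """Build the summary and per-domain breakdown from {domain: {email: [paths]}}."""
    unique = {email for emails in found.values() for email in emails}
    return {
        "summary": {
            "domains_crawled": len(found),  # assumption: one entry per crawled domain
            "total_unique_emails": len(unique),
        },
        "emails_by_domain": {
            domain: {"emails": emails, "count": len(emails)}
            for domain, emails in found.items()
        },
    }


if __name__ == "__main__":
    report = build_report({
        "example.com": {
            "contact@example.com": ["/contact", "/about"],
            "support@example.com": ["/support"],
        },
        "other.com": {"info@other.com": ["/contact-us"]},
    })
    print(json.dumps(report, indent=2))
```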

Configuration

Edit scripts/url_patterns.json to customize which URLs the crawler follows. Only links matching these glob-style patterns are included:

{
  "url_patterns": [
    "*contact*",
    "*support*",
    "*about*",
    "*team*",
    "*email*",
    "*reach*",
    "*staff*",
    "*inquiry*",
    "*enquir*",
    "*get-in-touch*",
    "*contact-us*",
    "*about-us*"
  ]
}

If the file is missing or invalid, default patterns are used.
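
The fallback and matching behaviour described here could be sketched as follows. This is an illustration, not the script's code: `load_patterns`, `should_follow`, and the abbreviated `DEFAULT_PATTERNS` list are assumptions, and the real crawler may apply the patterns through crawl4ai's own filtering rather than `fnmatch`.

```python
import json
from fnmatch import fnmatchcase
from pathlib import Path

# Assumption: an abbreviated stand-in for the script's built-in defaults.
DEFAULT_PATTERNS = ["*contact*", "*support*", "*about*", "*team*"]


def load_patterns(path: str) -> list[str]:
    """Read url_patterns from a JSON file, falling back to defaults on any problem."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        patterns = data["url_patterns"]
        if isinstance(patterns, list) and patterns and all(isinstance(p, str) for p in patterns):
            return patterns
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing file, invalid JSON, or unexpected structure
    return list(DEFAULT_PATTERNS)


def should_follow(url: str, patterns: list[str]) -> bool:
    """True if the URL matches at least one glob-style pattern, ignoring case."""
    lowered = url.lower()
    return any(fnmatchcase(lowered, pattern.lower()) for pattern in patterns)


if __name__ == "__main__":
    patterns = load_patterns("scripts/url_patterns.json")
    print(should_follow("https://example.com/contact-us", patterns))
```

Because each pattern is wrapped in `*`, it matches anywhere in the URL, so `*contact*` already covers `/contact-us`.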


Workflow

  1. Crawl a site:

```bash
python scripts/find_emails.py https://example.com -j -o emails.json
```

  2. Extract from local file (e.g., cached markdown):

```bash
python scripts/find_emails.py --from-file crawled.md -j
```

  3. Customize URL filters by editing scripts/url_patterns.json.

Dependencies

pip install crawl4ai
playwright install

Requires a browser (Playwright) for local crawling.


Batch Processing

# Crawl multiple sites – results grouped by domain for clear attribution
python scripts/find_emails.py https://site1.com https://site2.com -j -o combined.json

# Extract from multiple local files
for f in crawled/*.md; do
  echo "=== $f ==="
  python scripts/find_emails.py --from-file "$f" -q
done

Multiple URLs are fully supported; output clearly associates each email with its source domain. Domains are normalized (e.g. www.techbullion.com and techbullion.com merge into one) so duplicate sites are not listed separately.
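
One way to get the merging behaviour described above is to lowercase the host and strip a leading `www.` label. The sketch below assumes exactly that rule; `normalize_domain` is a hypothetical name, and the script's normalization may handle more cases.

```python
from urllib.parse import urlsplit


def normalize_domain(url: str) -> str:
    """Return the lowercased host without a leading 'www.' (port and path dropped)."""
    # Bare hosts such as "example.com" need a "//" prefix to be parsed as a netloc.
    target = url if "://" in url else f"//{url}"
    host = urlsplit(target).hostname or ""
    return host[4:] if host.startswith("www.") else host


if __name__ == "__main__":
    for candidate in ("https://www.techbullion.com/contact", "techbullion.com"):
        print(candidate, "->", normalize_domain(candidate))
```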


Version history

1 version in total

  • v0.1.5 (current)
    2026-03-29 01:37 · Safe

Security checks

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

suspicious
