
Presidio PII Scrubber for Sensitive Info

Local PII protection for OpenClaw agents. Scrubs customer data (names, phones, emails, addresses, credit cards, vessel names) before it reaches any AI model....
sebclawops
Communication & Collaboration · clawhub · v0.1.1 · 1 version · 99808.1 · Key: not required

Overview

Presidio PII Protection

You have the Presidio PII skill. Customer data MUST be scrubbed before it reaches any AI model.

When to Use

ALWAYS use this skill before processing data from:

  • CRM systems (HubSpot, Salesforce, etc.)
  • Cloud storage (Google Drive, Dropbox, etc.)
  • Project management tools (TintWiz, Asana, etc.)
  • Any source containing customer names, phones, emails, or addresses

DO NOT use for:

  • Internal company data (product types, SOP terms, project statuses)
  • General conversation with no customer data
  • System administration tasks

Fail-Closed Rule

If Presidio is down, DO NOT query customer data sources. Tell the owner:

"Cannot query [source] because Presidio PII protection is offline. Customer data will not be sent unprotected."

How to Use

Step 1: Check Health

bash SKILL_DIR/scripts/presidio-health.sh

If unhealthy, STOP. Do not proceed with the data query.
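
The health check and the fail-closed rule can be combined into one gate around every customer data query. The following is a minimal sketch, not the skill's own code. It assumes that presidio-health.sh exits with status 0 when Presidio is healthy; the names presidio_is_healthy, guarded_query, and PresidioOfflineError are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable


class PresidioOfflineError(RuntimeError):
    """Raised instead of querying a customer data source while Presidio is down."""


def presidio_is_healthy(skill_dir: str, timeout_s: float = 10.0) -> bool:
    """Run the skill's health script and report whether it succeeded.

    ASSUMPTION: the script exits with status 0 when Presidio is healthy.
    """
    script = Path(skill_dir) / "scripts" / "presidio-health.sh"
    try:
        result = subprocess.run(
            ["bash", str(script)],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # script could not run or hung: treat Presidio as down
    return result.returncode == 0


def guarded_query(
    source: str,
    is_healthy: Callable[[], bool],
    fetch: Callable[[], str],
) -> str:
    """Call fetch() only when the health probe passes; otherwise fail closed."""
    if not is_healthy():
        raise PresidioOfflineError(
            f"Cannot query {source} because Presidio PII protection is offline. "
            "Customer data will not be sent unprotected."
        )
    return fetch()
```

A caller would pass something like lambda: presidio_is_healthy(SKILL_DIR) as the probe and surface the exception text to the owner. Because the probe runs before fetch(), no raw customer data is retrieved while Presidio is unavailable.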

Step 2: Scrub Data

After retrieving raw data from a source, pipe it through the scrubber:

echo "RAW DATA HERE" | python3 SKILL_DIR/scripts/presidio-scrub.py SESSION_ID

Use any unique session identifier (timestamp, request ID, etc.). The same identifier is needed again in Step 4 to restore the response.

The scrubber returns JSON:

{
  "text": "[PERSON_1] at [LOCATION_1], phone [PHONE_NUMBER_1]",
  "pii_found": 3,
  "entity_types": ["PERSON", "LOCATION", "PHONE_NUMBER"],
  "mapping_file": "/path/to/mapping.json",
  "session_id": "SESSION_ID"
}

Use the text field for all reasoning. The mapping file stays local.
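
For callers that drive the scrubber from Python rather than a shell pipe, the step above might be wrapped as follows. This is a sketch under assumptions: it relies only on the command line and the JSON fields shown above, and the helper names (new_session_id, parse_scrub_output, scrub) are hypothetical.

```python
import json
import subprocess
import uuid
from pathlib import Path

# Fields shown in the sample scrubber output above.
EXPECTED_FIELDS = {"text", "pii_found", "entity_types", "mapping_file", "session_id"}


def new_session_id() -> str:
    """Any unique string works; a random UUID is one convenient choice."""
    return uuid.uuid4().hex


def parse_scrub_output(raw: str) -> dict:
    """Parse the scrubber's JSON and check that the documented fields are present."""
    result = json.loads(raw)
    if not isinstance(result, dict):
        raise ValueError("scrubber output is not a JSON object")
    missing = EXPECTED_FIELDS - result.keys()
    if missing:
        raise ValueError(f"scrubber output is missing fields: {sorted(missing)}")
    return result


def scrub(raw_text: str, skill_dir: str, session_id: str) -> dict:
    """Pipe raw_text through presidio-scrub.py, as in the shell example above."""
    script = Path(skill_dir) / "scripts" / "presidio-scrub.py"
    proc = subprocess.run(
        ["python3", str(script), session_id],
        input=raw_text,
        capture_output=True,
        text=True,
        check=True,  # a non-zero exit raises, so unscrubbed text is never returned
    )
    return parse_scrub_output(proc.stdout)
```

Only result["text"] should be passed on to the model; result["mapping_file"] is a local path and is never part of a prompt.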

Step 3: Reason with Clean Data

Process the anonymized text normally. Refer to customers by their tokens ([PERSON_1], [PERSON_2], etc.). The model never sees the real values, only the tokens.
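
When checking a draft response, it can help to list the tokens it refers to. The sketch below assumes every token has the [TYPE_N] shape seen in the sample output (upper-case entity type, underscore, number); the regular expression and the find_tokens name are illustrative, not taken from the skill's scripts.

```python
import re

# ASSUMED token shape, inferred from the sample output: [ENTITY_TYPE_<number>]
TOKEN_RE = re.compile(r"\[[A-Z][A-Z_]*_\d+\]")


def find_tokens(text: str) -> list[str]:
    """Return the distinct tokens in text, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in TOKEN_RE.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)
```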

Step 4: Restore Response

Before delivering the response to the user, de-anonymize:

echo "MODEL RESPONSE WITH TOKENS" | python3 SKILL_DIR/scripts/presidio-restore.py SESSION_ID

This swaps tokens back to real values and deletes the mapping file.
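
Conceptually, the restore step is a token-for-value substitution followed by cleanup. The sketch below illustrates that idea only. The mapping file's real format is not documented here, so the flat JSON object of token to original value is an assumption, as are the function names.

```python
import json
import os
import re

# ASSUMED token shape, inferred from the sample scrubber output.
TOKEN_RE = re.compile(r"\[[A-Z][A-Z_]*_\d+\]")


def restore_text(text: str, mapping: dict) -> str:
    """Replace each known token with its original value; unknown tokens stay as-is."""
    return TOKEN_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def restore_and_cleanup(text: str, mapping_path: str) -> str:
    """Restore text from a mapping file, then delete the file.

    ASSUMPTION: the file holds a flat JSON object such as
    {"[PERSON_1]": "Jane Doe"}; the real layout may differ.
    """
    with open(mapping_path, encoding="utf-8") as fh:
        mapping = json.load(fh)
    restored = restore_text(text, mapping)
    os.remove(mapping_path)  # the mapping is single-use, as described above
    return restored
```

Because the mapping file is deleted on restore, each session can be restored only once; a second response needs a fresh scrub with a new session identifier.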

What Gets Scrubbed (Built-in)

  • Person names
  • Phone numbers (all formats)
  • Email addresses
  • Physical addresses
  • Credit card numbers (with Luhn validation)
  • US Social Security Numbers
  • Bank account / routing numbers
  • IP addresses
  • Dates of birth
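
The Luhn validation mentioned for credit card numbers is a checksum that filters out digit strings that merely look like card numbers. A minimal sketch of the check itself, for illustration rather than a copy of Presidio's implementation:

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum.

    Spaces and dashes are ignored, so "4111 1111 1111 1111" is accepted as input.
    """
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

A 16-digit order number that fails this check would not be treated as a card number on checksum grounds, which reduces false positives.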

What Passes Through (Safe)

  • Product names and specifications
  • Project statuses and type codes
  • Dollar amounts without customer context
  • Industry terminology and SOP references
  • Internal role names and office locations
  • Dates and timelines

Custom Recognizers

The configs/recognizers.json file contains example patterns you can customize for your business:

  • City/region names for boosted location detection
  • Industry-specific identifiers (vessel names, project IDs, etc.)
  • Custom entity patterns unique to your data

Edit configs/recognizers.json to add your own patterns. Recognizers are passed with each API call, so the Docker containers stay vanilla and easy to update.
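
As an illustration of passing recognizers with each call, the sketch below builds a request body using the ad hoc recognizer fields accepted by the Presidio analyzer's analyze endpoint (ad_hoc_recognizers, patterns, deny_list, supported_entity). The layout of configs/recognizers.json is not shown here, so treating it as a list of such objects is an assumption, and the entity names, vessel names, and project ID pattern are invented examples.

```python
import json

# HYPOTHETICAL recognizers; replace the names and patterns with your own data.
VESSEL_RECOGNIZER = {
    "name": "Vessel name recognizer",
    "supported_language": "en",
    "supported_entity": "VESSEL_NAME",
    "deny_list": ["Sea Breeze", "Blue Horizon"],  # exact names to flag
}

PROJECT_ID_RECOGNIZER = {
    "name": "Project ID recognizer",
    "supported_language": "en",
    "supported_entity": "PROJECT_ID",
    "patterns": [
        {"name": "project id", "regex": r"\bPRJ-\d{5}\b", "score": 0.8},
    ],
    "context": ["project", "job"],  # nearby words that raise confidence
}


def build_analyze_request(text: str, recognizers: list, language: str = "en") -> dict:
    """Build an analyzer request body that carries the custom recognizers."""
    return {
        "text": text,
        "language": language,
        "ad_hoc_recognizers": recognizers,
    }


payload = build_analyze_request(
    "Tint job PRJ-00412 for the Sea Breeze",
    [VESSEL_RECOGNIZER, PROJECT_ID_RECOGNIZER],
)
body = json.dumps(payload)  # would be POSTed to the local analyzer container
```

Since the recognizers travel in the request body, changing a pattern needs no container rebuild, which matches the point above about keeping the containers vanilla.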

Trust Statement

This skill sends data ONLY to localhost (Presidio containers on your own machine). No customer data is ever sent to any external service. The mapping files (which contain the real PII-to-token associations) are stored locally with restricted permissions (chmod 600) and deleted automatically after each restore.
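
The restricted permissions described above can be applied at creation time, so the mapping file is never readable by other users, even briefly. A minimal sketch for POSIX systems; write_mapping and is_owner_only are hypothetical helpers, not part of the skill's scripts.

```python
import json
import os
import stat


def write_mapping(path: str, mapping: dict) -> None:
    """Create the mapping file with mode 0600 (owner read/write only).

    O_EXCL makes the call fail if the file already exists, so an existing
    mapping is never silently overwritten.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(mapping, fh)


def is_owner_only(path: str) -> bool:
    """Return True if neither group nor others have any access to path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0
```

Setting the mode in os.open avoids the window that exists when a file is created with default permissions and tightened with chmod afterwards.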

Version History

1 version in total

  • v0.1.1 (current)
    2026-03-30 12:24, marked safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
