
Skill 109

Expertise in deploying, monitoring, detecting drift, automating retraining, and ensuring fairness and compliance for production ML models.
Publisher: timbohnett-farther
Category: AI Intelligence · clawhub v1.0.0 · 1 version · API key: not required

Skill 109: MLOps & Model Governance

Quality Grade: 94-95/100

Author: OpenClaw Assistant

Last Updated: March 2026

Difficulty: Advanced (requires statistics, operations, domain knowledge)


Overview

MLOps (Machine Learning Operations) is the discipline of deploying, monitoring, and governing machine learning models in production. It extends DevOps principles to the unique challenges of ML: data quality, model drift, retraining, and fairness.

This skill covers:

  • Model deployment and versioning
  • Data quality and feature management
  • Model drift detection and mitigation
  • Retraining pipelines and automation
  • Monitoring & observability for models
  • Governance (fairness, bias, compliance)

Part 1: Model Deployment & Versioning

Deployment Patterns

Batch Prediction:

  • Run the model on a batch of data on a schedule (hourly, daily)
  • Store results in a database for serving
  • ✓ Simple, no latency concerns
  • ✗ Stale predictions, high storage cost

Real-Time API:

  • Model served as HTTP/gRPC API
  • Called on-demand for predictions
  • ✓ Fresh predictions, scalable
  • ✗ Latency-critical; may need caching

Stream Processing:

  • Model processes events from a Kafka or Pub/Sub stream
  • Results published to downstream systems
  • ✓ Real-time, event-driven
  • ✗ Exactly-once semantics and state management are complex

Model Versioning

Model Registry:
  model_name: fraud_detector
  versions:
    v1.0:
      training_date: 2026-01-01
      dataset: Q4_2025_transactions (1M records)
      metrics:
        precision: 0.96
        recall: 0.92
        auc: 0.98
      status: production
    
    v1.1:
      training_date: 2026-02-15
      dataset: Q4_2025 + Q1_2026 (2M records)
      metrics:
        precision: 0.97
        recall: 0.94
        auc: 0.985
      status: staging (shadow running)
    
    v1.2:
      status: training (not ready)
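
A registry entry like the one above can be modeled in code so that tooling can look up the live version. The following is a minimal sketch, not a real registry API: `ModelVersion` and `production_version` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    """One registry entry (hypothetical structure mirroring the listing above)."""
    version: str
    status: str  # "production", "staging", or "training"
    metrics: dict = field(default_factory=dict)


def production_version(versions):
    """Return the single version marked production, or raise if none or several."""
    live = [v for v in versions if v.status == "production"]
    if len(live) != 1:
        raise ValueError(f"expected exactly one production version, found {len(live)}")
    return live[0]


registry = [
    ModelVersion("v1.0", "production", {"precision": 0.96, "recall": 0.92, "auc": 0.98}),
    ModelVersion("v1.1", "staging", {"precision": 0.97, "recall": 0.94, "auc": 0.985}),
    ModelVersion("v1.2", "training"),
]
```

Enforcing exactly one production version is a design choice of this sketch; it makes an accidental double promotion fail loudly instead of silently serving two models.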

Canary Deployment

Traffic split:
  90% → v1.0 (stable, proven)
  10% → v1.1 (new, being validated)

If v1.1 performs well (matches or beats v1.0 on the tracked metrics):
  Day 1: 90/10
  Day 2: 80/20
  Day 3: 50/50
  Day 4: 20/80
  Day 5: 0/100 (v1.0 retired, v1.1 becomes prod)

If v1.1 performs poorly (accuracy drops):
  Immediately roll back to 100% v1.0
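
One common way to implement such a split is to hash a stable key so that each user consistently lands on the same version. The sketch below assumes the five-day ramp listed above; `canary_bucket`, `route_request`, and the choice of SHA-256 are illustrative assumptions, not part of the original skill.

```python
import hashlib

# Share of traffic sent to the canary (v1.1) on each day of the ramp above.
RAMP_SCHEDULE = {1: 0.10, 2: 0.20, 3: 0.50, 4: 0.80, 5: 1.00}


def canary_bucket(user_id: str) -> float:
    """Map a user ID to a stable value in [0, 1) using SHA-256."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def route_request(user_id: str, day: int, rolled_back: bool = False) -> str:
    """Return the model version that should serve this user."""
    if rolled_back:
        return "v1.0"  # rollback sends 100% of traffic to the stable model
    canary_share = RAMP_SCHEDULE.get(day, 0.0)  # unknown day: no canary traffic
    return "v1.1" if canary_bucket(user_id) < canary_share else "v1.0"
```

Because the bucket depends only on the user ID, a user who saw v1.1 at 10% keeps seeing it as the share grows, which keeps per-user comparisons clean.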

Part 2: Data Quality & Feature Management

Data Quality Checks

Before training:

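import pandas as pd

# data_quality_check is a project-specific decorator, assumed defined elsewhere
@data_quality_check
def validate_raw_data(df):
    # Each column may have nulls in fewer than 1% of rows
    assert (df.isnull().sum() < 0.01 * len(df)).all(), "Too many nulls"
    assert df.shape[0] > 100_000, "Dataset too small"
    assert df['target'].value_counts().min() > 100, "Class imbalance extreme"
    # Newest record must be under one day old (timezone-naive datetime column)
    assert df['timestamp'].max() > pd.Timestamp.now() - pd.Timedelta(days=1), "Data stale"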

In production:

@data_quality_check
def validate_serving_features(request):
    # If any check fails, the caller should catch AssertionError,
    # return a default prediction, and raise an alert
    assert 0 < request['user_age'] < 150, "user_age out of range"
    assert request['transaction_amount'] > 0, "transaction_amount must be positive"
    assert len(request['user_id']) < 100, "user_id too long"
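
The fallback described in that comment can be sketched as a thin wrapper around the model call. This version rewrites the checks with explicit exceptions, since `assert` statements are skipped when Python runs with `-O`. `predict_with_fallback`, `DEFAULT_PREDICTION`, and the logger name are hypothetical, and the alert is reduced to a log warning for illustration.

```python
import logging

logger = logging.getLogger("serving")

DEFAULT_PREDICTION = 0.0  # hypothetical fallback score; choose per use case


def validate_serving_features(request: dict) -> None:
    """Raise ValueError when a serving request fails a basic sanity check."""
    if not 0 < request["user_age"] < 150:
        raise ValueError("user_age out of range")
    if request["transaction_amount"] <= 0:
        raise ValueError("transaction_amount must be positive")
    if len(request["user_id"]) >= 100:
        raise ValueError("user_id too long")


def predict_with_fallback(model, request: dict) -> float:
    """Score a request, or return the default prediction if validation fails."""
    try:
        validate_serving_features(request)
    except (ValueError, KeyError, TypeError) as exc:
        # Missing keys and wrong types are treated like failed range checks.
        logger.warning("Invalid serving request, using default: %s", exc)
        return DEFAULT_PREDICTION
    return model(request)
```

Here `model` is any callable that takes the request dict and returns a score, for example `predict_with_fallback(lambda r: 0.9, {"user_age": 30, "transaction_amount": 12.5, "user_id": "u1"})`.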

Feature Store

Centralized feature management:

Feature Store:
  
  customer_features (daily, batch):
    - customer_age
    - customer_account_age
    - customer_total_spend
    
  transaction_features (real-time, stream):
    - amount
    - merchant_category
    - is_foreign
    - time_since_last_transaction
  
  derived_features (computed):
    - risk_score = f(transaction_features, customer_features)
    - velocity_last_hour = count(transactions in last hour)

Serving:
  GET /features/customer/{id}?features=customer_age,risk_score
  → Real-time lookup, cached, monitored
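
As an illustration of a derived streaming feature, `velocity_last_hour` can be computed with a per-customer sliding window. This is a minimal in-memory sketch; `VelocityTracker` is a hypothetical name, and a real feature store would persist this state rather than hold it in a Python dict.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class VelocityTracker:
    """Track per-customer transaction counts over a sliding one-hour window."""

    def __init__(self, window: timedelta = timedelta(hours=1)):
        self.window = window
        self._events = defaultdict(deque)  # customer_id -> event timestamps

    def record(self, customer_id: str, ts: datetime) -> None:
        """Register a transaction; timestamps are assumed to arrive in order."""
        self._events[customer_id].append(ts)

    def velocity_last_hour(self, customer_id: str, now: datetime) -> int:
        """Count transactions in the window ending at `now`."""
        events = self._events[customer_id]
        # Drop timestamps that have aged out of the window.
        while events and events[0] <= now - self.window:
            events.popleft()
        return len(events)
```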

Part 3: Model Drift & Retraining

Types of Drift

Data Drift:

  • Distribution of input features changes
  • Example: Customers' spending patterns changed post-recession
  • Detection: Compare feature distributions (current vs. historical)

Label Drift:

  • Distribution of labels changes
  • Example: Fraud rate increased due to new attack vector
  • Detection: Compare the observed label rate against the training baseline; until labels arrive, track the predicted-positive rate as a proxy

Concept Drift:

  • Relationship between features and labels changes
  • Example: Customers' behavior changed; age is less predictive
  • Detection: Model accuracy degrades unexpectedly, often while feature distributions still look stable
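
For label drift, one simple check is whether the positive-label rate has moved more than sampling noise would explain. The sketch below uses a two-proportion z-test, which is a technique chosen here for illustration and not prescribed by the original; `label_rate_shift` is a hypothetical helper.

```python
import math


def label_rate_shift(baseline_pos: int, baseline_n: int,
                     current_pos: int, current_n: int) -> float:
    """Return the two-proportion z-score for a change in positive-label rate."""
    p1 = baseline_pos / baseline_n
    p2 = current_pos / current_n
    # Pooled rate under the null hypothesis that both periods share one rate.
    pooled = (baseline_pos + current_pos) / (baseline_n + current_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline_n + 1 / current_n))
    return (p2 - p1) / se if se > 0 else 0.0
```

A large positive score (for example, above 3) suggests the label rate has risen beyond what chance would produce; the cutoff is a judgment call that depends on how often the check runs.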

Drift Detection

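from scipy.stats import ks_2samp

THRESHOLD = 0.1  # KS statistic cutoff; illustrative value, tune per feature

def monitor_data_drift():
    # load_recent_features, load_historical_features, alert, and
    # trigger_retraining are project-specific helpers defined elsewhere
    current_features = load_recent_features(days=7)
    historical_baseline = load_historical_features(months=3)
    
    drifted = []
    for feature in current_features.columns:
        # Two-sample Kolmogorov-Smirnov test (numeric features only)
        ks_stat, p_value = ks_2samp(current_features[feature],
                                    historical_baseline[feature])
        
        if ks_stat > THRESHOLD:
            alert(f"Drift detected in {feature} (KS={ks_stat:.3f})")
            drifted.append(feature)
    
    # Trigger retraining once, not once per drifted feature
    if drifted:
        trigger_retraining()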

Automated Retraining

Pipeline:
  1. Detect drift (automatic trigger)
  2. Fetch latest data (last 30 days)
  3. Train new model
  4. Validate metrics (must match or beat the production model)
  5. Deploy canary (10% traffic)
  6. Monitor (24 hours for issues)
  7. If healthy, promote to 100%
  8. Otherwise, roll back and alert the data science team for investigation
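
The validation gate in step 4 and the promote-or-rollback decision at the end of the pipeline can be sketched as two small functions. `should_promote`, `canary_decision`, and the `tolerance` parameter are hypothetical names introduced for illustration; the metric values reuse the registry example from Part 1.

```python
def should_promote(candidate: dict, production: dict,
                   metrics=("precision", "recall", "auc"),
                   tolerance: float = 0.0) -> bool:
    """Validation gate: every tracked metric must match or beat production."""
    return all(candidate[m] >= production[m] - tolerance for m in metrics)


def canary_decision(canary_healthy: bool) -> str:
    """Final step: promote a healthy canary, otherwise roll back and alert."""
    return "promote" if canary_healthy else "rollback_and_alert"


v1_0 = {"precision": 0.96, "recall": 0.92, "auc": 0.98}
v1_1 = {"precision": 0.97, "recall": 0.94, "auc": 0.985}
```

Requiring every metric to hold, instead of an average, prevents a gain in one metric from hiding a regression in another; a small `tolerance` can absorb evaluation noise.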

Part 4: Monitoring & Observability

Key Metrics

Model Metrics:

  • Accuracy (% correct predictions)
  • Precision/Recall (trade-off for each class)
  • AUC-ROC (discriminative ability)
  • F1 score (harmonic mean of precision and recall; useful under class imbalance)
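
These metrics all derive from the four confusion-matrix counts. The sketch below computes them in plain Python; `classification_metrics` is a hypothetical helper, and returning 0.0 for undefined ratios is a convention chosen for this example.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

A model that never flags fraud on data with 10 frauds in 1,000 transactions (`tp=0, fp=0, fn=10, tn=990`) scores 99% accuracy but 0 recall and 0 F1, which is why accuracy alone is a poor signal under class imbalance.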

Business Metrics:

  • Fraud caught vs. false positives
  • Revenue impact of model decisions
  • Model latency (does it meet SLA?)
  • Model cost per prediction

Data Metrics:

  • Feature freshness (how old is data?)
  • Data completeness (% non-null)
  • Data distribution changes
  • Outlier detection

Model Observability

Dashboard:
  [Prediction Latency]  [Prediction Volume]  [Error Rate]
  p50: 45ms             10K/sec              0.1%
  p99: 250ms
  
  [Model Drift Indicators]
  Feature distribution: Green ✓
  Label distribution: Yellow ⚠ (2% change)
  Prediction accuracy: Red ✗ (↓ 2% from baseline)
  
  [Recommended Actions]
  - Initiate retraining (prediction accuracy down 2% from baseline)
  - Review error logs (unusual error pattern)
  - Monitor next 24h for issues
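
The latency panel's p50 and p99 figures can be derived from raw request timings. The sketch below uses the nearest-rank definition of a percentile; `percentile` and `meets_sla` are hypothetical helpers, and production systems usually rely on histogram-based estimates from their metrics backend instead of sorting raw samples.

```python
import math


def percentile(values, pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sla(latencies_ms, p99_budget_ms: float) -> bool:
    """Check whether the 99th-percentile latency fits within the budget."""
    return percentile(latencies_ms, 99) <= p99_budget_ms
```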

Part 5: Governance, Fairness & Compliance

Fairness & Bias

Check for demographic parity:

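def check_fairness(predictions, demographics):
    # predictions: 0/1 pandas Series; demographics: Series of group labels
    # with the same index. alert() is a project-specific hook.
    rates = {}
    for group in demographics.unique():
        rates[group] = predictions[demographics == group].mean()
        print(f"{group}: {rates[group]:.1%} positive")
    
    # All groups should have similar positive rates (within 5 percentage points)
    max_rate, min_rate = max(rates.values()), min(rates.values())
    if max_rate - min_rate > 0.05:
        alert("Fairness issue: disparate impact detected")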

Mitigation strategies:

  • Collect more data for underrepresented groups
  • Use fairness-aware training (adjust loss function)
  • Post-process predictions to equalize rates
  • Require human review for high-impact decisions
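
Post-processing to equalize rates can be sketched as choosing a separate score threshold per group so that each group is flagged at roughly the same rate. This is one simple approach under stated assumptions, not the original author's method: `group_thresholds` is a hypothetical helper, and tied scores can push a group slightly above the target rate.

```python
def group_thresholds(scores, groups, target_rate: float) -> dict:
    """Pick a score threshold per group so each flags about target_rate of cases."""
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)

    thresholds = {}
    for group, vals in by_group.items():
        ordered = sorted(vals, reverse=True)
        k = max(1, round(target_rate * len(ordered)))  # number of cases to flag
        thresholds[group] = ordered[k - 1]  # flag scores >= this value
    return thresholds


def flag(score: float, group: str, thresholds: dict) -> bool:
    """Apply the group-specific threshold to a single score."""
    return score >= thresholds[group]
```

Equalizing positive rates this way trades some overall accuracy for parity, so the resulting thresholds are worth reviewing alongside the model metrics before deployment.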

Model Cards & Governance

Every model should have a Model Card:

# Model Card: Fraud Detector v1.0

## Purpose
Identify fraudulent transactions in real-time

## Training Data
- Source: All transactions Q4 2025
- Size: 1M transactions
- Positive rate: 0.1% (1000 frauds)
- Temporal coverage: Oct-Dec 2025

## Performance
- Precision: 96% (when threshold=0.5)
- Recall: 92%
- False positive rate: ~0.004% (about 38 of 999,000 legitimate transactions blocked, as implied by the precision, recall, and positive rate above)

## Known Limitations
- Untested on: Cryptocurrency, cash advances, prepaid cards
- Assumes: Feature distributions similar to 2025

## Fairness
- Tested for disparate impact across: Gender, Age, Geographic region
- No significant bias found (|Δ| < 2%)

## Owner
ML Platform Team (ml-platform@company.com)

## Review Schedule
- Monthly performance review
- Quarterly fairness audit
- Annual retraining assessment
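
The figures in a model card can be cross-checked against each other before publication. Given precision, recall, and the positive rate, the false positive rate follows by arithmetic, assuming all three were measured on the same data. `implied_false_positive_rate` is a hypothetical helper for that check.

```python
def implied_false_positive_rate(precision: float, recall: float,
                                positive_rate: float) -> float:
    """False positive rate implied by precision, recall, and the positive rate."""
    # Per unit of traffic, true positives = recall * positive_rate.
    tp = recall * positive_rate
    # precision = TP / (TP + FP)  =>  FP = TP * (1 - precision) / precision
    fp = tp * (1 - precision) / precision
    # FPR = FP / (all actual negatives)
    return fp / (1 - positive_rate)


fpr = implied_false_positive_rate(precision=0.96, recall=0.92, positive_rate=0.001)
print(f"{fpr:.4%}")
```

With precision 0.96, recall 0.92, and a 0.1% positive rate, the implied rate comes out near 0.004%, or roughly 38 false alarms per million transactions.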

Conclusion

MLOps brings the rigor of DevOps to machine learning. By automating deployment, monitoring drift, retraining intelligently, and governing fairly, you ensure ML models stay valuable, reliable, and trustworthy in production.

Key Takeaway: Models aren't static—they degrade over time. Treat them like infrastructure: monitor continuously, rebuild when needed, and retire when value drops.

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-29 22:52 · Security check: safe

