
Monitoring

Set up observability for applications and infrastructure with metrics, logs, traces, and alerts.
ivangdavila
Data Analysis · clawhub v1.0.0 · 1 version · 98411.7 · Key: not required
★ 4 stars · 📥 6,116 downloads · 💾 364 installs · #latest

Overview

Complexity Levels

| Level | Tools | Setup Time | Best For |
|-------|-------|------------|----------|
| Minimal | UptimeRobot, Healthchecks.io | 15 min | Side projects, MVPs |
| Standard | Uptime Kuma, Sentry, basic Grafana | 1-2 hours | Small teams, startups |
| Professional | Prometheus, Grafana, Loki, Alertmanager | 1-2 days | Production systems |
| Enterprise | Datadog, New Relic, or full OSS stack | Ongoing | Large-scale operations |

The Three Pillars

| Pillar | What It Answers | Tools |
|--------|-----------------|-------|
| Metrics | "How is the system performing?" | Prometheus, Grafana, Datadog |
| Logs | "What happened?" | Loki, ELK, CloudWatch |
| Traces | "Why is this request slow?" | Jaeger, Tempo, Sentry |
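To make the three pillars concrete, here is a toy, standard-library-only sketch of a single request emitting all three signals. `handle_request`, the `/checkout` endpoint, and the field names are hypothetical. In a real deployment the counter would be exported to a metrics backend, the log line shipped to a log store, and the span sent to a tracing backend such as the tools listed above. The shared `trace_id` is what lets you move from a metric spike to the matching logs and trace.

```python
import json
import time
import uuid
from collections import Counter

# Metric store: an in-process counter keyed by (endpoint, status).
REQUESTS = Counter()


def handle_request(endpoint: str, work_seconds: float, status: int) -> dict:
    """Simulate one request and emit a metric, a log line, and a trace span."""
    trace_id = uuid.uuid4().hex  # shared ID that links the three signals
    start = time.perf_counter()
    time.sleep(work_seconds)  # stand-in for real work
    duration = time.perf_counter() - start

    # Metric: "How is the system performing?" (aggregated, cheap to store)
    REQUESTS[(endpoint, status)] += 1

    # Trace span: "Why is this request slow?" (timing of one unit of work)
    span = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "name": endpoint,
        "duration_s": duration,
    }

    # Log: "What happened?" (one structured event tagged with the same trace_id)
    log_line = json.dumps(
        {"level": "info", "msg": "request handled", "endpoint": endpoint,
         "status": status, "trace_id": trace_id}
    )
    print(log_line)
    return {"span": span, "log": log_line}


result = handle_request("/checkout", 0.01, 200)
```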

Quick Start by Use Case

"I just want to know if it's down"

→ UptimeRobot (free) or Uptime Kuma (self-hosted). See simple.md.

"I need to debug production errors"

→ Sentry with your framework SDK. 5-minute setup. See apm.md.

"I want real observability"

→ Prometheus + Grafana + Loki. See prometheus.md.

"I need to centralize logs"

→ Loki for simple setups, ELK for complex queries. See logs.md.
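Both Loki and ELK are easier to query when applications log structured events instead of free text. A minimal sketch using Python's standard `logging` module, assuming a one-JSON-object-per-line convention that a log shipper then forwards; the logger name and the `fields` key used to attach extra data are hypothetical choices, not something either tool requires.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Hypothetical convention: extra={"fields": {...}} becomes top-level keys.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str, stream=sys.stdout) -> logging.Logger:
    """Build a logger that writes JSON lines to `stream` and nowhere else."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate, non-JSON output via the root logger
    return logger


log = make_logger("checkout")
log.info("payment failed", extra={"fields": {"order_id": "A123", "status": 502}})
```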

What to Monitor

Applications (RED Method)

  • Rate — requests per second
  • Errors — error rate by endpoint
  • Duration — latency (p50, p95, p99)
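The RED numbers can be computed from raw request records. A minimal sketch, assuming each record carries an endpoint, an HTTP status, and a latency, that only 5xx responses count as errors (4xx treated as client mistakes), and using the nearest-rank percentile method. In practice a metrics backend usually computes these for you, for example from histograms.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    endpoint: str
    status: int
    latency_ms: float


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def red_summary(requests, window_seconds):
    """Rate, Errors, Duration for one non-empty group of requests seen over window_seconds."""
    latencies = [r.latency_ms for r in requests]
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: only 5xx count
    return {
        "rate_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }


def red_by_endpoint(requests, window_seconds):
    """Group requests by endpoint, since error rate is tracked per endpoint."""
    groups = defaultdict(list)
    for r in requests:
        groups[r.endpoint].append(r)
    return {ep: red_summary(rs, window_seconds) for ep, rs in groups.items()}


sample = [Request("/api/orders", 200, 42.0), Request("/api/orders", 503, 310.0)]
print(red_by_endpoint(sample, window_seconds=60))
```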

Infrastructure (USE Method)

  • Utilization — CPU, memory, disk usage
  • Saturation — queue depth, load average
  • Errors — hardware/system errors
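A USE review walks each resource and asks the three questions in turn. A minimal sketch, assuming you already collect one sample per resource; the 80% utilization threshold and the "load above core count means work is queuing" saturation heuristic are illustrative assumptions, not universal limits.

```python
from dataclasses import dataclass


@dataclass
class ResourceSample:
    name: str
    utilization: float  # fraction of capacity in use over the window, 0.0 to 1.0
    saturation: float   # demand measure, e.g. load average for CPU
    capacity: float     # parallel capacity, e.g. number of CPU cores
    errors: int         # errors reported for the resource in the window


def use_findings(sample: ResourceSample, util_threshold: float = 0.8) -> list:
    """Return human-readable USE findings; an empty list means nothing stood out."""
    findings = []
    if sample.utilization >= util_threshold:
        findings.append(f"{sample.name}: utilization {sample.utilization:.0%}")
    # Heuristic: more demand than parallel capacity means work is waiting.
    if sample.saturation > sample.capacity:
        findings.append(f"{sample.name}: load {sample.saturation} exceeds capacity {sample.capacity}")
    if sample.errors > 0:
        findings.append(f"{sample.name}: {sample.errors} errors")
    return findings


cpu = ResourceSample("cpu", utilization=0.92, saturation=6.5, capacity=4, errors=0)
print(use_findings(cpu))  # ['cpu: utilization 92%', 'cpu: load 6.5 exceeds capacity 4']
```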

Alerting Principles

| Do | Don't |
|----|-------|
| Alert on symptoms (user impact) | Alert on causes (CPU high) |
| Include runbook link | Require investigation to understand |
| Set appropriate severity | Make everything P1 |
| Require action | Alert on "interesting" metrics |
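A symptom-based rule following the "Do" column might look like the sketch below. It assumes an error-rate signal for a hypothetical checkout flow, a 1% error budget, a 10x multiplier for escalating to P1, and a placeholder runbook URL; all four are illustrative values to adapt, not recommended defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    name: str
    severity: str
    summary: str
    runbook_url: str


def evaluate_checkout_errors(error_rate: float, error_budget: float = 0.01) -> Optional[Alert]:
    """Alert on a user-facing symptom (failed checkouts), not on a cause such as CPU."""
    if error_rate <= error_budget:
        return None  # within budget: no page
    severity = "P1" if error_rate >= 10 * error_budget else "P3"
    return Alert(
        name="CheckoutErrorRateHigh",
        severity=severity,
        summary=f"{error_rate:.1%} of checkout requests are failing",
        runbook_url="https://wiki.example.com/runbooks/checkout-errors",  # placeholder URL
    )
```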

Alert fatigue kills monitoring. If alerts are ignored, you have no monitoring.
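One common way to cut noise is to require a condition to persist before paging, similar in spirit to the `for:` clause in Prometheus alerting rules. A minimal sketch that counts consecutive failing evaluations; `PendingAlert` and the three-evaluation window are hypothetical.

```python
class PendingAlert:
    """Fire only after the condition holds for `for_evaluations` consecutive checks."""

    def __init__(self, for_evaluations: int):
        self.for_evaluations = for_evaluations
        self.consecutive = 0

    def update(self, condition: bool) -> bool:
        # Any healthy evaluation resets the count, so brief spikes never page anyone.
        self.consecutive = self.consecutive + 1 if condition else 0
        return self.consecutive >= self.for_evaluations


alert = PendingAlert(for_evaluations=3)
readings = [True, False, True, True, True, True]
fired = [alert.update(r) for r in readings]
print(fired)  # [False, False, False, False, True, True]
```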

For alert configuration, severities, and on-call setup, see alerting.md.

Cost Comparison

| Solution | Monthly Cost (small) | Monthly Cost (medium) |
|----------|----------------------|-----------------------|
| UptimeRobot | Free | $7 |
| Uptime Kuma | $5 (VPS) | $5 (VPS) |
| Sentry | Free / $26 | $80 |
| Grafana Cloud | Free tier | $50+ |
| Datadog | $15/host | $23/host + features |
| Self-hosted stack | $10-20 (VPS) | $50-100 (VPS) |
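Per-host pricing grows with fleet size while the VPS rows stay roughly flat, so the cheaper option flips at some host count. A rough sketch using figures from the table above (for example $15/host against a $100/month self-hosted VPS); it ignores feature add-ons and the engineering time needed to run a self-hosted stack, and prices change, so treat the result as an order-of-magnitude check only.

```python
import math


def per_host_monthly(hosts: int, price_per_host: float) -> float:
    """Monthly bill under per-host pricing, ignoring add-on features."""
    return hosts * price_per_host


def breakeven_hosts(price_per_host: float, flat_monthly: float) -> int:
    """Smallest host count at which per-host pricing costs more than a flat monthly bill."""
    return math.floor(flat_monthly / price_per_host) + 1


for hosts in (3, 7, 20):
    print(f"{hosts} hosts: ${per_host_monthly(hosts, 15):.0f}/month per-host vs $100 flat")
```

With those inputs, `breakeven_hosts(15, 100)` returns 7: six hosts cost $90 under per-host pricing, seven cost $105.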

Common Mistakes

  • Starting with Prometheus/Grafana when Uptime Kuma would suffice
  • No alerting (dashboards nobody watches)
  • Too many alerts (alert fatigue → ignored)
  • Missing runbooks (alert fires, nobody knows what to do)
  • Not monitoring from outside (only internal checks)
  • Storing logs forever (cost explodes)
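External checks catch failures that internal checks miss, such as DNS, TLS, load balancer, or network problems. A minimal standard-library sketch of what an uptime service does on each poll, assuming the probe runs from a machine outside your infrastructure; treating 2xx and 3xx responses as "up" is an assumption.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> dict:
    """Check a URL the way an external uptime monitor would: reachability, status, latency."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:  # server answered, but with 4xx/5xx
        status = exc.code
    except (urllib.error.URLError, OSError) as exc:  # DNS failure, refused, TLS error, timeout
        return {"up": False, "status": None, "error": str(exc),
                "latency_s": time.monotonic() - start}
    return {"up": 200 <= status < 400, "status": status, "error": None,
            "latency_s": time.monotonic() - start}
```

For example, schedule `probe("https://example.com/healthz")` every minute from an external host and alert when `up` stays false for several consecutive polls; the URL is a placeholder.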

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-28 12:32 · Safe

Security Scans

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related Recommendations

ai-intelligence

Self-Improving + Proactive Agent

ivangdavila
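Self-reflection + self-criticism + self-learning + self-organizing memory. The agent evaluates its own work, finds mistakes, and continuously improves.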
★ 1,349 📥 317,677
data-analysis

A-Share Quant AkShare

mbpz
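A quantitative data analysis tool for China A-shares, built on the AkShare library to fetch A-share market quotes, financial data, sector information, and more. Used to answer questions about A-share stock lookups, market data, financial analysis, and stock selection.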
★ 162 📥 59,660
productivity

Word / DOCX

ivangdavila
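Create, inspect, and edit Microsoft Word documents and DOCX files, with support for styles, numbering, tracked changes, tables, section breaks, and compatibility checks.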
★ 437 📥 147,150