ClickHouse OLAP 加速器

> 亿级数据秒级响应——把"慢"字从分析词汇表中删掉。

一、ClickHouse 全景图

                        ┌─────────────────────────────┐
                        │     ClickHouse 分析引擎       │
                        ├─────────────┬───────────────┤
                        │ 列式存储     │  向量化执行     │
                        │ 数据压缩     │  SIMD 加速     │
                        │ MergeTree   │  实时摄入       │
                        └─────────────┴───────────────┘
                                      │
         ┌──────────┬──────────┬──────┴──────┬──────────┬──────────┐
         ▼          ▼          ▼             ▼          ▼          ▼
    MergeTree   Replacing   Summing     Aggregating  Collapsing  Distributed
    基础引擎     MergeTree   MergeTree   MergeTree   MergeTree   分布式表
    万能引擎     去重引擎     预聚合引擎   增量聚合      折叠引擎    集群查询

二、快速开始

# 一键部署 ClickHouse 集群
docker-compose up -d

# 初始化测试数据（5000万条）
python scripts/init_data.py

# 基准查询测试
python scripts/benchmark_queries.py

# 物化视图创建
python scripts/create_materialized_views.py

# 表优化建议
python scripts/table_optimizer.py

三、核心概念速查

3.1 列式存储 vs 行式存储

行式存储（MySQL）：
┌──────┬────────┬──────┬──────────┐
│ id   │ name   │ age  │ email    │  ← 一行连续存储
├──────┼────────┼──────┼──────────┤
│ 1    │ Alice  │ 25   │ a@x.com  │
│ 2    │ Bob    │ 30   │ b@x.com  │
│ 3    │ Carol  │ 28   │ c@x.com  │
└──────┴────────┴──────┴──────────┘
扫描 SELECT age FROM users → 需要跳过 id/name/email 字段

列式存储（ClickHouse）：
id 列:    [1, 2, 3, ...]         ← 独立压缩
name 列:  [Alice, Bob, Carol, ...]
age 列:   [25, 30, 28, ...]       ← SELECT age 只读这一列
email 列: [a@x.com, b@x.com, ...]
扫描 SELECT age FROM users → 只读 age 列，I/O 减少 3/4

3.2 MergeTree 家族速查

引擎	用途	特点
:---	:---	:---
MergeTree	通用基础引擎	按主键排序 + 分区 + TTL。万物之始
ReplacingMergeTree	去重	相同排序键只保留最新版本。UPSERT 场景
SummingMergeTree	预聚合	相同主键自动求和数值列。报表场景首选
AggregatingMergeTree	增量聚合	配合物化视图实现实时预聚合
CollapsingMergeTree	折叠更新	SIGN 列标识增删。状态变更场景
VersionedCollapsingMergeTree	版本折叠	带版本的折叠更新
Distributed	分布式代理	不存数据，路由查询到分片

四、表设计实战

4.1 订单分析表设计

-- =====================================================
-- 订单明细表（MergeTree 基础引擎）
-- 数据量预估：日均 500 万条，保留 90 天 = 4.5 亿条
-- =====================================================

CREATE TABLE orders (
    order_id        UInt64,
    user_id         UInt64,
    product_id      UInt32,
    category_id     UInt16,
    amount          Decimal64(2),
    quantity         UInt16,
    status          LowCardinality(String),   -- 低基数枚举：created/paid/shipped/delivered
    channel         LowCardinality(String),   -- app/web/mini_program
    city            LowCardinality(String),
    create_time     DateTime,
    pay_time        DateTime DEFAULT now()
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(create_time)            -- 按月分区，90天=3个分区
ORDER BY (category_id, user_id, create_time)  -- 排序键：品类→用户→时间
PRIMARY KEY (category_id, user_id, create_time)
TTL create_time + INTERVAL 90 DAY             -- 90天自动删除
SETTINGS
    index_granularity = 8192,                 -- 索引粒度：8192行一个稀疏索引标记
    compression_codec = 'ZSTD(3)',            -- ZSTD压缩，level 3（速度优先）
    min_rows_for_wide_part = 0,
    min_bytes_for_wide_part = 0;

-- 排序键设计原则：
-- 1. 最频繁过滤的列放前面（category_id）
-- 2. 区分度高的列放前面
-- 3. 范围查询列放最后（create_time）

4.2 零售销售流水表设计

-- =====================================================
-- 零售销售流水表（ReplacingMergeTree + SummingMergeTree 组合）
-- 场景：实时大屏 → 每分钟交易额、Top商品、Top门店
-- =====================================================

-- 明细层级：销售流水（ReplacingMergeTree 支持 UPSERT）
CREATE TABLE sales_transactions (
    transaction_id  String,
    store_id        UInt16,
    product_id      UInt32,
    category        LowCardinality(String),
    quantity         UInt16,
    unit_price      Decimal64(2),
    total_amount    Decimal64(2),
    transaction_time DateTime,
    version         UInt32   -- 用于去重，取最大版本
)
ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMMDD(transaction_time)
ORDER BY (store_id, product_id, transaction_time, transaction_id)
TTL transaction_time + INTERVAL 30 DAY;

-- 聚合层级：分钟级聚合（SummingMergeTree 自动求和）
CREATE TABLE sales_per_minute (
    minute          DateTime,
    store_id        UInt16,
    category        LowCardinality(String),
    total_quantity  UInt32,
    total_amount    Decimal64(2),
    order_count     UInt32
)
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(minute)
ORDER BY (minute, store_id, category);

-- 物化视图：明细 → 分钟聚合
CREATE MATERIALIZED VIEW sales_per_minute_mv
TO sales_per_minute
AS SELECT
    toStartOfMinute(transaction_time) AS minute,
    store_id,
    category,
    sum(quantity)                     AS total_quantity,
    sum(total_amount)                 AS total_amount,
    count()                           AS order_count
FROM sales_transactions
GROUP BY minute, store_id, category;

五、物化视图实战

5.1 物化视图类型

┌────────────────────────────────────────────────────┐
│              物化视图三种模式                        │
├────────────┬───────────────┬───────────────────────┤
│ 模式       │  语法         │  适用场景              │
├────────────┼───────────────┼───────────────────────┤
│ 即时聚合   │ TO target     │ 明细表写入时自动聚合   │
│            │ 自动触发      │ 实时大屏/实时报表      │
├────────────┼───────────────┼───────────────────────┤
│ 定时刷新   │ REFRESH       │ 周期性汇总，容忍延迟   │
│            │ EVERY 1 HOUR  │ T+1报表/DW层          │
├────────────┼───────────────┼───────────────────────┤
│ 手动刷新   │ REFRESH       │ 按需触发               │
│            │ 无定时        │ 临时分析/补数据        │
└────────────┴───────────────┴───────────────────────┘

5.2 多层聚合架构

-- =====================================================
-- 三层聚合架构：明细 → 分钟 → 小时 → 天
-- 每层数据量递减 60x，查询逐层加速
-- =====================================================

-- 明细表（日增 1 亿条）
CREATE TABLE events_raw ( ... )
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (event_type, user_id, event_time);

-- L1: 分钟聚合表（日增 170 万条，压缩 60x）
CREATE TABLE events_1min (
    minute     DateTime,
    event_type LowCardinality(String),
    metric1    AggregateFunction(sum, UInt64),
    metric2    AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(minute)
ORDER BY (event_type, minute);

CREATE MATERIALIZED VIEW events_1min_mv TO events_1min AS
SELECT
    toStartOfMinute(event_time) AS minute,
    event_type,
    sumState(cnt)               AS metric1,
    uniqState(user_id)          AS metric2
FROM events_raw
GROUP BY minute, event_type;

-- L2: 小时聚合表（日增 2.8 万条）
CREATE TABLE events_1hour (
    hour       DateTime,
    event_type LowCardinality(String),
    metric1    AggregateFunction(sum, UInt64),
    metric2    AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (event_type, hour);

CREATE MATERIALIZED VIEW events_1hour_mv TO events_1hour AS
SELECT
    toStartOfHour(minute)    AS hour,
    event_type,
    sumMergeState(metric1)   AS metric1,
    uniqMergeState(metric2)  AS metric2
FROM events_1min
GROUP BY hour, event_type;

-- L3: 天聚合表（日增 470 条）
CREATE TABLE events_1day (
    day        Date,
    event_type LowCardinality(String),
    metric1    AggregateFunction(sum, UInt64),
    metric2    AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (event_type, day);

CREATE MATERIALIZED VIEW events_1day_mv TO events_1day AS
SELECT
    toDate(hour)             AS day,
    event_type,
    sumMergeState(metric1)   AS metric1,
    uniqMergeState(metric2)  AS metric2
FROM events_1hour
GROUP BY day, event_type;

-- 查询时自动选择最佳层级：
-- 查最近1小时 → events_1min
-- 查最近7天   → events_1hour
-- 查最近90天  → events_1day
SELECT
    event_type,
    sumMerge(metric1)  AS total,
    uniqMerge(metric2) AS unique_users
FROM events_1day
WHERE day BETWEEN '2026-03-01' AND '2026-05-31'
GROUP BY event_type;

六、分布式集群

6.1 集群架构

                 ┌─────────────────┐
                 │  Distributed     │
                 │  Table (代理层)  │
                 └────────┬────────┘
                          │
          ┌───────────────┼───────────────┐
          ▼               ▼               ▼
    ┌──────────┐    ┌──────────┐    ┌──────────┐
    │ Shard 1  │    │ Shard 2  │    │ Shard 3  │
    │          │    │          │    │          │
    │ Replica1 │    │ Replica1 │    │ Replica1 │
    │ Replica2 │    │ Replica2 │    │ Replica2 │
    └──────────┘    └──────────┘    └──────────┘

    数据分布：rand() % shard_count 或 cityHash64(key) % shard_count
    副本同步：ZooKeeper 协调，异步复制

6.2 集群配置

<!-- config.xml 集群定义 -->
<remote_servers>
    <olap_cluster>
        <!-- 3 分片 × 2 副本 -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>ch-shard1-r1</host><port>9000</port></replica>
            <replica><host>ch-shard1-r2</host><port>9000</port></replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>ch-shard2-r1</host><port>9000</port></replica>
            <replica><host>ch-shard2-r2</host><port>9000</port></replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica><host>ch-shard3-r1</host><port>9000</port></replica>
            <replica><host>ch-shard3-r2</host><port>9000</port></replica>
        </shard>
    </olap_cluster>
</remote_servers>

-- 本地表（每个节点创建）
CREATE TABLE orders_local ON CLUSTER olap_cluster (
    ...
)
ENGINE = ReplicatedMergeTree(
    '/clickhouse/tables/{shard}/orders',
    '{replica}'
)
PARTITION BY toYYYYMM(create_time)
ORDER BY (category_id, create_time);

-- 分布式表（查询入口）
CREATE TABLE orders_distributed ON CLUSTER olap_cluster
AS orders_local
ENGINE = Distributed(
    'olap_cluster',      -- 集群名
    'default',           -- 数据库名
    'orders_local',      -- 本地表名
    rand()               -- 分片键（rand 均匀分布）
);

七、查询优化实战

7.1 优化 Checklist

查询优化 7 条军规：

1. □ 分区裁剪
     WHERE 必须包含分区键 → PARTITION BY toYYYYMM(create_time)
     错误: WHERE create_time > '2026-05-01'  （扫描所有分区）
     正确: WHERE toYYYYMM(create_time) = '202605' AND create_time > '2026-05-01'

2. □ 主键过滤
     ORDER BY = (category_id, create_time)
     WHERE category_id = 10           ← 命中主键，极快
     WHERE category_id = 10           ← 等值+范围，快
       AND create_time > '2026-05-01'
     WHERE create_time > '2026-05-01' ← 未命中主键前缀，全扫描

3. □ PREWHERE 优化
     PREWHERE 在读取列前过滤，减少 I/O
     适用：过滤率 > 30% 的列
     SELECT * FROM orders PREWHERE status = 'paid'
     ClickHouse 自动选择是否使用 PREWHERE，手动指定强制使用

4. □ 避免 SELECT *
     只选需要的列。列式存储下每减少一列 = 减少一次 I/O
     SELECT category_id, sum(amount)  ← 只读2列，比*快10x+

5. □ 用物化视图/SummingMergeTree替代实时聚合
     明细表直接 count() → 慢（全扫描）
     SummingMergeTree → 快（预聚合）

6. □ JOIN 优化
     - 小表放右边（Build Side）
     - 用字典替代小表 JOIN
     - 大表 JOIN 考虑 Global JOIN（广播右表到所有分片）

7. □ 采样查询
     SELECT ... FROM orders SAMPLE 0.1   ← 10% 采样
     SAMPLE 必须配合 SAMPLE BY 主键使用

7.2 字典表

-- 字典表：替代 JOIN 小表的海量方案
-- 场景：订单表 JOIN 商品表（100万商品）

CREATE DICTIONARY product_dict (
    product_id   UInt32,
    product_name String,
    category     String,
    unit_price   Decimal64(2)
)
PRIMARY KEY product_id
SOURCE(CLICKHOUSE(
    HOST 'localhost'
    PORT 9000
    USER 'default'
    TABLE 'products'
    DB   'default'
))
LAYOUT(HASHED())       -- 内存哈希，查询 O(1)
LIFETIME(MIN 300 MAX 600)  -- 5-10分钟刷新
;

-- 使用 dictGet 替代 JOIN
SELECT
    order_id,
    product_id,
    dictGet('product_dict', 'product_name', product_id) AS product_name,
    dictGet('product_dict', 'category', product_id)     AS category,
    amount
FROM orders
WHERE create_time > now() - INTERVAL 1 HOUR;
-- dictGet 是内存查询，比 JOIN 快 10-100x

7.3 编码压缩对比

-- 压缩方案对比

-- ZSTD: 通用，平衡
ALTER TABLE orders MODIFY COLUMN amount Decimal64(2) CODEC(ZSTD(3));

-- Delta: 时间序列/递增ID
ALTER TABLE orders MODIFY COLUMN order_id UInt64 CODEC(Delta, ZSTD);

-- DoubleDelta: 时间戳
ALTER TABLE orders MODIFY COLUMN create_time DateTime CODEC(DoubleDelta, ZSTD);

-- T64: 随机整数
ALTER TABLE orders MODIFY COLUMN user_id UInt64 CODEC(T64, ZSTD);

-- Gorilla: 浮点数监控指标
ALTER TABLE metrics MODIFY COLUMN cpu_usage Float64 CODEC(Gorilla, ZSTD);

-- LowCardinality: 低基数字符串（status/category/channel）
ALTER TABLE orders MODIFY COLUMN status LowCardinality(String);
-- 字典编码，status列 100亿条 'paid' 只存一次

八、性能基准

场景	数据量	查询	耗时
:---	:---	:---	:---
单表聚合	1亿行	`SELECT category, count(), sum(amount) GROUP BY category`	0.05s
多维分析	5亿行	3维度 GROUP BY + WHERE	0.3s
TopN	10亿行	`ORDER BY xxx LIMIT 100`	0.8s
时间序列	50亿行	按天聚合 + 多个指标	1.2s
JOIN字典	1亿行	关联100万商品字典	0.04s
分布式查询	30亿行	3分片 GROUP BY	1.5s

> 硬件: 单节点 32C64G, NVMe SSD, 16核分配给 ClickHouse

九、生产部署

9.1 硬件配置建议

计算节点:
  CPU:  16-32 核（向量化/SIMD 密集）
  内存: 32-128 GB（ORDER BY 内存/字典缓存/缓存）
  磁盘: NVMe SSD, 2-8TB（IOPS > 集群吞吐的关键）
  网络: 10GbE

存储节点（存算分离）:
  · ClickHouse 原生不支持存算分离
  · S3 外部存储（Keeper 3.13+）：冷热分层
  · 方案：本地 SSD → 热数据 · S3 → 冷数据

9.2 核心配置

<!-- config.xml 核心参数 -->

<!-- 内存 -->
<max_server_memory_usage>42949672960</max_server_memory_usage>  <!-- 40GB -->
<max_memory_usage>34359738368</max_memory_usage>                <!-- 32GB per query -->

<!-- 并发 -->
<max_concurrent_queries>100</max_concurrent_queries>
<max_threads>16</max_threads>                    <!-- 等于 CPU 核心数 -->

<!-- 后台合并 -->
<background_pool_size>16</background_pool_size>
<number_of_free_entries_in_pool_to_execute_mutation>20</number_of_free_entries_in_pool_to_execute_mutation>

<!-- 分布式 -->
<distributed_ddl_entry_format_version>5</distributed_ddl_entry_format_version>
<prefer_localhost_replica>1</prefer_localhost_replica>  <!-- 优先本地副本 -->

<!-- 压缩 -->
<compression>
    <case>
        <method>zstd</method>
    </case>
</compression>

十、ClickHouse vs 其他 OLAP

维度	ClickHouse	StarRocks	Doris	DuckDB
:---	:---	:---	:---	:---
单表查询	★★★★★	★★★★	★★★★	★★★★
多表 JOIN	★★★	★★★★★	★★★★★	★★★
写入吞吐	★★★★★	★★★	★★★	★★
实时性	★★★★★	★★★★★	★★★★★	★★
运维复杂度	★★★	★★★★	★★★★	★★★★★
SQL 兼容	★★★	★★★★★	★★★★★	★★★★★
物化视图	★★★★★	★★★★	★★★★	N/A
适用场景	海量明细查询+聚合	多表 JOIN+联邦	多表 JOIN+湖仓	嵌入式/单机分析

选型建议

ClickHouse 最适合：
  ✅ 单表/星型模型（2-3表 JOIN）
  ✅ 海量明细扫描 + 聚合（> 1 亿行）
  ✅ 实时摄入 + 即时查询（< 3s 延迟）
  ✅ 物化视图预聚合
  ✅ 日志/事件/行为/指标分析

Doris/StarRocks 更适合：
  ⭐ 多表 JOIN / 雪花模型
  ⭐ 数据更新/删除频繁
  ⭐ 高并发点查（> 1000 QPS）
  ⭐ 联邦查询生态

版本历史

共 1 个版本

v1.0.0 1. 核心技能文档（19.9KB）： - ClickHouse 全景图与 MergeTree 引擎家族速查 - 表设计实战（订单分析表 MergeTree + 零售流水表 ReplacingMergeTree/SummingMergeTree 组合） - 物化视图三层聚合金字塔（明细 → 分钟 → 小时 → 天，每层压缩 60x） - 分布式集群架构（3 分片拓扑 + Distributed 代理表 + ON CLUSTER DDL） - 查询优化 7 条军规（分区裁剪、主键过滤、PREWHERE、字典替代 JOIN、编码压缩） - 性能基准（1 亿行聚合 0.05s / 50 亿行时间序列 1.2s） - vs StarRocks/Doris/DuckDB 选型对比 2. 实验环境脚本（6 个）： - docker-compose.yml（3 节点 ClickHouse 集群 + ZooKeeper + Python） - init_data.py（5000 万条订单数据生成 + 物化视图自动创建） - benchmark_queries.py（10 项查询基准测试：聚合/多维/TopN/采样/分区裁剪） - table_optimizer.py（自动分析表结构并给出优化建议） - setup_clickhouse.sh（一键部署） - clickhouse-config/（3 节点宏配置） 3. 深度参考文档（3 个）： - clickhouse-deep-dive.md（向量化执行/SIMD/稀疏索引/TTL/查询执行计划/数据类型最佳实践） - materialized-view-guide.md（物化视图三种模式实战 + 多级金字塔 + 聚合函数嵌套 + 常见问题） - clickhouse-ops.md（日常巡检 SQL + 4 种故障排查 + 系统表速查 + 备份恢复 + 升级 Checklist）当前

2026-06-01 21:18 安全

clickhouse-olap-accelerator

概述

ClickHouse OLAP 加速器

一、ClickHouse 全景图

二、快速开始

三、核心概念速查

3.1 列式存储 vs 行式存储

3.2 MergeTree 家族速查

四、表设计实战

4.1 订单分析表设计

4.2 零售销售流水表设计

五、物化视图实战

5.1 物化视图类型

5.2 多层聚合架构

六、分布式集群

6.1 集群架构

6.2 集群配置

七、查询优化实战

7.1 优化 Checklist

7.2 字典表

7.3 编码压缩对比

八、性能基准

九、生产部署

9.1 硬件配置建议

9.2 核心配置

十、ClickHouse vs 其他 OLAP

选型建议

版本历史

安全检测

腾讯云安全 (Keen)

腾讯云安全 (Sanbu)

🔗 相关推荐

向量数据库大师

Flink 实时数仓

RAG 知识库搭建