强曰为道
与天地相似,故不违。知周乎万物,而道济天下,故不过。旁行而不流,乐天知命,故不忧.
文档目录

ClickHouse 教程 / 数据类型详解

数据类型详解

ClickHouse 提供了丰富的数据类型体系,合理选择数据类型对查询性能和存储空间至关重要。


1. 整数类型

ClickHouse 支持有符号和无符号整数,范围比传统数据库更灵活。

类型字节数范围(有符号)范围(无符号)
Int8 / UInt81-128 ~ 1270 ~ 255
Int16 / UInt162-32768 ~ 327670 ~ 65535
Int32 / UInt324-2^31 ~ 2^31-10 ~ 2^32-1
Int64 / UInt648-2^63 ~ 2^63-10 ~ 2^64-1
Int128 / UInt12816±2^1270 ~ 2^128-1
Int256 / UInt25632±2^2550 ~ 2^256-1
-- 创建使用不同整数类型的表
CREATE TABLE integer_demo
(
    id         UInt64,
    age        UInt8,       -- 年龄 0~255 足够
    score      Int16,       -- 分数可为负
    amount     UInt32,      -- 金额(分为单位)
    bigint_id UInt256       -- 区块链地址等超大整数
)
ENGINE = MergeTree()
ORDER BY id;

INSERT INTO integer_demo VALUES (1, 25, 95, 1999, 115792089237316195423570985008687907853269984665640564039457584007913129639935);

SELECT toTypeName(age), toTypeName(bigint_id);

💡 提示:使用最小够用的整数类型,能显著减少存储空间和提升查询速度。例如 UInt8 只占 1 字节,比 UInt64 节省 87.5%。


2. 浮点类型

类型字节数精度说明
Float324~7 位有效数字单精度
Float648~16 位有效数字双精度
CREATE TABLE float_demo
(
    id UInt64,
    latitude  Float64,     -- 纬度需要高精度
    longitude Float64,     -- 经度
    score     Float32      -- 评分用单精度即可
)
ENGINE = MergeTree()
ORDER BY id;

-- 浮点数精度问题演示
SELECT 0.1 + 0.2 AS result;
-- 输出: 0.30000000000000004

SELECT toDecimal64(0.1, 10) + toDecimal64(0.2, 10) AS result;
-- 输出: 0.3000000000 (精确)

⚠️ 注意:浮点数存在精度误差,涉及金额计算必须使用 Decimal 类型!


3. Decimal 定点数

类型精度(总位数)整数位数小数位数
Decimal32(S)1~99-SS
Decimal64(S)1~1818-SS
Decimal128(S)1~3838-SS
Decimal256(S)1~7676-SS
CREATE TABLE decimal_demo
(
    order_id   UInt64,
    -- 金额:最多 10 位整数 + 2 位小数
    amount     Decimal128(2),
    -- 汇率:2 位整数 + 6 位小数
    rate       Decimal64(6),
    -- 精确百分比:3 位小数
    discount   Decimal32(3)
)
ENGINE = MergeTree()
ORDER BY order_id;

INSERT INTO decimal_demo VALUES
    (1, 1234567890.99, 7.123456, 0.888),
    (2, 0.01, 0.000001, 0.001);

SELECT
    amount,
    rate,
    amount * rate AS calculated
FROM decimal_demo;

-- Decimal 运算精度
SELECT toDecimal64(100, 2) / toDecimal64(3, 2) AS result;
-- 输出: 33.33

💡 提示:金额字段推荐使用 Decimal128(2)(以"元"存储)或 UInt64(以"分"存储)。两者各有利弊:Decimal 直观但运算较慢,UInt64 运算快但需要应用层转换。


4. 字符串类型

类型说明适用场景
String变长字符串,无长度限制通用文本
FixedString(N)定长 N 字节固定长度编码(如 MD5、UUID 手写)
LowCardinality(String)低基数优化字符串枚举值少的字段(如国家、状态)
CREATE TABLE string_demo
(
    id       UInt64,
    name     String,                     -- 普通字符串
    md5      FixedString(32),            -- MD5 哈希(固定 32 字符)
    country  LowCardinality(String),     -- 国家(低基数优化)
    status   LowCardinality(String)      -- 状态:'active','inactive','deleted'
)
ENGINE = MergeTree()
ORDER BY id;

INSERT INTO string_demo VALUES
    (1, 'Alice', '5d41402abc4b2a76b9719d911017c592', 'China', 'active'),
    (2, 'Bob',   '7d793037a0760186574b0282f2f435e7', 'USA',   'inactive'),
    (3, 'Carol', 'b6d767d2f8ed5d21a44b0e5886680cb9', 'China', 'active');

-- LowCardinality 会自动维护字典,查询时可直接比较字典索引
SELECT country, count() AS cnt
FROM string_demo
GROUP BY country;

⚠️ 注意LowCardinality(String) 适合唯一值在 10000 以下的列。如果唯一值过多(如 user_id),使用 LowCardinality 反而会降低性能。

4.1 String 常用函数

SELECT
    lower('Hello World')         AS lower_case,
    upper('Hello World')         AS upper_case,
    substring('Hello', 1, 3)     AS substr,
    length('Hello')              AS len,
    replace('Hello', 'l', 'L')  AS replaced,
    concat('Hello', ' ', 'World') AS concatenated,
    splitByChar(',', 'a,b,c')   AS arr,          -- 返回数组
    match('Hello123', '\\d+')    AS has_digits    -- 正则匹配
;

5. 日期与时间类型

类型字节数范围精度
Date21970-01-01 ~ 2149-06-06
Date3241900-01-01 ~ 2299-12-31
DateTime41970-01-01 ~ 2106-02-07
DateTime64(N)8同 DateTime10^(-N) 秒
CREATE TABLE datetime_demo
(
    event_id    UInt64,
    event_date  Date,                           -- 精确到天
    created_at  DateTime,                       -- 精确到秒
    timestamp   DateTime64(3),                  -- 精确到毫秒
    event_time  DateTime64(6, 'Asia/Shanghai')  -- 带时区,精确到微秒
)
ENGINE = MergeTree()
ORDER BY event_date;

INSERT INTO datetime_demo VALUES
    (1, '2026-05-11', '2026-05-11 10:30:00', '2026-05-11 10:30:00.123', '2026-05-11 10:30:00.123456');

-- 日期时间函数
SELECT
    toDayOfWeek(now())              AS day_of_week,    -- 星期几
    toStartOfMonth(now())           AS month_start,    -- 本月第一天
    toStartOfQuarter(now())         AS quarter_start,  -- 本季度第一天
    toStartOfFifteenMinutes(now())  AS fifteen_min,    -- 15 分钟对齐
    formatDateTime(now(), '%Y年%m月%d日') AS formatted;  -- 格式化

-- 时间间隔
SELECT
    now() AS current,
    now() - INTERVAL 7 DAY  AS seven_days_ago,
    now() + INTERVAL 1 YEAR AS one_year_later;

⚠️ 注意Date 类型范围仅从 1970 年开始。如需存储更早的历史日期,请使用 Date32


6. 枚举类型

类型范围字节数
Enum8-128 ~ 1271
Enum16-32768 ~ 327672
CREATE TABLE enum_demo
(
    order_id  UInt64,
    status    Enum8('pending' = 1, 'processing' = 2, 'shipped' = 3, 'delivered' = 4),
    priority  Enum8('low' = 1, 'medium' = 2, 'high' = 3, 'urgent' = 4)
)
ENGINE = MergeTree()
ORDER BY order_id;

INSERT INTO enum_demo VALUES
    (1, 'pending', 'high'),
    (2, 'shipped', 'low'),
    (3, 'delivered', 'medium');

-- 枚举按内部数值排序
SELECT status, count() AS cnt
FROM enum_demo
GROUP BY status
ORDER BY status;

-- 错误:插入不存在的枚举值会报错
-- INSERT INTO enum_demo VALUES (4, 'cancelled', 'low');
-- Error: Unknown element 'cancelled' for type Enum8(...)

💡 提示:枚举类型在存储上仅占 1~2 字节,比 String 更省空间,且能在插入时校验值的合法性。但枚举值不能动态添加,需提前规划。


7. 复合类型

7.1 数组 Array(T)

CREATE TABLE array_demo
(
    user_id  UInt64,
    tags     Array(String),
    scores   Array(UInt32),
    events   Array(DateTime64(3))
)
ENGINE = MergeTree()
ORDER BY user_id;

INSERT INTO array_demo VALUES
    (1, ['tech', 'ai', 'python'], [95, 87, 92], ['2026-05-01 10:00:00.000', '2026-05-02 14:30:00.000']),
    (2, ['music', 'travel'],       [78, 85],     ['2026-05-01 11:00:00.000']);

-- 数组展开
SELECT user_id, tag
FROM array_demo
ARRAY JOIN tags AS tag;

-- 数组操作
SELECT
    arrayFilter(x -> x > 80, scores)  AS high_scores,
    arraySort(scores)                 AS sorted,
    arraySum(scores)                  AS total,
    arrayUniq(tags)                   AS unique_tags,
    has(tags, 'ai')                   AS has_ai,
    length(tags)                      AS tag_count
FROM array_demo;

7.2 元组 Tuple

-- 元组常用于函数返回多个值
SELECT
    (1, 'Alice', 28) AS person,
    tuple.1          AS id,        -- 访问第一个元素
    tuple.2          AS name,
    tuple.3          AS age;

-- 典型用途:map 函数返回 Tuple
SELECT map('a', 1, 'b', 2) AS my_map;

7.3 Map(K, V)

CREATE TABLE map_demo
(
    user_id UInt64,
    props   Map(String, String)
)
ENGINE = MergeTree()
ORDER BY user_id;

INSERT INTO map_demo VALUES
    (1, {'city': 'Beijing', 'level': 'vip', 'score': '95'}),
    (2, {'city': 'Shanghai', 'level': 'normal'});

-- 访问 Map
SELECT
    user_id,
    props['city']  AS city,
    props['level'] AS level,
    props['score'] AS score  -- 不存在的 key 返回空字符串
FROM map_demo;

7.4 嵌套 Nested

CREATE TABLE nested_demo
(
    order_id   UInt64,
    items Nested(
        product_id UInt64,
        product_name String,
        quantity UInt16,
        price Decimal128(2)
    )
)
ENGINE = MergeTree()
ORDER BY order_id;

INSERT INTO nested_demo VALUES
    (1, [101, 102], ['Laptop', 'Mouse'], [1, 2], [5999.00, 99.00]),
    (2, [103],      ['Keyboard'],         [3],     [299.00]);

-- 查询嵌套字段
SELECT
    order_id,
    items.product_name AS names,
    items.quantity     AS quantities,
    items.price        AS prices
FROM nested_demo;

-- 展开嵌套
SELECT order_id, product_name, quantity, price
FROM nested_demo
ARRAY JOIN items;

8. Nullable 类型

CREATE TABLE nullable_demo
(
    id         UInt64,
    name       String,
    age        Nullable(UInt8),
    email      Nullable(String),
    score      Nullable(Float64)
)
ENGINE = MergeTree()
ORDER BY id;

INSERT INTO nullable_demo VALUES
    (1, 'Alice', 28, '[email protected]', 95.5),
    (2, 'Bob',   NULL, NULL, NULL),
    (3, 'Carol', 22, '[email protected]', NULL);

-- NULL 值处理
SELECT
    name,
    age,
    if(age IS NULL, 'Unknown', toString(age)) AS age_display,
    coalesce(email, 'N/A') AS email_display,
    assumeNotNull(score) AS score_not_null   -- 如果 score 为 NULL 则会报错
FROM nullable_demo;

⚠️ 注意Nullable 类型会额外占用 1 字节的标记位,且使索引和压缩失效。在列式数据库中,建议使用特殊值(如 0'')代替 NULL


9. LowCardinality

LowCardinality 是 ClickHouse 独有的优化类型,对低唯一值列进行字典编码:

-- 效果对比
CREATE TABLE lc_demo
(
    event_date Date,
    event_type LowCardinality(String),  -- 唯一值少,性能好
    user_id    UInt64,                   -- 唯一值多,不适合 LowCardinality
    country    LowCardinality(String),
    device     LowCardinality(String)
)
ENGINE = MergeTree()
ORDER BY (event_date, event_type);

-- 查看字典大小
SELECT
    column,
    formatReadableSize(data_compressed_bytes) AS compressed_size,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed_size
FROM system.columns
WHERE table = 'lc_demo' AND database = currentDatabase();

💡 提示LowCardinality 字段在 GROUP BYWHERE 查询中性能极佳,因为直接比较字典索引而非字符串值。建议用于唯一值 < 10000 的列。


10. 特殊类型

10.1 UUID

CREATE TABLE uuid_demo
(
    id     UUID DEFAULT generateUUIDv4(),
    name   String
)
ENGINE = MergeTree()
ORDER BY id;

INSERT INTO uuid_demo (name) VALUES ('Alice'), ('Bob');

SELECT * FROM uuid_demo;

10.2 IPv4 / IPv6

CREATE TABLE ip_demo
(
    ts       DateTime,
    ip       IPv4,
    ip6      IPv6
)
ENGINE = MergeTree()
ORDER BY ts;

INSERT INTO ip_demo VALUES
    (now(), '192.168.1.1', '::1'),
    (now(), '10.0.0.1', '2001:db8::1');

-- IPv4 子网匹配
SELECT * FROM ip_demo WHERE ip = toIPv4('192.168.1.1');
SELECT * FROM ip_demo WHERE IPv4CIDRToRange(ip, 16) = (toIPv4('192.168.0.0'), toIPv4('192.168.255.255'));

10.3 JSON / Object

-- ClickHouse 23.x+ 支持原生 JSON 类型(实验特性)
CREATE TABLE json_demo
(
    id   UInt64,
    data JSON
)
ENGINE = MergeTree()
ORDER BY id;

INSERT INTO json_demo FORMAT JSONEachRow
{"id": 1, "data": {"name": "Alice", "age": 28, "tags": ["admin", "user"]}}
{"id": 2, "data": {"name": "Bob", "score": 95.5}};

-- 访问 JSON 字段
SELECT
    id,
    data.name AS name,
    data.age  AS age,
    data.tags AS tags
FROM json_demo;

11. 类型选择最佳实践

业务场景推荐类型说明
主键 IDUInt64自增或雪花 ID
用户年龄UInt8范围 0~255 足够
金额Decimal128(2) 或 UInt64Decimal 直观,UInt64 运算快
状态字段LowCardinality(String) 或 Enum8低基数,存储小
日志文本String无需长度限制
哈希值FixedString(32/64)MD5/SHA 长度固定
时间戳DateTime64(3)毫秒精度足够大多数场景
标签列表Array(String)多值属性
动态属性Map(String, String)灵活的 KV 结构
地理坐标Float64精度到小数点后 8 位
-- 完整的类型选择示例
CREATE TABLE orders
(
    order_id      UInt64,                             -- 订单 ID
    user_id       UInt64,                             -- 用户 ID
    order_date    Date,                               -- 订单日期
    created_at    DateTime64(3, 'Asia/Shanghai'),      -- 创建时间(毫秒)
    status        LowCardinality(String),              -- 状态
    amount        Decimal128(2),                       -- 订单金额
    quantity      UInt32,                              -- 数量
    product_ids   Array(UInt64),                       -- 商品 ID 列表
    extra_info    Map(String, String),                 -- 扩展信息
    ip            IPv4,                                -- IP 地址
    trace_id      UUID                                 -- 链路追踪 ID
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(order_date)
ORDER BY (order_date, user_id, order_id);

扩展阅读

  1. ClickHouse 数据类型文档
  2. LowCardinality 详解
  3. JSON 类型文档
  4. Decimal 类型文档
  5. Nested 类型文档