VictoriaMetrics: The Complete Guide / 06 - Prometheus Compatibility and Data Migration

06 · Prometheus Compatibility and Data Migration

Chapter goals

Configure remote_write / remote_read between Prometheus and VictoriaMetrics
Migrate Prometheus data to VM without disruption
Use the vmctl data migration tool
Understand the data model mapping and compatibility details

6.1 Compatibility overview

VictoriaMetrics is compatible with most of the Prometheus ecosystem:

┌───────────────────────────────────────────────────────────┐
│            Prometheus ecosystem compatibility             │
├────────────────────┬──────────────────────────────────────┤
│ Write protocol     │ ✅ Prometheus remote_write           │
│ Query language     │ ✅ PromQL (MetricsQL superset)       │
│ Scraping           │ ✅ Prometheus scrape configs         │
│ Alerting           │ ✅ vmalert → Alertmanager            │
│ Naming             │ ✅ metric_name{labels}               │
│ Data formats       │ ✅ Prometheus text / OpenMetrics     │
│ Service discovery  │ ✅ most Prometheus SD types          │
│ Recording rules    │ ✅ via vmalert (same rule format)    │
│ Alerting rules     │ ✅ via vmalert (same rule format)    │
│ Metadata API       │ ✅ /api/v1/metadata                  │
│ Targets API        │ ✅ /api/v1/targets                   │
└────────────────────┴──────────────────────────────────────┘

6.2 Configuring remote_write

6.2.1 Basic configuration

Send Prometheus data to VictoriaMetrics:

# prometheus.yml
global:
  scrape_interval: 15s

remote_write:
  - url: "http://victoria-metrics:8428/api/v1/write"
    # Optional: queue tuning
    queue_config:
      max_samples_per_send: 10000
      batch_send_deadline: 5s
      max_shards: 30
      capacity: 20000

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

6.2.2 Authentication

remote_write:
  - url: "http://victoria-metrics:8428/api/v1/write"
    # Basic auth
    basic_auth:
      username: "admin"
      password: "secret"

    # Or bearer-token auth
    # bearer_token: "your-token"

    # Or TLS client certificates
    # tls_config:
    #   cert_file: /path/to/cert.pem
    #   key_file: /path/to/key.pem
    #   ca_file: /path/to/ca.pem
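Within one remote_write entry, basic_auth and a bearer token are mutually exclusive (TLS settings can be combined with either). To see exactly what Prometheus sends for the basic_auth block, the header can be rebuilt by hand. A minimal sketch; the credentials are the placeholders from the config, and the commented curl check assumes VictoriaMetrics was started with -httpAuth.username/-httpAuth.password (or sits behind an auth proxy such as vmauth):

```shell
# Reproduce the Authorization header Prometheus sends for the basic_auth block above.
basic_auth_header() {
  # $1 = username, $2 = password; tr removes the line wrapping some base64 builds add
  printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

basic_auth_header admin secret   # Authorization: Basic YWRtaW46c2VjcmV0

# Against a running instance (assumes VM started with -httpAuth.username/-httpAuth.password):
# curl -s -H "$(basic_auth_header admin secret)" http://victoria-metrics:8428/api/v1/labels
```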

6.2.3 Relabeling before write

Filter or modify series before they are written:

remote_write:
  - url: "http://victoria-metrics:8428/api/v1/write"
    write_relabel_configs:
      # Keep only series from the prod environment
      - source_labels: [env]
        regex: "prod"
        action: keep

      # Drop unwanted metrics (here: Go runtime metrics)
      - source_labels: [__name__]
        regex: "go_.*"
        action: drop

      # Add a static label to every series
      - target_label: "source"
        replacement: "prometheus"
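Rules run in order, and each regex is fully anchored: `go_.*` behaves like `^(go_.*)$`, so it drops `go_goroutines` but keeps a name that merely contains `go_`. The sketch below imitates the drop rule with `grep -E` to make that visible; grep's ERE only approximates Prometheus' RE2 syntax, and `cargo_items_total` is a made-up metric name.

```shell
# Imitate the anchored drop rule above: relabel regexes are wrapped as ^(regex)$.
drop_matching() {
  # $1 = relabel regex; stdin = one metric name per line; prints the names that are kept
  grep -Ev "^($1)\$"
}

# cargo_items_total (hypothetical) contains "go_" but survives the anchored rule
printf '%s\n' go_goroutines go_gc_duration_seconds cargo_items_total node_load1 |
  drop_matching 'go_.*'
```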

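6.2.4 Queue parameters

Parameter	Default (recent Prometheus releases)	Description
`max_samples_per_send`	2000	Maximum samples per request
`batch_send_deadline`	5s	Maximum time a sample waits before a partial batch is sent
`max_shards`	50	Maximum number of parallel senders (shards)
`min_shards`	1	Minimum number of shards
`capacity`	10000	Samples buffered per shard
`max_backoff`	5s	Maximum retry delay
`min_backoff`	30ms	Initial retry delay, doubled on every retry
`retry_on_http_429`	false	Whether to retry on HTTP 429 (experimental)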

# High-throughput example
remote_write:
  - url: "http://victoria-metrics:8428/api/v1/write"
    queue_config:
      max_samples_per_send: 20000
      batch_send_deadline: 2s
      max_shards: 50
      min_shards: 10
      capacity: 50000
      retry_on_http_429: true
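These values can be sanity-checked with simple arithmetic. Assuming each shard has one request in flight at a time, throughput is bounded by max_shards × max_samples_per_send samples per request round trip; the Prometheus remote-write tuning guide also suggests a capacity of roughly 3-10× max_samples_per_send. A back-of-envelope sketch, where the 100 ms request latency is a hypothetical input rather than a measurement:

```shell
# Rough sizing check for remote_write queue_config (inputs are assumptions, not measurements).
# Upper bound, assuming one in-flight request per shard:
#   samples/s <= max_shards * max_samples_per_send * 1000 / request_latency_ms
queue_check() {
  shards=$1 per_send=$2 capacity=$3 latency_ms=$4
  echo "max_samples_per_sec=$(( shards * per_send * 1000 / latency_ms ))"
  ratio=$(( capacity / per_send ))   # integer division
  if [ "$ratio" -lt 3 ]; then
    echo "capacity_ratio=$ratio (below 3x, consider raising capacity)"
  else
    echo "capacity_ratio=$ratio"
  fi
}

# High-throughput example above, with a hypothetical 100 ms per request
queue_check 50 20000 50000 100
```

For the high-throughput example this prints max_samples_per_sec=10000000 and capacity_ratio=2 (integer division; the exact ratio is 2.5), so capacity sits slightly below the suggested range. Real throughput also depends on WAL reading, the network and VM's ingestion speed.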

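6.3 Configuring remote_read

6.3.1 Basic configuration

remote_read lets Prometheus fetch historical data from remote storage at query time. VictoriaMetrics itself does not implement the Prometheus remote_read API (its documentation recommends querying it through the Prometheus-compatible query API instead, e.g. as a Grafana data source), so the URL below only works if it points at a component that actually serves `/api/v1/read`: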

remote_read:
  - url: "http://victoria-metrics:8428/api/v1/read"
    # false: skip remote storage for time ranges local storage already covers
    read_recent: false
    # Query timeout
    remote_timeout: 30s

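6.3.2 remote_read parameters

Parameter	Default	Description
`read_recent`	false	Also query remote storage for time ranges that local storage already covers
`remote_timeout`	1m	Timeout for remote read requests
`required_matchers`	(none)	Equality matchers a selector must contain before remote storage is queried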

remote_read:
  - url: "http://victoria-metrics:8428/api/v1/read"
    read_recent: true  # always query remote storage too (merged with local data)
    remote_timeout: 60s
    required_matchers:
      job: "node"

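6.4 Running Prometheus alongside VM

6.4.1 Architecture

Recommended architecture (incremental migration):
┌──────────┐
│Prometheus│ ──remote_write──▶ VictoriaMetrics (long-term storage)
│          │ ──local queries─▶ local TSDB (recent data)
└──────────┘
    │
    ├── Prometheus remains the scrape entry point
    ├── VM provides long-term storage and serves historical queries
    └── Grafana has two data sources (Prometheus for recent data, VM for history)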

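6.4.2 Prometheus agent mode

Prometheus 2.32 and later support agent mode: it scrapes targets and forwards samples via remote_write, keeping only a local WAL (no queries, rule evaluation or long-term storage). Example configuration: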

# prometheus.yml
global:
  scrape_interval: 15s

# Agent-mode command-line flags (Prometheus 2.x):
# --enable-feature=agent
# --storage.agent.path=/data/agent
# --storage.agent.wal-compression

remote_write:
  - url: "http://victoria-metrics:8428/api/v1/write"
    queue_config:
      max_samples_per_send: 10000
      capacity: 50000

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

Start command:

prometheus \
    --config.file=prometheus.yml \
    --enable-feature=agent \
    --storage.agent.path=/data/agent \
    --storage.agent.wal-compression \
    --web.listen-address=:9090
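The `--enable-feature=agent` flag is the Prometheus 2.x spelling (2.32 onwards); Prometheus 3.0 made agent mode stable and switched to a plain `--agent` flag. Deployment scripts that must cover both can pick the flag from the version string. A minimal sketch, assuming versions look like `2.53.0` or `v3.1.0`:

```shell
# Choose the agent-mode flag for a given Prometheus version.
agent_flag() {
  v=${1#v}              # tolerate a leading "v"
  major=${v%%.*}
  rest=${v#*.}
  minor=${rest%%.*}
  if [ "$major" -ge 3 ]; then
    printf '%s\n' "--agent"
  elif [ "$major" -eq 2 ] && [ "$minor" -ge 32 ]; then
    printf '%s\n' "--enable-feature=agent"
  else
    printf 'agent mode needs Prometheus >= 2.32 (got %s)\n' "$1" >&2
    return 1
  fi
}

agent_flag 2.53.0   # prints --enable-feature=agent
agent_flag v3.1.0   # prints --agent
```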

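6.5 vmctl: the data migration tool

6.5.1 Overview

vmctl is the migration tool that ships with VictoriaMetrics (in the vmutils package).

Supported sources:
├── Prometheus → VM (single-node/cluster)
├── Thanos → VM
├── Cortex → VM
├── InfluxDB → VM
├── OpenTSDB → VM
├── Mimir → VM
└── VM (single-node) → VM (cluster)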

6.5.2 Installing vmctl

# Download from GitHub; vmctl ships in the vmutils archive (alongside vmagent, vmalert, ...)
VM_VERSION="v1.106.0"
curl -LO "https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/${VM_VERSION}/vmutils-linux-amd64-${VM_VERSION}.tar.gz"
tar xzf "vmutils-linux-amd64-${VM_VERSION}.tar.gz"
sudo mv vmctl-prod /usr/local/bin/vmctl
chmod +x /usr/local/bin/vmctl
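The archive name encodes the CPU architecture, so mixed fleets need a per-host URL. A small helper, assuming release archives follow the `vmutils-linux-<arch>-<version>.tar.gz` naming used above; only amd64 and arm64 are mapped here:

```shell
# Map `uname -m` output to the architecture suffix used in vmutils release archives.
vmutils_url() {
  # $1 = release tag (e.g. v1.106.0), $2 = machine type as printed by `uname -m`
  case "$2" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $2" >&2; return 1 ;;
  esac
  printf 'https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/%s/vmutils-linux-%s-%s.tar.gz\n' \
    "$1" "$arch" "$1"
}

vmutils_url v1.106.0 x86_64
# On the target host: curl -LO "$(vmutils_url v1.106.0 "$(uname -m)")"
```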

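6.5.3 Migrating from Prometheus

# Basic migration from a TSDB snapshot (prometheus mode reads snapshot blocks)
vmctl prometheus \
    --prom-snapshot=/path/to/prometheus/snapshot \
    --vm-addr=http://localhost:8428 \
    --vm-concurrency=10

# Online migration from a running Prometheus via its remote read API (remote-read mode)
vmctl remote-read \
    --remote-read-src-addr=http://localhost:9090 \
    --remote-read-filter-time-start="2024-01-01T00:00:00Z" \
    --remote-read-filter-time-end="2024-06-30T23:59:59Z" \
    --remote-read-step-interval=day \
    --remote-read-filter-label=job \
    --remote-read-filter-label-value=node \
    --vm-addr=http://localhost:8428 \
    --vm-concurrency=5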

6.5.4 Migrating from a Prometheus snapshot (recommended)

# Step 1: create a snapshot (Prometheus must run with --web.enable-admin-api)
curl -X POST http://localhost:9090/api/v1/admin/tsdb/snapshot

# Step 2: locate the snapshot directory (<storage.tsdb.path>/snapshots/<name>)
ls /data/prometheus/snapshots/

# Step 3: migrate with vmctl
vmctl prometheus \
    --prom-snapshot=/data/prometheus/snapshots/<snapshot_id> \
    --vm-addr=http://localhost:8428 \
    --vm-concurrency=16 \
    --prom-concurrency=4

# Step 4: verify the result (series cardinality statistics)
curl 'http://localhost:8428/api/v1/status/tsdb'
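Steps 1 to 3 can be scripted. The snapshot API answers with JSON of the form `{"status":"success","data":{"name":"<id>"}}`, and the snapshot is stored under `<storage.tsdb.path>/snapshots/<id>`. The sketch below extracts the name with `sed` (prefer `jq` where it is installed); the data path `/data/prometheus` and the snapshot id in the example are placeholders.

```shell
# Turn the snapshot API response into the directory vmctl's --prom-snapshot expects.
snapshot_dir() {
  # $1 = raw JSON from the snapshot API, $2 = Prometheus --storage.tsdb.path
  name=$(printf '%s' "$1" | sed -n 's/.*"name":"\([^"]*\)".*/\1/p')
  [ -n "$name" ] || { echo "no snapshot name in response" >&2; return 1; }
  printf '%s/snapshots/%s\n' "$2" "$name"
}

# Against a live server:
# resp=$(curl -s -X POST http://localhost:9090/api/v1/admin/tsdb/snapshot)
# vmctl prometheus --prom-snapshot="$(snapshot_dir "$resp" /data/prometheus)" --vm-addr=http://localhost:8428
snapshot_dir '{"status":"success","data":{"name":"20240101T000000Z-0123456789abcdef"}}' /data/prometheus
```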

6.5.5 Migrating from InfluxDB

# Basic migration (reads through the InfluxDB 1.x API)
vmctl influx \
    --influx-addr=http://localhost:8086 \
    --influx-database=mydb \
    --influx-retention-policy=autogen \
    --vm-addr=http://localhost:8428 \
    --vm-concurrency=5

# With filters: only the cpu measurement, from 2024 onwards
vmctl influx \
    --influx-addr=http://localhost:8086 \
    --influx-database=mydb \
    --influx-filter-series="from cpu" \
    --influx-filter-time-start="2024-01-01T00:00:00Z" \
    --vm-addr=http://localhost:8428 \
    --influx-concurrency=3
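vmctl turns each InfluxDB field into its own series: the metric name is built from the measurement and the field, joined with `_` by default, and tags become labels. The helper below only illustrates that naming rule; the separator is assumed to be configurable via `--influx-measurement-field-separator`, so check `vmctl influx --help` for your version.

```shell
# Sketch of the default InfluxDB → VM naming: <measurement><separator><field>.
influx_metric_name() {
  # $1 = measurement, $2 = field, $3 = optional separator (default "_")
  printf '%s%s%s\n' "$1" "${3:-_}" "$2"
}

influx_metric_name cpu usage_idle     # cpu_usage_idle
influx_metric_name cpu usage_idle .   # cpu.usage_idle
```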

6.5.6 Migrating from single-node VM to a cluster

# Single-node VM → cluster (vm-native mode)
vmctl vm-native \
    --vm-native-src-addr=http://vm-single:8428 \
    --vm-native-dst-addr=http://vminsert:8480/insert/0/prometheus \
    --vm-native-filter-time-start="2024-01-01T00:00:00Z" \
    --vm-native-filter-match='{job="node"}' \
    --vm-concurrency=10
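A cluster destination is written through vminsert's per-tenant path `/insert/<accountID>/prometheus`, not the bare vminsert address. A tiny helper for building it, assuming vminsert listens on its default port 8480 and data goes to tenant 0 unless another account ID is given:

```shell
# Build the vminsert write path for vm-native's --vm-native-dst-addr.
vminsert_url() {
  # $1 = vminsert host, $2 = optional accountID (default 0)
  printf 'http://%s:8480/insert/%s/prometheus\n' "$1" "${2:-0}"
}

vminsert_url vminsert      # http://vminsert:8480/insert/0/prometheus
vminsert_url vminsert 42   # http://vminsert:8480/insert/42/prometheus
```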

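6.5.7 Common vmctl flags

Flag	Description
`--vm-addr`	Target VM address
`--vm-concurrency`	Number of concurrent import workers
`--vm-batch-size`	Samples per import batch
`--vm-account-id`	Tenant (account) ID when the target is a cluster
`--prom-filter-label` / `--prom-filter-label-value`	Label name and value regex used to select series from a snapshot
`--prom-filter-time-start`	Start of the time range to migrate
`--prom-filter-time-end`	End of the time range to migrate
`--prom-concurrency`	Number of concurrent snapshot readers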

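6.6 vmagent: a lightweight collection agent

6.6.1 vmagent vs Prometheus

Feature	Prometheus	vmagent
Memory usage	Higher	Typically several times lower
Push ingestion	Limited (remote-write receiver, OTLP)	Yes (Prometheus remote_write, InfluxDB, Graphite, OpenTSDB, etc.)
Multiple remote_write targets	Supported (shared WAL)	Supported, with a separate on-disk buffer per target
Kafka integration	Not built in	Built in (enterprise edition)
Typical use	Standalone monitoring	Collection agent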

6.6.2 Configuring vmagent

# prometheus.yml (vmagent reads the same scrape config format)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
          - 'web01:9100'
          - 'web02:9100'
          - 'db01:9100'

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

# Start vmagent
vmagent \
    -promscrape.config=/etc/vmagent/prometheus.yml \
    -remoteWrite.url=http://victoria-metrics:8428/api/v1/write \
    -remoteWrite.tmpDataPath=/data/vmagent-remotewrite-data \
    -remoteWrite.maxDiskUsagePerURL=10GB \
    -httpListenAddr=:8429

6.6.3 Writing to multiple targets

# Write every sample to two VictoriaMetrics instances (HA)
vmagent \
    -remoteWrite.url=http://vm1:8428/api/v1/write \
    -remoteWrite.url=http://vm2:8428/api/v1/write \
    -remoteWrite.label="cluster=prod"
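With several `-remoteWrite.url` values, vmagent replicates each sample to every target, so vm1 and vm2 each hold a full copy. Passing `-remoteWrite.shardByURL` spreads series across the targets instead. The sketch below illustrates only the idea of stable hash-based sharding: `cksum` stands in for vmagent's actual hash function, which differs, so real shard assignments will not match these numbers.

```shell
# Illustration only: stable hash-based sharding of series across N remote_write URLs.
shard_for() {
  # $1 = series identity (metric name plus labels), $2 = number of -remoteWrite.url targets
  sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $(( sum % $2 ))
}

# The same series always maps to the same target index (0 or 1 here)
shard_for 'up{instance="web01:9100",job="node"}' 2
shard_for 'up{instance="web02:9100",job="node"}' 2
```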

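6.7 Data model mapping

6.7.1 Prometheus → VictoriaMetrics

Prometheus concept	VictoriaMetrics equivalent
Metric name	`__name__` label
Label	Label (identical)
Sample	(timestamp, value), with millisecond timestamps as in Prometheus
Exemplar	Not supported
Metadata	Optionally retained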

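6.7.2 Label caveats

# Labels prefixed with "__" (__address__, __scheme__, __meta_*, ...) exist only during
# target relabeling; Prometheus removes them (except __name__) before storing or
# forwarding a series, so it reaches VM without them:
{__name__="up", instance="web01:9100", job="node"}

# labeldrop in write_relabel_configs strips redundant ordinary labels before writing;
# the __meta_.* rule below is only a safety net, since such labels never get this far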
write_relabel_configs:
  - action: labeldrop
    regex: "__meta_.*"
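The rule Prometheus applies after target relabeling can be sketched with awk: discard every label whose name starts with `__`, except `__name__`. The input is one `name=value` pair per line; this illustrates the rule, not Prometheus' implementation.

```shell
# Keep __name__ and ordinary labels; drop every other "__"-prefixed label.
drop_internal_labels() {
  awk -F= '$1 == "__name__" || substr($1, 1, 2) != "__"'
}

printf '%s\n' __name__=up __address__=web01:9100 __meta_kubernetes_pod_name=web-0 \
  instance=web01:9100 job=node | drop_internal_labels
# __name__=up
# instance=web01:9100
# job=node
```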

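6.8 End-to-end migration plan

Migration checklist

Step	Action	Check
1	Deploy VictoriaMetrics	Health check passes (`/health` returns OK)
2	Configure Prometheus remote_write	New data arrives in VM
3	Add VM as a second Grafana data source	Both sources answer queries
4	Backfill history with vmctl	Historical data is queryable in VM
5	Make VM the default Grafana data source	Dashboards render correctly
6	Switch Prometheus to agent mode (no local TSDB)	Lower resource usage
7	(Optional) Replace Prometheus with vmagent	Further resource savings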

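Scenario: zero-downtime migration

Timeline:
T0 - Deploy VM and configure remote_write
     ↓ (VM starts receiving new data)
T1 - Backfill data older than T0 with vmctl (limit the range with --prom-filter-time-end to avoid overlap)
     ↓ (history complete)
T2 - Add VM as a Grafana data source
     ↓ (both sources queried side by side)
T3 - Make VM the default data source
     ↓ (verify every panel)
T4 - Move scraping to vmagent, then stop Prometheus remote_write
     ↓ (watch for gaps or side effects)
T5 - Shut down Prometheus (migration complete)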

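Summary

Topic	Key point
remote_write	Natively supported by Prometheus; simple to configure
remote_read	Prometheus-side feature; VictoriaMetrics itself does not serve the remote_read API
vmctl	Official migration tool with modes for many sources
vmagent	Lightweight scraper; recommended replacement for Prometheus in front of VM
Migration strategy	Incremental, zero downtime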