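10 - Service Discovery
10.1 Overview
Service discovery is one of Prometheus's core capabilities: it finds and tracks scrape targets dynamically, so the target list does not have to be maintained by hand.
Service discovery types
| Type | Configuration key | Typical use |
|---|---|---|
| Static configuration | static_configs | Fixed servers, small environments |
| File-based discovery | file_sd_configs | Targets generated by custom scripts |
| Kubernetes | kubernetes_sd_configs | Kubernetes clusters |
| Consul | consul_sd_configs | Microservice registry |
| DNS | dns_sd_configs | Targets published through DNS records |
| EC2 | ec2_sd_configs | AWS |
| GCE | gce_sd_configs | Google Cloud |
| Azure | azure_sd_configs | Azure |
| Marathon | marathon_sd_configs | DC/OS |
| Eureka | eureka_sd_configs | Spring Cloud |
| Triton | triton_sd_configs | Joyent Triton |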
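10.2 Static Configuration (Static Config)
The simplest approach: list the target addresses directly in the configuration.
scrape_configs:
  - job_name: 'my-app'
    static_configs:
      - targets:
          - 'host1:8080'
          - 'host2:8080'
          - 'host3:8080'
        labels:
          env: production
          team: backend
Suitable for:
- Development and test environments
- A small, fixed set of servers
- Setups without a service registry
Drawbacks:
- The target list must be maintained by hand
- Adding or removing a target means editing the configuration and reloading Prometheus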
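10.3 File-Based Discovery (File SD)
Targets are defined in files. Prometheus watches those files and applies changes automatically, without a configuration reload.
scrape_configs:
  - job_name: 'file-sd'
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/*.json
          - /etc/prometheus/targets/*.yml
        refresh_interval: 5m  # fallback interval for re-reading the files; default 5m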
JSON format
[
  {
    "targets": ["web1:8080", "web2:8080"],
    "labels": {
      "env": "production",
      "service": "api",
      "team": "backend"
    }
  },
  {
    "targets": ["db1:9104"],
    "labels": {
      "env": "production",
      "service": "mysql"
    }
  }
]
YAML format
- targets:
    - web1:8080
    - web2:8080
  labels:
    env: production
    service: api
- targets:
    - db1:9104
  labels:
    env: production
    service: mysql
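Generating targets with a script
#!/bin/bash
# generate_targets.sh - build the target list from a CMDB
# Abort on any failure so a failed request never overwrites the current target file.
set -euo pipefail
OUTPUT="/etc/prometheus/targets/generated.json"
# Write to a temporary file, then rename it, so Prometheus never reads a half-written file.
curl -fsS "http://cmdb.internal/api/hosts?service=api" | \
  jq '[.[] | {targets: [.hostname + ":8080"], labels: {env: .env, team: .team}}]' \
  > "${OUTPUT}.tmp"
mv "${OUTPUT}.tmp" "${OUTPUT}"
# Crontab entry that runs the script every 5 minutes (goes in the crontab, not in this script):
# */5 * * * * /opt/scripts/generate_targets.sh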
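The same write-then-rename pattern can be sketched in Python, for cases where the generator does more than a single `jq` filter. This is a minimal sketch, not a reference implementation: the function names `build_target_groups` and `write_targets_atomically` are hypothetical, and the host record fields (`hostname`, `env`, `team`) are assumed to match what the `jq` filter in the script reads.

```python
import json
import os
import tempfile


def build_target_groups(hosts, port=8080):
    """Turn CMDB-style host records into file_sd target groups."""
    return [
        {
            "targets": [f"{h['hostname']}:{port}"],
            "labels": {"env": h["env"], "team": h["team"]},
        }
        for h in hosts
    ]


def write_targets_atomically(groups, output_path):
    """Write JSON to a temp file in the same directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(groups, f, indent=2)
        # mkstemp creates the file with mode 0600; relax it so a Prometheus
        # process running as another user can still read the result.
        os.chmod(tmp_path, 0o644)
        # Rename within one directory, so readers see the old or the new file, never a partial one.
        os.replace(tmp_path, output_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The temporary file ends in `.tmp`, so it does not match the `*.json` or `*.yml` globs that Prometheus watches.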
10.4 Kubernetes Service Discovery
Kubernetes is one of the most common service discovery mechanisms used with Prometheus.
Roles
| Role | Discovers | Meta labels / notes |
|---|---|---|
| node | Nodes | __meta_kubernetes_node_name, __meta_kubernetes_node_label_* |
| pod | Pods | __meta_kubernetes_pod_name, __meta_kubernetes_pod_label_*, __meta_kubernetes_pod_annotation_* |
| service | Services | __meta_kubernetes_service_name, __meta_kubernetes_service_label_* |
| endpoints | Endpoints | Includes the Pod IP and port |
| endpointslice | EndpointSlices | Recommended on Kubernetes 1.21+ |
| ingress | Ingresses | __meta_kubernetes_ingress_name |
Pod discovery
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Scrape only Pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Use a custom metrics path if the Pod declares one
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Use a custom port if the Pod declares one
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      # Copy the namespace, Pod name, and app label onto the target
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
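The port rewrite rule is the least obvious part of this configuration, so here is a small Python sketch of what it does. It assumes the documented relabeling behavior: the source label values are joined with `;`, the regex must match the whole joined string, and a rule that does not match leaves the target label untouched. The function name `rewrite_address` is hypothetical, and Python's `re` stands in for the RE2 engine Prometheus uses (the two agree on this pattern).

```python
import re

# Same pattern as the relabel rule: host, optional existing port, ';', annotated port.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address, port_annotation):
    """Mimic the 'replace' rule that swaps in the port from prometheus.io/port."""
    joined = f"{address};{port_annotation}"  # source label values joined with ';'
    match = ADDRESS_RULE.fullmatch(joined)   # relabel regexes are anchored at both ends
    if match is None:
        return address                       # no match: __address__ stays as discovered
    return f"{match.group(1)}:{match.group(2)}"


print(rewrite_address("10.0.0.5:3000", "8080"))  # 10.0.0.5:8080
print(rewrite_address("10.0.0.5", "9102"))       # 10.0.0.5:9102
print(rewrite_address("10.0.0.5:3000", ""))      # 10.0.0.5:3000 (annotation missing)
```

A missing annotation shows up as an empty label value, which is why the last call falls through unchanged.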
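Pod annotation convention
The prometheus.io/* annotations are a convention, not something Prometheus interprets on its own. They take effect only through relabel rules such as the ones in the Pod discovery job.
# Add the annotations to the Pod template of a Deployment (excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"    # enable scraping
        prometheus.io/port: "8080"      # metrics port
        prometheus.io/path: "/metrics"  # metrics path
    spec:
      containers:
        - name: my-app
          ports:
            - containerPort: 8080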
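The link between an annotation such as `prometheus.io/scrape` and the meta label `__meta_kubernetes_pod_annotation_prometheus_io_scrape` is a name sanitization step. The sketch below assumes that every character outside `[a-zA-Z0-9_]` is replaced with an underscore, which matches the label names used in this chapter. The helper name `pod_annotation_meta_label` is hypothetical.

```python
import re


def pod_annotation_meta_label(annotation_name):
    """Map a Pod annotation name to the meta label it is assumed to surface as."""
    sanitized = re.sub(r"[^a-zA-Z0-9_]", "_", annotation_name)
    return "__meta_kubernetes_pod_annotation_" + sanitized


for name in ("prometheus.io/scrape", "prometheus.io/port", "prometheus.io/path"):
    print(name, "->", pod_annotation_meta_label(name))
```

Because `.` and `/` both become `_`, two different annotation names can collapse into the same meta label, so it is safer to keep annotation names distinct after sanitization.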
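Service discovery
scrape_configs:
  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Replace the port in __address__ with the one from the prometheus.io/port annotation.
      # Both labels are needed as sources so the regex has two capture groups to fill $1 and $2.
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2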
Node discovery
scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Scrape port 9100 (Node Exporter) on the node address instead of the discovered port
      - source_labels: [__address__]
        regex: '(.+):(\d+)'
        target_label: __address__
        replacement: '${1}:9100'
      - source_labels: [__meta_kubernetes_node_name]
        target_label: node
Ingress discovery
scrape_configs:
  - job_name: 'kubernetes-ingresses'
    kubernetes_sd_configs:
      - role: ingress
    relabel_configs:
      - source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_scrape]
        action: keep
        regex: true
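Using EndpointSlice (recommended)
scrape_configs:
  - job_name: 'kubernetes-endpointslices'
    kubernetes_sd_configs:
      - role: endpointslice
    relabel_configs:
      # The annotation is normally set on the Service. Targets that belong to a
      # Service also carry that Service's meta labels, so filter on those.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true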
10.5 Consul Service Discovery
Consul is HashiCorp's service discovery and service mesh tool.
Basic configuration
scrape_configs:
  - job_name: 'consul'
    consul_sd_configs:
      - server: 'consul.internal:8500'
        tags:
          - 'prometheus'  # discover only services that carry this tag
        services: []      # empty list = all services
    relabel_configs:
      # Use the Consul service name as the job label
      - source_labels: [__meta_consul_service]
        target_label: job
      # Use the Consul node name as the instance label
      - source_labels: [__meta_consul_node]
        target_label: instance
      # Add a datacenter label
      - source_labels: [__meta_consul_dc]
        target_label: datacenter
      # Take the metrics path from the service's prometheus_path metadata, if set
      - source_labels: [__meta_consul_service_metadata_prometheus_path]
        regex: (.+)
        target_label: __metrics_path__
Consul service registration
{
  "service": {
    "name": "api-service",
    "port": 8080,
    "tags": ["prometheus"],
    "meta": {
      "prometheus_path": "/metrics",
      "prometheus_port": "8080"
    },
    "check": {
      "http": "http://localhost:8080/health",
      "interval": "10s"
    }
  }
}
10.6 DNS Service Discovery
Targets are discovered from DNS SRV records or A records.
SRV records
scrape_configs:
  - job_name: 'dns-srv'
    dns_sd_configs:
      - names:
          - '_prometheus._tcp.example.com'
        type: SRV
        refresh_interval: 30s
# Example DNS SRV records
_prometheus._tcp.example.com. IN SRV 10 60 9100 node1.example.com.
_prometheus._tcp.example.com. IN SRV 10 60 9100 node2.example.com.
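To make the mapping from SRV records to scrape targets concrete, the sketch below turns the two example records into addresses. It assumes that each record becomes one target built from the record's host (without the trailing dot) and port, and that priority and weight play no part in target selection. The function name `srv_to_targets` is hypothetical.

```python
def srv_to_targets(records):
    """records: iterable of (priority, weight, port, target) tuples from an SRV answer."""
    return [
        f"{target.rstrip('.')}:{port}"
        for _priority, _weight, port, target in records
    ]


records = [
    (10, 60, 9100, "node1.example.com."),
    (10, 60, 9100, "node2.example.com."),
]
print(srv_to_targets(records))  # ['node1.example.com:9100', 'node2.example.com:9100']
```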
A records
scrape_configs:
  - job_name: 'dns-a'
    dns_sd_configs:
      - names:
          - 'nodes.example.com'
        type: A
        port: 9100  # must be set here, because an A record carries no port
        refresh_interval: 30s
10.7 EC2 Service Discovery
scrape_configs:
  - job_name: 'ec2'
    ec2_sd_configs:
      - region: 'us-east-1'
        access_key: '<access_key>'
        secret_key: '<secret_key>'
        port: 9100
        filters:
          - name: 'tag:Environment'
            values: ['production']
          - name: 'instance-state-name'
            values: ['running']
    relabel_configs:
      - source_labels: [__meta_ec2_tag_Name]
        target_label: instance
      - source_labels: [__meta_ec2_tag_Environment]
        target_label: env
      - source_labels: [__meta_ec2_tag_Team]
        target_label: team
10.8 Advanced Relabeling
Filtering on meta labels
relabel_configs:
  # Keep only targets in the production namespace
  - source_labels: [__meta_kubernetes_namespace]
    action: keep
    regex: 'production'
  # Drop Pods whose name contains -debug
  - source_labels: [__meta_kubernetes_pod_name]
    action: drop
    regex: '.*-debug.*'
  # Keep only Pods that carry the scrape annotation
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: 'true'
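The way `keep` and `drop` rules combine can be sketched in a few lines of Python. The sketch assumes the documented semantics: rules run in order, the regex must match the whole joined value, `keep` discards targets that do not match, and `drop` discards targets that do. The names `RULES` and `survives` are hypothetical.

```python
import re

# The three filter rules from the example above, as plain data.
RULES = [
    {"source_labels": ["__meta_kubernetes_namespace"], "action": "keep", "regex": "production"},
    {"source_labels": ["__meta_kubernetes_pod_name"], "action": "drop", "regex": ".*-debug.*"},
    {"source_labels": ["__meta_kubernetes_pod_annotation_prometheus_io_scrape"],
     "action": "keep", "regex": "true"},
]


def survives(labels, rules=RULES):
    """Return True if the target passes every keep/drop rule, applied in order."""
    for rule in rules:
        # A missing label counts as an empty string.
        value = ";".join(labels.get(name, "") for name in rule["source_labels"])
        matched = re.fullmatch(rule["regex"], value) is not None
        if rule["action"] == "keep" and not matched:
            return False
        if rule["action"] == "drop" and matched:
            return False
    return True
```

Because the match is anchored, a namespace such as `production-eu` does not pass the first rule; the regex would have to be `production.*` to include it.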
Label mapping
relabel_configs:
  # Copy every Kubernetes Pod label onto the target
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  # Copy only selected Service labels (app and version)
  - action: labelmap
    regex: __meta_kubernetes_service_label_(app|version)
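Unlike the other actions, `labelmap` matches its regex against label names rather than values. The Python sketch below illustrates that under the assumption that each matching label's value is copied to a new label named by the replacement (by default the first capture group), while the original label stays in place. The function name `labelmap` and the sample labels are illustrative only.

```python
import re


def labelmap(labels, regex, replacement=r"\1"):
    """Copy the value of every label whose name fully matches regex to a renamed label."""
    pattern = re.compile(regex)
    result = dict(labels)
    for name, value in labels.items():
        match = pattern.fullmatch(name)
        if match:
            result[match.expand(replacement)] = value
    return result


discovered = {
    "__address__": "10.0.0.5:8080",
    "__meta_kubernetes_pod_label_app": "api",
    "__meta_kubernetes_pod_label_tier": "web",
}
mapped = labelmap(discovered, r"__meta_kubernetes_pod_label_(.+)")
print(mapped["app"], mapped["tier"])  # api web
```

The `__meta_*` originals are still present after the mapping. Prometheus removes labels that start with `__` once relabeling has finished, so only the copied names remain on the scraped series.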
Hash-based sharding
# Split the targets across several Prometheus instances
relabel_configs:
  - source_labels: [__address__]
    modulus: 3  # split into 3 shards
    target_label: __tmp_shard
    action: hashmod
  - source_labels: [__tmp_shard]
    regex: 0    # this instance handles shard 0 only
    action: keep
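The sharding step can be sketched in Python to show why each target lands on exactly one instance. The sketch assumes that `hashmod` takes an MD5 digest of the joined source label values and reduces the last 8 bytes, read as a big-endian integer, modulo `modulus`. Treat the exact hash as an assumption rather than a guarantee; the partitioning property holds for any deterministic hash. The names `hashmod` and `targets_for_shard` are hypothetical helpers.

```python
import hashlib


def hashmod(value, modulus):
    """Shard number for a joined source-label value (assumed MD5-based, see lead-in)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def targets_for_shard(addresses, shard, modulus=3):
    """Addresses that a Prometheus instance configured with `regex: <shard>` would keep."""
    return [a for a in addresses if hashmod(a, modulus) == shard]


addresses = [f"10.0.0.{i}:8080" for i in range(1, 11)]
for shard in range(3):
    print(shard, targets_for_shard(addresses, shard))
```

Each instance runs the same configuration with a different `regex` value (0, 1, or 2), so together they cover every target without overlap. Changing `modulus` reshuffles most targets between instances.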
10.9 Multi-Environment Configuration
Each job below targets a different API server and tags its targets with an environment label. A service-account token is only valid for the cluster that issued it, so in practice each job needs its own token file; the identical paths below only show where the setting goes.
scrape_configs:
  # Production
  - job_name: 'k8s-prod'
    kubernetes_sd_configs:
      - role: pod
        api_server: 'https://k8s-prod.internal:6443'
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - target_label: environment
        replacement: production
  # Staging
  - job_name: 'k8s-staging'
    kubernetes_sd_configs:
      - role: pod
        api_server: 'https://k8s-staging.internal:6443'
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - target_label: environment
        replacement: staging
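10.10 Chapter Summary
| Discovery method | Typical use | Dynamic updates |
|---|---|---|
| static_configs | Small environments, testing | ❌ Manual |
| file_sd_configs | Custom scripts | ✅ On file change |
| kubernetes_sd_configs | Kubernetes clusters | ✅ API-driven |
| consul_sd_configs | Consul-based microservices | ✅ Via the registry |
| dns_sd_configs | DNS-based environments | ✅ Via DNS records |
| ec2_sd_configs | AWS | ✅ API-driven |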