Chapter 11: Performance Data and Visualization
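Performance data is one of the main outputs of a monitoring system and feeds trend analysis and capacity planning. This chapter covers the performance data format, the PNP4Nagios graphing tool, Grafana integration, and options for exporting data.
1. Performance Data Basics
1.1 Performance Data Format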
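# Standard format:
# 'label'=value[UOM];[warn];[crit];[min];[max]
# The single quotes around the label are only required when it contains spaces.
# Full example (everything after the | is performance data):
check_http output:
HTTP OK - Response time = 0.15s | time=0.15s;1;5;0; size=15234B;;;0; pages=25;;;0;
# Fields of the first item:
# time = label
# 0.15s = current value + unit
# 1 = warning threshold
# 5 = critical threshold
# 0 = minimum value
# (empty) = maximum value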
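The field layout above can be checked with a small parser. The following is a minimal sketch, not part of Nagios itself: the `PerfItem` class and `parse_perf_item` function are hypothetical names introduced for illustration, and the sketch assumes plain numeric values (the special value `U` for "unknown" is rejected).

```python
import re
from dataclasses import dataclass
from typing import Optional

# A number followed by an optional unit, e.g. "0.15s", "75%", "-3", "1024MB".
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


@dataclass
class PerfItem:
    label: str
    value: float
    uom: str
    warn: Optional[str]      # kept as text: thresholds may be ranges such as "10:20"
    crit: Optional[str]
    minimum: Optional[float]
    maximum: Optional[float]


def _to_float(text: str) -> Optional[float]:
    """Convert a non-empty field to float; empty fields become None."""
    return float(text) if text else None


def parse_perf_item(item: str) -> PerfItem:
    """Parse one label=value[UOM];[warn];[crit];[min];[max] token."""
    label, _, rest = item.rpartition("=")
    fields = rest.split(";")
    fields += [""] * (5 - len(fields))   # pad fields omitted at the end
    match = _VALUE_RE.match(fields[0])
    if not label or match is None:
        raise ValueError(f"not a valid perfdata item: {item!r}")
    return PerfItem(
        label=label.strip("'"),
        value=float(match.group(1)),
        uom=match.group(2),
        warn=fields[1] or None,
        crit=fields[2] or None,
        minimum=_to_float(fields[3]),
        maximum=_to_float(fields[4]),
    )
```

Calling `parse_perf_item("time=0.15s;1;5;0;")` yields the label `time`, the value `0.15`, the unit `s`, thresholds `"1"` and `"5"`, a minimum of `0.0`, and no maximum.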
1.2 Common Units of Measurement (UOM)
| Unit | Meaning | Example |
|---|---|---|
| (none) | Unitless number | users=5;10;20 |
| % | Percentage | cpu=75%;80;95 |
| s | Seconds | time=0.15s;1;5 |
| ms | Milliseconds | rta=10.50ms;100;200 |
| B | Bytes | size=1024B;;;0; |
| KB | Kilobytes | mem=512KB;;;0; |
| MB | Megabytes | mem=1024MB;;;0; |
| GB | Gigabytes | disk=50GB;;;0; |
| TB | Terabytes | disk=2TB;;;0; |
| c | Continuous counter | errors=5c;;;0; |
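When metrics from different plugins end up on one graph, it helps to convert them to a common base unit first. The sketch below is one way to do that; `to_base_unit` is a hypothetical helper, and treating KB/MB/GB/TB as powers of 1024 is an assumption, since plugins differ on whether they mean 1000 or 1024.

```python
# Map each unit from the table to (base unit, multiplier).
# ASSUMPTION: byte prefixes are powers of 1024; some plugins use powers of 1000.
_SCALE = {
    "": ("", 1),
    "%": ("%", 1),
    "c": ("c", 1),
    "s": ("s", 1),
    "ms": ("s", 1e-3),
    "us": ("s", 1e-6),
    "B": ("B", 1),
    "KB": ("B", 1024),
    "MB": ("B", 1024 ** 2),
    "GB": ("B", 1024 ** 3),
    "TB": ("B", 1024 ** 4),
}


def to_base_unit(value, uom):
    """Return (value converted to the base unit, base unit name).

    Raises KeyError for a unit that is not in the table.
    """
    base, factor = _SCALE[uom]
    return value * factor, base
```

For example, `to_base_unit(512, "KB")` returns `(524288, "B")`, so a `mem=512KB` reading and a `size=1024B` reading can share one axis.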
1.3 Multiple Performance Data Items
# Separate multiple performance data items with spaces
check_disk output:
DISK OK - free space: / 1024 MB (80%), /var 512 MB (64%) | /=1024MB;800;400;0;1280 /var=512MB;400;200;0;800
# Breakdown:
# / partition: 1024MB, warning=800MB, critical=400MB, min=0, max=1280MB
# /var partition: 512MB, warning=400MB, critical=200MB, min=0, max=800MB
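Splitting on spaces works for the example above but breaks as soon as a label is quoted because it contains a space. A regular expression handles both cases. This is a sketch only; `split_perfdata` is a hypothetical name, and it returns the raw `value;warn;crit;min;max` text for each label without interpreting it.

```python
import re

# A label is either a single-quoted string (a doubled '' is an escaped quote)
# or a run of characters without whitespace, "=" or quotes.
_ITEM_RE = re.compile(r"('(?:[^']|'')+'|[^\s=']+)=(\S*)")


def split_perfdata(perfdata):
    """Split a perfdata string into (label, fields) pairs."""
    items = []
    for label, fields in _ITEM_RE.findall(perfdata):
        clean = label.strip("'").replace("''", "'")
        items.append((clean, fields))
    return items


for label, fields in split_perfdata(
    "/=1024MB;800;400;0;1280 /var=512MB;400;200;0;800"
):
    print(label, "->", fields)
```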
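1.4 Performance Data Configuration
# Performance data settings in nagios.cfg
# Performance data processing must be enabled for any of the options below
process_performance_data=1
# Option 1: run a command after every check (simplest to set up, but it spawns one process per check result)
service_perfdata_command=process-service-perfdata
host_perfdata_command=process-host-perfdata
# Option 2: write to a file and process it in batches (scales better on large installations)
service_perfdata_file=/var/log/nagios/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file
# Option 3: OCSP/OCHP commands (Obsessive Compulsive Service/Host Processor)
# These run after every check only when obsess_over_services=1 and obsess_over_hosts=1 are set.
# They are intended for distributed monitoring, so treat this as a workaround rather than the usual route.
ocsp_command=process-service-perfdata
ochp_command=process-host-perfdata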
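With the file-based option, each check result becomes one line of tab-separated `KEY::VALUE` fields, as defined by `service_perfdata_file_template`. A reader for that layout can be sketched as follows; `parse_perfdata_line` is a hypothetical helper, and the sample line is made up for illustration.

```python
def parse_perfdata_line(line):
    """Turn one KEY::VALUE tab-separated line into a dict.

    Fields without the "::" separator are ignored.
    """
    record = {}
    for field in line.rstrip("\n").split("\t"):
        key, sep, value = field.partition("::")
        if sep:
            record[key] = value
    return record


# Made-up sample line in the layout produced by the template above.
sample = (
    "DATATYPE::SERVICEPERFDATA\tTIMET::1700000000\tHOSTNAME::web-server-01\t"
    "SERVICEDESC::HTTP\tSERVICEPERFDATA::time=0.15s;1;5;0;\tSERVICESTATE::OK\n"
)
record = parse_perfdata_line(sample)
print(record["HOSTNAME"], record["SERVICEPERFDATA"])
```

Looking fields up by key rather than by position keeps the reader working when the template is reordered or extended.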
2. Installing PNP4Nagios
2.1 Install Dependencies
# CentOS/RHEL
yum install -y rrdtool rrdtool-perl php-gd php-xml perl-Time-HiRes
# Ubuntu/Debian
apt-get install -y rrdtool librrd-dev php-gd php-xml php-mbstring
2.2 Build and Install PNP4Nagios
cd /tmp
wget https://sourceforge.net/projects/pnp4nagios/files/PNP-0.6.26/pnp4nagios-0.6.26.tar.gz
tar xzf pnp4nagios-0.6.26.tar.gz
cd pnp4nagios-0.6.26
./configure \
--with-nagios-user=nagios \
--with-nagios-group=nagios \
--prefix=/usr/local/pnp4nagios
make all
sudo make install
sudo make install-init
sudo make install-config
sudo make install-webconf
2.3 Configuring PNP4Nagios
# Configuration directory (holds config.php, process_perfdata.cfg, and npcd.cfg)
/usr/local/pnp4nagios/etc/
# Key settings
# /usr/local/pnp4nagios/etc/process_perfdata.cfg
CFG_DIR=/usr/local/pnp4nagios/etc/
RRDPATH=/usr/local/pnp4nagios/var/perfdata
RRD_STORAGE_TYPE=SINGLE
LOG_LEVEL=0
2.4 Integrating with Nagios
# Add to nagios.cfg (synchronous mode: Nagios calls process_perfdata.pl after every check):
process_performance_data=1
enable_environment_macros=1
service_perfdata_command=process-service-perfdata
host_perfdata_command=process-host-perfdata
# Define the processing commands in commands.cfg.
# The --bulk option belongs to file-based (bulk) mode, so it is not used with the settings above.
define command {
command_name process-service-perfdata
command_line /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl
}
define command {
command_name process-host-perfdata
command_line /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}
# Add graph links to the templates:
define host {
name generic-host-pnp
action_url /pnp4nagios/graph?host=$HOSTNAME$
register 0
}
define service {
name generic-service-pnp
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
register 0
}
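Nagios expands the macros in `action_url` itself, but scripts that link to PNP4Nagios graphs (reports, chat notifications) have to build the same URL by hand. The sketch below shows one way to do that with percent-encoding; `pnp_graph_url` is a hypothetical helper and the `/pnp4nagios/graph` base path is taken from the templates above.

```python
from urllib.parse import quote, urlencode


def pnp_graph_url(host, service=None, base="/pnp4nagios/graph"):
    """Build a PNP4Nagios graph URL for a host, or for one service on a host."""
    params = {"host": host}
    if service is not None:
        params["srv"] = service
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{base}?{urlencode(params, quote_via=quote)}"


print(pnp_graph_url("web-server-01"))
print(pnp_graph_url("web-server-01", "HTTP Check"))
```

Encoding matters mainly for service descriptions, which often contain spaces or slashes.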
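2.5 Starting the Services
# Remove the installation check page
rm /usr/local/pnp4nagios/share/install.php
# Verify the configuration
/usr/local/pnp4nagios/bin/verify_config.pl /usr/local/pnp4nagios/etc/
# Start NPCD, the perfdata spool daemon (needed only when perfdata files are handed to the NPCD spool directory)
systemctl start npcd
systemctl enable npcd
# Restart Nagios
systemctl restart nagios
# Open the graphs
# http://your-server/pnp4nagios/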
3. Grafana Integration
3.1 Architecture Options
┌─────────────┐    perfdata    ┌─────────────┐    queries    ┌─────────────┐
│ Nagios Core │───────────────→│  InfluxDB   │←──────────────│   Grafana   │
│             │                │ /Prometheus │               │             │
└─────────────┘                └─────────────┘               └─────────────┘
       │                              │
       │                              │
       ▼                              ▼
┌─────────────┐                ┌─────────────┐
│ PNP4Nagios  │                │  Graphite   │
│ (RRD files) │                │ (optional)  │
└─────────────┘                └─────────────┘
3.2 Option 1: InfluxDB + Grafana
# Install InfluxDB
wget https://dl.influxdata.com/influxdb/releases/influxdb2-2.7.1.x86_64.rpm
sudo yum install -y influxdb2-2.7.1.x86_64.rpm
systemctl start influxdb
systemctl enable influxdb
# Install Grafana
yum install -y https://dl.grafana.com/oss/release/grafana-10.0.0-1.x86_64.rpm
systemctl start grafana-server
systemctl enable grafana-server
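#!/usr/bin/env python3
# nagios_to_influxdb.py - write Nagios performance data to InfluxDB
# Reads lines in the tab-separated KEY::VALUE layout defined by service_perfdata_file_template.
import re
import sys
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

INFLUX_URL = "http://localhost:8086"
INFLUX_TOKEN = "your-token"
INFLUX_ORG = "your-org"
INFLUX_BUCKET = "nagios"

# One perfdata item: an optionally quoted label, then value;warn;crit;min;max
ITEM_RE = re.compile(r"('[^']+'|[^\s=]+)=(\S+)")
# A number (optionally negative) followed by an optional unit
VALUE_RE = re.compile(r"(-?\d+(?:\.\d+)?)(\D*)")


def parse_perfdata(line):
    """Parse one line of the perfdata file into a dict, or return None."""
    fields = {}
    for part in line.rstrip('\n').split('\t'):
        key, sep, value = part.partition('::')
        if sep:
            fields[key] = value
    required = ('TIMET', 'HOSTNAME', 'SERVICEDESC', 'SERVICESTATE', 'SERVICEPERFDATA')
    if any(key not in fields for key in required):
        return None
    return {
        'timestamp': int(fields['TIMET']),
        'hostname': fields['HOSTNAME'],
        'service': fields['SERVICEDESC'],
        'state': fields['SERVICESTATE'],
        'perfdata': fields['SERVICEPERFDATA'],
    }


def parse_metrics(perfdata_str):
    """Extract {label: {'value': ..., 'unit': ...}} from a perfdata string."""
    metrics = {}
    for label, rest in ITEM_RE.findall(perfdata_str):
        match = VALUE_RE.match(rest.split(';')[0])
        if match:
            metrics[label.strip("'")] = {
                'value': float(match.group(1)),
                'unit': match.group(2),
            }
    return metrics


def write_to_influx(write_api, data):
    """Write every metric of one check result to InfluxDB."""
    for label, metric in parse_metrics(data['perfdata']).items():
        point = (
            Point("nagios_perfdata")
            .tag("host", data['hostname'])
            .tag("service", data['service'])
            .tag("state", data['state'])
            .tag("metric", label)
            .field("value", metric['value'])
            # TIMET is in epoch seconds; the client defaults to nanoseconds
            .time(data['timestamp'], WritePrecision.S)
        )
        write_api.write(bucket=INFLUX_BUCKET, record=point)


def main():
    client = InfluxDBClient(url=INFLUX_URL, token=INFLUX_TOKEN, org=INFLUX_ORG)
    write_api = client.write_api(write_options=SYNCHRONOUS)
    try:
        for line in sys.stdin:
            data = parse_perfdata(line)
            if data:
                write_to_influx(write_api, data)
    finally:
        client.close()


if __name__ == '__main__':
    main()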
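# Nagios command definition
# Use it as service_perfdata_file_processing_command; the path must match service_perfdata_file.
# With service_perfdata_file_mode=a Nagios keeps appending, so remove or truncate the file after each run to avoid re-sending old lines.
define command {
command_name process-service-perfdata-influx
command_line /usr/local/nagios/libexec/nagios_to_influxdb.py < /usr/local/pnp4nagios/var/service-perfdata
}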
3.3 Option 2: Prometheus + Grafana
# Install a Prometheus exporter for Nagios
pip install nagios-prometheus-exporter
# Start the exporter
nagios_exporter --nagios-status=/var/log/nagios/status.dat --port=9219
# Add the exporter as a Prometheus scrape target
# prometheus.yml
scrape_configs:
  - job_name: 'nagios'
    static_configs:
      - targets: ['localhost:9219']
# Grafana data source configuration
# Add Data Source → Prometheus → http://prometheus:9090
3.4 Grafana Dashboard
{
  "dashboard": {
    "title": "Nagios Monitoring",
    "panels": [
      {
        "title": "Host Status Overview",
        "type": "stat",
        "targets": [
          {
            "expr": "count(nagios_host_state{state=\"UP\"})",
            "legendFormat": "UP"
          },
          {
            "expr": "count(nagios_host_state{state=\"DOWN\"})",
            "legendFormat": "DOWN"
          }
        ]
      },
      {
        "title": "Service Response Time",
        "type": "timeseries",
        "targets": [
          {
            "expr": "nagios_service_perfdata{metric=\"time\"}",
            "legendFormat": "{{host}}/{{service}}"
          }
        ]
      }
    ]
  }
}
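A dashboard definition like the one above can be loaded through Grafana's HTTP API instead of the web UI, which makes it easy to keep dashboards in version control. The sketch below only builds the request and leaves sending to the caller; `build_dashboard_request` is a hypothetical helper, and the URL `http://localhost:3000` and the token value are placeholders.

```python
import json
import urllib.request


def build_dashboard_request(payload, grafana_url, api_token):
    """Build (but do not send) a POST request to /api/dashboards/db."""
    # overwrite=True lets the call replace an existing dashboard of the same title or uid.
    body = dict(payload, overwrite=True)
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


payload = {"dashboard": {"title": "Nagios Monitoring", "panels": []}}
request = build_dashboard_request(payload, "http://localhost:3000", "your-api-token")
# To send it: urllib.request.urlopen(request)
```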
4. Data Export
4.1 Exporting RRD Data
# Dump an RRD file to XML
rrdtool dump /usr/local/pnp4nagios/var/perfdata/web-server-01/check_http.rrd > /tmp/check_http.xml
# Export the last day as text (rrdtool fetch prints "timestamp: value" rows, not comma-separated values)
rrdtool fetch /usr/local/pnp4nagios/var/perfdata/web-server-01/check_http.rrd AVERAGE -s -1d > /tmp/check_http.txt
# Batch export script
#!/bin/bash
PERFDIR="/usr/local/pnp4nagios/var/perfdata"
EXPORT_DIR="/tmp/rrd_export"
mkdir -p "$EXPORT_DIR"
for HOST_DIR in "$PERFDIR"/*/; do
    HOST=$(basename "$HOST_DIR")
    mkdir -p "$EXPORT_DIR/$HOST"
    for RRD_FILE in "$HOST_DIR"*.rrd; do
        # Skip the unexpanded pattern when a host directory has no RRD files
        [ -e "$RRD_FILE" ] || continue
        SVC=$(basename "$RRD_FILE" .rrd)
        rrdtool fetch "$RRD_FILE" AVERAGE -s -7d > "$EXPORT_DIR/$HOST/$SVC.txt"
    done
done
echo "Exported to $EXPORT_DIR"
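The output of `rrdtool fetch` is a header line with the data source names, a blank line, and then `timestamp: value value ...` rows, with `nan` or `-nan` for missing samples. Turning that into real CSV takes only a few lines. The sketch below assumes exactly that layout; `fetch_to_csv` is a hypothetical helper and the sample text is made up.

```python
import csv
import io


def fetch_to_csv(fetch_output):
    """Convert the text printed by `rrdtool fetch` into CSV text."""
    lines = [line for line in fetch_output.splitlines() if line.strip()]
    names = lines[0].split()                 # data source names from the header
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", *names])
    for line in lines[1:]:
        timestamp, _, values = line.partition(":")
        row = [
            "" if v.lower().endswith("nan") else repr(float(v))   # missing -> empty cell
            for v in values.split()
        ]
        writer.writerow([int(timestamp), *row])
    return out.getvalue()


sample = (
    "                          1                  2\n"
    "\n"
    "1700000000: 1.5000000000e-01 1.5234000000e+04\n"
    "1700000300: -nan nan\n"
)
print(fetch_to_csv(sample), end="")
```

In practice the input would come from `subprocess.run(["rrdtool", "fetch", ...], capture_output=True, text=True).stdout`.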
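4.2 Querying Historical Data
# Query the last 24 hours
rrdtool fetch perfdata.rrd AVERAGE -s now-24h -e now
# Query a specific time range (rrdtool expects at-style times such as "YYYYMMDD HH:MM", or epoch seconds)
rrdtool fetch perfdata.rrd AVERAGE \
-s "20240101 00:00" \
-e "20240102 00:00"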
# Render a graph image
rrdtool graph /tmp/response_time.png \
--title "HTTP Response Time" \
--vertical-label "Seconds" \
-w 800 -h 400 \
DEF:value=perfdata.rrd:1:AVERAGE \
LINE1:value#FF0000:"Response Time" \
AREA:value#FF000033
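Epoch seconds are the least ambiguous way to pass a time range to `rrdtool fetch`, because they do not depend on the server's locale or time zone. The sketch below builds the argument list from timezone-aware `datetime` objects; `fetch_command` is a hypothetical helper, and `perfdata.rrd` is the placeholder file name used above.

```python
from datetime import datetime, timezone


def fetch_command(rrd_path, start, end, cf="AVERAGE"):
    """Build the argument list for `rrdtool fetch` over [start, end]."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be later than start")
    return [
        "rrdtool", "fetch", rrd_path, cf,
        "-s", str(int(start.timestamp())),
        "-e", str(int(end.timestamp())),
    ]


command = fetch_command(
    "perfdata.rrd",
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
print(" ".join(command))
```

The resulting list can be passed directly to `subprocess.run(command, capture_output=True, text=True)`.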
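5. Points to Note
| Consideration | Details |
|---|---|
| Performance data format | Follow the 'label'=value[UOM];warn;crit;min;max format strictly |
| Storage space | Each RRD file is preallocated at a fixed size, so disk use grows with the number of hosts, services, and metrics rather than over time; remove RRD files left behind by decommissioned hosts and services |
| Processing delay | Choose service_perfdata_file_processing_interval with care: short intervals add load, long intervals delay graphs |
| Grafana permissions | Set appropriate user permissions and data source access controls |
| Backups | Back up RRD data files and Grafana dashboard definitions regularly |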
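Because RRD files are fixed in size, storage for capacity planning can be estimated up front: every stored value takes 8 bytes, so the size is roughly rows × data sources × 8, summed over all archives. The sketch below does that arithmetic. The row counts are assumed example values, not read from any installation (check the RRA definitions actually in use), and the small file header is ignored; `estimate_rrd_bytes` is a hypothetical helper.

```python
def estimate_rrd_bytes(rows_per_rra, consolidation_functions=3,
                       data_sources=1, bytes_per_value=8):
    """Estimate the data size of one RRD file, ignoring its header.

    rows_per_rra: row counts of the archives kept for ONE consolidation
    function; the same set is assumed for each function (e.g. AVERAGE, MAX, MIN).
    """
    total_rows = sum(rows_per_rra) * consolidation_functions
    return total_rows * data_sources * bytes_per_value


# ASSUMED example archive layout: four archives per consolidation function.
ASSUMED_RRA_ROWS = [2880, 2880, 4320, 5840]

per_metric = estimate_rrd_bytes(ASSUMED_RRA_ROWS)
fleet = per_metric * 500 * 3     # e.g. 500 services with 3 metrics each
print(f"{per_metric} bytes per metric, about {fleet / 1024 ** 2:.0f} MiB in total")
```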
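6. Chapter Summary
- Plugins emit performance data in a standard format that carries each metric's value and thresholds
- PNP4Nagios is a widely used RRD-based graphing add-on for Nagios performance data
- Grafana offers richer visualization and alerting
- Exported data supports historical analysis and capacity planning
- Plan storage and retention for performance data across its whole lifecycle
Next chapter: Chapter 12: Docker Deployment - learn how to deploy Nagios in Docker containers.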