强曰为道

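Being like Heaven and Earth, it does not act contrary to them. Its knowledge encompasses all things and its Way brings aid to the whole world; therefore it does not err. It ranges everywhere yet does not drift; it rejoices in Heaven and understands fate; therefore it is free from care.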

Chapter 11: Performance Data and Visualization

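Performance data is one of the most important outputs of a monitoring system: it feeds trend analysis and capacity planning. This chapter covers the performance data format, the PNP4Nagios graphing tool, Grafana integration, and options for exporting the data.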


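I. Performance Data Basics

1.1 Performance Data Format

# Standard format (everything after the first | in plugin output):
# 'label'=value[UOM];[warn];[crit];[min];[max]
# (the single quotes around the label are required only if it contains spaces)

# Full example:
check_http output:
HTTP OK - Response time = 0.15s | time=0.15s;1;5;0; size=15234B;;;0; pages=25;;;0;

# Field breakdown for the first item (time=0.15s;1;5;0;):
# time      = label
# 0.15s     = current value + unit
# 1         = warning threshold
# 5         = critical threshold
# 0         = minimum
# (empty)   = maximum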
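To make the field layout concrete, here is a minimal Python sketch that splits a single perfdata item into its parts. It assumes a well-formed item, keeps the thresholds as strings (they may be ranges such as 10:20), and uses a hypothetical helper name, `parse_perf_item`, that is not part of Nagios.

```python
import re

# 'label'=value[UOM];[warn];[crit];[min];[max]
# The label may be single-quoted (needed when it contains spaces).
ITEM_RE = re.compile(r"^('(?P<qlabel>[^']+)'|(?P<label>[^=\s]+))=(?P<rest>\S+)$")
VALUE_RE = re.compile(r"^(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[a-zA-Z%]*)$")


def parse_perf_item(item):
    """Split one perfdata item into label, value, UOM and the four optional fields."""
    m = ITEM_RE.match(item)
    if not m:
        raise ValueError(f"not a perfdata item: {item!r}")
    label = m.group("qlabel") or m.group("label")
    fields = m.group("rest").split(";")
    fields += [""] * (5 - len(fields))  # pad missing trailing fields
    v = VALUE_RE.match(fields[0])
    if not v:
        raise ValueError(f"bad value: {fields[0]!r}")
    return {
        "label": label,
        "value": float(v.group("value")),
        "uom": v.group("uom"),
        # empty fields mean "not set"; thresholds may also be ranges such as 10:20
        "warn": fields[1] or None,
        "crit": fields[2] or None,
        "min": fields[3] or None,
        "max": fields[4] or None,
    }


print(parse_perf_item("time=0.15s;1;5;0;"))
```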

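1.2 Common Units of Measure (UOM)

Unit     Meaning               Example
(none)   plain number          users=5;10;20
%        percentage            cpu=75%;80;95
s        seconds               time=0.15s;1;5
ms       milliseconds          rta=10.50ms;100;200
B        bytes                 size=1024B;;;0;
KB       kilobytes             mem=512KB;;;0;
MB       megabytes             mem=1024MB;;;0;
GB       gigabytes             disk=50GB;;;0;
TB       terabytes             disk=2TB;;;0;
c        continuous counter    errors=5c;;;0;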
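Because the same quantity can arrive in different units (ms vs s, KB vs MB), values are usually normalized before they are stored or compared. A minimal sketch, assuming 1024-based multipliers for the byte units (an assumption; some tools use 1000); `to_base_unit` is a hypothetical helper.

```python
# Hypothetical helper: convert a perfdata value to a base unit.
# Assumption: byte prefixes are 1024-based.
FACTORS = {
    "": ("", 1),
    "%": ("%", 1),
    "s": ("s", 1),
    "ms": ("s", 1e-3),
    "us": ("s", 1e-6),
    "B": ("B", 1),
    "KB": ("B", 1024),
    "MB": ("B", 1024 ** 2),
    "GB": ("B", 1024 ** 3),
    "TB": ("B", 1024 ** 4),
    # continuous counter: left as-is (graphing tools typically turn it into a rate)
    "c": ("c", 1),
}


def to_base_unit(value, uom):
    """Return (value_in_base_unit, base_unit) for a known UOM."""
    try:
        base, factor = FACTORS[uom]
    except KeyError:
        raise ValueError(f"unknown UOM: {uom!r}") from None
    return value * factor, base


print(to_base_unit(512, "KB"))  # (524288, 'B')
print(to_base_unit(10.5, "ms"))
```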

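1.3 Multiple Performance Data Items

# Multiple items are separated by spaces
check_disk output:
DISK OK - free space: / 1024 MB (80%), /var 512 MB (64%) | /=256MB;480;880;0;1280 /var=288MB;400;600;0;800

# Breakdown (check_disk's perfdata reports used space, so the thresholds are used-space values):
# / partition:    256MB used, warning=480MB, critical=880MB, min=0, max=1280MB
# /var partition: 288MB used, warning=400MB, critical=600MB, min=0, max=800MB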
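Splitting on plain spaces breaks as soon as a label is quoted and contains a space (for example 'disk C'). Below is a minimal Python sketch of a quote-aware tokenizer; `split_perfdata` is a hypothetical name, and the sketch only looks at the perfdata after the first `|` on the first line of plugin output (long multi-line output can carry further perfdata, which it ignores).

```python
import re

# One item: optional single-quoted label (may contain spaces), '=', then no spaces.
ITEM_RE = re.compile(r"(?:'[^']+'|[^'\s=]+)=\S+")


def split_perfdata(plugin_output):
    """Return the perfdata items from the first line of plugin output."""
    first_line = plugin_output.splitlines()[0] if plugin_output else ""
    _, sep, perf = first_line.partition("|")
    if not sep:
        return []  # no perfdata section
    return ITEM_RE.findall(perf)


out = ("DISK OK - free space: / 1024 MB (80%), /var 512 MB (64%) | "
       "/=256MB;480;880;0;1280 /var=288MB;400;600;0;800")
print(split_perfdata(out))
```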

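1.4 Performance Data Configuration

# Performance data settings in nagios.cfg
# Processing must be switched on for any of the options below:
process_performance_data=1

# Option 1: run a command after every check (recommended)
service_perfdata_command=process-service-perfdata
host_perfdata_command=process-host-perfdata

# Option 2: write to a file that a command processes periodically
service_perfdata_file=/var/log/nagios/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file

# Option 3: OCSP/OCHP (Obsessive Compulsive Service/Host Processor)
# These hooks run after every check and are meant mainly for forwarding results in
# distributed setups; they fire only when obsess_over_services / obsess_over_hosts are enabled
obsess_over_services=1
obsess_over_hosts=1
ocsp_command=process-service-perfdata
ochp_command=process-host-perfdata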
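With option 2, Nagios replaces each $MACRO$ in service_perfdata_file_template with its value and appends one tab-separated line per check result. The Python sketch below only imitates that expansion to show what a line in the perfdata file looks like; the macro values are made-up sample data and `expand_template` is a hypothetical helper, not Nagios code.

```python
import re

# The template from nagios.cfg; "\t" is a real tab in this Python string
# (Nagios turns the literal \t sequences in nagios.cfg into tabs).
TEMPLATE = (
    "DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$"
    "\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$"
    "\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$"
    "\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$"
    "\tSERVICESTATETYPE::$SERVICESTATETYPE$"
)


def expand_template(template, macros):
    """Replace every $NAME$ with macros[NAME]; unknown macros become empty strings."""
    return re.sub(r"\$([A-Z]+)\$", lambda m: str(macros.get(m.group(1), "")), template)


sample = {  # hypothetical values for one check result
    "TIMET": 1700000000,
    "HOSTNAME": "web-server-01",
    "SERVICEDESC": "HTTP",
    "SERVICEPERFDATA": "time=0.15s;1;5;0; size=15234B;;;0;",
    "SERVICECHECKCOMMAND": "check_http",
    "HOSTSTATE": "UP",
    "HOSTSTATETYPE": "HARD",
    "SERVICESTATE": "OK",
    "SERVICESTATETYPE": "HARD",
}
line = expand_template(TEMPLATE, sample)
for field in line.split("\t"):
    print(field)
```

Because every field carries its own KEY:: prefix, a consumer can split on tabs and then on the first `::` instead of relying on field positions.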

II. Installing PNP4Nagios

2.1 Install Dependencies

# CentOS/RHEL
yum install -y rrdtool rrdtool-perl php-gd php-xml perl-Time-HiRes perl-rrdtool

# Ubuntu/Debian
apt-get install -y rrdtool librrd-dev php-gd php-xml php-mbstring

2.2 Build and Install PNP4Nagios from Source

cd /tmp
wget https://sourceforge.net/projects/pnp4nagios/files/PNP-0.6.26/pnp4nagios-0.6.26.tar.gz
tar xzf pnp4nagios-0.6.26.tar.gz
cd pnp4nagios-0.6.26

./configure \
    --with-nagios-user=nagios \
    --with-nagios-group=nagios \
    --prefix=/usr/local/pnp4nagios

make all
sudo make install
sudo make install-init
sudo make install-config
sudo make install-webconf

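2.3 Configure PNP4Nagios

# Configuration file
/usr/local/pnp4nagios/etc/pnp4nagios.conf

# Key settings
# /usr/local/pnp4nagios/etc/process_perfdata.cfg
CFG_DIR = /usr/local/pnp4nagios/etc/
RRDPATH = /usr/local/pnp4nagios/var/perfdata
# SINGLE: one RRD file per service; MULTIPLE: one RRD file per data source
RRD_STORAGE_TYPE = SINGLE
# 0 = no logging; raise it temporarily when debugging
LOG_LEVEL = 0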

2.4 Integrate with Nagios

# Add to nagios.cfg (synchronous mode: process_perfdata.pl runs after every check
# and reads the check data from environment macros):
process_performance_data=1
enable_environment_macros=1
service_perfdata_command=process-service-perfdata
host_perfdata_command=process-host-perfdata

# Define the processing commands in commands.cfg:
define command {
    command_name    process-service-perfdata
    command_line    /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl
}

define command {
    command_name    process-host-perfdata
    command_line    /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}

# Add graph links to the host and service templates:
define host {
    name                    generic-host-pnp
    action_url              /pnp4nagios/graph?host=$HOSTNAME$
    register                0
}

define service {
    name                    generic-service-pnp
    action_url              /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
    register                0
}

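2.5 Start the Services

# Remove the installer page (PNP4Nagios keeps showing it until it is deleted)
rm /usr/local/pnp4nagios/share/install.php

# Verify the configuration
/usr/local/pnp4nagios/bin/verify_config.pl /usr/local/pnp4nagios/etc/

# Start NPCD (the PNP4Nagios performance data daemon; required only in Bulk Mode with NPCD)
systemctl start npcd
systemctl enable npcd

# Restart Nagios
systemctl restart nagios

# Open the graphs in a browser
# http://your-server/pnp4nagios/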

III. Grafana Integration

3.1 Architecture

┌─────────────┐    perfdata    ┌─────────────┐  Flux/PromQL ┌─────────────┐
│ Nagios Core │───────────────→│  InfluxDB   │←─────────────│   Grafana   │
│             │                │  /Prometheus│              │             │
└─────────────┘                └─────────────┘              └─────────────┘
      │                              │
      │                              │
      ▼                              ▼
┌─────────────┐                ┌─────────────┐
│ PNP4Nagios  │                │ Graphite    │
│ (RRD files) │                │ (optional)  │
└─────────────┘                └─────────────┘

3.2 Option 1: InfluxDB + Grafana

# Install InfluxDB 2.x
wget https://dl.influxdata.com/influxdb/releases/influxdb2-2.7.1.x86_64.rpm
sudo yum install -y influxdb2-2.7.1.x86_64.rpm
systemctl start influxdb
systemctl enable influxdb

# Install Grafana
yum install -y https://dl.grafana.com/oss/release/grafana-10.0.0-1.x86_64.rpm
systemctl start grafana-server
systemctl enable grafana-server
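
#!/usr/bin/env python3
# nagios_to_influxdb.py - write Nagios performance data to InfluxDB 2.x
# Reads lines in the service_perfdata_file_template format from 1.4
# (tab-separated KEY::value fields) on stdin.

import re
import sys

from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

INFLUX_URL = "http://localhost:8086"
INFLUX_TOKEN = "your-token"
INFLUX_ORG = "your-org"
INFLUX_BUCKET = "nagios"

# One perfdata item: label (optionally single-quoted) = value;warn;crit;min;max
PERF_ITEM = re.compile(r"('[^']+'|[^\s=']+)=(\S+)")
# Numeric value followed by an optional unit (s, ms, %, B, KB, c, ...)
PERF_VALUE = re.compile(r"(-?\d+(?:\.\d+)?)([a-zA-Z%]*)$")


def parse_perfdata(line):
    """Parse one template line; return None for non-service lines or empty perfdata."""
    fields = {}
    for part in line.rstrip('\n').split('\t'):
        key, sep, value = part.partition('::')
        if sep:
            fields[key] = value

    if fields.get('DATATYPE') != 'SERVICEPERFDATA' or not fields.get('SERVICEPERFDATA'):
        return None

    return {
        'timestamp': int(fields['TIMET']),
        'hostname': fields['HOSTNAME'],
        'service': fields['SERVICEDESC'],
        'state': fields.get('SERVICESTATE', ''),
        'perfdata': fields['SERVICEPERFDATA'],
    }


def parse_metrics(perfdata_str):
    """Return {label: {'value': float, 'unit': str}}; non-numeric values such as U are skipped."""
    metrics = {}
    for label, rest in PERF_ITEM.findall(perfdata_str):
        match = PERF_VALUE.match(rest.split(';')[0])
        if match:
            metrics[label.strip("'")] = {
                'value': float(match.group(1)),
                'unit': match.group(2),
            }
    return metrics


def build_points(data):
    """Build one InfluxDB point per metric of a parsed line."""
    points = []
    for label, metric in parse_metrics(data['perfdata']).items():
        point = (
            Point("nagios_perfdata")
            .tag("host", data['hostname'])
            .tag("service", data['service'])
            .tag("state", data['state'])
            .tag("metric", label)
            .field("value", metric['value'])
            # TIMET is in seconds, while the client defaults to nanoseconds
            .time(data['timestamp'], WritePrecision.S)
        )
        points.append(point)
    return points


def main():
    with InfluxDBClient(url=INFLUX_URL, token=INFLUX_TOKEN, org=INFLUX_ORG) as client:
        # Create the write API once instead of once per line
        write_api = client.write_api(write_options=SYNCHRONOUS)
        for line in sys.stdin:
            data = parse_perfdata(line)
            if not data:
                continue
            points = build_points(data)
            if points:
                write_api.write(bucket=INFLUX_BUCKET, record=points)


if __name__ == '__main__':
    main()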
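
# Nagios command definition: use it as service_perfdata_file_processing_command
# (option 2 in 1.4). Moving the file first ensures each line is sent only once.
define command {
    command_name    process-service-perfdata-influx
    command_line    /bin/mv /var/log/nagios/service-perfdata /var/log/nagios/service-perfdata.influx && /usr/local/nagios/libexec/nagios_to_influxdb.py < /var/log/nagios/service-perfdata.influx
}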

3.3 Option 2: Prometheus + Grafana

# Install a Prometheus exporter for Nagios
pip install nagios-prometheus-exporter

# Start the exporter
nagios_exporter --nagios-status=/var/log/nagios/status.dat --port=9219

# Prometheus scrape target
# prometheus.yml
scrape_configs:
  - job_name: 'nagios'
    static_configs:
      - targets: ['localhost:9219']

# Grafana data source
# Add Data Source → Prometheus → http://prometheus:9090
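Whatever exporter you choose, the idea is the same: read Nagios state (here status.dat) and publish it in the Prometheus text exposition format. The standard-library sketch below covers only that conversion step; the metric names nagios_host_state / nagios_service_state are hypothetical, and the parser assumes the usual status.dat layout of `hoststatus { ... }` / `servicestatus { ... }` blocks with one key=value per line.

```python
import re

# Matches "hoststatus {" / "servicestatus {" blocks up to their closing "}" line.
BLOCK_RE = re.compile(r"^(hoststatus|servicestatus) \{\n(.*?)^\s*\}", re.M | re.S)


def parse_status(text):
    """Yield (block_type, {key: value}) for each host/service status block."""
    for m in BLOCK_RE.finditer(text):
        attrs = {}
        for line in m.group(2).splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                attrs[key] = value
        yield m.group(1), attrs


def escape(value):
    """Escape a Prometheus label value (backslash, double quote, newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_prometheus(text):
    """Render current_state of every host and service in Prometheus text format."""
    hosts, services = [], []
    for kind, attrs in parse_status(text):
        state = attrs.get("current_state", "")
        if not state.isdigit():
            continue  # skip incomplete blocks
        host = escape(attrs.get("host_name", ""))
        if kind == "hoststatus":
            hosts.append(f'nagios_host_state{{host="{host}"}} {state}')
        else:
            svc = escape(attrs.get("service_description", ""))
            services.append(
                f'nagios_service_state{{host="{host}",service="{svc}"}} {state}')
    lines = ["# TYPE nagios_host_state gauge", *hosts,
             "# TYPE nagios_service_state gauge", *services]
    return "\n".join(lines) + "\n"


SAMPLE = """hoststatus {
\thost_name=web-server-01
\tcurrent_state=0
\t}

servicestatus {
\thost_name=web-server-01
\tservice_description=HTTP
\tcurrent_state=2
\t}
"""
print(to_prometheus(SAMPLE), end="")
```

Serving the returned text from an HTTP handler on the port Prometheus scrapes (9219 in the configuration above) is left out of the sketch.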

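3.4 Grafana Dashboard

An example dashboard, wrapped in the {"dashboard": ...} envelope used by the Grafana HTTP API. The metric and label names (nagios_host_state, nagios_service_perfdata, host, service) depend on the exporter you run, so adjust the queries to match it.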

{
  "dashboard": {
    "title": "Nagios Monitoring",
    "panels": [
      {
        "title": "Host Status Overview",
        "type": "stat",
        "targets": [
          {
            "expr": "count(nagios_host_state{state=\"UP\"})",
            "legendFormat": "UP"
          },
          {
            "expr": "count(nagios_host_state{state=\"DOWN\"})",
            "legendFormat": "DOWN"
          }
        ]
      },
      {
        "title": "Service Response Time",
        "type": "timeseries",
        "targets": [
          {
            "expr": "nagios_service_perfdata{metric=\"time\"}",
            "legendFormat": "{{host}}/{{service}}"
          }
        ]
      }
    ]
  }
}
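Because the JSON above already uses the `{"dashboard": ...}` envelope, it can be imported with one POST to Grafana's /api/dashboards/db endpoint. A minimal standard-library sketch, assuming a local Grafana and a service-account token (both placeholders); only the payload builder runs offline, and the actual request is left commented out.

```python
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"     # placeholder
API_TOKEN = "your-service-account-token"  # placeholder


def build_payload(dashboard, overwrite=True):
    """Wrap a dashboard model for POST /api/dashboards/db."""
    model = dict(dashboard)
    model.setdefault("id", None)  # a null id creates a new dashboard
    return {"dashboard": model, "overwrite": overwrite}


def push_dashboard(payload):
    """POST the payload to Grafana and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{GRAFANA_URL}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


dashboard = {"title": "Nagios Monitoring", "panels": []}  # or the "dashboard" object above
payload = build_payload(dashboard)
print(json.dumps(payload))
# push_dashboard(payload)  # uncomment once GRAFANA_URL and API_TOKEN are real
```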

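IV. Data Export

4.1 Exporting RRD Data

# Dump an RRD file as XML (with rrdtool restore, this is also how RRDs move between CPU architectures)
rrdtool dump /usr/local/pnp4nagios/var/perfdata/web-server-01/check_http.rrd > /tmp/check_http.xml

# Fetch values as plain text (a "timestamp: value ..." row per step, not CSV)
rrdtool fetch /usr/local/pnp4nagios/var/perfdata/web-server-01/check_http.rrd AVERAGE -s -1d > /tmp/check_http.txt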

# Batch export script
#!/bin/bash
# Export the last 7 days of every PNP4Nagios RRD file as text
PERFDIR="/usr/local/pnp4nagios/var/perfdata"
EXPORT_DIR="/tmp/rrd_export"

# Skip hosts without .rrd files instead of looping over a literal '*.rrd'
shopt -s nullglob
mkdir -p "$EXPORT_DIR"

for HOST_DIR in "$PERFDIR"/*/; do
    HOST=$(basename "$HOST_DIR")
    mkdir -p "$EXPORT_DIR/$HOST"
    for RRD_FILE in "$HOST_DIR"*.rrd; do
        SVC=$(basename "$RRD_FILE" .rrd)
        rrdtool fetch "$RRD_FILE" AVERAGE -s -7d > "$EXPORT_DIR/$HOST/$SVC.txt"
    done
done

echo "Exported to $EXPORT_DIR"
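`rrdtool fetch` prints a header line with the data source names, a blank line, and then one `timestamp: value value ...` row per step, with `nan` (or `-nan`) for unknown values. Turning that into real CSV takes a few lines of Python; this is a sketch assuming that layout, with a hypothetical function name `fetch_to_csv`.

```python
import csv
import io


def fetch_to_csv(fetch_output):
    """Convert `rrdtool fetch` text output to CSV; unknown (nan) values become empty."""
    lines = [line for line in fetch_output.splitlines() if line.strip()]
    if not lines:
        return ""
    names = lines[0].split()  # data source names from the header line, e.g. ['1']
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp", *names])
    for line in lines[1:]:
        ts, sep, rest = line.partition(":")
        if not sep:
            continue  # not a data row
        values = ["" if v.lower().lstrip("+-") == "nan" else float(v)
                  for v in rest.split()]
        writer.writerow([int(ts), *values])
    return buf.getvalue()


sample = ("                             1\n"
          "\n"
          "1700000000: 1.5000000000e-01\n"
          "1700000300: -nan\n")
print(fetch_to_csv(sample), end="")
```

To use it in a pipeline after `rrdtool fetch`, wrap it in a small script that reads sys.stdin and writes the result to sys.stdout.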

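4.2 Querying Historical Data

# Last 24 hours
rrdtool fetch perfdata.rrd AVERAGE -s now-24h -e now

# A specific time range (epoch seconds, computed with GNU date)
rrdtool fetch perfdata.rrd AVERAGE \
    -s "$(date -d '2024-01-01 00:00:00' +%s)" \
    -e "$(date -d '2024-01-02 00:00:00' +%s)"

# Render a graph image (DS "1" is the first data source in a PNP4Nagios RRD;
# the area is drawn first so the line stays on top)
rrdtool graph /tmp/response_time.png \
    --title "HTTP Response Time" \
    --vertical-label "Seconds" \
    -w 800 -h 400 \
    DEF:value=perfdata.rrd:1:AVERAGE \
    AREA:value#FF000033 \
    LINE1:value#FF0000:"Response Time"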
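For capacity planning, exported samples are often reduced to a trend line. A minimal sketch: an ordinary least-squares fit over (timestamp, value) pairs that projects when a threshold (for example a disk's maximum) would be reached. The linear model and the helper names `linear_fit` and `time_to_threshold` are assumptions for illustration; real usage rarely grows perfectly linearly.

```python
def linear_fit(samples):
    """Least-squares fit value = a + b * t over (t, value) pairs; returns (a, b)."""
    pts = [(t, v) for t, v in samples if v == v]  # drop NaN (NaN != NaN)
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("timestamps must differ")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in pts)
    b = sxy / sxx
    return mean_v - b * mean_t, b


def time_to_threshold(samples, threshold):
    """Timestamp at which the fitted line reaches `threshold`, or None if not growing."""
    a, b = linear_fit(samples)
    if b <= 0:
        return None
    return (threshold - a) / b


DAY = 86400
usage = [(0, 100.0), (DAY, 110.0), (2 * DAY, 120.0), (3 * DAY, float("nan"))]
eta = time_to_threshold(usage, 150.0)
print(eta / DAY)  # days from t=0 until the fitted line reaches 150
```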

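V. Notes and Caveats

Item                  Description
Perfdata format       Follow 'label'=value[UOM];warn;crit;min;max exactly
Storage               Each RRD file has a fixed size, but the number of files grows with every new host and service; delete the RRDs of retired hosts and services
Processing latency    Tune service_perfdata_file_processing_interval: shorter intervals give fresher graphs but more frequent processing runs
Grafana permissions   Set appropriate user permissions and data source access control
Backups               Back up RRD files and Grafana dashboard definitions regularly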

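VI. Chapter Summary

  1. Performance data follows a standard format that carries both the measured values and their thresholds.
  2. PNP4Nagios is one of the most widely used graphing tools for Nagios performance data.
  3. Grafana provides richer visualization and alerting.
  4. Exported data supports historical analysis and capacity planning.
  5. Plan the storage and lifecycle of performance data deliberately.

Next: Chapter 12: Docker Deployment - running Nagios in Docker containers.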