Chapter 11: Troubleshooting

11.1 Troubleshooting Workflow Overview

Network problem
    │
    ├─ 1. Confirm the NM service status
    │      systemctl status NetworkManager
    │
    ├─ 2. Check device status
    │      nmcli device status
    │
    ├─ 3. Check connection status
    │      nmcli connection show --active
    │
    ├─ 4. Check the IP configuration
    │      ip addr show
    │      ip route
    │
    ├─ 5. Check DNS
    │      resolvectl status
    │      nslookup example.com
    │
    ├─ 6. Test connectivity
    │      ping <gateway>
    │      ping <DNS server>
    │      ping 8.8.8.8
    │      ping example.com
    │
    └─ 7. Read the logs
           journalctl -u NetworkManager -f
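
The order of the workflow matters: each layer depends on the one before it, so the first failing check points at the layer to investigate. The following is a minimal sketch of that idea, not part of NetworkManager. `run_step` and `check_network` are hypothetical helper names, and the gateway address is assumed to be supplied by the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with NetworkManager).

# Run one check quietly and print OK/FAIL with its label.
# Returns non-zero when the check fails.
run_step() {
    local label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        printf 'OK    %s\n' "$label"
    else
        printf 'FAIL  %s\n' "$label"
        return 1
    fi
}

# Walk the layers in workflow order and stop at the first failure.
# $1: gateway address (assumption: supplied by the caller)
check_network() {
    local gateway="$1"
    run_step "NetworkManager service is active" \
        systemctl is-active --quiet NetworkManager || return 1
    run_step "gateway $gateway answers ping" \
        ping -c 1 -W 2 "$gateway" || return 1
    run_step "external address 8.8.8.8 answers ping" \
        ping -c 1 -W 2 8.8.8.8 || return 1
    run_step "example.com resolves" \
        getent hosts example.com || return 1
}
```

A call such as `check_network 192.168.1.1` prints one line per layer; the log in step 7 is then read with the failing layer in mind.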

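11.2 Connection Failures

Problem: no IP address obtained (DHCP failure)

# Symptom
nmcli device status
# eth0  ethernet  disconnected  --

# Step 1: check the physical link
ip link show eth0
# Confirm "state UP" and that the flags include LOWER_UP (carrier detected)

# Step 2: check whether NM manages the device
nmcli -f GENERAL.NM-MANAGED device show eth0
# GENERAL.NM-MANAGED: yes

# If the value is "no" (device status shows "unmanaged")
sudo nmcli device set eth0 managed yes

# Step 3: check the DHCP log
journalctl -u NetworkManager | grep -i "dhcp"

# Step 4: trigger DHCP manually (diagnostic only; requires dhclient to be installed)
sudo dhclient -v eth0

# Step 5: check that a DHCP server answers on this segment
# (nmap's broadcast-dhcp-discover script sends a DHCPDISCOVER and lists the replies)
sudo nmap --script broadcast-dhcp-discover -e eth0

# Step 6: try a static IP
nmcli connection add type ethernet con-name "temp-static" \
    ifname eth0 ipv4.method manual ipv4.addresses "192.168.1.100/24" \
    ipv4.gateway "192.168.1.1"
nmcli connection up "temp-static"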
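
When several interfaces are present, it helps to reduce the device table to the ones that need attention. The sketch below assumes the terse `DEVICE:TYPE:STATE` format produced by `nmcli -t -f DEVICE,TYPE,STATE device status`; `list_down_devices` is a hypothetical helper name.

```shell
# Hypothetical helper: reads "DEVICE:TYPE:STATE" lines on stdin and prints
# every non-loopback device whose state does not start with "connected".
list_down_devices() {
    awk -F: '$2 != "loopback" && $3 !~ /^connected/ { print $1 " (" $3 ")" }'
}
```

Typical use would be `nmcli -t -f DEVICE,TYPE,STATE device status | list_down_devices`. Each device it prints is a candidate for the steps above.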

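Problem: connection drops frequently

# Review the disconnect history
journalctl -u NetworkManager | grep -i "disconnect\|deactivat"

# Check device state changes
journalctl -u NetworkManager | grep -i "device.*state"

# Watch link state changes live (Ctrl+C to stop)
ip monitor link

# Check Ethernet auto-negotiation, speed, and duplex
ethtool eth0 | grep -i "speed\|duplex\|auto-negotiation\|link"

# Identify the driver, then check whether power saving interferes
ethtool -i eth0 | grep driver
# Disable Energy-Efficient Ethernet (EEE), if the driver supports it
sudo ethtool --set-eee eth0 eee off

# Inspect the kernel ring buffer
sudo dmesg | grep -i "eth0\|link\|carrier"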

Problem: connection activation times out

# Inspect the connection profile
nmcli connection show "problem-connection"

# Common causes:
# 1. The configured interface does not exist
nmcli -t -f DEVICE device status

# 2. The MAC address does not match
nmcli connection show "problem-connection" | grep mac-address
ip link show eth0 | grep ether

# 3. The MTU does not match
nmcli connection show "problem-connection" | grep mtu

# Fix: clear the MAC address binding
nmcli connection modify "problem-connection" \
    802-3-ethernet.mac-address ""

# Fix: rebind the profile to the interface
nmcli connection modify "problem-connection" \
    connection.interface-name eth0
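
The MAC comparison in cause 2 is easy to get wrong by eye because the two commands may print the address in different letter case. A minimal sketch of the comparison follows; `mac_binding_ok` is a hypothetical helper name, and an empty binding is treated as "matches any interface".

```shell
# Hypothetical helper: succeeds when the profile's MAC binding is empty
# (no binding) or equals the interface's MAC, ignoring letter case.
# $1: MAC bound in the profile   $2: MAC of the interface
mac_binding_ok() {
    local bound actual
    bound=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    actual=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ -z "$bound" ] || [ "$bound" = "$actual" ]
}
```

One way to feed it, assuming the interface is `eth0`: `mac_binding_ok "$(nmcli -e no -g 802-3-ethernet.mac-address connection show "problem-connection")" "$(cat /sys/class/net/eth0/address)"`. The `-e no` option stops nmcli from escaping the colons in the value.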

11.3 IP Configuration Problems

Problem: static IP does not take effect

# Inspect the connection profile
nmcli connection show "my-connection" | grep -i "ipv4"

# Confirm that ipv4.method is manual
nmcli connection show "my-connection" | grep ipv4.method
# ipv4.method:  manual

# Confirm that the IP address is correct
nmcli connection show "my-connection" | grep ipv4.addresses

# Confirm that the profile is bound to the right interface
nmcli connection show "my-connection" | grep connection.interface-name

# Re-activate the connection
nmcli connection down "my-connection"
nmcli connection up "my-connection"

# Or apply the modified profile to the device without taking it down
nmcli device reapply eth0

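Problem: conflicting multiple IP addresses

# List every IP address on the interface
ip addr show eth0

# Check whether several connection profiles are bound to the same interface
nmcli -t -f NAME,DEVICE connection show | grep eth0

# Delete the conflicting profile
nmcli connection delete "conflicting-connection"

# Inspect the routing tables
ip route show
ip -6 route show

# Check for more than one default route
ip route | grep default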
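
When more than one default route exists, the kernel prefers the one with the lowest metric, so it is useful to see device and metric side by side. The sketch below parses `ip route show` output; `list_default_routes` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ip route show` output on stdin and prints one
# line per default route as "<device> metric <n>" (metric 0 when absent).
list_default_routes() {
    awk '$1 == "default" {
        dev = "?"; metric = 0
        for (i = 2; i < NF; i++) {
            if ($i == "dev")    dev = $(i + 1)
            if ($i == "metric") metric = $(i + 1)
        }
        print dev " metric " metric
    }'
}
```

Run it as `ip route show | list_default_routes`. More than one output line means two interfaces compete for outbound traffic.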

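Problem: IPv6 issues

# Inspect the IPv6 configuration
nmcli connection show "my-connection" | grep ipv6

# Disable IPv6 temporarily (for testing)
sudo sysctl -w net.ipv6.conf.eth0.disable_ipv6=1

# Disable IPv6 persistently for this connection
nmcli connection modify "my-connection" ipv6.method disabled

# IPv6 neighbor discovery problems
ip -6 neigh show
ping6 -c 3 fe80::1%eth0

# IPv6 SLAAC does not work
# Capture ICMPv6 traffic and look for router advertisements
sudo tcpdump -i eth0 icmp6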

11.4 DNS Problems

Problem: domain names do not resolve

# Step 1: inspect resolv.conf
cat /etc/resolv.conf

# Step 2: check the DNS servers managed by NM
nmcli device show | grep DNS

# Step 3: query a DNS server directly
nslookup example.com 8.8.8.8
dig @8.8.8.8 example.com

# Step 4: check the DNS backend in use
resolvectl status         # systemd-resolved
systemctl status dnsmasq  # dnsmasq

# Step 5: clear the cache
resolvectl flush-caches
sudo systemctl restart dnsmasq

# Step 6: check firewall rules for port 53
sudo iptables -L -n | grep -w 53
sudo iptables -L -n -t nat | grep -w 53

# Common cause: resolv.conf is locked
ls -la /etc/resolv.conf
lsattr /etc/resolv.conf   # an "i" flag marks the file as immutable
# If the file is immutable
sudo chattr -i /etc/resolv.conf
sudo rm /etc/resolv.conf
sudo systemctl restart NetworkManager
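
What step 1 shows depends on the DNS backend: with a local resolver, resolv.conf lists only a loopback address and the real upstream servers must be read from the backend instead. The sketch below classifies a resolv.conf accordingly; `resolv_mode` is a hypothetical helper name.

```shell
# Hypothetical helper: reads resolv.conf content on stdin and prints
#   "stub"   when every nameserver is a loopback address (a local resolver
#            such as systemd-resolved or dnsmasq sits in front),
#   "direct" when at least one non-loopback nameserver is listed,
#   "none"   when there is no nameserver line at all.
resolv_mode() {
    awk '$1 == "nameserver" {
        seen = 1
        if ($2 !~ /^127\./ && $2 != "::1") direct = 1
    }
    END {
        if (!seen)       print "none"
        else if (direct) print "direct"
        else             print "stub"
    }'
}
```

Run it as `resolv_mode < /etc/resolv.conf`. A result of `stub` means the servers to test in step 3 are the ones reported in steps 2 and 4, not the address in the file.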

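Problem: DNS resolution is slow

# Measure the DNS response time
time nslookup example.com

# List the DNS servers in use
nmcli device show | grep DNS

# Check the search domains of a profile (they can cause extra queries)
nmcli connection show "my-connection" | grep dns-search

# Show systemd-resolved statistics
resolvectl statistics

# Optimization: shorten the DNS timeout
# Add this line to /etc/resolv.conf (only if the file is managed by hand):
options timeout:1 attempts:1

# Optimization: use a local DNS cache
# (package name as on Debian/Ubuntu; on some distributions it ships with systemd)
sudo apt install systemd-resolved
sudo systemctl enable --now systemd-resolved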

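Problem: DNS leaks

# Check the DNS servers in use
resolvectl status | grep "DNS Servers"

# After connecting to the VPN, verify:
# 1. Visit https://dnsleaktest.com
# 2. Or use a command-line tool
nslookup whoami.akamai.net

# Fix a VPN DNS leak
# (a negative dns-priority makes the VPN's DNS servers take precedence over all others)
nmcli connection modify "VPN" \
    ipv4.dns-priority -50 \
    ipv4.ignore-auto-dns no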

11.5 WiFi Problems

Problem: no WiFi networks are visible

# Step 1: check the WiFi hardware state
rfkill list
# If soft blocked
sudo rfkill unblock wifi
# If hard blocked → use the physical switch or the Fn+Fx key

# Step 2: check the NM WiFi radio state
nmcli radio wifi
# If disabled
sudo nmcli radio wifi on

# Step 3: check the device state
nmcli device status | grep wifi
# If unavailable → driver problem
# If unmanaged → NM does not manage the device
sudo nmcli device set wlan0 managed yes

# Step 4: check the driver
lsmod | grep -i wifi
lsmod | grep -i iwl   # Intel
lsmod | grep -i ath   # Atheros
lsmod | grep -i rtl   # Realtek

# Step 5: scan
sudo nmcli device wifi rescan
sleep 2 && nmcli device wifi list

# Step 6: check the firmware
sudo dmesg | grep -i "firmware\|wlan\|wifi"
journalctl -u NetworkManager | grep -i "firmware"
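
Step 1 branches on whether the block is soft or hard, and `rfkill list` mixes WiFi entries with Bluetooth and others. The sketch below extracts only the Wireless LAN state from that output; `wifi_block_state` is a hypothetical helper name, and the parsing assumes the usual `Soft blocked:` / `Hard blocked:` lines under each device header.

```shell
# Hypothetical helper: reads `rfkill list` output on stdin and prints
# "hard", "soft", or "none" for Wireless LAN devices only.
wifi_block_state() {
    awk '
        /^[0-9]+:/ { wlan = ($0 ~ /Wireless LAN/) }   # new device header
        wlan && /Soft blocked: yes/ { soft = 1 }
        wlan && /Hard blocked: yes/ { hard = 1 }
        END {
            if (hard)      print "hard"
            else if (soft) print "soft"
            else           print "none"
        }'
}
```

Run it as `rfkill list | wifi_block_state`. Only `soft` can be cleared with `rfkill unblock wifi`; `hard` needs the physical switch.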

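Problem: WiFi connection fails

# Check the connection log
journalctl -u NetworkManager | grep -i "wifi\|wlan"

# Check wpa_supplicant
journalctl | grep -i "wpa_supplicant"

# Check the stored password (-s reveals secrets, which are hidden by default)
nmcli -s -g 802-11-wireless-security.psk connection show "WiFi-Name"

# Check the security type
nmcli connection show "WiFi-Name" | grep key-mgmt

# Delete and recreate the connection
nmcli connection delete "WiFi-Name"
nmcli device wifi connect "SSID" password "password"

# Check for MAC filtering on the access point
# Show the MAC address
ip link show wlan0 | grep ether
# Try disabling MAC randomization (use the hardware's permanent MAC)
nmcli connection modify "WiFi-Name" \
    wifi.cloned-mac-address permanent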

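Problem: weak WiFi signal or frequent disconnects

# Show the current signal strength
iwconfig wlan0 | grep "Signal level"
# Or, where the older wireless-tools package is not installed:
iw dev wlan0 link | grep signal

# Scan and sort networks by signal strength
nmcli -t -f SIGNAL,SSID device wifi list | sort -t: -k1,1 -rn

# Check roaming events
journalctl -u NetworkManager | grep -i "roam\|bssid"

# Pin the connection to a specific AP (BSSID)
nmcli connection modify "WiFi-Name" \
    wifi.bssid "AA:BB:CC:DD:EE:FF"

# Tune WiFi driver parameters (if supported)
# Show the current driver parameters
iwconfig wlan0
ethtool -i wlan0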

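11.6 VPN Problems

Problem: VPN connection fails

# Read the VPN log
journalctl -u NetworkManager | grep -i "vpn"

# Detailed OpenVPN log
journalctl -u NetworkManager | grep -i "openvpn"

# WireGuard debugging
sudo wg show

# IPsec debugging
journalctl -u strongswan
# or
journalctl -u ipsec

# Inspect the VPN configuration
nmcli connection show "VPN-Name" | grep vpn

# Check the certificates
openssl x509 -in client.crt -noout -dates  # certificate validity period
openssl verify -CAfile ca.crt client.crt     # certificate chain verification

# Check port reachability
nc -zv vpn.example.com 1194   # OpenVPN over TCP (add -u if the server uses UDP)
nc -zuv vpn.example.com 51820 # WireGuard (UDP; a "succeeded" result is not conclusive)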

Problem: VPN connects but the internal network is unreachable

# Check the VPN interface
ip addr show tun0   # OpenVPN
ip addr show wg0    # WireGuard

# Check the routes
ip route | grep tun0
ip route | grep wg0

# Check the split-tunnel configuration
nmcli connection show "VPN-Name" | grep "ipv4.never-default"

# Add a route manually (send the internal range through the tunnel interface)
sudo ip route add 10.0.0.0/8 dev tun0

# Check whether the VPN server pushed any routes
journalctl -u NetworkManager | grep -i "route\|push"
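
Whether an internal subnet is reachable depends on which destinations are actually routed through the tunnel. The sketch below lists them from `ip route show` output; `routes_via_dev` is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ip route show` output on stdin and prints the
# destination of every route that leaves through the given device.
# $1: interface name, e.g. tun0 or wg0
routes_via_dev() {
    awk -v want="$1" '{
        for (i = 2; i < NF; i++) {
            if ($i == "dev" && $(i + 1) == want) { print $1; break }
        }
    }'
}
```

Run it as `ip route show | routes_via_dev tun0`. If the internal range is missing from the output, either the server did not push the route or the profile needs one added.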

11.7 Log Analysis Tools

Key log commands

# NM core log
journalctl -u NetworkManager

# NM dispatcher log
journalctl -u NetworkManager-dispatcher

# wpa_supplicant log (WiFi)
journalctl | grep wpa_supplicant

# DHCP client log
journalctl | grep -i "dhclient\|dhcp"

# Kernel network log
sudo dmesg | grep -i "net\|eth\|wlan\|bond\|bridge\|vlan"

# Filter by time range
journalctl -u NetworkManager --since "1 hour ago"
journalctl -u NetworkManager --since "2026-05-10 09:00" --until "2026-05-10 12:00"

# JSON output (easier to parse)
journalctl -u NetworkManager -o json | jq -r '.MESSAGE'

Live debugging

# Enable verbose logging
sudo nmcli general logging level DEBUG domains ALL

# Debug specific domains only
sudo nmcli general logging level DEBUG domains WIFI,DHCP4,DEVICE

# Follow the log in real time
journalctl -u NetworkManager -f

# Restore the default log level
sudo nmcli general logging level INFO domains DEFAULT

Log pattern matching

# Common error patterns (matched case-insensitively)

# DHCP timeout
journalctl -u NetworkManager | grep -i "dhcp.*timeout\|dhcp.*timed out"

# Authentication failure
journalctl -u NetworkManager | grep -i "auth.*fail\|authentication.*failed"

# Device state changes
journalctl -u NetworkManager | grep -i "device.*state.*change"

# Connection failure
journalctl -u NetworkManager | grep -i "connection.*failed\|activation.*failed"

# DNS resolution failure
journalctl -u NetworkManager | grep -i "dns.*fail\|resolve.*fail"
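
Running each grep separately makes it hard to see which kind of failure dominates. The sketch below counts several categories in one pass; `tally_nm_errors` is a hypothetical helper name, and its patterns loosely follow the grep expressions above rather than any official list of NetworkManager messages.

```shell
# Hypothetical helper: reads log lines on stdin and prints a count per
# category. Matching is case-insensitive; a line may count in more than
# one category.
tally_nm_errors() {
    awk '{
        line = tolower($0)
        if (line ~ /dhcp.*(timeout|timed out)/)       count["dhcp-timeout"]++
        if (line ~ /auth.*fail/)                      count["auth-failure"]++
        if (line ~ /(connection|activation).*failed/) count["activation-failure"]++
    }
    END {
        n = split("dhcp-timeout auth-failure activation-failure", keys, " ")
        for (i = 1; i <= n; i++) printf "%s %d\n", keys[i], count[keys[i]]
    }'
}
```

Run it as `journalctl -u NetworkManager --since "1 hour ago" | tally_nm_errors`. The category with the highest count indicates which section of this chapter to start from.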

11.8 Debugging Toolchain

# 1. ip (the most basic tool)
ip addr show              # interfaces and IP addresses
ip route show             # routing table
ip link show              # link state
ip neigh show             # neighbor (ARP/NDP) table
ip monitor                # watch changes in real time

# 2. ss (socket statistics)
ss -tlnp                  # listening TCP ports
ss -ulnp                  # listening UDP ports
ss -s                     # summary statistics

# 3. ethtool (Ethernet tool)
ethtool eth0              # interface details
ethtool -i eth0           # driver information
ethtool -S eth0           # statistics
ethtool -k eth0           # offload features

# 4. tcpdump (packet capture)
sudo tcpdump -i eth0 -n
sudo tcpdump -i eth0 port 53    # DNS traffic
sudo tcpdump -i eth0 port 67 or port 68  # DHCP traffic

# 5. nmap (port scanning)
nmap -sn 192.168.1.0/24   # host discovery
sudo nmap -sU -p 53 192.168.1.1  # DNS port (UDP scans require root)

# 6. mtr (route tracing)
mtr 8.8.8.8
mtr -r -c 10 8.8.8.8      # report mode

# 7. dig/nslookup (DNS debugging)
dig +trace example.com     # full DNS delegation trace
dig @8.8.8.8 example.com   # query a specific DNS server

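11.9 Resetting and Recovering NM

Fully resetting the NM configuration

#!/bin/bash
# nm-reset.sh - fully reset the NM configuration (use with caution!)

echo "⚠️ Warning: this operation deletes all NM connection profiles"
read -r -p "Continue? (yes/no): " confirm
[ "$confirm" != "yes" ] && exit 1

# Back up (sudo is needed because connection profiles are readable by root only)
sudo cp -a /etc/NetworkManager "/etc/NetworkManager.bak.$(date +%Y%m%d)"

# Stop the service
sudo systemctl stop NetworkManager

# Delete all connection profiles
sudo rm -f /etc/NetworkManager/system-connections/*

# Delete custom configuration snippets
sudo rm -f /etc/NetworkManager/conf.d/*.conf

# Reset the main configuration
sudo tee /etc/NetworkManager/NetworkManager.conf > /dev/null << 'EOF'
[main]
plugins=keyfile

[device]
wifi.scan-rand-mac-address=yes
EOF

# Reset resolv.conf (if it was a symlink owned by systemd-resolved,
# recreate that symlink instead of letting NM write the file)
sudo rm -f /etc/resolv.conf

# Start the service
sudo systemctl start NetworkManager

# Wait for automatic device detection
sleep 5

# Check the result
nmcli device status
nmcli connection show

echo "Reset complete. Please reconfigure your network connections."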

Quick recovery methods

# Method 1: restore the backup
# (if several backups exist, replace the * with the date of the one to restore)
sudo cp -a /etc/NetworkManager.bak.*/system-connections/* \
    /etc/NetworkManager/system-connections/
sudo nmcli connection reload

# Method 2: quick recovery with DHCP
nmcli connection add type ethernet con-name "recovery" ifname eth0
nmcli connection up "recovery"

# Method 3: temporary static IP (not persistent across reboots)
sudo ip addr add 192.168.1.100/24 dev eth0
sudo ip route add default via 192.168.1.1
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf

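11.10 Chapter Summary

Problem type         First checks
Connection failure   nmcli device status → physical link → managed state → DHCP
DNS problems         resolvectl status → cat /etc/resolv.conf → dig
WiFi problems        rfkill list → nmcli radio wifi → nmcli device wifi list
VPN problems         journalctl → certificates/keys → port reachability
IP conflicts         ip addr → nmcli connection show → routing table
General              journalctl -u NetworkManager -f to follow the log in real time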

Further Reading