强曰为道
与天地相似,故不违。知周乎万物,而道济天下,故不过。旁行而不流,乐天知命,故不忧.
文档目录

PostgreSQL 完全指南 / 21 - 排错指南

第 21 章 · 排错指南

本章汇总 PostgreSQL 常见错误、连接问题、锁争用和 WAL 堆积的排查方法。


21.1 连接问题

FATAL: too many connections

-- 查看当前连接数
SELECT count(*) FROM pg_stat_activity;
SHOW max_connections;

-- 查看连接分布
SELECT datname, usename, client_addr, state, count(*)
FROM pg_stat_activity
GROUP BY datname, usename, client_addr, state
ORDER BY count(*) DESC;

-- 终止空闲连接
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
  AND query_start < now() - INTERVAL '10 minutes';

-- 终止空闲事务
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - INTERVAL '5 minutes';

💡 技巧:根本解决方案是使用连接池(PgBouncer)。

FATAL: password authentication failed

# 检查 pg_hba.conf
sudo cat /etc/postgresql/17/main/pg_hba.conf

# 确认认证方式
# host    all    all    127.0.0.1/32    scram-sha-256

# 重置密码
sudo -u postgres psql
ALTER USER postgres PASSWORD 'newpassword';

# 重新加载配置
SELECT pg_reload_conf();

FATAL: database “xxx” does not exist

-- 检查数据库名
\l

-- 可能是大小写问题(PG 默认小写)
-- CREATE DATABASE "MyDB" 会创建大小写敏感的库名
-- 连接时需要加引号:psql -d "MyDB"

could not connect to server: Connection refused

# 检查 PG 是否运行
sudo systemctl status postgresql

# 检查监听地址
sudo -u postgres psql -c "SHOW listen_addresses;"
# 如果是 localhost,远程连接会失败
# 改为 '*' 或具体 IP

# 检查端口
sudo -u postgres psql -c "SHOW port;"

# 检查防火墙
sudo ufw status
sudo iptables -L -n

21.2 锁问题排查

-- 查看所有锁
SELECT
    a.pid,
    a.usename,
    a.query,
    l.locktype,
    l.mode,
    l.granted,
    l.relation::regclass AS table_name
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE l.relation IS NOT NULL
ORDER BY l.pid;

-- 查找阻塞关系
SELECT
    blocked.pid AS blocked_pid,
    blocked.query AS blocked_query,
    blocking.pid AS blocking_pid,
    blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks gl ON gl.locktype = bl.locktype
    AND gl.relation = bl.relation AND gl.granted
JOIN pg_stat_activity blocking ON blocking.pid = gl.pid;

-- 终止阻塞进程
SELECT pg_terminate_backend(blocking_pid);

21.3 WAL 堆积

-- 查看 WAL 文件数量和大小
SELECT
    count(*) AS wal_count,
    pg_size_pretty(sum(size)) AS total_size
FROM pg_ls_waldir();

-- 检查复制槽是否导致堆积
SELECT
    slot_name,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained
FROM pg_replication_slots;

-- 删除不活跃的复制槽
SELECT pg_drop_replication_slot('slot_name');

-- 检查归档状态
SELECT * FROM pg_stat_archiver;
-- 如果 last_failed_time 比 last_archived_time 新,归档可能有问题

-- 手动触发 checkpoint
CHECKPOINT;

21.4 表膨胀

-- 检查膨胀
SELECT
    relname,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    n_live_tup,
    n_dead_tup,
    round(n_dead_tup::numeric / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;

-- 手动 VACUUM
VACUUM VERBOSE mytable;

-- VACUUM FULL(锁表,完全回收空间)
VACUUM FULL mytable;

-- 使用 pg_repack(不锁表重建)
-- pg_repack -d mydb -t mytable

21.5 慢查询排查

-- 找到当前慢查询
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY duration DESC;

-- 分析查询计划
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;

-- 常见慢查询原因
-- 1. 缺少索引 → 创建索引
-- 2. 统计信息过时 → ANALYZE table
-- 3. work_mem 不足 → 增加 work_mem
-- 4. 表膨胀 → VACUUM
-- 5. 锁等待 → 检查锁

21.6 磁盘空间不足

-- 查看数据库大小
SELECT
    datname,
    pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;

-- 查看表大小 Top 20
SELECT
    relname,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;

-- 清理方案
-- 1. VACUUM FULL(锁表)
-- 2. pg_repack(在线)
-- 3. 清理旧数据
-- 4. 清理 WAL:删除不活跃的复制槽、检查归档

21.7 事务 ID 回卷预警

-- 检查各数据库的事务 ID 年龄
SELECT
    datname,
    age(datfrozenxid) AS xid_age,
    2147483647 - age(datfrozenxid) AS remaining
FROM pg_database
ORDER BY xid_age DESC;

-- 如果 remaining < 100000000,需要紧急 VACUUM FREEZE
VACUUM FREEZE VERBOSE mytable;

-- 检查表级别
SELECT
    relname,
    age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 20;

⚠️ 注意事项:当 remaining 接近 0 时,PostgreSQL 会强制关闭数据库!必须确保 autovacuum 正常运行。


业务场景

问题排查方向
连接失败pg_hba.conf + 防火墙 + 监听地址
查询突然变慢EXPLAIN ANALYZE + 统计信息
CPU 飙高pg_stat_statements 找高频查询
磁盘满大表/WAL 文件/复制槽
事务挂起锁等待 + 空闲事务

扩展阅读