强曰为道

与天地相似,故不违。知周乎万物,而道济天下,故不过。旁行而不流,乐天知命,故不忧.
文档目录

第 21 章:排错指南

第 21 章:排错指南

遇到 MySQL 故障不要慌。本章整理最常见的错误、原因和解决方案。


21.1 连接问题

21.1.1 ERROR 1045: Access denied

-- 原因:用户名或密码错误
-- 解决步骤:

-- 1. 确认用户存在
SELECT User, Host, plugin FROM mysql.user WHERE User = 'appuser';

-- 2. 确认密码正确
ALTER USER 'appuser'@'%' IDENTIFIED BY 'CorrectPassword';

-- 3. 检查认证插件(8.0 默认 caching_sha2_password)
-- 老客户端可能不支持
ALTER USER 'appuser'@'%' IDENTIFIED WITH mysql_native_password BY 'CorrectPassword';

-- 4. 检查主机权限
GRANT ALL ON myapp.* TO 'appuser'@'%';
FLUSH PRIVILEGES;

21.1.2 ERROR 2003: Can’t connect to MySQL server

# 检查 MySQL 是否运行
sudo systemctl status mysql

# 检查端口是否监听
ss -tlnp | grep 3306

# 检查防火墙
sudo iptables -L -n | grep 3306
sudo ufw status

# 检查 bind-address
grep bind-address /etc/mysql/my.cnf
# 如果是 127.0.0.1,只允许本地连接
# 改为 0.0.0.0 或具体 IP

# 检查 max_connections
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Threads_connected';

21.1.3 ERROR 1040: Too many connections

-- 查看当前连接数
SHOW STATUS LIKE 'Threads_connected';
SHOW VARIABLES LIKE 'max_connections';

-- 临时增加
SET GLOBAL max_connections = 500;

-- 查找连接来源
SELECT USER, HOST, COUNT(*) AS cnt
FROM information_schema.PROCESSLIST
GROUP BY USER, HOST
ORDER BY cnt DESC;

-- 杀掉空闲连接
SELECT CONCAT('KILL ', ID, ';') 
FROM information_schema.PROCESSLIST 
WHERE COMMAND = 'Sleep' AND TIME > 600;

-- 查找连接泄漏
SELECT * FROM information_schema.PROCESSLIST 
WHERE COMMAND = 'Sleep' AND TIME > 300
ORDER BY TIME DESC;

21.1.4 ERROR 1129: Host is blocked

-- 原因:连接失败次数过多,主机被封锁
-- 查看被封锁的主机
SELECT * FROM performance_schema.host_cache WHERE SUM_CONNECT_ERRORS > 0;

-- 解决:刷新主机缓存
TRUNCATE TABLE performance_schema.host_cache;
-- 或
FLUSH HOSTS;

21.2 锁等待问题

21.2.1 ERROR 1205: Lock wait timeout exceeded

-- 查看锁等待
SELECT * FROM performance_schema.data_lock_waits\G

-- 查看持锁的事务
SELECT 
    r.REQUESTING_ENGINE_TRANSACTION_ID AS waiting_trx_id,
    r.REQUESTING_ENGINE_LOCK_ID AS waiting_lock,
    r.BLOCKING_ENGINE_TRANSACTION_ID AS blocking_trx_id,
    r.BLOCKING_ENGINE_LOCK_ID AS blocking_lock
FROM performance_schema.data_lock_waits r;

-- 查看阻塞事务的详细信息
SELECT trx_id, trx_state, trx_started, trx_query,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS running_seconds
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC;

-- 杀掉阻塞事务
KILL <blocking_thread_id>;

21.2.2 死锁排查

-- 查看最近一次死锁信息
SHOW ENGINE INNODB STATUS\G
-- 找 "LATEST DETECTED DEADLOCK" 部分

-- 开启死锁日志记录
SET GLOBAL innodb_print_all_deadlocks = ON;
-- 死锁信息会记录到错误日志

-- 查看死锁统计
SHOW STATUS LIKE 'Innodb_deadlocks';

21.3 性能问题

21.3.1 查询慢

-- 1. 开启慢查询日志
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;

-- 2. 分析慢查询
SELECT DIGEST_TEXT, COUNT_STAR, 
       ROUND(AVG_TIMER_WAIT/1e12, 4) AS avg_sec,
       SUM_ROWS_EXAMINED, SUM_ROWS_SENT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;

-- 3. EXPLAIN 分析
EXPLAIN SELECT ...;

-- 4. 检查索引
SHOW INDEX FROM table_name;

21.3.2 CPU 飙高

-- 查看当前执行的查询
SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE COMMAND != 'Sleep'
ORDER BY TIME DESC;

-- 查看线程状态
SELECT * FROM sys.session ORDER BY time DESC LIMIT 20;

-- 常见原因:
-- 1. 缺少索引的全表扫描
-- 2. 排序/临时表溢出到磁盘
-- 3. 大量并发查询
-- 4. 锁争用

21.3.3 磁盘空间不足

-- 查看数据库大小
SELECT TABLE_SCHEMA,
       ROUND(SUM(DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.TABLES
GROUP BY TABLE_SCHEMA
ORDER BY size_gb DESC;

-- 查看 Binlog 大小
SHOW BINARY LOGS;
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 3 DAY);

-- 查看大表
SELECT TABLE_NAME,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 0) AS size_mb,
       TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'myapp'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC
LIMIT 20;

-- 清理临时文件
-- 检查 ibtmp1 大小
ls -lh /var/lib/mysql/ibtmp1

21.4 数据恢复

21.4.1 误删数据

-- 方案 1:从 Binlog 恢复(PITR)
-- 找到误操作的精确位置
mysqlbinlog --base64-output=DECODE-ROWS -v mysql-bin.000005 | grep -B5 "DELETE FROM users"

-- 停止在误操作之前
mysqlbinlog --stop-position=12345 mysql-bin.000005 | mysql -uroot -p

-- 跳过误操作,继续
mysqlbinlog --start-position=12400 mysql-bin.000005 mysql-bin.000006 | mysql -uroot -p

-- 方案 2:从备份恢复
mysql -uroot -p myapp < /backup/myapp_yesterday.sql

21.4.2 表损坏修复

-- InnoDB 表损坏
-- 1. 设置强制恢复模式
-- my.cnf
-- [mysqld]
-- innodb_force_recovery = 1  -- 从 1 到 6 逐级尝试

-- 2. 导出数据
mysqldump -uroot -p myapp damaged_table > recovered.sql

-- 3. 恢复正常模式,重建表
DROP TABLE damaged_table;
mysql -uroot -p myapp < recovered.sql

-- MyISAM 表修复
REPAIR TABLE myisam_table;
CHECK TABLE myisam_table;

21.4.3 误 DROP TABLE

-- 1. 立即停止所有写操作
-- 2. 从最近备份恢复该表
-- 3. 用 Binlog 补齐备份到误删之间的数据

21.5 复制问题

21.5.1 复制中断

-- 查看复制状态
SHOW REPLICA STATUS\G

-- 常见错误:
-- 1062: Duplicate entry(主键冲突)
-- 1032: Record not found(记录不存在)

-- 跳过错误
STOP REPLICA;
SET GLOBAL sql_replica_skip_counter = 1;
START REPLICA;

-- GTID 模式跳过
STOP REPLICA;
SET GTID_NEXT = 'uuid:transaction_id';
BEGIN; COMMIT;
SET GTID_NEXT = 'AUTOMATIC';
START REPLICA;

21.5.2 复制延迟

-- 查看延迟
SHOW REPLICA STATUS\G
-- Seconds_Behind_Source

-- 常见原因:
-- 1. 从库配置低
-- 2. 大事务(大批量 DELETE/UPDATE)
-- 3. 单线程回放
-- 4. 从库磁盘 I/O 瓶颈

-- 优化:
SET GLOBAL replica_parallel_workers = 8;
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';

21.6 常见错误码速查

错误码说明常见原因
1045Access denied密码错误
1062Duplicate entry唯一键冲突
1205Lock wait timeout锁等待超时
1213Deadlock死锁
2003Can’t connect网络/配置问题
2006Server gone away连接超时/包过大
2013Lost connection网络中断
1040Too many connections连接数超限
1114Table is full磁盘满
1153Packet too largemax_allowed_packet 太小
-- ERROR 2006: MySQL server has gone away
SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;  -- 64MB

SHOW VARIABLES LIKE 'wait_timeout';
SET GLOBAL wait_timeout = 600;
SET GLOBAL interactive_timeout = 600;

业务场景

场景 1:紧急故障排查流程

1. 确认故障现象(连接失败?查询慢?数据丢失?)
         ↓
2. 检查 MySQL 进程状态
         ↓
3. 查看错误日志
         ↓
4. 检查系统资源(CPU、内存、磁盘、网络)
         ↓
5. 检查当前连接和查询
         ↓
6. 定位根因并修复
         ↓
7. 验证修复
         ↓
8. 记录故障报告,制定预防措施

扩展阅读