第 21 章:排错指南
第 21 章:排错指南
遇到 MySQL 故障不要慌。本章整理最常见的错误、原因和解决方案。
21.1 连接问题
21.1.1 ERROR 1045: Access denied
-- 原因:用户名或密码错误
-- 解决步骤:
-- 1. 确认用户存在
SELECT User, Host, plugin FROM mysql.user WHERE User = 'appuser';
-- 2. 确认密码正确
ALTER USER 'appuser'@'%' IDENTIFIED BY 'CorrectPassword';
-- 3. 检查认证插件(8.0 默认 caching_sha2_password)
-- 老客户端可能不支持
ALTER USER 'appuser'@'%' IDENTIFIED WITH mysql_native_password BY 'CorrectPassword';
-- 4. 检查主机权限
GRANT ALL ON myapp.* TO 'appuser'@'%';
FLUSH PRIVILEGES;
21.1.2 ERROR 2003: Can’t connect to MySQL server
# 检查 MySQL 是否运行
sudo systemctl status mysql
# 检查端口是否监听
ss -tlnp | grep 3306
# 检查防火墙
sudo iptables -L -n | grep 3306
sudo ufw status
# 检查 bind-address
grep bind-address /etc/mysql/my.cnf
# 如果是 127.0.0.1,只允许本地连接
# 改为 0.0.0.0 或具体 IP
# 检查 max_connections
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Threads_connected';
21.1.3 ERROR 1040: Too many connections
-- 查看当前连接数
SHOW STATUS LIKE 'Threads_connected';
SHOW VARIABLES LIKE 'max_connections';
-- 临时增加
SET GLOBAL max_connections = 500;
-- 查找连接来源
SELECT USER, HOST, COUNT(*) AS cnt
FROM information_schema.PROCESSLIST
GROUP BY USER, HOST
ORDER BY cnt DESC;
-- 杀掉空闲连接
SELECT CONCAT('KILL ', ID, ';')
FROM information_schema.PROCESSLIST
WHERE COMMAND = 'Sleep' AND TIME > 600;
-- 查找连接泄漏
SELECT * FROM information_schema.PROCESSLIST
WHERE COMMAND = 'Sleep' AND TIME > 300
ORDER BY TIME DESC;
21.1.4 ERROR 1129: Host is blocked
-- 原因:连接失败次数过多,主机被封锁
-- 查看被封锁的主机
SELECT * FROM performance_schema.host_cache WHERE SUM_CONNECT_ERRORS > 0;
-- 解决:刷新主机缓存
TRUNCATE TABLE performance_schema.host_cache;
-- 或
FLUSH HOSTS;
21.2 锁等待问题
21.2.1 ERROR 1205: Lock wait timeout exceeded
-- 查看锁等待
SELECT * FROM performance_schema.data_lock_waits\G
-- 查看持锁的事务
SELECT
r.REQUESTING_ENGINE_TRANSACTION_ID AS waiting_trx_id,
r.REQUESTING_ENGINE_LOCK_ID AS waiting_lock,
r.BLOCKING_ENGINE_TRANSACTION_ID AS blocking_trx_id,
r.BLOCKING_ENGINE_LOCK_ID AS blocking_lock
FROM performance_schema.data_lock_waits r;
-- 查看阻塞事务的详细信息
SELECT trx_id, trx_state, trx_started, trx_query,
TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS running_seconds
FROM information_schema.INNODB_TRX
ORDER BY trx_started ASC;
-- 杀掉阻塞事务
KILL <blocking_thread_id>;
21.2.2 死锁排查
-- 查看最近一次死锁信息
SHOW ENGINE INNODB STATUS\G
-- 找 "LATEST DETECTED DEADLOCK" 部分
-- 开启死锁日志记录
SET GLOBAL innodb_print_all_deadlocks = ON;
-- 死锁信息会记录到错误日志
-- 查看死锁统计
SHOW STATUS LIKE 'Innodb_deadlocks';
21.3 性能问题
21.3.1 查询慢
-- 1. 开启慢查询日志
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;
-- 2. 分析慢查询
SELECT DIGEST_TEXT, COUNT_STAR,
ROUND(AVG_TIMER_WAIT/1e12, 4) AS avg_sec,
SUM_ROWS_EXAMINED, SUM_ROWS_SENT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;
-- 3. EXPLAIN 分析
EXPLAIN SELECT ...;
-- 4. 检查索引
SHOW INDEX FROM table_name;
21.3.2 CPU 飙高
-- 查看当前执行的查询
SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE COMMAND != 'Sleep'
ORDER BY TIME DESC;
-- 查看线程状态
SELECT * FROM sys.session ORDER BY time DESC LIMIT 20;
-- 常见原因:
-- 1. 缺少索引的全表扫描
-- 2. 排序/临时表溢出到磁盘
-- 3. 大量并发查询
-- 4. 锁争用
21.3.3 磁盘空间不足
-- 查看数据库大小
SELECT TABLE_SCHEMA,
ROUND(SUM(DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.TABLES
GROUP BY TABLE_SCHEMA
ORDER BY size_gb DESC;
-- 查看 Binlog 大小
SHOW BINARY LOGS;
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 3 DAY);
-- 查看大表
SELECT TABLE_NAME,
ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 0) AS size_mb,
TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'myapp'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC
LIMIT 20;
-- 清理临时文件
-- 检查 ibtmp1 大小
ls -lh /var/lib/mysql/ibtmp1
21.4 数据恢复
21.4.1 误删数据
-- 方案 1:从 Binlog 恢复(PITR)
-- 找到误操作的精确位置
mysqlbinlog --base64-output=DECODE-ROWS -v mysql-bin.000005 | grep -B5 "DELETE FROM users"
-- 停止在误操作之前
mysqlbinlog --stop-position=12345 mysql-bin.000005 | mysql -uroot -p
-- 跳过误操作,继续
mysqlbinlog --start-position=12400 mysql-bin.000005 mysql-bin.000006 | mysql -uroot -p
-- 方案 2:从备份恢复
mysql -uroot -p myapp < /backup/myapp_yesterday.sql
21.4.2 表损坏修复
-- InnoDB 表损坏
-- 1. 设置强制恢复模式
-- my.cnf
-- [mysqld]
-- innodb_force_recovery = 1 -- 从 1 到 6 逐级尝试
-- 2. 导出数据
mysqldump -uroot -p myapp damaged_table > recovered.sql
-- 3. 恢复正常模式,重建表
DROP TABLE damaged_table;
mysql -uroot -p myapp < recovered.sql
-- MyISAM 表修复
REPAIR TABLE myisam_table;
CHECK TABLE myisam_table;
21.4.3 误 DROP TABLE
-- 1. 立即停止所有写操作
-- 2. 从最近备份恢复该表
-- 3. 用 Binlog 补齐备份到误删之间的数据
21.5 复制问题
21.5.1 复制中断
-- 查看复制状态
SHOW REPLICA STATUS\G
-- 常见错误:
-- 1062: Duplicate entry(主键冲突)
-- 1032: Record not found(记录不存在)
-- 跳过错误
STOP REPLICA;
SET GLOBAL sql_replica_skip_counter = 1;
START REPLICA;
-- GTID 模式跳过
STOP REPLICA;
SET GTID_NEXT = 'uuid:transaction_id';
BEGIN; COMMIT;
SET GTID_NEXT = 'AUTOMATIC';
START REPLICA;
21.5.2 复制延迟
-- 查看延迟
SHOW REPLICA STATUS\G
-- Seconds_Behind_Source
-- 常见原因:
-- 1. 从库配置低
-- 2. 大事务(大批量 DELETE/UPDATE)
-- 3. 单线程回放
-- 4. 从库磁盘 I/O 瓶颈
-- 优化:
SET GLOBAL replica_parallel_workers = 8;
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
21.6 常见错误码速查
| 错误码 | 说明 | 常见原因 |
|---|---|---|
| 1045 | Access denied | 密码错误 |
| 1062 | Duplicate entry | 唯一键冲突 |
| 1205 | Lock wait timeout | 锁等待超时 |
| 1213 | Deadlock | 死锁 |
| 2003 | Can’t connect | 网络/配置问题 |
| 2006 | Server gone away | 连接超时/包过大 |
| 2013 | Lost connection | 网络中断 |
| 1040 | Too many connections | 连接数超限 |
| 1114 | Table is full | 磁盘满 |
| 1153 | Packet too large | max_allowed_packet 太小 |
-- ERROR 2006: MySQL server has gone away
SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024; -- 64MB
SHOW VARIABLES LIKE 'wait_timeout';
SET GLOBAL wait_timeout = 600;
SET GLOBAL interactive_timeout = 600;
业务场景
场景 1:紧急故障排查流程
1. 确认故障现象(连接失败?查询慢?数据丢失?)
↓
2. 检查 MySQL 进程状态
↓
3. 查看错误日志
↓
4. 检查系统资源(CPU、内存、磁盘、网络)
↓
5. 检查当前连接和查询
↓
6. 定位根因并修复
↓
7. 验证修复
↓
8. 记录故障报告,制定预防措施