14 - Containerized Deployment
14.1 Containerized Deployment Overview
Modern Ceph deployments (cephadm) run every daemon in a container by default; Ceph can also be deployed on Kubernetes via Rook.
| Deployment method | Runtime environment | Typical use case |
|---|---|---|
| cephadm + Podman | Bare metal / VMs | Traditional deployments managed by an ops team |
| cephadm + Docker | Bare metal / VMs | Docker-centric environments |
| Rook Operator | Kubernetes | Cloud-native environments |
cephadm containerized architecture
```
┌──────────────────────────────────────┐
│ Host                                 │
│ ┌───────────┐ ┌───────────┐          │
│ │   Ceph    │ │   Ceph    │          │
│ │    MON    │ │    OSD    │ ← each daemon runs in its own container
│ │(container)│ │(container)│          │
│ └───────────┘ └───────────┘          │
│ ┌───────────┐ ┌───────────┐          │
│ │   Ceph    │ │   Ceph    │          │
│ │    MGR    │ │    RGW    │          │
│ │(container)│ │(container)│          │
│ └───────────┘ └───────────┘          │
│                                      │
│ Host OS: Ubuntu 22.04 / Rocky 9      │
│ Container runtime: Podman / Docker   │
└──────────────────────────────────────┘
```
14.2 cephadm Container Management
Checking container status
```bash
# List the Ceph daemon containers running on this host
sudo podman ps        # or: docker ps

# List every Ceph daemon in the cluster
ceph orch ps

# View a daemon's logs; cephadm resolves the container name
# (which embeds the cluster fsid) from the daemon name
sudo cephadm logs --name mon.node1
sudo cephadm logs --name osd.0

# Show container resource usage
sudo podman stats
```
Container image management
```bash
# Show the image currently configured for each daemon type
ceph config get mon container_image
ceph config get osd container_image

# Change the image version (for upgrades, prefer `ceph orch upgrade start --image ...`)
ceph config set mon container_image quay.io/ceph/ceph:v18.2.2
ceph config set osd container_image quay.io/ceph/ceph:v18.2.2

# Pre-pull the image
sudo podman pull quay.io/ceph/ceph:v18.2.2

# Use a private registry by pointing container_image at it
ceph config set global container_image registry.example.com/ceph/ceph:v18.2.2
```
Custom container settings
cephadm does not read container options from ceph.conf: the image comes from the `container_image` setting shown above, and per-service container options (such as CPU or memory limits) are passed via `extra_container_args` in a service spec:
```yaml
# osd-spec.yaml — apply with: ceph orch apply -i osd-spec.yaml
service_type: osd
service_id: default
placement:
  host_pattern: "*"
spec:
  data_devices:
    all: true
extra_container_args:
  - "--cpus=4"
  - "--memory=8g"
```
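Daemon placement in cephadm follows the same declarative pattern: a YAML service spec applied with `ceph orch apply -i <file>`. A minimal sketch pinning the MONs to three hosts (the hostnames are placeholders):

```yaml
# mon-spec.yaml — apply with: ceph orch apply -i mon-spec.yaml
service_type: mon
placement:
  hosts:
    - node1
    - node2
    - node3
```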
14.3 Rook Deployment in Detail
Rook architecture
```
┌──────────────────────────────────────────────┐
│ Kubernetes cluster                           │
│                                              │
│ ┌──────────────────────────────────────┐     │
│ │ Rook Operator                        │     │
│ │ (manages the Ceph cluster lifecycle) │     │
│ └──────────┬───────────────────────────┘     │
│            │ watches / reconciles            │
│            ↓                                 │
│ ┌──────────────────────────────────────┐     │
│ │ CephCluster CR                       │     │
│ │ (declarative Ceph cluster definition)│     │
│ └──────────┬───────────────────────────┘     │
│            │ spawns                          │
│            ↓                                 │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐          │
│ │ MON  │ │ MGR  │ │ OSD  │ │ RGW  │          │
│ │ (Pod)│ │ (Pod)│ │ (Pod)│ │ (Pod)│          │
│ └──────┘ └──────┘ └──────┘ └──────┘          │
│                                              │
│ ┌──────────────────────────────────────┐     │
│ │ CSI drivers (RBD + CephFS)           │     │
│ │ (dynamic PersistentVolume provision) │     │
│ └──────────────────────────────────────┘     │
└──────────────────────────────────────────────┘
```
Deployment steps
```bash
# 1. Add the Helm repository
helm repo add rook-release https://charts.rook.io/release
helm repo update

# 2. Create the namespace
kubectl create namespace rook-ceph

# 3. Install the CRDs (pin a release tag instead of master in production)
kubectl create -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/crds.yaml

# 4. Install the Rook operator
helm install rook-ceph rook-release/rook-ceph \
    --namespace rook-ceph \
    --set csi.enableCephfsDriver=true \
    --set csi.enableRBDDriver=true \
    --set enableDiscoveryDaemon=true

# 5. Wait for the operator to become ready
kubectl -n rook-ceph wait --for=condition=ready pod -l app=rook-ceph-operator --timeout=300s
```
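The `--set` flags above can equivalently be kept in a values file (this sketch simply mirrors the flags used in step 4):

```yaml
# rook-values.yaml — helm install rook-ceph rook-release/rook-ceph -n rook-ceph -f rook-values.yaml
csi:
  enableCephfsDriver: true
  enableRBDDriver: true
enableDiscoveryDaemon: true
```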
Ceph cluster configuration
```yaml
# cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.2
    allowUnsupported: false
  dataDirHostPath: /var/lib/rook
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  mon:
    count: 3
    allowMultiplePerNode: false
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage
        resources:
          requests:
            storage: 10Gi
  mgr:
    count: 2
    modules:
      - name: rook
        enabled: true
      - name: pg_autoscaler
        enabled: true
      - name: balancer
        enabled: true
  dashboard:
    enabled: true
    ssl: true
  monitoring:
    enabled: true
    metricsDisabled: false
  network:
    provider: host
    ipFamily: IPv4
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      osdsPerDevice: "1"
    storageClassDeviceSets: []
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0
```
```bash
# Apply the cluster definition
kubectl apply -f cluster.yaml
# Watch the rollout
watch kubectl -n rook-ceph get pods
# Verify the cluster (requires the rook-ceph-tools deployment)
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s
```
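The spec above leaves daemon resources unbounded. In practice, per-daemon requests and limits can be added under `spec.resources` of the CephCluster; a sketch (the values are illustrative, not sizing recommendations):

```yaml
# fragment to merge under spec: in cluster.yaml
resources:
  mgr:
    requests:
      cpu: "500m"
      memory: 1Gi
  osd:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      memory: 8Gi
```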
14.4 Rook StorageClass Configuration
RBD StorageClass
```yaml
# storageclass-rbd.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
    requireSafeReplicaSize: true
  parameters:
    compression_mode: none
  enableRBDStats: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering,exclusive-lock,object-map,fast-diff
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  # expand secrets are required for allowVolumeExpansion to work
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard
```
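The pool above is 3-way replicated; Rook can also back RBD with an erasure-coded data pool to trade CPU for capacity. A sketch (pool name and EC profile are illustrative); note that the StorageClass then keeps a replicated pool in `pool` for image metadata and passes the EC pool as the `dataPool` parameter:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ec-data-pool
  namespace: rook-ceph
spec:
  failureDomain: host
  erasureCoded:
    dataChunks: 2
    codingChunks: 1
```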
CephFS StorageClass
```yaml
# storageclass-cephfs.yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - failureDomain: host
      replicated:
        size: 3
  metadataServer:
    activeCount: 2
    activeStandby: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: myfs
  pool: myfs-data0
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
```
```bash
kubectl apply -f storageclass-rbd.yaml
kubectl apply -f storageclass-cephfs.yaml

# Set the default StorageClass
kubectl patch storageclass ceph-rbd -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```
14.5 Using Ceph Storage in Kubernetes
PVC examples
```yaml
# pvc-rbd.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-rbd
  resources:
    requests:
      storage: 50Gi
```
```yaml
# pvc-cephfs.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: cephfs
  resources:
    requests:
      storage: 100Gi
```
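A ReadWriteMany claim like `shared-data` can be mounted by any number of pods at once; a minimal consumer (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-reader
  namespace: default
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "ls /shared && sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /shared
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-data
```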
StatefulSet example
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: password
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: mysql-data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: mysql-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ceph-rbd
        resources:
          requests:
            storage: 50Gi
```
14.6 Rook Management Commands
```bash
# Open a shell in the rook-ceph-tools pod
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# Inside it, use ceph commands as usual
ceph -s
ceph osd tree
ceph df

# Inspect the CephCluster resource
kubectl -n rook-ceph get cephcluster rook-ceph -o yaml

# Add OSDs: first check how storage is currently selected...
kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.spec.storage.useAllNodes}'
# ...then adjust useAllNodes/useAllDevices, or use storageClassDeviceSets

# Update the Ceph version
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cephVersion":{"image":"quay.io/ceph/ceph:v18.2.4"}}}'

# List object stores (RGW)
kubectl -n rook-ceph get cephobjectstore
# List filesystems (CephFS)
kubectl -n rook-ceph get cephfilesystem
```
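The last two commands assume object store or filesystem resources exist. Like everything else in Rook, an RGW object store is declared with its own custom resource; a minimal sketch (name, pool sizes, and instance count are illustrative):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPool:
    replicated:
      size: 3
  gateway:
    port: 80
    instances: 2
```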
14.7 Rook Monitoring Integration
```yaml
# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-mgr
  namespace: rook-ceph
spec:
  namespaceSelector:
    matchNames:
      - rook-ceph
  selector:
    matchLabels:
      app: rook-ceph-mgr
      rook_cluster: rook-ceph
  endpoints:
    - port: http-metrics
      path: /metrics
      interval: 5s
```
```bash
# Or apply the upstream ServiceMonitor example
kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/monitoring/service-monitor.yaml

# Retrieve the dashboard password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d

# Access the dashboard via port-forward
kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 8443:8443
```
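For access without a port-forward, the dashboard can be exposed through a dedicated Service, along the lines of Rook's dashboard-external example (service name and type are assumptions; adjust the selector to match your cluster's mgr labels):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external
  namespace: rook-ceph
spec:
  type: NodePort
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  ports:
    - name: dashboard
      port: 8443
      targetPort: 8443
      protocol: TCP
```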
14.8 Caveats
| Item | Notes |
|---|---|
| OSDs require privileged containers | Needed for raw block device access |
| udev must be present on nodes | Device discovery depends on udev |
| Persist the data directory | dataDirHostPath must point to persistent storage |
| Network requirements | The Pod network must support hostNetwork or a Multus provider |
| Upgrade order | Upgrade the operator first, then the Ceph cluster |
| Back up the CRs | Regularly back up the Rook custom resources |
Further reading
Next chapter: 15 - Troubleshooting — how to diagnose common problems, including PG anomalies, OSD issues, network faults, and slow requests.