
14 - Containerized Deployment


14.1 Overview of Containerized Deployment

Modern Ceph deployments (cephadm) run every daemon in a container by default; Ceph can also be deployed on Kubernetes through Rook.

| Deployment method | Runtime environment | Typical use case                               |
|-------------------|---------------------|------------------------------------------------|
| cephadm + Podman  | Bare metal / VMs    | Traditional deployments managed by an ops team |
| cephadm + Docker  | Bare metal / VMs    | Docker-centric environments                    |
| Rook operator     | Kubernetes          | Cloud-native environments                      |
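When both runtimes are installed, cephadm prefers Podman. A simplified, self-contained sketch of that preference order (illustrative only; cephadm's actual detection logic lives in the cephadm binary itself):

```shell
# pick_runtime: choose a container engine by preference order (podman first).
# The arguments simulate which engines are installed on the host.
pick_runtime() {
    for preferred in podman docker; do
        for installed in "$@"; do
            if [ "$preferred" = "$installed" ]; then
                echo "$preferred"
                return 0
            fi
        done
    done
    return 1   # no supported engine found
}

pick_runtime docker podman   # prints: podman
```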

cephadm containerized architecture

┌─────────────────────────────────────────┐
│                  Host                   │
│  ┌───────────┐  ┌───────────┐           │
│  │   Ceph    │  │   Ceph    │           │
│  │   MON     │  │   OSD     │  ← each daemon runs in its own container
│  │(container)│  │(container)│           │
│  └───────────┘  └───────────┘           │
│  ┌───────────┐  ┌───────────┐           │
│  │   Ceph    │  │   Ceph    │           │
│  │   MGR     │  │   RGW     │           │
│  │(container)│  │(container)│           │
│  └───────────┘  └───────────┘           │
│                                         │
│  Host OS: Ubuntu 22.04 / Rocky 9        │
│  Container runtime: Podman / Docker     │
└─────────────────────────────────────────┘

14.2 Managing cephadm Containers

Checking container status

# List Ceph daemon containers
sudo podman ps  # or: docker ps

# List all Ceph daemons across the cluster
ceph orch ps

# View container logs (in practice the container names include the cluster FSID,
# e.g. ceph-<fsid>-mon-node1)
sudo podman logs ceph-mon-node1
sudo podman logs ceph-osd-0

# View container resource usage
sudo podman stats

Container image management

# Show the image currently in use
ceph config get mon container_image
ceph config get osd container_image

# Change the image version (used during upgrades; for coordinated upgrades,
# prefer the orchestrator: ceph orch upgrade start --image quay.io/ceph/ceph:v18.2.2)
ceph config set mon container_image quay.io/ceph/ceph:v18.2.2
ceph config set osd container_image quay.io/ceph/ceph:v18.2.2

# Pull the image
sudo podman pull quay.io/ceph/ceph:v18.2.2

# Use a private image registry
ceph config set global container_image_registry registry.example.com

Custom container configuration

# Container-related settings in /etc/ceph/ceph.conf
[global]
container_image = quay.io/ceph/ceph:v18.2.2

# Extra container parameters
[osd]
container_cpus = 4
container_memory = 8g
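In recent Ceph releases, per-service container resources are more commonly set through a cephadm service specification, whose `extra_container_args` entries are passed straight to the container runtime. A hedged sketch (the service id and values here are illustrative, and field support varies by release):

```yaml
# osd-spec.yaml - applied with: ceph orch apply -i osd-spec.yaml
service_type: osd
service_id: default
placement:
  host_pattern: '*'
spec:
  data_devices:
    all: true
extra_container_args:
  - "--cpus=4"      # forwarded to podman/docker run
  - "--memory=8g"
```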

14.3 Rook Deployment in Detail

Rook architecture

┌──────────────────────────────────────────────┐
│              Kubernetes cluster              │
│                                              │
│  ┌──────────────────────────────────────┐    │
│  │            Rook Operator             │    │
│  │ (manages the Ceph cluster lifecycle) │    │
│  └──────────┬───────────────────────────┘    │
│             │ creates/manages                │
│             ↓                                │
│  ┌──────────────────────────────────────┐    │
│  │            CephCluster CR            │    │
│  │ (declarative Ceph cluster definition)│    │
│  └──────────┬───────────────────────────┘    │
│             │ generates                      │
│             ↓                                │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐         │
│  │ MON  │ │ MGR  │ │ OSD  │ │ RGW  │         │
│  │(Pod) │ │(Pod) │ │(Pod) │ │(Pod) │         │
│  └──────┘ └──────┘ └──────┘ └──────┘         │
│                                              │
│  ┌──────────────────────────────────────┐    │
│  │     CSI drivers (RBD + CephFS)       │    │
│  │     (dynamic PV provisioning)        │    │
│  └──────────────────────────────────────┘    │
└──────────────────────────────────────────────┘

Deployment steps

# 1. Add the Helm repository
helm repo add rook-release https://charts.rook.io/release
helm repo update

# 2. Create the namespace
kubectl create namespace rook-ceph

# 3. Install the CRDs (pin a release tag instead of master in production)
kubectl create -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/crds.yaml

# 4. Install the Rook operator
helm install rook-ceph rook-release/rook-ceph \
    --namespace rook-ceph \
    --set csi.enableCephfsDriver=true \
    --set csi.enableRbdDriver=true \
    --set enableDiscoveryDaemon=true

# 5. Wait for the operator to become ready
kubectl -n rook-ceph wait --for=condition=ready pod -l app=rook-ceph-operator --timeout=300s

Ceph cluster configuration

# cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.2
    allowUnsupported: false
  dataDirHostPath: /var/lib/rook
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  mon:
    count: 3
    allowMultiplePerNode: false
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage
        resources:
          requests:
            storage: 10Gi
  mgr:
    count: 2
    modules:
      - name: rook
        enabled: true
      - name: pg_autoscaler
        enabled: true
      - name: balancer
        enabled: true
  dashboard:
    enabled: true
    ssl: true
  monitoring:
    enabled: true
    metricsDisabled: false
  network:
    provider: host
    ipFamily: IPv4
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      osdsPerDevice: "1"
    storageClassDeviceSets: []
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0

# Apply the cluster configuration
kubectl apply -f cluster.yaml

# Watch deployment progress
watch kubectl -n rook-ceph get pods

# Deploy the toolbox pod used for verification
kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/toolbox.yaml

# Verify the cluster
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s

14.4 Rook StorageClass Configuration

RBD StorageClass

# storageclass-rbd.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
    requireSafeReplicaSize: true
  parameters:
    compression_mode: none
  enableRBDStats: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering,exclusive-lock,object-map,fast-diff
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard

CephFS StorageClass

# storageclass-cephfs.yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
    enableRBDStats: true
  dataPools:
    - failureDomain: host
      replicated:
        size: 3
      enableRBDStats: true
  metadataServer:
    activeCount: 2
    activeStandby: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: myfs
  pool: myfs-data0
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true

# Apply the StorageClass definitions
kubectl apply -f storageclass-rbd.yaml
kubectl apply -f storageclass-cephfs.yaml

# Mark ceph-rbd as the default StorageClass
kubectl patch storageclass ceph-rbd -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

14.5 Using Ceph Storage in Kubernetes

PVC examples

# pvc-rbd.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-rbd
  resources:
    requests:
      storage: 50Gi

# pvc-cephfs.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: cephfs
  resources:
    requests:
      storage: 100Gi
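A Pod consumes a claim by referencing it by name. A minimal sketch mounting the shared-data claim above (the pod name and image are illustrative, not from this chapter's cluster):

```yaml
# pod-cephfs-demo.yaml - illustrative only
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-demo
  namespace: default
spec:
  containers:
    - name: app
      image: nginx:1.25
      volumeMounts:
        - name: shared
          mountPath: /usr/share/nginx/html
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-data   # the RWX CephFS PVC defined above
```

Because the claim is ReadWriteMany, several Pods on different nodes can mount it simultaneously.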

StatefulSet example

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: password
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: mysql-data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: mysql-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ceph-rbd
        resources:
          requests:
            storage: 50Gi
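volumeClaimTemplates gives every replica its own RBD-backed PVC, named `<template>-<statefulset>-<ordinal>`. A quick sketch of the claim names this manifest produces for its three replicas:

```shell
# PVC names derived from template "mysql-data" and StatefulSet "mysql".
for ordinal in 0 1 2; do
    echo "mysql-data-mysql-$ordinal"
done
# prints mysql-data-mysql-0 through mysql-data-mysql-2
```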

14.6 Rook Management Commands

# Open a shell in the rook-ceph-tools pod
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

# Inside the pod, use ceph commands as usual
ceph -s
ceph osd tree
ceph df

# Inspect the CephCluster resource
kubectl -n rook-ceph get cephcluster rook-ceph -o yaml

# Scale out OSDs: check the current setting, then
# change useAllNodes or use storageClassDeviceSets
kubectl -n rook-ceph get cephcluster rook-ceph -o jsonpath='{.spec.storage.useAllNodes}'

# Update the Ceph version
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cephVersion":{"image":"quay.io/ceph/ceph:v18.2.4"}}}'

# List RGW object stores
kubectl -n rook-ceph get cephobjectstore

# List CephFS filesystems
kubectl -n rook-ceph get cephfilesystem
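`get cephobjectstore` only returns results once an object store CR exists, and this chapter's cluster does not define one. A minimal hedged example (the name, pool sizes, and gateway settings are illustrative):

```yaml
# objectstore.yaml - illustrative sketch of a CephObjectStore CR
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPool:
    replicated:
      size: 3
  gateway:
    port: 80        # RGW HTTP port
    instances: 1    # number of RGW pods
```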

14.7 Rook Monitoring Integration

# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-mgr
  namespace: rook-ceph
spec:
  namespaceSelector:
    matchNames:
      - rook-ceph
  selector:
    matchLabels:
      app: rook-ceph-mgr
      rook_cluster: rook-ceph
  endpoints:
    - port: http-metrics
      path: /metrics
      interval: 5s

# Apply the upstream ServiceMonitor example
kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/monitoring/service-monitor.yaml

# Retrieve the Dashboard password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 -d

# Port-forward to access the Dashboard
kubectl -n rook-ceph port-forward svc/rook-ceph-mgr-dashboard 8443:8443
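Secret data in Kubernetes is stored base64-encoded, which is why the password command above pipes through base64 -d. A standalone illustration of that decode step, using made-up data:

```shell
# Decode a base64-encoded secret value by hand (stand-in data, not a real secret).
encoded="cGFzc3dvcmQxMjM="   # what .data.password looks like inside a Secret
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"              # prints password123
```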

14.8 Caveats

| Item                            | Notes                                                     |
|---------------------------------|-----------------------------------------------------------|
| OSDs need privileged containers | Required for access to raw devices                        |
| udev required on nodes          | Device discovery depends on udev                          |
| Persistent data directory       | dataDirHostPath must point at persistent storage          |
| Network requirements            | The Pod network must support hostNetwork or multi-tenancy |
| Upgrade order                   | Upgrade the operator first, then the Ceph cluster         |
| Back up CRDs                    | Back up Rook custom resources regularly                   |
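The last row can be scripted. A minimal sketch, assuming kubectl access; the resource list is illustrative, not exhaustive:

```shell
# Back up Rook custom resources to dated YAML files (sketch; extend CRDS as needed).
CRDS="cephclusters cephblockpools cephfilesystems cephobjectstores"

backup_rook_crds() {
    outdir="rook-backup-$(date +%Y%m%d)"
    mkdir -p "$outdir"
    for crd in $CRDS; do
        # One YAML dump per resource type; restore later with kubectl apply -f
        kubectl -n rook-ceph get "$crd" -o yaml > "$outdir/$crd.yaml"
    done
}

echo "$CRDS"   # resource types covered by this sketch
```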

Further reading

  1. Rook official documentation
  2. Rook Ceph cluster configuration
  3. Ceph CSI drivers
  4. Rook monitoring

Next chapter: 15 - Troubleshooting, which covers common diagnostic workflows: PG anomalies, OSD issues, network problems, and slow requests.