Mirror of https://github.com/ceph/ceph-csi.git (synced 2024-11-09 16:00:22 +00:00)

doc: design document for rbd mirroring

This document outlines the internal cephcsi design to handle mirrored RBD
images.

Co-authored-by: ShyamsundarR <srangana@redhat.com>
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>

Parent: 39b1f2b4d3
Commit: 28793efc90

`docs/design/proposals/rbd-mirror.md` (new file, +102 lines)

# RBD MIRRORING

RBD mirroring is the process of replicating RBD images between two or more
Ceph clusters. Mirroring ensures point-in-time, crash-consistent RBD images
between clusters. RBD mirroring is mainly used for disaster recovery (i.e.
having a secondary site to fail over to). See the [Ceph
documentation](https://docs.ceph.com/en/latest/rbd/rbd-mirroring) on RBD
mirroring for complete information.

## Architecture

![mirror](rbd-mirror.png)

## Design

Currently, Ceph-CSI generates a unique ID for each RBD image and stores the
mapping between the corresponding PersistentVolume (PV) name and that unique
ID. It creates the RBD image with the unique ID and returns an encoded value
that contains all the information required for subsequent operations. With
mirroring, the same RBD image is mirrored to the secondary cluster. Because
the journal (OMAP data) is not mirrored to the secondary cluster, the RBD
image corresponding to a PV cannot be identified there without the OMAP data.

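To make the role of the journal concrete, here is a minimal Go sketch that models it as an in-memory map from PV name to unique ID. This is a hypothetical stand-in: the real journal lives in RADOS OMAP objects, and the `journal` type, its methods, and the PV name `pvc-1` are illustrative only. The `csi-vol-` image-name prefix is taken from the sample `imageName` in the PV spec in this document.

```go
package main

import (
	"errors"
	"fmt"
)

// journal is a hypothetical in-memory stand-in for the OMAP-backed journal;
// it maps PV names to the unique IDs generated for them.
type journal struct {
	pvToUUID map[string]string
}

func newJournal() *journal {
	return &journal{pvToUUID: make(map[string]string)}
}

// reserve records the PV name -> unique ID mapping and returns the RBD image
// name (prefix assumed from the sample imageName in this document).
func (j *journal) reserve(pvName, uuid string) string {
	j.pvToUUID[pvName] = uuid
	return "csi-vol-" + uuid
}

// lookup resolves a PV name to its RBD image name using only the journal.
func (j *journal) lookup(pvName string) (string, error) {
	uuid, ok := j.pvToUUID[pvName]
	if !ok {
		return "", errors.New("no OMAP entry for " + pvName)
	}
	return "csi-vol-" + uuid, nil
}

func main() {
	primary := newJournal()
	// The UUID is fixed here for reproducibility; Ceph-CSI generates one.
	primary.reserve("pvc-1", "0c23de1c-18fb-11eb-a903-0242ac110005")

	// The image data is mirrored but the journal is not, so the
	// secondary cluster starts with an empty journal.
	secondary := newJournal()

	img, _ := primary.lookup("pvc-1")
	fmt.Println("primary:", img)
	if _, err := secondary.lookup("pvc-1"); err != nil {
		fmt.Println("secondary:", err)
	}
}
```

The image itself exists on both sites; only the name-to-ID mapping is missing on the secondary, which is the gap the rest of this design fills.
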
**Pre-req:** The Kubernetes admin/user is expected to create the static
PersistentVolumeClaim (PVC) on the secondary site during failover.

**Note:** When the static PVC is created on the secondary site, the
VolumeHandle can no longer be used to identify either the OMAP data or the
image, because the VolumeHandle contains only the PoolID and the ClusterID.
The correct pool name cannot be derived from the PoolID: the pool name remains
the same on both clusters but the PoolID does not, and even the ClusterID can
be different on the secondary cluster.

> Sample PV spec that will be used by the rbdplugin controller to regenerate
> the OMAP data

```yaml
# csi section of the PersistentVolume spec (excerpt)
csi:
  controllerExpandSecretRef:
    name: rook-csi-rbd-provisioner
    namespace: rook-ceph
  driver: rook-ceph.rbd.csi.ceph.com
  fsType: ext4
  nodeStageSecretRef:
    name: rook-csi-rbd-node
    namespace: rook-ceph
  volumeAttributes:
    clusterID: rook-ceph
    imageFeatures: layering
    imageFormat: "2"
    imageName: csi-vol-0c23de1c-18fb-11eb-a903-0242ac110005
    journalPool: replicapool
    pool: replicapool
    radosNamespace: ""
  volumeHandle: 0001-0009-rook-ceph-0000000000000002-0c23de1c-18fb-11eb-a903-0242ac110005
```

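As an illustration of "extract the required information from the PV spec", the Go sketch below pulls the fields it needs out of `volumeAttributes` and recovers the unique ID from `imageName`. It assumes the `csi-vol-<uuid>` naming seen in the sample, and that the image pool doubles as the journal pool when `journalPool` is empty; `pvInfo` and `parseVolumeAttributes` are hypothetical names, not Ceph-CSI code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pvInfo holds the fields a hypothetical rbdplugin controller would need
// from a PV spec to regenerate the OMAP data on the secondary cluster.
type pvInfo struct {
	ClusterID   string
	Pool        string
	JournalPool string
	ImageName   string
	UUID        string
}

// imagePrefix is assumed from the sample imageName in the PV spec.
const imagePrefix = "csi-vol-"

// parseVolumeAttributes extracts pvInfo from the PV's csi.volumeAttributes.
func parseVolumeAttributes(attrs map[string]string) (pvInfo, error) {
	info := pvInfo{
		ClusterID:   attrs["clusterID"],
		Pool:        attrs["pool"],
		JournalPool: attrs["journalPool"],
		ImageName:   attrs["imageName"],
	}
	if info.ClusterID == "" || info.Pool == "" || info.ImageName == "" {
		return pvInfo{}, errors.New("clusterID, pool and imageName are required")
	}
	if info.JournalPool == "" {
		// Assumption: fall back to the image pool when no journal pool is set.
		info.JournalPool = info.Pool
	}
	if !strings.HasPrefix(info.ImageName, imagePrefix) {
		return pvInfo{}, fmt.Errorf("unexpected image name %q", info.ImageName)
	}
	info.UUID = strings.TrimPrefix(info.ImageName, imagePrefix)
	return info, nil
}

func main() {
	// Values copied from the sample PV spec.
	attrs := map[string]string{
		"clusterID":      "rook-ceph",
		"imageFeatures":  "layering",
		"imageFormat":    "2",
		"imageName":      "csi-vol-0c23de1c-18fb-11eb-a903-0242ac110005",
		"journalPool":    "replicapool",
		"pool":           "replicapool",
		"radosNamespace": "",
	}
	info, err := parseVolumeAttributes(attrs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", info)
}
```

Only `volumeAttributes` is read here because, unlike the VolumeHandle, it carries the pool *name*, which stays the same on both clusters.
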
> **VolumeHandle** is the unique volume name returned by the CSI volume plugin’s
> `CreateVolume` call and used to refer to the volume on all subsequent calls.

Once the static PVC is created on the secondary cluster, the Kubernetes user
can try to delete, expand, or mount the PVC. For mounting (NodeStageVolume),
the volume context is passed in the RPC call, but it is not passed in Delete
or Expand requests. A Delete/Expand RPC request carries only the VolumeHandle
(`clusterID-poolID-volumeuniqueID`), which encodes the ClusterID and PoolID.
That VolumeHandle is not useful on the secondary cluster, as the PoolID and
ClusterID may not be the same there.

> This design document covers a new controller (the rbdplugin controller), not
> the replication controller; the replication controller that performs
> mirroring operations will be designed in a later release. The rbdplugin
> controller will run as a sidecar in the RBD provisioner pod.

To solve this problem, a new controller (the rbdplugin controller) will run as
part of the provisioner pod and watch PV objects. When a PV is created, the
controller extracts the required information from the PV spec, regenerates
the OMAP data, generates a new VolumeHandle
(`newclusterID-newpoolID-volumeuniqueID`), and creates an OMAP object that maps
the old VolumeHandle to the new one. Whenever Ceph-CSI receives an RPC request
with an old VolumeHandle, it checks whether a new VolumeHandle exists for it.
If so, it uses the new VolumeHandle for internal operations (getting the pool
name, the Ceph monitor details from the ClusterID, etc.).

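A minimal Go sketch of the old-to-new VolumeHandle flow follows. It assumes the handle layout seen in the sample PV spec (version 1) and uses a plain map as a stand-in for the OMAP mapping object. The secondary ClusterID `rook-ceph-secondary` and PoolID `5` are made-up values, and `composeVolumeHandle`, `handleMap`, `regenerate`, and `resolve` are hypothetical names.

```go
package main

import "fmt"

// composeVolumeHandle builds a handle with the same layout as the sample
// volumeHandle in this document (version 1 assumed).
func composeVolumeHandle(clusterID string, poolID int64, uuid string) string {
	return fmt.Sprintf("%04x-%04x-%s-%016x-%s", 1, len(clusterID), clusterID, poolID, uuid)
}

// handleMap is a hypothetical in-memory stand-in for the OMAP object that
// maps old VolumeHandles to regenerated ones on the secondary cluster.
type handleMap map[string]string

// regenerate builds the new handle (same UUID, new ClusterID and PoolID)
// and records the old -> new mapping.
func (m handleMap) regenerate(oldHandle, newClusterID string, newPoolID int64, uuid string) string {
	newHandle := composeVolumeHandle(newClusterID, newPoolID, uuid)
	m[oldHandle] = newHandle
	return newHandle
}

// resolve returns the handle to use for internal operations: the mapped
// new handle if one exists, otherwise the handle as received.
func (m handleMap) resolve(handle string) string {
	if newHandle, ok := m[handle]; ok {
		return newHandle
	}
	return handle
}

func main() {
	const uuid = "0c23de1c-18fb-11eb-a903-0242ac110005"
	oldHandle := composeVolumeHandle("rook-ceph", 2, uuid) // matches the sample PV

	m := handleMap{}
	// Hypothetical secondary-site values.
	m.regenerate(oldHandle, "rook-ceph-secondary", 5, uuid)

	// A Delete/Expand request arriving with the old handle is redirected.
	fmt.Println(m.resolve(oldHandle))
}
```

Keeping the UUID unchanged is what ties the regenerated handle back to the mirrored image; only the cluster- and pool-specific parts are rewritten.
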
Currently, watchers are used in the node stage request to make sure that a
ReadWriteOnce (RWO) PVC is mounted on a single node at a given point in time.
This watcher logic in the node stage request needs to change, because when RBD
mirroring is enabled on an image, the RBD mirroring daemon adds a watcher to
that image.

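The Go sketch below shows why the check has to change, assuming the current logic treats any watcher on the image as "in use". This design does not say how the mirroring daemon's watcher will be told apart from a node's, so the sketch leaves that to a caller-supplied predicate and uses made-up watcher addresses. The `watcher` type and `inUseByOtherNode` are simplifications, not the librbd or go-ceph API.

```go
package main

import "fmt"

// watcher is a simplified, hypothetical view of an RBD image watcher.
type watcher struct {
	Addr string
}

// inUseByOtherNode reports whether the image appears to be mapped by some
// client other than the RBD mirroring daemon. isMirrorDaemon is a
// placeholder for however that daemon's watchers end up being identified.
func inUseByOtherNode(watchers []watcher, isMirrorDaemon func(watcher) bool) bool {
	for _, w := range watchers {
		if !isMirrorDaemon(w) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical address of the mirroring daemon's watcher.
	mirrorAddrs := map[string]bool{"10.0.0.9:0/1234": true}
	isMirror := func(w watcher) bool { return mirrorAddrs[w.Addr] }

	// Mirroring is enabled, but no node has the image mapped yet.
	ws := []watcher{{Addr: "10.0.0.9:0/1234"}}

	naive := len(ws) > 0 // unchanged check: any watcher means "in use"
	fmt.Println("naive check says in use:", naive)
	fmt.Println("filtered check says in use:", inUseByOtherNode(ws, isMirror))
}
```

With the naive check, every mirrored image would look busy and node staging of an RWO PVC would be refused; filtering out the daemon's watcher restores the single-node guarantee.
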
To solve the ClusterID problem, if the ClusterID is different on the secondary
cluster, the admin has to create a new ConfigMap for the mapped ClusterIDs.
Whenever Ceph-CSI gets a request, it checks whether a ClusterID mapping exists
and, if so, uses the mapped ClusterID to get information such as the Ceph
monitors.

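This design does not fix the ConfigMap format, so the Go sketch below assumes a hypothetical one: a JSON object mapping the ClusterID recorded in the VolumeHandle to the ClusterID configured on the secondary cluster. `clusterIDMapping`, `loadMapping`, `effectiveClusterID`, and the value `rook-ceph-secondary` are illustrative only; the point is the lookup-with-fallback behaviour described above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterIDMapping is a hypothetical shape for the admin-provided ConfigMap
// data: recorded ClusterID -> ClusterID configured on this cluster.
type clusterIDMapping map[string]string

// loadMapping parses the (assumed) JSON payload of the ConfigMap.
func loadMapping(data []byte) (clusterIDMapping, error) {
	var m clusterIDMapping
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, fmt.Errorf("parsing ClusterID mapping: %w", err)
	}
	return m, nil
}

// effectiveClusterID returns the mapped ClusterID if a mapping exists,
// otherwise the ClusterID unchanged.
func (m clusterIDMapping) effectiveClusterID(id string) string {
	if mapped, ok := m[id]; ok {
		return mapped
	}
	return id
}

func main() {
	// Hypothetical ConfigMap payload created by the admin.
	data := []byte(`{"rook-ceph": "rook-ceph-secondary"}`)
	m, err := loadMapping(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(m.effectiveClusterID("rook-ceph"))     // mapped
	fmt.Println(m.effectiveClusterID("other-cluster")) // no mapping: unchanged
}
```

The mapped ClusterID would then be used for the usual configuration lookup (Ceph monitors and so on) in place of the one decoded from the VolumeHandle.
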
**This design does not cover the following items:**

* Bootstrapping the RBD mirror daemon.
* Mirroring of PVC snapshots.
* Mirroring for topology-provisioned PVCs.
* Documenting the steps to handle failover/failback of an image.
* Workflow of a replication controller.