mirror of
https://github.com/ceph/ceph-csi.git
synced 2025-01-01 17:55:32 +00:00
# Failover and Failback In Disaster Recovery

[RBD mirroring](https://docs.ceph.com/en/latest/rbd/rbd-mirroring/)
is the asynchronous replication of RBD images between multiple Ceph clusters.
This capability is available in two modes:

* Journal-based: Every write to the RBD image is first recorded
  to the associated journal before modifying the actual image.
  The remote cluster will read from this associated journal and
  replay the updates to its local image.
* Snapshot-based: This mode uses periodically scheduled or
  manually created RBD image mirror-snapshots to replicate
  crash-consistent RBD images between clusters.

This documentation assumes that `rbd mirroring` is set up between
two clusters.
For more information on how to set up rbd mirroring, refer to the
[ceph documentation](https://docs.ceph.com/en/latest/rbd/rbd-mirroring/).

## Deploy the Volume Replication CRD

Volume Replication Operator is a Kubernetes operator that provides common
and reusable APIs for storage disaster recovery.
It is based on the [csi-addons/spec](https://github.com/csi-addons/spec)
specification and can be used by any storage provider.

Volume Replication Operator follows the controller pattern and provides
extended APIs for storage disaster recovery.
The extended APIs are provided via Custom Resource Definitions (CRDs).

>:bulb: For more information, please refer to the
> [volume-replication-operator](https://github.com/csi-addons/volume-replication-operator).

* Deploy the `VolumeReplicationClass` CRD

```bash
kubectl create -f https://raw.githubusercontent.com/csi-addons/volume-replication-operator/release-v0.1/config/crd/bases/replication.storage.openshift.io_volumereplicationclasses.yaml

customresourcedefinition.apiextensions.k8s.io/volumereplicationclasses.replication.storage.openshift.io created
```

* Deploy the `VolumeReplication` CRD

```bash
kubectl create -f https://raw.githubusercontent.com/csi-addons/volume-replication-operator/release-v0.1/config/crd/bases/replication.storage.openshift.io_volumereplications.yaml

customresourcedefinition.apiextensions.k8s.io/volumereplications.replication.storage.openshift.io created
```

The VolumeReplicationClass and VolumeReplication CRDs are now created.

>:bulb: **Note:** Use the latest available release for Volume Replication Operator.
> See [releases](https://github.com/csi-addons/volume-replication-operator/branches)
> for more information.

### Add RBAC rules for Volume Replication Operator

Add the following rules to the `rbd-external-provisioner-runner`
ClusterRole in [csi-provisioner-rbac.yaml](https://github.com/ceph/ceph-csi/blob/release-v3.3/deploy/rbd/kubernetes/csi-provisioner-rbac.yaml)

```yaml
- apiGroups: ["replication.storage.openshift.io"]
  resources: ["volumereplications", "volumereplicationclasses"]
  verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
- apiGroups: ["replication.storage.openshift.io"]
  resources: ["volumereplications/finalizers"]
  verbs: ["update"]
- apiGroups: ["replication.storage.openshift.io"]
  resources: ["volumereplications/status"]
  verbs: ["get", "patch", "update"]
- apiGroups: ["replication.storage.openshift.io"]
  resources: ["volumereplicationclasses/status"]
  verbs: ["get"]
```

### Deploy the Volume Replication Sidecar

To deploy the `volume-replication` sidecar container in the `csi-rbdplugin-provisioner`
pod, add the following yaml to the
[csi-rbdplugin-provisioner deployment](https://github.com/ceph/ceph-csi/blob/release-v3.3/deploy/rbd/kubernetes/csi-rbdplugin-provisioner.yaml).

```yaml
- name: volume-replication
  image: quay.io/csiaddons/volumereplication-operator:v0.1.0
  args:
    - "--metrics-bind-address=0"
    - "--leader-election-namespace=$(NAMESPACE)"
    - "--driver-name=rbd.csi.ceph.com"
    - "--csi-address=$(ADDRESS)"
    - "--rpc-timeout=150s"
    - "--health-probe-bind-address=:9998"
    - "--leader-elect=true"
  env:
    - name: ADDRESS
      value: unix:///csi/csi-provisioner.sock
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
  imagePullPolicy: "IfNotPresent"
  volumeMounts:
    - name: socket-dir
      mountPath: /csi
```

## VolumeReplicationClass and VolumeReplication

### VolumeReplicationClass

*VolumeReplicationClass* is a cluster-scoped resource that contains
driver-related configuration parameters. It holds the storage admin
information required for the volume replication operator.

### VolumeReplication

*VolumeReplication* is a namespaced resource that contains references
to the storage object to be replicated and the VolumeReplicationClass
corresponding to the driver providing replication.

>:bulb: For more information, please refer to the
> [volume-replication-operator](https://github.com/csi-addons/volume-replication-operator).

Let's say we have a *PVC* (rbd-pvc) in `Bound` state, created using a
*StorageClass* with the `Retain` reclaimPolicy.

```bash
kubectl get pvc --context=cluster-1

NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
rbd-pvc   Bound    pvc-65dc0aac-5e15-4474-90f4-7a3532c621ec   1Gi        RWO            csi-rbd-sc     44s
```

* Create the VolumeReplicationClass on cluster-1

```bash
cat <<EOF | kubectl --context=cluster-1 apply -f -
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplicationClass
metadata:
  name: rbd-volumereplicationclass
spec:
  provisioner: rbd.csi.ceph.com
  parameters:
    mirroringMode: snapshot
    schedulingInterval: "12m"
    schedulingStartTime: "16:18:43"
    replication.storage.openshift.io/replication-secret-name: csi-rbd-secret
    replication.storage.openshift.io/replication-secret-namespace: default
EOF
```

>:bulb: **Note:** The `schedulingInterval` can be specified in
> minutes, hours or days using the suffixes `m`, `h` and `d` respectively.
> The optional `schedulingStartTime` can be specified using the ISO 8601
> time format.

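As a small pre-flight check, the `schedulingInterval` format described in the note can be validated before a VolumeReplicationClass is applied. The sketch below is illustrative only: `is_valid_scheduling_interval` is a hypothetical helper name, and the pattern (digits followed by `m`, `h` or `d`) is an assumption taken from the note, not from the operator's own validation.

```bash
# Hypothetical helper, not part of ceph-csi or the Volume Replication Operator.
# Returns 0 when the argument is one or more digits followed by m, h or d.
is_valid_scheduling_interval() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+[mhd]$'
}

# Example:
#   is_valid_scheduling_interval "12m" && echo "interval looks valid"
```
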
* Once the VolumeReplicationClass is created, create a VolumeReplication for
  the PVC which we intend to replicate to the secondary cluster.

```bash
cat <<EOF | kubectl --context=cluster-1 apply -f -
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplication
metadata:
  name: pvc-volumereplication
spec:
  volumeReplicationClass: rbd-volumereplicationclass
  replicationState: primary
  dataSource:
    apiGroup: ""
    kind: PersistentVolumeClaim
    name: rbd-pvc # Name of the PVC on which mirroring is to be enabled.
EOF
```

>:memo: *VolumeReplication* is a namespace-scoped object. Thus,
> it should be created in the same namespace as the PVC.

`replicationState` is the state of the volume being referenced.
Possible values are `primary`, `secondary`, and `resync`.

* `primary` denotes that the volume is primary.
* `secondary` denotes that the volume is secondary.
* `resync` denotes that the volume needs to be resynced.

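Changing `replicationState` on an existing VolumeReplication can be done with a merge patch instead of editing the CR by hand. The following is a minimal sketch: `replication_state_patch` is a hypothetical helper that only builds the patch body and rejects values other than the three listed above.

```bash
# Hypothetical helper: prints a JSON merge patch for spec.replicationState.
# Only the three documented values are accepted.
replication_state_patch() {
  case "$1" in
    primary|secondary|resync)
      printf '{"spec":{"replicationState":"%s"}}\n' "$1"
      ;;
    *)
      echo "invalid replicationState: $1" >&2
      return 1
      ;;
  esac
}

# Usage (requires access to the cluster):
#   kubectl patch volumereplication pvc-volumereplication --context=cluster-1 \
#     --type merge -p "$(replication_state_patch secondary)"
```
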
To check the VolumeReplication CR status:

```yaml
kubectl get volumereplication pvc-volumereplication --context=cluster-1 -oyaml

...
spec:
  dataSource:
    apiGroup: ""
    kind: PersistentVolumeClaim
    name: rbd-pvc
  replicationState: primary
  volumeReplicationClass: rbd-volumereplicationclass
status:
  conditions:
  - lastTransitionTime: "2021-05-04T07:39:00Z"
    message: ""
    observedGeneration: 1
    reason: Promoted
    status: "True"
    type: Completed
  - lastTransitionTime: "2021-05-04T07:39:00Z"
    message: ""
    observedGeneration: 1
    reason: Healthy
    status: "False"
    type: Degraded
  - lastTransitionTime: "2021-05-04T07:39:00Z"
    message: ""
    observedGeneration: 1
    reason: NotResyncing
    status: "False"
    type: Resyncing
  lastCompletionTime: "2021-05-04T07:39:00Z"
  lastStartTime: "2021-05-04T07:38:59Z"
  message: volume is marked primary
  observedGeneration: 1
  state: Primary
```

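When scripting, it can be convenient to poll the `state` field shown in the status above rather than reading the full YAML. The sketch below defines a generic, hypothetical `wait_for_output` helper that reruns a command until it prints an expected value. Pairing it with `-o jsonpath='{.status.state}'` is an assumption based on the status layout shown above.

```bash
# Hypothetical helper: wait_for_output EXPECTED ATTEMPTS DELAY_SECONDS CMD...
# Runs CMD up to ATTEMPTS times and returns 0 as soon as its stdout equals
# EXPECTED; returns 1 if that never happens.
wait_for_output() {
  expected=$1
  attempts=$2
  delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (requires access to the cluster):
#   wait_for_output Primary 30 2 \
#     kubectl get volumereplication pvc-volumereplication \
#     --context=cluster-1 -o jsonpath='{.status.state}'
```
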
* Take a backup of the PVC and PV objects on the primary cluster (cluster-1)

  * Take a backup of the PVC `rbd-pvc`

  ```bash
  kubectl get pvc rbd-pvc -oyaml >pvc-backup.yaml
  ```

  * Take a backup of the PV corresponding to the PVC

  ```bash
  kubectl get pv/pvc-65dc0aac-5e15-4474-90f4-7a3532c621ec -oyaml >pv-backup.yaml
  ```

>:bulb: We can also take backups using external tools like **Velero**.
> Refer to the [velero documentation](https://velero.io/docs/main/) for more information.

* Restoring on the secondary cluster (cluster-2)

  * Create the storageclass on the secondary cluster

  ```bash
  kubectl create -f examples/rbd/storageclass.yaml --context=cluster-2

  storageclass.storage.k8s.io/csi-rbd-sc created
  ```

  * Create the VolumeReplicationClass on the secondary cluster

  ```bash
  cat <<EOF | kubectl --context=cluster-2 apply -f -
  apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeReplicationClass
  metadata:
    name: rbd-volumereplicationclass
  spec:
    provisioner: rbd.csi.ceph.com
    parameters:
      mirroringMode: snapshot
      replication.storage.openshift.io/replication-secret-name: csi-rbd-secret
      replication.storage.openshift.io/replication-secret-namespace: default
  EOF

  volumereplicationclass.replication.storage.openshift.io/rbd-volumereplicationclass created
  ```

  * If Persistent Volumes and Claims are created manually
    on the secondary cluster, remove the `claimRef` from the
    backed-up PV objects in the yaml files, so that the PV can
    be bound to the new claim on the secondary cluster.

  ```yaml
  ...
  spec:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 1Gi
    claimRef:
      apiVersion: v1
      kind: PersistentVolumeClaim
      name: rbd-pvc
      namespace: default
      resourceVersion: "64252"
      uid: 65dc0aac-5e15-4474-90f4-7a3532c621ec
    csi:
  ...
  ```

  * Apply the Persistent Volume backup from the primary cluster

  ```bash
  kubectl create -f pv-backup.yaml --context=cluster-2

  persistentvolume/pvc-65dc0aac-5e15-4474-90f4-7a3532c621ec created
  ```

  * Apply the Persistent Volume Claim backup from the primary cluster

  ```bash
  kubectl create -f pvc-backup.yaml --context=cluster-2

  persistentvolumeclaim/rbd-pvc created
  ```

  ```bash
  kubectl get pvc --context=cluster-2

  NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
  rbd-pvc   Bound    pvc-65dc0aac-5e15-4474-90f4-7a3532c621ec   1Gi        RWO            csi-rbd-sc     44s
  ```

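The `claimRef` removal described in the restore steps above can be scripted when many PV backups need to be processed. The sketch below is one possible approach, not the project's tooling: `strip_claimref` is a hypothetical helper, and it assumes the backup file has the two-space indentation that `kubectl get -oyaml` produces, with `claimRef:` directly under `spec:`.

```bash
# Hypothetical helper: prints the PV manifest in file $1 without its
# spec.claimRef block. Assumes `claimRef:` is indented by two spaces and its
# fields by at least four, as in `kubectl get pv -oyaml` output.
strip_claimref() {
  awk '
    /^  claimRef:[[:space:]]*$/ { skipping = 1; next }  # drop the key itself
    skipping && /^    /         { next }                # drop its nested fields
                                { skipping = 0; print } # keep everything else
  ' "$1"
}

# Usage:
#   strip_claimref pv-backup.yaml > pv-restore.yaml
```

Review the resulting file before applying it on the secondary cluster, since a manifest with a different layout would not be handled by this text-based filter.
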
## Planned Migration

> Use cases: Datacenter maintenance, Technology refresh, Disaster avoidance, etc.

### Failover

The failover operation is the process of switching production to a
backup facility (normally your recovery site). In the case of failover,
access to the image on the primary site should be stopped.
The image should then be made *primary* on the secondary cluster so that
access can be resumed there.

:memo: As mentioned in the pre-requisites, a periodic or one-time backup of
the application should be available for restore on the secondary site (cluster-2).

Follow the steps below for a planned migration of the workload from the primary
cluster to the secondary cluster:

* Scale down all the application pods which are using the
  mirrored PVC on the primary cluster.
* Take a backup of the PVC and PV objects from the primary cluster.
  This can be done using backup tools like
  [velero](https://velero.io/docs/main/).
* Update `replicationState` to `secondary` in the VolumeReplication CR at the primary site.
  When the operator sees this change, it will pass the information down to the
  driver via a gRPC request to mark the dataSource as `secondary`.
* If you are manually recreating the PVC and PV on the secondary cluster,
  remove the `claimRef` section in the PV objects.
* Recreate the storageclass, PVC, and PV objects on the secondary site.
  * As you are creating the static binding between PVC and PV, a new PV won’t
    be created here; the PVC will be bound to the existing PV.
* Create the VolumeReplicationClass on the secondary site.
* Create the VolumeReplications for all the PVCs for which mirroring
  is enabled.
  * `replicationState` should be `primary` for all the PVCs on
    the secondary site.
* Check whether the image is marked `primary` on the secondary site
  by verifying it in the VolumeReplication CR status.
* Once the image is marked as `primary`, the PVC is ready
  to be used. Now, we can scale up the applications to use the PVC.

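The primary-site part of the steps above (scale down, then demote) can be sketched as a small script. This is a hedged illustration: `planned_failover_primary` is a hypothetical function, the workload is assumed to be a single Deployment, and the `KUBECTL` variable is an illustrative hook that allows a dry run with `KUBECTL=echo`.

```bash
# KUBECTL can be overridden (for example KUBECTL=echo) to print the commands
# instead of running them.
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper: planned_failover_primary APP_DEPLOYMENT VR_NAME CONTEXT
planned_failover_primary() {
  app=$1
  vr=$2
  ctx=$3
  # Stop the workload that uses the mirrored PVC.
  $KUBECTL --context="$ctx" scale deployment "$app" --replicas=0 || return 1
  # (Take the PVC and PV backups at this point, as described earlier.)
  # Demote the volume on the primary site.
  $KUBECTL --context="$ctx" patch volumereplication "$vr" --type merge \
    -p '{"spec":{"replicationState":"secondary"}}'
}

# Usage (the Deployment name my-app is a placeholder):
#   planned_failover_primary my-app pvc-volumereplication cluster-1
```
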
>:memo: **WARNING**: In the async disaster recovery use case, we don't
> get the complete data. We will only get crash-consistent data
> as of the last mirror snapshot, which depends on the snapshot interval.

### Failback

To perform a failback operation to the primary cluster in case of planned migration,
repeat the Failover steps in the reverse direction.

>:memo: **Remember**: We can skip the backup-restore operations
> in case of failback if the required yamls are already present on
> the primary cluster. Any new PVCs will still need to be restored on the
> primary site.

## Disaster Recovery

> Use cases: Natural disasters, Power failures, System failures and crashes, etc.

### Failover (abrupt shutdown)

In case of disaster recovery, create the VolumeReplication CR at the secondary site.
Since the connection to the primary site is lost, the operator automatically
sends a gRPC request down to the driver to forcefully mark the dataSource as `primary`.

* If you are manually creating the PVC and PV on the secondary cluster, remove
  the `claimRef` section in the PV objects.
* Create the storageclass, PVC, and PV objects on the secondary site.
  * As you are creating the static binding between PVC and PV, a new PV won’t be
    created here; the PVC will be bound to the existing PV.
* Create the VolumeReplicationClass and VolumeReplication CR on the secondary site.
* Check whether the image is `primary` on the secondary site by verifying it in
  the VolumeReplication CR status.
* Once the image is marked as `primary`, the PVC is ready to be used. Now,
  we can scale up the applications to use the PVC.

### Failback (post-disaster recovery)

Once the failed cluster is recovered on the primary site and you want to fail back
from the secondary site, follow the steps below:

* Update the VolumeReplication CR `replicationState`
  from `primary` to `secondary` on the primary site.
* Scale down the applications on the secondary site.
* Update the VolumeReplication CR `replicationState` from `primary` to
  `secondary` on the secondary site.
* On the primary site, verify that the VolumeReplication status is marked as
  volume ready to use.
* Once the volume is marked as ready to use, change the `replicationState`
  from `secondary` to `primary` on the primary site.
* Scale up the applications again on the primary site.

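The ordering of the failback steps above can be captured in two small functions, with the manual status verification on the primary site performed between them. This is only a sketch under stated assumptions: the function names, the single-Deployment workload and the `KUBECTL` override (set `KUBECTL=echo` for a dry run) are all illustrative and not part of ceph-csi.

```bash
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper: set_replication_state CONTEXT VR_NAME STATE
set_replication_state() {
  $KUBECTL --context="$1" patch volumereplication "$2" --type merge \
    -p "{\"spec\":{\"replicationState\":\"$3\"}}"
}

# failback_demote PRIMARY_CTX SECONDARY_CTX VR_NAME APP_DEPLOYMENT
# Demote on the primary site, stop the workload on the secondary site,
# then demote on the secondary site.
failback_demote() {
  set_replication_state "$1" "$3" secondary || return 1
  $KUBECTL --context="$2" scale deployment "$4" --replicas=0 || return 1
  set_replication_state "$2" "$3" secondary
}

# failback_promote PRIMARY_CTX VR_NAME APP_DEPLOYMENT REPLICAS
# Run only after the VolumeReplication status on the primary site has been
# verified as ready to use.
failback_promote() {
  set_replication_state "$1" "$2" primary || return 1
  $KUBECTL --context="$1" scale deployment "$3" --replicas="$4"
}
```

Keeping the demote and promote phases separate leaves room for the status check in between, which this sketch deliberately does not automate.
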