ceph-csi/internal/rbd
Madhu Rajanna 75fa1927fc rbd: mark image ready when image state is up+unknown
To recover from the split-brain (up+error) state, the image needs to be
demoted and a resync requested on site-a, and then the image on site-b
should be demoted. The volume should be marked ready=true when the
image state on both clusters is up+unknown, because during the last
snapshot sync the data is copied first and only then does the image
state on site-a change to up+unknown.

If the image state on both sites is up+unknown, consider the data
completely synced, as the last snapshot has been exchanged between
the clusters.
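
The readiness rule described above can be sketched in Go roughly as follows. This is a minimal illustration, not the code in replicationcontrollerserver.go: the `siteStatus` type and the `imageReady` function are hypothetical names introduced here, and treating "no peers reported" as not ready is an assumption.

```go
package main

import "fmt"

// siteStatus is a hypothetical, simplified view of one site's mirroring
// status; it is not the type used by ceph-csi or go-ceph.
type siteStatus struct {
	State       string // e.g. "up+unknown", "up+error"
	Description string // e.g. "remote image demoted", "split-brain"
}

const stateUpUnknown = "up+unknown"

// imageReady reports whether the local image and every peer site are in
// up+unknown, the condition the commit message treats as fully synced.
// Assumption: with no peers reported, nothing can be confirmed, so the
// result is false.
func imageReady(local siteStatus, peers []siteStatus) bool {
	if local.State != stateUpUnknown || len(peers) == 0 {
		return false
	}
	for _, p := range peers {
		if p.State != stateUpUnknown {
			return false
		}
	}
	return true
}

func main() {
	synced := siteStatus{State: "up+unknown", Description: "remote image demoted"}
	fmt.Println(imageReady(synced, []siteStatus{synced})) // true

	splitBrain := siteStatus{State: "up+error", Description: "split-brain"}
	fmt.Println(imageReady(splitBrain, []siteStatus{splitBrain})) // false
}
```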

* Create a 10 GB file and validate the data after resync

* Do a failover when site-a goes down
* Force promote the image on site-b and write GiBs of data
* Once site-a comes back, demote the image on site-a and issue a resync
* Demote the image on site-b
* The status is reflected on the other site when the last
  snapshot sync happens
* The image goes to the up+unknown state, and the complete data is
  copied to site-a
* Promote the image on site-a and use it

```bash
csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006:
  global_id:   e7f9ec55-06ab-46cb-a1ae-784be75ed96d
  state:       up+unknown
  description: remote image demoted
  service:     a on minicluster1
  last_update: 2021-04-28 07:11:56
  peer_sites:
    name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
    state: up+unknown
    description: remote image demoted
    last_update: 2021-04-28 07:11:41
```
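
As a rough aid for reproducing the failover scenario above, the steps can be written down as an ordered list of `rbd mirror image` invocations. This is only a sketch: the `step` type, the `failoverAndResync` function, and the pool name `replicapool` are hypothetical, and the program prints the commands instead of running them.

```go
package main

import (
	"fmt"
	"strings"
)

// step is a hypothetical record of one manual action in the scenario.
type step struct {
	Site string   // where the command is run: "site-a" or "site-b"
	Args []string // arguments passed to the rbd CLI
}

// failoverAndResync returns the commands for the scenario in the order
// the commit message lists them. imageSpec has the form "pool/image".
func failoverAndResync(imageSpec string) []step {
	return []step{
		// site-a is down: take over on site-b.
		{Site: "site-b", Args: []string{"mirror", "image", "promote", "--force", imageSpec}},
		// site-a is back: demote its stale image and ask for a resync.
		{Site: "site-a", Args: []string{"mirror", "image", "demote", imageSpec}},
		{Site: "site-a", Args: []string{"mirror", "image", "resync", imageSpec}},
		// Demote site-b so the last snapshot sync can complete.
		{Site: "site-b", Args: []string{"mirror", "image", "demote", imageSpec}},
		// Both sites report up+unknown: fail back to site-a.
		{Site: "site-a", Args: []string{"mirror", "image", "promote", imageSpec}},
	}
}

func main() {
	// "replicapool" is an assumed pool name, used only for illustration.
	spec := "replicapool/csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006"
	for _, s := range failoverAndResync(spec) {
		fmt.Printf("%s: rbd %s\n", s.Site, strings.Join(s.Args, " "))
	}
}
```

The final promote on site-a should only be issued after the status on both sites shows up+unknown, as in the output above.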

* Do a failover when site-a goes down
* Force promote the image on site-b and write GiBs of data
* Demote the image on site-b
* Once site-a comes back, demote the image on site-a
* The images on both sites go to the split-brain state

```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
  global_id:   115c3df9-3d4f-4c04-93a7-531b82155ddf
  state:       up+error
  description: split-brain
  service:     a on minicluster2
  last_update: 2021-04-28 07:25:41
  peer_sites:
    name: abbda0f0-0117-4425-8cb2-deb4c853da47-rook-ceph
    state: up+error
    description: split-brain
    last_update: 2021-04-28 07:25:26
```
* Issue a resync
* The images cannot be resynced, because the image on site-b was
  in the demoted state when the resync was issued on site-a
* To recover from this state, promote the image on site-b and
  then demote it after some time

```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
  global_id:   115c3df9-3d4f-4c04-93a7-531b82155ddf
  state:       up+unknown
  description: remote image demoted
  service:     a on minicluster1
  last_update: 2021-04-28 07:32:56
  peer_sites:
    name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
    state: up+unknown
    description: remote image demoted
    last_update: 2021-04-28 07:32:41
```
* Once the data is copied, the image state moves to up+unknown
  on both sites
* Promote the image on site-a and use it
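
When testing the recovery above, waiting for both sites to reach up+unknown before promoting on site-a can be sketched as a small polling loop. The names `statusFunc`, `waitForSync`, and `errNotSynced` are hypothetical, and the status source is a fake; a real caller would have to obtain the states from the cluster itself.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// statusFunc is a hypothetical callback returning the local image state
// and the peer site state, e.g. ("up+error", "up+error").
type statusFunc func() (local, peer string, err error)

var errNotSynced = errors.New("image did not reach up+unknown on both sites")

// waitForSync polls get up to attempts times, sleeping interval between
// polls, and returns nil once both sites report up+unknown.
func waitForSync(get statusFunc, interval time.Duration, attempts int) error {
	for i := 0; i < attempts; i++ {
		local, peer, err := get()
		if err != nil {
			return err
		}
		if local == "up+unknown" && peer == "up+unknown" {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(interval)
		}
	}
	return errNotSynced
}

func main() {
	// Fake status source: split-brain for two polls, then synced.
	states := []string{"up+error", "up+error", "up+unknown"}
	polls := 0
	get := func() (string, string, error) {
		s := states[polls]
		if polls < len(states)-1 {
			polls++
		}
		return s, s, nil
	}
	err := waitForSync(get, 10*time.Millisecond, 5)
	fmt.Println("synced:", err == nil)
}
```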

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 07a916b84d)
2021-05-05 15:07:18 +00:00
| File | Last commit | Date |
| --- | --- | --- |
| clone.go | cleanup: release resources for rbdImages objects after use | 2021-04-14 03:59:28 +00:00 |
| controllerserver.go | rbd: return crypt error for the rpc return | 2021-04-22 12:55:50 +05:30 |
| driver.go | rbd: Add ReplicationServer struct for replication operations | 2021-04-05 08:53:40 +00:00 |
| encryption.go | logging: report issues in rbdImage.DEKStore API with stacks | 2021-04-14 03:59:28 +00:00 |
| errors.go | rbd: correct the code comment for ErrFlattenInProgress | 2020-10-20 08:59:25 +00:00 |
| identityserver.go | cleanup: address godot warnings | 2020-07-21 08:36:24 +00:00 |
| mirror.go | rbd: check for peer site status | 2021-04-05 08:53:40 +00:00 |
| nodeserver_test.go | cleanup: Remove support for Delete and Unmounting v1.1.0 PVC | 2020-07-10 16:07:13 +00:00 |
| nodeserver.go | cleanup: refactor deeply nested if statements in internal/rbd | 2021-04-07 02:31:41 +00:00 |
| rbd_attach.go | util: introduce VolumeEncryption type | 2021-03-12 10:11:47 +00:00 |
| rbd_journal.go | cleanup: move copyEncryptionConfig() from CreateVolume to Exists() | 2021-04-14 03:59:28 +00:00 |
| rbd_util_test.go | rbd: add exclusive-lock and journaling image features for rbd image | 2021-03-24 09:48:04 +00:00 |
| rbd_util.go | rbd: modified logic to check image watchers | 2021-04-20 11:54:30 +05:30 |
| replicationcontrollerserver.go | rbd: mark image ready when image state is up+unknown | 2021-05-05 15:07:18 +00:00 |
| snapshot.go | rbd: delete encryption key from KMS | 2021-04-30 09:37:23 +00:00 |