ceph-csi

mirror of https://github.com/ceph/ceph-csi.git synced 2024-12-18 11:00:25 +00:00

Author	SHA1	Message	Date
Madhu Rajanna	e4b7943bac	rbd: add workaround for force promote Use ExecCommandWithTimeout with a timeout of 1 minute for the promote operation. If the command does not return an error or a response within 1 minute, the process is killed and an error is returned to the user. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-12-23 13:36:21 +00:00
Madhu Rajanna	edcb2b529b	rbd: move core fields to rbdImage struct Moved the ParentName, ParentPool and ImageFeatureSet fields to the rbdImage struct, as these are first-class citizens of the rbdImage. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-12-23 03:47:00 +00:00
Madhu Rajanna	810e285c50	rbd: reset dummy image id The dummy image rbdVolume struct is derived from the actual rbdVolume of the volumeID sent in the EnableVolumeReplication request, so the dummy rbdVolume struct contains the image ID of the actual volume. Because of that, when we repair the dummy image, the image is sent to the trash but not deleted due to the wrong image ID. Resetting the image ID makes sure the image ID is fetched from the Ceph cluster and the same image ID is used for the manager operation. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-12-21 17:39:07 +00:00
Niels de Vos	8d09134125	rbd: export GenVolFromVolID() for consumption by csi-addons genVolFromVolID() is used by the CSI Controller service to create an rbdVolume object from a CSI volume_id. This function is useful for CSI-Addons Services as well, so rename it to GenVolFromVolID(). Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-12-10 07:35:26 +00:00
Madhu Rajanna	8081ac8251	rbd: add new image features for dummy image The dummy image is created with a size of 1MiB. During the snapshot transfer operation the full 1MiB is transferred even if the dummy image does not contain any data. Adding the new image features `fast-diff,layering,obj-map,exclusive-lock` on the dummy image ensures that only the diff is transferred to the remote cluster. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-12-07 17:34:14 +00:00
Madhu Rajanna	9a4533e549	rbd: create 1MiB size dummy image We added a workaround for rbd scheduling by creating a dummy image in #2656. With that fix, the dummy image is created with the size of the first actual rbd image sent in the EnableVolumeReplication request: if the actual rbd image is 1TiB, we create a dummy image of 1TiB, which is not good. Even though rbd images are thin provisioned, this causes issues for the transfer of the snapshot during the mirroring operation. This commit recreates the rbd image with a size of 1MiB, which is the smallest supported size in rbd. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-12-07 17:34:14 +00:00
Madhu Rajanna	64ce5e0949	rbd: check local image state during promote operation rbd mirroring CLI calls are async and do not wait for the operation to complete. For example, `rbd mirror image enable` enables mirroring on the image, but it does not ensure that mirroring is actually enabled and that the image is a healthy primary. The same goes for promoting a volume. This commit adds a check in PromoteVolume to make sure the image is in a healthy state, i.e. `up+stopped`. Note: intermediate states are not considered, to make sure the image is completely healthy before responding with success to the RPC call. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-12-01 20:19:05 +00:00
Prasanna Kumar Kalever	e7d8834149	rbd: enable journal based mirroring Journal-based RADOS block device mirroring ensures point-in-time consistent replicas of all changes to an image, including reads and writes, block device resizing, snapshots, clones, and flattening. Journaling-based mirroring records all modifications to an image in the order in which they occur. This ensures that a crash-consistent mirror of an image is available. When configured in journal mode, mirroring will utilize the RBD journaling image feature to replicate the image contents. If the RBD journaling image feature is not yet enabled on the image, it will be automatically enabled. Fixes: #2018 Co-authored-by: Madhu Rajanna <madhupr007@gmail.com> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-12-01 14:12:30 +00:00
Madhu Rajanna	f0b2ea6a6d	rbd: repair imageid after resync During resync operation the local image will get deleted and a new image is recreated by the rbd mirroring. The new image will have a new imageID. Once resync is completed update the imageID in the OMAP to get the image removed from the trash during DeleteVolume. Before resyncing ``` sh-4.4# rbd info replicapool/csi-vol-0c25bdd3-485f-11ec-bd30-0242ac110004 rbd image 'csi-vol-0c25bdd3-485f-11ec-bd30-0242ac110004': size 1 GiB in 256 objects order 22 (4 MiB objects) snapshot_count: 1 id: 1efcc6b7a769 block_name_prefix: rbd_data.1efcc6b7a769 format: 2 features: layering op_features: flags: create_timestamp: Thu Nov 18 11:02:40 2021 access_timestamp: Thu Nov 18 11:02:40 2021 modify_timestamp: Thu Nov 18 11:02:40 2021 mirroring state: enabled mirroring mode: snapshot mirroring global id: 9c4c236d-8a47-4779-b4f6-94e05da70dbd mirroring primary: true ``` ``` sh-4.4# rados listomapvals csi.volume.0c25bdd3-485f-11ec-bd30-0242ac110004 --pool=replicapool csi.imageid value (12 bytes) : 00000000 31 65 66 63 63 36 62 37 61 37 36 39 \|1efcc6b7a769\| 0000000c csi.imagename value (44 bytes) : 00000000 63 73 69 2d 76 6f 6c 2d 30 63 32 35 62 64 64 33 \|csi-vol-0c25bdd3\| 00000010 2d 34 38 35 66 2d 31 31 65 63 2d 62 64 33 30 2d \|-485f-11ec-bd30-\| 00000020 30 32 34 32 61 63 31 31 30 30 30 34 \|0242ac110004\| 0000002c csi.volname value (40 bytes) : 00000000 70 76 63 2d 32 36 38 39 33 66 30 38 2d 66 66 32 \|pvc-26893f08-ff2\| 00000010 62 2d 34 61 30 66 2d 61 35 63 33 2d 38 38 34 62 \|b-4a0f-a5c3-884b\| 00000020 37 32 30 66 66 62 32 63 \|720ffb2c\| 00000028 csi.volume.owner value (7 bytes) : 00000000 64 65 66 61 75 6c 74 \|default\| 00000007 ``` After Resyncing ``` sh-4.4# rbd info replicapool/csi-vol-0c25bdd3-485f-11ec-bd30-0242ac110004 rbd image 'csi-vol-0c25bdd3-485f-11ec-bd30-0242ac110004': size 1 GiB in 256 objects order 22 (4 MiB objects) snapshot_count: 1 id: 10b183a48a97 block_name_prefix: rbd_data.10b183a48a97 format: 2 
features: layering, non-primary op_features: flags: create_timestamp: Thu Nov 18 11:09:39 2021 access_timestamp: Thu Nov 18 11:09:39 2021 modify_timestamp: Thu Nov 18 11:09:39 2021 mirroring state: enabled mirroring mode: snapshot mirroring global id: 9c4c236d-8a47-4779-b4f6-94e05da70dbd mirroring primary: false sh-4.4# rados listomapvals csi.volume.0c25bdd3-485f-11ec-bd30-0242ac110004 --pool=replicapool csi.imageid value (12 bytes) : 00000000 31 30 62 31 38 33 61 34 38 61 39 37 \|10b183a48a97\| 0000000c csi.imagename value (44 bytes) : 00000000 63 73 69 2d 76 6f 6c 2d 30 63 32 35 62 64 64 33 \|csi-vol-0c25bdd3\| 00000010 2d 34 38 35 66 2d 31 31 65 63 2d 62 64 33 30 2d \|-485f-11ec-bd30-\| 00000020 30 32 34 32 61 63 31 31 30 30 30 34 \|0242ac110004\| 0000002c csi.volname value (40 bytes) : 00000000 70 76 63 2d 32 36 38 39 33 66 30 38 2d 66 66 32 \|pvc-26893f08-ff2\| 00000010 62 2d 34 61 30 66 2d 61 35 63 33 2d 38 38 34 62 \|b-4a0f-a5c3-884b\| 00000020 37 32 30 66 66 62 32 63 \|720ffb2c\| 00000028 csi.volume.owner value (7 bytes) : 00000000 64 65 66 61 75 6c 74 \|default\| 00000007 ``` Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-25 09:22:13 +00:00
Madhu Rajanna	027b68ab39	rbd: operate on dummy image after adding scheduling Currently we first operate on the dummy image to refresh the pool and then add the scheduling. We think the scheduling should be added first and then the pool should be refreshed. If we do this, all the existing schedules will be considered by the scheduler. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-23 11:04:42 +00:00
Madhu Rajanna	211ca9b5a7	rbd: do deep copy for dummyVol struct With a shallow copy of rbdVol to dummyVol, the image name update on the dummyVol is reflected on the rbdVol, which we do not want. Do a deep copy to avoid this problem. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-23 11:04:42 +00:00
Shyamsundar Ranganathan	d1c21eece9	rbd: Update sequence of operations on dummy mirror image The dummy mirror image needs to be disabled and then reenabled for mirroring, to ensure a newly promoted primary is now starting to schedule snapshots. Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>	2021-11-19 09:38:59 +05:30
Madhu Rajanna	517ad8c644	rbd: use dummy image to workaround rbd scheduling bug Currently there is a bug in the rbd mirror scheduling module. After a failover and failback, the scheduling is not updated and the mirroring snapshots are not created periodically as per the scheduling interval. This PR works around it with the operations below * Create a dummy (unique) image per cluster; this image should be easily identifiable. * During a Promote operation on any image, enable mirroring on the dummy image. When mirroring is enabled on the dummy image, the pool gets updated and the scheduling is reconfigured. * During a Demote operation on any image, disable mirroring on the dummy image. The disable is needed so that mirroring can be enabled again when a promote request arrives to make the image primary. * When DR is no longer needed, this image needs to be cleaned up manually for now, as we do not want to add a check for deleting the dummy image to the existing DeleteVolume code path, since it would impact the performance of the DeleteVolume workflow. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-19 09:38:59 +05:30
Madhu Rajanna	e4e0f397a6	rbd: run schedule during promote operation Moved adding the schedule to the promote operation, as the schedule needs to be added when the image is promoted; this is the correct way to add it so that the scheduling takes effect. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-19 09:38:59 +05:30
Madhu Rajanna	0838845c6a	cleanup: remove FIXME from ResyncVolume As the complexity of ResyncVolume is reduced, remove the FIXME, which is no longer valid. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-26 12:00:36 +00:00
Madhu Rajanna	2017b8c621	rbd: log mirror daemon state for replication Log the mirror daemon state in the local and remote cluster for better debugging. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-26 12:00:36 +00:00
Madhu Rajanna	7472338334	rbd: remove unwanted const for comparing the image states Use the states defined in go-ceph and avoid creating duplicate consts in cephcsi. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-26 12:00:36 +00:00
Madhu Rajanna	b92a6f5ccb	rbd: log the remote site details during resync logging the remote site details during resyncing for better debugging. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-26 12:00:36 +00:00
Madhu Rajanna	1fd2f28fee	rbd: check local image state for resyncing Below are the local states of the mirrored image "unknown" -> If the image is in an unknown state, it means data is completely synced "error" -> If the image is in an error state, it means it needs resync "syncing" "starting_replay" "replaying" "stopping_replay" "stopped" If the resync is successfully started, the image will be in the "replaying" state, so we can treat "replaying" as reporting that the resync is in progress. We are discarding the intermediate states like "syncing", "starting_replay" and "stopping_replay". Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-26 12:00:36 +00:00
Madhu Rajanna	0d51f6d833	rbd: check local image description for split-brain In some corner cases like `re-player shutdown` the local image will not be in error state. It is also worth considering the `description` field to detect split-brain. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-18 11:22:03 +00:00
Shyamsundar Ranganathan	47dc9cf28d	rbd: Report errors when a resync may be in progress Currently we return a !ready status if an image is not found when a replication resync is issued. We also return a !ready just after issuing a resync. The change is to ensure we return errors in these cases for the caller to retry the operation till we can determine we are actually resyncing, and then return !ready with nil errors. Part of addressing: https://github.com/csi-addons/volume-replication-operator/issues/101 Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>	2021-09-15 15:59:22 +00:00
Humble Chirammal	f0b8a3f626	rbd: use String() method of MirrorImageState in return error MirrorImageState (type C.rbd_mirror_image_state_t) has a string method which can be used while returning error in the replication controller. Previously, we were using int return in the error which is not the proper usage. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-09-03 16:02:53 +00:00
Niels de Vos	6d00b39886	cleanup: move log functions to new internal/util/log package Moving the log functions into its own internal/util/log package makes it possible to split out the humongous internal/util packages in further smaller pieces. This reduces the inter-dependencies between utility functions and components, preventing circular dependencies which are not allowed in Go. Updates: #852 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-08-26 09:34:05 +00:00
Madhu Rajanna	fc0d6f6b8b	rbd: return success if image is healthy secondary If the image is in secondary state and is up+replaying, it is a healthy secondary: the image is primary somewhere in the remote cluster and the local image is getting replayed. Return success for disabling mirroring, as we cannot disable mirroring on an image in secondary state; when the image on the remote site gets disabled, the image on all the remote (secondary) sites will get auto deleted. This helps in garbage collecting the volume replication Kubernetes artifacts. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-16 17:38:25 +00:00
Madhu Rajanna	3c85219962	rbd: consider empty mirroring mode Consider the empty mirroring mode when validating the snapshot interval and the scheduling time. Even if the mirroring mode is not set, validate the snapshot scheduling details, as cephcsi sets the mirroring mode to snapshot by default. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-09 11:05:05 +00:00
Madhu Rajanna	2782878ea2	rbd: log LastUpdate in UTC format This Commit converts the LastUpdate from int to the UTC format and logs it for better debugging. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-06 10:18:51 +00:00
Yati Padia	6691951453	rbd: use go-ceph for getImageMirroringStatus Currently, getImageMirroringStatus() is using RBD CLI. This commit converts RBD CLI to go-ceph API. Fixes: #2120 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-26 06:37:40 +00:00
Rakshith R	43f753760b	cleanup: resolve nlreturn linter issues nlreturn linter requires a new line before return and branch statements except when the return is alone inside a statement group (such as an if statement) to increase code clarity. This commit addresses such issues. Updates: #1586 Signed-off-by: Rakshith R <rar@redhat.com>	2021-07-22 06:05:01 +00:00
Yati Padia	f36d611ef9	cleanup: resolves gofumpt issues of internal codes This PR runs gofumpt for internal folder. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-14 19:50:56 +00:00
Yati Padia	f210d5758b	cleanup: spell check getImageMirroingStatus This commit corrects the spelling for getImageMirroingStatus() -> getImageMirroringStatus Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-14 07:32:01 +00:00
Yati Padia	ffab37f44f	cleanup: resolves gocritic linter issues This commit resolves gocritic linter errors. Updates: #2250 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-08 05:19:26 +00:00
Madhu Rajanna	0837c05be0	rbd: set scheduling interval on snapshot mirrored image Mirror-snapshots can also be automatically created on a periodic basis if mirror-snapshot schedules are defined. The mirror-snapshot can be scheduled globally, per-pool, or per-image levels. Multiple mirror-snapshot schedules can be defined at any level. To create a mirror-snapshot schedule with rbd, specify the mirror snapshot schedule add command along with an optional pool or image name; interval; and optional start time: The interval can be specified in days, hours, or minutes using d, h, m suffix respectively. The optional start-time can be specified using the ISO 8601 time format. For example: ``` $ rbd --cluster site-a mirror snapshot schedule add --pool image-pool --image image1 24h 14:00:00-05:00 ``` Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-07-06 14:41:48 +00:00
Humble Chirammal	8f82a30c21	internal: reformat long lines in internal/rbd package to 120 chars We have many declarations, invocations, etc. with long lines, which are very difficult to follow while reading code. This addresses the issue in the files below and restricts the line length to 120 chars. -internal/rbd/rbd_attach.go -internal/rbd/rbd_journal.go -internal/rbd/rbd_util.go -internal/rbd/replicationcontrollerserver.go -internal/rbd/snapshot.go Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-06-28 14:43:49 +00:00
Yati Padia	6bfdf2feb0	cleanup: gocyclo being unused for linter This commit addresses the following issue: 'nolint:gocyclo // complexity needs to be reduced.' is unused for linter "gocyclo" (nolintlint) Updates:#2025 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-06-15 02:54:16 +00:00
Madhu Rajanna	07a916b84d	rbd: mark image ready when image state is up+unknown To recover from the split-brain (up+error) state, the image needs to be demoted and requested for resync on site-a, and then the image on site-b should get demoted. The volume should be marked ready=true when the image state on both clusters is up+unknown, because during the last snapshot sync the data gets copied first and then the image state on site-a changes to up+unknown. If the image state on both sites is up+unknown, consider that the complete data is synced, as the last snapshot has been exchanged between the clusters. * Create a 10 GB file and validate the data after resync * Do a failover when site-a goes down * Force promote the image and write data in GiB * Once site-a comes back, demote the image and issue resync * Demote the image on site-b * The status will get reflected on the other site when the last snapshot sync happens * The image will go to up+unknown state and the complete data will be copied to site-a * Promote the image on site-a and use it ```bash csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006: global_id: e7f9ec55-06ab-46cb-a1ae-784be75ed96d state: up+unknown description: remote image demoted service: a on minicluster1 last_update: 2021-04-28 07:11:56 peer_sites: name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph state: up+unknown description: remote image demoted last_update: 2021-04-28 07:11:41 ``` * Do a failover when site-a goes down * Force promote the image on site-b and write data in GiB * Demote the image on site-b * Once site-a comes back, demote the image on site-a * The images on both sites will go to split-brain state ```bash csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006: global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf state: up+error description: split-brain service: a on minicluster2 last_update: 2021-04-28 07:25:41 peer_sites: name: abbda0f0-0117-4425-8cb2-deb4c853da47-rook-ceph state: up+error description: split-brain last_update: 2021-04-28 07:25:26 ``` * Issue resync * The images cannot be resynced because when we issue resync on site-a the image on site-b was in demoted state * To recover from this state, promote and then, after some time, demote the image on site-b ```bash csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006: global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf state: up+unknown description: remote image demoted service: a on minicluster1 last_update: 2021-04-28 07:32:56 peer_sites: name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph state: up+unknown description: remote image demoted last_update: 2021-04-28 07:32:41 ``` * Once the data is copied we can see that the image state has moved to up+unknown on both sites * Promote the image on site-a and use it Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-05-05 13:38:29 +00:00
Madhu Rajanna	cfc88c9910	rbd: discard up+unknown state in ResyncVolume If the image is promoted and then demoted, the image state will be set to up+unknown if the image on the remote cluster is still in demoted state. When the user changes the state from primary to secondary while the image is still in demoted (secondary) state in the remote cluster, the image state on both clusters will be unknown. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-04-15 17:13:06 +05:30
Madhu Rajanna	d7838defcf	rbd: return FailedPrecondition error message In a DR scenario the image on the primary site cannot be demoted as the cluster is down, so during failover the image needs to be force promoted. RBD returns a `Device or resource busy` error message if the image cannot be promoted for the above reason. Return FailedPrecondition so that the replication operator can send a request to force promote the image. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-04-06 14:12:41 +00:00
Madhu Rajanna	403532c9a6	rbd: use force from PromoteVolume Request Instead of fetching the force option from the parameters, use the Force field available in the PromoteVolume Request. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-04-06 14:12:41 +00:00
Madhu Rajanna	385a751b8e	rebase: rename kube-storage to csi-addons The org github.com/kube-storage is renamed to github.com/csi-addons, as the name kube-storage was more generic. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-04-06 10:59:58 +00:00
Madhu Rajanna	448be70682	rbd: early check for disabled,disabling in DisableVolumeReplication added early check for disabling and disabled image mirroring state in DisableVolumeReplication Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-04-05 08:53:40 +00:00
Madhu Rajanna	fb3f7fe202	rbd: remove todo for image not found In case of resync the image gets deleted and recreated, which is a time-consuming operation. It makes sense to return an aborted error instead of not found, as we have the omap data and only the image is missing in the rbd pool. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-04-05 08:53:40 +00:00
Madhu Rajanna	95387c3b5e	rbd: check for peer site status Do resync if the image is in unknown or in error state. Check that the current image state is up+stopped or up+replaying, and also that all peer site statuses are up+stopped, to confirm that resyncing is done and the image can be promoted and used. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-04-05 08:53:40 +00:00
Madhu Rajanna	c822ad460d	rbd: add a check for image mirror disabling state The rbd mirror state can be enabled, disabled or disabling. If mirroring is not disabled yet and is still in disabling state, we need to check for it and return an aborted error message. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-04-05 08:53:40 +00:00
Madhu Rajanna	aaf6b571b8	rbd: Add ReplicationServer struct for replication operations Added the ReplicationServer struct for replication related operations. It also embeds the ControllerServer, which already implements helper functions like locking/unlocking etc. Removed getVolumeFromID and the cleanup functions for better code readability and easier maintenance. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-04-05 08:53:40 +00:00
Madhu Rajanna	6e941539b5	rbd: implement volume replication spec implemented the volume replication spec for the rbd mirroring. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-03-16 13:06:44 +00:00
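
Commit 2782878ea2 in the list above logs LastUpdate in UTC format instead of as a raw integer. The conversion might look roughly like the following minimal Go sketch. It assumes the integer is a Unix timestamp in seconds; `lastUpdateUTC` is a hypothetical helper name used only for illustration and is not taken from ceph-csi.

```go
package main

import (
	"fmt"
	"time"
)

// lastUpdateUTC is a hypothetical helper: it treats sec as seconds since the
// Unix epoch (an assumption) and renders it as an RFC 3339 timestamp in UTC.
func lastUpdateUTC(sec int64) string {
	return time.Unix(sec, 0).UTC().Format(time.RFC3339)
}

func main() {
	// 1619593916 corresponds to the last_update value shown in commit 07a916b84d.
	fmt.Println(lastUpdateUTC(1619593916)) // prints 2021-04-28T07:11:56Z
}
```

Calling `.UTC()` before formatting keeps the log output independent of the time zone of the node the process runs on.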

45 Commits
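
Commit 0837c05be0 describes the schedule interval as a value in days, hours, or minutes with a `d`, `h`, or `m` suffix. A check for that format could be sketched in Go as below. This is only an illustration under the assumption that an interval is a run of digits followed by exactly one suffix character; `intervalPattern` and `validInterval` are hypothetical names, not the validation code used by ceph-csi.

```go
package main

import (
	"fmt"
	"regexp"
)

// intervalPattern is a hypothetical pattern: one or more digits followed by
// a single d (days), h (hours), or m (minutes) suffix.
var intervalPattern = regexp.MustCompile(`^\d+[dhm]$`)

// validInterval reports whether s looks like a schedule interval such as "24h".
func validInterval(s string) bool {
	return intervalPattern.MatchString(s)
}

func main() {
	for _, s := range []string{"24h", "3d", "15m", "10s", "h"} {
		fmt.Printf("%q valid: %t\n", s, validInterval(s))
	}
}
```

Rejecting malformed values before invoking `rbd mirror snapshot schedule add` would surface an input error early rather than as a CLI failure.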