ceph-csi

mirror of https://github.com/ceph/ceph-csi.git synced 2025-04-11 18:13:00 +00:00

Author	SHA1	Message	Date
Prasanna Kumar Kalever	bdcf3273b5	rbd: provide a way to supply mounter specific mapOptions from sc Uses the below schema to supply mounter specific map/unmapOptions to the nodeplugin based on the discussion we all had at https://github.com/ceph/ceph-csi/pull/2636 This should specifically be really helpful with the `tryOtherMounters` set to true, i.e. with fallback mechanism settings turned ON. mapOption: "krbd:v1,v2,v3;nbd:v1,v2,v3" - By omitting `krbd:` or `nbd:`, the option(s) apply to rbdDefaultMounter which is krbd. - A user can _override_ the options for a mounter by specifying `krbd:` or `nbd:`. mapOption: "v1,v2,v3;nbd:v1,v2,v3" is effectively the same as the 1st example. - Sections are split by `;`. - If users want to specify common options for both `krbd` and `nbd`, they should mention them twice. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-11-23 08:54:37 +00:00
Shyamsundar Ranganathan	d1c21eece9	rbd: Update sequence of operations on dummy mirror image The dummy mirror image needs to be disabled and then reenabled for mirroring, to ensure a newly promoted primary is now starting to schedule snapshots. Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>	2021-11-19 09:38:59 +05:30
Madhu Rajanna	517ad8c644	rbd: use dummy image to workaround rbd scheduling bug Currently we have a bug in the rbd mirror scheduling module. After doing failover and failback, the scheduling is not updated and the mirroring snapshots are not created periodically as per the scheduling interval. This PR works around this by doing the operations below: * Create a dummy (unique) image per cluster; this image should be easily identifiable. * During the Promote operation on any image, enable mirroring on the dummy image. When we enable mirroring on the dummy image, the pool gets updated and the scheduling is reconfigured. * During the Demote operation on any image, disable mirroring on the dummy image. The disable needs to be done so that mirroring can be enabled again when we get the promote request to make the image primary. * When DR is no longer needed, this image needs to be cleaned up manually for now, as we don't want to add a check for deleting the dummy image in the existing DeleteVolume code path, since it would impact the performance of the DeleteVolume workflow. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-19 09:38:59 +05:30
Madhu Rajanna	d05fc1e8e5	util: add helper to get the cluster ID added helper function to get the cluster ID. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-19 09:38:59 +05:30
Madhu Rajanna	e4e0f397a6	rbd: run schedule during promote operation Moved to add scheduling to the promote operation as scheduling need to be added when the image is promoted and this is the correct method of adding the scheduling to make the scheduling take place. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-19 09:38:59 +05:30
Madhu Rajanna	7bbd2ea284	rbd: use small case of error message the error message should not start with the capital letter changing the case as per the standard. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-18 10:44:12 +00:00
Madhu Rajanna	51998a5f4a	cleanup: log the image name and pool name instead of logging the volumeID and the pool name. log the poolname and image name for better debugging. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-18 10:44:12 +00:00
Madhu Rajanna	0f0cda49a7	rbd: log stdError for cryptsetup command If we hit any error while running the cryptsetup commands, we are logging only the error message. With only the error message it is difficult to analyze the problem; logging the stdError will help us check what the problem is. updates: #2610 Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-11-18 02:17:15 +00:00
Niels de Vos	7e22180125	rbd: call undoStagingTransaction() when NodeStageVolume() fails On line 341 a `transaction` is created. This is passed to the deferred `undoStagingTransaction()` function when an error in the `NodeStageVolume` procedure is detected. So far, so good. However, on line 356 a new `transaction` is returned. This new `transaction` is not used for the defer call. By removing the empty `transaction` that is used in the defer call, and calling `undoStagingTransaction()` on an error of `stageTransaction()`, the code is a little simpler, and the cleanup of the transaction should be done correctly now. Updates: #2610 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-11-17 23:58:00 +00:00
Prasanna Kumar Kalever	e6fa392df1	rbd: fix mapOptions passing with rbd-nbd mounter This was a regression introduced by: https://github.com/ceph/ceph-csi/pull/2556 Fixes: #2610 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-11-16 10:12:46 +00:00
Prasanna Kumar Kalever	50e9dfa5c5	cleanup: fix log level This log line is seen frequently in the logs and its better to be at Warning loglevel rather than Error based on its severity E1109 08:30:45.612395 38328 util.go:247] kernel 4.19.202 does not support required features Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-11-10 10:54:29 +00:00
Prasanna Kumar Kalever	3686b6da8b	rbd: utilize cookie support from rbd for nbd Problem: On remap/attach of device (i.e. nodeplugin restart), there is no way for rbd-nbd to defend if the backend storage is matching with the initial backend storage. Say, if an initial map request for backend "pool1/image1" got mapped to /dev/nbd0 and the userspace process is terminated (on nodeplugin restart). A next remap/attach (nodeplugin start) request within reattach-timeout is allowed to use /dev/nbd0 for a different backend "pool1/image2" For example, an operation like below could be dangerous: $ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image /dev/nbd0 $ sudo blkid /dev/nbd0 /dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4" $ sudo pkill -15 rbd-nbd <-- nodeplugin terminate $ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image /dev/nbd0 $ sudo blkid /dev/nbd0 /dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs" Solution: rbd-nbd/kernel now provides a way to keep some metadata in sysfs to identify between the device and the backend, so that when a remap/attach request is made, rbd-nbd can compare and avoid such dangerous operations. With the provided solution, as part of the initial map request, backend cookie (ceph-csi VOLID) can be stored in the sysfs per device config, so that on a remap/attach request rbd-nbd will check and validate if the backend per device cookie matches with the initial map backend with the help of cookie. At Ceph-csi we use VOLID as device cookie, which will be unique, we pass the VOLID as cookie at map and use the same at the time of attach, that way rbd-nbd can identify backends and their matching devices. Requires: https://github.com/ceph/ceph/pull/41323 https://lkml.org/lkml/2021/4/29/274 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-11-04 03:20:59 +00:00
Prasanna Kumar Kalever	793b22cf27	rbd: check for nbd cookie support Change checkRbdNbdTools() to setRbdNbdToolFeatures() Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-11-04 03:20:59 +00:00
Prasanna Kumar Kalever	9a3170bf77	rbd: provide a way to disable the auto fallback to nbd mounter This change allows the user to choose not to fallback to NBD mounter when some ImageFeatures are absent with krbd driver, rather just fail the NodeStage call. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-11-01 08:17:36 +00:00
Prasanna Kumar Kalever	bfc24f6f12	cleanup: generalize the parseBool function Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-11-01 08:17:36 +00:00
Prasanna Kumar Kalever	84ec797dda	rbd: detect krbd features in runtime and fallback to nbd Currently, we recognize and warn for the provided image features based on our prior intelligence at ceph-csi (i.e based on supportedFeatures map and validateImageFeatures) at image/PV creation time. It might be very much possible that the cluster is heterogeneous i.e. the PV creation and application container might both be on different nodes with different kernel versions (krbd driver versions). This PR adds a mechanism to check for the supported krbd features during mount time, if the krbd driver doesn't have the specified image feature then it will fall back to rbd-nbd mounter. Fixes: #478 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-11-01 08:17:36 +00:00
Niels de Vos	c852f487a5	util: set defaults for Vault config before converting When using UPPER_CASE formatting for the HashiCorp Vault KMS configuration, a missing `VAULT_DESTROY_KEYS` will cause the option to be set to "false". The default for the option is intended to be "true". This is a difference in behaviour between the `vaultDestroyKeys` and `VAULT_DESTROY_KEYS` options. Both should use a default of "true" when the configuration does not set the option explicitly. By setting the default options in the `standardVault` struct before unmarshalling the configuration in it, the default values will be retained for the missing configuration options. Reported-by: Rachael George <rgeorge@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-10-28 14:41:53 +00:00
Humble Chirammal	6aec858cba	rbd: parse migration secret and set fields for nodestage operations this commit make use of the migration request secret parsing and set the required fields for further nodestage operations Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-10-27 18:35:00 +00:00
Humble Chirammal	5621f2cfca	rbd: split the parsing and deletion logic to its own functions. parseAndDeleteMigratedVolume() previously clubbed the parsing of the migration volume handle with the deletion of the volume. This commit splits this logic in two: parsing is done in parseMigrationVolID() and DeleteMigratedVolume() deletes the backend volume. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-10-27 18:35:00 +00:00
Humble Chirammal	ff0911fb6a	rbd: add unittests for IsMigrationSecret and ParseAndSetSecretMapFromMigSecret This commit adds unit tests for newly introduced migration specific functions. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-10-27 18:35:00 +00:00
Humble Chirammal	b49bf4b987	rbd: parse migration secret and set it for controller server operations This commit adds a couple of helper functions to parse the migration request secret and set it for further csi driver operations. More details: The intree secret has a data field called "key" which is the base64 admin secret key. The ceph CSI driver currently expects the secret to contain the data field "UserKey" as the equivalent. The CSI driver also expects the "UserID" field, which is not available in the in-tree secret by default. This missing userID will be filled (if the username differs from 'admin') in the migration secret as the 'adminId' field in the migration request. This commit adds the logic to parse this migration secret as below: the "key" field value will be picked up from the migration secret into the "UserKey" field. The "adminId" field value will be picked up from the migration secret into the "UserID" field; if the `adminId` field is nil or not set, the `UserID` field will be filled with the default value, i.e. `admin`. The above logic is activated only when the secret is a migration secret; otherwise it is skipped in favor of the normal workflow as we have today. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-10-27 18:35:00 +00:00
Niels de Vos	b132696e54	rbd: note that thick-provisioning is deprecated Thick-provisioning was introduced to make accounting of assigned space for volumes easier. When thick-provisioned volumes are the only consumer of the Ceph cluster, this works fine. However, it is unlikely that this is the case. Instead, accounting of the requested (thin-provisioned) size of volumes is much more practical as different types of volumes can be tracked. OpenShift already provides cluster-wide quotas, which can combine accounting of requested volumes by grouping different StorageClasses. In addition to the difficult practise of allowing only thick-provisioned RBD backed volumes, the performance makes thick-provisioning troublesome. As volumes need to be completely allocated, data needs to be written to the volume. This can take a long time, depending on the size of the volume. Provisioning, cloning and snapshotting becomes very much noticeable, and because of the additional time consumption, more prone to failures. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-10-27 06:54:07 +00:00
Madhu Rajanna	0838845c6a	cleanup: remove FIXME from ResyncVolume as the complexity of ResyncVolume is reduced removing the FIXME which is not valid anymore. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-26 12:00:36 +00:00
Madhu Rajanna	2017b8c621	rbd: log mirror daemon state for replication log the mirror daemon state in the local and remote cluster for better debugging. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-26 12:00:36 +00:00
Madhu Rajanna	7472338334	rbd: remove unwanted const for comparing the image states use the states defined in go-ceph and avoid creating duplicate consts in cephcsi. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-26 12:00:36 +00:00
Madhu Rajanna	b92a6f5ccb	rbd: log the remote site details during resync logging the remote site details during resyncing for better debugging. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-26 12:00:36 +00:00
Madhu Rajanna	1fd2f28fee	rbd: check local image state for resyncing Below are the local states of the mirrored image: "unknown" -> if the image is in an unknown state, it means data is completely synced; "error" -> if the image is in an error state, it means it needs resync; "syncing"; "starting_replay"; "replaying"; "stopping_replay"; "stopped". If the resync is successfully started, the image will be in the "replaying" state. We can consider the "replaying" state as reporting that the resync is successfully in progress. We are discarding the intermediate states like "syncing", "starting_replay" and "stopping_replay". Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-26 12:00:36 +00:00
Rakshith R	12cd05a408	rbd: add EnsureImageCleanup to snapshot deletion Signed-off-by: Rakshith R <rar@redhat.com>	2021-10-20 18:25:31 +00:00
Rakshith R	1849076aab	rbd: add EnsureImageCleanup to ensure image cleanup from trash After moving the image to trash, if the `trash remove` step fails, then external-provisioner will issue subsequent requests, in which the image will be absent from the pool (it will be in trash) and omap cleanup will be done, leaving a stale image in trash with no `trash remove` step on it. To avoid this scenario, list trash images, find the corresponding id for the given image name, and add a task to flatten when we encounter an ErrImageNotFound. Fixes: #1728 Signed-off-by: Rakshith R <rar@redhat.com>	2021-10-20 18:25:31 +00:00
Niels de Vos	6d3e25f069	util: NodeGetVolumeStatsResponse.Usage may not contain negative values Following the CSI specification, values that are included in the VolumeUsage MUST NOT be negative. However, CephFS seems to return -1 for the number of inodes that are available. Instead of returning a negative value, set it to 0 so that it will not get included in the encoded JSON response. Updates: #2579 See-also: `5b0d454015/spec.md (L2477-L2487)` Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-10-20 07:18:48 +00:00
Madhu Rajanna	0d51f6d833	rbd: check local image description for split-brain In some corner case like `re-player shutdown` the local image will not be in error state. It would be also worth considering `description` field to make sure about split-brain. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-18 11:22:03 +00:00
Humble Chirammal	c584fa20da	rbd: use clusterID from volumeContext at nodestage Previously we were retrieving clusterID using the monitors field in the volume context at the node stage code path. However, it is possible to retrieve or use clusterID directly from the volume context. This commit also removes the getClusterIDFromMigrationVolume() function, which was used previously, and its tests. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-10-11 10:06:30 +00:00
Humble Chirammal	4e61156dc4	rbd: change iteration variable name in the migration test to be specific we reuse or overload the variable name in the test execution at present. This commit use a different variable name as initialized in each run Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-10-11 10:06:30 +00:00
Madhu Rajanna	90ecd2d7e8	rbd: use go-ceph to get mirroring info use go-ceph api to get image mirroring info. closes #2558 Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-07 08:02:06 +00:00
Madhu Rajanna	8ebc0659ab	rbd: perform resize of file system for static volume For a static volume, the user manually mounts an already existing image as a volume to the application pods. As it is an rbd image, if the PVC is of type fileSystem the image will be mapped, formatted and mounted on the node. If the user resizes the image on the ceph cluster, the filesystem created on the rbd image is not resized automatically. Even if the user deletes and recreates the kubernetes objects, the new size will not be visible on the node. With this change, during the NodeStageVolumeRequest the nodeplugin will check the size of the mapped rbd image on the node using the devicePath, and also the rbd image size on the ceph cluster. If the sizes do not match, it will do the file system resize on the node as part of the NodeStageVolumeRequest RPC call. The user needs to do the operations below to see the new size: * Resize the rbd image in the ceph cluster. * Scale down all the application pods using the static PVC. * Make sure no application pods which are using the static PVC are running on a node. * Scale up all the application pods. Validate the new size in the application pod's mounted volume. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-06 13:15:00 +00:00
Madhu Rajanna	fe9020260d	rbd: move flattening to helper function in NodeStage operation we are flattening the image to support mounting on the older clients. this commits moves it to a helper function to reduce code complexity. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-06 13:15:00 +00:00
Madhu Rajanna	cda2abca5d	rbd: use NewMetricsBlock to get size instead of lsblk command use NewMetricsBlock function from the kubernetes package to get the size. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-10-06 13:15:00 +00:00
Rakshith R	ded75eb099	rbd: copyEncryptionConfig for thickProvisioned snap restore too This commit adds bugfix to copy encryption passphrase for thick provisioned PVC restored from snapshot. Signed-off-by: Rakshith R <rar@redhat.com>	2021-10-05 07:46:57 +00:00
Rakshith R	59b7a26175	rbd: modify copyEncryptionConfig to accept copyOnlyPassphrase arg During PVC snapshot/clone both kms config and passphrase needs to copied, while for PVC restore only passphrase needs to be copied to dest rbdvol since destination storageclass may have another kms config. Signed-off-by: Rakshith R <rar@redhat.com>	2021-10-05 07:46:57 +00:00
Humble Chirammal	3c9d7e3cd5	rbd: detect migration volID in DeleteVolume() and delete rbd image This commit adds the logic to detect whether a passed in volumeID is a migrated volume ID and, if yes, the driver connects to the backend cluster and cleans/deletes the image. The logic is only applied if it is a migration volume ID. The migration volume ID carries information like mons, pool and image name, which is good enough for the driver to identify and connect to the backend cluster for its operations. migration volID format: <mig>_mons-<monsHash>_image-<imageUID>_<poolHash> Details on the hash values: * MonsHash: this carries a hash value (md5sum) which acts as the `clusterID` for the operations in this context. * ImageUID: this is the unique UUID generated by kubernetes for the created volume. * PoolHash: this is an encoded string of the pool name. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-10-04 16:06:31 +00:00
Madhu Rajanna	34a21cdbe3	cleanup: move mount functions to new pkg moved fuse and kernel mount functions to a new package. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-09-23 06:39:37 +00:00
Madhu Rajanna	b1ef842640	cleanup: move core functions to core pkg As we are refactoring the cephfs code, move all the core functions to a new folder /pkg called core. This will make things easier to implement. From now onwards all the core functionality will be added to the core package. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-09-23 06:39:37 +00:00
Humble Chirammal	4804f47b18	e2e: Add e2e for rbd migration static pvc This commit adds e2e for rbd migration static PVCs Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-09-20 09:54:54 +00:00
Humble Chirammal	2e8e8f5e64	rbd: fill clusterID if its a migration nodestage request The migration nodestage request does not carry the 'clusterID' in it and only monitors are available with the volumeContext. The volume context flags 'migration=true' and 'static=true' allow us to fill 'clusterID' into the volume context from the passed in monitors, so that the rest of the static operations on nodestage can proceed as we treat static volumes today. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-09-20 09:54:54 +00:00
Humble Chirammal	1f5963919f	util: get clusterID for the passed in mon string As part of migration support, the clusterID has to be fetched from the passed in mon, because the intree RBD storage class only has monitor and not `clusterID` parameter support. However, in CSI, the SC has `clusterID` parameter support but not mon. Due to that, we have to fetch the clusterID from the config file for the passed in mon and use it in our operations. This adds a helper function to retrieve clusterID from the passed in mon string. Updates https://github.com/ceph/ceph-csi/issues/2509 Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-09-20 09:54:54 +00:00
Prasanna Kumar Kalever	c9cc36d8db	rbd: provide alternatives to preserve the ceph log files Currently, we delete the ceph client log file on unmap/detach. This patch provides additional alternatives for users who would like to persist the log files. Strategies: ----------- `remove`: delete log file on unmap/detach `compress`: compress the log file to gzip on unmap/detach `preserve`: preserve the log file in text format Note that the default strategy will be remove on unmap, and these options can be tweaked from the storage class Compression size details example: On Map: (with debug-rbd=20) --------- $ ls -lh -rw-r--r-- 1 root root 526K Sep 1 18:15 rbd-nbd-0001-0024-fed5480a-f00f-417a-a51d-31d8a8144c03-0000000000000003-d2e89c87-0b4d-11ec-8ea6-160f128e682d.log On unmap: --------- $ ls -lh -rw-r--r-- 1 root root 33K Sep 1 18:15 rbd-nbd-0001-0024-fed5480a-f00f-417a-a51d-31d8a8144c03-0000000000000003-d2e89c87-0b4d-11ec-8ea6-160f128e682d.gz Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-09-16 13:55:15 +00:00
Prasanna Kumar Kalever	10bbb049f7	cleanup: passing pointers to larger type Log: internal/rbd/rbd_attach.go:424:2: hugeParam: dArgs is heavy (88 bytes); consider passing it by pointer (gocritic) Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-09-16 13:55:15 +00:00
Prasanna Kumar Kalever	ad2c6d2851	util: add gzip helper function Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-09-16 13:55:15 +00:00
Shyamsundar Ranganathan	47dc9cf28d	rbd: Report errors when a resync maybe in progress Currently we return a !ready status if an image is not found when a replication resync is issued. We also return a !ready just post issuing a resync. The change is to ensure we return errors in these cases for the caller to retry the operation till we can determine we are actually resyncing, and then return !ready with nil errors. Part of addressing: https://github.com/csi-addons/volume-replication-operator/issues/101 Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>	2021-09-15 15:59:22 +00:00
Rakshith R	82d09d81cf	util: modify GetMonsAndClusterID() to take clusterID instead of options This commit: - modifies GetMonsAndClusterID() to take clusterID instead of options. - moves out validation of clusterID is set or not out of GetMonsAndClusterID(). - defines ErrClusterIDNotSet new error for reusability. - add GetClusterID() to obtain clusterID from options. Signed-off-by: Rakshith R <rar@redhat.com>	2021-09-14 08:39:57 +00:00
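
The mapOption schema described in commit bdcf3273b5 (sections split by `;`, an optional `krbd:` or `nbd:` prefix, unprefixed options applying to the default krbd mounter) can be sketched as below. This is a minimal illustration in Go and not the code from that commit: the function name `parseMapOptions`, its return shape and the error text are assumptions made for the example, and it assumes option values never contain `:` or `;`.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMapOptions is a hypothetical helper (not taken from ceph-csi) that
// splits a mapOption string such as "krbd:v1,v2;nbd:v3" into the options
// for each mounter. A section without a "krbd:" or "nbd:" prefix is applied
// to krbd, the default mounter according to the commit message.
//
// Assumption: option values do not themselves contain ':' or ';'.
func parseMapOptions(mapOptions string) (krbdOpts, nbdOpts string, err error) {
	for _, section := range strings.Split(mapOptions, ";") {
		section = strings.TrimSpace(section)
		if section == "" {
			continue // tolerate empty sections, e.g. a trailing ';'
		}

		mounter, opts, hasPrefix := strings.Cut(section, ":")
		if !hasPrefix {
			// No mounter prefix: the options go to the default mounter.
			krbdOpts = section
			continue
		}

		switch strings.TrimSpace(mounter) {
		case "krbd":
			krbdOpts = strings.TrimSpace(opts)
		case "nbd":
			nbdOpts = strings.TrimSpace(opts)
		default:
			return "", "", fmt.Errorf("unknown mounter type %q in mapOptions", mounter)
		}
	}

	return krbdOpts, nbdOpts, nil
}

func main() {
	krbd, nbd, err := parseMapOptions("v1,v2,v3;nbd:v4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("krbd=%q nbd=%q\n", krbd, nbd)
}
```

In this sketch a later section for the same mounter replaces an earlier one, which is one possible reading of "override" in the commit message; options common to both mounters have to be listed in both sections, as the commit message states.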

726 Commits