ceph-csi

mirror of https://github.com/ceph/ceph-csi.git synced 2025-06-03 04:16:42 +00:00

Author	SHA1	Message	Date
Prasanna Kumar Kalever	c9cd8d7a37	e2e: sync data from rbd-nbd mount Until we have a real fix, and just to avoid the file system occasionally entering read-only mode on nodeplugin restart, let's sync data from the application pod. Updates: #2204 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-30 15:39:48 +00:00
Madhu Rajanna	fe947eccce	build: install specific commitlint version commitlint 13.1.0 is causing issues when PR is backported from devel branch to release branch https://github.com/ceph/ceph-csi/pull/2332#issuecomment-888325775 Lets revert back to commitlint 12.1.4 where we have not seen any issue with backports to release branch. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-07-30 07:06:22 +00:00
Niels de Vos	d3beaeb014	e2e: retry deploying CephFS components on failure There are reports where CephFS deploying failed with etcdserver timeouts: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.187:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-ea434921 create --namespace=cephcsi-e2e-ea434921 -f -' INFO: rc: 1 FAIL: failed to create CephFS provisioner rbac with error error running /usr/bin/kubectl --server=https://192.168.39.187:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-ea434921 create --namespace=cephcsi-e2e-ea434921 -f -: Command stdout: role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created stderr: Error from server: error when creating "STDIN": etcdserver: request timed out Error from server: error when creating "STDIN": etcdserver: request timed out Error from server: error when creating "STDIN": etcdserver: request timed out error: exit status 1 By using retryKubectlInput() helper function, a retry will be done, and the failure should not be fatal any longer. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-29 12:35:52 +00:00
Prasanna Kumar Kalever	d2def71944	doc: update the upgrade documentation to reflect 3.4.0 changes Mainly removed rbd-nbd mounter specified at the pre-upgrade considerations affecting the restarts. Also updated the 3.3 tags to 3.4 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-28 11:52:06 +00:00
Niels de Vos	ce9e54e5bd	ci: add Mergify backport rules for release-v3.4 The new `backport-to-release-v3.4` label can be added to PRs and Mergify will create a backport once the PR for the devel branch has been merged. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-28 12:53:58 +05:30
Prasanna Kumar Kalever	52799da09d	doc: add design doc for volume healer Closes: #667 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-28 11:54:59 +05:30
Prasanna Kumar Kalever	ebe4e1f944	ci: ignore spell check for design proposal images To avoid failures triggered by checking SVG image formats. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-28 11:54:59 +05:30
Prasanna Kumar Kalever	068e44bdb1	cleanup: move rbd-mirror image to a new directory Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-28 11:54:59 +05:30
Madhu Rajanna	080b251850	e2e: validate images in trash for rados namespace added validation check to verify stale images in trash for the rados namespace testing. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-07-28 03:48:33 +00:00
Madhu Rajanna	8f185bf7b2	rbd: use rados namespace for manager command Currently there is a bug: the rados namespace is not used when adding the ceph manager command that removes the image from the trash. This commit adds the missing rados namespace when adding the ceph manager task. Without the fix, the image is moved to the trash but no task is added to remove it from there; it becomes Ceph's responsibility to remove the image whenever it cleans up the trash. Workaround: manually purge the trash. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-07-28 03:48:33 +00:00
Yug Gupta	d14c0afe28	doc: Add documentation for DR Add documentation for Disaster Recovery, with steps to Failover and Failback in case of a planned migration or a Disaster. Signed-off-by: Yug Gupta <yuggupta27@gmail.com>	2021-07-27 11:43:01 +00:00
Niels de Vos	ec6703ed58	rbd: rename encryption metadata keys to enable mirroring RBD image metadata keys that start with '.rbd' are expected to be internal to RBD itself and are not mirrored to remote sites. Renaming the keys (dropping the '.' prefix) and using the new MigrateMetadata() function now makes the keys available on remote sites too. Closes: #2219 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-26 11:49:56 +00:00
Niels de Vos	607129171d	rbd: move image metadata key migration to its own function The new MigrateMetadata() function can be used to get the metadata of an image with a deprecated and new key. Renaming metadata keys can be done easily this way. A default value will be set in the image metadata when it is missing completely. But if the deprecated key was set, the data is stored under the new key and the deprecated key is removed. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-26 11:49:56 +00:00
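The key migration described in the commit above can be sketched as follows. This is a minimal illustration on an in-memory map, not the actual `MigrateMetadata()` implementation: the `metadataStore` type, the `migrateMetadata` signature, the key names and the choice to check the new key first are all assumptions made for the example.

```go
package main

import "fmt"

// metadataStore is a hypothetical in-memory stand-in for RBD image metadata.
type metadataStore map[string]string

// migrateMetadata returns the value stored for newKey.
// Assumed behaviour, following the commit message:
//   - if only the deprecated key is set, its value is moved to newKey
//     and the deprecated key is removed;
//   - if neither key is set, defaultValue is stored under newKey.
//
// Checking newKey first is an assumption of this sketch.
func migrateMetadata(md metadataStore, deprecatedKey, newKey, defaultValue string) string {
	if v, ok := md[newKey]; ok {
		return v
	}
	if v, ok := md[deprecatedKey]; ok {
		md[newKey] = v
		delete(md, deprecatedKey)
		return v
	}
	md[newKey] = defaultValue
	return defaultValue
}

func main() {
	// Hypothetical key names: the deprecated key carries the '.' prefix.
	md := metadataStore{".rbd.example.key": "encrypted"}
	v := migrateMetadata(md, ".rbd.example.key", "rbd.example.key", "unset")
	_, stillThere := md[".rbd.example.key"]
	fmt.Println(v, stillThere)
}
```

Doing the rename lazily on read means images created before the change are converted the first time their metadata is requested, without a separate migration step.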
Yati Padia	6691951453	rbd: use go-ceph for getImageMirroringStatus Currently, getImageMirroringStatus() is using RBD CLI. This commit converts RBD CLI to go-ceph API. Fixes: #2120 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-26 06:37:40 +00:00
Niels de Vos	4e6d9be826	ci: fix yamllint error in generated golangci.yml file When running 'make containerized-test' the following error gets reported: yamllint -s -d '{extends: default, rules: {line-length: {allow-non-breakable-inline-mappings: true}},ignore: charts//templates/.yaml}' ./scripts/golangci.yml ./scripts/golangci.yml 179:81 error line too long (84 > 80 characters) (line-length) The golangci.yml.in is used to generate golangci.yml, addressing the line-length there resolves the issue. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-26 04:05:50 +00:00
Niels de Vos	e75d308b9c	e2e: isRetryableAPIError() should match any etcdserver timeout framework.RunKubectl() returns an error that does not end with "etcdserver: request timed out", but contains the text somewhere in the middle: error running /usr/bin/kubectl --server=https://192.168.39.57:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-a44ec4b4 create -f -: Command stdout: stderr: Error from server: error when creating "STDIN": etcdserver: request timed out error: exit status 1 isRetryableAPIError() should return `true` for this case as well, so instead of using HasSuffix(), we'll use Contains(). Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-23 12:20:16 +00:00
Prasanna Kumar Kalever	75dda7ac0d	e2e: add test for expansion of encrypted volumes Also adds a test case to validate the default encryption type Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-23 10:00:23 +00:00
Prasanna Kumar Kalever	526ff95f10	rbd: add support to expand encrypted volume Previously in ControllerExpandVolume() we had a check for encrypted volumes and we use to fail for all expand requests on an encrypted volume. Also for Block VolumeMode PVCs NodeExpandVolume used to be ignored/skipped. With these changes, we add support for the expansion of encrypted volumes. Also for raw Block VolumeMode PVCs with Encryption we call NodeExpandVolume. That said, With LUKS1, cryptsetup utility doesn't prompt for a passphrase on resizing the crypto mapper device. This is because LUKS1 devices don't use kernel keyring for volume keys. Whereas, LUKS2 devices use kernel keyring for volume key by default, i.e. cryptsetup utility asks for a passphrase if it detects volume key was previously passed to dm-crypt via kernel keyring service, we are overriding the default by --disable-keyring option during cryptsetup open command. So that at the time of crypto mapper device resize we will not be prompted for any passphrase. Fixes: #1469 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-23 10:00:23 +00:00
Prasanna Kumar Kalever	4fa05cb3a1	util: add helper functions for resize of encrypted volume such as: ResizeEncryptedVolume() and LuksResize() Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-23 10:00:23 +00:00
Prasanna Kumar Kalever	572f39d656	util: fix log level in OpenEncryptedVolume() Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-23 10:00:23 +00:00
Prasanna Kumar Kalever	812003eb45	util: fix bug in DeviceEncryptionStatus() With Luks1 device: $ cryptsetup status /dev/mapper/crypto-rbd0 /dev/mapper/crypto-rbd0 is active and is in use. type: LUKS1 cipher: aes-xts-plain64 keysize: 512 bits key location: dm-crypt device: /dev/rbd0 sector size: 512 offset: 4096 sectors size: 4190208 sectors mode: read/write With Luks2 device: $ cryptsetup status /dev/mapper/crypto-rbd0 /dev/mapper/crypto-rbd0 is active and is in use. type: LUKS2 cipher: aes-xts-plain64 keysize: 512 bits key location: dm-crypt device: /dev/rbd0 sector size: 512 offset: 32768 sectors size: 4161536 sectors mode: read/write This could lead to failures with unmap in the NodeUnstageVolume path for the encrypted volumes. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-23 10:00:23 +00:00
Yati Padia	1ae2afe208	cleanup: modifies the error caused due to merged PRs This commit modifies the error of godot, cyclop, paralleltest linter caused due to merged PRs. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Yati Padia	4e890e9daf	ci: disable gci and wrapcheck linter This commit disables wrapcheck and gci linters. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Yati Padia	172b66f73f	cleanup: resolves cyclop linter issue This commit adds `// nolint:cyclop` for the functions whose complexity is above 20 Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Yati Padia	e85c0eedc4	ci: set max-complexity of cyclop as 20 This commit sets the value of max- complexity of cyclop linter as 20 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Yati Padia	45b40661e2	ci: disable gomoddirectives linter This commit disables gomoddirectives linter as it bans use of replace directive. Update: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Yati Padia	9d6ce7c5dd	ci: disable forbidigo linter This commit disables the forbidigo linter as this linter forbids the use of fmt.Printf but we need to use it in various part of our codebase. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Yati Padia	9414a76a86	ci: disable exhaustivestruct linter This commit disables the exhaustivestruct linter as it is meant to be used only for special cases. We don't need to enable this for our project. Fixes: #2224 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Yati Padia	c5bc3d38c4	ci: update static check tools This PR updates the static check tools to the latest version. Further needs to resolve all the errors after updating the version. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Humble Chirammal	abe6a6e5ac	util: remove deleteLock test as it is enforced by the controller Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-22 15:07:49 +00:00
Humble Chirammal	c42d4768ca	util: remove the deleteLock acquisition check for clone and snapshot At present, while acquiring the deleteLock on the volume, we check for ongoing clone and snapshot creation operations on the same volume. Since the snapshot and clone controllers do not allow parent volume deletion during these operations, we can drop this extra check. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-22 15:07:49 +00:00
Niels de Vos	82557e3f34	util: allow configuring VAULT_BACKEND for Vault connection It seems that the version of the key/value engine can not always be detected for Hashicorp Vault. In certain cases, it is required to configure the `VAULT_BACKEND` (or `vaultBackend`) option so that a successful connection to the service can be made. The `kv-v2` is the current default for development deployments of Hashicorp Vault (what we use for automated testing). Production deployments default to version 1 for now. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-22 13:02:47 +00:00
Thomas Kooi	75b9b9fe6d	cleanup: fix beta apiVersion for csidriver This change resolves a typo for installing the CSIDriver resource in Kubernetes clusters before 1.18, where the apiVersion is incorrect. See also: https://kubernetes-csi.github.io/docs/csi-driver-object.html [ndevos: replace v1betav1 in examples with v1beta1] Signed-off-by: Thomas Kooi <t.j.kooi@avisi.nl>	2021-07-22 09:12:44 +00:00
Rakshith R	43f753760b	cleanup: resolve nlreturn linter issues nlreturn linter requires a new line before return and branch statements except when the return is alone inside a statement group (such as an if statement) to increase code clarity. This commit addresses such issues. Updates: #1586 Signed-off-by: Rakshith R <rar@redhat.com>	2021-07-22 06:05:01 +00:00
Niels de Vos	5c016b4b94	e2e: retry on "connect: connection refused" errors Sometimes there are failures in the e2e suite when connecting to the etcdserver fails. The following error was caught: INFO: Error getting pvc "rbd-pvc" in namespace "rbd-1318": Get "https://192.168.39.222:8443/api/v1/namespaces/rbd-1318/persistentvolumeclaims/rbd-pvc": dial tcp 192.168.39.222:8443: connect: connection refused FAIL: failed to create PVC with error failed to get pvc: Get "https://192.168.39.222:8443/api/v1/namespaces/rbd-1318/persistentvolumeclaims/rbd-pvc": dial tcp 192.168.39.222:8443: connect: connection refused If etcdserver was only briefly unavailable, one or more retries might be sufficient to have the test pass. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-21 13:08:41 +00:00
Humble Chirammal	d85304c7c2	doc: lift clone feature support to Beta Since kubernetes 1.16 clone (create a new volume from an existing volume) functionality is at Beta state and kubernetes v1.18, the snapshot functionality has been lifted to GA. Ceph CSI drivers have been supporting this feature for the last few releases and users are heavily using this feature since then. We also have a good amount of e2e test cases which cover volume creation from PVC backend or iow, PVC as a datasource. With that, this PR proposes lifting this feature support to Beta with v3.4.0 version. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-21 06:57:10 +00:00
Humble Chirammal	69836d7c02	doc: lift snapshot creation/deletion to Beta Since kubernetes 1.17 snapshot functionality is at Beta state and external snapshotter 3.0.3. Since v4.0.0 of snapshotter controller and kubernetes v1.20, the snapshot functionality has been lifted to GA. Ceph CSI drivers have been supporting this feature for last few releases and users are heavily using this feature since then. We also have good amount of e2e test case which cover volume creation from snapshot backend or iow, snapshot as a datasource. With that, this PR proposes of lifting this feature support to Beta with v3.4.0 version. Updates# https://github.com/ceph/ceph-csi/issues/2199 Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-21 06:57:10 +00:00
Humble Chirammal	b66f40793f	doc: lift snapshot creation/deletion to Beta Since kubernetes 1.17 snapshot functionality is at Beta state and external snapshotter 3.0.3. Since v4.0.0 of snapshotter controller and kubernetes v1.20, the snapshot functionality has been lifted to GA. Ceph CSI drivers have been supporting this feature for last few releases and users are heavily using this feature since then. We also have good amount of e2e test case which cover volume creation from snapshot backend or iow, snapshot as a datasource. With that, this PR proposes of lifting this feature support to Beta with v3.4.0 version. Updates# https://github.com/ceph/ceph-csi/issues/2199 Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-21 06:57:10 +00:00
Humble Chirammal	8f3d8f4cfd	doc: explicitly mention volume/pv metrics feature We have volume PV metrics support available in our driver along with the grpc metrics support (EnableGRPCMetrics) we added in between for csi operations. The latter is getting deprecated and the current mention of metrics support in the support matrix confuses many. This PR explicitly mentions volume/PV metrics support in the docs. This PR also adds a separate row for block mode PV metrics. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-20 16:03:04 +00:00
Yati Padia	7f5df7c940	cleanup: resolves gofumpt issues in e2e This commit resolves gofumpt issues in e2e folder. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-20 15:37:58 +00:00
Niels de Vos	c4372b8567	doc: describe Hashicorp Vault with a ServiceAccount per Tenant In addition to the single ServiceAccount KMS support for Hashicorp Vault, Ceph-CSI can now use a ServiceAccount per Tenant as well. This adds the user-documentation with references to the example deployment files. Closes: #2222 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-20 12:31:40 +00:00
Niels de Vos	841a53bc3d	e2e: retry kubectl commands in case deploying Vault fails Sometimes it happens that the deployment of Hashicorp Vault fails. Deployment is one of the 1st steps that are done when starting the e2e suite, and the Kubernetes cluster may still be a little overloaded while it is settling down. It should be possible to retry and succeed after a while. Fixes: #2288 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-19 16:12:18 +00:00
Niels de Vos	d5ea89e603	e2e: add retryKubectlInput() for retrying kubectl calls Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-19 16:12:18 +00:00
Yati Padia	3469dfc753	cleanup: resolve errorlint issues This commit resolves errorlint issues which checks for the code that will cause problems with the error wrapping scheme. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-19 13:31:29 +00:00
Yati Padia	bfda5fa57f	cleanup: resolve revive linter issue revive linter checks for var-declaration format. For example: "e2e/rbd_helper.go:441:36: var-declaration: should drop = nil from declaration of var noPVCValidation; it is the zero value (revive) var noPVCValidation validateFunc = nil" Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-19 08:39:32 +00:00
Humble Chirammal	bd947bbe31	util: remove deleteLock check while acquiring snapshot createLock snapshot controller make sure the pvc which is the source for the snapshot request wont get deleted while snapshot is getting created, so we dont need to check for any ongoing delete operation here on the volume. Subjected code path in snapshot controller: ``` pvc, err := ctrl.getClaimFromVolumeSnapshot(snapshot) . .. pvcClone.ObjectMeta.Finalizers = append(pvcClone.ObjectMeta.Finalizers, utils.PVCFinalizer) _, err = ctrl.client.CoreV1().PersistentVolumeClaims(pvcClone.Namespace).Update(..) ``` Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-17 10:23:13 +00:00
Prasanna Kumar Kalever	10fc639d68	ci: fix nolintlint warnings warnings from golangci-lint: e2e/pod.go:207:122: directive `//nolint:unparam,lll // cn can be used with different inputs later` is unused for linter unparam (nolintlint) func execCommandInContainer(f framework.Framework, c, ns, cn string, opt metav1.ListOptions) (string, string, error) { //nolint:unparam,lll // cn can be used with different inputs later e2e/pod.go:307:70: directive `//nolint:unparam // skipNotFound can be used with different inputs later` is unused for linter unparam (nolintlint) func deletePodWithLabel(label, ns string, skipNotFound bool) error { //nolint:unparam // skipNotFound can be used with different inputs later Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	fd3bf1750b	e2e: fix the testcases for rbd-nbd Now that the healer functionality for mounter processes is available, let's start using it. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	78f740d903	rbd: improve healer to run multiple NodeStageVolume req concurrently This will bring down the healer run time by a great factor. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	b6a88dd728	rbd: add volume healer Problem: ------- For rbd nbd userspace mounter backends, after a restart of the nodeplugin all the mounts will start seeing IO errors. This is because, for rbd-nbd backends, there is a userspace mount daemon running per volume, and after a restart of the nodeplugin pod there is no way to bring the daemons back to life. Solution: -------- The volume healer is a one-time activity that is triggered at the startup time of the rbd nodeplugin. It navigates through the list of volume attachments on the node and acts accordingly. For now, it is limited to nbd type storage only, but it is flexible and can be extended in the future for other backend types as needed. From a few feet above: This solves a severe problem for nbd backed csi volumes. While going through the list of volume attachments on the node, if the healer finds a volume that is in attached state and is of type nbd, it will attempt to fix the rbd-nbd volume by sending a NodeStageVolume request with the required volume attributes such as secrets, device name, image attributes, etc., which will finally help start the required rbd-nbd daemons in the nodeplugin csi-rbdplugin container. This will allow reattaching the backend images with the right nbd device, thus allowing the applications to perform IO without any interruptions even after a nodeplugin restart. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
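The healer flow described in the two commits above (walk the volume attachments, pick the attached nbd volumes, send a NodeStageVolume request for each, and do so concurrently) can be sketched roughly as below. The `attachment` type, its fields, the `heal` function and the `stage` callback are hypothetical stand-ins; the real healer works with Kubernetes VolumeAttachment objects and CSI requests.

```go
package main

import (
	"fmt"
	"sync"
)

// attachment is a hypothetical, simplified view of a volume attachment.
type attachment struct {
	VolumeID string
	Mounter  string
	Attached bool
}

// heal calls stage once for every attached rbd-nbd volume, running the
// calls concurrently, and returns the per-volume result.
func heal(atts []attachment, stage func(volumeID string) error) map[string]error {
	var (
		wg sync.WaitGroup
		mu sync.Mutex
	)
	results := make(map[string]error)

	for _, a := range atts {
		// Only attached nbd volumes need their daemon restarted.
		if !a.Attached || a.Mounter != "rbd-nbd" {
			continue
		}
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			err := stage(id) // stands in for a NodeStageVolume request
			mu.Lock()
			results[id] = err
			mu.Unlock()
		}(a.VolumeID)
	}
	wg.Wait()

	return results
}

func main() {
	atts := []attachment{
		{VolumeID: "vol-1", Mounter: "rbd-nbd", Attached: true},
		{VolumeID: "vol-2", Mounter: "krbd", Attached: true},
		{VolumeID: "vol-3", Mounter: "rbd-nbd", Attached: false},
		{VolumeID: "vol-4", Mounter: "rbd-nbd", Attached: true},
	}
	results := heal(atts, func(id string) error { return nil })
	fmt.Println(len(results)) // 2
}
```

Because each request is independent, running them in parallel bounds the total healing time by the slowest volume rather than the sum of all of them.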


2316 Commits