ceph-csi

mirror of https://github.com/ceph/ceph-csi.git synced 2025-06-03 04:16:42 +00:00

Author	SHA1	Message	Date
Humble Chirammal	c42d4768ca	util: remove the deleteLock acquisition check for clone and snapshot At present while acquiring the deleteLock on the volume, we check for ongoing clone and snapshot creation operations on the same. Considering snapshot and clone controllers do not allow parent volume deletion on subjected operations, we can be free from this extra check. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-22 15:07:49 +00:00
Niels de Vos	82557e3f34	util: allow configuring VAULT_BACKEND for Vault connection It seems that the version of the key/value engine can not always be detected for Hashicorp Vault. In certain cases, it is required to configure the `VAULT_BACKEND` (or `vaultBackend`) option so that a successful connection to the service can be made. The `kv-v2` is the current default for development deployments of Hashicorp Vault (what we use for automated testing). Production deployments default to version 1 for now. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-22 13:02:47 +00:00
Thomas Kooi	75b9b9fe6d	cleanup: fix beta apiVersion for csidriver This change resolves a typo for installing the CSIDriver resource in Kubernetes clusters before 1.18, where the apiVersion is incorrect. See also: https://kubernetes-csi.github.io/docs/csi-driver-object.html [ndevos: replace v1betav1 in examples with v1beta1] Signed-off-by: Thomas Kooi <t.j.kooi@avisi.nl>	2021-07-22 09:12:44 +00:00
Rakshith R	43f753760b	cleanup: resolve nlreturn linter issues nlreturn linter requires a new line before return and branch statements except when the return is alone inside a statement group (such as an if statement) to increase code clarity. This commit addresses such issues. Updates: #1586 Signed-off-by: Rakshith R <rar@redhat.com>	2021-07-22 06:05:01 +00:00
Niels de Vos	5c016b4b94	e2e: retry on "connect: connection refused" errors Sometimes there are failures in the e2e suite when connecting to the etcdserver fails. The following error was caught: INFO: Error getting pvc "rbd-pvc" in namespace "rbd-1318": Get "https://192.168.39.222:8443/api/v1/namespaces/rbd-1318/persistentvolumeclaims/rbd-pvc": dial tcp 192.168.39.222:8443: connect: connection refused FAIL: failed to create PVC with error failed to get pvc: Get "https://192.168.39.222:8443/api/v1/namespaces/rbd-1318/persistentvolumeclaims/rbd-pvc": dial tcp 192.168.39.222:8443: connect: connection refused If etcdserver was only briefly unavailable, one or more retries might be sufficient to have the test pass. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-21 13:08:41 +00:00
Humble Chirammal	d85304c7c2	doc: lift clone feature support to Beta Since kubernetes 1.16 clone (create a new volume from existing volume) functionality is at Beta state and since kubernetes v1.18, the clone functionality has been lifted to GA. Ceph CSI drivers have been supporting this feature for last few releases and users are heavily using this feature since then. We also have good amount of e2e test case which cover volume creation from PVC backend or iow, PVC as a datasource. With that, this PR proposes of lifting this feature support to Beta with v3.4.0 version. Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-21 06:57:10 +00:00
Humble Chirammal	69836d7c02	doc: lift snapshot creation/deletion to Beta Since kubernetes 1.17 snapshot functionality is at Beta state and external snapshotter 3.0.3. Since v4.0.0 of snapshotter controller and kubernetes v1.20, the snapshot functionality has been lifted to GA. Ceph CSI drivers have been supporting this feature for last few releases and users are heavily using this feature since then. We also have good amount of e2e test case which cover volume creation from snapshot backend or iow, snapshot as a datasource. With that, this PR proposes of lifting this feature support to Beta with v3.4.0 version. Updates# https://github.com/ceph/ceph-csi/issues/2199 Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-21 06:57:10 +00:00
Humble Chirammal	b66f40793f	doc: lift snapshot creation/deletion to Beta Since kubernetes 1.17 snapshot functionality is at Beta state and external snapshotter 3.0.3. Since v4.0.0 of snapshotter controller and kubernetes v1.20, the snapshot functionality has been lifted to GA. Ceph CSI drivers have been supporting this feature for last few releases and users are heavily using this feature since then. We also have good amount of e2e test case which cover volume creation from snapshot backend or iow, snapshot as a datasource. With that, this PR proposes of lifting this feature support to Beta with v3.4.0 version. Updates# https://github.com/ceph/ceph-csi/issues/2199 Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-21 06:57:10 +00:00
Humble Chirammal	8f3d8f4cfd	doc: explicitly mention volume/pv metrics feature we have volume PV metrics support available in our driver along with the grpc metrics support (EnableGRPCMetrics) we added in between for csi operations. The latter is getting deprecated and the current mention in the support matrix on metrics support confuses many. This PR explicitly mentions this support in the docs as volume/PV metrics. This PR also adds a separate row for block mode PV metrics Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-20 16:03:04 +00:00
Yati Padia	7f5df7c940	cleanup: resolves gofumpt issues in e2e This commit resolves gofumpt issues in e2e folder. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-20 15:37:58 +00:00
Niels de Vos	c4372b8567	doc: describe Hashicorp Vault with a ServiceAccount per Tenant In addition to the single ServiceAccount KMS support for Hashicorp Vault, Ceph-CSI can now use a ServiceAccount per Tenant as well. This adds the user-documentation with references to the example deployment files. Closes: #2222 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-20 12:31:40 +00:00
Niels de Vos	841a53bc3d	e2e: retry kubectl commands in case deploying Vault fails Sometimes it happens that the deployment of Hashicorp Vault fails. Deployment is one of the 1st steps that are done when starting the e2e suite, and the Kubernetes cluster may still be a little overloaded while it is settling down. It should be possible to retry and succeed after a while. Fixes: #2288 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-19 16:12:18 +00:00
Niels de Vos	d5ea89e603	e2e: add retryKubectlInput() for retrying kubectl calls Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-19 16:12:18 +00:00
Yati Padia	3469dfc753	cleanup: resolve errorlint issues This commit resolves errorlint issues which checks for the code that will cause problems with the error wrapping scheme. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-19 13:31:29 +00:00
Yati Padia	bfda5fa57f	cleanup: resolve revive linter issue revive linter checks for var-declaration format. For example: "e2e/rbd_helper.go:441:36: var-declaration: should drop = nil from declaration of var noPVCValidation; it is the zero value (revive) var noPVCValidation validateFunc = nil" Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-19 08:39:32 +00:00
Humble Chirammal	bd947bbe31	util: remove deleteLock check while acquiring snapshot createLock snapshot controller make sure the pvc which is the source for the snapshot request wont get deleted while snapshot is getting created, so we dont need to check for any ongoing delete operation here on the volume. Subjected code path in snapshot controller: ``` pvc, err := ctrl.getClaimFromVolumeSnapshot(snapshot) . .. pvcClone.ObjectMeta.Finalizers = append(pvcClone.ObjectMeta.Finalizers, utils.PVCFinalizer) _, err = ctrl.client.CoreV1().PersistentVolumeClaims(pvcClone.Namespace).Update(..) ``` Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-17 10:23:13 +00:00
Prasanna Kumar Kalever	10fc639d68	ci: fix nolintlint warnings warnings from golangci-lint: e2e/pod.go:207:122: directive `//nolint:unparam,lll // cn can be used with different inputs later` is unused for linter unparam (nolintlint) func execCommandInContainer(f framework.Framework, c, ns, cn string, opt metav1.ListOptions) (string, string, error) { //nolint:unparam,lll // cn can be used with different inputs later e2e/pod.go:307:70: directive `//nolint:unparam // skipNotFound can be used with different inputs later` is unused for linter unparam (nolintlint) func deletePodWithLabel(label, ns string, skipNotFound bool) error { //nolint:unparam // skipNotFound can be used with different inputs later Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	fd3bf1750b	e2e: fix the testcases for rbd-nbd Now that the healer functionality for mounter processes is available, let's start using it. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	78f740d903	rbd: improve healer to run multiple NodeStageVolume req concurrently This will bring down the healer run time by a great factor. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	b6a88dd728	rbd: add volume healer Problem: ------- For rbd nbd userspace mounter backends, after a restart of the nodeplugin all the mounts will start seeing IO errors. This is because, for rbd-nbd backends there will be a userspace mount daemon running per volume, post restart of the nodeplugin pod, there is no way to restore the daemons back to life. Solution: -------- The volume healer is a one-time activity that is triggered at the startup time of the rbd nodeplugin. It navigates through the list of volume attachments on the node and acts accordingly. For now, it is limited to nbd type storage only, but it is flexible and can be extended in the future for other backend types as needed. From a few feet above: This solves a severe problem for nbd backed csi volumes. While going through the list of volume attachments on the node, if the healer finds that a volume is in attached state and is of type nbd, it will attempt to fix the rbd-nbd volume by sending a NodeStageVolume request with the required volume attributes like secrets, device name, image attributes, etc., which will finally help start the required rbd-nbd daemons in the nodeplugin csi-rbdplugin container. This will allow reattaching the backend images with the right nbd device, thus allowing the applications to perform IO without any interruptions even after a nodeplugin restart. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	6007fc9bfe	cleanup: move static volume check to helper function Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	10e4eee481	deploy: add few more cluster-roles for rbd nodeplugin Nodeplugin needs below cluster roles: persistentvolumes: get volumeattachments: list, get These additional permissions are needed by the volume healer. Volume healer aims at fixing the volume health issues at the very startup time of the nodeplugin. As part of its operations, volume healer has to run through the list of volume attachments and understand details about each persistentvolume. The later commits will use these additional cluster roles. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	874f6629fb	rbd: get default plugin path Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	6d24080851	rbd: update per volume metadata stash-file with devicePath As part of stage transaction if the mounter is of type nbd, then capture device path after a successful rbd-nbd map. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Prasanna Kumar Kalever	70998571aa	cleanup: change variable name from path to metaDataPath path is used by standard package. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-16 16:30:58 +00:00
Humble Chirammal	94c5c5e119	util: remove deleteLock while we acquire clone operation lock clone controller make sure there is no delete operation happens on the source PVC which has been referred as the datasource of clone PVC, we are safe to operate without looking at delete operation lock in this case. Subjected code in the controller: ... if claim.Spec.DataSource != nil && rc.clone { err = p.setCloneFinalizer(ctx, claim) ... } if !checkFinalizer(claim, pvcCloneFinalizer) { claim.Finalizers = append(claim.Finalizers, pvcCloneFinalizer) _, err := p.client.CoreV1().PersistentVolumeClaims(claim.Namespace).Update(..claim..) } Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-16 12:32:28 +00:00
Humble Chirammal	e088e8fd2e	cephfs: Get rid of locking at nodepublish Considering kubelet make sure the stage and publish operations are serialized, we dont need any extra locking in nodePublish Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-16 07:18:56 +00:00
Humble Chirammal	61bf49a4f5	rbd: Get rid of locking at nodePublish Considering kubelet make sure the stage and publish operations are serialized, we dont need any extra locking in nodePublish Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-16 07:18:56 +00:00
Humble Chirammal	ced3a0922f	cephfs: Get rid of locking at nodeUnpublish call Considering kubelet make sure the unstage and unpublish operations are serialized, we dont need any extra locking in nodeUnpublish Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-16 07:18:56 +00:00
Humble Chirammal	ef852cc93d	rbd: Get rid of locking at nodeUnpublish call Considering kubelet make sure the unstage and unpublish operations are serialized, we dont need any extra locking in nodeUnpublish Signed-off-by: Humble Chirammal <hchiramm@redhat.com>	2021-07-16 07:18:56 +00:00
Niels de Vos	276918db10	rebase: update minikube to v1.22.0 Minikube has bumped its support for the latest Kubernetes version to 1.22.0-beta.0. This might improve our CI jobs with Kubernetes 1.22 too. See-also: https://github.com/kubernetes/minikube/releases/tag/v1.22.0 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-15 11:39:33 +00:00
Yati Padia	f36d611ef9	cleanup: resolves gofumpt issues of internal codes This PR runs gofumpt for internal folder. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-14 19:50:56 +00:00
Yati Padia	696ee496fc	cleanup: resolves gofumpt for cmd This commit resolves gofumpt linter issues for the cmd folder. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-14 17:19:00 +00:00
Yati Padia	299979fc14	ci: add unit test for toError() This commit adds unit test for the func converting cephFSCloneState to error. Fixes: #2259 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-14 15:02:12 +00:00
Yati Padia	c66872c3c6	cleanup: ineffective assignment This commit resolves ineffective assignment of snap. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-14 12:39:17 +00:00
Niels de Vos	4d4a2a7814	e2e: prevent re-using empty pvc object When an error occurs, the pvc object is overwritten in the PollImmediate() loop. Re-using the pvc.Namespace results in error messages like Error getting pvc in namespace: '': an empty namespace may not be set when a resource name is provided and causes the retry by PollImmediate() to never succeed. Storing the namespace in a local variable prevents this from happening. Reported-by: Rakshith R <rar@redhat.com> Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-14 10:18:51 +00:00
Niels de Vos	f7ae33c67c	e2e: only call error check functions when err != nil Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-14 10:18:51 +00:00
Niels de Vos	075a4087d7	e2e: mark "etcdserver: request timed out" errors as retryable There are regular CI failures where etcdserver times out. These errors seem not to get caught by any of the existing error comparing. Matching the error by string should prevent temporary etcdserver issues now too. Updates: #2218 Closes: #1969 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-14 10:18:51 +00:00
Yati Padia	f210d5758b	cleanup: spell check getImageMirroingStatus This commit corrects the spelling for getImageMirroingStatus() -> getImageMirroringStatus Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-14 07:32:01 +00:00
Niels de Vos	d941e5abac	util: make parseTenantConfig() usable for modular KMSs parseTenantConfig() only allowed configuring a defined set of options, and KMSs were not able to re-use the implementation. Now, the function parses the ConfigMap from the Tenants Namespace and returns a map with options that the KMS supports. The map that parseTenantConfig() returns can be inspected by the KMS, and applied to the vaultTenantConnection type by calling parseConfig(). Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-13 17:16:35 +00:00
Niels de Vos	96bb8bfd0e	e2e: add securityContext.runAsUser to vault-init-job Kubelet sometimes reports the following error: failed to "StartContainer" for "vault-init-job" with CreateContainerConfigError: container has runAsNonRoot and image will run as root Setting securityContext.runAsUser resolves this. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-13 17:16:35 +00:00
Niels de Vos	e3c7dea7d6	e2e: add test for Vault with ServiceAccount per Tenant Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-13 17:16:35 +00:00
Niels de Vos	b700fa43e6	doc: add example for Tenant ServiceAccount The ServiceAccount "ceph-csi-vault-sa" is expected to be placed in the Namespace "tenant" so that the provisioner and node-plugin fetch the ServiceAccount from a Namespace where Ceph-CSI is not deployed. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-13 17:16:35 +00:00
Niels de Vos	8662e01d2c	deploy: allow RBD components to get ServiceAccounts The provisioner and node-plugin have the capability to connect to Hashicorp Vault with a ServiceAccount from the Namespace where the PVC is created. This requires permissions to read the contents of the ServiceAccount from an other Namespace than where Ceph-CSI is deployed. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-13 17:16:35 +00:00
Niels de Vos	3d7d48a4aa	util: VaultTenantSA KMS implementation This new KMS uses a Kubernetes ServiceAccount from a Tenant (Namespace) to connect to Hashicorp Vault. The provisioner and node-plugin will check for the configured ServiceAccount and use the token that is located in one of the linked Secrets. Subsequently the Vault connection is configured to use the Kubernetes token from the Tenant. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-13 17:16:35 +00:00
Niels de Vos	6dc5bf2b29	util: split vaultTenantConnection from VaultTokensKMS This makes the Tenant configuration for Hashicorp Vault KMS connections more modular. Additional KMS implementations that use Hashicorp Vault with per-Tenant options can re-use the new vaultTenantConnection. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-13 17:16:35 +00:00
Niels de Vos	ed298341a6	doc: proposal for KMS with ServiceAccount per Tenant A new KMS that supports Hashicorp Vault with the Kubernetes Auth backend and ServiceAccounts per Tenant (Kubernetes Namespace). Updates: #2222 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-13 12:12:25 +00:00
Yati Padia	69c9e5ffb1	cleanup: resolve parallel test issue This commit resolves parallel test issues and also excludes internal/util/conn_pool_test.go as those test can't run in parallel. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-13 11:31:39 +00:00
Prasanna Kumar Kalever	8b3136e696	doc: add documentation for rbd-nbd mounter Closes #2124 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-13 10:19:17 +00:00
Yati Padia	4a649fe17f	cleanup: resolve godot linter This commit resolves godot linter issue which says "Comment should end in a period (godot)". Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-13 06:50:03 +00:00


2536 Commits