ceph-csi

mirror of https://github.com/ceph/ceph-csi.git synced 2025-06-14 18:53:35 +00:00

Author	SHA1	Message	Date
Madhu Rajanna	2c66dfc3e4	e2e: retry running kubectl on known errors By using retryKubectl helper function, a retry will be done, and the known error messages will be skipped. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-06 08:03:18 +00:00
Madhu Rajanna	2071c535fa	e2e: pass variadic argument to kubectl helper function this provides caller ability to pass the arguments like ignore-not-found=true etc when executing the kubectl commands. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-06 08:03:18 +00:00
Madhu Rajanna	9f0af30735	e2e: add retryKubectlArgs helper for kubectl retry added helper function retryKubectlArgs to perform action if its a known error. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-06 08:03:18 +00:00
Madhu Rajanna	dd9fabf747	e2e: add isAlreadyExistsCLIError to check known error added isAlreadyExistsCLIError to check for a known error. If the error is "already exists", we consider it a success. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-06 08:03:18 +00:00
Madhu Rajanna	d321663872	deploy: add template changes for mapping added template changes for the clusterID and poolID,fsID mapping details for the pod templates. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-05 16:07:51 +00:00
Madhu Rajanna	92ad2ceec9	rbd: read clusterID and PoolID from mapping Whenever Ceph-CSI receives a CSI/Replication request, it first decodes the volumeHandle and tries to get the required OMAP details. If it cannot retrieve them and receives a `Not Found` error message, Ceph-CSI checks for the clusterID mapping. If the old volumeID `0001-00013-site1-storage-0000000000000001-b0285c97-a0ce-11eb-8c66-0242ac110002` contains `site1-storage` as the clusterID, Ceph-CSI will look for the corresponding clusterID `site2-storage` from the above configmap. If the clusterID mapping is found, Ceph-CSI will then look for the poolID mapping, i.e. the mapping between `1` and `2`. Example: a pool with the same name exists on both clusters with different IDs, Replicapool with ID `1` on site1 and Replicapool with ID `2` on site2. After getting the required mapping, Ceph-CSI has the information it needs to get more details from the rados OMAP. If there are multiple clusterID mappings, it loops through all of them and checks the corresponding pool to get the OMAP data. If the clusterID mapping does not exist, Ceph-CSI returns a `Not Found` error message to the caller. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-05 16:07:51 +00:00
Madhu Rajanna	ac11d71e19	util: add helper function to read clusterID mapping added helper function to read the clusterID mapping from the mounted file. The clusterID mapping contains the following mappings: * ClusterID mappings (the cluster we are failing over to and the cluster from which the failover happened) * RBD PoolID mapping between the clusters. * CephFS FscID mapping between the clusters. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-05 16:07:51 +00:00
Madhu Rajanna	fce5a181d0	doc: change FsID to FscID for cephfs updated the filesystem identifier from FsId to FscID. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-05 16:07:51 +00:00
Yug Gupta	080f7538c0	helm: update cephfs provisioner updateStrategy Update ceph-csi-cephfs.provisioner updatestrategy to allow maxUnavailable pods at a time to be 50% Signed-off-by: Yug Gupta <yuggupta27@gmail.com>	2021-08-05 14:04:16 +00:00
Yug Gupta	ea088d40be	helm: update rbd provisioner updateStrategy Update ceph-csi-rbd.provisioner updatestrategy to allow maxUnavailable pods at a time to be 50% Signed-off-by: Yug Gupta <yuggupta27@gmail.com>	2021-08-05 14:04:16 +00:00
Madhu Rajanna	3805c29f36	build: update commitlint to use latest tag updating commitlint to the latest tag, so that the latest release is installed by default each time. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-05 14:50:12 +05:30
Madhu Rajanna	0b6322afda	ci: update mergify for commitlint updated commitlint mergify rules to consider the commitlint status to merge the PR. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-05 14:50:12 +05:30
Madhu Rajanna	38ef32a496	ci: trailer-exists to verify sign-off This commit uses trailer-exists instead of signed-off-by to verify the sign-off-by message. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com> Suggested-by: Ade Attwood	2021-08-05 14:50:12 +05:30
Yug Gupta	1dc032e554	doc: update comments in voljournal Update spell errors and comments in voljournal.go Signed-off-by: Yug Gupta <yuggupta27@gmail.com>	2021-08-05 08:11:15 +00:00
Niels de Vos	4859f2dfdb	util: allow configuring VAULT_AUTH_MOUNT_PATH for Vault Tenant SA KMS The VAULT_AUTH_MOUNT_PATH is a Vault configuration parameter that allows a user to set a non default path for the Kubernetes ServiceAccount integration. This can already be configured for the Vault KMS, and is now added to the Vault Tenant SA KMS as well. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-08-05 06:02:57 +00:00
Niels de Vos	f2d5c2e0df	util: add vaultAuthNamespace option for Vault KMS The new `vaultAuthNamespace` configuration parameter can be set to the Vault Namespace where the authentication is setup in the service. Some Hashicorp Vault deployments use sub-namespaces for their users/tenants, with a 'root' namespace where the authentication is configured. This requires passing of different Vault namespaces for different operations. Example: - the Kubernetes Auth mechanism is configured for in the Vault Namespace called 'devops' - a user/tenant has a sub-namespace called 'devops/website' where the encryption passphrases can be placed in the key-value store The configuration for this, then looks like: vaultAuthNamespace: devops vaultNamespace: devops/homepage Note that Vault Namespaces are a feature of the Hashicorp Vault Enterprise product, and not part of the Open Source version. This prevents adding e2e tests that validate the Vault Namespace configuration. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-08-04 18:20:45 +00:00
Niels de Vos	83167e2ac5	util: correct error message when connecting to Vault fails Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-08-04 18:20:45 +00:00
Alexandre Lossent	5cba04c470	cephfs: support selinux mount options - mount host's /etc/selinux in node plugins - process mount options in all code paths for cephfs volume options Signed-off-by: Alexandre Lossent <alexandre.lossent@cern.ch>	2021-08-04 12:59:34 +00:00
Niels de Vos	72d56cb8db	e2e: use original namespace for retrying resize check expandPVCSize() uses the namespace of the PVC that was checked. In case the .Get() call fails, the PVC will not have its namespace set, and subsequent tries will fail with errors like: Error getting pvc in namespace: '': etcdserver: request timed out waiting for PVC (9 seconds elapsed) Error getting pvc in namespace: '': an empty namespace may not be set when a resource name is provided By using the original namespace of the PVC stored in a separate variable as is done with the name of the PVC, this problem should not occur anymore. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-08-04 08:08:24 +00:00
Madhu Rajanna	b1e86ee01c	ci: disable commitlint mergify rule currently PR merging is blocked due to commitlint issue. disabling commitlint or the release branches now. more details at https://github.com/ceph/ceph-csi/pull/2342 Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-04 09:48:45 +05:30
Niels de Vos	a7ff868dae	e2e: retry getting the Services for Ceph MON on failures In case listing the Kubernetes Services fails, the following error is returned immediately: failed to create configmap with error failed to list services: etcdserver: request timed out Wrapping the listing of the Services in a PollImmediate() routine, adds a retry in case of common temporary issues. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-08-03 18:44:03 +00:00
Madhu Rajanna	5fc9c3a046	doc: add design doc for clusterid poolid mapping added design doc to handle volumeID mapping in case of the failover in the Disaster Recovery. update #2118 Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-08-03 13:45:58 +00:00
Niels de Vos	e0ac70f8fb	e2e: use official CentOS container location registry.centos.org is not officially maintained by the CentOS infrastructure team. The container images on quay.io are the official ones and we should use those instead. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-08-03 12:19:46 +00:00
Artur Troian	16ec97d8f7	util: getCgroupPidsFile produces stripped path when extra : present This commit uses `strings.SplitN` instead of `strings.Split`. The path for pids.max has extra `:` symbols in it, due to which getCgroupPidsFile() splits the string into 5 tokens instead of 3, leading to loss of part of the path. As a result, the below error is reported: `Failed to get the PID limit, can not reconfigure: open /sys/fs/cgroup/pids/system.slice/containerd.service/kubepods-besteffort-pod183b9d14_aed1_4b66_a696_da0c738bc012.slice/pids.max: no such file or directory` SplitN takes an argument n and splits the string into at most n tokens, which helps us to get the desired file path. Fixes: #2337 Co-authored-by: Yati Padia <ypadia@redhat.com> Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-08-03 06:03:10 +00:00
rtsp	af1f50ba04	deploy: rbd kubernetes manifests add ability to deploy ceph-csi-rbd on non-default namespace Signed-off-by: rtsp <git@rtsp.us>	2021-07-31 03:09:14 +00:00
Prasanna Kumar Kalever	c9cd8d7a37	e2e: sync data from rbd-nbd mount Until we have a real fix, just to avoid occasionally file system entering into read-only on nodeplugin restart, lets sync data from the application pod. Updates: #2204 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-30 15:39:48 +00:00
Madhu Rajanna	fe947eccce	build: install specific commitlint version commitlint 13.1.0 is causing issues when PR is backported from devel branch to release branch https://github.com/ceph/ceph-csi/pull/2332#issuecomment-888325775 Lets revert back to commitlint 12.1.4 where we have not seen any issue with backports to release branch. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-07-30 07:06:22 +00:00
Niels de Vos	d3beaeb014	e2e: retry deploying CephFS components on failure There are reports where CephFS deploying failed with etcdserver timeouts: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.187:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-ea434921 create --namespace=cephcsi-e2e-ea434921 -f -' INFO: rc: 1 FAIL: failed to create CephFS provisioner rbac with error error running /usr/bin/kubectl --server=https://192.168.39.187:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-ea434921 create --namespace=cephcsi-e2e-ea434921 -f -: Command stdout: role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created stderr: Error from server: error when creating "STDIN": etcdserver: request timed out Error from server: error when creating "STDIN": etcdserver: request timed out Error from server: error when creating "STDIN": etcdserver: request timed out error: exit status 1 By using retryKubectlInput() helper function, a retry will be done, and the failure should not be fatal any longer. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-29 12:35:52 +00:00
Prasanna Kumar Kalever	d2def71944	doc: update the upgrade documentation to reflect 3.4.0 changes Mainly removed rbd-nbd mounter specified at the pre-upgrade considerations affecting the restarts. Also updated the 3.3 tags to 3.4 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-28 11:52:06 +00:00
Niels de Vos	ce9e54e5bd	ci: add Mergify backport rules for release-v3.4 The new `backport-to-release-v3.4` label can be added to PRs and Mergify will create a backport once the PR for the devel branch has been merged. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-28 12:53:58 +05:30
Prasanna Kumar Kalever	52799da09d	doc: add design doc for volume healer Closes: #667 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-28 11:54:59 +05:30
Prasanna Kumar Kalever	ebe4e1f944	ci: ignore spell check for design proposal images To avoid failures triggered by checking SVG image formats. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-28 11:54:59 +05:30
Prasanna Kumar Kalever	068e44bdb1	cleanup: move rbd-mirror image to a new directory Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-28 11:54:59 +05:30
Madhu Rajanna	080b251850	e2e: validate images in trash for rados namespace added validation check to verify stale images in trash for the rados namespace testing. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-07-28 03:48:33 +00:00
Madhu Rajanna	8f185bf7b2	rbd: use rados namespace for manager command Currently we have a bug where the rados namespace is not used when adding the ceph manager command to remove the image from the trash. This commit adds the missing rados namespace when adding the ceph manager task. Without the fix, the image is moved to the trash and no task is added to remove it from the trash; it becomes Ceph's responsibility to remove the image when it cleans up the trash. Workaround: manually purge the trash. Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>	2021-07-28 03:48:33 +00:00
Yug Gupta	d14c0afe28	doc: Add documentation for DR Add documentation for Disaster Recovery, which covers the steps to Failover and Failback in case of a planned migration or a Disaster. Signed-off-by: Yug Gupta <yuggupta27@gmail.com>	2021-07-27 11:43:01 +00:00
Niels de Vos	ec6703ed58	rbd: rename encryption metadata keys to enable mirroring RBD image metadata keys that start with '.rbd' are expected to be internal to RBD itself and are not mirrored to remote sites. Renaming the keys (dropping the '.' prefix) and using the new MigrateMetadata() function now makes the keys available on remote sites too. Closes: #2219 Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-26 11:49:56 +00:00
Niels de Vos	607129171d	rbd: move image metadata key migration to its own function The new MigrateMetadata() function can be used to get the metadata of an image with a deprecated and new key. Renaming metadata keys can be done easily this way. A default value will be set in the image metadata when it is missing completely. But if the deprecated key was set, the data is stored under the new key and the deprecated key is removed. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-26 11:49:56 +00:00
Yati Padia	6691951453	rbd: use go-ceph for getImageMirroringStatus Currently, getImageMirroringStatus() is using RBD CLI. This commit converts RBD CLI to go-ceph API. Fixes: #2120 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-26 06:37:40 +00:00
Niels de Vos	4e6d9be826	ci: fix yamllint error in generated golangci.yml file When running 'make containerized-test' the following error gets reported: yamllint -s -d '{extends: default, rules: {line-length: {allow-non-breakable-inline-mappings: true}},ignore: charts//templates/.yaml}' ./scripts/golangci.yml ./scripts/golangci.yml 179:81 error line too long (84 > 80 characters) (line-length) The golangci.yml.in is used to generate golangci.yml, addressing the line-length there resolves the issue. Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-26 04:05:50 +00:00
Niels de Vos	e75d308b9c	e2e: isRetryableAPIError() should match any etcdserver timeout framework.RunKubectl() returns an error that does not end with "etcdserver: request timed out", but contains the text somewhere in the middle: error running /usr/bin/kubectl --server=https://192.168.39.57:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-a44ec4b4 create -f -: Command stdout: stderr: Error from server: error when creating "STDIN": etcdserver: request timed out error: exit status 1 isRetryableAPIError() should return `true` for this case as well, so instead of using HasSuffix(), we'll use Contains(). Signed-off-by: Niels de Vos <ndevos@redhat.com>	2021-07-23 12:20:16 +00:00
Prasanna Kumar Kalever	75dda7ac0d	e2e: add test for expansion of encrypted volumes Also adds a test case to validate the default encryption type Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-23 10:00:23 +00:00
Prasanna Kumar Kalever	526ff95f10	rbd: add support to expand encrypted volume Previously, ControllerExpandVolume() had a check for encrypted volumes and we used to fail all expand requests on an encrypted volume. Also, for Block VolumeMode PVCs, NodeExpandVolume used to be ignored/skipped. With these changes, we add support for the expansion of encrypted volumes. Also, for raw Block VolumeMode PVCs with Encryption, we call NodeExpandVolume. That said, with LUKS1, the cryptsetup utility doesn't prompt for a passphrase on resizing the crypto mapper device. This is because LUKS1 devices don't use the kernel keyring for volume keys. LUKS2 devices, however, use the kernel keyring for the volume key by default, i.e. the cryptsetup utility asks for a passphrase if it detects that the volume key was previously passed to dm-crypt via the kernel keyring service. We override this default with the --disable-keyring option during the cryptsetup open command, so that we are not prompted for a passphrase when the crypto mapper device is resized. Fixes: #1469 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-23 10:00:23 +00:00
Prasanna Kumar Kalever	4fa05cb3a1	util: add helper functions for resize of encrypted volume such as: ResizeEncryptedVolume() and LuksResize() Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-23 10:00:23 +00:00
Prasanna Kumar Kalever	572f39d656	util: fix log level in OpenEncryptedVolume() Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-23 10:00:23 +00:00
Prasanna Kumar Kalever	812003eb45	util: fix bug in DeviceEncryptionStatus() With Luks1 device: $ cryptsetup status /dev/mapper/crypto-rbd0 /dev/mapper/crypto-rbd0 is active and is in use. type: LUKS1 cipher: aes-xts-plain64 keysize: 512 bits key location: dm-crypt device: /dev/rbd0 sector size: 512 offset: 4096 sectors size: 4190208 sectors mode: read/write With Luks2 device: $ cryptsetup status /dev/mapper/crypto-rbd0 /dev/mapper/crypto-rbd0 is active and is in use. type: LUKS2 cipher: aes-xts-plain64 keysize: 512 bits key location: dm-crypt device: /dev/rbd0 sector size: 512 offset: 32768 sectors size: 4161536 sectors mode: read/write This could lead to failures with unmap in the NodeUnstageVolume path for the encrypted volumes. Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>	2021-07-23 10:00:23 +00:00
Yati Padia	1ae2afe208	cleanup: modifies the error caused due to merged PRs This commit modifies the error of godot, cyclop, paralleltest linter caused due to merged PRs. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Yati Padia	4e890e9daf	ci: disable gci and wrapcheck linter This commit disables wrapcheck and gci linters. Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Yati Padia	172b66f73f	cleanup: resolves cyclop linter issue this commit adds `// nolint:cyclop` for the functions whose complexity is above 20 Updates: #1586 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00
Yati Padia	e85c0eedc4	ci: set max-complexity of cyclop as 20 This commit sets the value of max-complexity of cyclop linter as 20 Signed-off-by: Yati Padia <ypadia@redhat.com>	2021-07-22 18:15:48 +00:00

3291 Commits