Commit Graph

2903 Commits

Author SHA1 Message Date
Rakshith R
7f6b73e71f e2e: log imageList in validateRBDImageCount for better debugging
Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-09 07:28:43 +00:00
Rakshith R
9d57717222 e2e: add test cases for pvc-pvcClone chain with depth 2
Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-09 07:28:43 +00:00
Rakshith R
9321b4bce4 e2e: add test cases for snapshot-restore chain with depth 2
Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-09 07:28:43 +00:00
Rakshith R
859d696279 cleanup: refractor checkCloneImage to reducing nesting if
This commit refractors checkCloneImage function to
address nestif linter issue.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-09 07:28:43 +00:00
Madhu Rajanna
a5a8952716 rbd: fix clone problem
This commit fixes a bug in checkCloneImage() which was caused
by checking cloned image before checking on temp-clone image snap
in a subsequent request which lead to stale images. This was solved
by checking temp-clone image snap and flattening temp-clone if
needed.
This commit also fixes comparison bug in flattenCloneImage().

Signed-off-by: Rakshith R <rar@redhat.com>
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-09 07:28:43 +00:00
Madhu Rajanna
916c97b4a8 rbd: copy creds when copying the connection
rbd flatten functions is a CLI call and it expects
the creds as the input and copying of creds is
required when we generate the temp clone image.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-09 07:28:43 +00:00
Rakshith R
08728b631b rbd: fix vol.VolID in cloneFromSnapshot()
Volume generated from snap using genrateVolFromSnap
already copies volume ID correctly, therefore removing
`vol.VolID = rbdVol.VolID` which wrongly copies parent
Volume ID instead leading to error from copyEncryption()
on parent and clone volume ID being equal.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-09 07:28:43 +00:00
Niels de Vos
bb60173a98 e2e: add verifyKeyDestroyed() for validating vaultDestroyKeys
The kmsConfig type in the e2e suite has been enhanced with two functions
that make it possible to validate the destruction of deleted keys.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-06 12:19:18 +00:00
Niels de Vos
b5d2321d57 cleanup: use vaultDefaultCAVerify to set default value
Golang-ci complains about the following:

    internal/util/vault_tokens.go:99:20: string `true` has 4 occurrences, but such constant `vaultDefaultDestroyKeys` already exists (goconst)
    	v.VaultCAVerify = "true"
    	                  ^

This occurence of "true" can be replaced by vaultDefaultCAVerify so
address the warning.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-06 12:19:18 +00:00
Niels de Vos
f584db41e6 util: add vaultDestroyKeys option to destroy Vault kv-v2 secrets
Hashicorp Vault does not completely remove the secrets in a kv-v2
backend when the keys are deleted. The metadata of the keys will be
kept, and it is possible to recover the contents of the keys afterwards.

With the new `vaultDestroyKeys` configuration parameter, this behaviour
can now be selected. By default the parameter will be set to `true`,
indicating that the keys and contents should completely be destroyed.
Setting it to any other value will make it possible to recover the
deleted keys.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-06 12:19:18 +00:00
Niels de Vos
d7bcb42481 rebase: update libopenstorage/secrets
libopenstorage has added a new feature that makes it possible to destroy
the contents of a key/value in the Hashicorp Vault kv-v2 secrets backend.

See-also: https://github.com/libopenstorage/secrets/pull/55
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-06 12:19:18 +00:00
Madhu Rajanna
2782878ea2 rbd: log LastUpdate in UTC format
This Commit converts the LastUpdate
from int to the UTC format and logs
it for better debugging.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-06 10:18:51 +00:00
Madhu Rajanna
2c66dfc3e4 e2e: retry running kubectl on known errors
By using retryKubectl helper function,
a retry will be done, and the known error
messages will be skipped.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-06 08:03:18 +00:00
Madhu Rajanna
2071c535fa e2e: pass variadic argument to kubectl helper function
this provides caller ability to pass the arguments
like ignore-not-found=true etc when executing
the kubectl commands.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-06 08:03:18 +00:00
Madhu Rajanna
9f0af30735 e2e: add retryKubectlArgs helper for kubectl retry
added helper function retryKubectlArgs to perform
action if its a known error.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-06 08:03:18 +00:00
Madhu Rajanna
dd9fabf747 e2e: add isAlreadyExistsCLIError to check known error
added isAlreadyExistsCLIError to check for known error.
if error is already exists we are considering it
as a success.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-06 08:03:18 +00:00
Madhu Rajanna
d321663872 deploy: add template changes for mapping
added template changes for the clusterID and
poolID,fsID mapping details for the pod templates.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-05 16:07:51 +00:00
Madhu Rajanna
92ad2ceec9 rbd: read clusterID and PoolID from mapping
Whenever Ceph-CSI receives a CSI/Replication
request it will first decode the
volumeHandle and try to get the required
OMAP details if it is not able to
retrieve, receives a `Not Found` error
message and Ceph-CSI will check for the
clusterID mapping. If the old volumeID
`0001-00013-site1-storage-0000000000000001
-b0285c97-a0ce-11eb-8c66-0242ac110002`
contains the `site1-storage` as the clusterID,
now Ceph-CSI will look for the corresponding
clusterID `site2-storage` from the above configmap.
If the clusterID mapping is found now Ceph-CSI
will look for the poolID mapping ie mapping between
`1` and `2`. Example:- pool with name exists on
both the clusters with different ID's Replicapool
with ID `1` on site1 and Replicapool with ID `2`
on site2. After getting the required mapping Ceph-CSI
has the required information to get more details
from the rados OMAP. If we have multiple clusterID mapping
it will loop through all the mapping and checks the
corresponding pool to get the OMAP data. If the clusterID
mapping does not exist Ceph-CSI will return an `Not Found`
error message to the caller.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-05 16:07:51 +00:00
Madhu Rajanna
ac11d71e19 util: add helper function to read clusterID mapping
added helper function to read the clusterID mapping
from the mounted file.

The clusterID mapping contains below mappings
* ClusterID mappings (to cluster to which we are failingover
and from which cluster failover happened)
* RBD PoolID mapping of between the clusters.
* CephFS FscID mapping between the clusters.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-05 16:07:51 +00:00
Madhu Rajanna
fce5a181d0 doc: change FsID to FscID for cephfs
updated the filesystem identifier from
FsId to FscID.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-05 16:07:51 +00:00
Yug Gupta
080f7538c0 helm: update cephfs provisioner updateStrategy
Update ceph-csi-cephfs.provisioner updatestrategy
to allow maxUnavailable pods at a time to be 50%

Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2021-08-05 14:04:16 +00:00
Yug Gupta
ea088d40be helm: update rbd provisioner updateStrategy
Update ceph-csi-rbd.provisioner updatestrategy
to allow maxUnavailable pods at a time to be 50%

Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2021-08-05 14:04:16 +00:00
Madhu Rajanna
3805c29f36 build: update commitlint to use latest tag
updaing the commitlint to the latest, so
each time latest release can be installed by
default.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-05 14:50:12 +05:30
Madhu Rajanna
0b6322afda ci: update mergify for commitlint
updated commitlint mergify rules to
consider the commitlint status to
merge the PR.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-05 14:50:12 +05:30
Madhu Rajanna
38ef32a496 ci: trailer-exists to verify sign-off
This commit uses trailer-exists instead
of signed-off-by to verify the sign-off-by
message.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Suggested-by: Ade Attwood
2021-08-05 14:50:12 +05:30
Yug Gupta
1dc032e554 doc: update comments in voljournal
Update spell errors and comments in
voljournal.go

Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2021-08-05 08:11:15 +00:00
Niels de Vos
4859f2dfdb util: allow configuring VAULT_AUTH_MOUNT_PATH for Vault Tenant SA KMS
The VAULT_AUTH_MOUNT_PATH is a Vault configuration parameter that allows
a user to set a non default path for the Kubernetes ServiceAccount
integration. This can already be configured for the Vault KMS, and is
now added to the Vault Tenant SA KMS as well.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-05 06:02:57 +00:00
Niels de Vos
f2d5c2e0df util: add vaultAuthNamespace option for Vault KMS
The new `vaultAuthNamespace` configuration parameter can be set to the
Vault Namespace where the authentication is setup in the service. Some
Hashicorp Vault deployments use sub-namespaces for their users/tenants,
with a 'root' namespace where the authentication is configured. This
requires passing of different Vault namespaces for different operations.

Example:
 - the Kubernetes Auth mechanism is configured for in the Vault
   Namespace called 'devops'
 - a user/tenant has a sub-namespace called 'devops/website' where the
   encryption passphrases can be placed in the key-value store

The configuration for this, then looks like:

    vaultAuthNamespace: devops
    vaultNamespace: devops/homepage

Note that Vault Namespaces are a feature of the Hashicorp Vault
Enterprise product, and not part of the Open Source version. This
prevents adding e2e tests that validate the Vault Namespace
configuration.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-04 18:20:45 +00:00
Niels de Vos
83167e2ac5 util: correct error message when connecting to Vault fails
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-04 18:20:45 +00:00
Alexandre Lossent
5cba04c470 cephfs: support selinux mount options
- mount host's /etc/selinux in node plugins
- process mount options in all code paths for cephfs volume options

Signed-off-by: Alexandre Lossent <alexandre.lossent@cern.ch>
2021-08-04 12:59:34 +00:00
Niels de Vos
72d56cb8db e2e: use original namespace for retrying resize check
expandPVCSize() uses the namespace of the PVC that was checked. In case
the .Get() call fails, the PVC will not have its namespace set, and
subsequent tries will fail with errors like:

    Error getting pvc in namespace: '': etcdserver: request timed out
    waiting for PVC  (9 seconds elapsed)
    Error getting pvc in namespace: '': an empty namespace may not be set when a resource name is provided

By using the original namespace of the PVC stored in a separate variable
as is done with the name of the PVC, this problem should not occur
anymore.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-04 08:08:24 +00:00
Madhu Rajanna
b1e86ee01c ci: disable commitlint mergify rule
currently PR merging is blocked due to
commitlint issue. disabling commitlint
or the release branches now. more details at
https://github.com/ceph/ceph-csi/pull/2342

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-04 09:48:45 +05:30
Niels de Vos
a7ff868dae e2e: retry getting the Services for Ceph MON on failures
In case listing the Kubernetes Services fails, the following error is
returned immediately:

    failed to create configmap with error failed to list services: etcdserver: request timed out

Wrapping the listing of the Services in a PollImmediate() routine, adds
a retry in case of common temporary issues.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-03 18:44:03 +00:00
Madhu Rajanna
5fc9c3a046 doc: add design doc for clusterid poolid mapping
added design doc to handle volumeID mapping in case
of the failover in the Disaster Recovery.

update #2118

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-03 13:45:58 +00:00
Niels de Vos
e0ac70f8fb e2e: use official CentOS container location
registry.centos.org is not officially maintained by the CentOS
infrastructure team. The container images on quay.io are the official
once and we should use those instead.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-03 12:19:46 +00:00
Artur Troian
16ec97d8f7 util: getCgroupPidsFile produces striped path when extra : present
This commit uses `string.SplitN` instead of `string.Split`.
The path for pids.max has extra `:` symbols in it due to which
getCgroupPidsFile() splits the string into 5 tokens instead of
3 leading to loss of part of the path.
As a result, the below error is reported:
`Failed to get the PID limit, can not reconfigure: open
/sys/fs/cgroup/pids/system.slice/containerd.service/
kubepods-besteffort-pod183b9d14_aed1_4b66_a696_da0c738bc012.slice/pids.max:
no such file or directory`
SplitN takes an argument n and splits the string
accordingly which helps us to get the desired
file path.

Fixes: #2337

Co-authored-by: Yati Padia <ypadia@redhat.com>
Signed-off-by: Yati Padia <ypadia@redhat.com>
2021-08-03 06:03:10 +00:00
rtsp
af1f50ba04 deploy: rbd kubernetes manifests
add ability to deploy ceph-csi-rbd on non-default namespace

Signed-off-by: rtsp <git@rtsp.us>
2021-07-31 03:09:14 +00:00
Prasanna Kumar Kalever
c9cd8d7a37 e2e: sync data from rbd-nbd mount
Until we have a real fix, just to avoid occasionally file system entering
into read-only on nodeplugin restart, lets sync data from the application
pod.

Updates: #2204

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-07-30 15:39:48 +00:00
Madhu Rajanna
fe947eccce build: install specific commitlint version
commitlint 13.1.0 is causing issues when
PR is backported from devel branch to release
branch
https://github.com/ceph/ceph-csi/pull/2332#issuecomment-888325775
Lets revert back to commitlint 12.1.4 where we have
not seen any issue with backports to release
branch.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-07-30 07:06:22 +00:00
Niels de Vos
d3beaeb014 e2e: retry deploying CephFS components on failure
There are reports where CephFS deploying failed with etcdserver
timeouts:

    INFO: Running '/usr/bin/kubectl --server=https://192.168.39.187:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-ea434921 create --namespace=cephcsi-e2e-ea434921 -f -'
    INFO: rc: 1
    FAIL: failed to create CephFS provisioner rbac with error error running /usr/bin/kubectl --server=https://192.168.39.187:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-ea434921 create --namespace=cephcsi-e2e-ea434921 -f -:
    Command stdout:
    role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created
    rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created

    stderr:
    Error from server: error when creating "STDIN": etcdserver: request timed out
    Error from server: error when creating "STDIN": etcdserver: request timed out
    Error from server: error when creating "STDIN": etcdserver: request timed out

    error:
    exit status 1

By using retryKubectlInput() helper function, a retry will be done, and
the failure should not be fatal any longer.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-07-29 12:35:52 +00:00
Prasanna Kumar Kalever
d2def71944 doc: update the upgrade documentation to reflect 3.4.0 changes
Mainly removed rbd-nbd mounter specified at the pre-upgrade
considerations affecting the restarts.

Also updated the 3.3 tags to 3.4

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-07-28 11:52:06 +00:00
Niels de Vos
ce9e54e5bd ci: add Mergify backport rules for release-v3.4
The new `backport-to-release-v3.4` label can be added to PRs and Mergify
will create a backport once the PR for the devel branch has been merged.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-07-28 12:53:58 +05:30
Prasanna Kumar Kalever
52799da09d doc: add design doc for volume healer
Closes: #667

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-07-28 11:54:59 +05:30
Prasanna Kumar Kalever
ebe4e1f944 ci: ignore spell check for design proposal images
To avoid failures triggered by checking SVG image formats.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-07-28 11:54:59 +05:30
Prasanna Kumar Kalever
068e44bdb1 cleanup: move rbd-mirror image to a new directory
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-07-28 11:54:59 +05:30
Madhu Rajanna
080b251850 e2e: validate images in trash for rados namespace
added validation check to verify stale images in trash
for the rados namespace testing.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-07-28 03:48:33 +00:00
Madhu Rajanna
8f185bf7b2 rbd: use rados namespace for manager command
Currently we have a bug that we are not using rados
namespace when adding ceph manager command to
remove the image from the trash. This commit
adds the missing rados namespace when adding
ceph manager task.

without fix the image will be moved to trash
and no task will be added to remove from the
trash. it will become ceph responsibility to
remove the image from trash when it will cleanup
the trash.

workaroud: manually purge the trash

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-07-28 03:48:33 +00:00
Yug Gupta
d14c0afe28 doc: Add documentation for DR
Add documenation for Disaster Recovery
which steps to Failover and Failback in case
of a planned migration or a Disaster.

Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2021-07-27 11:43:01 +00:00
Niels de Vos
ec6703ed58 rbd: rename encryption metadata keys to enable mirroring
RBD image metadata keys that start with '.rbd' are expected to be
internal to RBD itself and are not mirrored to remote sites. Renaming
the keys (dropping the '.' prefix) and using the new MigrateMetadata()
function now makes the keys available on remote sites too.

Closes: #2219
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-07-26 11:49:56 +00:00
Niels de Vos
607129171d rbd: move image metadata key migration to its own function
The new MigrateMetadata() function can be used to get the metadata of an
image with a deprecated and new key. Renaming metadata keys can be done
easily this way.

A default value will be set in the image metadata when it is missing
completely. But if the deprecated key was set, the data is stored under
the new key and the deprecated key is removed.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-07-26 11:49:56 +00:00