RBD supports creating rbd images with
object size, stripe unit and stripe count
to support striping. This PR adds the support
for the same.
More details about striping at
https://docs.ceph.com/en/quincy/man/8/rbd/#stripingfixes: #3124
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
at present we have the check for kube version 1.21 in the tests
which no longer required as it falls under supported kubernetes
versions with the driver.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commits removes
TODO: update nfs node-plugin that has kubernetes-csi/csi-driver-nfs#319
Since, the nfsplugin image is already updated to v4.0.0.
Signed-off-by: Rakshith R <rar@redhat.com>
2 omap objects are getting leaked in the e2e tests, this change is to
workaround them for now.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
validate omap count in every testcase right after
validateSubvolumeCount()
Fixes: #2834
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Validate that we:
* Unset the PVC metadata on the rbd image created for the snapshot
* snapshot metadata on CreateVolume from snapshot
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
On occasion the Pods have not been (re)started before they get listed.
This can result in an empty list. It can occur during RBD testing where
Pods are restarted before `uname` is executed. In case the Pods are not
available yet, the test will fail with the "podlist is empty" error.
By adding a retry when the list of Pods is empty, the tests should
become a little more stable.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Some of the deployment artifacts refer to others (like ServiceAccount in
a Deployment). If the dependencies are not available (yet), there will
be errors reported in the logs. By deploying the components in a more
correct order, fewer errors are reported, making the logs a little
easier to understand.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When testing NFS-provisioning on a cluster that has an NFS-provisioner
and node-plugins deployed with a different driver-name, it is very
useful to have a commandline option to change the name of the
provisioner that is placed in the StorageClass.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
NFS testing will automatically be enabled when CephFS is enabled. This
makes sure the NFS tests run in the CI where there are different jobs
for CephFS and RBD. With a dedicated testNFS variable, it is still
possible to only run the NFS tests, when both CephFS and RBD are
disabled.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The tests for the NFS-provisioner can be run by passing -deploy-nfs and
-test-nfs as parameters to the `go test` or `e2e.test` command.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
added getPersistentVolume helper function
to get the PV and also try if there is any API
error to improve the CI.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added getPersistentVolumeClaim helper function
to get the PVC and also try if there is any API
error to improve the CI.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The Ceph cluster-id is usually detected with `ceph fsid`. This is not
always correct, as the the Ceph cluster can also be configured by name.
If the -clusterid=... is passed, it will be used instead of trying to
detect it with `ceph fsid`.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There are many locations where the cluster-id (`ceph fsid`) is obtained
from the Rook Toolbox. Instead of duplicating the code everywhere, use a
new helper function getClusterID().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
A new -filesystem=... option has been added so that the e2e tests can
run against environments that do not have a "myfs" CephFS filesystem.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
StorageClasses are cluster resources, not namespaced; there is no need
to log the namespace of a StorageClass.
When creating a StorageClass, NotFound is not an error that will be
returned, not need to check for it.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
On occasion the creation of the StorageClass can fail due to an
etcdserver timeout. If that happens, the creation can be attempted after
a delay.
This has already been done for CephFS StorageClasses, but was missed for
RBD.
See-also: ceph/ceph-csi@8a0377ef02
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Some parts of the Context() seem to get executed, even when BeforeEach()
did a Skip() for the test. By adding a return inside the Context(), the
tests should not get executed at all.
This was noticed in a failed test, where upgrade was running, eventhough
the job was executed as a nornal non-upgrade one.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit change the image registry URL for sidecars in the
deployment from `k8s.gcr.io` to `registry.k8s.io` as
the migration is happening from former to the latter. This commit
also correct the e2e readme for the change.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
As radosNamespace is more specific to
RBD not the general ceph configuration. Now
we introduced a new RBD section for RBD specific
options, Moving the radosNamespace to RBD section
and keeping the radosNamespace still under the
global ceph level configration for backward
compatibility.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Instead of patching the PV to update
the persistentVolumeReclaimPolicy and
the claimRef before deleting the PVC.
Patch PV persistentVolumeReclaimPolicy to Retain
to retain the PV after deleting the PVC.
Remove the claimRef on the PV after deleting
the PVC so that claim can be attached to a new PVC.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
we no longer require the kubernetes validation for clone tests in
the e2e tests. This commit remove it for CephFS.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
we no longer require the kubernetes validation for clone tests in
the e2e tests. This commit remove it for RBD.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
On OpenShift it is not possible for the Rook toolbox to get the metrics
from Kubelet (without additional configuration). By passing
-is-openshift, the metrics are not checked, and the e2e suite does not
fail on that particular piece.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
A new -filesystem=... option has been added so that the e2e tests can
run against environments that do not have a "myfs" CephFS filesystem.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case the toolbox pod is not available, the error message lists that
no Pods are found, but there is no hint about the toolbox. By mentioning
the toolbox in the error message, it suggests a good place to start
troubleshooting the environment.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit correct the release version of upgrade tests from
unsupported 3.3.1 to supported version.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Considering snapshot controllers have been moved to GA since
kube version 1.20, we no longer need to have a mention of beta
version of the same in our deployment.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
* create a PVC and check PVC/PV metadata on RBD image
* create and delete a PVC, attach the old PV to a new PVC and check if
PVC metadata is updated on RBD image
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Makes the rbd images features in the storageclass
as optional so that default image features of librbd
can be used. and also kept the option to user
to specify the image features in the storageclass.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as deep-flatten is long supported in ceph and its
enabled by default in the librbd, providing an option
to enable it in cephcsi for the rbd images we are
creating.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
it might need sometime for the deployment to
get created, consider the NotFound as a valid
error and retry again.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
On occasion deploying CephFS components fail due to errors like these:
failed to delete provisioner rbac .../csi-provisioner-rbac.yaml
By using the deleteResource() helper, an retry is done in case of a
failure.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There have been errors while CephFS tests were running, like:
failed to create storageclass: etcdserver: request timed out
When retrying to create the StorageClass, the e2e tests are expected to
continue and (hopefully) succeed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The CentOS Stream 8 base container image does not have `ps` installed.
This causes CI jobs to fail, when checking for a restarted rbd-nbd
process.
Instead of using `ps`, the `pstree` command can be used. This will add
some ASCII-tree symbols in front of the command that is logged by the
e2e tests, but that is only used for manual reviewing and does not harm
the running test.
Fixes: #2850
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit removes the thick provisioning
code as thick provisioning is deprecated in
cephcsi 3.5.0.
fixes: #2795
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
cephfs data pool name is changed from filesystem-data0
to filesystem-replicated in Rook 1.8. updating
the cephcsi helper functions also to use new
pool names.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as ioutil.ReadFile is deprecated and
suggestion is to use os.ReadFile as
per https://pkg.go.dev/io/ioutil updating
the same.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit make recreateCSIRBDPods function to be a general one
so that it can be consumed by more clients.
Updates https://github.com/ceph/ceph-csi/issues/2509
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
added e2e for below cases
Normal PVC clone to a bigger
size PVC (without encryption)
* Filesystem pvc clone to a bigger size
* Block pvc clone to a bigger size
Encrypted PVC clone to a bigger
size PVC
* Filesystem pvc clone to a bigger size
* Block pvc clone to a bigger size
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added e2e for below cases
Normal PVC snapshot restore to a bigger
size PVC (without encryption)
* Filesystem pvc restore to a bigger size
* Block pvc restore to a bigger size
Encrypted PVC snapshot restore to a bigger
size PVC
* Filesystem pvc restore to a bigger size
* Block pvc restore to a bigger size
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit adjust existing migration e2e tests to a couple of tests
to cover the scenarios. The seperate filesystem and block tests have
been shrinked to single one and also introduced a couple of helper
functions to setup and teardown migraition specific secret,configmap
and sc. The static pv function has been renamed to a general name
while the tests were adjusted.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This `unparam` linter escape is no longer needed and CI is failing
if we keep there. This commit remove the same and make CI happy.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
adding e2e testcase to validate the workflow
of pvc creation and attaching to pod works for
new image features like fast-diff,obj-map,exclusive-lock
and layering.
fixes: #2695
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Currently, we are skipping the generic ephemeral
testing if the kubernetes version is less than
1.21 because of this one the who test suite is
getting skipped and e2e is marked as success
in 2 minutes. This commit runs the ephemeral
tests if the kube=>1.21+. If we do this, for
the lower version we can run other tests.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The e2e sometimes fail getting objects like PVCs from the Kubernetes API
server, and log the following error:
Error getting pvc "rbd-6940" in namespace "rbd-694": rpc error: code = Unknown desc = OK: HTTP status code 200; transport: missing content-type field
By checking the error message, and initiating a retry on this failure,
CI jobs should fail less regulary.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add tests for RWX and ROX accessModes for Block and FileSystem Mode
PVCs.
Fixes: #2262
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
To make the error return consistent across e2e tests we have decided
to remove with error presence from the logs and this commit
does that for e2e/snapshot.go.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
To make the error return consistent across e2e tests we have decided
to remove with error presence from the logs and this commit
does that for e2e/cephfs_helper.go.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
To make the error return consistent across e2e tests we have decided
to remove with error presence from the logs and this commit
does that for e2e/upgrade-rbd.go.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
To make the error return consistent across e2e tests we have decided
to remove with error presence from the logs and this commit
does that for e2e/upgrade-cephfs.go.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
To make the error return consistent across e2e tests we have decided
to remove with error presence from the logs and this commit
does that for e2e/rbd_helper.go.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
To make the error return consistent across e2e tests we have decided
to remove with error presence from the logs and this commit
does that for e2e/ceph_user.go.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
To make the error return consistent across e2e tests we have decided
to remove with error presence from the logs and this commit
does that for e2e/utils.go.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
To make the error return consistent across e2e tests we have decided
to remove `with error` presence from the logs and this commit
does that for cephfs tests.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
To make the error return consistent across e2e tests we have decided
to remove `with error` presence from the logs and this commit
does that for rbd tests.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commit adds the validation of csi cephfs driver to work with
ephemeral volume support. With ephemeral volume support a user can
specify ephemeral volumes in its pod spec and tie the lifecycle
of the PVC with the POD.
An example POD spec also included in this commit.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commit adds the validation of csi RBD driver to work with
ephemeral volume support. With ephemeral volume support a user can
specify ephemeral volumes in its pod spec and tie the lifecycle
of the PVC with the POD.
An example pod spec is also included in this commit.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Considering we are far out of these release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Considering we are far out of these release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Considering we are far out of these release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
considering we are far out of this release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
considering we are far out of this release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
considering we are far out of this release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
There have been occasional CI job failures due to "transport is closing"
errors. Adding this error to the isRetryableAPIError() function should
make sure to retry the request until the connection is restored.
Fixes: #2613
Signed-off-by: Niels de Vos <ndevos@redhat.com>
currently the mountType validation of the encrypted volume is done in
the application, we should rather validate this inside the nodeplugin
pod.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Currently, at "perform IO on rbd-nbd volume after nodeplugin restart"
test we are performing write on the rbd-nbd based mount after nodeplugin
restart. But due to a bug in NBD driver the writes are failing, please
note NBD zero cmd timeout handling is fixed with kernel >= 5.4 and hence
we should defend on writes based on kernel version to avoid unnecessary
CI failures.
For more information see
https://github.com/ceph/ceph-csi/issues/2204#issuecomment-930941047
updates: #2204
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
this commit create and make use of migration secret in the requests and
validate various csi operations
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This is to preserve the rbd-nbd logs post unmap, so that the CI can dump
the available logs from logdir.
Fixes: #2451
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
For static volume, the user will manually mounts
already existing image as a volume to the application
pods. As its a rbd Image, if the PVC is of type
fileSystem the image will be mapped, formatted
and mounted on the node,
If the user resizes the image on the ceph cluster.
User cannot not automatically resize the filesystem
created on the rbd image. Even if deletes and
recreates the kubernetes objects, the new size
will not be visible on the node.
With this changes During the NodeStageVolumeRequest
the nodeplugin will check the size of the mapped rbd
image on the node using the devicePath. and also
the rbd image size on the ceph cluster.
If the size is not matching it will do the file
system resize on the node as part of the
NodeStageVolumeRequest RPC call.
The user need to do below operation to see new size
* Resize the rbd image in ceph cluster
* Scale down all the application pods using the static
PVC.
* Make sure no application pods which are using the
static PVC is running on a node.
* Scale up all the application pods.
Validate the new size in application pod mounted
volume.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit add test for migration delete volID detection scenario
by passing a custom volID and with the entries in configmap changed
to simulate the situation. The staticPV function also changed its
accept the annotation map which make it more general usage.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
createCustomConfigmap helps to create a custom cluster entry in
the configmap, however this was coupled with subvolumegroup filling
in the cluster configuration. This commit helps to make it more
general and the subvolumegroup filling is controlled now with a flag
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
e2elog.Logf("waiting for kubectl (%s -f $q args %s) to finish", action, args)
changed to
e2elog.Logf("waiting for kubectl (%s -f args %s) to finish", action, args)
Signed-off-by: Rakshith R <rar@redhat.com>
We need
https://www.mail-archive.com/linux-block@vger.kernel.org/msg38060.html
inorder to use `--io-timeout=0`. This patch is part of kernel 5.4
Since minikube doesn't have a v5.4 kernel yet, lets use io-timeout value
conditionally based on kernel version at our e2e.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Currently, we get the kernel version where the e2e (client) executable runs,
not the kernel version that is used by the csi-rbdplugin pod.
Add a function that run `uname -r` command from the specified container and
returns the kernel version.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Suggested-by: Niels de Vos <ndevos@redhat.com>
Ceph’s logging levels operate on a scale of 1 to 20, where 1 is terse
and 20 is verbose.
Format:
debug-{subsystem} = {log-level}
Setting `rbd` loglevel to 20 at our e2e tests.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
The rbd-nbd resize volume support with its netlink interface needs linux
kernel version >= v5.3.0
Hence define a defence check for the supported kernel version
Fixes: #2234
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
The kmsConfig type in the e2e suite has been enhanced with two functions
that make it possible to validate the destruction of deleted keys.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
By using retryKubectl helper function,
a retry will be done, and the known error
messages will be skipped.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
this provides caller ability to pass the arguments
like ignore-not-found=true etc when executing
the kubectl commands.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added isAlreadyExistsCLIError to check for known error.
if error is already exists we are considering it
as a success.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
expandPVCSize() uses the namespace of the PVC that was checked. In case
the .Get() call fails, the PVC will not have its namespace set, and
subsequent tries will fail with errors like:
Error getting pvc in namespace: '': etcdserver: request timed out
waiting for PVC (9 seconds elapsed)
Error getting pvc in namespace: '': an empty namespace may not be set when a resource name is provided
By using the original namespace of the PVC stored in a separate variable
as is done with the name of the PVC, this problem should not occur
anymore.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case listing the Kubernetes Services fails, the following error is
returned immediately:
failed to create configmap with error failed to list services: etcdserver: request timed out
Wrapping the listing of the Services in a PollImmediate() routine, adds
a retry in case of common temporary issues.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
registry.centos.org is not officially maintained by the CentOS
infrastructure team. The container images on quay.io are the official
once and we should use those instead.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Until we have a real fix, just to avoid occasionally file system entering
into read-only on nodeplugin restart, lets sync data from the application
pod.
Updates: #2204
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
There are reports where CephFS deploying failed with etcdserver
timeouts:
INFO: Running '/usr/bin/kubectl --server=https://192.168.39.187:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-ea434921 create --namespace=cephcsi-e2e-ea434921 -f -'
INFO: rc: 1
FAIL: failed to create CephFS provisioner rbac with error error running /usr/bin/kubectl --server=https://192.168.39.187:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-ea434921 create --namespace=cephcsi-e2e-ea434921 -f -:
Command stdout:
role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created
rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created
stderr:
Error from server: error when creating "STDIN": etcdserver: request timed out
Error from server: error when creating "STDIN": etcdserver: request timed out
Error from server: error when creating "STDIN": etcdserver: request timed out
error:
exit status 1
By using retryKubectlInput() helper function, a retry will be done, and
the failure should not be fatal any longer.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
RBD image metadata keys that start with '.rbd' are expected to be
internal to RBD itself and are not mirrored to remote sites. Renaming
the keys (dropping the '.' prefix) and using the new MigrateMetadata()
function now makes the keys available on remote sites too.
Closes: #2219
Signed-off-by: Niels de Vos <ndevos@redhat.com>
framework.RunKubectl() returns an error that does not end with
"etcdserver: request timed out", but contains the text somewhere in the
middle:
error running /usr/bin/kubectl --server=https://192.168.39.57:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-a44ec4b4 create -f -:
Command stdout:
stderr:
Error from server: error when creating "STDIN": etcdserver: request timed out
error:
exit status 1
isRetryableAPIError() should return `true` for this case as well, so
instead of using HasSuffix(), we'll use Contains().
Signed-off-by: Niels de Vos <ndevos@redhat.com>