There are occasions where deleting a PVC (or PV) never succeeds. The
reported status of the deleted object is sometimes empty, which suggests
that the PVC or PV was, in fact, deleted.
To diagnose the incorrect error checking, include the errors for
retrying in the logs.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The scale down/up functions fail often with "deployment not found"
errors. Possibly deploying with Podman is slower than deploying in a
minikube VM, and there is a delay for the deployment to become
available.
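
A minimal sketch of the idea, assuming a time-based wait that treats
"not found" as "not created yet". The names `errNotFound`,
`scaleWithTimeout` and `slowDeployment` are hypothetical and only
illustrate the approach, they are not the actual e2e helpers:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotFound is a hypothetical sentinel standing in for the NotFound error
// that the API server returns while the deployment does not exist yet.
var errNotFound = errors.New("deployment not found")

// scaleWithTimeout keeps calling scale until it succeeds or the timeout
// expires. NotFound is retried, any other error aborts the wait at once.
func scaleWithTimeout(scale func() error, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		err := scale()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errNotFound) {
			return err
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out after %v: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

// slowDeployment is a stub for this sketch: the deployment only becomes
// available after notFoundCalls attempts.
func slowDeployment(notFoundCalls int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= notFoundCalls {
			return errNotFound
		}
		return nil
	}
}

func main() {
	err := scaleWithTimeout(slowDeployment(3), time.Second, 10*time.Millisecond)
	fmt.Println("scale result:", err)
}
```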
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Ginkgo v1 is deprecated and has been replaced by v2.
Ref: https://onsi.github.io/ginkgo/MIGRATING_TO_V2#upgrading-to-ginkgo-20
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
PSP is deprecated in Kubernetes 1.21 and will be removed in Kubernetes
1.25. Remove the existing PSP-related templates from the repo and update
the affected documents.
fixes: #1988
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit modifies the NFS daemonset to use the NFS nodeserver. An
example for `nfs.NetNamespaceFilePath` is added.
Signed-off-by: Rakshith R <rar@redhat.com>
Some of the steps still refer to CephFS, likely missed some replacements
while copy/pasting. The logging is a little confusing when messages
claim something with CephFS failed, but the test is about NFS.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Sometimes executing a command in a Pod fails with "unable to upgrade
connection". This is most likely a temporary situation, and retrying
hopefully reduces the number of spurious failures because of it.
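
The retry can be sketched as below. Matching on the error text and the
helper names (`isTransientExecError`, `execWithRetry`, `flakyExec`) are
assumptions made for illustration; the real exec helper in the e2e code
may look different:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// isTransientExecError reports whether err looks like the temporary
// "unable to upgrade connection" failure. Matching on the message text is
// an assumption of this sketch.
func isTransientExecError(err error) bool {
	return err != nil && strings.Contains(err.Error(), "unable to upgrade connection")
}

// execWithRetry calls run up to attempts times and retries only on the
// transient error. Any other error is returned immediately.
func execWithRetry(attempts int, delay time.Duration, run func() (string, error)) (string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		out, err := run()
		if err == nil {
			return out, nil
		}
		if !isTransientExecError(err) {
			return "", err
		}
		lastErr = err
		time.Sleep(delay)
	}
	return "", fmt.Errorf("giving up after %d attempts: %w", attempts, lastErr)
}

// flakyExec is a hypothetical stand-in for executing a command in a Pod. It
// fails the first `failures` calls with the transient error, then succeeds.
func flakyExec(failures int) func() (string, error) {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= failures {
			return "", errors.New("unable to upgrade connection: container not found")
		}
		return "Linux", nil
	}
}

func main() {
	out, err := execWithRetry(5, time.Millisecond, flakyExec(2))
	fmt.Println(out, err)
}
```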
Signed-off-by: Niels de Vos <ndevos@redhat.com>
k8sVersionGreaterEquals is not used anywhere at the moment, but it will
be useful in the future if a test needs a Kubernetes version check. Add
a nolint directive for it now, so that the static check skips it.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
CI jobs run on Kubernetes 1.22 by default, so a few tests no longer
need to check that the cluster runs at least Kubernetes 1.22. Remove
the unneeded check.
updates: #3086
depends-on #3255
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When getting the PVC or PV failed, the returned object may contain empty
values. If that happens, a retry uses the empty values for Namespace and
Name, which will never be successful.
Instead, use the Namespace and Name attributes from the original object,
and not from the object returned by the Get() call.
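
A sketch of the fixed loop, with a stubbed Get() call. The `claim` type,
`getFunc`, `waitForClaim` and `flakyGet` are hypothetical names used only
to show that the identifiers are taken from the original object once:

```go
package main

import (
	"errors"
	"fmt"
)

// claim is a reduced, hypothetical stand-in for a PVC or PV object.
type claim struct {
	Namespace string
	Name      string
	Phase     string
}

// getFunc is a hypothetical stand-in for the API Get() call.
type getFunc func(namespace, name string) (*claim, error)

// waitForClaim retries get, always using the identifiers of orig.
func waitForClaim(get getFunc, orig *claim, attempts int) (*claim, error) {
	// Copy the identifiers once, from the original object.
	namespace, name := orig.Namespace, orig.Name
	var lastErr error
	for i := 0; i < attempts; i++ {
		got, err := get(namespace, name)
		if err == nil {
			return got, nil
		}
		// got may be empty here, so its Namespace and Name are never read.
		lastErr = err
	}
	return nil, fmt.Errorf("failed to get %s/%s: %w", namespace, name, lastErr)
}

// flakyGet fails once while returning an empty object, and records the
// namespace/name pairs it was asked for.
func flakyGet(seen *[]string) getFunc {
	calls := 0
	return func(namespace, name string) (*claim, error) {
		calls++
		*seen = append(*seen, namespace+"/"+name)
		if calls == 1 {
			return &claim{}, errors.New("temporary API error")
		}
		return &claim{Namespace: namespace, Name: name, Phase: "Bound"}, nil
	}
}

func main() {
	var seen []string
	pvc, err := waitForClaim(flakyGet(&seen), &claim{Namespace: "rbd", Name: "pvc-1"}, 3)
	fmt.Println(pvc, err, seen)
}
```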
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When using `lock_on_read`, the RBD image needs to have the
`exclusive-lock` feature enabled too.
Fixes: #3221
Signed-off-by: Niels de Vos <ndevos@redhat.com>
CI fails very frequently because of a resource leak issue. Until the
root cause of the resource leaks is solved, reduce the clone count from
10 to 3.
related: #2327
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
RBD supports creating images with an object size, stripe unit and
stripe count to support striping. This PR adds support for these
options.
More details about striping at
https://docs.ceph.com/en/quincy/man/8/rbd/#striping
fixes: #3124
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
At present the tests check for Kubernetes version 1.21. This is no
longer required, as that version falls under the Kubernetes versions
supported by the driver.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commit removes
TODO: update nfs node-plugin that has kubernetes-csi/csi-driver-nfs#319
since the nfsplugin image is already updated to v4.0.0.
Signed-off-by: Rakshith R <rar@redhat.com>
Two omap objects are leaked in the e2e tests. This change works around
the leak for now.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
validate omap count in every testcase right after
validateSubvolumeCount()
Fixes: #2834
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Validate that we:
* Unset the PVC metadata on the rbd image created for the snapshot
* snapshot metadata on CreateVolume from snapshot
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
On occasion the Pods have not been (re)started before they get listed.
This can result in an empty list. It can occur during RBD testing where
Pods are restarted before `uname` is executed. In case the Pods are not
available yet, the test will fail with the "podlist is empty" error.
By adding a retry when the list of Pods is empty, the tests should
become a little more stable.
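
Roughly, the retry treats an empty list as "try again" and a real error
as fatal. This is a sketch with a stubbed list call; the names
`listPodsWithRetry` and `restartingPods` and the Pod name are
hypothetical:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errEmptyPodList = errors.New("podlist is empty")

// listPodsWithRetry calls list until it returns at least one Pod. An empty
// list is retried, an error from list is returned immediately.
func listPodsWithRetry(list func() ([]string, error), attempts int, delay time.Duration) ([]string, error) {
	for i := 0; i < attempts; i++ {
		pods, err := list()
		if err != nil {
			return nil, err
		}
		if len(pods) > 0 {
			return pods, nil
		}
		time.Sleep(delay)
	}
	return nil, errEmptyPodList
}

// restartingPods is a stub that returns an empty list for the first
// emptyCalls calls, as if the Pods were still being restarted.
func restartingPods(emptyCalls int) func() ([]string, error) {
	calls := 0
	return func() ([]string, error) {
		calls++
		if calls <= emptyCalls {
			return nil, nil
		}
		return []string{"csi-rbdplugin-abcde"}, nil
	}
}

func main() {
	pods, err := listPodsWithRetry(restartingPods(2), 5, 10*time.Millisecond)
	fmt.Println(pods, err)
}
```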
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Some of the deployment artifacts refer to others (like ServiceAccount in
a Deployment). If the dependencies are not available (yet), there will
be errors reported in the logs. By deploying the components in a more
correct order, fewer errors are reported, making the logs a little
easier to understand.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When testing NFS-provisioning on a cluster that has an NFS-provisioner
and node-plugins deployed with a different driver-name, it is very
useful to have a commandline option to change the name of the
provisioner that is placed in the StorageClass.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
NFS testing will automatically be enabled when CephFS is enabled. This
makes sure the NFS tests run in the CI where there are different jobs
for CephFS and RBD. With a dedicated testNFS variable, it is still
possible to only run the NFS tests, when both CephFS and RBD are
disabled.
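
The flag handling could look like the sketch below. Only `-test-nfs` is
named in this log; the `-test-cephfs` and `-test-rbd` flag names, their
defaults, and the `options`/`parseOptions` helpers are assumptions for
illustration:

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the test selection. Field names are hypothetical.
type options struct {
	testCephFS bool
	testRBD    bool
	testNFS    bool
}

// parseOptions parses the command line and enables the NFS tests whenever
// the CephFS tests are enabled.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("e2e", flag.ContinueOnError)
	fs.BoolVar(&o.testCephFS, "test-cephfs", true, "run the CephFS tests")
	fs.BoolVar(&o.testRBD, "test-rbd", true, "run the RBD tests")
	fs.BoolVar(&o.testNFS, "test-nfs", false, "run the NFS tests")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	// NFS testing follows CephFS, but can also be requested on its own.
	o.testNFS = o.testNFS || o.testCephFS
	return o, nil
}

func main() {
	o, err := parseOptions([]string{"-test-cephfs=false", "-test-rbd=false", "-test-nfs"})
	fmt.Printf("%+v %v\n", o, err)
}
```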
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The tests for the NFS-provisioner can be run by passing -deploy-nfs and
-test-nfs as parameters to the `go test` or `e2e.test` command.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add a getPersistentVolume helper function that gets the PV and retries
when there is an API error, to make the CI more stable.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add a getPersistentVolumeClaim helper function that gets the PVC and
retries when there is an API error, to make the CI more stable.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The Ceph cluster-id is usually detected with `ceph fsid`. This is not
always correct, as the Ceph cluster can also be configured by name.
If the -clusterid=... is passed, it will be used instead of trying to
detect it with `ceph fsid`.
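
In sketch form, the override takes precedence over detection. The
signature below is hypothetical; the `detect` callback stands in for
running `ceph fsid`:

```go
package main

import (
	"fmt"
	"strings"
)

// getClusterID returns override when it is set, and falls back to detect
// otherwise. detect is a stand-in for running `ceph fsid`.
func getClusterID(override string, detect func() (string, error)) (string, error) {
	if override != "" {
		return override, nil
	}
	fsid, err := detect()
	if err != nil {
		return "", fmt.Errorf("failed to detect the cluster-id: %w", err)
	}
	// Command output usually ends with a newline.
	return strings.TrimSpace(fsid), nil
}

func main() {
	detect := func() (string, error) { return "a1b2c3\n", nil }

	id, err := getClusterID("", detect)
	fmt.Println(id, err)

	id, err = getClusterID("my-cluster", detect)
	fmt.Println(id, err)
}
```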
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There are many locations where the cluster-id (`ceph fsid`) is obtained
from the Rook Toolbox. Instead of duplicating the code everywhere, use a
new helper function getClusterID().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
A new -filesystem=... option has been added so that the e2e tests can
run against environments that do not have a "myfs" CephFS filesystem.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
StorageClasses are cluster resources, not namespaced; there is no need
to log the namespace of a StorageClass.
When creating a StorageClass, NotFound is not an error that will be
returned, so there is no need to check for it.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
On occasion the creation of the StorageClass can fail due to an
etcdserver timeout. If that happens, the creation can be attempted after
a delay.
This has already been done for CephFS StorageClasses, but was missed for
RBD.
See-also: ceph/ceph-csi@8a0377ef02
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Some parts of the Context() seem to get executed, even when BeforeEach()
did a Skip() for the test. By adding a return inside the Context(), the
tests should not get executed at all.
This was noticed in a failed test, where upgrade was running, even though
the job was executed as a normal non-upgrade one.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit changes the image registry URL for sidecars in the
deployment from `k8s.gcr.io` to `registry.k8s.io`, as images are being
migrated from the former to the latter. This commit also corrects the
e2e readme for the change.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
radosNamespace is specific to RBD and not part of the general Ceph
configuration. Now that a new RBD section exists for RBD-specific
options, move radosNamespace to the RBD section, and keep
radosNamespace under the global Ceph-level configuration as well for
backward compatibility.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Instead of patching the PV to update both the
persistentVolumeReclaimPolicy and the claimRef before deleting the PVC,
do it in two steps:
Patch the PV persistentVolumeReclaimPolicy to Retain, so that the PV is
kept after deleting the PVC.
Remove the claimRef on the PV after deleting the PVC, so that the PV can
be attached to a new PVC.
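
The two patch bodies can be sketched as JSON documents, assuming a merge
style patch where a `null` value removes the field. The function names
are hypothetical and the e2e code may build the patches differently:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// retainPolicyPatch builds the body for the first patch, applied before the
// PVC is deleted.
func retainPolicyPatch() ([]byte, error) {
	return json.Marshal(map[string]any{
		"spec": map[string]any{"persistentVolumeReclaimPolicy": "Retain"},
	})
}

// clearClaimRefPatch builds the body for the second patch, applied after
// the PVC is deleted. The nil value is encoded as JSON null.
func clearClaimRefPatch() ([]byte, error) {
	return json.Marshal(map[string]any{
		"spec": map[string]any{"claimRef": nil},
	})
}

func main() {
	retain, err := retainPolicyPatch()
	fmt.Println(string(retain), err)

	clearRef, err := clearClaimRefPatch()
	fmt.Println(string(clearRef), err)
}
```

Keeping the two bodies separate makes the ordering explicit: Retain
first, delete the PVC, then drop the claimRef.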
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
We no longer require the Kubernetes validation for clone tests in
the e2e tests. This commit removes it for CephFS.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
We no longer require the Kubernetes validation for clone tests in
the e2e tests. This commit removes it for RBD.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
On OpenShift it is not possible for the Rook toolbox to get the metrics
from Kubelet (without additional configuration). By passing
-is-openshift, the metrics are not checked, and the e2e suite does not
fail on that particular piece.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case the toolbox pod is not available, the error message lists that
no Pods are found, but there is no hint about the toolbox. By mentioning
the toolbox in the error message, it suggests a good place to start
troubleshooting the environment.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit corrects the release version of the upgrade tests from the
unsupported 3.3.1 to a supported version.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
As the snapshot controllers have been GA since Kubernetes 1.20, the
deployment no longer needs to mention the beta version of them.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
* create a PVC and check PVC/PV metadata on RBD image
* create and delete a PVC, attach the old PV to a new PVC and check if
PVC metadata is updated on RBD image
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Make the RBD image features in the StorageClass optional, so that the
default image features of librbd can be used. Users can still specify
the image features in the StorageClass.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
deep-flatten has long been supported in Ceph and is enabled by default
in librbd. Provide an option to enable it in cephcsi for the RBD images
that are created.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It might take some time for the deployment to get created. Treat
NotFound as an expected error and retry.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
On occasion deploying CephFS components fails due to errors like these:
failed to delete provisioner rbac .../csi-provisioner-rbac.yaml
By using the deleteResource() helper, a retry is done in case of a
failure.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There have been errors while CephFS tests were running, like:
failed to create storageclass: etcdserver: request timed out
By retrying the creation of the StorageClass, the e2e tests are expected
to continue and (hopefully) succeed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The CentOS Stream 8 base container image does not have `ps` installed.
This causes CI jobs to fail, when checking for a restarted rbd-nbd
process.
Instead of using `ps`, the `pstree` command can be used. This will add
some ASCII-tree symbols in front of the command that is logged by the
e2e tests, but that is only used for manual reviewing and does not harm
the running test.
Fixes: #2850
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit removes the thick provisioning
code as thick provisioning is deprecated in
cephcsi 3.5.0.
fixes: #2795
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The CephFS data pool name changed from filesystem-data0 to
filesystem-replicated in Rook 1.8. Update the cephcsi helper functions
to use the new pool name.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
ioutil.ReadFile is deprecated and the suggestion is to use os.ReadFile
instead, as per https://pkg.go.dev/io/ioutil. Update the code
accordingly.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit makes the recreateCSIRBDPods function a general one
so that it can be consumed by more clients.
Updates https://github.com/ceph/ceph-csi/issues/2509
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Add e2e tests for the cases below.
Normal PVC clone to a bigger size PVC (without encryption):
* Filesystem PVC clone to a bigger size
* Block PVC clone to a bigger size
Encrypted PVC clone to a bigger size PVC:
* Filesystem PVC clone to a bigger size
* Block PVC clone to a bigger size
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add e2e tests for the cases below.
Normal PVC snapshot restore to a bigger size PVC (without encryption):
* Filesystem PVC restore to a bigger size
* Block PVC restore to a bigger size
Encrypted PVC snapshot restore to a bigger size PVC:
* Filesystem PVC restore to a bigger size
* Block PVC restore to a bigger size
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit consolidates the existing migration e2e tests into a couple
of tests that cover the scenarios. The separate filesystem and block
tests have been shrunk into a single one, and a couple of helper
functions have been introduced to set up and tear down the
migration-specific secret, configmap and sc. The static PV function has
been renamed to a more general name while the tests were adjusted.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This `unparam` linter escape is no longer needed and CI fails if it is
kept. This commit removes it to make CI happy.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Add an e2e testcase to validate that creating a PVC and attaching it to
a Pod works with new image features like fast-diff, object-map,
exclusive-lock and layering.
fixes: #2695
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Currently, the generic ephemeral tests are skipped if the Kubernetes
version is lower than 1.21. Because of this, the whole test suite is
skipped and the e2e run is marked as successful within 2 minutes. This
commit runs the ephemeral tests only if the Kubernetes version is 1.21
or higher, so that the other tests still run on lower versions.
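
A sketch of a per-test version gate, as opposed to skipping the whole
suite. The helper names (`versionGreaterEquals`, `selectTests`), the
`testCase` type and the test names are hypothetical:

```go
package main

import "fmt"

// versionGreaterEquals reports whether major.minor is at least
// wantMajor.wantMinor.
func versionGreaterEquals(major, minor, wantMajor, wantMinor int) bool {
	if major != wantMajor {
		return major > wantMajor
	}
	return minor >= wantMinor
}

// testCase names a test and its minimum Kubernetes version. The zero
// values mean that there is no minimum.
type testCase struct {
	name     string
	minMajor int
	minMinor int
}

// suite is an illustrative list of tests, not the real e2e suite.
var suite = []testCase{
	{name: "create PVC"},
	{name: "generic ephemeral volume", minMajor: 1, minMinor: 21},
}

// selectTests returns the names of the tests that can run on the given
// cluster version. Tests with a higher minimum are left out one by one.
func selectTests(tests []testCase, major, minor int) []string {
	var selected []string
	for _, tc := range tests {
		if versionGreaterEquals(major, minor, tc.minMajor, tc.minMinor) {
			selected = append(selected, tc.name)
		}
	}
	return selected
}

func main() {
	fmt.Println(selectTests(suite, 1, 20))
	fmt.Println(selectTests(suite, 1, 21))
}
```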
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The e2e tests sometimes fail to get objects like PVCs from the
Kubernetes API server, and log the following error:
Error getting pvc "rbd-6940" in namespace "rbd-694": rpc error: code = Unknown desc = OK: HTTP status code 200; transport: missing content-type field
By checking the error message, and initiating a retry on this failure,
CI jobs should fail less regularly.
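
The message check could be sketched like this. The helper names and the
list of substrings are assumptions for illustration; the two messages
are taken from errors quoted in this log:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// retryableMessages lists error fragments that are considered temporary.
var retryableMessages = []string{
	"transport: missing content-type field",
	"etcdserver: request timed out",
}

// isRetryableMessage reports whether msg contains a known temporary failure.
func isRetryableMessage(msg string) bool {
	for _, fragment := range retryableMessages {
		if strings.Contains(msg, fragment) {
			return true
		}
	}
	return false
}

// isRetryableAPIError is the error-typed wrapper a retry loop would call.
func isRetryableAPIError(err error) bool {
	return err != nil && isRetryableMessage(err.Error())
}

func main() {
	err := errors.New("rpc error: code = Unknown desc = OK: HTTP status code 200; " +
		"transport: missing content-type field")
	fmt.Println(isRetryableAPIError(err))
	fmt.Println(isRetryableAPIError(errors.New("permission denied")))
}
```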
Signed-off-by: Niels de Vos <ndevos@redhat.com>