Log a few commands that help with troubleshooting Rook deployment
issues. This might need to be extended with more commands.
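A minimal sketch of the kind of commands that could be logged (the
exact command list and namespace are assumptions):
```
kubectl -n rook-ceph get pods -o wide
kubectl -n rook-ceph describe pods
kubectl -n rook-ceph logs -l app=rook-ceph-operator --tail=20
```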
Updates: #1636
Signed-off-by: Niels de Vos <ndevos@redhat.com>
An rbd image can have a maximum number of snapshots, defined by
maxsnapshotsonimage. Once the limit is reached, cephcsi starts
flattening the older snapshots and returns an ABORT error message; the
requests that come after this have to wait until all the images are
flattened, which increases the PVC creation time. Instead of waiting
until the maximum number of snapshots on an RBD image is reached, we
can have a soft limit (minsnapshotsonimage); once the soft limit is
reached, cephcsi starts a flattening task to break the chain. With
this, PVC creation time is only affected when the hard limit
(maxsnapshotsonimage) is reached.
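A sketch of how the two limits could be passed to the rbdplugin (the
default values shown are assumptions):
```
cephcsi --type=rbd \
    --minsnapshotsonimage=250 \
    --maxsnapshotsonimage=450
```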
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The GitHub style for Pull Request and Issue templates adds HTML tags
for some advanced usage. The Markdown linter should not give warnings
when these are used.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The StorageClasses that get deployed for the Kubernetes e2e external
storage tests reference a ConfigMap that contains the connection details
for the Ceph cluster. Without this ConfigMap, Ceph-CSI will not function
correctly.
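A sketch of creating that ConfigMap before running the tests (the
name, clusterID and monitor address below are placeholders):
```
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: ceph-csi-config
data:
  config.json: |-
    [{"clusterID": "rook-ceph", "monitors": ["192.168.39.1:6789"]}]
EOF
```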
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Currently the scripts/install-snapshot.sh script needs to be called
depending on the Kubernetes version. It would be much easier to use the
script if it were intelligent enough to decide itself whether the k8s
snapshot controller needs to be installed or not.
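A minimal sketch of such a check (the version boundary and the script
argument are assumptions):
```
minor=$(kubectl version --short | sed -n 's/Server Version: v1\.\([0-9]*\).*/\1/p')
if [ "${minor}" -ge 17 ]; then
    ./scripts/install-snapshot.sh install
fi
```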
Fixes: #1139
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Just like deploy-rook and teardown-rook, this patch adds
install-snapshotter and cleanup-snapshotter options to the minikube
script.
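Usage then looks like this (the exact option names are assumptions):
```
./scripts/minikube.sh install-snapshotter
./scripts/minikube.sh cleanup-snapshotter
```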
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
When the replication count of the provisioner is >1, the added
anti-affinity rules will prevent the provisioner pods from being
scheduled on the same node. The Kubernetes scheduler will spread the
pods across nodes to improve availability during node failures.
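A sketch of the kind of rule this adds, written as a patch instead of
the actual deployment template change (deployment name and label are
assumptions):
```
kubectl patch deployment csi-rbdplugin-provisioner -p "$(cat <<'EOF'
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: csi-rbdplugin-provisioner
            topologyKey: kubernetes.io/hostname
EOF
)"
```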
Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
Instead of using the Docker command to push the image to the minikube VM,
read the image from stdin over ssh and load it with the Docker command
that is available inside the VM.
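A sketch of the pipeline (the image name is a placeholder):
```
docker image save quay.io/cephcsi/cephcsi:canary | \
    ssh -o StrictHostKeyChecking=no -i "$(minikube ssh-key)" \
        docker@"$(minikube ip)" docker image load
```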
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Allow passing:
$ CONTAINER_CMD="sudo docker" ./scripts/minikube.sh cephcsi
or
$ CONTAINER_CMD="sudo podman" ./scripts/minikube.sh cephcsi
This is because the container images may be listed by 'sudo docker
images' or 'sudo podman images' in case the Makefile target
image-cephcsi was run with sudo.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Add a way to supply a local CONTAINER_CMD option of choice to
minikube.sh via an environment variable.
Note: we still use the docker daemon environment on the minikube box;
in the future we can switch to the podman service environment
('minikube podman-env') if needed.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Problem:
-------
$ minikube version
minikube version: v1.12.2
commit: be7c19d391302656d27f1f213657d925c4e1cfc2-dirty
$ ./scripts/minikube.sh up
installed minikube version v1.12.2 is not matching requested version latest
Here v1.12.2 is the latest version of minikube, but the script simply bails out.
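A sketch of resolving "latest" to a real release tag before the
comparison (the variable name is an assumption):
```
if [ "${MINIKUBE_VERSION}" = 'latest' ]; then
    MINIKUBE_VERSION=$(curl -fsS \
        https://api.github.com/repos/kubernetes/minikube/releases/latest | \
        sed -n 's/.*"tag_name": *"\([^"]*\)".*/\1/p')
fi
```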
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
In this script we define all the functions at the top and then start
with the executable commands (the entry point of the script). Only this
function is defined in the middle of the execution sequence, unlike the
rest, which takes away from the readability.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
We cannot depend on the master branch of external-snapshotter in
cephcsi, as the master branch can change at any time. It is better to
pin our E2E to released tags.
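A sketch of pinning the checkout (the tag shown is only an example):
```
git clone --depth=1 --branch=v2.1.1 \
    https://github.com/kubernetes-csi/external-snapshotter.git
```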
Fixes: #1416
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
There can be spurious failures in the CI when running kubectl create. On
occasion, the command returns with an error, but the api-server did
receive and process the request. This causes a 2nd create action to fail
with messages like:
cephcluster.ceph.rook.io/my-cluster created
Error from server: error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": etcdserver: request timed out
Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": configmaps "rook-config-override" already exists
Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": cephclusters.ceph.rook.io "my-cluster" already exists
(the two AlreadyExists errors repeat on every retry)
By handling the create action differently and checking for the word
AlreadyExists in the stderr output, it is possible to detect repeated
creates that are not needed.
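A minimal sketch of that handling (the filename is a placeholder):
```
stderr=$(kubectl create -f "${TEMP_DIR}/cluster-test.yaml" 2>&1 >/dev/null)
ret=$?
if [ ${ret} -ne 0 ] && ! grep -q 'AlreadyExists' <<< "${stderr}"; then
    echo "${stderr}"
    exit ${ret}
fi
```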
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add retries to prevent the CI from failing instantly. Now, the command
execution will retry up to 5 times, to avoid failures in some runs.
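A sketch of the retry wrapper (the function name and delay are
assumptions):
```
kubectl_retry() {
    local try
    for try in 1 2 3 4 5; do
        kubectl "$@" && return 0
        sleep 5
    done
    return 1
}
```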
Signed-off-by: Yug <yuggupta27@gmail.com>
By default the install-helm.sh script uses "latest" as version for Helm.
Unfortunately this version does not exist. The HELM_VERSION variable is
already set in build.env, so source the configuration file as one of the
first actions in install-helm.sh.
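A sketch of the sourcing (the relative path is an assumption):
```
# pick up HELM_VERSION and friends from the top-level configuration
source "$(dirname "${0}")/../build.env"
echo "installing Helm ${HELM_VERSION}"
```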
Signed-off-by: Niels de Vos <ndevos@redhat.com>
By default minikube uses 2 CPUs, which might be too little for some of
the tests. When not passing a CPUS environment variable, use all CPUs
available on the system (detected with 'nproc').
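A sketch of the fallback:
```
CPUS=${CPUS:-$(nproc)}
minikube start --cpus="${CPUS}"
```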
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This keeps the standard arguments for e2e testing in a single location
instead of spread over multiple files and CI jobs.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In test environments the default pool size is set to 1, so there is no
redundancy. This causes recent Ceph versions to complain with
HEALTH_WARN as POOL_NO_REDUNDANCY gets set.
By disabling the mon_warn_on_pool_no_redundancy option in ceph.conf, the
warning is not reported and the cluster is marked HEALTHY.
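With Rook, one way to get the option into ceph.conf is the
rook-config-override ConfigMap (the namespace is an assumption):
```
kubectl -n rook-ceph apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
data:
  config: |
    [global]
    mon_warn_on_pool_no_redundancy = false
EOF
```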
See-also: rook/rook#5925
Signed-off-by: Niels de Vos <ndevos@redhat.com>
minikube has /sbin/losetup from Busybox, and that does not work with
raw-block PVCs. Use the losetup executable from the host in the VM
instead.
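A sketch of copying the host executable into the VM (the paths and the
copy mechanism are assumptions):
```
scp -o StrictHostKeyChecking=no -i "$(minikube ssh-key)" \
    /sbin/losetup docker@"$(minikube ip)":losetup
minikube ssh 'sudo mv losetup /sbin/losetup'
```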
See-also: kubernetes/minikube#8284
Signed-off-by: Niels de Vos <ndevos@redhat.com>
While testing with the default 3000 MB RAM in the minikube VM, creating
an encrypted RBD volume fails because 'cryptsetup' gets killed:
[ 766.072585] Out of memory: Kill process 18497 (cryptsetup) score 1182 or sacrifice child
[ 766.072589] Killed process 18497 (cryptsetup) total-vm:863136kB, anon-rss:510336kB, file-rss:10788kB, shmem-rss:0kB
[ 766.072688] oom_reaper: reaped process 18497 (cryptsetup), now anon-rss:510336kB, file-rss:10780kB, shmem-rss:0kB
Using 4 GB RAM should prevent this from occurring.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case kubectl did not get installed (VM_DRIVER != none),
scripts/minikube.sh can fail when kubectl is not in the path. By running
the "kubectl cluster-info" command through minikube, the script will
succeed.
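That is, instead of calling kubectl directly:
```
minikube kubectl -- cluster-info
```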
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case there is a minikube executable in the $PATH already, use that
for all commands. If there is none, install_minikube() will place a
newly downloaded executable in /usr/local/bin which will be used by the
full pathname, so that commands as root without /usr/local/bin in the
$PATH will work.
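A minimal sketch of the selection (the variable name is an assumption):
```
minikube=$(type -P minikube || echo '/usr/local/bin/minikube')
"${minikube}" status
```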
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The command fails when PWD=/. It is unclear what the command tries to
achieve. The next command does something more useful, although it could
perhaps be removed as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When starting minikube as root with --driver=kvm2, minikube complains
that this is not the right thing to do. However, in the CentOS CI we
really want to run as root, as that makes the scripts simpler.
Add the --force option while starting, so that minikube does not abort
anymore.
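The start invocation then becomes:
```
minikube start --force --driver=kvm2
```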
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Storage providers and the default storage class are not needed for
Ceph-CSI testing. In order to reduce resources and potential conflicts
between storage plugins, disable them.
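One way to disable them, using the addon names minikube knows:
```
minikube addons disable storage-provisioner
minikube addons disable default-storageclass
```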
Signed-off-by: Niels de Vos <ndevos@redhat.com>
As part of https://github.com/ceph/ceph-csi/pull/1237/ patching was
enabled for the deployed Ceph cluster. However, due to an error in the
version-fetching logic, the patching was not applied.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
The goerr113 linter checks error-handling expressions. In multiple
places it suggests using wrapped static errors instead of dynamic
ones.
Disabled the linter for now to avoid regressions; this needs to be
handled in a separate issue.
Tracker Issue: #1227
Signed-off-by: Yug <yuggupta27@gmail.com>
The `nestif` linter reports deeply nested if statements.
Disabled `nestif` for now to avoid regressions; this needs to be
addressed separately.
Tracking Issue: #1229
Signed-off-by: Yug <yuggupta27@gmail.com>
The 'testpackage' linter recommends "black box" testing, which
prevents internal/non-exported functions from being tested. We have
tests that *do* test non-exported functions and types.
Disabling the linter allows us to test non-exported types and functions.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The Rook version is currently 1.1.7 in our e2e deployment, which brings
up a 14.2.4 Ceph cluster. To support the CephFS snapshot e2e, we need a
more recent version of the Ceph cluster in E2E. Rook 1.2.7 is good
enough; with patching it brings up a Ceph 14.2.10 cluster.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
With minikube versions greater than 1.6.2 and less than 1.11.1, YAML
files placed in the minikube files path will not be automatically
applied to the cluster, so we will get errors during bootstrap of the
cluster if the admission controller is enabled.
To use Pod Security Policies with these versions of minikube, first
start a cluster without the `PodSecurityPolicy` admission controller
enabled. Next, apply the PSP YAML, then stop the cluster and restart
it with the admission controller enabled:
```
minikube start
kubectl apply -f /path/to/psp.yaml
minikube stop
minikube start --extra-config=apiserver.enable-admission-plugins=PodSecurityPolicy
```
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>