This commit disables the mon, mgr and mds liveness probes,
which, when failing, caused a `CrashLoopBackOff` state.
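A minimal sketch of the effect (namespace and deployment names are assumed,
and the actual change may instead use Rook's own settings to disable the
probes):

# remove the livenessProbe from the mon, mgr and mds deployments
for d in $(kubectl -n rook-ceph get deploy -o name |
           grep -E 'rook-ceph-(mon|mgr|mds)'); do
    kubectl -n rook-ceph patch "${d}" --type json \
        -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"}]'
done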
Updates: #2094
Signed-off-by: Rakshith R <rar@redhat.com>
(cherry picked from commit a4e4750fdc)
Changes:
1. Add a variable in build.env for the Rook Ceph cluster version.
2. Modify rook.sh so that it can deploy a Ceph cluster with a desired
   version, rather than only the one that Rook installs by default
   (see the sketch after this list).
3. Remove code which is no longer required:
   a. Code which was added to test the snapshot feature.
   b. Code which was only needed while
      https://github.com/rook/rook/pull/5925 was not yet merged.
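A minimal sketch of the version override (the variable name and the
sed-based override are illustrative; the actual implementation in rook.sh
may differ):

# build.env: pin the Ceph version deployed in the Rook cluster (assumed name)
ROOK_CEPH_CLUSTER_VERSION=v14.2.10

# rook.sh: override the image in the CephCluster manifest before creating it
if [ -n "${ROOK_CEPH_CLUSTER_VERSION}" ]; then
    sed -i "s|image: ceph/ceph:.*|image: ceph/ceph:${ROOK_CEPH_CLUSTER_VERSION}|g" \
        cluster-test.yaml
fi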
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
While deploying Rook, there can be issues when the environment is not
completely settled yet. On occasion the 1st kubectl command fails with
The connection to the server ... was refused - did you specify the right host or port?
This sets the 'ret' variable to a non-zero value before the next retry of
the kubectl command is done. If the retried kubectl command then succeeds,
the 'ret' variable still contains the old non-zero value, and kubectl_retry
returns an incorrect result.
By setting the 'ret' variable to 0 before calling kubectl again, this
problem is prevented.
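A minimal sketch of the retry loop with the reset in place (retry count and
delay are illustrative; the real kubectl_retry does more):

kubectl_retry() {
    ret=0
    retries=0
    until kubectl "${@}"; do
        ret=${?}
        retries=$((retries + 1))
        if [ ${retries} -ge 5 ]; then
            break
        fi
        sleep 10
        # reset 'ret' so a kubectl call that succeeds on the next iteration
        # does not leave the old non-zero value behind
        ret=0
    done
    return ${ret}
}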
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It seems that the new log_errors() function does not get triggered when
the script hits `exit 1` conditions in functions. The functions should
return a non-0 value, not cause an exit of the script.
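A minimal sketch of the intended pattern (the ERR trap wiring is an
assumption; kubectl_retry and ROOK_URL are the helper and variable used
elsewhere in these scripts):

set -E          # let functions inherit the ERR trap
log_errors() {
    # commands that help troubleshooting go here
    kubectl -n rook-ceph get pods
}
trap log_errors ERR

deploy_rook() {
    # 'exit 1' here would terminate the script directly and bypass the
    # handler; returning non-zero lets the failing call trigger it instead
    kubectl_retry create -f "${ROOK_URL}/common.yaml" || return 1
}

deploy_rook     # a non-zero return of the function fires log_errors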
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Log a few commands that help with troubleshooting Rook deployment issues.
This may need to be extended with more commands.
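A sketch of the kind of commands worth logging (namespace and deployment
names are assumed):

kubectl -n rook-ceph get pods -o wide
kubectl -n rook-ceph describe pods
kubectl -n rook-ceph get events --sort-by=.metadata.creationTimestamp
kubectl -n rook-ceph logs deploy/rook-ceph-operator --tail=50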
Updates: #1636
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There can be spurious failures in the CI when running kubectl create. On
occasion, the command returns with an error, but the api-server did
receive and process the request. This causes a 2nd create action to fail
with messages like:
cephcluster.ceph.rook.io/my-cluster created
Error from server: error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": etcdserver: request timed out
Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": configmaps "rook-config-override" already exists
Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": cephclusters.ceph.rook.io "my-cluster" already exists
Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": configmaps "rook-config-override" already exists
Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": cephclusters.ceph.rook.io "my-cluster" already exists
Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": configmaps "rook-config-override" already exists
Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": cephclusters.ceph.rook.io "my-cluster" already exists
Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": configmaps "rook-config-override" already exists
Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": cephclusters.ceph.rook.io "my-cluster" already exists
By handling the create action separately, and checking for the word
AlreadyExists in the stderr output, it is possible to detect repeated
creates that are not needed.
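A minimal sketch of the check (written as a standalone wrapper for
illustration; the actual change extends the existing kubectl_retry loop):

kubectl_create() {
    stderr=$(mktemp)
    if ! kubectl create "${@}" 2> "${stderr}"; then
        # if any error line is not an AlreadyExists, it is a real failure;
        # otherwise an earlier attempt was already processed by the api-server
        if [ "$(grep -cvw 'AlreadyExists' "${stderr}")" -ne 0 ]; then
            cat "${stderr}" >&2
            rm -f "${stderr}"
            return 1
        fi
    fi
    rm -f "${stderr}"
    return 0
}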
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add retries so that the CI does not fail instantly. The command execution
will now retry up to 5 times, to avoid spurious failures in some runs.
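A usage sketch: plain kubectl calls in the deployment steps get wrapped
(file and variable names are illustrative):

# before
kubectl create -f "${ROOK_URL}/common.yaml"

# after: retried up to 5 times before giving up
kubectl_retry create -f "${ROOK_URL}/common.yaml"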
Signed-off-by: Yug <yuggupta27@gmail.com>
In test environments the default pool size is set to 1, so there is no
redundancy. This causes recent Ceph versions to complain with
HEALTH_WARN as the POOL_NO_REDUNDANCY flag gets set.
By disabling the mon_warn_on_pool_no_redundancy option in ceph.conf, the
warning is not reported and the cluster is marked HEALTHY.
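A minimal sketch of the override, using the standard rook-config-override
ConfigMap (namespace assumed to be rook-ceph):

cat <<EOF | kubectl -n rook-ceph apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
data:
  config: |
    [global]
    mon_warn_on_pool_no_redundancy = false
EOF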
See-also: rook/rook#5925
Signed-off-by: Niels de Vos <ndevos@redhat.com>
As part of https://github.com/ceph/ceph-csi/pull/1237/ patching was enabled
for the deployed Ceph cluster; however, due to an error in the
version-fetching logic, the patching was not applied.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
The Rook version in our e2e deployment is currently 1.1.7, which brings up a
Ceph 14.2.4 cluster. To support the CephFS snapshot e2e tests, we need a
newer Ceph cluster in e2e. Rook 1.2.7 is good enough: with patching, it
brings up a Ceph 14.2.10 cluster.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commit adds support for specifying the dataPool parameter for the
topology-constrained pools in the StorageClass, which can be leveraged to
use erasure-coded pool names for RBD data instead of the replica pools.
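A minimal sketch of such a StorageClass (cluster ID, pool names and zone
values are placeholders; the topologyConstrainedPools parameter is the point
here):

cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-topology-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: <cluster-id>
  topologyConstrainedPools: |
    [
      {
        "poolName": "zone1-replica-pool",
        "dataPool": "zone1-ec-pool",
        "domainSegments": [
          {"domainLabel": "zone", "value": "zone1"}
        ]
      }
    ]
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
EOF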
Signed-off-by: ShyamsundarR <srangana@redhat.com>
- This commit adds tests only for RBD, as CephFS still needs an enhancement
  in the CephFS subvolume commands to effectively use topology-based
  provisioning.
Signed-off-by: ShyamsundarR <srangana@redhat.com>
We have the e2e test with --deploy-rook=true, which sets up the whole test
environment. It works fine, but this does not seem to be the role of the
e2e test. In addition, when developing the code we either need to run the
full test scenario, deploying Rook every time, or build the Rook environment
by hand. Move the Rook deployment code to minikube.sh.
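A usage sketch of the resulting workflow (subcommand names and the test
invocation are illustrative; the --deploy-rook flag is the one mentioned
above):

# bring up the minikube cluster, then deploy Rook separately from the e2e run
./scripts/minikube.sh up
./scripts/minikube.sh deploy-rook

# run the e2e tests against the already-prepared environment
go test ./e2e -args --deploy-rook=false

# tear the Rook cluster down again when done
./scripts/minikube.sh teardown-rook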