ci: only retry "kubectl create" if objects are missing

There can be spurious failures in the CI when running kubectl create. On
occasion, the command returns with an error, but the api-server did
receive and process the request. This causes a 2nd create action to fail
with messages like:

    cephcluster.ceph.rook.io/my-cluster created
    Error from server: error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": etcdserver: request timed out
    Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": configmaps "rook-config-override" already exists
    Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": cephclusters.ceph.rook.io "my-cluster" already exists
    Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": configmaps "rook-config-override" already exists
    Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": cephclusters.ceph.rook.io "my-cluster" already exists
    Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": configmaps "rook-config-override" already exists
    Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": cephclusters.ceph.rook.io "my-cluster" already exists
    Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": configmaps "rook-config-override" already exists
    Error from server (AlreadyExists): error when creating "/tmp/tmp.Ur1ZPG85o9/cluster-test.yaml": cephclusters.ceph.rook.io "my-cluster" already exists

By handling the create action differently, and checking for the
AlreadyExists word in the stderr output, it is possible to detect
repeated creates that are not needed.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit is contained in:
Niels de Vos 2020-08-11 11:13:29 +02:00 committed by mergify[bot]
parent f11486f4b6
commit c0fbaf4276

View File

@ -13,18 +13,43 @@ rook_version() {
}
kubectl_retry() {
local retries=0
local retries=0 action="${1}" ret=0 stdout stderr
shift
while ! kubectl "${@}"
# temporary files for kubectl output
stdout=$(mktemp rook-kubectl-stdout.XXXXXXXX)
stderr=$(mktemp rook-kubectl-stderr.XXXXXXXX)
while ! kubectl "${action}" "${@}" 2>"${stderr}" 1>"${stdout}"
do
# in case of a failure when running "create", ignore errors with "AlreadyExists"
if [ "${action}" == 'create' ]
then
# count lines in stderr that do not have "AlreadyExists"
ret=$(grep -cvw 'AlreadyExists' "${stderr}")
if [ "${ret}" -eq 0 ]
then
# Succes! stderr is empty after removing all "AlreadyExists" lines.
break
fi
fi
retries=$((retries+1))
if [ ${retries} -eq ${KUBECTL_RETRY} ]
then
return 1
ret=1
break
fi
sleep ${KUBECTL_RETRY_DELAY}
done
return 0
# write output so that calling functions can consume it
cat "${stdout}" > /dev/stdout
cat "${stderr}" > /dev/stderr
rm -f "${stdout}" "${stderr}"
return ${ret}
}
function deploy_rook() {