It seems that `/var/log/rook` inside the VM does not contain any files.
Getting the logs from the Pods through kubectl may not be as reliable,
but it should return some logs when minikube is still (or again) available.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
More details of the Rook (and Ceph) deployment should be useful when
troubleshooting CI failures. This now includes the status of the most
important Kubernetes objects, and all the logs Ceph stores on the host.
Updates: #1969
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Without the `-w` argument, the output of `top` gets truncated, and the
commandline of the processes is not complete. It would be useful to be
able to tell which command uses 100% CPU in an output like:
17377 root 20 0 110.8m 8.2m 0.0 0.1 0:00.89 S `- containerd+
17414 167 20 0 1036.7m 59.6m 0.0 0.4 0:03.47 S `- ceph-o+
40875 root 20 0 283.9m 30.4m 100.0 0.2 0:00.23 R `- ceph
Updates: #1969
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It seems that it is required to re-throw the error after a catch{..}
block. Without this, and a successful execution of system-status.sh, the
CI jobs get marked as SUCCESS, even when there was a failure.
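The pattern can be sketched in Python (the CI job itself is a Groovy
pipeline; `run_job` and `collect_status` are hypothetical names used only
for illustration, with `collect_status` standing in for system-status.sh):

```python
def run_job(job, collect_status):
    """Run job(); on failure collect diagnostics, then re-raise."""
    try:
        job()
    except Exception:
        # Gather diagnostics, the role system-status.sh plays in the CI job.
        collect_status()
        # Re-raise so the caller still sees the failure. Swallowing the
        # exception here would let the run finish as if it had succeeded.
        raise
```

Without the final `raise`, a successful `collect_status()` would be the last
thing to happen and the failure would be lost.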
Fixes: e36155283 "ci: run system-status.sh in case a job fails"
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Without the script on the node, it can not be executed...
Fixes: e36155283 "ci: run system-status.sh in case a job fails"
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The new `system-status.sh` script logs the status of the host and the
minikube VM. This gets executed when a CI job fails, and should aid in
troubleshooting spurious failures.
Updates: #1969
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The e2e tests very regularly hit a timeout where the Kubernetes API
becomes unreachable for 3 minutes. Hopefully it helps when more RAM is
available to the VM.
Updates: #1969
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The Kubernetes e2e external storage tests from v1.21 do not work yet
with Ceph-CSI. In order to address the issues, the job is now provided
and can be run with:
/test ci/centos/k8s-e2e-external-storage/1.21
The job for v1.20 is enabled by default, and identified by the
ci/centos/k8s-e2e-external-storage/1.20 context in PRs.
Updates: #2017
Signed-off-by: Niels de Vos <ndevos@redhat.com>
k8s-e2e-external-storage fails with error
`./podman2minikube.sh: line 16: minikube: command not found`.
This commit fixes it by starting minikube before calling
./podman2minikube.sh.
Signed-off-by: Rakshith R <rar@redhat.com>
Add code to pre-pull the container images required
to run the k8s-e2e-external-storage E2E tests.
fixes: #2023
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
By default a version of Helm is used that does not want to get
installed. Using the same version as the devel branch makes the testing
work again.
See-also: helm/helm#9617
Signed-off-by: Niels de Vos <ndevos@redhat.com>
To test a PR with Kubernetes 1.21, leave a comment in the PR like:
/test ci/centos/mini-e2e-helm/k8s-1.21
The status of the job will be recorded in the PR, but running this job
is not required (yet).
Updates: #1963
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case a job has been started without a PR (manual, or timed), the
currently checked out branch matches the original, as there are no
additional changes in the tree. There is no need to abort the jobs when
the skip-doc-change.sh script did not detect any non-doc changes, as
there are no changes at all.
Updates: #1963
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When tests are started manually (through the Jenkins webui), there is no
PR associated with the job. That means the `git_since` and `ref` are
equal. Trying to create a new branch named `ref` will not work, as the
branch was already created when cloning the repository with `git_since`.
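A minimal sketch of the check, in Python for illustration only.
`checkout_commands` is a hypothetical helper, and the `git checkout -b`
command is an assumption about how the branch would be created, not the
command used by the job:

```python
def checkout_commands(ref, git_since):
    """Return the git commands needed to prepare the branch for a job."""
    if ref == git_since:
        # Manual or timed run without a PR: cloning with git_since has
        # already created the branch, so creating it again would fail.
        return []
    # PR run: a separate branch for the changes under test is needed.
    return [["git", "checkout", "-b", ref]]
```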
With this change, Jenkins jobs can be started manually. This makes it
possible to run regular/nightly jobs as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit enables pre-pulling of ROOK_CEPH_CLUSTER_IMAGE, similar to
mini-e2e and mini-e2e-helm, to avoid hitting the Docker Hub pull rate limit.
Signed-off-by: Rakshith R <rar@redhat.com>
After the introduction of ROOK_CEPH_CLUSTER_IMAGE in build.env, the
additional image needs to get pulled from the CI registry mirror and
pushed into the minikube VM.
Without this addition, the Docker Hub pull limits may prevent deploying
Rook.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the container image needs to be rebuilt, two parallel jobs will
attempt to do so. With recent versions of Podman, this now fails.
When the image needs to be rebuilt, do so in the stage where it would
otherwise get pulled. This makes sure the image gets built only once.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There are timeouts happening where the logs do not show sufficient
output to diagnose the issue. These timeouts suggest that something
inside the minikube VM is not running as expected. Increasing the RAM to
12GB might help.
The bare-metal systems in the CentOS CI have a minimum of 16GB, so
running a single VM with 12GB should be possible.
See-also: https://wiki.centos.org/QaWiki/PubHardware
Updates: #1867
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Kubernetes v1.20 has been released, so let's use that for testing. Note
that v1.18 is still maintained, but our CI jobs will not consume it
anymore.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
New Kubernetes versions are now prefixed with "Kubernetes", like:
$ ./scripts/get_patch_release.py
Kubernetes v1.18.13
Kubernetes v1.17.15
Kubernetes v1.19.5
Kubernetes v1.20.0
Kubernetes v1.20.0-rc.0
v1.20.0-beta.2
v1.18.12
v1.19.4
v1.17.14
v1.20.0-beta.1
v1.20.0-beta.0
v1.20.0-alpha.3
v1.18.10
v1.17.13
The new "Kubernetes" prefix prevents the current logic from matching the
version. By splitting the returned version string on words, and
returning the last component in get_releases(), the script works as
intended again.
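The idea, as a minimal sketch (`version_from_release_name` is a
hypothetical helper name, not the code in get_releases()):

```python
def version_from_release_name(name):
    """Return the version component of a release name.

    Handles both the new "Kubernetes v1.20.0" style and the older
    bare "v1.18.12" style by taking the last whitespace-separated word.
    """
    words = name.split()
    return words[-1] if words else ""
```

Taking the last word instead of removing a fixed prefix keeps the older,
unprefixed release names working without a special case.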
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add CI jobs for Kubernetes v1.20 testing. These jobs will run, but are
not (yet) required before changes get merged.
Updates: #1784
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When podman2minikube is called, the output to stdout is lost. This makes
debugging issues difficult. Log the output, so that the name of the
image that is pushed into minikube can be verified.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The CI scripts pull all container images from the local CI registry. If
the image name starts with "docker.io/", the images will be pushed into
the test environment as "docker.io/docker.io/ceph/ceph:v15". This image
will not be used by the tests, so things can still fail in case Docker
Hub has reached the pull rate-limit.
By dropping the additional "docker.io/" from the BASE_IMAGE name, the
image gets pushed as "docker.io/ceph/ceph:v15" so the tests will use it
automatically.
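The actual change uses a Groovy regex (see the link below). The same
prefix removal can be sketched in Python, where `strip_registry_prefix`
is a hypothetical name:

```python
def strip_registry_prefix(image, prefix="docker.io/"):
    """Drop a leading registry prefix from an image name, if present."""
    if image.startswith(prefix):
        return image[len(prefix):]
    return image


# The name is qualified again when pushing, so the prefix appears once.
base_image = strip_registry_prefix("docker.io/ceph/ceph:v15")
pushed_as = "docker.io/" + base_image
```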
Groovy-syntax: https://www.baeldung.com/groovy-remove-string-prefix#using-regex
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Instead of using a mirror, the CI registry is now populated with
container images that get pulled and tagged as if they came from Docker
Hub or other locations.
This is more of a manual mirror, as the Docker Registry mirror
functionality is not flexible enough for our use case (push images
without providing them on docker.io).
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The mirror option of the Docker Registry container is very limited and
prevents updating or manually pushing images to the registry. Instead,
it tries to push the images to docker.io, which is not what we need.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
docker.io/nginx:latest and docker.io/vault:latest are being redirected
to docker.io/library/. The redirection is not cached, and Docker Hub
might return an error during redirection when the pull rate-limit is
hit.
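One way to avoid the redirect is to reference the library/ namespace
directly. A sketch of that rewrite, assuming that only single-component
names on docker.io are affected (`qualify_official_image` is a
hypothetical helper):

```python
def qualify_official_image(image):
    """Insert "library/" for docker.io images that have no namespace."""
    registry, _, rest = image.partition("/")
    if registry == "docker.io" and rest and "/" not in rest:
        return "docker.io/library/" + rest
    # Already namespaced, or hosted on another registry: leave untouched.
    return image
```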
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Deploying Ceph with Rook fails as the ceph/ceph:v15 base image can not
be pulled from within the minikube VM. By pushing the image into the VM,
but before deploying Rook, there should be no need to pull the image
from Docker Hub anymore.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Changes in scripts will affect the CI jobs as the scripts are used while
deploying minikube, Rook and other components. These changes need
testing, just like anything else.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Some versions of minikube/docker add a "localhost/" prefix to imported
images. In that case, the image needs to get tagged without the prefix
as well.
When running podman2minikube.sh, the docker process inside the minikube
VM sometimes responds with:
# ./podman2minikube.sh rook/ceph:v1.3.9
Loaded image: localhost/rook/ceph:v1.3.9
When the "localhost/" prefix is added to the image name, deploying Rook
will try to pull the rook/ceph:v1.3.9 image again. This can fail when
the Docker Hub pull rate-limit is hit.
Without the "localhost/" prefix, there should be no further attempt to
pull the image, as it should be detected that the image is available.
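A sketch of the detection, based only on the "Loaded image:" output shown
above (`extra_tag_for` is a hypothetical name; podman2minikube.sh itself
is a shell script):

```python
LOADED_PREFIX = "Loaded image: "


def extra_tag_for(loaded_line):
    """Return the un-prefixed tag to add, or None if no retag is needed."""
    image = loaded_line.strip()
    if image.startswith(LOADED_PREFIX):
        image = image[len(LOADED_PREFIX):]
    if image.startswith("localhost/"):
        # The image was imported as localhost/<name>; it also needs to be
        # tagged as <name> so that it is found without pulling again.
        return image[len("localhost/"):]
    return None
```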
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Unqualified container images are currently used for CI jobs. In the
future this is expected to change. By preparing the cache/mirror and
images in minikube with the qualified tags, transition to qualified
image names should become easier.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case ROOK_VERSION is set in build.env, use the version from there.
Otherwise fall back to version 1.3.9.
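The fallback can be sketched as follows, assuming build.env contains
plain KEY=VALUE lines (`rook_version` is a hypothetical helper, and the
exact format of the version string in build.env is an assumption):

```python
def rook_version(build_env_text, default="1.3.9"):
    """Return ROOK_VERSION from build.env contents, or the default."""
    for line in build_env_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional quoting around the value.
        value = value.strip().strip("\"'")
        if key.strip() == "ROOK_VERSION" and value:
            return value
    # ROOK_VERSION is missing or empty: use the previous hard-coded version.
    return default
```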
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This makes it possible to pull images from Docker Hub through the local
container image registry in the CI OpenShift deployment. The registry in
the CI is configured with the 'cephcsibot' account so that pulling
images is accounted towards the account, and not anonymous consumers
within the whole CentOS CI.
There should be no need to manually sync the images between the local
registry and Docker Hub anymore.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Functions in Groovy can not access a variable declared with
`def ci_registry`, as it is not in their scope. Pass the registry to the
podman_login() and podman_pull() functions instead.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
A typo when calling podman_log() causes CI jobs to fail.
Fixes: 1eec379 "ci: pre-pull Ceph base-image and cephcsi:devel for mini-e2e-helm jobs"
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The same changes were made for the mini-e2e jobs yesterday, and
those seem to work well. Use the same pre-pull method for the Helm
deployment.
Signed-off-by: Niels de Vos <ndevos@redhat.com>