Commit Graph

320 Commits

Author SHA1 Message Date
Niels de Vos
c40a055628 ci: log events from the rook-ceph namespace on failure
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-14 16:45:24 +00:00
Niels de Vos
0049638e64 ci: get logs from all pods in the rook-ceph namespace on failure
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-14 16:45:24 +00:00
Niels de Vos
7ebc8306ef ci: get the logs from the Ceph cluster pods
It seems that `/var/log/rook` inside the VM does not contain any files.
Getting the logs from the Pods through kubectl may not be as stable, but
it should get some logs when minikube is still/again available.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-14 13:06:17 +00:00
Niels de Vos
410db81215 ci: include status of the Rook deployment on failure
More details of the Rook (and Ceph) deployment should be useful when
troubleshooting CI failures. This now includes the status of the most
important Kubernetes objects, and all the logs Ceph stores on the host.

Updates: #1969
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-12 14:42:09 +00:00
Yug
86d5c17ba4 ci: use "-T" to display the filesystem type
Provide filesystem type information with logs

Signed-off-by: Yug <yuggupta27@gmail.com>
2021-05-12 09:36:53 +00:00
Niels de Vos
bae519db07 ci: use "top -w" for untruncated wide output
Without the `-w` argument, the output of `top` gets truncated, and the
command line of the processes is not complete. It would be useful to be
able to tell which command uses 100% CPU in an output like:

  17377 root      20   0  110.8m   8.2m   0.0   0.1   0:00.89 S  `- containerd+
  17414 167       20   0 1036.7m  59.6m   0.0   0.4   0:03.47 S      `- ceph-o+
  40875 root      20   0  283.9m  30.4m 100.0   0.2   0:00.23 R      `- ceph

Updates: #1969
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-12 05:12:09 +00:00
Niels de Vos
e599e95f25 ci: in case of a failure, return error after logging system status
The error needs to be re-thrown after a catch{..} block. Without this, a
successful execution of system-status.sh causes the CI job to be marked
as SUCCESS, even when there was a failure.

Fixes: e36155283 "ci: run system-status.sh in case a job fails"
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-11 13:13:39 +00:00
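The fix in e599e95f25 lives in the Groovy Jenkins pipeline. As a rough illustration of the same catch, log, re-throw pattern, here is a minimal Python sketch. `run_with_status_on_failure`, `job` and `log_status` are hypothetical names, with `log_status` standing in for running system-status.sh.

```python
from typing import Callable


def run_with_status_on_failure(job: Callable[[], None],
                               log_status: Callable[[], None]) -> None:
    """Run a CI job; if it fails, log the system status and re-raise."""
    try:
        job()
    except Exception:
        # Gather diagnostics, the role system-status.sh plays in the CI.
        log_status()
        # Re-raise the original error. Without this, the caller sees a
        # normal return and the job would be reported as SUCCESS.
        raise
```

Dropping the bare `raise` reproduces the bug: the exception is swallowed once `log_status()` succeeds, and the failure is never reported.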
Niels de Vos
5b03721a58 ci: copy system-status.sh script to the bare metal node
Without the script on the node, it cannot be executed...

Fixes: e36155283 "ci: run system-status.sh in case a job fails"
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-11 12:33:03 +00:00
Niels de Vos
e36155283b ci: run system-status.sh in case a job fails
The new `system-status.sh` script logs the status of the host and the
minikube VM. This gets executed when a CI job fails, and should aid in
troubleshooting spurious failures.

Updates: #1969
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-11 11:27:01 +00:00
Niels de Vos
4ef36aed0c ci: increase memory for minikube VM to 14GB
The e2e tests very regularly hit a timeout where the Kubernetes API
becomes unreachable for 3 minutes. Making more RAM available to the VM
will hopefully help.

Updates: #1969
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-07 02:04:35 +00:00
Niels de Vos
2fb24c3c4c ci: provide k8s-e2e-external-storage jobs for different k8s versions
The Kubernetes e2e external storage tests from v1.21 do not work yet
with Ceph-CSI. In order to address the issues, the job is now provided
and can be run with:

     /test ci/centos/k8s-e2e-external-storage/1.21

The job for v1.20 is enabled by default, and identified by the
ci/centos/k8s-e2e-external-storage/1.20 context in PRs.

Updates: #2017
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-03 07:30:50 +00:00
Rakshith R
ddd10c3245 ci: pull and push busybox image after installing minikube
k8s-e2e-external-storage fails with error
`./podman2minikube.sh: line 16: minikube: command not found`.
This commit fixes it by starting minikube before calling
./podman2minikube.sh.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-04-29 10:50:07 +00:00
Madhu Rajanna
1cc12b1a1c ci: pre-pull busybox container image
Pre-pull the required busybox container
image in the k8s-e2e-external-storage job.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-28 12:10:40 +00:00
Madhu Rajanna
c5ce8e1a95 ci: add missing ci_registry k8s-e2e-external-storage groovy
Add the missing ci_registry variable to the
k8s-e2e-external-storage Groovy file.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-27 13:02:51 +00:00
Madhu Rajanna
43dd2a20e6 ci: pre-pull the required container images
Add code to pre-pull the container images
required to run the k8s-e2e-external-storage E2E tests.

fixes: #2023

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-27 11:08:16 +00:00
Niels de Vos
918d5e0870 ci: enable running of k8s-e2e-external-storage job by default
The job seems stable, and can be run by default now. Once it has been
run on several PRs, the `ci/centos/k8s-e2e-external-storage` job ID
can be added to the Mergify configuration.

See-also: https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/organizations/jenkins/k8s-e2e-external-storage/activity
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-04-26 04:18:39 +00:00
Niels de Vos
955559a235 ci: use Helm 3.1.2 in test container
By default, a version of Helm is used that fails to install. Using the
same version as the devel branch makes the testing work again.

See-also: helm/helm#9617
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-04-20 07:08:16 +00:00
Niels de Vos
b5aa0b11d7 ci: add optional Kubernetes 1.21 job
To test a PR with Kubernetes 1.21, leave a comment in the PR like:

    /test ci/centos/mini-e2e-helm/k8s-1.21

The status of the job will be recorded in the PR, but running this job
is not required (yet).

Updates: #1963
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-04-20 07:08:16 +00:00
Niels de Vos
c04a319aa9 ci: only abort on doc-change when running for PRs
In case a job has been started without a PR (manual, or timed), the
currently checked-out branch matches the original, as there are no
additional changes in the tree. There is no need to abort the jobs when
the skip-doc-change.sh script did not detect any non-doc changes, as
there are no changes at all.

Updates: #1963
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-04-09 09:53:22 +00:00
Niels de Vos
8f84e592d5 ci: do not re-checkout current branch
When tests are started manually (through the Jenkins webui), there is no
PR associated with the job. That means the `git_since` and `ref` are
equal. Trying to create a new branch named `ref` will not work, as the
branch was already created when cloning the repository with `git_since`.

With this change, Jenkins jobs can be started manually. This makes it
possible to run regular/nightly jobs as well.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-04-08 08:38:11 +00:00
Yug
e030097eaa ci: Disable containerized-tests job
The containerized-tests job should only run when
manually triggered.

Signed-off-by: Yug <yuggupta27@gmail.com>
2021-04-07 14:16:38 +00:00
Rakshith R
54753f898b ci: enable ceph image pre-pulling in upgrade-tests
This commit enables pre-pulling of ROOK_CEPH_CLUSTER_IMAGE, similar to
mini-e2e and mini-e2e-helm, to overcome the Docker Hub pull rate-limit.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-04-05 07:25:45 +00:00
Niels de Vos
69cb6aeead ci: pre-pull ROOK_CEPH_CLUSTER_IMAGE if set
After the introduction of ROOK_CEPH_CLUSTER_IMAGE in build.env, the
additional image needs to get pulled from the CI registry mirror and
pushed into the minikube VM.

Without this addition, the Docker Hub pull limits may prevent deploying
Rook.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-03-03 11:29:52 +00:00
Niels de Vos
415abead1e build: only use --cpuset options when the cgroup controller is available
Fixes: #1670
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-03-03 05:48:42 +00:00
Niels de Vos
e6b70c494e ci: prevent parallel builds from causing conflicts
When the container image needs to be rebuilt, two parallel jobs will both
attempt that. With recent versions of Podman, this now fails.

When the image needs to be rebuilt, do so in the stage where it would
otherwise get pulled. This makes sure the image gets built only once.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-03-03 05:48:42 +00:00
Niels de Vos
1c2974d49e ci: the "master" branch got renamed to "devel"
Closes: #1193
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-03-01 05:22:06 +00:00
Mudit Agarwal
92913912ef ci: read ROOK_CEPH_CLUSTER_IMAGE from build.env if available
In case ROOK_CEPH_CLUSTER_IMAGE is set in build.env, use the
version from there.

Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
2021-02-19 04:13:15 +00:00
Niels de Vos
322a7e4e08 ci: request minikube VMs with 12GB RAM
There are timeouts happening where the logs do not show sufficient
output to diagnose the issue. These timeouts suggest that something
inside the minikube VM is not running as expected. Increasing the RAM to
12GB might help.

The bare-metal systems in the CentOS CI have a minimum of 16GB, so
running a single VM with 12GB should be possible.

See-also: https://wiki.centos.org/QaWiki/PubHardware
Updates: #1867
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-15 09:43:38 +00:00
Niels de Vos
0b78359b9b ci: do not run jobs with Kubernetes v1.18 anymore
Kubernetes v1.20 has been released, so let's use that for testing. Note
that v1.18 is still maintained, but our CI jobs will not consume it
anymore.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-18 06:35:49 +00:00
Niels de Vos
d610c51007 ci: detect latest version with "Kubernetes" prefix
New Kubernetes versions are now prefixed with "Kubernetes", like:

    $ ./scripts/get_patch_release.py
    Kubernetes v1.18.13
    Kubernetes v1.17.15
    Kubernetes v1.19.5
    Kubernetes v1.20.0
    Kubernetes v1.20.0-rc.0
    v1.20.0-beta.2
    v1.18.12
    v1.19.4
    v1.17.14
    v1.20.0-beta.1
    v1.20.0-beta.0
    v1.20.0-alpha.3
    v1.18.10
    v1.17.13

The new "Kubernetes" prefix prevents the current logic from matching the
version. By splitting the returned version string on whitespace, and
returning the last component in get_releases(), the script works as
intended again.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-16 04:47:42 +00:00
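A minimal Python sketch of the parsing change in d610c51007, assuming release names like the ones listed above. Only the split-and-take-the-last-component step comes from the commit; `latest_patch_release()` and its pre-release filtering are hypothetical and not taken from get_patch_release.py.

```python
from typing import List


def parse_release_name(name: str) -> str:
    """Return the version from "Kubernetes v1.20.0" or a bare "v1.20.0"."""
    # Splitting on whitespace and keeping the last word handles both forms.
    return name.split()[-1]


def latest_patch_release(names: List[str], minor: str) -> str:
    """Hypothetical: highest stable vX.Y.Z for a minor such as "v1.18".

    Raises ValueError if no stable release for that minor is listed.
    """
    patches = []
    for name in names:
        version = parse_release_name(name)
        if "-" in version:
            # Skip pre-releases like v1.20.0-rc.0 or v1.20.0-beta.2.
            continue
        major_minor, _, patch = version.rpartition(".")
        if major_minor == minor:
            patches.append(int(patch))
    return f"{minor}.{max(patches)}"
```

Fed the sample list above, the "Kubernetes v1.18.13" entry is now recognised as v1.18.13, while the older bare names keep working.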
Niels de Vos
5562a6ded9 ci: add jobs with Kubernetes v1.20
Add CI jobs for Kubernetes v1.20 testing. These jobs will run, but are
not (yet) required before changes get merged.

Updates: #1784
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-15 13:19:04 +00:00
Niels de Vos
8ee790a44d ci: log output of "docker image save"
When podman2minikube is called, the output to stdout is lost. This makes
debugging issues difficult. Log the output, so that the name of the
image that is pushed into minikube can be verified.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-01 16:52:41 +00:00
Niels de Vos
a26772188a ci: pull BASE_IMAGE from local registry
The CI scripts pull all container images from the local CI registry. If
the image name starts with "docker.io/", the images will be pushed into
the test environment as "docker.io/docker.io/ceph/ceph:v15". This image
will not be used by the tests, so things can still fail in case Docker
Hub has reached the pull rate-limit.

By dropping the additional "docker.io/" from the BASE_IMAGE name, the
image gets pushed as "docker.io/ceph/ceph:v15" so the tests will use it
automatically.

Groovy-syntax: https://www.baeldung.com/groovy-remove-string-prefix#using-regex
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-30 08:47:18 +00:00
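The commit strips the prefix with a Groovy regex. An equivalent Python sketch, with hypothetical helper names, shows why removing a leading "docker.io/" from BASE_IMAGE avoids the doubled name, assuming (as the commit describes) that the CI adds "docker.io/" itself when pushing the image into the test environment.

```python
DOCKER_IO = "docker.io/"


def strip_registry_prefix(image: str, prefix: str = DOCKER_IO) -> str:
    """Remove a leading registry prefix such as "docker.io/"."""
    if image.startswith(prefix):
        return image[len(prefix):]
    return image


def minikube_image_name(base_image: str) -> str:
    """Hypothetical: the name the image gets inside minikube.

    The CI prepends "docker.io/" on its own, so BASE_IMAGE must not
    carry that prefix already.
    """
    return DOCKER_IO + strip_registry_prefix(base_image)
```

Both "docker.io/ceph/ceph:v15" and "ceph/ceph:v15" end up as "docker.io/ceph/ceph:v15", the name the tests look for.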
Niels de Vos
7cac1f7609 ci: remove docker mirror configuration
Instead of using a mirror, the CI registry is now populated with
container images that get pulled and tagged as if they came from Docker
Hub or other locations.

This is more of a manual mirror, as the Docker Registry mirror
functionality is not flexible enough for our use case (pushing images
without providing them on docker.io).

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-26 16:26:57 +00:00
Niels de Vos
468b6cd67d ci: pull images from local registry directly
The mirror option of the Docker Registry container is very limited and
prevents updating or manually pushing images to the registry. Instead,
it tries to push the images to docker.io, which is not what we need.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-26 16:26:57 +00:00
Niels de Vos
005d201f2f ci: use docker.io/library/ as prefix for nginx and vault images
docker.io/nginx:latest and docker.io/vault:latest are being redirected
to docker.io/library/. The redirection is not cached, and Docker Hub
might return an error during redirection when the pull rate-limit is
hit.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-26 12:40:48 +00:00
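To illustrate the short-name versus full-name difference, here is a Python sketch of the normalisation that turns Docker Hub short names into full references. It follows the usual Docker reference conventions (a first path component containing "." or ":", or equal to "localhost", names another registry; single-component names such as nginx live under "library/"). `qualify_image()` is hypothetical and not part of the CI scripts.

```python
def qualify_image(image: str) -> str:
    """Return a fully qualified Docker Hub reference for an image name."""
    name = image
    if name.startswith("docker.io/"):
        name = name[len("docker.io/"):]
    first, sep, _ = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        # The first component names another registry (e.g. quay.io).
        return image
    if not sep:
        # Official images (nginx, vault, ...) live in the "library/" namespace.
        name = "library/" + name
    return "docker.io/" + name
```

Pulling "docker.io/library/nginx:latest" directly matches what `qualify_image("docker.io/nginx:latest")` returns, so no redirect is involved.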
Niels de Vos
7384683af2 ci: push ceph/ceph container image into minikube
Deploying Ceph with Rook fails as the ceph/ceph:v15 base image cannot
be pulled from within the minikube VM. By pushing the image into the VM
before deploying Rook, there should be no need to pull the image
from Docker Hub anymore.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-26 07:23:58 +00:00
Niels de Vos
5db647ba64 ci: do not mark script changes as doc-only
Changes in scripts will affect the CI jobs as the scripts are used while
deploying minikube, Rook and other components. These changes need
testing, just like anything else.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-26 07:01:17 +00:00
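The actual rules are in skip-doc-change.sh and are not shown in this log. A Python sketch of the idea follows, where `DOC_SUFFIXES`, `DOC_DIRS` and both function names are hypothetical: files under scripts/ never count as documentation, and an empty change list is not treated as doc-only (compare c04a319aa9, where there are no changes at all).

```python
from typing import Iterable

# Hypothetical patterns; skip-doc-change.sh defines the real ones.
DOC_SUFFIXES = (".md",)
DOC_DIRS = ("docs/",)


def is_doc_file(path: str) -> bool:
    """Return True if changing this file cannot affect the CI jobs."""
    if path.startswith("scripts/"):
        # Scripts deploy minikube, Rook, etc., so they always need testing.
        return False
    return path.endswith(DOC_SUFFIXES) or path.startswith(DOC_DIRS)


def only_doc_changes(paths: Iterable[str]) -> bool:
    """True only when there are changes and all of them are documentation."""
    changed = list(paths)
    return bool(changed) and all(is_doc_file(p) for p in changed)
```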
Niels de Vos
7c6dbfdb8e ci: strip localhost/ prefix after importing images in minikube
Some versions of minikube/docker add a "localhost/" prefix to imported
images. In that case, the image needs to get tagged without the prefix
as well.

When running podman2minikube.sh, the docker process inside the minikube
VM sometimes responds with:

    # ./podman2minikube.sh rook/ceph:v1.3.9
    Loaded image: localhost/rook/ceph:v1.3.9

When the "localhost/" prefix is added to the image name, deploying Rook
will try to pull the rook/ceph:v1.3.9 image again. This can fail when
the Docker Hub pull rate-limit is hit.

Without the "localhost/" prefix, there should be no further attempt to
pull the image, as it should be detected that the image is available.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-25 15:13:43 +00:00
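podman2minikube.sh is a shell script; the Python sketch below only illustrates the decision it has to make. It parses the "Loaded image: ..." line shown above and returns the name to add with `docker tag` when a "localhost/" prefix was added. `retag_target()` is a hypothetical name.

```python
from typing import Optional

LOADED_PREFIX = "Loaded image: "
LOCALHOST_PREFIX = "localhost/"


def retag_target(docker_load_output: str) -> Optional[str]:
    """Return the name to tag the loaded image with, or None if not needed."""
    for line in docker_load_output.splitlines():
        if line.startswith(LOADED_PREFIX):
            name = line[len(LOADED_PREFIX):].strip()
            if name.startswith(LOCALHOST_PREFIX):
                # Tag without the prefix so Rook finds the image locally
                # instead of pulling it from Docker Hub again.
                return name[len(LOCALHOST_PREFIX):]
            return None
    return None
```

For the output above, the result is "rook/ceph:v1.3.9", which would then be applied inside the VM with something like `docker tag localhost/rook/ceph:v1.3.9 rook/ceph:v1.3.9`.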
Niels de Vos
a9557f36f3 ci: provide qualified image tags for docker.io images
Unqualified container images are currently used for CI jobs. In the
future this is expected to change. By preparing the cache/mirror and
images in minikube with the qualified tags, transition to qualified
image names should become easier.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-24 05:09:38 +00:00
Niels de Vos
5fd567f354 cleanup: "podman inspect rook/ceph" does not need to show output
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-24 05:09:38 +00:00
Niels de Vos
6c4c6784c4 ci: read ROOK_VERSION from build.env if available
In case ROOK_VERSION is set in build.env, use the version from there.
Otherwise fall back to version 1.3.9.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-24 05:09:38 +00:00
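A Python sketch of reading ROOK_VERSION from build.env with a fallback. It assumes build.env consists of simple KEY=VALUE lines with optional # comments, and that the fallback is written as "v1.3.9" (the tag format used for rook/ceph images elsewhere in this log). `parse_env()` and `rook_version()` are hypothetical; the CI itself does this in its own scripts.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional double quotes.
        values[key.strip()] = value.strip().strip('"')
    return values


def rook_version(build_env: str, default: str = "v1.3.9") -> str:
    """ROOK_VERSION from build.env, or the fallback if unset or empty."""
    return parse_env(build_env).get("ROOK_VERSION") or default
```

The same lookup would apply to ROOK_CEPH_CLUSTER_IMAGE, which a later commit also reads from build.env.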
Niels de Vos
5ae8fb7c9b ci: add configuration for the proxy/mirror registry
This makes it possible to pull images from Docker Hub through the local
container image registry in the CI OpenShift deployment. The registry in
the CI is configured with the 'cephcsibot' account, so that image pulls
are counted against that account, and not against the anonymous
consumers across the whole CentOS CI.

There should be no need to manually sync the images between the local
registry and Docker Hub anymore.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-24 05:09:38 +00:00
Niels de Vos
6a7e6c841f ci: pre-pull rook/ceph image from local registry
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-19 12:48:32 +00:00
Niels de Vos
b9cffc1b42 ci: pass registry to podman helper functions
Functions in Groovy cannot use `def ci_registry`, as the variable is
not in their scope. Pass the registry to the podman_login() and
podman_pull() functions instead.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-18 12:31:22 +00:00
Niels de Vos
ea5985fa3a ci: fix calling podman_login()
A typo when calling podman_log() causes CI jobs to fail.

Fixes: 1eec379 "ci: pre-pull Ceph base-image and cephcsi:devel for mini-e2e-helm jobs"
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-18 12:02:25 +00:00
Niels de Vos
f36ef72a19 ci: pre-pull nginx and vault images from local registry
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-18 11:49:10 +00:00
Niels de Vos
dd10e66a98 ci: move podman2minikube() into its own script
This way, it can more easily be re-used for other container images.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-18 11:49:10 +00:00
Niels de Vos
e4339fea72 ci: pre-pull Ceph base-image and cephcsi:devel for upgrade-tests
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-18 11:49:10 +00:00
Niels de Vos
1eec3792ec ci: pre-pull Ceph base-image and cephcsi:devel for mini-e2e-helm jobs
The same changes were made for the mini-e2e jobs yesterday, and
those seem to work well. Use the same pre-pull method for the Helm
deployment.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-18 11:49:10 +00:00