Commit Graph

4327 Commits

Author SHA1 Message Date
Humble Chirammal
4ca19ad2ff e2e: reformat error message with consistent formatting
To make the error return consistent across e2e tests we have decided
 to remove with error presence from the logs and this commit
 does that for e2e/ceph_user.go.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-22 11:07:15 +00:00
Humble Chirammal
14389c7b40 e2e: reformat error message with consistent formatting
To make the error return consistent across e2e tests we have decided
to remove with error presence from the logs and this commit
does that for e2e/utils.go.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-22 11:07:15 +00:00
Humble Chirammal
5647d4da24 e2e: reformat error message with proper error formatting
To make the error return consistent across e2e tests we have decided
to remove `with error` presence from the logs and this commit
does that for cephfs tests.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-22 11:07:15 +00:00
Humble Chirammal
4785c55bb8 e2e: reformat error message with proper error formatting
To make the error return consistent across e2e tests we have decided
to remove `with error` presence from the logs and this commit
does that for rbd tests.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-22 11:07:15 +00:00
Rakshith R
8488d6bec2 ci: fix helm chart push for release branches
Currently BRANCH_NAME for release branches is not set causing
the source in helm chart to be set as
sources:
  - https://github.com/ceph/ceph-csi/tree//charts/ceph-csi-cephfs

Current change fixes it.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-11-22 07:55:59 +00:00
Humble Chirammal
aa600754c1 deploy: remove expandCSIVolumes feature gate from deployment
The `expandCSIVolumes` feature gate is beta since kubernetes 1.16
version and we no longer wanted to explictly enable it in the
deployment.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-22 06:08:23 +00:00
Humble Chirammal
e6949945bb cephfs: add validation for generic ephemeral volumes
This commit adds the validation of csi cephfs driver to work with
ephemeral volume support. With ephemeral volume support a user can
specify ephemeral volumes in its pod spec and tie the lifecycle
of the PVC with the POD.

An example POD spec also included in this commit.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-22 06:08:23 +00:00
Humble Chirammal
df0901ddd8 rbd: add generic ephemeral volume validation
This commit adds the validation of csi RBD driver to work with
ephemeral volume support. With ephemeral volume support a user can
specify ephemeral volumes in its pod spec and tie the lifecycle
of the PVC with the POD.

An example pod spec is also included in this commit.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-22 06:08:23 +00:00
Madhu Rajanna
04e99ae2e0 ci: update minikube to v1.24.0
update minikube version to latest
available version i.e 1.24.0

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-19 17:51:00 +00:00
Yug Gupta
c339d43272 deploy: deploy erasure coded pool
deploy erasure coded pool during rook
deployment to allow usage and testing
in erasure coded pools.

Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2021-11-19 14:03:21 +00:00
dependabot[bot]
ecd5d2d46c rebase: bump sigs.k8s.io/controller-runtime from 0.10.2 to 0.10.3
Bumps [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) from 0.10.2 to 0.10.3.
- [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases)
- [Commits](https://github.com/kubernetes-sigs/controller-runtime/compare/v0.10.2...v0.10.3)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/controller-runtime
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-19 12:47:15 +00:00
Madhu Rajanna
5524b2d538 ci: use 1.8.5 vault for e2e
current latest vault release is 1.9.0 but
with the latest image our E2E is broken.
reverting back the vault version to 1.8.5
till we root cause the issue.

Note:- This is to unblock PR merging

updates: #2657

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-19 10:37:14 +00:00
Shyamsundar Ranganathan
d1c21eece9 rbd: Update sequence of operations on dummy mirror image
The dummy mirror image needs to be disabled and then
reenabled for mirroring, to ensure a newly promoted
primary is now starting to schedule snapshots.

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
2021-11-19 09:38:59 +05:30
Madhu Rajanna
517ad8c644 rbd: use dummy image to workaround rbd scheduling bug
currently we have a bug in rbd mirror scheduling module.
After doing failover and failback the scheduling is not
getting updated and the mirroring snapshots are not
getting created periodically as per the scheduling
interval. This PR workarounds this one by doing below
operations

* Create a dummy (unique) image per cluster and this image
should be easily identified.

* During Promote operation on any image enable the
mirroring on the dummy image. when we enable the mirroring
on the dummy image the pool will get updated and the
scheduling will be reconfigured.

* During Demote operation on any image disable the mirroring
on the dummy image. the disable need to be done to enable
the mirroring again when we get the promote request to make
the image as primary

* When the DR is no more needed, this image need to be
manually cleanup as for now as we dont want to add a check
in the existing DeleteVolume code path for delete dummy image
as it impact the performance of the DeleteVolume workflow.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-19 09:38:59 +05:30
Madhu Rajanna
d05fc1e8e5 util: add helper to get the cluster ID
added helper function to get the cluster ID.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-19 09:38:59 +05:30
Madhu Rajanna
e4e0f397a6 rbd: run schedule during promote operation
Moved to add scheduling to the promote
operation as scheduling need to be added
when the image is promoted and this is
the correct method of adding the scheduling
to make the scheduling take place.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-19 09:38:59 +05:30
dependabot[bot]
7125df23c1 rebase: bump k8s.io/kubernetes from 1.22.2 to 1.22.3
Bumps [k8s.io/kubernetes](https://github.com/kubernetes/kubernetes) from 1.22.2 to 1.22.3.
- [Release notes](https://github.com/kubernetes/kubernetes/releases)
- [Commits](https://github.com/kubernetes/kubernetes/compare/v1.22.2...v1.22.3)

---
updated-dependencies:
- dependency-name: k8s.io/kubernetes
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-18 13:01:32 +00:00
Madhu Rajanna
7bbd2ea284 rbd: use small case of error message
the error message should not start with
the capital letter changing the case as
per the standard.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-18 10:44:12 +00:00
Madhu Rajanna
51998a5f4a cleanup: log the image name and pool name
instead of logging the volumeID and the pool
name. log the poolname and image name for better
debugging.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-18 10:44:12 +00:00
Niels de Vos
5c59a89b02 ci: add actions/retest to dependabot checks
Adding actions/retest to the dependabot configuration makes sure all
vendored packages will get updated when new releases are available.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-11-18 07:52:29 +00:00
Humble Chirammal
aef3cc0c3c e2e: remove 1.15 based test enablement in cephfs
Considering we are far out of these release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-18 05:04:44 +00:00
Humble Chirammal
0c5be6d12d e2e: remove 1.16 based test enablement in cephfs
Considering we are far out of these release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-18 05:04:44 +00:00
Humble Chirammal
7090a18141 e2e: remove 1.17 based test enablement in cephfs
Considering we are far out of these release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-18 05:04:44 +00:00
Humble Chirammal
2ac3f129c0 e2e: remove 1.15 based test enablement in rbd
considering we are far out of this release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-18 05:04:44 +00:00
Humble Chirammal
1354cfbf03 e2e: remove 1.16 based test enablement in rbd
considering we are far out of this release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-18 05:04:44 +00:00
Humble Chirammal
c03969fa65 e2e: remove 1.17 based test enablement in rbd
considering we are far out of this release and only care about
kubernetes releases from v1.20, there is no need to have this
version check in place for the tests.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-18 05:04:44 +00:00
Madhu Rajanna
0f0cda49a7 rbd: log stdError for cryptosetup command
If we hit any error while running the cryptosetup
commands we are logging only the error message.
with only error message it is difficult to analyze
the problem, logging the stdError will help us to
check what is the problem.

updates: #2610

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-18 02:17:15 +00:00
Niels de Vos
7e22180125 rbd: call undoStagingTransaction() when NodeStageVolume() fails
On line 341 a `transaction` is created. This is passed to the deferred
`undoStagingTransaction()` function when an error in the
`NodeStageVolume` procedure is detected. So far, so good.

However, on line 356 a new `transaction` is returned. This new
`transaction` is not used for the defer call.

By removing the empty `transaction` that is used in the defer call, and
calling `undoStagingTransaction()` on an error of `stageTransaction()`,
the code is a little simpler, and the cleanup of the transaction should
be done correctly now.

Updates: #2610
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-11-17 23:58:00 +00:00
dependabot[bot]
50832f4f06 rebase: bump github.com/onsi/gomega from 1.16.0 to 1.17.0
Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.16.0 to 1.17.0.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/gomega/compare/v1.16.0...v1.17.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-17 21:37:48 +00:00
dependabot[bot]
335c945d97 rebase: bump google.golang.org/grpc from 1.41.0 to 1.42.0
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.41.0 to 1.42.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.41.0...v1.42.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-17 18:47:04 +00:00
Niels de Vos
fac3ef01c6 build: use golang:1.16 as runtime container for retest action
It seems that building the `retest` action makes it consume shared
libraries that are not part of the `scratch` base container layer. By
using the golang:1.16 container image as a base, all required shared
libraries are available.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-11-17 14:36:13 +00:00
Niels de Vos
1fa8939e84 e2e: retry when a "transport is closing" error is hit
There have been occasional CI job failures due to "transport is closing"
errors. Adding this error to the isRetryableAPIError() function should
make sure to retry the request until the connection is restored.

Fixes: #2613
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-11-17 14:07:07 +00:00
Niels de Vos
1f650e1204 rebase: replace vendored layeh.com/radius with GitHub source
The webserver at layeh.com seems to be misbehaving, which causes `go mod
verify` to fail. The layeh.com/radius repository is maintained on
GitHub, so the sources can be vendored/verified from there too.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-11-17 11:51:30 +00:00
Madhu Rajanna
0a5bd09a61 ci: fix branch name in retest action
updated the branch name from main to
devel in retest action workflow.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-17 05:50:43 +00:00
Madhu Rajanna
b62de1376d ci: update github workflow to test docker build
updated github action to test a retest action
docker build workflow.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-17 05:50:43 +00:00
Madhu Rajanna
c4ceadd06a ci: fix docker build issue for retest
fix docker build `cannot normalize nothing`

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-17 05:50:43 +00:00
Madhu Rajanna
46c40fe5ad ci: skip shell check in vendor directory
skip shell check in vendor directories.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-16 12:03:36 +00:00
Madhu Rajanna
ec34fdd505 ci: skip codespell for retest vendor
as there are lot of spell check failures
on the vendor directory, skipping it.

skipping retest action folder as
PullRequests word is not getting ignored

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-16 12:03:36 +00:00
Madhu Rajanna
f9f465073f ci: add github action to build retest
added basic github action for
retest building.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-16 12:03:36 +00:00
Madhu Rajanna
5a53f53166 ci: add retest github action
added source code of github retest
action.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-16 12:03:36 +00:00
Madhu Rajanna
f7e7172c7b doc: add documentation for retest job
added details about retest job and the
creteria to auto retest PR.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-16 12:03:36 +00:00
Madhu Rajanna
ed6d28a1fc ci: add action to retest failed approved PR's
Adding github action to retest the failed
approved PR's. sample output is available
at https://github.com/Madhu-1/retest-action/pull/3

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-16 12:03:36 +00:00
Rakshith R
191b603974 ci: remove gh action gosec linter,since it is already part of golangci
This commit removes gosec standalone linter and related parts,
since golangci linter runs gosec linter too.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-11-16 12:29:56 +01:00
Prasanna Kumar Kalever
0bf9db822b e2e: validate encrypted image mount inside the nodeplugin
currently the mountType validation of the encrypted volume is done in
the application, we should rather validate this inside the nodeplugin
pod.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-16 10:12:46 +00:00
Prasanna Kumar Kalever
e6fa392df1 rbd: fix mapOptions passing with rbd-nbd mounter
This was a regression introduced by:
https://github.com/ceph/ceph-csi/pull/2556

Fixes: #2610
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-16 10:12:46 +00:00
Prasanna Kumar Kalever
cee6da5313 e2e: adding io-timeout for lower kernel versions
This got removed unintentionally with
https://github.com/ceph/ceph-csi/pull/2628

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-16 10:12:46 +00:00
Prasanna Kumar Kalever
c97b6432e3 e2e: restrict IO with lower version kernel at rbd-nbd tests
Currently, at "perform IO on rbd-nbd volume after nodeplugin restart"
test we are performing write on the rbd-nbd based mount after nodeplugin
restart. But due to a bug in NBD driver the writes are failing, please
note NBD zero cmd timeout handling is fixed with kernel >= 5.4 and hence
we should defend on writes based on kernel version to avoid unnecessary
CI failures.

For more information see
https://github.com/ceph/ceph-csi/issues/2204#issuecomment-930941047

updates: #2204
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-10 16:46:50 +00:00
Prasanna Kumar Kalever
50e9dfa5c5 cleanup: fix log level
This log line is seen frequently in the logs and its better to be at
Warning loglevel rather than Error based on its severity

E1109 08:30:45.612395   38328 util.go:247] kernel 4.19.202 does not support required features

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-10 10:54:29 +00:00
Humble Chirammal
5a4bf4d151 doc: add migration design documentation
This commit adds migration design doc which carry information about
the required changes and design for rbd intree to csi migration.

Fixes https://github.com/ceph/ceph-csi/issues/2596
Updates https://github.com/ceph/ceph-csi/issues/2509

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-11-09 12:06:50 +00:00
Prasanna Kumar Kalever
3686b6da8b rbd: utilize cookie support from rbd for nbd
Problem:
On remap/attach of device (i.e. nodeplugin restart), there is no way
for rbd-nbd to defend if the backend storage is matching with the initial
backend storage.

Say, if an initial map request for backend "pool1/image1" got mapped to
/dev/nbd0 and the userspace process is terminated (on nodeplugin restart).
A next remap/attach (nodeplugin start) request within reattach-timeout is
allowed to use /dev/nbd0 for a different backend "pool1/image2"

For example, an operation like below could be dangerous:

$ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4"
$ sudo pkill -15 rbd-nbd   <-- nodeplugin terminate
$ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs"

Solution:
rbd-nbd/kernel now provides a way to keep some metadata in sysfs to identify
between the device and the backend, so that when a remap/attach request is
made, rbd-nbd can compare and avoid such dangerous operations.

With the provided solution, as part of the initial map request, backend
cookie (ceph-csi VOLID) can be stored in the sysfs per device config, so
that on a remap/attach request rbd-nbd will check and validate if the
backend per device cookie matches with the initial map backend with the help
of cookie.

At Ceph-csi we use VOLID as device cookie, which will be unique, we pass
the VOLID as cookie at map and use the same at the time of attach, that
way rbd-nbd can identify backends and their matching devices.

Requires:
https://github.com/ceph/ceph/pull/41323
https://lkml.org/lkml/2021/4/29/274

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-04 03:20:59 +00:00