Commit Graph

3467 Commits

Author SHA1 Message Date
Humble Chirammal
8ea495ab81 rbd: skip volumeattachment processing if pv marked for deletion
if the volumeattachment has been fetched but marked for deletion
the nbd healer dont want to process further on this pv. This patch
adds a check for pv is marked for deletion and if so, make the
healer skip processing the same

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-26 15:04:19 +00:00
Humble Chirammal
3417fe86e6 doc: update support matrix for deprecated ceph csi releases
As discussed in https://github.com/ceph/ceph-csi/issues/2438
we are marking ceph csi release support to N.(x-1) release versions.

N = latest major release
x = latest minor release

This address the release version support matrix based on the
same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-26 11:39:33 +00:00
Niels de Vos
6d00b39886 cleanup: move log functions to new internal/util/log package
Moving the log functions into its own internal/util/log package makes it
possible to split out the humongous internal/util packages in further
smaller pieces. This reduces the inter-dependencies between utility
functions and components, preventing circular dependencies which are not
allowed in Go.

Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-26 09:34:05 +00:00
Madhu Rajanna
2036b587d7 ci: add github workflow for stale
added github action to check for the
stale issues and PRs. the action will
get scheduled everydata at 21:00 UTC.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-26 11:00:30 +05:30
Madhu Rajanna
630798d95e ci: remove stale bot configuration
This commit removes the stale bot
configration as stale bot repo is not actively
maintained anymore.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-26 11:00:30 +05:30
Niels de Vos
68588dc7df util: fix unit-test for GetClusterMappingInfo()
Unit-testing often fails due to a race condition while writing the
clusterMappingConfigFile from multiple go-routines at the same time.
Failures from `make containerized-test` look like this:

    === CONT  TestGetClusterMappingInfo/site2-storage_cluster-id_mapping
        cluster_mapping_test.go:153: GetClusterMappingInfo() = <nil>, expected data &[{map[site1-storage:site2-storage] [map[1:3]] [map[11:5]]} {map[site3-storage:site2-storage] [map[8:3]] [map[10:5]]}]
    === CONT  TestGetClusterMappingInfo/site3-storage_cluster-id_mapping
        cluster_mapping_test.go:153: GetClusterMappingInfo() = <nil>, expected data &[{map[site3-storage:site2-storage] [map[8:3]] [map[10:5]]}]
    --- FAIL: TestGetClusterMappingInfo (0.01s)
        --- PASS: TestGetClusterMappingInfo/mapping_file_not_found (0.00s)
        --- PASS: TestGetClusterMappingInfo/mapping_file_found_with_empty_data (0.00s)
        --- PASS: TestGetClusterMappingInfo/cluster-id_mapping_not_found (0.00s)
        --- FAIL: TestGetClusterMappingInfo/site2-storage_cluster-id_mapping (0.00s)
        --- FAIL: TestGetClusterMappingInfo/site3-storage_cluster-id_mapping (0.00s)
        --- PASS: TestGetClusterMappingInfo/site1-storage_cluster-id_mapping (0.00s)

By splitting the public GetClusterMappingInfo() function into an
internal getClusterMappingInfo() that takes a filename, unit-testing can
use different files for each go-routine, and testing becomes more
predictable.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-25 16:08:48 +00:00
Madhu Rajanna
b0b46680e3 doc: update development guide for new rules
updated development guide requirement to
have review from contributors and reviewers.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-25 16:33:56 +05:30
Madhu Rajanna
0a7a490496 ci: update mergify rules to include teams
updated mergify rules to consider the teams
approval to merge a PR.

more details at #2367

fixes #2367

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-25 16:33:56 +05:30
Prasanna Kumar Kalever
55d3226d6b e2e: use io-timeout conditionally based on kernel version
We need
https://www.mail-archive.com/linux-block@vger.kernel.org/msg38060.html
inorder to use `--io-timeout=0`. This patch is part of kernel 5.4

Since minikube doesn't have a v5.4 kernel yet, lets use io-timeout value
conditionally based on kernel version at our e2e.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 17:09:09 +00:00
Prasanna Kumar Kalever
1bd2d46cdb e2e: add util to get kernel version from specified container
Currently, we get the kernel version where the e2e (client) executable runs,
not the kernel version that is used by the csi-rbdplugin pod.

Add a function that run `uname -r` command from the specified container and
returns the kernel version.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Suggested-by: Niels de Vos <ndevos@redhat.com>
2021-08-24 17:09:09 +00:00
Prasanna Kumar Kalever
4f40213d8e rbd: fix rbd-nbd io-timeout to never abort
With the tests at CI, it kind of looks like that the IO is timing out after
30 seconds (default with rbd-nbd). Since we have tweaked reattach-timeout
to 300 seconds at ceph-csi, we need to explicitly set io-timeout on the
device too, as it doesn't make any sense to keep
io-timeout < reattach-timeout

Hence we set io-timeout for rbd nbd to 0. Specifying io-timeout 0 tells
the nbd driver to not abort the request and instead see if it can be
restarted on another socket.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Suggested-by: Ilya Dryomov <idryomov@redhat.com>
2021-08-24 17:09:09 +00:00
Prasanna Kumar Kalever
3bf17ade7a doc: update code comments about available timeout options
Adding some code comments to make them readable and easy to understand.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 17:09:09 +00:00
Prasanna Kumar Kalever
7576bf400c doc: update rbd-nbd doc about log path details
Document the changes needed for configuring custom logging path

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
473adf99fc deploy: provide variable to alter hostpath location for ceph clients
Also update the documentation about the same.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
ea3def0db2 rbd: remove per volume rbd-nbd logfiles on detach
- Update the meta stash with logDir details
- Use the same to remove logfile on unstage/unmap to be space efficient

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
d67e88ccd0 cleanup: embed args into struct and pass it to detachRBDImageOrDeviceSpec
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
474100c1f1 rbd: add a unit test for getCephClientLogFileName()
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
682b3a980b rbd: rbd-nbd logging the ceph-CSI way
- One logfile per device/volume
- Add ability to customize the logdir, default: /var/log/ceph

Note: if user customizes the hostpath to something else other than default
/var/log/ceph, then it is his responsibility to update the `cephLogDir`
in storageclass to reflect the same with daemon:

```
cephLogDir: "/var/log/mynewpath"
```

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
0be7024726 rbd: provide host-path for rbd-nbd logging
Problem:
--------
1. rbd-nbd by default logs to /var/log/ceph/ceph-client.admin.log,
Unfortunately, container doesn't have /var/log/ceph directory hence
rbd-nbd is not logging now.
2. Rbd-nbd logs are not persistent across nodeplugin restarts.

Solution:
--------
Provide a host path so that log directory is made available, and the
logs persist on the hostnode across container restarts.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
18f4a51a15 e2e: improve the debug logs for rbd-nbd
Ceph’s logging levels operate on a scale of 1 to 20, where 1 is terse
and 20 is verbose.

Format:
debug-{subsystem} = {log-level}

Setting `rbd` loglevel to 20 at our e2e tests.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-19 20:16:24 +00:00
Rakshith R
7da796dfc1 doc: add Github release badge to README.md
Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-19 11:48:34 +00:00
Niels de Vos
7d04cf4fe7 doc: add tickgit TODO badge
There are TODO and FIXME comments in the Ceph-CSI source code that need
addressing at one point. Adding this TODO badge and link to tickgit to
the main README makes it obvious that some cleanup is needed.

This might invite new contributors to address reported TODOs.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-19 09:37:20 +00:00
Niels de Vos
8447a1feab cleanup: address pylint "consider-using-with" in tracevol.py
pylint started to report errors like the following:

    troubleshooting/tools/tracevol.py:97:10: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)

There probably has been an update of Pylint in the test-container that
is more strict than previous versions.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-19 09:06:17 +00:00
Humble Chirammal
9ac1391d0f util: correct interface name and remove redundancy
ContollerManager had a typo in it, and if we correct it,
linter  will fail and suggest not to use controller.ControllerManager
as the interface name and package name  is redundant, keeping manager
as the interface name which is the practice and also address the
linter issues.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-19 04:19:42 +00:00
Humble Chirammal
763387c8e2 rebase: update external-resizer to v1.3.0 release
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-18 17:05:22 +00:00
Humble Chirammal
e65fbe9862 rebase: make use of v0.10.0 of csi-lib-utils
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-18 17:05:22 +00:00
Humble Chirammal
edf511a833 cephfs: make use of subvolumeInfo.state to determine quota
https://github.com/ceph/go-ceph/pull/455/ added `state` field
to subvolume info struct which helps to identify the snapshot
retention state in the caller. This patch make use of the same

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-18 04:50:46 +00:00
Humble Chirammal
66fa5891b2 cephfs: correct typos in cephfs driver code
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-18 04:50:46 +00:00
Humble Chirammal
ca7809099d rebase: update external-snapshotter client to v4.2.0
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-17 10:58:52 +00:00
Humble Chirammal
68bbd58045 rebase: update sidecars to latest versions
external-provisioner: v2.3.0
external-attacher: v3.3.0
external-snapshotter: v4.2.0
node-driver-registrar: v2.3.0

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-17 10:58:52 +00:00
Humble Chirammal
5089a4ce5d doc: correct some source code comments in rbd driver code
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-17 06:57:09 +00:00
Humble Chirammal
7c2cbf473c doc: update readme for 3.4.0 release
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-17 06:57:09 +00:00
Madhu Rajanna
5562e46d0f rbd: Cleanup OMAP data for secondary image
If the image is in a secondary state and its
up+replaying means its an healthy secondary
and the image is primary somewhere in the remote cluster
and the local image is getting replayed. Delete the
OMAP data generated as we cannot delete the
secondary image. When the image on the primary
cluster gets deleted/mirroring disabled, the image on
all the remote (secondary) clusters will get
auto-deleted. This helps in garbage collecting
the OMAP, PVC and PV objects after failback operation.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-16 17:38:25 +00:00
Madhu Rajanna
fc0d6f6b8b rbd: return succuss if image is healthy secondary
If the image is in secondary state and its
up+replaying means its an healthy secondary
and the image is primary somewhere in the remote
cluster and the local image is getting replayed.
Return success for the Disabling mirroring as
we cannot disable the mirroring on the secondary
state, when the image on the remote site gets
disabled the image on all the remote (secondary)
will get auto deleted. This helps in garbage
collecting the volume replication kuberentes
artifacts

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-16 17:38:25 +00:00
Madhu Rajanna
35324b2e17 rbd: add helper function to get local state
added helper function to check the local image
state is up+replaying.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-16 17:38:25 +00:00
Humble Chirammal
3462cd9bbd helm: correct the groupVersion of CSIDriver in the chart
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-16 15:21:27 +00:00
Humble Chirammal
8e00c2c810 helm: correct watch verb in topology RBAC
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-16 15:21:27 +00:00
Rakshith R
2bd6b669fa ci: add csi sidecar version info to build.env
Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-16 11:17:09 +00:00
Rakshith R
dc8479f5ad ci: make csi sidecar image version configurable in minikube.sh
This commit makes csi sidecar image version configurable in
minikube.sh.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-16 11:17:09 +00:00
Yug Gupta
bc18732cb7 ci: require job for k8s v1.22 in place of v1.19
As kubernetes v1.19 is heading towards its EOL
on 2021-09-30, run tests on kubernetes v1.22
and require it to pass for merging.

Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2021-08-12 18:12:27 +05:30
Humble Chirammal
56ac143450 rebase: update go-ceph version to v0.11.0
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-12 12:42:20 +00:00
Humble Chirammal
87beaac25b rbd: add ReadWriteOncePod in accessModeStrToInt() conversion function
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-12 09:55:50 +00:00
Humble Chirammal
aa698bc3e1 rebase: update kubernetes and libraries to v1.22.0 version
Kubernetes v1.22 version has been released and this update
ceph csi dependencies to use the same version.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-12 09:55:50 +00:00
Yati Padia
e077c1fdf5 cleanup: run codespell on containerized testing
This commit adds a new target codespell to the
make containerized-test.

Fixes: #2229

Signed-off-by: Yati Padia <ypadia@redhat.com>
2021-08-12 09:42:54 +05:30
Rakshith R
7fba62dd47 ci: internally create & delete cephcsi namespace in install-helm.sh
This ensures the kubectl call is retried with kubectl_retry function.

Updates: #2309

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-11 08:42:21 +00:00
Rakshith R
eb8c1cd5ab ci: use kubectl_retry in install_helm.sh script
Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-11 08:42:21 +00:00
Rakshith R
2b19197e2f ci: modify kubectl_retry() to handle NotFound on delete cmd
Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-11 08:42:21 +00:00
Rakshith R
a15892a87a ci: move kubectl_retry() to utils.sh to be able to import it
Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-11 08:42:21 +00:00
Prasanna Kumar Kalever
2723353f8d e2e: add testcase for encrypted volume with rbd-nbd mounter
Fixes: #2235

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-11 04:10:34 +00:00
Prasanna Kumar Kalever
396ab1b4d7 doc: update rbd-nbd documentation with encryption volume support details
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-11 04:10:34 +00:00