Commit Graph

3282 Commits

Author SHA1 Message Date
Madhu Rajanna
b383af20b4 cleanup: move cephfs errors to new util package
As part of the refactoring, moving the cephfs errors file to a new
package.

Updates: #852
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-01 06:50:16 +00:00
Humble Chirammal
aeebd5d03b doc: remove upgrade instructions for earlier releases
As we have deprecated earlier versions than v3.3.0, it is not required
to keep the upgrade docs for the same. The upgrade doc for v3.2.0 to
v3.3.0 has been kept intact.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-01 03:37:23 +00:00
dependabot[bot]
ed38157f9f rebase: bump k8s.io/klog/v2 from 2.9.0 to 2.10.0
Bumps [k8s.io/klog/v2](https://github.com/kubernetes/klog) from 2.9.0 to 2.10.0.
- [Release notes](https://github.com/kubernetes/klog/releases)
- [Changelog](https://github.com/kubernetes/klog/blob/main/RELEASE.md)
- [Commits](https://github.com/kubernetes/klog/compare/v2.9.0...v2.10.0)

---
updated-dependencies:
- dependency-name: k8s.io/klog/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-31 19:31:20 +00:00
Rakshith R
99168dc822 rbd: check for clusterid mapping in RegenerateJournal()
This commit adds fetchMappedClusterIDAndMons() which returns
monitors and clusterID info after checking cluster mapping info.
This is required for regenerating omap entries in mirrored cluster
with different clusterID.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-31 14:30:06 +00:00
Rakshith R
496bcba85c rbd: move GetMappedID() to util package
This commit moves getMappedID() from rbd to util
package since it is not rbd specific and exports
it from there.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-31 14:30:06 +00:00
Niels de Vos
e08d184984 ci: ignore k8s.io/kubernetes dependencies
These dependencies are pulled in by k8s.io/kubernetes with version
v0.0.0. It is therefore required to use 'replace' in go.mod to select a
compatible version of the additional k8s.io packages.

Dependabot does not seem to update packages listed in 'replace', only
under 'require'. That means, the version updates done by Dependabot do
not have any effect, as the contents is replaced with a different
version anyway. Ignoring these packages prevents the creation of
non-functional PRs.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-31 09:03:12 +00:00
Niels de Vos
1b30a58e53 ci: skip commitlint for dependabot PRs
Dependabot creates commit messages that commitlint fails to parse
correctly. The messages contain simple references to which dependencies
are updated, and do not need full validation like manual written
messages do.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-31 09:02:11 +02:00
Humble Chirammal
92e7f16654 ci: remove the rules for backport on unsupported versions
CSI version v3.3.0 and above are the supported versions now and this
patch also adjust the mergify rules according to that, ie the backport
rules are removed for all unsupported versions here.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-31 10:56:28 +05:30
Niels de Vos
4a3b1181ce cleanup: move KMS functionality into its own package
A new "internal/kms" package is introduced, it holds the API that can be
consumed by the RBD components.

The KMS providers are currently in the same package as the API. With
later follow-up changes the providers will be placed in their own
sub-package.

Because of the name of the package "kms", the types, functions and
structs inside the package should not be prefixed with KMS anymore:

    internal/kms/kms.go:213:6: type name will be used as kms.KMSInitializerArgs by other packages, and that stutters; consider calling this InitializerArgs (golint)

Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-30 16:31:40 +00:00
Niels de Vos
778b5e86de cleanup: move k8s functions to the util/k8s package
By placing the NewK8sClient() function in its own package, the KMS API
can be split from the "internal/util" package. Some of the KMS providers
use the NewK8sClient() function, and this causes circular dependencies
between "internal/utils" -> "internal/kms" -> "internal/utils", which
are not alowed in Go.

Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-30 16:31:40 +00:00
Niels de Vos
2cc96dc539 build: vendor code.cloudfoundry.org/gofileutils from GitHub
There is a problem accessing the code.cloudfoundry.org web service iver
TLS. It seems to redirect to GitHub, so use the package from there:

    running: go mod verify
    go: github.com/libopenstorage/secrets@v0.0.0-20210709082113-dde442ea20ec requires
    	github.com/hashicorp/vault@v1.4.2 requires
    	github.com/hashicorp/vault-plugin-auth-cf@v0.5.4 requires
    	github.com/cloudfoundry-community/go-cfclient@v0.0.0-20190201205600-f136f9222381 requires
    	code.cloudfoundry.org/gofileutils@v0.0.0-20170111115228-4d0c80011a0f: unrecognized import path "code.cloudfoundry.org/gofileutils": https fetch: Get "https://code.cloudfoundry.org/gofileutils?go-get=1": x509: certificate signed by unknown authority

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-30 13:51:49 +00:00
Niels de Vos
b2e5e0574b build: get gomodules.xyz/jsonpatch/v2 from github
The `make containerized-test TARGET=mod-check` jobs fail because the web
service at gomodules.xyz is not behaving correctly at the moment:

    running: go mod verify
    go: sigs.k8s.io/controller-runtime@v0.9.2 requires
    	gomodules.xyz/jsonpatch/v2@v2.2.0: unrecognized import path "gomodules.xyz/jsonpatch/v2": reading https://gomodules.xyz/jsonpatch/v2?go-get=1: 404 Not Found

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-30 13:51:49 +00:00
Niels de Vos
c17b3f69bd ci: add dependabot config for updating vendored packages
Vendored dependencies need updating on regular basis. This is currently
done manually by developers, but it can be automated by Dependabot. By
dropping the dependabot.yml config file in the .github/ directory the
bot should get enabled.

See-also: https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/enabling-and-disabling-version-updates
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-30 13:51:49 +00:00
Rakshith R
f9d4972444 e2e: fix log msg in retryKubectlInput()
e2elog.Logf("waiting for kubectl (%s -f $q args %s) to finish", action, args)
changed to
e2elog.Logf("waiting for kubectl (%s -f args %s) to finish", action, args)

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-27 07:14:10 +00:00
Rakshith R
dbf2eb3905 e2e: ignore lines with 'Warning' in isAlreadyExistsCliError()
Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-27 07:14:10 +00:00
Humble Chirammal
8ea495ab81 rbd: skip volumeattachment processing if pv marked for deletion
if the volumeattachment has been fetched but marked for deletion
the nbd healer dont want to process further on this pv. This patch
adds a check for pv is marked for deletion and if so, make the
healer skip processing the same

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-26 15:04:19 +00:00
Humble Chirammal
3417fe86e6 doc: update support matrix for deprecated ceph csi releases
As discussed in https://github.com/ceph/ceph-csi/issues/2438
we are marking ceph csi release support to N.(x-1) release versions.

N = latest major release
x = latest minor release

This address the release version support matrix based on the
same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-26 11:39:33 +00:00
Niels de Vos
6d00b39886 cleanup: move log functions to new internal/util/log package
Moving the log functions into its own internal/util/log package makes it
possible to split out the humongous internal/util packages in further
smaller pieces. This reduces the inter-dependencies between utility
functions and components, preventing circular dependencies which are not
allowed in Go.

Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-26 09:34:05 +00:00
Madhu Rajanna
2036b587d7 ci: add github workflow for stale
added github action to check for the
stale issues and PRs. the action will
get scheduled everydata at 21:00 UTC.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-26 11:00:30 +05:30
Madhu Rajanna
630798d95e ci: remove stale bot configuration
This commit removes the stale bot
configration as stale bot repo is not actively
maintained anymore.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-26 11:00:30 +05:30
Niels de Vos
68588dc7df util: fix unit-test for GetClusterMappingInfo()
Unit-testing often fails due to a race condition while writing the
clusterMappingConfigFile from multiple go-routines at the same time.
Failures from `make containerized-test` look like this:

    === CONT  TestGetClusterMappingInfo/site2-storage_cluster-id_mapping
        cluster_mapping_test.go:153: GetClusterMappingInfo() = <nil>, expected data &[{map[site1-storage:site2-storage] [map[1:3]] [map[11:5]]} {map[site3-storage:site2-storage] [map[8:3]] [map[10:5]]}]
    === CONT  TestGetClusterMappingInfo/site3-storage_cluster-id_mapping
        cluster_mapping_test.go:153: GetClusterMappingInfo() = <nil>, expected data &[{map[site3-storage:site2-storage] [map[8:3]] [map[10:5]]}]
    --- FAIL: TestGetClusterMappingInfo (0.01s)
        --- PASS: TestGetClusterMappingInfo/mapping_file_not_found (0.00s)
        --- PASS: TestGetClusterMappingInfo/mapping_file_found_with_empty_data (0.00s)
        --- PASS: TestGetClusterMappingInfo/cluster-id_mapping_not_found (0.00s)
        --- FAIL: TestGetClusterMappingInfo/site2-storage_cluster-id_mapping (0.00s)
        --- FAIL: TestGetClusterMappingInfo/site3-storage_cluster-id_mapping (0.00s)
        --- PASS: TestGetClusterMappingInfo/site1-storage_cluster-id_mapping (0.00s)

By splitting the public GetClusterMappingInfo() function into an
internal getClusterMappingInfo() that takes a filename, unit-testing can
use different files for each go-routine, and testing becomes more
predictable.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-25 16:08:48 +00:00
Madhu Rajanna
b0b46680e3 doc: update development guide for new rules
updated development guide requirement to
have review from contributors and reviewers.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-25 16:33:56 +05:30
Madhu Rajanna
0a7a490496 ci: update mergify rules to include teams
updated mergify rules to consider the teams
approval to merge a PR.

more details at #2367

fixes #2367

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-25 16:33:56 +05:30
Prasanna Kumar Kalever
55d3226d6b e2e: use io-timeout conditionally based on kernel version
We need
https://www.mail-archive.com/linux-block@vger.kernel.org/msg38060.html
inorder to use `--io-timeout=0`. This patch is part of kernel 5.4

Since minikube doesn't have a v5.4 kernel yet, lets use io-timeout value
conditionally based on kernel version at our e2e.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 17:09:09 +00:00
Prasanna Kumar Kalever
1bd2d46cdb e2e: add util to get kernel version from specified container
Currently, we get the kernel version where the e2e (client) executable runs,
not the kernel version that is used by the csi-rbdplugin pod.

Add a function that run `uname -r` command from the specified container and
returns the kernel version.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Suggested-by: Niels de Vos <ndevos@redhat.com>
2021-08-24 17:09:09 +00:00
Prasanna Kumar Kalever
4f40213d8e rbd: fix rbd-nbd io-timeout to never abort
With the tests at CI, it kind of looks like that the IO is timing out after
30 seconds (default with rbd-nbd). Since we have tweaked reattach-timeout
to 300 seconds at ceph-csi, we need to explicitly set io-timeout on the
device too, as it doesn't make any sense to keep
io-timeout < reattach-timeout

Hence we set io-timeout for rbd nbd to 0. Specifying io-timeout 0 tells
the nbd driver to not abort the request and instead see if it can be
restarted on another socket.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Suggested-by: Ilya Dryomov <idryomov@redhat.com>
2021-08-24 17:09:09 +00:00
Prasanna Kumar Kalever
3bf17ade7a doc: update code comments about available timeout options
Adding some code comments to make them readable and easy to understand.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 17:09:09 +00:00
Prasanna Kumar Kalever
7576bf400c doc: update rbd-nbd doc about log path details
Document the changes needed for configuring custom logging path

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
473adf99fc deploy: provide variable to alter hostpath location for ceph clients
Also update the documentation about the same.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
ea3def0db2 rbd: remove per volume rbd-nbd logfiles on detach
- Update the meta stash with logDir details
- Use the same to remove logfile on unstage/unmap to be space efficient

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
d67e88ccd0 cleanup: embed args into struct and pass it to detachRBDImageOrDeviceSpec
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
474100c1f1 rbd: add a unit test for getCephClientLogFileName()
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
682b3a980b rbd: rbd-nbd logging the ceph-CSI way
- One logfile per device/volume
- Add ability to customize the logdir, default: /var/log/ceph

Note: if user customizes the hostpath to something else other than default
/var/log/ceph, then it is his responsibility to update the `cephLogDir`
in storageclass to reflect the same with daemon:

```
cephLogDir: "/var/log/mynewpath"
```

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
0be7024726 rbd: provide host-path for rbd-nbd logging
Problem:
--------
1. rbd-nbd by default logs to /var/log/ceph/ceph-client.admin.log,
Unfortunately, container doesn't have /var/log/ceph directory hence
rbd-nbd is not logging now.
2. Rbd-nbd logs are not persistent across nodeplugin restarts.

Solution:
--------
Provide a host path so that log directory is made available, and the
logs persist on the hostnode across container restarts.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
18f4a51a15 e2e: improve the debug logs for rbd-nbd
Ceph’s logging levels operate on a scale of 1 to 20, where 1 is terse
and 20 is verbose.

Format:
debug-{subsystem} = {log-level}

Setting `rbd` loglevel to 20 at our e2e tests.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-19 20:16:24 +00:00
Rakshith R
7da796dfc1 doc: add Github release badge to README.md
Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-19 11:48:34 +00:00
Niels de Vos
7d04cf4fe7 doc: add tickgit TODO badge
There are TODO and FIXME comments in the Ceph-CSI source code that need
addressing at one point. Adding this TODO badge and link to tickgit to
the main README makes it obvious that some cleanup is needed.

This might invite new contributors to address reported TODOs.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-19 09:37:20 +00:00
Niels de Vos
8447a1feab cleanup: address pylint "consider-using-with" in tracevol.py
pylint started to report errors like the following:

    troubleshooting/tools/tracevol.py:97:10: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)

There probably has been an update of Pylint in the test-container that
is more strict than previous versions.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-19 09:06:17 +00:00
Humble Chirammal
9ac1391d0f util: correct interface name and remove redundancy
ContollerManager had a typo in it, and if we correct it,
linter  will fail and suggest not to use controller.ControllerManager
as the interface name and package name  is redundant, keeping manager
as the interface name which is the practice and also address the
linter issues.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-19 04:19:42 +00:00
Humble Chirammal
763387c8e2 rebase: update external-resizer to v1.3.0 release
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-18 17:05:22 +00:00
Humble Chirammal
e65fbe9862 rebase: make use of v0.10.0 of csi-lib-utils
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-18 17:05:22 +00:00
Humble Chirammal
edf511a833 cephfs: make use of subvolumeInfo.state to determine quota
https://github.com/ceph/go-ceph/pull/455/ added `state` field
to subvolume info struct which helps to identify the snapshot
retention state in the caller. This patch make use of the same

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-18 04:50:46 +00:00
Humble Chirammal
66fa5891b2 cephfs: correct typos in cephfs driver code
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-18 04:50:46 +00:00
Humble Chirammal
ca7809099d rebase: update external-snapshotter client to v4.2.0
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-17 10:58:52 +00:00
Humble Chirammal
68bbd58045 rebase: update sidecars to latest versions
external-provisioner: v2.3.0
external-attacher: v3.3.0
external-snapshotter: v4.2.0
node-driver-registrar: v2.3.0

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-17 10:58:52 +00:00
Humble Chirammal
5089a4ce5d doc: correct some source code comments in rbd driver code
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-17 06:57:09 +00:00
Humble Chirammal
7c2cbf473c doc: update readme for 3.4.0 release
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-17 06:57:09 +00:00
Madhu Rajanna
5562e46d0f rbd: Cleanup OMAP data for secondary image
If the image is in a secondary state and its
up+replaying means its an healthy secondary
and the image is primary somewhere in the remote cluster
and the local image is getting replayed. Delete the
OMAP data generated as we cannot delete the
secondary image. When the image on the primary
cluster gets deleted/mirroring disabled, the image on
all the remote (secondary) clusters will get
auto-deleted. This helps in garbage collecting
the OMAP, PVC and PV objects after failback operation.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-16 17:38:25 +00:00
Madhu Rajanna
fc0d6f6b8b rbd: return succuss if image is healthy secondary
If the image is in secondary state and its
up+replaying means its an healthy secondary
and the image is primary somewhere in the remote
cluster and the local image is getting replayed.
Return success for the Disabling mirroring as
we cannot disable the mirroring on the secondary
state, when the image on the remote site gets
disabled the image on all the remote (secondary)
will get auto deleted. This helps in garbage
collecting the volume replication kuberentes
artifacts

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-16 17:38:25 +00:00
Madhu Rajanna
35324b2e17 rbd: add helper function to get local state
added helper function to check the local image
state is up+replaying.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-16 17:38:25 +00:00