Commit Graph

683 Commits

Author SHA1 Message Date
Humble Chirammal
2e8e8f5e64 rbd: fill clusterID if its a migration nodestage request
the migration nodestage request does not carry the 'clusterID' in it
and only monitors are available with the volumeContext. The volume
context flag 'migration=true' and 'static=true' flags allow us to
fill 'clusterID' from the passed in monitors to the volume Context,so
that rest of the static operations on nodestage can be proceeded as we
do treat static volumes today.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-20 09:54:54 +00:00
Humble Chirammal
1f5963919f util: get clusterID for the passed in mon string
as part of migration support, the clusterID has to be fetched
from passed in mon. Because the intree RBD storage class only
got monitor and not `clusterID` parameter support. However, in
CSI, SC has the `clusterID` parameter support but not mon. Due
to that we have to fetch the clusterID from config file for the
passed in mon and use it in our operations. This adds a helper
function to retrieve clusterID from passed in mon string.

Updates https://github.com/ceph/ceph-csi/issues/2509

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-20 09:54:54 +00:00
Prasanna Kumar Kalever
c9cc36d8db rbd: provide alternatives to preserve the ceph log files
Currently, we delete the ceph client log file on unmap/detach.

This patch provides additional alternatives for users who would like to
persist the log files.

Strategies:
-----------
`remove`: delete log file on unmap/detach
`compress`: compress the log file to gzip on unmap/detach
`preserve`: preserve the log file in text format

Note that the default strategy will be remove on unmap, and these options
can be tweaked from the storage class

Compression size details example:

On Map: (with debug-rbd=20)
---------
$ ls -lh
-rw-r--r-- 1 root root 526K Sep  1 18:15
rbd-nbd-0001-0024-fed5480a-f00f-417a-a51d-31d8a8144c03-0000000000000003-d2e89c87-0b4d-11ec-8ea6-160f128e682d.log

On unmap:
---------
$ ls -lh
-rw-r--r-- 1 root root  33K Sep  1 18:15
rbd-nbd-0001-0024-fed5480a-f00f-417a-a51d-31d8a8144c03-0000000000000003-d2e89c87-0b4d-11ec-8ea6-160f128e682d.gz

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-09-16 13:55:15 +00:00
Prasanna Kumar Kalever
10bbb049f7 cleanup: passing pointers to larger type
Log:
internal/rbd/rbd_attach.go:424:2: hugeParam: dArgs is heavy (88 bytes);
consider passing it by pointer (gocritic)

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-09-16 13:55:15 +00:00
Prasanna Kumar Kalever
ad2c6d2851 util: add gzip helper function
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-09-16 13:55:15 +00:00
Shyamsundar Ranganathan
47dc9cf28d rbd: Report errors when a resync maybe in progress
Currently we return a !ready status if an image
is not found when a replication resync is issued.

We also return a !ready just post issuing a resync.

The change is to ensure we return errors in these
cases for the caller to retry the operation till
we can determine we are actually resyncing, and then
return !ready with nil errors.

Part of addressing:
  https://github.com/csi-addons/volume-replication-operator/issues/101

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
2021-09-15 15:59:22 +00:00
Rakshith R
82d09d81cf util: modify GetMonsAndClusterID() to take clusterID instead of options
This commit:
- modifies GetMonsAndClusterID() to take clusterID instead of options.
- moves out validation of clusterID is set or not out of GetMonsAndClusterID().
- defines ErrClusterIDNotSet new error for reusability.
- add GetClusterID() to obtain clusterID from options.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-09-14 08:39:57 +00:00
Rakshith R
9d1e98ca60 rbd: check for clusterid mapping in genVolFromVolumeOptions()
This commit adds capability to genVolFromVolumeOptions() to fetch
mapped clusted-id & mon ips for mirrored PVC on secondary cluster
which may have different cluster-id.

This is required for NodeStageVolume().

We also don't need to check for mapping during volume create requests,
so it can be disabled by passing a bool checkClusterIDMapping as false.

GetMonsAndClusterID() is modified to accept bool checkClusterIDMapping
based on which clustermapping is checked to fetch mapped cluster-id and
mon-ips.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-09-14 08:39:57 +00:00
Humble Chirammal
4be53a27d3 cleanup: replace parentName to snapParentName in checkReservation
at present, eventhough the checkReservation works for both volume
and snapshot, the arg parentName make sense only for snapshot cases
renaming that arg to more approprite

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-14 05:32:54 +00:00
Humble Chirammal
1fee3ec460 cleanup: correct checkReservation return description
it wrongly mention that the return is imageUUID string where actually
it is the imageData struct

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-14 05:32:54 +00:00
Rakshith R
0a7a7f4866 util: call WriteCephConfig() in cephcsi.go
This commit calls WriteCephConfig() in cephcsi.go to
create ceph.conf and keyring if it is not mounted to
be used by all cli calls and conn cmds.

Before this change, rbd-controller/omap-generator did not create
ceph.conf on startup.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-09-08 16:05:27 +00:00
Madhu Rajanna
8c8f34cf7a rbd: set vaultAuthNamespace to vaultNamespace if empty
When we read the csi-kms-connection-details configmap
vaultAuthNamespace might not be set when we do the
conversion the vaultAuthNamespace might be set to empty
key and this commits check for the empty value of
vaultAuthNamespace and set the vaultAuthNamespace
to vaultNamespace.

setting empty value for vaultAuthNamespace happened due
to Marshalling at https://github.com/ceph/ceph-csi/blob/devel/
internal/kms/vault_tokens.go#L136-L139.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-08 11:18:03 +00:00
Rakshith R
e99dd3dea4 util: read ceph.conf by calling conn.ReadConfigFile(CephConfigPath)
The configurations in cpeh.conf is not picked up by rados connection
automatically, hence we need to call conn.ReadConfigFile before calling
Connect().

Signed-off-by: Rakshith R <rar@redhat.com>
2021-09-07 16:50:12 +00:00
Madhu Rajanna
76f1b42498 cephfs: correct comment for validateExpandVolumeRequest
corrected the function comment for
validateExpandVolumeRequest.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
9fd51d9bec cephfs: add comment for validateCreateVolumeRequest
added function comment for
validateCreateVolumeRequest

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
8caeb409bb cephfs: add comment for validateDeleteVolumeRequest
added function comment for the
validateDeleteVolumeRequest function.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
be7749c90e cleanup: move volumeID to the volumeoptions
volumeID can be moved to the volumeOptions
as most of the volume related helper functions
are available on the volumeoptions.go

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
da70ed50dc cleanup: move execCommandErr to volumemounter
Moved execCommandErr to the volumemounter.go
which is the only caller of this function and
moving the execCommandErr helps in reducing the
util file.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
31696a6ce0 cleanup: move genSnapFromOptions to volumeoptions
moved genSnapFromOptions function to volumeoptions.go
which is more appropriated than util.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
73e2ffe8b8 cleanup: move cephfs csi spec validation to validator
moved the cephfs related validation like
validating the input parameters sent in the
GRPC request to a new file.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Humble Chirammal
4efcc5bf97 cleanup: simplify checkStaticVolume function and remove unwanted vars
checkStaticVolume() in the reconcilePV function has been unwantedly
introducing variables to confirm the pv spec is static or not. This
patch simplify it and make a smaller footprint of the functions.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-07 12:51:30 +00:00
Humble Chirammal
df2d9548ae cephfs: no need to check for zero volume size
At present there is a 'todo' to check for zero volume size
in the createVolume request which in unwanted, ie the pvc
creation with size 0 fail from the kubernetes api validation itself:

For ex:

```
..spec.resources[storage]: Invalid value: "0": must be greater than zero```
```
so we dont need any extra check in the controller server

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-07 04:49:24 +00:00
Prasanna Kumar Kalever
9e55f015de rbd: avoid supplying map options on unmap
Thanks to the random unmap failure on my local machine:

I0901 17:08:37.841890 2617035 cephcmds.go:55] ID: 11 Req-ID:
0001-0024-fed5480a-f00f-417a-a51d-31d8a8144c03-0000000000000003-024983f3-0b47-11ec-8fcb-e671f0b9f58e
an error (exit status 22) occurred while running rbd args: [unmap
rbd-pool/csi-vol-024983f3-0b47-11ec-8fcb-e671f0b9f58e --device-type nbd
--options try-netlink --options reattach-timeout=300 --options
io-timeout=0]

Noticed the map args are also getting passed to/as unmap args, which is not
correct. We have separate things for mapOptions and unmapOptions. This PR
makes sure that the map args are not passed at the time of unmap.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-09-06 15:59:30 +00:00
Humble Chirammal
3f31ca8a3a cleanup: introduce populateVolOptions(), to fill rbdVol from stage req
At present the nodeStageVolume() handle many logic of filling rbdvol
struct based on the request received and this method is complex to
follow. with this patch, filling or populating volOptions has been
segregrated and handled hence make the stage functions' job easy.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-06 07:49:03 +00:00
Humble Chirammal
f0b8a3f626 rbd: use String() method of MirrorImageState in return error
MirrorImageState (type C.rbd_mirror_image_state_t) has a string
method which can be used while returning error in the replication
controller. Previously, we were using int return in the error which
is not the proper usage.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-03 16:02:53 +00:00
Madhu Rajanna
4865061ab9 util: create ceph configuration files if not present
create ceph.conf and keyring files if its not
present in the /et/ceph/ path.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-03 14:14:43 +00:00
Humble Chirammal
1d94c12cd6 cleanup: add checkErrAndUndoReserve() for error check,unreserve omap
all the error check scenarios of genVolFromVolID() and unreserving
omap entries based on the error made deleteVolume method complex,
this patch create a new function which handle the error check and
unrerving omap entries accordingly and finally return the response
to deletevolume/caller.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-03 12:20:04 +00:00
Niels de Vos
60c2afbcca util: NewK8sClient() should not panic on non-Kubernetes clusters
When NewK8sClient() detects and error, it used to call FatalLogMsg()
which causes a panic. There are additional features that can be used on
Kubernetes clusters, but these are not a requirement for most
functionalities of the driver.

Instead of causing a panic, returning an error should suffice. This
allows using the driver on non-Kubernetes clusters again.

Fixes: #2452
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-09-02 11:22:14 +00:00
Humble Chirammal
247795517f cephfs: remove explicit size setting of cloned volume
CephFS csi driver explictly set the size of the cloned volume
to the size of parent volume as cephfs mgr was lacking this
functionality previously. However it has been addressed in cephfs
so we dont need explicit size setting.

Ref#https://tracker.ceph.com/issues/46163

Supported Ceph releases:

Ceph versions equal or above - v16.0.0, v15.2.9, v14.2.12

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-01 09:32:29 +00:00
Madhu Rajanna
b383af20b4 cleanup: move cephfs errors to new util package
As part of the refactoring, moving the cephfs errors file to a new
package.

Updates: #852
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-01 06:50:16 +00:00
Rakshith R
99168dc822 rbd: check for clusterid mapping in RegenerateJournal()
This commit adds fetchMappedClusterIDAndMons() which returns
monitors and clusterID info after checking cluster mapping info.
This is required for regenerating omap entries in mirrored cluster
with different clusterID.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-31 14:30:06 +00:00
Rakshith R
496bcba85c rbd: move GetMappedID() to util package
This commit moves getMappedID() from rbd to util
package since it is not rbd specific and exports
it from there.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-31 14:30:06 +00:00
Niels de Vos
4a3b1181ce cleanup: move KMS functionality into its own package
A new "internal/kms" package is introduced, it holds the API that can be
consumed by the RBD components.

The KMS providers are currently in the same package as the API. With
later follow-up changes the providers will be placed in their own
sub-package.

Because of the name of the package "kms", the types, functions and
structs inside the package should not be prefixed with KMS anymore:

    internal/kms/kms.go:213:6: type name will be used as kms.KMSInitializerArgs by other packages, and that stutters; consider calling this InitializerArgs (golint)

Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-30 16:31:40 +00:00
Niels de Vos
778b5e86de cleanup: move k8s functions to the util/k8s package
By placing the NewK8sClient() function in its own package, the KMS API
can be split from the "internal/util" package. Some of the KMS providers
use the NewK8sClient() function, and this causes circular dependencies
between "internal/utils" -> "internal/kms" -> "internal/utils", which
are not alowed in Go.

Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-30 16:31:40 +00:00
Humble Chirammal
8ea495ab81 rbd: skip volumeattachment processing if pv marked for deletion
if the volumeattachment has been fetched but marked for deletion
the nbd healer dont want to process further on this pv. This patch
adds a check for pv is marked for deletion and if so, make the
healer skip processing the same

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-26 15:04:19 +00:00
Niels de Vos
6d00b39886 cleanup: move log functions to new internal/util/log package
Moving the log functions into its own internal/util/log package makes it
possible to split out the humongous internal/util packages in further
smaller pieces. This reduces the inter-dependencies between utility
functions and components, preventing circular dependencies which are not
allowed in Go.

Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-26 09:34:05 +00:00
Niels de Vos
68588dc7df util: fix unit-test for GetClusterMappingInfo()
Unit-testing often fails due to a race condition while writing the
clusterMappingConfigFile from multiple go-routines at the same time.
Failures from `make containerized-test` look like this:

    === CONT  TestGetClusterMappingInfo/site2-storage_cluster-id_mapping
        cluster_mapping_test.go:153: GetClusterMappingInfo() = <nil>, expected data &[{map[site1-storage:site2-storage] [map[1:3]] [map[11:5]]} {map[site3-storage:site2-storage] [map[8:3]] [map[10:5]]}]
    === CONT  TestGetClusterMappingInfo/site3-storage_cluster-id_mapping
        cluster_mapping_test.go:153: GetClusterMappingInfo() = <nil>, expected data &[{map[site3-storage:site2-storage] [map[8:3]] [map[10:5]]}]
    --- FAIL: TestGetClusterMappingInfo (0.01s)
        --- PASS: TestGetClusterMappingInfo/mapping_file_not_found (0.00s)
        --- PASS: TestGetClusterMappingInfo/mapping_file_found_with_empty_data (0.00s)
        --- PASS: TestGetClusterMappingInfo/cluster-id_mapping_not_found (0.00s)
        --- FAIL: TestGetClusterMappingInfo/site2-storage_cluster-id_mapping (0.00s)
        --- FAIL: TestGetClusterMappingInfo/site3-storage_cluster-id_mapping (0.00s)
        --- PASS: TestGetClusterMappingInfo/site1-storage_cluster-id_mapping (0.00s)

By splitting the public GetClusterMappingInfo() function into an
internal getClusterMappingInfo() that takes a filename, unit-testing can
use different files for each go-routine, and testing becomes more
predictable.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-25 16:08:48 +00:00
Prasanna Kumar Kalever
4f40213d8e rbd: fix rbd-nbd io-timeout to never abort
With the tests at CI, it kind of looks like that the IO is timing out after
30 seconds (default with rbd-nbd). Since we have tweaked reattach-timeout
to 300 seconds at ceph-csi, we need to explicitly set io-timeout on the
device too, as it doesn't make any sense to keep
io-timeout < reattach-timeout

Hence we set io-timeout for rbd nbd to 0. Specifying io-timeout 0 tells
the nbd driver to not abort the request and instead see if it can be
restarted on another socket.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Suggested-by: Ilya Dryomov <idryomov@redhat.com>
2021-08-24 17:09:09 +00:00
Prasanna Kumar Kalever
3bf17ade7a doc: update code comments about available timeout options
Adding some code comments to make them readable and easy to understand.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 17:09:09 +00:00
Prasanna Kumar Kalever
ea3def0db2 rbd: remove per volume rbd-nbd logfiles on detach
- Update the meta stash with logDir details
- Use the same to remove logfile on unstage/unmap to be space efficient

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
d67e88ccd0 cleanup: embed args into struct and pass it to detachRBDImageOrDeviceSpec
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
474100c1f1 rbd: add a unit test for getCephClientLogFileName()
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Prasanna Kumar Kalever
682b3a980b rbd: rbd-nbd logging the ceph-CSI way
- One logfile per device/volume
- Add ability to customize the logdir, default: /var/log/ceph

Note: if user customizes the hostpath to something else other than default
/var/log/ceph, then it is his responsibility to update the `cephLogDir`
in storageclass to reflect the same with daemon:

```
cephLogDir: "/var/log/mynewpath"
```

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-08-24 07:15:30 +00:00
Humble Chirammal
9ac1391d0f util: correct interface name and remove redundancy
ContollerManager had a typo in it, and if we correct it,
linter  will fail and suggest not to use controller.ControllerManager
as the interface name and package name  is redundant, keeping manager
as the interface name which is the practice and also address the
linter issues.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-19 04:19:42 +00:00
Humble Chirammal
edf511a833 cephfs: make use of subvolumeInfo.state to determine quota
https://github.com/ceph/go-ceph/pull/455/ added `state` field
to subvolume info struct which helps to identify the snapshot
retention state in the caller. This patch make use of the same

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-18 04:50:46 +00:00
Humble Chirammal
66fa5891b2 cephfs: correct typos in cephfs driver code
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-18 04:50:46 +00:00
Humble Chirammal
5089a4ce5d doc: correct some source code comments in rbd driver code
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-17 06:57:09 +00:00
Madhu Rajanna
5562e46d0f rbd: Cleanup OMAP data for secondary image
If the image is in a secondary state and its
up+replaying means its an healthy secondary
and the image is primary somewhere in the remote cluster
and the local image is getting replayed. Delete the
OMAP data generated as we cannot delete the
secondary image. When the image on the primary
cluster gets deleted/mirroring disabled, the image on
all the remote (secondary) clusters will get
auto-deleted. This helps in garbage collecting
the OMAP, PVC and PV objects after failback operation.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-16 17:38:25 +00:00
Madhu Rajanna
fc0d6f6b8b rbd: return succuss if image is healthy secondary
If the image is in secondary state and its
up+replaying means its an healthy secondary
and the image is primary somewhere in the remote
cluster and the local image is getting replayed.
Return success for the Disabling mirroring as
we cannot disable the mirroring on the secondary
state, when the image on the remote site gets
disabled the image on all the remote (secondary)
will get auto deleted. This helps in garbage
collecting the volume replication kuberentes
artifacts

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-16 17:38:25 +00:00
Madhu Rajanna
35324b2e17 rbd: add helper function to get local state
added helper function to check the local image
state is up+replaying.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-08-16 17:38:25 +00:00