Commit Graph

695 Commits

Author SHA1 Message Date
Humble Chirammal
c584fa20da rbd: use clusterID from volumeContext at nodestage
previously we were retriving clusterID using the monitors field
in the volume context at node stage code path. however it is possible to
retrieve or use clusterID directly from the volume context. This
commit also remove the getClusterIDFromMigrationVolume() function
which was used previously and its tests

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-10-11 10:06:30 +00:00
Humble Chirammal
4e61156dc4 rbd: change iteration variable name in the migration test to be specific
we reuse or overload the variable name in the test execution at present.
This commit use a different variable name as initialized in each run

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-10-11 10:06:30 +00:00
Madhu Rajanna
90ecd2d7e8 rbd: use go-ceph to get mirroring info
use go-ceph api to get image mirroring info.

closes #2558

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-07 08:02:06 +00:00
Madhu Rajanna
8ebc0659ab rbd: perform resize of file system for static volume
For static volume, the user will manually mounts
already existing image as a volume to the application
pods. As its a rbd Image, if the PVC is of type
fileSystem the image will be mapped, formatted
and mounted on the node,
If the user resizes the image on the ceph cluster.
User cannot not automatically resize the filesystem
created on the rbd image. Even if deletes and
recreates the kubernetes objects, the new size
will not be visible on the node.

With this changes During the NodeStageVolumeRequest
the nodeplugin will check the size of the mapped rbd
image on the node using the devicePath. and also
the rbd image size on the ceph cluster.

If the size is not matching it will do the file
system resize on the node as part of the
NodeStageVolumeRequest RPC call.

The user need to do below operation to see new size
* Resize the rbd image in ceph cluster
* Scale down all the application pods using the static
PVC.
* Make sure no application pods which are using the
static PVC is running on a node.
* Scale up all the application pods.

Validate the new size in application pod mounted
volume.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-06 13:15:00 +00:00
Madhu Rajanna
fe9020260d rbd: move flattening to helper function
in NodeStage operation we are flattening
the image to support mounting on the older
clients. this commits moves it to a helper
function to reduce code complexity.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-06 13:15:00 +00:00
Madhu Rajanna
cda2abca5d rbd: use NewMetricsBlock to get size
instead of lsblk command use NewMetricsBlock
function from the kubernetes package to get
the size.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-06 13:15:00 +00:00
Rakshith R
ded75eb099 rbd: copyEncryptionConfig for thickProvisioned snap restore too
This commit adds bugfix to copy encryption passphrase for thick
provisioned PVC restored from snapshot.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-05 07:46:57 +00:00
Rakshith R
59b7a26175 rbd: modify copyEncryptionConfig to accept copyOnlyPassphrase arg
During PVC snapshot/clone both kms config and passphrase needs to copied,
while for PVC restore only passphrase needs to be copied to dest rbdvol
since destination storageclass may have another kms config.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-05 07:46:57 +00:00
Humble Chirammal
3c9d7e3cd5 rbd: detect migration volID in DeleteVolume() and delete rbd image
This commit adds the logic to detect a passed in volumeID
is a migrated volume ID and if yes, the driver connect to the
backend cluster and clean/delete the image. The logic
only applied if its a migration volume ID. The migration volume ID
carry the information like mons, pool and image name which is
good enough for the driver to identify and connect to the backend
cluster for its operations.

migration volID format:
<mig>_mons-<monsHash>_image-<imageUID>_<poolHash>

Details on the hash values:

* MonsHash: this carry a hash value (md5sum) which will be acted as the
`clusterID` for the operations in this context.

* ImageUID: this is the unique UUID generated by kubernetes for the created
volume.

* PoolHash: this is an encoded string of pool name.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-10-04 16:06:31 +00:00
Madhu Rajanna
34a21cdbe3 cleanup: move mount functions to new pkg
moved fuse and kernel mount functions
to a new package.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-23 06:39:37 +00:00
Madhu Rajanna
b1ef842640 cleanup: move core functions to core pkg
as we are refractoring the cephfs code,
Moving all the core functions to a new folder
/pkg called core. This will make things easier
to implement. For now onwards all the core
functionalities will be added to the core
package.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-23 06:39:37 +00:00
Humble Chirammal
4804f47b18 e2e: Add e2e for rbd migration static pvc
This commit adds e2e for rbd migration static PVCs

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-20 09:54:54 +00:00
Humble Chirammal
2e8e8f5e64 rbd: fill clusterID if its a migration nodestage request
the migration nodestage request does not carry the 'clusterID' in it
and only monitors are available with the volumeContext. The volume
context flag 'migration=true' and 'static=true' flags allow us to
fill 'clusterID' from the passed in monitors to the volume Context,so
that rest of the static operations on nodestage can be proceeded as we
do treat static volumes today.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-20 09:54:54 +00:00
Humble Chirammal
1f5963919f util: get clusterID for the passed in mon string
as part of migration support, the clusterID has to be fetched
from passed in mon. Because the intree RBD storage class only
got monitor and not `clusterID` parameter support. However, in
CSI, SC has the `clusterID` parameter support but not mon. Due
to that we have to fetch the clusterID from config file for the
passed in mon and use it in our operations. This adds a helper
function to retrieve clusterID from passed in mon string.

Updates https://github.com/ceph/ceph-csi/issues/2509

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-20 09:54:54 +00:00
Prasanna Kumar Kalever
c9cc36d8db rbd: provide alternatives to preserve the ceph log files
Currently, we delete the ceph client log file on unmap/detach.

This patch provides additional alternatives for users who would like to
persist the log files.

Strategies:
-----------
`remove`: delete log file on unmap/detach
`compress`: compress the log file to gzip on unmap/detach
`preserve`: preserve the log file in text format

Note that the default strategy will be remove on unmap, and these options
can be tweaked from the storage class

Compression size details example:

On Map: (with debug-rbd=20)
---------
$ ls -lh
-rw-r--r-- 1 root root 526K Sep  1 18:15
rbd-nbd-0001-0024-fed5480a-f00f-417a-a51d-31d8a8144c03-0000000000000003-d2e89c87-0b4d-11ec-8ea6-160f128e682d.log

On unmap:
---------
$ ls -lh
-rw-r--r-- 1 root root  33K Sep  1 18:15
rbd-nbd-0001-0024-fed5480a-f00f-417a-a51d-31d8a8144c03-0000000000000003-d2e89c87-0b4d-11ec-8ea6-160f128e682d.gz

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-09-16 13:55:15 +00:00
Prasanna Kumar Kalever
10bbb049f7 cleanup: passing pointers to larger type
Log:
internal/rbd/rbd_attach.go:424:2: hugeParam: dArgs is heavy (88 bytes);
consider passing it by pointer (gocritic)

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-09-16 13:55:15 +00:00
Prasanna Kumar Kalever
ad2c6d2851 util: add gzip helper function
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-09-16 13:55:15 +00:00
Shyamsundar Ranganathan
47dc9cf28d rbd: Report errors when a resync maybe in progress
Currently we return a !ready status if an image
is not found when a replication resync is issued.

We also return a !ready just post issuing a resync.

The change is to ensure we return errors in these
cases for the caller to retry the operation till
we can determine we are actually resyncing, and then
return !ready with nil errors.

Part of addressing:
  https://github.com/csi-addons/volume-replication-operator/issues/101

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
2021-09-15 15:59:22 +00:00
Rakshith R
82d09d81cf util: modify GetMonsAndClusterID() to take clusterID instead of options
This commit:
- modifies GetMonsAndClusterID() to take clusterID instead of options.
- moves out validation of clusterID is set or not out of GetMonsAndClusterID().
- defines ErrClusterIDNotSet new error for reusability.
- add GetClusterID() to obtain clusterID from options.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-09-14 08:39:57 +00:00
Rakshith R
9d1e98ca60 rbd: check for clusterid mapping in genVolFromVolumeOptions()
This commit adds capability to genVolFromVolumeOptions() to fetch
mapped clusted-id & mon ips for mirrored PVC on secondary cluster
which may have different cluster-id.

This is required for NodeStageVolume().

We also don't need to check for mapping during volume create requests,
so it can be disabled by passing a bool checkClusterIDMapping as false.

GetMonsAndClusterID() is modified to accept bool checkClusterIDMapping
based on which clustermapping is checked to fetch mapped cluster-id and
mon-ips.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-09-14 08:39:57 +00:00
Humble Chirammal
4be53a27d3 cleanup: replace parentName to snapParentName in checkReservation
at present, eventhough the checkReservation works for both volume
and snapshot, the arg parentName make sense only for snapshot cases
renaming that arg to more approprite

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-14 05:32:54 +00:00
Humble Chirammal
1fee3ec460 cleanup: correct checkReservation return description
it wrongly mention that the return is imageUUID string where actually
it is the imageData struct

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-14 05:32:54 +00:00
Rakshith R
0a7a7f4866 util: call WriteCephConfig() in cephcsi.go
This commit calls WriteCephConfig() in cephcsi.go to
create ceph.conf and keyring if it is not mounted to
be used by all cli calls and conn cmds.

Before this change, rbd-controller/omap-generator did not create
ceph.conf on startup.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-09-08 16:05:27 +00:00
Madhu Rajanna
8c8f34cf7a rbd: set vaultAuthNamespace to vaultNamespace if empty
When we read the csi-kms-connection-details configmap
vaultAuthNamespace might not be set when we do the
conversion the vaultAuthNamespace might be set to empty
key and this commits check for the empty value of
vaultAuthNamespace and set the vaultAuthNamespace
to vaultNamespace.

setting empty value for vaultAuthNamespace happened due
to Marshalling at https://github.com/ceph/ceph-csi/blob/devel/
internal/kms/vault_tokens.go#L136-L139.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-08 11:18:03 +00:00
Rakshith R
e99dd3dea4 util: read ceph.conf by calling conn.ReadConfigFile(CephConfigPath)
The configurations in cpeh.conf is not picked up by rados connection
automatically, hence we need to call conn.ReadConfigFile before calling
Connect().

Signed-off-by: Rakshith R <rar@redhat.com>
2021-09-07 16:50:12 +00:00
Madhu Rajanna
76f1b42498 cephfs: correct comment for validateExpandVolumeRequest
corrected the function comment for
validateExpandVolumeRequest.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
9fd51d9bec cephfs: add comment for validateCreateVolumeRequest
added function comment for
validateCreateVolumeRequest

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
8caeb409bb cephfs: add comment for validateDeleteVolumeRequest
added function comment for the
validateDeleteVolumeRequest function.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
be7749c90e cleanup: move volumeID to the volumeoptions
volumeID can be moved to the volumeOptions
as most of the volume related helper functions
are available on the volumeoptions.go

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
da70ed50dc cleanup: move execCommandErr to volumemounter
Moved execCommandErr to the volumemounter.go
which is the only caller of this function and
moving the execCommandErr helps in reducing the
util file.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
31696a6ce0 cleanup: move genSnapFromOptions to volumeoptions
moved genSnapFromOptions function to volumeoptions.go
which is more appropriated than util.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Madhu Rajanna
73e2ffe8b8 cleanup: move cephfs csi spec validation to validator
moved the cephfs related validation like
validating the input parameters sent in the
GRPC request to a new file.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-07 14:33:02 +00:00
Humble Chirammal
4efcc5bf97 cleanup: simplify checkStaticVolume function and remove unwanted vars
checkStaticVolume() in the reconcilePV function has been unwantedly
introducing variables to confirm the pv spec is static or not. This
patch simplify it and make a smaller footprint of the functions.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-07 12:51:30 +00:00
Humble Chirammal
df2d9548ae cephfs: no need to check for zero volume size
At present there is a 'todo' to check for zero volume size
in the createVolume request which in unwanted, ie the pvc
creation with size 0 fail from the kubernetes api validation itself:

For ex:

```
..spec.resources[storage]: Invalid value: "0": must be greater than zero```
```
so we dont need any extra check in the controller server

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-07 04:49:24 +00:00
Prasanna Kumar Kalever
9e55f015de rbd: avoid supplying map options on unmap
Thanks to the random unmap failure on my local machine:

I0901 17:08:37.841890 2617035 cephcmds.go:55] ID: 11 Req-ID:
0001-0024-fed5480a-f00f-417a-a51d-31d8a8144c03-0000000000000003-024983f3-0b47-11ec-8fcb-e671f0b9f58e
an error (exit status 22) occurred while running rbd args: [unmap
rbd-pool/csi-vol-024983f3-0b47-11ec-8fcb-e671f0b9f58e --device-type nbd
--options try-netlink --options reattach-timeout=300 --options
io-timeout=0]

Noticed the map args are also getting passed to/as unmap args, which is not
correct. We have separate things for mapOptions and unmapOptions. This PR
makes sure that the map args are not passed at the time of unmap.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-09-06 15:59:30 +00:00
Humble Chirammal
3f31ca8a3a cleanup: introduce populateVolOptions(), to fill rbdVol from stage req
At present the nodeStageVolume() handle many logic of filling rbdvol
struct based on the request received and this method is complex to
follow. with this patch, filling or populating volOptions has been
segregrated and handled hence make the stage functions' job easy.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-06 07:49:03 +00:00
Humble Chirammal
f0b8a3f626 rbd: use String() method of MirrorImageState in return error
MirrorImageState (type C.rbd_mirror_image_state_t) has a string
method which can be used while returning error in the replication
controller. Previously, we were using int return in the error which
is not the proper usage.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-03 16:02:53 +00:00
Madhu Rajanna
4865061ab9 util: create ceph configuration files if not present
create ceph.conf and keyring files if its not
present in the /et/ceph/ path.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-03 14:14:43 +00:00
Humble Chirammal
1d94c12cd6 cleanup: add checkErrAndUndoReserve() for error check,unreserve omap
all the error check scenarios of genVolFromVolID() and unreserving
omap entries based on the error made deleteVolume method complex,
this patch create a new function which handle the error check and
unrerving omap entries accordingly and finally return the response
to deletevolume/caller.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-03 12:20:04 +00:00
Niels de Vos
60c2afbcca util: NewK8sClient() should not panic on non-Kubernetes clusters
When NewK8sClient() detects and error, it used to call FatalLogMsg()
which causes a panic. There are additional features that can be used on
Kubernetes clusters, but these are not a requirement for most
functionalities of the driver.

Instead of causing a panic, returning an error should suffice. This
allows using the driver on non-Kubernetes clusters again.

Fixes: #2452
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-09-02 11:22:14 +00:00
Humble Chirammal
247795517f cephfs: remove explicit size setting of cloned volume
CephFS csi driver explictly set the size of the cloned volume
to the size of parent volume as cephfs mgr was lacking this
functionality previously. However it has been addressed in cephfs
so we dont need explicit size setting.

Ref#https://tracker.ceph.com/issues/46163

Supported Ceph releases:

Ceph versions equal or above - v16.0.0, v15.2.9, v14.2.12

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-09-01 09:32:29 +00:00
Madhu Rajanna
b383af20b4 cleanup: move cephfs errors to new util package
As part of the refactoring, moving the cephfs errors file to a new
package.

Updates: #852
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-09-01 06:50:16 +00:00
Rakshith R
99168dc822 rbd: check for clusterid mapping in RegenerateJournal()
This commit adds fetchMappedClusterIDAndMons() which returns
monitors and clusterID info after checking cluster mapping info.
This is required for regenerating omap entries in mirrored cluster
with different clusterID.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-31 14:30:06 +00:00
Rakshith R
496bcba85c rbd: move GetMappedID() to util package
This commit moves getMappedID() from rbd to util
package since it is not rbd specific and exports
it from there.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-08-31 14:30:06 +00:00
Niels de Vos
4a3b1181ce cleanup: move KMS functionality into its own package
A new "internal/kms" package is introduced, it holds the API that can be
consumed by the RBD components.

The KMS providers are currently in the same package as the API. With
later follow-up changes the providers will be placed in their own
sub-package.

Because of the name of the package "kms", the types, functions and
structs inside the package should not be prefixed with KMS anymore:

    internal/kms/kms.go:213:6: type name will be used as kms.KMSInitializerArgs by other packages, and that stutters; consider calling this InitializerArgs (golint)

Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-30 16:31:40 +00:00
Niels de Vos
778b5e86de cleanup: move k8s functions to the util/k8s package
By placing the NewK8sClient() function in its own package, the KMS API
can be split from the "internal/util" package. Some of the KMS providers
use the NewK8sClient() function, and this causes circular dependencies
between "internal/utils" -> "internal/kms" -> "internal/utils", which
are not alowed in Go.

Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-30 16:31:40 +00:00
Humble Chirammal
8ea495ab81 rbd: skip volumeattachment processing if pv marked for deletion
if the volumeattachment has been fetched but marked for deletion
the nbd healer dont want to process further on this pv. This patch
adds a check for pv is marked for deletion and if so, make the
healer skip processing the same

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-08-26 15:04:19 +00:00
Niels de Vos
6d00b39886 cleanup: move log functions to new internal/util/log package
Moving the log functions into its own internal/util/log package makes it
possible to split out the humongous internal/util packages in further
smaller pieces. This reduces the inter-dependencies between utility
functions and components, preventing circular dependencies which are not
allowed in Go.

Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-26 09:34:05 +00:00
Niels de Vos
68588dc7df util: fix unit-test for GetClusterMappingInfo()
Unit-testing often fails due to a race condition while writing the
clusterMappingConfigFile from multiple go-routines at the same time.
Failures from `make containerized-test` look like this:

    === CONT  TestGetClusterMappingInfo/site2-storage_cluster-id_mapping
        cluster_mapping_test.go:153: GetClusterMappingInfo() = <nil>, expected data &[{map[site1-storage:site2-storage] [map[1:3]] [map[11:5]]} {map[site3-storage:site2-storage] [map[8:3]] [map[10:5]]}]
    === CONT  TestGetClusterMappingInfo/site3-storage_cluster-id_mapping
        cluster_mapping_test.go:153: GetClusterMappingInfo() = <nil>, expected data &[{map[site3-storage:site2-storage] [map[8:3]] [map[10:5]]}]
    --- FAIL: TestGetClusterMappingInfo (0.01s)
        --- PASS: TestGetClusterMappingInfo/mapping_file_not_found (0.00s)
        --- PASS: TestGetClusterMappingInfo/mapping_file_found_with_empty_data (0.00s)
        --- PASS: TestGetClusterMappingInfo/cluster-id_mapping_not_found (0.00s)
        --- FAIL: TestGetClusterMappingInfo/site2-storage_cluster-id_mapping (0.00s)
        --- FAIL: TestGetClusterMappingInfo/site3-storage_cluster-id_mapping (0.00s)
        --- PASS: TestGetClusterMappingInfo/site1-storage_cluster-id_mapping (0.00s)

By splitting the public GetClusterMappingInfo() function into an
internal getClusterMappingInfo() that takes a filename, unit-testing can
use different files for each go-routine, and testing becomes more
predictable.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-08-25 16:08:48 +00:00
Prasanna Kumar Kalever
4f40213d8e rbd: fix rbd-nbd io-timeout to never abort
With the tests at CI, it kind of looks like that the IO is timing out after
30 seconds (default with rbd-nbd). Since we have tweaked reattach-timeout
to 300 seconds at ceph-csi, we need to explicitly set io-timeout on the
device too, as it doesn't make any sense to keep
io-timeout < reattach-timeout

Hence we set io-timeout for rbd nbd to 0. Specifying io-timeout 0 tells
the nbd driver to not abort the request and instead see if it can be
restarted on another socket.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Suggested-by: Ilya Dryomov <idryomov@redhat.com>
2021-08-24 17:09:09 +00:00