Commit Graph

1263 Commits

Author SHA1 Message Date
riya-singhal31
304194a0c0 cleanup: migration of volrep to csi-addons
This commit moves the volrep logic from internal/rbd to
internal/csi-addons/rbd.

Signed-off-by: riya-singhal31 <rsinghal@redhat.com>
2023-04-21 13:05:20 +00:00
Niels de Vos
37c8f07ed5 rbd: do not run mkfs on a BlockMode volume
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2023-03-08 16:26:39 +00:00
Niels de Vos
a4678200e5 rbd: allow setting mkfsOptions in the StorageClass
Add `mkfsOptions` to the StorageClass and pass them to the `mkfs`
command while creating the filesystem on the RBD device.

Fixes: #374
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2023-03-08 16:26:39 +00:00
Niels de Vos
13cdb08e61 rbd: cleanup passing mkfs arguments for NodeStageVolume
Storing the default `mkfs` arguments in a map with key per filesystem
type makes this a little more modular. It prepares th code for fetching
the `mkfs` arguments from the VolumeContext.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2023-03-08 16:26:39 +00:00
riya-singhal31
b28b5e6c84 cephfs: use shallow volumes for the ROX accessMode
this commit makes shallow volume as default feature for ROX volumes.

Signed-off-by: riya-singhal31 <rsinghal@redhat.com>
2023-02-21 20:09:13 +00:00
Rakshith R
95682522ee rbd: add capability to automatically enable read affinity
This commit makes use of crush location labels from node
labels to supply `crush_location` and `read_from_replica=localize`
options during rbd map cmd. Using these options, ceph
will be able to redirect reads to the closest OSD,
improving performance.

Signed-off-by: Rakshith R <rar@redhat.com>
2023-02-14 08:29:46 +00:00
Madhu Rajanna
3967e4dae9 cleanup: fix static checks
fix SA1019 static check to replace
io/utils with os package and sets
with generic sets

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2023-02-03 08:55:43 +00:00
Madhu Rajanna
e9e33fb851 cleanup: fix static checks
fix SA1019 static check to replace
io/utils with os package

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2023-02-02 14:53:59 +00:00
Madhu Rajanna
e54a97ba85 rbd: discover if StagingTargetPath in NodeExpandVolume
The StagingTargetPath is an optional entry in
NodeExpandVolumeRequest, We cannot expect it to be
set always and at the same time cephcsi depended
on the StaingTargetPath to retrieve some metadata
information.

This commit will check all the mount ref and identifies
the stagingTargetPath by checking the image-meta.json
file exists and this is a costly operation as we need to
loop through all the mounts and check image-meta.json
in each mount but this is happens only if the
StaingTargetPath is not set in the NodeExpandVolumeRequest

fixes #3623

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2023-01-31 08:20:36 +00:00
Madhu Rajanna
d5278bd6c5 rbd: set disableInUseChecks on rbd volume
set disableInUseChecks on rbd volume struct
as it will be used later to check whether
the rbd image is allowed to mount on multiple
nodes.

fixes: #3604

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2023-01-11 16:24:07 +00:00
Madhu Rajanna
f7796081d3 cephfs: skip expand for BackingSnapshot volume
We should not call ExpandVolume for the BackingSnapshot
subvolume as there wont be any real subvolume created for
it and even if we call it the ExpandVolume will fail
fail as there is no real subvolume exists.

This commits fixes by adjusting the `if` check to ensure
that ExpandVolume will only be called either the
VolumeRequest is to create from a snapshot or volume
and BackingSnapshot is not true.

sample code here https://go.dev/play/p/PI2tNii5tTg

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-12-21 12:29:06 +00:00
Marcel Lauhoff
0bf8646340 cephfs: nolint:gocyclo NewVolumeOptions, NewVolumeOptionsFromVolID
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-11-23 12:21:02 +00:00
Marcel Lauhoff
4788d279a5 cephfs: fscrypt encryption support
Add Ceph FS fscrypt support, similar to the RBD/ext4 fscrypt
integration. Supports encrypted PVCs, snapshots and clones.

Requires kernel and Ceph MDS support that is currently not in any
stable release.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-11-23 12:21:02 +00:00
Humble Chirammal
71c4ae542c rebase: remove protobuf dependency locking
this commit remove the protobuf dependency locking in the module
description.

Also, ptypes.TimestampProto is deprecated and this commit
make use of the timestamppb.New() for the construction.

ParseTime() function has been removed and callers adjusted to the
same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-11-15 00:10:46 +00:00
Madhu Rajanna
d12400aa9c rbd: unset metadata if setmetadata is false
We need to unset the metadata on the clone
and restore PVC if the parent PVC was created
when setmetadata was set to true and it was
set to false when restore and clone pvc was
created.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-11-14 14:41:36 +00:00
Rakshith R
eb21d75ef7 rbd: ignore stdErr for ceph osd blocklist when there is no error
`ceph osd blocklist range add/rm <ip>` cmd is outputting
"blocklisting cidr:10.1.114.75:0/32 until 202..." messages
incorrectly into stdErr. This commit ignores stdErr when err
is nil.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-11-12 04:20:14 +00:00
Humble Chirammal
d70b594946 rbd: remove false error check in getDeviceSize
this removed err condition will be always false as error
is always nil.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-11-09 15:35:45 +00:00
Rakshith R
8650538b78 rbd: setup encryption if rbdVol exits during CreateVol
This commit adds code to setup encryption on a rbdVol
being repaired in a followup CreateVolume request.
This is fixes a bug wherein encryption metadata may not
have been set in previous request due to container restart.

Fixes: #3402

Signed-off-by: Rakshith R <rar@redhat.com>
2022-11-07 12:49:18 +00:00
Madhu Rajanna
07e9dede2c rbd: check volume details from original volumeID
Checking volume details for the existing volumeID
first. if details like OMAP, RBD Image, Pool doesnot
exists try to use clusterIDMapping to look for the
correct informations.

fixes: #2929

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-11-04 06:32:05 +00:00
Madhu Rajanna
3e1f60244e rbd: check for empty lastSyncTime
Sometime the json unmarshal might
get success and return empty time
stamp. add a check to make sure the
time is not zero always.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-11-03 08:10:19 +00:00
Madhu Rajanna
8f25edc888 rbd: return error if last sync time not present
As per the csiaddon spec last sync time is
required parameter in the GetVolumeReplicationInfo
if we are failed to parse the description, return
not found error message instead of nil
which is empty response

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-11-03 08:10:19 +00:00
Madhu Rajanna
07aa9dea5c rbd: update namespace name in rados object
If a PV is reattached to a new PVC in a different
namespace we need to update the namespace name
in the rados object.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-28 15:50:01 +00:00
Madhu Rajanna
019628c8c2 rbd: update namespace name in metadata
If a PV is reattached to a new PVC in a different
namespace we need to update the namespace name
in the rbd image metadata.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-28 15:50:01 +00:00
Madhu Rajanna
848e3ee557 rbd: return abnormal in NodeGetVolumeStats
When we do stat on the targetpath, if there is
any error we can check is it due to corruption.
If yes, cephcsi can return abnormal in the
NodeGetVolumeStats so that consumer (CO/admin)
and detect and take further action.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-26 09:40:22 +00:00
Madhu Rajanna
44d4546480 cephfs: return abnormal in NodeGetVolumeStats
When we do stat on the targetpath, if there is
any error we can check is it due to corruption.
If yes, cephcsi can return abnormal in the
NodeGetVolumeStats so that consumer (CO/admin)
and detect and take further action.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-26 09:40:22 +00:00
Madhu Rajanna
f12fa3ee56 rbd: return GRPC error from GRPC method
GRPC methods should only return GRPC errors
if any error occurs.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-19 08:00:42 +00:00
Madhu Rajanna
302fead713 cephfs: delete subvolume if SetAllMetadata fails
To avoid subvolume leaks if the SetAllMetadata
operations fails delete the subvolume.
If any operation fails after creating the subvolume
we will remove the omap as the omap gets
removed we will need to remove the subvolume to
avoid stale resources.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-18 15:10:18 +00:00
Marcel Lauhoff
5a55419025 cephfs: Add placeholder journal fscrypt support
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
dc7ba684e3 rbd: Use EncryptionTypeNone
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
2abfafdf3f util: Add EncryptionTypeNone and unit tests
Add type none to distinguish disabled encryption (positive result)
from invalid configuration (negative result).

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
da76d8ddae kms: Add GetSecret() to KMIP KMS
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
1f1504479c rbd: Add context to fscrypt errors
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
3e3af4da18 rbd: support file encrypted snapshots
Support fscrypt on RBD snapshots

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
82d92aab4a rbd: Add volume journal encryption support
Add fscrypt support to the journal to support operations like
snapshotting.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
a7ea12eb8e rbd: Handle encryption type default at a more meaningful place
Different places have different meaningful fallback. When parsing
from user we should default to block, when parsing stored config we
should default to invalid and handle that as an error.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
1fa842277a rbd: fscrypt file encryption support
Integrate basic fscrypt functionality into RBD initialization. To
activate file encryption instead of block introduce the new
'encryptionType' storage class key.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
f1f50e0218 fscrypt: fix metadata directory permissions
Call Mount.Setup with SingleUserWritable constant instead of 0o755,
which is silently ignored and causes the /.fscrypt/{policy,protector}/
directories to have mode 000.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
4e38bdac10 fscrypt: fsync encrypted dir after setting policy [workaround]
Revert once our google/fscrypt dependency is upgraded to a version
that includes https://github.com/google/fscrypt/pull/359 gets accepted

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
33c33a8b49 fscrypt: Use constant protector name
Use constant protector name 'ceph-csi' instead of constant prefix
concatenated with the volume ID. When cloning volumes the ID changes
and fscrypt protected directories become inunlockable due to the
protector name change

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
97cb1b6672 fscrypt: Update mount info before create context
NewContextFrom{Mountpoint,Path} functions use cached
`/proc/self/mountinfo` to find mounted file systems by device ID.
Since we run fscrypt as a library in a long-lived process the cached
information is likely to be stale. Stale entries may map device IDs to
mount points of already destroyed RBDs and fail context creation.
Updating the cache beforehand prevents this.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
a52314356e fscrypt: Determine best supported fscrypt policy on node init
Currently fscrypt supports policies version 1 and 2. 2 is the best
choice and was the only choice prior to this commit. This adds support
for kernels < 5.4, by selecting policy version 1 there.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
dd0e1988c0 fscrypt: Fetch passphrase when keyFn is invoked not created
Fetch password when keyFn is invoked, not when it is created. This
allows creation of the keyFn before actually creating the passphrase.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
a6a4282493 fscrypt: Unlock: Fetch keys early
Fetch keys from KMS before doing anything else. This will catch KMS
errors before setting up any fscrypt metadata.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
cfea8d7562 fscrypt: fscrypt integration
Integrate google/fscrypt into Ceph CSI KMS and encryption setup. Adds
dependencies to google/fscrypt and pkg/xattr. Be as generic as
possible to support integration with both RBD and Ceph FS.

Add the following public functions:

InitializeNode: per-node initialization steps. Must be called
before Unlock at least once.

Unlock: All steps necessary to unlock an encrypted directory including
setting it up initially.

IsDirectoryUnlocked: Test if directory is really encrypted

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
2cf8ecc6c7 journal: Store encryptionType in Config struct
Add encryptionType next to kmsID to support both block and file
encryption.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
ce9fbb3474 rbd: Rename encryption to blockEncryption prep for fscrypt
In preparation of fscrypt support for RBD filesystems, rename block
encryption related function to include the word 'block'. Add struct
fields and IsFileEncrypted.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
624905d60d kms: Add basic GetSecret() test
Add rudimentary test to ensure that we can get a valid passphrase from
the GetSecret() feature

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
5df45f1c1b kms: testing: add KMS test dummy registry
Add registry similar to the providers one. This allows testers to
add and use GetKMSTestDummy() to create stripped down provider
instances suitable for use in unit tests.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
cb02a9beb9 kms: Add GetSecret() to metadata KMS
Add GetSecret() to allow direct access to passphrases without KDF and
wrapping by a DEKStore.

This will be used by fscrypt, which has its own KDF and wrapping. It
will allow users to take a k8s secret, for example, and use that
directly as a password in fscrypt.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
0599089de0 util: Add util to fetch encryption type from vol options
Fetch encryption type from vol options. Make fallback type
configurable to support RBD (default block) and Ceph FS (default file)

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
fe4821435e util: Make encryption passphrase size a parameter
fscrypt support requires keys longer than 20 bytes. As a preparation,
make the new passphrase length configurable, but default to 20 bytes.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Madhu Rajanna
69eb6e40dc rbd: return GRPC error message
The error message return from the GRPC
should be of GRPC error messages only
not the normal go errors. This commits
returns GRPC error if setAllMetadata
fails.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-17 15:17:29 +00:00
Madhu Rajanna
01d4a614c3 rbd: delete volume if setallmetadata fails
If any operations fails after the volume creation
we will cleanup the omap objects, but it is missing
if setAllMetadata fails. This commits adds the code
to cleanup the rbd image if metadata operation fails.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-17 15:17:29 +00:00
Madhu Rajanna
b40e8894f8 cephfs: use errors.As instead of errors.Is
As we need to compare the error type instead
of the error value we need to use errors.As
to check the API is implemented or not.

fixes: #3347

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-17 09:11:45 +00:00
Niels de Vos
b7703faf37 util: make inode metrics optional in FilesystemNodeGetVolumeStats()
CephFS does not have a concept of "free inodes", inodes get allocated
on-demand in the filesystem.

This confuses alerting managers that expect a (high) number of free
inodes, and warnings get produced if the number of free inodes is not
high enough. This causes alerts to always get reported for CephFS.

To prevent the false-positive alerts from happening, the
NodeGetVolumeStats procedure for CephFS (and CephNFS) will not contain
inodes in the reply anymore.

See-also: https://bugzilla.redhat.com/2128263
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-10-13 19:02:47 +00:00
Madhu Rajanna
71e5b3f922 rbd: remove dummy image workaround
To address the problem that snapshot
schedules are triggered for volumes
that are promoted, a dummy image was
disabled/enabled for replication.
This was done as a workaround, because the
promote operation was not triggering
the schedules for the image being promoted.

The bugs related to the same have been fixed in
RBD mirroring functionality and hence the
workaround #2656 can be removed from the code base.

ceph tracker https://tracker.ceph.com/issues/53914

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-10 08:22:10 +00:00
Yati Padia
36b061d426 rbd: get description from remote status
This commit gets the description from remote status
instead of local status.
Local status doesn't have ',' due to which we get
array index out of range panic.

Fixes: #3388

Signed-off-by: Yati Padia <ypadia@redhat.com>
Co-authored-by: shyam Ranganathan <srangana@redhat.com>
2022-09-14 12:06:01 +00:00
yati1998
b19705f260 rbd: implements getVolumeReplicationInfo
This commit implements getVolumeReplicationInfo
to get the last sync time and update it in volume
replication CR.

Signed-off-by: yati1998 <ypadia@redhat.com>
2022-09-13 14:17:10 +00:00
Rakshith R
a57859dfa4 rbd: use blocklist range cmd, fallback if it fails
This commit adds blocklist range cmd feature,
while fallbacks to old blocklist one ip at a
time if the cmd is invalid(not available).

Signed-off-by: Rakshith R <rar@redhat.com>
2022-09-13 13:10:32 +00:00
Prashanth Dintyala
2a6487cbf5 rbd: create token and use it for vault SA everytime possible
use TokenRequest API by default for vault SA even with K8s versions < 1.24

Signed-off-by: Prashanth Dintyala <vdintyala@nvidia.com>
2022-09-09 10:13:32 +00:00
Madhu Rajanna
76064d8e34 cephfs: retry subvolumegroup creation
Incase the  subvolumegroup is deleted
and recreated we need to restart the
cephcsi provisioner pod to clear cache
that cephcsi maintains. With this PR
if cephcsi sees NotFound error duing
subvolume creation it will reset the cache
for that filesystem so that in next RPC
call cephcsi will try to create the
subvolumegroup again

Ref: https://github.com/rook/rook/issues/10623

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-09-07 18:24:30 +00:00
Madhu Rajanna
e56621cd66 cephfs: fix subvolumegroup creation for multiple fs
In a cluster we can have multiple filesystem
for that we need to have a map of
subvolumegroups to check filesystem is created
nor not.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-09-07 18:24:30 +00:00
Madhu Rajanna
71dbc7dbb4 rbd: map only primary image
If the image is mirroring enabled
and primary consider it for mapping,
if the image is mirroring enabled but
not primary yet. return error message
until the image is marked as primary.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-09-06 10:40:12 +00:00
Madhu Rajanna
038462ff43 cephfs: return success if metadata operation not supported
If the ceph cluster is of older version and doesnot
support metadata operation, Instead of failing
the request return the success if metadata
operation is not supported.

fixes #3347

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-08-29 18:37:53 +00:00
Rakshith R
40134772a7 rbd: modify stripSecret mechanism in logGRPC()
This commit updates csi-addons spec version
and modifies logging to strip replication
request secret using csi.StripSecret, then
with replication.protosanitizer if the former
fails. This is done in order to make sure
we strip csi and replication format of secrets.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-29 11:18:15 +00:00
Rakshith R
f47839d73d rbd: improve kmip verifyResponse() error message
This commit uses %q instead %v in error messages
and adds result reason and message in kmip
verifyresponse().

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-24 07:58:57 +00:00
Rakshith R
eaa0e14cb2 rbd: fix bug in kmip kms Decrypt function
This commit fixes a bug in kmip kms Decrypt
function, where emd.DEK was fed in a Nonce
instead of emd.Nonce by mistake.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-24 07:58:57 +00:00
Niels de Vos
b697b9b0d9 cleanup: replace github.com/pborman/uuid with github.com/google/uuid
The github.com/google/uuid package is used by Kubernetes, and it is part
of the vendor/ directory already. Our usage of github.com/pborman/uuid
can be replaced by github.com/google/uuid, so that
github.com/pborman/uuid can be removed as a dependency.

Closes: #3315
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-22 14:34:25 +00:00
Rakshith R
19e4146fab rbd: add replication capability & service to csiaddons server
csi-addons server will advertise replication capability and
replication service will run with csi-addons server too.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-18 08:19:20 +00:00
Rakshith R
0c33a33d5c rbd: add kmip encryption type
The Key Management Interoperability Protocol (KMIP)
is an extensible communication protocol
that defines message formats for the manipulation
of cryptographic keys on a key management server.
Ceph-CSI can now be configured to connect to
various KMS using KMIP for encrypting RBD volumes.

https://en.wikipedia.org/wiki/Key_Management_Interoperability_Protocol

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-18 07:41:42 +00:00
Madhu Rajanna
dde21543bd cephfs: fix staticcheck comment
getting is unused for linter "staticcheck"
(nolintlint) error message due to wrong
comment format. this the format now with
`//directive // comment`

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-08-10 17:51:26 +00:00
Rakshith R
d39d2cffcc cleanup: use index instead of value while iterating
This commit cleans up for loop to use index to access
value instead of copying value into a new variable
while iterating.
```
internal/util/csiconfig.go:103:2: rangeValCopy: each \
iteration copies 136 bytes (consider pointers or indexing) \
(gocritic)
        for _, cluster := range config {
```

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-09 13:36:03 +00:00
Rakshith R
3d3c029471 nfs: add nodeserver within cephcsi
This commit adds nfs nodeserver capable of
mounting nfs volumes, even with pod networking
using NSenter design similar to rbd and cephfs.
NodePublish, NodeUnpublish, NodeGetVolumeStats
and NodeGetCapabilities have been implemented.

The nodeserver implementation has been inspired
from https://github.com/kubernetes-csi/csi-driver-nfs,
which was previously used for mounted cephcsi exported
nfs volumes. The current implementation is also
backward compatible for the previously created
PVCs.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-09 13:36:03 +00:00
Shyamsundar Ranganathan
c2280011d1 rbd: Report remote peer readiness if Up and status.Unknown
Current code uses an !A && !B condition incorrectly to
test A:Up and B:status for a remote peer image.

This should be !A || !B as we require both conditions to
be in the specified state (Up: true, and status Unknown).

This is corrected by this commit, and further fixes:
- check and return ready only when a remote site is
found in the status output
- check if all peer sites are ready, if multiple are found
and return ready appropriately

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
2022-08-09 05:32:15 +00:00
Madhu Rajanna
8d7b6ee59f rbd: consider mirror deamon state for ResyncVolume
During ResyncVolume we check if the image
is in an error state, and we resync.
After resync, the image will move to
either the `Error` or the `Resyncing` state.
And if the image is in the above two
conditions, we will return a successful
response and Ready=false so that the
consumer can wait until the volume is
ready to use. If the image is in any
other state we return an error message
to indicate the syncing is not going on.
The whole resync and image state change
depends on the rbd mirror daemon. If the
mirror daemon is not running, the image
can be in Resyncing or Unknown state.
The Ramen marks the volume replication as
secondary, and once the resync starts, it
will delete the volume replication CR as a
cleanup process.

As we dont have a check for the rbd mirror
daemon, we are returning a resync success
response and Ready=false. Due to this false
response Ramen is assuming the resync started
and deleted the volume replication CR, and
because of this, the cluster goes into a bad
state and needs manual intervention.

fixes #3289

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-08-08 13:26:15 +00:00
Niels de Vos
83df1eae53 rebase: k8s.io/mount-utils/IsNotMountPoint() is deprecated
IsNotMountPoint() is deprecated and Mounter.IsMountPoint() is
recommended to be used instead.

Reported-by: golangci/staticcheck
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-04 09:53:07 +00:00
Niels de Vos
10b2277330 util: use k8s.io/mount-utils/NewWithoutSystemd() to prevent logging
NewWithoutSystemd() has been introduced in the k8s.io/mount-utils
package so that systemd is not called while executing functions. This
offers consumers the ability to prevent confusing and scary messages
from getting logged.

See-also: kubernetes/kubernetes#111218
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-04 09:53:07 +00:00
Niels de Vos
3a200b6976 rbd: use IsLikelyNotMountPoint() to prevent systemd log messages
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-04 09:53:07 +00:00
Niels de Vos
0a173a8a9e nfs: make DeleteVolume (more) idempotent
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-03 19:43:16 +00:00
Humble Chirammal
bc9ad3d9f1 rbd: add dummy attacher implementation
previously, it was a requirement to have attacher sidecar for CSI
drivers and there had an implementation of dummy mode of operation.
However skipAttach implementation has been stabilized and the dummy
mode of operation is going to be removed from the external-attacher.
Considering this driver  work on volumeattachment objects for NBD driver
use cases, we have to implement dummy controllerpublish and unpublish
and thus keep supporting our operations even in absence of dummy mode
of operation in the sidecar.

This commit make a NOOP controller publish and unpublish for RBD driver.

CephFS driver does not require attacher and it has already been made free
from the attachment operations.

    Ref# https://github.com/ceph/ceph-csi/pull/3149
    Ref# https://github.com/kubernetes-csi/external-attacher/issues/226

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-08-03 00:25:49 +00:00
Prasanna Kumar Kalever
30244bf11b cephfs: snapshots honor --setmetadata option
`--setmetadata` is false by default, honoring it
will keep the metadata disabled by default

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
14d6211d6d cephfs: subvolumes honor --setmetadata option
`--setmetadata` is false by default, honoring it
will keep the metadata disabled by default

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
de7128b3a2 cephfs: Add clusterName as metadata on snapshots
Example:
sh-4.4$ ceph fs subvolume snapshot metadata ls myfs csi-vol-ba248f9e-0e75-11ed-b774-8e97192ff5ec \
			csi-snap-ce24e3bb-0e75-11ed-b774-8e97192ff5ec --group_name csi
{
    "csi.ceph.com/cluster/name": "\"K8s-cluster-1\"",
    "csi.storage.k8s.io/volumesnapshot/name": "cephfs-pvc-snapshot",
    "csi.storage.k8s.io/volumesnapshot/namespace": "rook-ceph",
    "csi.storage.k8s.io/volumesnapshotcontent/name": "snapcontent-2e89e1b2-e6e9-48fe-b365-edb493d7022e"
}

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
856d7c264c cephfs: handle metadata op-failures with unsupported ceph versions
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
5f36f7e8bd cephfs: update subvolume snapshot metadata if snapshot already exists.
Make sure to set metadata when subvolume snapshot exist, i.e. if the
provisioner pod is restarted while createSnapShot is in progress, say it
created the subvolume snapshot but didn't yet set the metadata.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
7c9259a45e cephfs: set metadata on the subvolume snapshot on create
Set snapshot-name/snapshot-namespace/snapshotcontent-name details
on subvolume snapshots as metadata on create.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
8c0dd482fa cephfs: add set/Remove subvolume snapshot metadata utility functions
Add utility functions to set/Remove
snapshot-name/snapshot-namespace/snapshotcontent-name metadata on
subvolume snapshots.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
51099d60fe cephfs: handle metadata op-failures with unsupported ceph versions
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
11d51ed9b0 cephfs: unset cluster Name metadata
unsets the cluster name metadata key and value on the subvolume

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
21d811096b cephfs: set cluster Name as metadata on the subvolume
This change helps read the cluster name from the cmdline args,
the provisioner will set the same on the subvolume.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
466bdf97b2 cephfs: set metadata on restart of provisioner pod
Make sure to set metadata when subvolume exist, i.e. if the provisioner pod
is restarted while createVolume is in progress, say it created the subvolume
but didn't yet set the metadata.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
6bcb8ecc68 cephfs: set PV/PVC details on the subvolume as metadata on create
This helps Monitoring solutions without access to Kubernetes clusters to
display the details of the PV/PVC/NameSpace in their dashboard.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
ecf03eb6ae cephfs: add set/Get/List/Remove metadata utility functions
Add utility functions to set/Get/List/Remove PV/PVC/PVCNamespace metadata
on subvolume.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Madhu Rajanna
8c5563a9bc rbd: remove checkHealthyPrimary check
After Failover of workloads to the secondary
cluster when the primary cluster is down,
RBD Image is not marked healthy, and VR
resources are not promoted to the Primary,
In VolumeReplication, the `CURRENT STATE`
remains Unknown and doesn't change to Primary.

This happens because the primary cluster went down,
and we have force promoted the image on the
secondary cluster. and the image stays in
up+stopping_replay or could be any other states.
Currently assumption was that the image will
always be `up+stopped`. But the image will be in
`up+stopped` only for planned failover and it
could be in any other state if its a forced
failover. For this reason, removing
checkHealthyPrimary from the PromoteVolume RPC call.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-27 09:04:27 +00:00
Niels de Vos
011d4fc81c cleanup: create k8s.io/mount-utils Mounter only once
Recently the k8s.io/mount-utils package added more runtime dectection.
When creating a new Mounter, the detect is run every time. This is
unfortunate, as it logs a message like the following:

```
mount_linux.go:283] Detected umount with safe 'not mounted' behavior
```

This message might be useful, so it probably good to keep it.

In Ceph-CSI there are various locations where Mounter instances are
created. Moving that to the DefaultNodeServer type reduces it to a
single place. Some utility functions need to accept the additional
parameter too, so that has been modified as well.

See-also: kubernetes/kubernetes#109676
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-21 07:14:43 +00:00
takeaki-matsumoto
1025871021 cephfs: Support mount option on nodeplugin
add mount options on nodeplugin side

Signed-off-by: takeaki-matsumoto <takeaki.matsumoto@linecorp.com>
2022-07-18 22:04:12 +00:00
Madhu Rajanna
ceb88d6498 cephfs: remove extra check for restore size
Looks like cephfs snapshot size is buggy and its
getting removed in ceph fs. we cannot get the size
of the snapshot during CreateVolume call, so we cannot
do any size check at CreateVolume to check if the
restore size is smaller or not.

As we are removing this check it also fixes #3147
but we dont have any validation at CSI level for
smaller restore we need to depend on kubernetes
external-provisioner for it.

fixes: #3147

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-18 10:04:14 +00:00
Madhu Rajanna
f171143135 cephfs: round to cephfs size to multiple of 4Mib
Due to the bug in the df stat we need to round off
the subvolume size to align with 4Mib.

Note:- Minimum supported size in cephcsi is 1Mib,
we dont need to take care of Kib.

fixes #3240

More details at https://github.com/ceph/ceph/pull/46905

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-13 18:32:40 +00:00
Humble Chirammal
1856647506 cephfs: go with default permissions while creating subvolumes
While creating subvolumes, CephFS driver set the mode to `777`
and pass it along to go ceph apis which cause the subvolume
permission to be on 777, however if we create a subvolume
directly in the ceph cluster, the default permission bits are
set which is 755 for the subvolume. This commit try to stick
to the default behaviour even while creating the subvolume.

This also means that we can work with fsgrouppolicy set to
`File` in csiDriver object which is also addressed in this commit.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-07-13 06:49:58 +00:00
Benoît Knecht
507844c9b1 rbd: Use rados namespace when getting clone depth
When the Ceph user is restricted to a specific namespace in the pool, it is
crucial that evey interaction with the cluster is done within that namespace.
This wasn't the case in `getCloneDepth()`.

This issue was causing snapshot creation to fail with

> Failed to check and update snapshot content: failed to take snapshot of the
> volume X: "rpc error: code = Internal desc = rbd: ret=-1, Operation not
> permitted"

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2022-07-07 22:20:29 +00:00
Niels de Vos
14ba1498bf util: reduce systemd related errors while mounting
There are regular reports that identify a non-error as the cause of
failures. The Kubernetes mount-utils package has detection for systemd
based environments, and if systemd is unavailable, the following error
is logged:

    Cannot run systemd-run, assuming non-systemd OS
    systemd-run output: System has not been booted with systemd as init
    system (PID 1). Can't operate.
    Failed to create bus connection: Host is down, failed with: exit status 1

Because of the `failed` and `exit status 1` error message, users might
assume that the mounting failed. This does not need to be the case. The
container-images that the Ceph-CSI projects provides, do not use
systemd, so the error will get logged with each mount attempt.

By using the newer MountSensitiveWithoutSystemd() function from the
mount-utils package where we can, the number of confusing logs get
reduced.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-04 10:02:54 +00:00
Niels de Vos
a1ed6207f6 cephfs: report detailed error message on clone failure
go-ceph provides a new GetFailure() method to retrieve details errors
when cloning failed. This is now included in the `cephFSCloneState`
struct, which was a simple string before.

While modifying the `cephFSCloneState` struct, the constants have been
removed, as go-ceph provides them as well.

Fixes: #3140
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-06-30 19:33:41 +00:00
Yati Padia
5c40f1ef33 rbd: remove the clone in case of failure
This commit removes the clone incase
unsetAllMetadata or copyEncryptionConfig or
expand fails for createVolumeFromSnapshot
and CreateSnapshot.
It also removes the clone in case of
any failure in createCloneFromImage.

issue: #3103

Signed-off-by: Yati Padia <ypadia@redhat.com>
2022-06-30 05:50:16 +00:00
Prasanna Kumar Kalever
9fa3c8382b cleanup: reduce struct padding
internal/rbd/rbd_util.go:89:15: struct of size 312 bytes could be of
size 304 bytes:
``
struct{
	RbdImageName   	string,
	ImageID        	string,
	VolID          	string,
	Monitors       	string,
	JournalPool    	string,
	Pool           	string,
	RadosNamespace 	string,
	ClusterID      	string,
	RequestName    	string,
	NamePrefix     	string,
	ParentName     	string,
	ParentPool     	string,
	ClusterName    	string,
	Owner          	string,
	VolSize        	int64,
	StripeCount    	uint64,
	StripeUnit     	uint64,
	ObjectSize     	uint64,
	ImageFeatureSet	github.com/ceph/go-ceph/rbd.FeatureSet,
	encryption
*github.com/ceph/ceph-csi/internal/util.VolumeEncryption,
	CreatedAt
*google.golang.org/protobuf/types/known/timestamppb.Timestamp,
	conn
*github.com/ceph/ceph-csi/internal/util.ClusterConnection,
	ioctx          	*github.com/ceph/go-ceph/rados.IOContext,
	Primary        	bool,
	EnableMetadata 	bool,
}
`` (maligned)
type rbdImage struct {
              ^}`
make: *** [Makefile:118: go-lint] Error 1

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Prasanna Kumar Kalever
29a3f4acf6 cleanup: ReconcilePersistentVolume consider passing it by pointer
Address: hugeParam linter

internal/controller/persistentvolume/persistentvolume.go:59:7:
hugeParam: r is heavy (80 bytes); consider passing it by pointer
(gocritic)
[...]
internal/controller/persistentvolume/persistentvolume.go:135:7:
hugeParam: r is heavy (80 bytes); consider passing it by pointer
(gocritic)
func (r ReconcilePersistentVolume) reconcilePV(ctx context.Context, obj
runtime.Object) error {}

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Prasanna Kumar Kalever
caf4090657 rbd: provide option to disable setting metadata on rbd images
As we added support to set the metadata on the rbd images created for
the PVC and volume snapshot, by default metadata is set on all the images.

As we have seen we are hitting issues#2327 a lot of times with this,
we start to leave a lot of stale images. Currently, we rely on
`--extra-create-metadata=true` to decide to set the metadata or not,
we cannot set this option to false to disable setting metadata because we
use this for encryption too.

This changes is to provide an option to disable setting the image
metadata when starting cephcsi.

Fixes: #3009
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Madhu Rajanna
8a47904e8f rbd: add unit test for checkHealthyPrimary
Removed the code in checkHealthyPrimary which
makes the ceph call, passing it as input now.
Added unit test for checkHealthyPrimary function

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-28 13:17:11 +00:00
Madhu Rajanna
53e76fab69 rbd: fix checkHealthyPrimary to consider up+stopped state
we need to check for image should be in up+stopped state
not anyone of the state for that the we need to use
OR check not the AND check.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-28 13:17:11 +00:00
Madhu Rajanna
704cb5c941 revert: rbd: consider remote image health for primary
When the image is force promoted to primary on the
cluster the remote image might not be in replaying
state because due to the split brain state. This
PR reverts back the commit
c3c87f2ef3. Which we added
to check the remote image status.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-28 13:17:11 +00:00
Prasanna Kumar Kalever
1da446d2f2 rbd: healer detect Kubernetes version for right StagingTargetPath
Kubernetes 1.24 and newer use a different path for staging the volume.
That means the CSI-driver is requested to mount the volume at an other
location, compared to previous versions of Kubernetes. CSI-drivers
implementing the volumeHealer, must receive the correct path, otherwise
the after a nodeplugin restart the NBD mounts will bailout attempting
to NodeStageVolume() call and return an error.

See-also: kubernetes/kubernetes#107065

Fixes: #3176
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-24 12:23:29 +00:00
Madhu Rajanna
3acaa018db rbd: issue resync only if the force flag is set
During failover we do demote the volume on the primary
as the image is still not promoted yet on the remote cluster,
there are spurious split-brain errors reported by RBD,
the Cephcsi resync will attempt to resync from the "known"
secondary and that will cause data loss

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-23 13:28:18 +00:00
Madhu Rajanna
7a2dd4c3cf rbd: create token and use it for vault SA
create the token if kubernetes version in
1.24+ and use it for vault sa.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Rakshith R <rar@redhat.com>
2022-06-17 11:37:59 +00:00
Robert Vasek
fd7559a903 cephfs: added support for snapshot-backed volumes
This commit implements most of
docs/design/proposals/cephfs-snapshot-shallow-ro-vol.md design document;
specifically (de-)provisioning of snapshot-backed volumes, mounting such
volumes as well as mounting pre-provisioned snapshot-backed volumes.

Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
2022-06-16 09:44:27 +00:00
Robert Vasek
0807fd2e6c journal: added csi.volume.backingsnapshotid image attribute
Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
2022-06-16 09:44:27 +00:00
Madhu Rajanna
4b57cc3ec5 rbd: add support for rbd striping
RBD supports creating rbd images with
object size, stripe unit and stripe count
to support striping. This PR adds the support
for the same.

More details about striping at
https://docs.ceph.com/en/quincy/man/8/rbd/#striping

fixes: #3124

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-09 18:59:00 +00:00
Prasanna Kumar Kalever
09a8e5e9e6 rbd: unset cluster Name metadata
unsets the cluster name metadata key and value on the RBD image

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-08 16:23:59 +00:00
Prasanna Kumar Kalever
2880c25fd6 rbd: set cluster Name as metadata on the image
This change helps read the cluster name from the cmdline args,
the provisioner will set the same on the RBD images.

Fixes: #2973
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-08 16:23:59 +00:00
Prasanna Kumar Kalever
deb003e605 cleanup: use prefix instead of hardcoding csiParameterPrefix
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-08 16:23:59 +00:00
Madhu Rajanna
1952a9b4b3 ci: fix all linter errors found in golangci-lint
Fixing all the linter errors found in golang-ci
lint v1.46.2

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-03 12:55:54 +00:00
Madhu Rajanna
c9943320ac cephfs: skip NetNamespaceFilePath if the volume is pre-provisioned
In case of pre-provisioned volume the clusterID is
not set in the volume context as the clusterID is missing
we cannot extract the NetNamespaceFilePath from the
configuration file. For static volume and dynamically
provisioned volume the clusterID is set.

Note:- This is a special case to support mounting PV
without clusterID parameter.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-03 07:25:25 +00:00
Rakshith R
7688306f87 rbd: use vaultAuthPath variable name in error msg
Before the change, the error msg was the following:
```
failed to set VAULT_AUTH_MOUNT_PATH in Vault config: path is empty
```
`vaultAuthPath` is the actual variable name set by the
user. The error message will now be the following:
```
failed to set "vaultAuthPath" in vault config: path is empty
```

Signed-off-by: Rakshith R <rar@redhat.com>
2022-05-26 07:37:48 +00:00
Rakshith R
894c20f792 nfs: add support for pvc-pvc clone
This commit adds support for pvc-pvc clone.
Only capability needed to be advertised, the
underlying support is already provided by cephfs
backend.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-05-24 18:13:02 +00:00
Rakshith R
24515b509f nfs: add support for create & delete snapshot
This commits adds support for creation and
deletion of nfs snapshots based on cephfs.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-05-24 18:13:02 +00:00
Prasanna Kumar Kalever
6470cf3343 rbd: fix bug handling GetKrbdSupportedFeatures()
continue running rbd driver when /sys/bus/rbd/supported_features file is
missing, do not bailout.

Fixes: #2678
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-05-15 15:10:08 +00:00
Prasanna Kumar Kalever
83cc1b0e58 rbd: handle when krbdFeatures is zero
krbdFeatures is set to zero when kernel version < 3.8, i.e. in  case where
/sys/bus/rbd/supported_features is absent and we are unable to prepare
the krbd attributes based on kernel version.

When krbdFeatures is set to zero fallback to NBD only when autofallback
is turned ON.

Fixes: #2678
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-05-15 15:10:08 +00:00
Prasanna Kumar Kalever
e53fd87154 rbd: prepare krbd feature attrs if supported_features file is absent
Upstream /sys/bus/rbd/supported_features is part of Linux kernel v4.11.0
Prepare the attributes and use them in case if
/sys/bus/rbd/supported_features is missing.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-05-15 15:10:08 +00:00
Prasanna Kumar Kalever
27f503c144 rbd: unset parent PVC metadata on CreateVolume From Volume
Unset the parent PVC metadata on the temp clone rbd image

Fixes: #2970
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-05-12 15:54:09 +00:00
Prasanna Kumar Kalever
e0f34a6d60 rbd: unset snapshot metadata on CreateVolume From snapshot
Unset the snapshot metadata from the rbd image created from the snapshot

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-05-12 15:54:09 +00:00
Prasanna Kumar Kalever
d89c5fb39f rbd: unset PVC metadata on CreateSnapshot
Unset the PVC metadata on the rbd image created for the snapshot

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-05-12 15:54:09 +00:00
Prasanna Kumar Kalever
bac33262ae rbd: add unset volume/snapshot metadata utility functions
Added
GetVolumeMetadataKeys()
GetSnaoshotMetadataKeys()
unsetVolumeMetadata() and
unsetSnapshotMetadata()

functions.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-05-12 15:54:09 +00:00
Prasanna Kumar Kalever
1fd5277b3c cleanup: simplify setVolumeMetadata and rename it
Move k8s.GetVolumeMetadata() out of setVolumeMetadata() and rename it to
setAllMetadata() so that the same can be used for setting volume and
snapshot metadata.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-05-12 15:54:09 +00:00
Niels de Vos
36e51402cb nfs: support ExpandVolume CSI procedure
There is not much the NFS-provisioner needs to do to expand a volume,
everything is handled by the CephFS components.

NFS does not need a resize on the node, so only ControllerExpandVolume
is required.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-05-10 17:43:59 +00:00
Madhu Rajanna
70674565df rbd: consider rbd as default mounter if not set
For the default mounter the mounter option
will not be set in the storageclass and as it is
not available in the storageclass same will not
be set in the volume context, Because of this the
mapOptions are getting discarded. If the mounter
is not set assuming it's an rbd mounter.

Note:- If the mounter is not set in the storageclass
we can set it in the volume context explicitly,
Doing this check-in node server to support backward
existing volumes and the check is minimal we are not
altering the volume context.

fixes: #3076

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-05-09 20:00:11 +00:00
Marcus Röder
a95a6213eb util: support systems using the new cgroup v2 structure
With cgroup v2, the location of the pids.max file changed and so did the
/proc/self/cgroup file

new /proc/self/cgroup file
`
0::/user.slice/user-500.slice/session-14.scope
`

old file:
`
11:pids:/user.slice/user-500.slice/session-2.scope
10:blkio:/user.slice
9:net_cls,net_prio:/
8:perf_event:/
...
`

There is no directory per subsystem (e.g. /sys/fs/cgroup/pids) any more, all
files are now in one directory.

fixes: https://github.com/ceph/ceph-csi/issues/3085

Signed-off-by: Marcus Röder <m.roeder@yieldlab.de>
2022-05-07 20:38:48 +00:00
Rakshith R
f1ccc4eced rbd: support pvc-pvc clone with different sc & encryption
This commit makes modification so as to allow pvc-pvc clone
with different storageclass having different encryption
configs.
This commit also modifies `copyEncryptionConfig()` to
include a `isEncrypted()` check within the function.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-05-06 10:32:21 +00:00
Rakshith R
bd57feb26e rbd: use vaultAuthPath variable name in error msg
Before the change, the error msg was the following:
```
failed to set VAULT_AUTH_MOUNT_PATH in Vault config: path is empty
```
`vaultAuthPath` is the actual variable name set by the
user. The error message will now be the following:
```
failed to set "vaultAuthPath" in vault config: path is empty
```

Signed-off-by: Rakshith R <rar@redhat.com>
2022-05-05 05:49:31 +00:00
Niels de Vos
9d7faf850f nfs: delete the CephFS volume when the export is already removed
In case the NFS-export has already been removed from the NFS-server, but
the CSI Controller was restarted, a retry to remove the NFS-volume will
fail with an error like:

> GRPC error: ....: response status not empty: "Export does not exist"

When this error is reported, assume the NFS-export was already removed
from the NFS-server configuration, and continue with deleting the
backend volume.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-05-04 21:31:06 +00:00
Madhu Rajanna
d2bc9743f7 cephfs: add netNamespaceFilePath for CephFS
as same host directory is not shared between
the cephfs and the rbd plugin pod. we need
to keep the netNamespaceFilePath separately
for both cephfs and rbd. CephFS plugin will
use this path to execute mount -t commands.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-19 12:28:46 +00:00
Madhu Rajanna
eb4bfb7326 cleanup: use block comment for ClusterInfo example
Adjusted the mix of tabs and the spaces and also
used block comment for better readability.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-19 12:28:46 +00:00
Madhu Rajanna
b4acbd08a5 rbd: move radosNamespace to RBD section
As radosNamespace is more specific to
RBD not the general ceph configuration. Now
we introduced a new RBD section for RBD specific
options, Moving the radosNamespace to RBD section
and keeping the radosNamespace still under the
global ceph level configration for backward
compatibility.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-19 12:28:46 +00:00
Madhu Rajanna
766346868e util: Add RBD specific options in clusterInfo
As the netNamespaceFilePath can be separate for
both cephfs and rbd adding the netNamespaceFilePath
path for RBD, This will help us to keep RBD and
CephFS specific options separately.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-19 12:28:46 +00:00
Niels de Vos
2b71aac752 nfs: return gRPC status from CephFS CreateVolume failure
The NFS Controller returns a non-gRPC error in case the CreateVolume
call for the CephFS volume fails. It is better to return the gRPC-error
that the CephFS Controller passed along.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-04-19 08:23:16 +00:00
Humble Chirammal
fcd0f4713a cleanup: correct typos in test description and source code
this commit correct typos in various places.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-04-18 10:29:08 +00:00
Humble Chirammal
4c4879ba8b cleanup: remove import alias for fence library
this commit remove unneeded import alias of fence library
from the network_fence test.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-04-18 10:29:08 +00:00
Madhu Rajanna
c245436ec4 util: fix logging in ExecuteCommandWithNSEnter
log the nsenter and its argument after executing
the command with the nsenter CLI.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-14 12:17:21 +00:00
Niels de Vos
28369702d2 nfs: use go-ceph API for creating/deleting exports
Recent versions of Ceph allow calling the NFS-export management
functions over the go-ceph API.

This seems incompatible with older versions that have been tested with
the `ceph nfs` commands that this commit replaces.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-04-14 08:01:45 +00:00
Madhu Rajanna
d886ab0d66 rbd: use leases for leader election
use leases for leader election instead
of the deprecated configmap based leader
election.

This PR is making leases as default leader election
refer https://github.com/kubernetes-sigs/
controller-runtime/pull/1773, default from configmap
to configmap leases was done with
https://github.com/kubernetes-sigs/
controller-runtime/pull/1144.

Release notes https://github.com/kubernetes-sigs/
controller-runtime/releases/tag/v0.7.0

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-14 06:46:50 +00:00
Madhu Rajanna
64a9b1fa59 rbd: consider remote image health for primary
To consider the image is healthy during the Promote
operation currently we are checking only the image
state on the primary site. If the network is flaky
or the remote site is down the image health is
not as expected. To make sure the image is healthy
across the clusters check the state on both local
and the remote clusters.

some details:
https://bugzilla.redhat.com/show_bug.cgi?id=2014495

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-13 08:37:23 +00:00
Madhu Rajanna
dffb6e72c2 rbd: check nbd tool features only for rbd driver
calling setRbdNbdToolFeatures inside an init
gets called in main.go for both cephfs and rbd
driver. instead of calling it in init function
calling this in rbd driver.go as this is specific
to rbd.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-11 21:18:27 +00:00
Humble Chirammal
959df4dbac doc: correct typos in struct field comments and release.md
corrected strings in the release guide and util server.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-04-11 06:23:25 +00:00
Prasanna Kumar Kalever
41fe2c7dda rbd: set metadata on the snapshot
Set snapshot-name/snapshot-namespace/snapshotcontent-name details
on RBD backend snapshot image as metadata on snapshot

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-04-08 15:43:14 +00:00
Prasanna Kumar Kalever
0ef79c6fc0 rbd: set metadata on restart of provisioner pod
Make sure to set metadata when image exist, i.e. if the provisioner pod
is restarted while createVolume is in progress, say it created the image
but didn't yet set the metadata.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-04-08 15:43:14 +00:00
Prasanna Kumar Kalever
ae5925f04c rbd: update PV/PVC metadata on a reattach of PV
Example if a PVC was delete by setting `persistentVolumeReclaimPolicy` as
`Retain` on PV, and PV is reattached to a new PVC, we make sure to update
PV/PVC image metadata on a PV reattach.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-04-08 15:43:14 +00:00
Prasanna Kumar Kalever
0119d69ab2 rbd: set PV/PVC details on the image as metadata on create
This helps Monitoring solutions without access to Kubernetes clusters to
display the details of the PV/PVC/NameSpace in their dashboard.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-04-08 15:43:14 +00:00
Prasanna Kumar Kalever
4d750ed0e5 rbd: add set/Get VolumeMetadata() utility function
Define and use PV and PVC metadata keys used by external provisioner.
The CSI external-provisioner (v1.6.0+) introduces the
--extra-create-metadata flag, which automatically sets map<string, string>
parameters in the CSI CreateVolumeRequest.

Add utility functions to set/Get PV/PVC/PVCNamespace metadata on image

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-04-08 15:43:14 +00:00
Madhu Rajanna
7b2aef0d81 util: add support for the nsenter
add support to run rbd map and mount -t
commands with the nsenter.

complete design of pod/multus network
is added here https://github.com/rook/rook/
blob/master/design/ceph/multus-network.md#csi-pods

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-08 10:23:21 +00:00
Prasanna Kumar Kalever
d760d0ab6d rbd: check for cookie support from kernel
Currently we only check if the rbd-nbd tool supports cookie feature.
This change will also defend cookie addition based on kernel version

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-04-04 09:51:13 +00:00
Madhu Rajanna
f8bbd2f60f cephfs: fix omap deletion in DeleteSnapshot
The omap is stored with the requested
snapshot name not with the subvolume
snapshotname. This fix uses the correct
snapshot request name to cleanup the omap
once the subvolume snapshot is deleted.

fixes: #2974

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-31 13:46:03 +00:00
Niels de Vos
1da19680b4 nfs: support new and old NFS-management commands
The `ceph nfs export ...` commands have changed in recent Ceph releases.
Use the most recent command as a default, fall back to the older command
when an error is reported.

This shoud make the NFS-provisioner work on any current Ceph version.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-03-31 11:28:40 +00:00
Madhu Rajanna
f90408be4d rbd: increase force promote timeout to 2 minutes
Increase the timeout to 2 minutes to give enough time
for rollback to complete.
As rollback is performed by the force-promote command it,
at times, may take more than a minute
(based on dirty blocks that need to be rolled
back approximately) to rollback.

The added extra 1 minute is useful though to avoid
multiple calls to complete the rollback and in
extremely corner cases to avoid failures in the
first instance of the call when the mirror watcher
is not yet removed (post scaling down the
RBD mirror instance)

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-30 13:46:27 +00:00
Thibaut Blanchard
e874c9c11b rbd: fix topology snapshot pool
Restoring a snapshot with a new PVC results with a wrong
dataPoolName in case of initial volume linked
to a storageClass with topology constraints and erasure coding.

Signed-off-by: Thibaut Blanchard <thibaut.blanchard@gmail.com>
2022-03-30 04:40:30 +00:00
Niels de Vos
885295fcc9 nfs: store the NFS-cluster name in the journal
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-03-28 11:23:17 +00:00
Niels de Vos
3b4d193ca8 journal: add StoreAttribute/FetchAttribute
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-03-28 11:23:17 +00:00
Niels de Vos
010fd816dd nfs: store the calling Context in NFSVolume
NFSVolume instances are short lived, they only extist for a certain gRPC
procedure. It is easier to store the calling Context in the NFSVolume
struct, than to pass it to some of the functions that require it.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-03-28 11:23:17 +00:00
Niels de Vos
6d83df9cc9 nfs: add basic provisioner with create/delete procedures
These NFS Controller and Identity servers are the base for the new
provisioner. The functionality is currently extremely limited, follow-up
PRs will implement various CSI procedures.

CreateVolume is implemented with the bare minimum. This makes it
possible to create a volume, and mount it with the
kubernetes-csi/csi-driver-nfs NodePlugin.

DeleteVolume unexports the volume from the Ceph managed NFS-Ganesha
service. In case the Ceph cluster provides multiple NFS-Ganesha
deployments, things might not work as expected. This is going to be
addressed in follow-up improvements.

Lots of TODO comments need to be resolved before this can be declared
"production ready". Unit- and e2e-tests are missing as well.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-03-28 11:23:17 +00:00
Robert Vasek
f6ae612003 util: added reference tracker
RT, reference tracker, is key-based implementation of a reference counter.
Unlike an integer-based counter, RT counts references by tracking unique
keys. This allows accounting in situations where idempotency must be
preserved. It guarantees there will be no duplicit increments or decrements
of the counter.

Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
2022-03-27 19:24:26 +00:00
Rakshith R
40de75e0db rbd: modify oidc token file path according to FHS 3.0
OIDC token file path has been modified from
`/var/run/secrets/token` to `/run/secrets/tokens`.
This has been done to ensure compliance with
FHS 3.0.

refer:
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s13.html

Signed-off-by: Rakshith R <rar@redhat.com>
2022-03-23 13:29:35 +00:00
Madhu Rajanna
8c5e414d53 rbd: do not read pvc namespace from volume attributes
Below are the 3 different cases where we need
the PVC namespace for encryption

* CreateVolume:- Read the namespace from the
createVolume parameters and store it in the omap
* NodeStage:- Read the namespace from the omap
not from the volumeContext
* Regenerate:- Read the pvc namespace from the claimRef
not from the volumeAttributes.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-21 08:54:43 +00:00
Madhu Rajanna
77011fbc61 cephfs: remove kubernetes csi prefixed parameters
remove kubernetes csi prefixed parameters
from the volumeContext as we dont want
to store it in the PV VolumeAttributes.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-21 08:54:43 +00:00
Madhu Rajanna
a7315a04c1 rbd: remove kubernetes csi prefixed parameters
remove kubernetes csi prefixed parameters
from the volumeContext as we dont want
to store it in the PV VolumeAttributes.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-21 08:54:43 +00:00
Madhu Rajanna
366c2ace31 util: add helper to get pvcnamespace from input
added helper function to return the pvc namespace
name from the input parameters.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-21 08:54:43 +00:00
Madhu Rajanna
772fe8d6c8 util: add helper function to strip kube parameters
added helper function to strip the kubernetes
specific parameters from the volumeContext as
volumeContext is storaged in the PV volumeAttributes

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-21 08:54:43 +00:00
Rakshith R
a56f9a0c05 rbd: flatten datasource image before creating volume
This commit ensures that parent image is flattened before
creating volume.
- If the data source is a PVC, the underlying image's parent
  is flattened(which would be a temp clone or snapshot).
  hard & soft limit is reduced by 2 to account for depth that
  will be added by temp & final clone.

- If the data source is a Snapshot, the underlying image is
  itself flattened.
  hard & soft limit is reduced by 1 to account for depth that
  will be added by the clone which will be restored from the
  snapshot.

Flattening step for resulting PVC image restored from snapshot is removed.
Flattening step for temp clone & final image is removed when pvc clone is
being created.

Fixes: #2190

Signed-off-by: Rakshith R <rar@redhat.com>
2022-03-18 10:27:27 +00:00
Madhu Rajanna
d357bebbc2 cephfs: disallow creating small volumes from snapshot/volume
as per the CSI standard the size is optional parameter,
as we are allowing the clone to a bigger size
today we need to block the clone to a smaller size
as its a have side effects like data corruption etc.

Note:- Even though this check is present in kubernetes
sidecar as CSI is CO independent adding the check
here.

fixes: #2718

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-17 05:07:26 +00:00
Humble Chirammal
525ff5d97f rbd: remove unimplemented responses for node operations
These RPCs( nodestage,unstage,volumestats) are
implemented RPCs for our drivers atm. This commit removes
the `unimplemented` responses from the common/default
server initialization routins.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-03-16 15:27:48 +00:00
Humble Chirammal
66e7f3525f cleanup: remove unimplemented controller expand,snapshot RPCs
These RPCs ( controller expand, create and delete snapshots) are
no longer unimplmented and we dont have to declare these as with
`unimplemented` states. This commit remove the same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-03-16 15:27:48 +00:00
Rakshith R
4f0bb2315b rbd: add aws-sts-metdata encryption type
With Amazon STS and kubernetes cluster is configured with
OIDC identity provider, credentials to access Amazon KMS
can be fetched using oidc-token(serviceaccount token).
Each tenant/namespace needs to create a secret with aws region,
role and CMK ARN.
Ceph-CSI will assume the given role with oidc token and access
aws KMS, with given CMK to encrypt/decrypt DEK which will stored
in the image metdata.

Refer: https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html
Resolves: #2879

Signed-off-by: Rakshith R <rar@redhat.com>
2022-03-16 07:29:56 +00:00
Prasanna Kumar Kalever
3eb0fa5e21 rbd: fix parsing mapOptions
Currently, we support

mapOption: "krbd:v1,v2,v3;nbd:v1,v2,v3"

- By omitting `krbd:` or `nbd:`, the option(s) apply to
  rbdDefaultMounter which is krbd.
- A user can _override_ the options for a mounter by specifying `krbd:`
  or `nbd:`.
  mapOption: "v1,v2,v3;nbd:v1,v2,v3"
  is effectively the same as the 1st example.
- Sections are split by `;`.
- If users want to specify common options for both `krbd` and `nbd`,
  they should mention them twice.

But in case if the krbd or nbd specifc options contian `:` within them,
then the parsing is failing now.

E0301 10:19:13.615111 7348 utils.go:200] ID: 63 Req-ID:
0001-0009-rook-ceph-0000000000000001-fd37c41b-9948-11ec-ad32-0242ac110004
GRPC error: badly formatted map/unmap options:
"krbd:read_from_replica=localize,crush_location=zone:zone1;"

This patch fix the above case where the options itself contain `:`
delimitor
ex: krbd:v1,v2,v3=v31:v32;nbd:v1,v2,v3"

Please note, if you are using such options which contain `:` delimiter,
then it is mandatory to specify the mounter-type.

Fixes: #2910
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-03-14 15:21:25 +00:00
Madhu Rajanna
78ec859dc6 cleanup: remove unwanted print
Removing unwanted print from the code

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-11 05:40:32 +00:00
Robert Vasek
80dda7cc30 cephfs: detect corrupt ceph-fuse mounts and try to remount
Mounts managed by ceph-fuse may get corrupted by e.g. the ceph-fuse process
exiting abruptly, or its parent container being terminated, taking down its
child processes with it.

This commit adds checks to NodeStageVolume and NodePublishVolume procedures
to detect whether a mountpoint in staging_target_path and/or target_path is
corrupted, and remount is performed if corruption is detected.

Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
2022-03-10 06:05:52 +00:00
Robert Vasek
aa6297e164 cleanup: refactor helper functions in nodeserver.go
Refactored a couple of helper functions for easier resue.

* Code for building store.VolumeOptions is factored out into a separate function.

* Changed args of getCredentailsForVolume() and NodeServer.mount() so that
  instead of passing in whole csi.NodeStageVolumeRequest, only necessary
  properties are passed explicitly. This is to allow these functions to be
  called outside of NodeStageVolume() where NodeStageVolumeRequest is not
  available.

Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
2022-03-10 06:05:52 +00:00
Rakshith R
3a64ee48c3 rbd: return unimplemented error for block-mode reclaimspace req
blkdiscard cmd discards all data on the block device which
is not desired. Hence, return unimplemented code if the
volume access mode is block.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-03-03 19:00:49 +00:00
Niels de Vos
1f012004a6 util: configure tenants vaultAuthNamespace if not set
When a tenant provides a configuration that includes the
`vaultNamespace` option, the `vaultAuthNamespace` option is still taken
from the global configuration. This is not wanted in all cases, as the
`vaultAuthNamespace` option defauls to the `vaultNamespace` option which
the tenant may want to override as well.

The following behaviour is now better defined:

1. no `vaultAuthNamespace` in the global configuration:
   A tenant can override the `vaultNamespace` option and that will also
   set the `vaultAuthNamespace` option to the same value.

2. `vaultAuthNamespace` and `vaultNamespace` in the global configuration:
   When both options are set to different values in the global
   configuration, the tenant `vaultNamespace` option will not override
   the global `vaultAuthNamespace` option. The tenant can configure
   `vaultAuthNamespace` with a different value if required.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-03-02 08:36:33 +00:00
Madhu Rajanna
d5c98f81a2 rbd: make image features as optional parameter
Makes the rbd images features in the storageclass
as optional so that default image features of librbd
can be used. and also kept the option to user
to specify the image features in the storageclass.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-02-28 13:10:03 +00:00
Madhu Rajanna
fb3835691f rbd: add support for deep-flatten image feature
as deep-flatten is long supported in ceph and its
enabled by default in the librbd, providing an option
to enable it in cephcsi for the rbd images we are
creating.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-02-28 13:10:03 +00:00
Madhu Rajanna
e9802c4940 cephfs: refactor cephfs core functions
This commits refactors the cephfs core
functions with interfaces. This helps in
better code structuring and writing the
unit test cases.

update #852

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-02-22 20:39:23 +00:00
Madhu Rajanna
46378f3bfc rbd: log stderror when running modprobe
logging the error is not user-friendly and
it contains system error message. Log the
stderr which is user-friendly error message
for identifying the problem.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-02-14 15:03:31 +00:00
Sébastien BERNARD
ee8fb3f05f rbd: Fix dataPool in createVolumeResponse
Return the dataPool used to create the image instead of the default one
provided by the createVolumeRequest.
In case of topologyConstrainedDataPools, they may differ.
Don't add datapool if it's not present

Signed-off-by: Sébastien Bernard <sebastien.bernard@sfr.com>
2022-02-10 11:44:22 +00:00
Humble Chirammal
8f6a7da538 cephfs: dont set explicit permissions on the volume
At present we are node staging with worldwide permissions which is
not correct. We should allow the CO to take care of it and make
the decision. This commit also remove `fuseMountOptions` and
`KernelMountOptions` as they are no longer needed

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-02-09 17:30:29 +00:00
Madhu Rajanna
2943555904 cephfs: fix omap deletion in DeleteSnapshot
the omap is stored with the requested
snapshot name not with the subvolume
snapshotname. This fix uses the correct
snapshot request name to cleanup the omap
once the subvolume snapshot is deleted.

fixes: #2832

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-02-08 20:37:53 +00:00
Humble Chirammal
ad6a3d7575 rbd: remove kp-metadata register functions of HPCS/Key Protect
This commit removes `kp-metadata` registration from existing HPCS
or Key Protect code as per the plan.

Fix #2816

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-02-08 18:27:03 +00:00
Humble Chirammal
1c3baa0722 rbd: add AAD(additionalAuthData) while unwrapping the DEK
As we are using optional additional auth data while wrapping
the DEK, we have to send the same additionally while unwrapping.

Error:
```
 failed to unwrap the DEK: kp.Error: ..(INVALID_FIELD_ERR)',
 reasons='[INVALID_FIELD_ERR: The field `ciphertext` must be: the
 original base64 encoded ciphertext from the wrap operation
```

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-02-08 03:06:30 +00:00
Niels de Vos
f6894909d7 util: use vaultNamespace if vaultAuthNamespace is not set
When a tenant configures `vaultNamespace` in their own ConfigMap, it is
not applied to the Vault configuration, unless `vaultAuthNamespace` is
set as well. This is unexpected, as the `vaultAuthNamespace` usually is
something configured globally, and not per tenant.

The `vaultAuthNamespace` is an advanced option, that is often not needed
to be configured. Only when tenants have to configure their own
`vaultNamespace`, it is possible that they need to use a different
`vaultAuthNamespace`. The default for the `vaultAuthNamespace` is now
the `vaultNamespace` value from the global configuration. Tenants can
still set it to something else in their own ConfigMap if needed.

Note that Hashicorp Vault Namespaces are only functional in the
Enterprise version of the product. Therefor this can not be tested in
the Ceph-CSI e2e with the Open Source version of Vault.

Fixes: https://bugzilla.redhat.com/2050056
Reported-by: Rachael George <rgeorge@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-02-07 08:20:48 +00:00
Rakshith R
3203673d17 cleanup: remove ceph.conf WA options which are already fixed
This commit removes ceph.conf WA options:
```
     # Workaround for http://tracker.ceph.com/issues/23446
     fuse_set_user_groups = false

     # ceph-fuse which uses libfuse2 by default has write buffer size of 2KiB
     # adding 'fuse_big_writes = true' option by default to override this limit
     # see https://github.com/ceph/ceph-csi/issues/1928
     fuse_big_writes = true
```
Since they are already fixed.

Refer: https://tracker.ceph.com/issues/44885
Refer: https://tracker.ceph.com/issues/23446
Closes: #2825

Signed-off-by: Rakshith R <rar@redhat.com>
2022-02-04 15:42:32 +00:00
Madhu Rajanna
28fef9b379 cleanup: remove thick provisioning code
This commit removes the thick provisioning
code as thick provisioning is deprecated in
cephcsi 3.5.0.

fixes: #2795

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-28 11:17:15 +00:00
Humble Chirammal
4ee4fdfebd rbd: unexport SecretsKMS from KMS implementation
This commit unexport SecretsKMS from KMS implementation.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-28 06:55:12 +00:00
Humble Chirammal
4058246637 rbd: unexport vaultTokenSA struct from KMS implementation
This commit unexport the vaultTokenSA from the vault KMS
implementation

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-28 06:55:12 +00:00
Humble Chirammal
b75c562217 rbd: Unexport VaultTenantSA struct from KMS implementation
This commit unexport VaultTenantSA struct from KMS implemenation
of Vault KMS.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-28 06:55:12 +00:00
Humble Chirammal
c8a3b9352e rbd: Unexport SecretsMetadataKMS struct
This commit unexport SecretsMetadataKMS struct from KMS
implementation

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-28 06:55:12 +00:00
Humble Chirammal
3f18d6e4b4 rbd: Unexport IntegratedDEK struct from kms
This commit unexport IntegratedDEK struct from KMS
implementation

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-28 06:55:12 +00:00
Humble Chirammal
6141aabcd2 rbd: unexport KeyProtect kms struct
At present the KMS structs are exported and ideally we should be
able to work without exporting the same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-28 06:55:12 +00:00
Humble Chirammal
a86121f756 rbd: unexport aws kms structs
At present the KMS structs are exported and ideally we should be
able to work without exporting the same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-28 06:55:12 +00:00
Madhu Rajanna
992d257530 cephfs: fix error logging in filesystem.go
fix error message logging in filesystem.go

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-27 14:31:12 +00:00
Madhu Rajanna
14c008c419 cleanup: use interface in filesystem.go
Currently, we are using methods and all the methods
makes a network call to fetch details from the ceph
clusters, its difficult to write test cases for
these functions, if we move to the interfaces
we can make use of mock to write unit testing
for the caller functions.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-27 14:31:12 +00:00
Humble Chirammal
f822600689 rbd: change the keyprotect metadata name to ibmkeyprotect
To be consistent with other components and also to explictly
state it belong to `ibm keyprotect` service introducing this
change

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-26 02:28:05 +00:00
Humble Chirammal
7ff048bf1e e2e: add podsecuritycontext fsgroup for normal user validation
considering the pod has run as normal user, the fsgroup has also
set to the same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-25 16:25:11 +00:00
Humble Chirammal
bf4ba0ec84 rbd: dont attempt explicit permission mod change from the RBD driver
currently we are overriding the permission to `0o777` at time of node
stage which is not the correct action. That said, this permission
change causes an extra permission correction at time of nodestaging
by the CO while the FSGROUP change policy has been set to
`OnRootMismatch`.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-25 16:25:11 +00:00
Madhu Rajanna
8096dd47e4 cleanup: remove unwanted type declaration
removed unwanted int64 type declaration to
fix style check.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-24 05:25:11 +00:00
Madhu Rajanna
9c841c83d4 cleanup: rename errorPair to pairError
to fix the errname check renaming the
struct.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-24 05:25:11 +00:00
Madhu Rajanna
4938fc2ff4 cleanup: use 0o600 intead of 0600
as we are using 0o600 in multiple files
use the same in all files which also fixes
go lint issue.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-24 05:25:11 +00:00
Madhu Rajanna
c67bacdb11 cleanup: use %s instead of %w for t.Errorf
As t.Errorf does not support error-wrapping
directive using %s.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-24 05:25:11 +00:00
Madhu Rajanna
813f6c30cc cleanup: use WriteString instead of Write
use WriteString instead of Write  for the temp
files.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-24 05:25:11 +00:00
Madhu Rajanna
aba6979d29 cleanup: use os.ReadFile to read file
as ioutil.ReadFile is deprecated and
suggestion is to use os.ReadFile as
per https://pkg.go.dev/io/ioutil updating
the same.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-24 05:25:11 +00:00
Madhu Rajanna
562dff0d19 cleanup: use os.WriteFile to write files
as ioutil.WriteFile is deprecated and
suggestion is to use os.WriteFile as
per https://pkg.go.dev/io/ioutil updating
the same.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-24 05:25:11 +00:00
Madhu Rajanna
ba5809e191 rbd: make rbdImage as received for internal methods
Currently most of the internal methods have the
rbdVolume as the received. As these methods
are completely internal and requires only
the fields of the rbdImage use rbdImage
as the receiver instead of rbdVolume.

updates #2742

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-17 12:15:21 +00:00
Madhu Rajanna
2daf2f9f0c cephfs: log error message if clone fails
During CreateVolume from snapshot/volume,
its difficult to identify if the clone is
failed and a new clone is created. In case
of clone failure logging the error message
for better debugging.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-17 09:43:09 +00:00
Madhu Rajanna
d293d91c07 rbd: disallow creating small size volume from volume
as per the CSI standard the size is optional parameter,
as we are allowing the clone to a bigger size
today we need to block the clone to a smaller size
as its a have side effects like data corruption etc.

Note:- Even though this check is present in kubernetes
sidecar as CSI is CO independent adding the check
here.

updates: #2718

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-17 07:00:00 +00:00
Madhu Rajanna
ceafca6ddf rbd: disallow creating small size volume from snapshot
as per the CSI standard the size is optional parameter,
as we are allowing the restore to a bigger size
today we need to block the restore to a smaller size
as its a have side effects like data corruption.

Note:- Even though this check is present in kubernetes
sidecar as CSI is CO independent adding the check
here.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-17 07:00:00 +00:00
Madhu Rajanna
ef14ea7723 cephfs: resize cloned, restored volume if required
Currently, as a workaround, we are calling
the resize volume on the cloned, restore volumes
to adjust the cloned, restored volumes.
With this fix, we are calling the resize volume
only if there is a size mismatch with requested
and the volume from which the new volume needs
to be created.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-12 10:44:11 +00:00
Humble Chirammal
4a69378698 rbd: introduce a helper function to detect multi writer,block & rwofile
SINGLE_NODE_WRITER capability ambiguity has been fixed in csi spec v1.5
which allows the SP drivers to declare more granular WRITE capability in form
of SINGLE_NODE_SINGLE_WRITER or SINGLE_NODE_MULTI_WRITER.

These are not really new capabilities rather capabilities introduced to
get the desired functionality from CO side based on the capabilities SP
driver support for various CSI operations, this new capabilities also help
to address new access mode RWOP (readwriteoncepod).

This commit adds a helper function which identity the request is of
multiwriter mode and also validates whether it is filesystem mode or
block mode. Based on the inspection it fails to allow multi write
requests for filesystem mode and only allow multi write request against
block mode.

This commit also adds unit tests for isMultiWriterBlock function which
validates various accesstypes and accessmodes.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-11 19:40:22 +00:00
Humble Chirammal
68350e8815 cephfs: add SINGLE_NODE_{SINGLE/MULTI}_WRITER capability
SINGLE_NODE_WRITER capability ambiguity has been fixed in csi spec v1.5
which allows the SP drivers to declare more granular WRITE capability.
These are not really new capabilities rather capabilities introduced to
get the desired functionality from CO side based on the capabilities SP
driver support for various CSI operations.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-11 19:40:22 +00:00
Humble Chirammal
3730a462f4 rbd: add SINGLE_NODE{SINGLE_MULTI}_WRITER capabilities
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-11 19:40:22 +00:00
Humble Chirammal
bc354b6fb5 rbd: add BaseURL and tokenURL configuration
This commit adds optional BaseURL and TokenURL configuration to
key protect/hpcs configuration and client connections, if not
provided default values are used.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-11 21:12:56 +05:30
Yug Gupta
9d34809425 rbd: add NetworkFence operation
Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2022-01-07 14:48:12 +00:00
Yug Gupta
fa5866deec ci: add unit test for NetworkFence grpc calls
Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2022-01-07 14:48:12 +00:00
Yug Gupta
29782bf377 rbd: implement UnfenceClusterNetwork
implement UnfenceClusterNetwork grpc call
which allows to unblock the access to a
CIDR block by removing it from network fence.

Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2022-01-07 14:48:12 +00:00
Yug Gupta
ebd8a762f0 rbd: implement FenceClusterNetwork
implement FenceClusterNetwork grpc call which
allows to blocks access to a CIDR block by
creating a network fence.

Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2022-01-07 14:48:12 +00:00
Yug Gupta
ab15053fef ci: add unit test for networkfencing util
Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2022-01-07 14:48:12 +00:00
Yug Gupta
7d5879ad81 rbd: add network fencing utils
Convert the CIDR block into a range of IPs,
and then add network fencing via "ceph osd blocklist"
for each IP in that range.

Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
2022-01-07 14:48:12 +00:00
Rakshith R
384ab42ae7 cleanup: use %q instead of %s for logging
Signed-off-by: Rakshith R <rar@redhat.com>
2022-01-06 12:28:18 +00:00
Rakshith R
c19264e996 rbd: add function (cc *ClusterConnection) GetTaskAdmin()
This function returns new go-ceph TaskAdmin to add
tasks on rbd volumes.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-01-06 12:28:18 +00:00
Rakshith R
420aa9ec57 rbd: remove redundant rbdVol.getTrashPath() function
This commit removes rbdVol.getTrashPath() function
since it is no longer being used due to introduction
of go-ceph rbd admin task api for deletion.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-01-06 12:28:18 +00:00
Rakshith R
9adb25691c rbd: remove redundant util.Credentials arg from flattenRbdImage()
With introduction of go-ceph rbd admin task api, credentials are
no longer required to be passed as cli cmd is not invoked.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-01-06 12:28:18 +00:00
Rakshith R
7b0f051fd4 rbd: remove redundant rbdVolume.connect() in flattenRbdImage()
This commit removes `rv.Connect(cr)` since the rbdVolume should
have an active connection in this stage of the function call.

`rv.getCloneDepth(ctx)` will work after a connect to the cluster.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-01-06 12:28:18 +00:00
Rakshith R
ad3c334a3a rbd: use go-ceph rbd admin task api instead of cli
This commit adds support to go-ceph rbd task api
`trash remove` and `flatten` instead of using cli
cmds.

Fixes: #2186

Signed-off-by: Rakshith R <rar@redhat.com>
2022-01-06 12:28:18 +00:00
Humble Chirammal
5aa1e4d225 rbd: change the configmap of HPCS/KP key names to reflect the IBM string
considering IBM has different crypto services (ex: SKLM) in place, its
good to keep the configmap key names with below format

`IBM_KP_...` instead of `KP_..`

so that in future, if we add more crypto services from IBM we can keep
similar schema specific to that specific service from IBM.

Ex: `IBM_SKLM_...`

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-01-05 06:08:19 +00:00
Niels de Vos
8eaf1abbdc util: add common logging to csi-addons gRPC
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-23 17:43:23 +00:00
Niels de Vos
bb5d3b7257 cleanup: refactor gRPC middleware into NewMiddlewareServerOption
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-23 17:43:23 +00:00
Niels de Vos
e574c807f0 rbd: expose CSI-Addons ReclaimSpace operations
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-23 17:43:23 +00:00
Niels de Vos
c274649b80 rbd: implement NodeReclaimSpace
By calling fstrim/blkdiscard on the volume, space consumption should get
reduced.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-23 17:43:23 +00:00
Niels de Vos
7d36c5a9d1 rbd: implement CSI-Addons ControllerReclaimSpace
The CSI Controller (provisioner) can call `rbd sparsify` to reduce the
space consumption of the volume.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-23 17:43:23 +00:00
Madhu Rajanna
e4b7943bac rbd: add workaround for force promote
use ExecCommandWithTimeout with timeout
of 1 minute for the promote operation.
If the command doesnot returns error/response
in 1 minute the process will be killed
and error will be returned to the user.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 13:36:21 +00:00
Madhu Rajanna
95e9595c1f util: add helper ExecCommandWithTimeout function
added ExecCommandWithTimeout helper function
to execute the commands with the timeout option,
if the command does not return any response with
in the timeout time the process will be terminated
and error will be returned back to the user.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 13:36:21 +00:00
Madhu Rajanna
9499e73b93 rbd: correct logging in createBackingImage
after creating the rbd image log the image
details corresponding for the request along
with the request name.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
549bfedc94 rbd: remove extra logging from createBackingImage
we are already logging the rbd image details
and the snapshot details after creating the
clone.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
8c9105f09e rbd: remove extra getImageInfo API call
as getImageInfo is already called inside
cloneRbdImageFromSnapshot function right
after creating the clone. remove the extra
API call to get the details again.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
ff91b7edbd rbd: get image details after creating clone
after creating the clone get the current
image details like size, creationTime,
imageFeatures etc from the ceph cluster.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
edcb2b529b rbd: move core fields to rbdImage struct
moved ParentName, ParentPool and ImageFeatureSet
fields to the rbdImage struct as these are the
first citizens on the rbdImage.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
c6b288779a rbd: correct logging for clone
log the rbdVolume and the rbdSnapshot
after creating the clone from snapshot.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
3169c8e23a rbd: expand filesystem during NodeStageVolume
If the volume with a bigger size is created
from a snapshot or from another volume we
need to exapand the filesystem also in the
csidriver as nodeExpand request is not triggered
for this one, During NodeStageVolume we can
expand the filesystem by checking filesystem
needs expansion or not.

If its a encrypted device, check the device
size of rbd device and the LUKS device if required
the device will be expanded before
expanding the filesystem.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
69ae19e0cb rbd: resize the volume created from snapshot
If the requested volume size is greater than
the snapshot size, resize the cloned volume
after creating a clone from a snapshot.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
a28a4a4285 rbd: resize the volume created from volume
If the requested volume size is greater than
the parent volume size, resize the cloned volume
after creating a final clone from a parent volume.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
f7f662678a rbd: consider ErrImageNotFound during DeleteSnapshot
added a check to consider ErrImageNotFound error
during DeleteSnapshot operation, if the error
is ErrImageNotFound we need to ensure that image
is removed from the trash and also the rados
OMAP data is removed.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
da60d221df rbd: update size for rbdSnapshot struct
we need actual size of the rbdVolume
created for the snapshot, as we are not
storing the size of the snapshot in OMAP
we need to fetch the size from ceph cluster
and update the same on rbdSnapshot  struct.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
6a82baf5d3 rbd: remove SizeBytes from rbdSnapshot struct
as we are moving the VolSize to rbdImage struct
we should reuse the same instead of maintaining
one more field in rbdSnapshot struct.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
b1a0bb4714 rbd: move VolSize to rbdImage struct
move the Volsize to the rbdImage struct
as size is more applicable for rbdImage
as rbdImage is used for both rbdVolume
and rbdSnapshot.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
a0829e9e93 rbd: remove json tag from rbdVolume struct
as we are no longer supporting the v1.x
version of cephcsi. removing the json tag
used to store rbd volume details in configmap.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
124281519f rbd: add RequestedVolSize to rbdVolume struct
when doing the internal operation to get the
latest details the rbd image size is also getting
updated and this will update the volume size also
without actual requested size we cannot do the
resize operation for bigger clones. This commit
adds a new field called RequestedVolSize to rbdVolume
struct to hold the user requested size.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
22365ab77f cleanup: add cleanup helper for incorrect thick volume
added a new helper function called cleanupThickClone
to cleanup the snapshot and clone if the thick
provisioning is not fully completed.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Madhu Rajanna
ca29328554 csi: remove size check when creating volume
remove the  bigger size validation when
creating a volume from a snapshot or when
creation a clone from a volume as we resized
the volume after cloning.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-23 03:47:00 +00:00
Humble Chirammal
b9a8d37c3d rbd: enable expand operation for intree volumes
This commit enable the resize operation[1] for in-tree volumes.
new helper has been introduced here to aid the enablement or to
make it clean with existing code base.

[1] https://github.com/ceph/ceph-csi/blob/devel/docs/design/proposals/intree-migrate.md?plain=1#L66

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-12-22 19:33:05 +00:00
Madhu Rajanna
810e285c50 rbd: reset dummy image id
dummy image rbdVolume struct is derived
from the actual one rbdVolume of the
volumeID sent in the EnableVolumeReplication
request. and the dummy rbdVolume struct contains
the image id of the actual volume because
of that when we are repairing the dummy
image the image is sent to trash but not
deleted due to the wrong image ID. resetting
the image id will makes sure the image id
is fetching from ceph cluster and same
image id will be used for manager operation.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-21 17:39:07 +00:00
Humble Chirammal
b904c446d6 rbd: add kms unit test for key protect server
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-12-21 17:09:50 +00:00
Humble Chirammal
9200bc7a00 rbd: Implement Key Protect KMS integration for Ceph CSI
This commit adds the support for HPCS/Key Protect IBM KMS service
to Ceph CSI service. EncryptDEK() and DecryptDEK() of RBD volumes are
done with the help of key protect KMS server by wrapping and unwrapping
the DEK and by using the DEKStoreMetadata.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-12-21 17:09:50 +00:00
Madhu Rajanna
12e8e46bcf revert: remove explicit size setting of cloned volume
The ceph changes  are done on the both server and the
client side this change is not enough for remove
setting the size of cloned volumes.
this caused the regression like #2719 #2720 #2721 #2722.

This reverts commit 3565a342d5.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-21 14:15:46 +00:00
Humble Chirammal
88911eb4e9 rbd: add migration secret support to controllerserver functions
This commit adds the migration secret request validation to expand,
create controller functions.

Ref # https://github.com/ceph/ceph-csi/issues/2509

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-12-20 07:34:43 +00:00
Niels de Vos
30333378ef cleanup: add IsBlockMultiNode() helper
IsBlockMultiNode() is a new helper that takes a slice of
VolumeCapability objects and checks if it includes multi-node access
and/or block-mode support.

This can then easily be used in other services that need checking for
these particular capabilities, and preventing multi-node block-mode
access.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-17 07:31:55 +00:00
Madhu Rajanna
50d6ea825c rbd: remove retrieving volumeHandle from PV annotation
we have added clusterID mapping to identify the volumes
in case of a failover in Disaster recovery in #1946.
with #2314 we are moving to a configuration in
configmap for clusterID and poolID mapping.
and with #2314 we have all the required information
to identify the image mappings.
This commit removes the workaround implementation done
in #1946.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-17 03:38:29 +00:00
Niels de Vos
203920d8f4 rbd: move driver component into the rbd/driver package
The rbd package contains several functions that can be used by
CSI-Addons Service implmentations. Unfortunately it is not possible to
do this, as the rbd-driver needs to import the csi-addons/rbd package to
provide the CSI-Addons server. This causes a circular import when
services use the rbd package:

 - rbd/driver.go import csi-addons/rbd
 - csi-addons/rbd import rbd (including the driver)

By moving rbd/driver.go into its own package, the circular import can be
prevented.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-10 07:35:26 +00:00
Niels de Vos
44d69502bc rbd: export HexStringToInteger()
HexStringToInteger() used to return a uint64, but everywhere else uint
is used. Having HexStringToInteger() return a uint as well makes it a
little easier to use when setting it with SetGlobalInt().

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-10 07:35:26 +00:00
Niels de Vos
8b531f337e rbd: add functions for initializing global variables
When the rbd-driver starts, it initializes some global (yuck!) variables
in the rbd package. Because the rbd-driver is moved out into its own
package, these variables can not easily be set anymore.

Introcude SetGlobalInt(), SetGlobalBool() and InitJournals() so that the
rbd-driver can configure the rbd package.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-10 07:35:26 +00:00
Niels de Vos
3eeac3d36c rbd: export RunVolumeHealer() so that rbd/driver can start it
The rbd-driver calls rbd.runVolumeHealer() which is not available
outside the rbd package. By moving the rbd-driver into its own package,
RunVolumeHealer() needs to be exported.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-10 07:35:26 +00:00
Niels de Vos
5baf9811f9 rbd: export NodeServer.mounter outside of the rbd package
NodeServer.mounter is internal to the NodeServer type, but it needs to
be initialized by the rbd-driver. The rbd-driver is moved to its own
package, so .Mounter needs to be available from there in order to set
it.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-10 07:35:26 +00:00
Niels de Vos
8d09134125 rbd: export GenVolFromVolID() for consumption by csi-addons
genVolFromVolID() is used by the CSI Controller service to create an
rbdVolume object from a CSI volume_id. This function is useful for
CSI-Addons Services as well, so rename it to GenVolFromVolID().

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-10 07:35:26 +00:00
Niels de Vos
e76bffe353 cleanup: import k8s.io/mount-utils instead of k8s.io/utils/mount
k8s.io/utils/mount has moved to k8s.io/mount-utils, and Ceph-CSI uses
that already in most locations. Only internal/util/util.go still imports
the old path.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-09 17:58:34 +00:00
Madhu Rajanna
8081ac8251 rbd: add new image features for dummy image
The dummy image will be created with 1Mib size.
during the snapshot transfer operation the 1Mib
will be transferred even if the dummy image doesnot
contains any data. adding the new image features
`fast-diff,layering,obj-map,exclusive-lock`on the
dummy image will ensure that only the diff is
transferred to the remote cluster.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-07 17:34:14 +00:00
Madhu Rajanna
9a4533e549 rbd: create 1MiB size dummy image
we added a workaround for rbd scheduling by creating
a dummy image in #2656. with the fix we are creating
a dummy image of the size of the first actual rbd
image which is sent in EnableVolumeReplication request
if the actual rbd image size is 1TiB we are creating
a dummy image of 1TiB which is not good. even though
its a thin provisioned rbd images this is causing
issue for the transfer of the snapshot during
the mirroring operation.

This commit recreates the rbd image with 1MiB size
which is the smaller supported size in rbd.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-07 17:34:14 +00:00
Konstantin Shalygin
7411773f73 rbd: added RBD features support for krbd
Added support for `object-map, fast-diff`

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
2021-12-07 07:38:24 +00:00
Madhu Rajanna
64ce5e0949 rbd: check local image state during promote operation
rbd mirroring CLI calls are async and it doesn't wait
for the operation to be completed. ex:- `rbd mirror image enable`
it will enable the mirroring on the image but it doesn't
ensure that the image is mirroring enabled and healthy
primary. The same goes for the promote volume also.
This commits adds a check-in PromoteVolume to make sure
the image in a healthy state i.e `up+stopped`.

note:- not considering any intermediate states to make
sure the image is completely healthy before responding
success to the RPC call.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-12-01 20:19:05 +00:00
Prasanna Kumar Kalever
e7d8834149 rbd: enabe journal based mirroring
Journal-based RADOS block device mirroring ensures point-in-time
consistent replicas of all changes to an image, including reads and
writes, block device resizing, snapshots, clones, and flattening.

Journaling-based mirroring records all modifications to an image in the
order in which they occur. This ensures that a crash-consistent mirror
of an image is available.

Mirroring when configured in journal mode, mirroring will
utilize the RBD journaling image feature to replicate the image
contents. If the RBD journaling image feature is not yet enabled on the
image, it will be automatically enabled.

Fixes: #2018
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-12-01 14:12:30 +00:00
Niels de Vos
ab76459e87 rbd: implement CSI-Addons Identity Service
Depending on the way Ceph-CSI is deployed, the capabilities will be
configured for the GetCapabilities procedure. The other procedures are
more straight-forward.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-12-01 06:31:09 +00:00
Niels de Vos
20727bd41a cleanup: reduce complexity of rbd.Driver.Run()
After adding the new CSI-Addons Server, golang-ci complains that
driver.Run() is too complex. By moving the profiling checks and starting
of the go-routines in their own function, golang-ci is happy again.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-11-30 11:48:40 +00:00
Niels de Vos
b3910f2b4a rbd: enable CSI-Addons Server and Identity Service
Add a new endpoint for the CSI-Addons Service and enable the Identity
Service for the RBD plugin.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-11-30 11:48:40 +00:00
Niels de Vos
0f8bbaa217 rbd: add framework for CSI-Addons Identity Service
Add a new CSI-Addons Server and empty Identity Service for the RBD
plugin. The implementation of the Identity Service procedure calls will
be done in other PRs.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-11-30 11:48:40 +00:00
Madhu Rajanna
f0b2ea6a6d rbd: repair imageid after resync
During resync operation the local image
will get deleted and a new image is recreated
by the rbd mirroring. The new image will have
a new imageID. Once resync is completed
update the imageID in the OMAP to get the
image removed from the trash during DeleteVolume.

Before resyncing

```
sh-4.4# rbd info replicapool/csi-vol-0c25bdd3-485f-11ec-bd30-0242ac110004
rbd image 'csi-vol-0c25bdd3-485f-11ec-bd30-0242ac110004':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 1
	id: 1efcc6b7a769
	block_name_prefix: rbd_data.1efcc6b7a769
	format: 2
	features: layering
	op_features:
	flags:
	create_timestamp: Thu Nov 18 11:02:40 2021
	access_timestamp: Thu Nov 18 11:02:40 2021
	modify_timestamp: Thu Nov 18 11:02:40 2021
	mirroring state: enabled
	mirroring mode: snapshot
	mirroring global id: 9c4c236d-8a47-4779-b4f6-94e05da70dbd
	mirroring primary: true
```

```
sh-4.4# rados listomapvals csi.volume.0c25bdd3-485f-11ec-bd30-0242ac110004
--pool=replicapool
csi.imageid
value (12 bytes) :
00000000  31 65 66 63 63 36 62 37  61 37 36 39              |1efcc6b7a769|
0000000c

csi.imagename
value (44 bytes) :
00000000  63 73 69 2d 76 6f 6c 2d  30 63 32 35 62 64 64 33  |csi-vol-0c25bdd3|
00000010  2d 34 38 35 66 2d 31 31  65 63 2d 62 64 33 30 2d  |-485f-11ec-bd30-|
00000020  30 32 34 32 61 63 31 31  30 30 30 34              |0242ac110004|
0000002c

csi.volname
value (40 bytes) :
00000000  70 76 63 2d 32 36 38 39  33 66 30 38 2d 66 66 32  |pvc-26893f08-ff2|
00000010  62 2d 34 61 30 66 2d 61  35 63 33 2d 38 38 34 62  |b-4a0f-a5c3-884b|
00000020  37 32 30 66 66 62 32 63                           |720ffb2c|
00000028

csi.volume.owner
value (7 bytes) :
00000000  64 65 66 61 75 6c 74                              |default|
00000007
```

After Resyncing

```
sh-4.4# rbd info replicapool/csi-vol-0c25bdd3-485f-11ec-bd30-0242ac110004
rbd image 'csi-vol-0c25bdd3-485f-11ec-bd30-0242ac110004':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 1
	id: 10b183a48a97
	block_name_prefix: rbd_data.10b183a48a97
	format: 2
	features: layering, non-primary
	op_features:
	flags:
	create_timestamp: Thu Nov 18 11:09:39 2021
	access_timestamp: Thu Nov 18 11:09:39 2021
	modify_timestamp: Thu Nov 18 11:09:39 2021
	mirroring state: enabled
	mirroring mode: snapshot
	mirroring global id: 9c4c236d-8a47-4779-b4f6-94e05da70dbd
	mirroring primary: false

sh-4.4# rados listomapvals csi.volume.0c25bdd3-485f-11ec-bd30-0242ac110004
--pool=replicapool
csi.imageid
value (12 bytes) :
00000000  31 30 62 31 38 33 61 34  38 61 39 37              |10b183a48a97|
0000000c

csi.imagename
value (44 bytes) :
00000000  63 73 69 2d 76 6f 6c 2d  30 63 32 35 62 64 64 33  |csi-vol-0c25bdd3|
00000010  2d 34 38 35 66 2d 31 31  65 63 2d 62 64 33 30 2d  |-485f-11ec-bd30-|
00000020  30 32 34 32 61 63 31 31  30 30 30 34              |0242ac110004|
0000002c

csi.volname
value (40 bytes) :
00000000  70 76 63 2d 32 36 38 39  33 66 30 38 2d 66 66 32  |pvc-26893f08-ff2|
00000010  62 2d 34 61 30 66 2d 61  35 63 33 2d 38 38 34 62  |b-4a0f-a5c3-884b|
00000020  37 32 30 66 66 62 32 63                           |720ffb2c|
00000028

csi.volume.owner
value (7 bytes) :
00000000  64 65 66 61 75 6c 74                              |default|
00000007
```

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-25 09:22:13 +00:00
Madhu Rajanna
027b68ab39 rbd: operate on dummy image after adding scheduling
currently we are fist operating on the  dummy
image to refresh the pool and then we are adding
the scheduling. we think the scheduling should
be added first and than we should refresh the
pool. If we do this all the existing schedules
will be considered from the scheduler.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-23 11:04:42 +00:00
Madhu Rajanna
211ca9b5a7 rbd: do deep copy for dummyVol struct
with shallow copy of rbdVol to dummyVol
the image name update of the dummyVol is getting
reflected on the rbdVol which we dont want.

do deep copy to avoid this problem.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-23 11:04:42 +00:00
Prasanna Kumar Kalever
bdcf3273b5 rbd: provide a way to supply mounter specific mapOptions from sc
Uses the below schema to supply mounter specific map/unmapOptions to the
nodeplugin based on the discussion we all had at
https://github.com/ceph/ceph-csi/pull/2636

This should specifically be really helpful with the `tryOthermonters`
set to true, i.e with fallback mechanism settings turned ON.

mapOption: "kbrd:v1,v2,v3;nbd:v1,v2,v3"

- By omitting `krbd:` or `nbd:`, the option(s) apply to
  rbdDefaultMounter which is krbd.
- A user can _override_ the options for a mounter by specifying `krbd:`
  or `nbd:`.
  mapOption: "v1,v2,v3;nbd:v1,v2,v3"
  is effectively the same as the 1st example.
- Sections are split by `;`.
- If users want to specify common options for both `krbd` and `nbd`,
  they should mention them twice.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-23 08:54:37 +00:00
Shyamsundar Ranganathan
d1c21eece9 rbd: Update sequence of operations on dummy mirror image
The dummy mirror image needs to be disabled and then
reenabled for mirroring, to ensure a newly promoted
primary is now starting to schedule snapshots.

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
2021-11-19 09:38:59 +05:30
Madhu Rajanna
517ad8c644 rbd: use dummy image to workaround rbd scheduling bug
currently we have a bug in rbd mirror scheduling module.
After doing failover and failback the scheduling is not
getting updated and the mirroring snapshots are not
getting created periodically as per the scheduling
interval. This PR workarounds this one by doing below
operations

* Create a dummy (unique) image per cluster and this image
should be easily identified.

* During Promote operation on any image enable the
mirroring on the dummy image. when we enable the mirroring
on the dummy image the pool will get updated and the
scheduling will be reconfigured.

* During Demote operation on any image disable the mirroring
on the dummy image. the disable need to be done to enable
the mirroring again when we get the promote request to make
the image as primary

* When the DR is no more needed, this image need to be
manually cleanup as for now as we dont want to add a check
in the existing DeleteVolume code path for delete dummy image
as it impact the performance of the DeleteVolume workflow.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-19 09:38:59 +05:30
Madhu Rajanna
d05fc1e8e5 util: add helper to get the cluster ID
added helper function to get the cluster ID.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-19 09:38:59 +05:30
Madhu Rajanna
e4e0f397a6 rbd: run schedule during promote operation
Moved to add scheduling to the promote
operation as scheduling need to be added
when the image is promoted and this is
the correct method of adding the scheduling
to make the scheduling take place.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-19 09:38:59 +05:30
Madhu Rajanna
7bbd2ea284 rbd: use small case of error message
the error message should not start with
the capital letter changing the case as
per the standard.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-18 10:44:12 +00:00
Madhu Rajanna
51998a5f4a cleanup: log the image name and pool name
instead of logging the volumeID and the pool
name. log the poolname and image name for better
debugging.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-18 10:44:12 +00:00
Madhu Rajanna
0f0cda49a7 rbd: log stdError for cryptosetup command
If we hit any error while running the cryptosetup
commands we are logging only the error message.
with only error message it is difficult to analyze
the problem, logging the stdError will help us to
check what is the problem.

updates: #2610

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-11-18 02:17:15 +00:00
Niels de Vos
7e22180125 rbd: call undoStagingTransaction() when NodeStageVolume() fails
On line 341 a `transaction` is created. This is passed to the deferred
`undoStagingTransaction()` function when an error in the
`NodeStageVolume` procedure is detected. So far, so good.

However, on line 356 a new `transaction` is returned. This new
`transaction` is not used for the defer call.

By removing the empty `transaction` that is used in the defer call, and
calling `undoStagingTransaction()` on an error of `stageTransaction()`,
the code is a little simpler, and the cleanup of the transaction should
be done correctly now.

Updates: #2610
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-11-17 23:58:00 +00:00
Prasanna Kumar Kalever
e6fa392df1 rbd: fix mapOptions passing with rbd-nbd mounter
This was a regression introduced by:
https://github.com/ceph/ceph-csi/pull/2556

Fixes: #2610
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-16 10:12:46 +00:00
Prasanna Kumar Kalever
50e9dfa5c5 cleanup: fix log level
This log line is seen frequently in the logs and its better to be at
Warning loglevel rather than Error based on its severity

E1109 08:30:45.612395   38328 util.go:247] kernel 4.19.202 does not support required features

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-10 10:54:29 +00:00
Prasanna Kumar Kalever
3686b6da8b rbd: utilize cookie support from rbd for nbd
Problem:
On remap/attach of device (i.e. nodeplugin restart), there is no way
for rbd-nbd to defend if the backend storage is matching with the initial
backend storage.

Say, if an initial map request for backend "pool1/image1" got mapped to
/dev/nbd0 and the userspace process is terminated (on nodeplugin restart).
A next remap/attach (nodeplugin start) request within reattach-timeout is
allowed to use /dev/nbd0 for a different backend "pool1/image2"

For example, an operation like below could be dangerous:

$ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4"
$ sudo pkill -15 rbd-nbd   <-- nodeplugin terminate
$ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs"

Solution:
rbd-nbd/kernel now provides a way to keep some metadata in sysfs to identify
between the device and the backend, so that when a remap/attach request is
made, rbd-nbd can compare and avoid such dangerous operations.

With the provided solution, as part of the initial map request, backend
cookie (ceph-csi VOLID) can be stored in the sysfs per device config, so
that on a remap/attach request rbd-nbd will check and validate if the
backend per device cookie matches with the initial map backend with the help
of cookie.

At Ceph-csi we use VOLID as device cookie, which will be unique, we pass
the VOLID as cookie at map and use the same at the time of attach, that
way rbd-nbd can identify backends and their matching devices.

Requires:
https://github.com/ceph/ceph/pull/41323
https://lkml.org/lkml/2021/4/29/274

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-04 03:20:59 +00:00
Prasanna Kumar Kalever
793b22cf27 rbd: check for nbd cookie support
Change checkRbdNbdTools() to setRbdNbdToolFeatures()

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-11-04 03:20:59 +00:00