Commit Graph

1042 Commits

Author SHA1 Message Date
Madhu Rajanna
07aa9dea5c rbd: update namespace name in rados object
If a PV is reattached to a new PVC in a different
namespace we need to update the namespace name
in the rados object.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-28 15:50:01 +00:00
Madhu Rajanna
019628c8c2 rbd: update namespace name in metadata
If a PV is reattached to a new PVC in a different
namespace we need to update the namespace name
in the rbd image metadata.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-28 15:50:01 +00:00
Madhu Rajanna
848e3ee557 rbd: return abnormal in NodeGetVolumeStats
When we do stat on the targetpath, if there is
any error we can check is it due to corruption.
If yes, cephcsi can return abnormal in the
NodeGetVolumeStats so that consumer (CO/admin)
and detect and take further action.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-26 09:40:22 +00:00
Madhu Rajanna
44d4546480 cephfs: return abnormal in NodeGetVolumeStats
When we do stat on the targetpath, if there is
any error we can check is it due to corruption.
If yes, cephcsi can return abnormal in the
NodeGetVolumeStats so that consumer (CO/admin)
and detect and take further action.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-26 09:40:22 +00:00
Madhu Rajanna
f12fa3ee56 rbd: return GRPC error from GRPC method
GRPC methods should only return GRPC errors
if any error occurs.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-19 08:00:42 +00:00
Madhu Rajanna
302fead713 cephfs: delete subvolume if SetAllMetadata fails
To avoid subvolume leaks if the SetAllMetadata
operations fails delete the subvolume.
If any operation fails after creating the subvolume
we will remove the omap as the omap gets
removed we will need to remove the subvolume to
avoid stale resources.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-18 15:10:18 +00:00
Marcel Lauhoff
5a55419025 cephfs: Add placeholder journal fscrypt support
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
dc7ba684e3 rbd: Use EncryptionTypeNone
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
2abfafdf3f util: Add EncryptionTypeNone and unit tests
Add type none to distinguish disabled encryption (positive result)
from invalid configuration (negative result).

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
da76d8ddae kms: Add GetSecret() to KMIP KMS
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
1f1504479c rbd: Add context to fscrypt errors
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
3e3af4da18 rbd: support file encrypted snapshots
Support fscrypt on RBD snapshots

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
82d92aab4a rbd: Add volume journal encryption support
Add fscrypt support to the journal to support operations like
snapshotting.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
a7ea12eb8e rbd: Handle encryption type default at a more meaningful place
Different places have different meaningful fallback. When parsing
from user we should default to block, when parsing stored config we
should default to invalid and handle that as an error.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
1fa842277a rbd: fscrypt file encryption support
Integrate basic fscrypt functionality into RBD initialization. To
activate file encryption instead of block introduce the new
'encryptionType' storage class key.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
f1f50e0218 fscrypt: fix metadata directory permissions
Call Mount.Setup with SingleUserWritable constant instead of 0o755,
which is silently ignored and causes the /.fscrypt/{policy,protector}/
directories to have mode 000.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
4e38bdac10 fscrypt: fsync encrypted dir after setting policy [workaround]
Revert once our google/fscrypt dependency is upgraded to a version
that includes https://github.com/google/fscrypt/pull/359 gets accepted

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
33c33a8b49 fscrypt: Use constant protector name
Use constant protector name 'ceph-csi' instead of constant prefix
concatenated with the volume ID. When cloning volumes the ID changes
and fscrypt protected directories become inunlockable due to the
protector name change

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
97cb1b6672 fscrypt: Update mount info before create context
NewContextFrom{Mountpoint,Path} functions use cached
`/proc/self/mountinfo` to find mounted file systems by device ID.
Since we run fscrypt as a library in a long-lived process the cached
information is likely to be stale. Stale entries may map device IDs to
mount points of already destroyed RBDs and fail context creation.
Updating the cache beforehand prevents this.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
a52314356e fscrypt: Determine best supported fscrypt policy on node init
Currently fscrypt supports policies version 1 and 2. 2 is the best
choice and was the only choice prior to this commit. This adds support
for kernels < 5.4, by selecting policy version 1 there.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
dd0e1988c0 fscrypt: Fetch passphrase when keyFn is invoked not created
Fetch password when keyFn is invoked, not when it is created. This
allows creation of the keyFn before actually creating the passphrase.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
a6a4282493 fscrypt: Unlock: Fetch keys early
Fetch keys from KMS before doing anything else. This will catch KMS
errors before setting up any fscrypt metadata.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
cfea8d7562 fscrypt: fscrypt integration
Integrate google/fscrypt into Ceph CSI KMS and encryption setup. Adds
dependencies to google/fscrypt and pkg/xattr. Be as generic as
possible to support integration with both RBD and Ceph FS.

Add the following public functions:

InitializeNode: per-node initialization steps. Must be called
before Unlock at least once.

Unlock: All steps necessary to unlock an encrypted directory including
setting it up initially.

IsDirectoryUnlocked: Test if directory is really encrypted

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
2cf8ecc6c7 journal: Store encryptionType in Config struct
Add encryptionType next to kmsID to support both block and file
encryption.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
ce9fbb3474 rbd: Rename encryption to blockEncryption prep for fscrypt
In preparation of fscrypt support for RBD filesystems, rename block
encryption related function to include the word 'block'. Add struct
fields and IsFileEncrypted.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
624905d60d kms: Add basic GetSecret() test
Add rudimentary test to ensure that we can get a valid passphrase from
the GetSecret() feature

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
5df45f1c1b kms: testing: add KMS test dummy registry
Add registry similar to the providers one. This allows testers to
add and use GetKMSTestDummy() to create stripped down provider
instances suitable for use in unit tests.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
cb02a9beb9 kms: Add GetSecret() to metadata KMS
Add GetSecret() to allow direct access to passphrases without KDF and
wrapping by a DEKStore.

This will be used by fscrypt, which has its own KDF and wrapping. It
will allow users to take a k8s secret, for example, and use that
directly as a password in fscrypt.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
0599089de0 util: Add util to fetch encryption type from vol options
Fetch encryption type from vol options. Make fallback type
configurable to support RBD (default block) and Ceph FS (default file)

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
fe4821435e util: Make encryption passphrase size a parameter
fscrypt support requires keys longer than 20 bytes. As a preparation,
make the new passphrase length configurable, but default to 20 bytes.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Madhu Rajanna
69eb6e40dc rbd: return GRPC error message
The error message return from the GRPC
should be of GRPC error messages only
not the normal go errors. This commits
returns GRPC error if setAllMetadata
fails.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-17 15:17:29 +00:00
Madhu Rajanna
01d4a614c3 rbd: delete volume if setallmetadata fails
If any operations fails after the volume creation
we will cleanup the omap objects, but it is missing
if setAllMetadata fails. This commits adds the code
to cleanup the rbd image if metadata operation fails.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-17 15:17:29 +00:00
Madhu Rajanna
b40e8894f8 cephfs: use errors.As instead of errors.Is
As we need to compare the error type instead
of the error value we need to use errors.As
to check the API is implemented or not.

fixes: #3347

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-17 09:11:45 +00:00
Niels de Vos
b7703faf37 util: make inode metrics optional in FilesystemNodeGetVolumeStats()
CephFS does not have a concept of "free inodes", inodes get allocated
on-demand in the filesystem.

This confuses alerting managers that expect a (high) number of free
inodes, and warnings get produced if the number of free inodes is not
high enough. This causes alerts to always get reported for CephFS.

To prevent the false-positive alerts from happening, the
NodeGetVolumeStats procedure for CephFS (and CephNFS) will not contain
inodes in the reply anymore.

See-also: https://bugzilla.redhat.com/2128263
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-10-13 19:02:47 +00:00
Madhu Rajanna
71e5b3f922 rbd: remove dummy image workaround
To address the problem that snapshot
schedules are triggered for volumes
that are promoted, a dummy image was
disabled/enabled for replication.
This was done as a workaround, because the
promote operation was not triggering
the schedules for the image being promoted.

The bugs related to the same have been fixed in
RBD mirroring functionality and hence the
workaround #2656 can be removed from the code base.

ceph tracker https://tracker.ceph.com/issues/53914

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-10-10 08:22:10 +00:00
Yati Padia
36b061d426 rbd: get description from remote status
This commit gets the description from remote status
instead of local status.
Local status doesn't have ',' due to which we get
array index out of range panic.

Fixes: #3388

Signed-off-by: Yati Padia <ypadia@redhat.com>
Co-authored-by: shyam Ranganathan <srangana@redhat.com>
2022-09-14 12:06:01 +00:00
yati1998
b19705f260 rbd: implements getVolumeReplicationInfo
This commit implements getVolumeReplicationInfo
to get the last sync time and update it in volume
replication CR.

Signed-off-by: yati1998 <ypadia@redhat.com>
2022-09-13 14:17:10 +00:00
Rakshith R
a57859dfa4 rbd: use blocklist range cmd, fallback if it fails
This commit adds blocklist range cmd feature,
while fallbacks to old blocklist one ip at a
time if the cmd is invalid(not available).

Signed-off-by: Rakshith R <rar@redhat.com>
2022-09-13 13:10:32 +00:00
Prashanth Dintyala
2a6487cbf5 rbd: create token and use it for vault SA everytime possible
use TokenRequest API by default for vault SA even with K8s versions < 1.24

Signed-off-by: Prashanth Dintyala <vdintyala@nvidia.com>
2022-09-09 10:13:32 +00:00
Madhu Rajanna
76064d8e34 cephfs: retry subvolumegroup creation
Incase the  subvolumegroup is deleted
and recreated we need to restart the
cephcsi provisioner pod to clear cache
that cephcsi maintains. With this PR
if cephcsi sees NotFound error duing
subvolume creation it will reset the cache
for that filesystem so that in next RPC
call cephcsi will try to create the
subvolumegroup again

Ref: https://github.com/rook/rook/issues/10623

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-09-07 18:24:30 +00:00
Madhu Rajanna
e56621cd66 cephfs: fix subvolumegroup creation for multiple fs
In a cluster we can have multiple filesystem
for that we need to have a map of
subvolumegroups to check filesystem is created
nor not.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-09-07 18:24:30 +00:00
Madhu Rajanna
71dbc7dbb4 rbd: map only primary image
If the image is mirroring enabled
and primary consider it for mapping,
if the image is mirroring enabled but
not primary yet. return error message
until the image is marked as primary.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-09-06 10:40:12 +00:00
Madhu Rajanna
038462ff43 cephfs: return success if metadata operation not supported
If the ceph cluster is of older version and doesnot
support metadata operation, Instead of failing
the request return the success if metadata
operation is not supported.

fixes #3347

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-08-29 18:37:53 +00:00
Rakshith R
40134772a7 rbd: modify stripSecret mechanism in logGRPC()
This commit updates csi-addons spec version
and modifies logging to strip replication
request secret using csi.StripSecret, then
with replication.protosanitizer if the former
fails. This is done in order to make sure
we strip csi and replication format of secrets.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-29 11:18:15 +00:00
Rakshith R
f47839d73d rbd: improve kmip verifyResponse() error message
This commit uses %q instead %v in error messages
and adds result reason and message in kmip
verifyresponse().

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-24 07:58:57 +00:00
Rakshith R
eaa0e14cb2 rbd: fix bug in kmip kms Decrypt function
This commit fixes a bug in kmip kms Decrypt
function, where emd.DEK was fed in a Nonce
instead of emd.Nonce by mistake.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-24 07:58:57 +00:00
Niels de Vos
b697b9b0d9 cleanup: replace github.com/pborman/uuid with github.com/google/uuid
The github.com/google/uuid package is used by Kubernetes, and it is part
of the vendor/ directory already. Our usage of github.com/pborman/uuid
can be replaced by github.com/google/uuid, so that
github.com/pborman/uuid can be removed as a dependency.

Closes: #3315
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-22 14:34:25 +00:00
Rakshith R
19e4146fab rbd: add replication capability & service to csiaddons server
csi-addons server will advertise replication capability and
replication service will run with csi-addons server too.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-18 08:19:20 +00:00
Rakshith R
0c33a33d5c rbd: add kmip encryption type
The Key Management Interoperability Protocol (KMIP)
is an extensible communication protocol
that defines message formats for the manipulation
of cryptographic keys on a key management server.
Ceph-CSI can now be configured to connect to
various KMS using KMIP for encrypting RBD volumes.

https://en.wikipedia.org/wiki/Key_Management_Interoperability_Protocol

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-18 07:41:42 +00:00
Madhu Rajanna
dde21543bd cephfs: fix staticcheck comment
getting is unused for linter "staticcheck"
(nolintlint) error message due to wrong
comment format. this the format now with
`//directive // comment`

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-08-10 17:51:26 +00:00
Rakshith R
d39d2cffcc cleanup: use index instead of value while iterating
This commit cleans up for loop to use index to access
value instead of copying value into a new variable
while iterating.
```
internal/util/csiconfig.go:103:2: rangeValCopy: each \
iteration copies 136 bytes (consider pointers or indexing) \
(gocritic)
        for _, cluster := range config {
```

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-09 13:36:03 +00:00
Rakshith R
3d3c029471 nfs: add nodeserver within cephcsi
This commit adds nfs nodeserver capable of
mounting nfs volumes, even with pod networking
using NSenter design similar to rbd and cephfs.
NodePublish, NodeUnpublish, NodeGetVolumeStats
and NodeGetCapabilities have been implemented.

The nodeserver implementation has been inspired
from https://github.com/kubernetes-csi/csi-driver-nfs,
which was previously used for mounted cephcsi exported
nfs volumes. The current implementation is also
backward compatible for the previously created
PVCs.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-09 13:36:03 +00:00
Shyamsundar Ranganathan
c2280011d1 rbd: Report remote peer readiness if Up and status.Unknown
Current code uses an !A && !B condition incorrectly to
test A:Up and B:status for a remote peer image.

This should be !A || !B as we require both conditions to
be in the specified state (Up: true, and status Unknown).

This is corrected by this commit, and further fixes:
- check and return ready only when a remote site is
found in the status output
- check if all peer sites are ready, if multiple are found
and return ready appropriately

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
2022-08-09 05:32:15 +00:00
Madhu Rajanna
8d7b6ee59f rbd: consider mirror deamon state for ResyncVolume
During ResyncVolume we check if the image
is in an error state, and we resync.
After resync, the image will move to
either the `Error` or the `Resyncing` state.
And if the image is in the above two
conditions, we will return a successful
response and Ready=false so that the
consumer can wait until the volume is
ready to use. If the image is in any
other state we return an error message
to indicate the syncing is not going on.
The whole resync and image state change
depends on the rbd mirror daemon. If the
mirror daemon is not running, the image
can be in Resyncing or Unknown state.
The Ramen marks the volume replication as
secondary, and once the resync starts, it
will delete the volume replication CR as a
cleanup process.

As we dont have a check for the rbd mirror
daemon, we are returning a resync success
response and Ready=false. Due to this false
response Ramen is assuming the resync started
and deleted the volume replication CR, and
because of this, the cluster goes into a bad
state and needs manual intervention.

fixes #3289

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-08-08 13:26:15 +00:00
Niels de Vos
83df1eae53 rebase: k8s.io/mount-utils/IsNotMountPoint() is deprecated
IsNotMountPoint() is deprecated and Mounter.IsMountPoint() is
recommended to be used instead.

Reported-by: golangci/staticcheck
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-04 09:53:07 +00:00
Niels de Vos
10b2277330 util: use k8s.io/mount-utils/NewWithoutSystemd() to prevent logging
NewWithoutSystemd() has been introduced in the k8s.io/mount-utils
package so that systemd is not called while executing functions. This
offers consumers the ability to prevent confusing and scary messages
from getting logged.

See-also: kubernetes/kubernetes#111218
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-04 09:53:07 +00:00
Niels de Vos
3a200b6976 rbd: use IsLikelyNotMountPoint() to prevent systemd log messages
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-04 09:53:07 +00:00
Niels de Vos
0a173a8a9e nfs: make DeleteVolume (more) idempotent
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-03 19:43:16 +00:00
Humble Chirammal
bc9ad3d9f1 rbd: add dummy attacher implementation
previously, it was a requirement to have attacher sidecar for CSI
drivers and there had an implementation of dummy mode of operation.
However skipAttach implementation has been stabilized and the dummy
mode of operation is going to be removed from the external-attacher.
Considering this driver  work on volumeattachment objects for NBD driver
use cases, we have to implement dummy controllerpublish and unpublish
and thus keep supporting our operations even in absence of dummy mode
of operation in the sidecar.

This commit make a NOOP controller publish and unpublish for RBD driver.

CephFS driver does not require attacher and it has already been made free
from the attachment operations.

    Ref# https://github.com/ceph/ceph-csi/pull/3149
    Ref# https://github.com/kubernetes-csi/external-attacher/issues/226

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-08-03 00:25:49 +00:00
Prasanna Kumar Kalever
30244bf11b cephfs: snapshots honor --setmetadata option
`--setmetadata` is false by default, honoring it
will keep the metadata disabled by default

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
14d6211d6d cephfs: subvolumes honor --setmetadata option
`--setmetadata` is false by default, honoring it
will keep the metadata disabled by default

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
de7128b3a2 cephfs: Add clusterName as metadata on snapshots
Example:
sh-4.4$ ceph fs subvolume snapshot metadata ls myfs csi-vol-ba248f9e-0e75-11ed-b774-8e97192ff5ec \
			csi-snap-ce24e3bb-0e75-11ed-b774-8e97192ff5ec --group_name csi
{
    "csi.ceph.com/cluster/name": "\"K8s-cluster-1\"",
    "csi.storage.k8s.io/volumesnapshot/name": "cephfs-pvc-snapshot",
    "csi.storage.k8s.io/volumesnapshot/namespace": "rook-ceph",
    "csi.storage.k8s.io/volumesnapshotcontent/name": "snapcontent-2e89e1b2-e6e9-48fe-b365-edb493d7022e"
}

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
856d7c264c cephfs: handle metadata op-failures with unsupported ceph versions
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
5f36f7e8bd cephfs: update subvolume snapshot metadata if snapshot already exists.
Make sure to set metadata when subvolume snapshot exist, i.e. if the
provisioner pod is restarted while createSnapShot is in progress, say it
created the subvolume snapshot but didn't yet set the metadata.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
7c9259a45e cephfs: set metadata on the subvolume snapshot on create
Set snapshot-name/snapshot-namespace/snapshotcontent-name details
on subvolume snapshots as metadata on create.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
8c0dd482fa cephfs: add set/Remove subvolume snapshot metadata utility functions
Add utility functions to set/Remove
snapshot-name/snapshot-namespace/snapshotcontent-name metadata on
subvolume snapshots.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
51099d60fe cephfs: handle metadata op-failures with unsupported ceph versions
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
11d51ed9b0 cephfs: unset cluster Name metadata
unsets the cluster name metadata key and value on the subvolume

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
21d811096b cephfs: set cluster Name as metadata on the subvolume
This change helps read the cluster name from the cmdline args,
the provisioner will set the same on the subvolume.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
466bdf97b2 cephfs: set metadata on restart of provisioner pod
Make sure to set metadata when subvolume exist, i.e. if the provisioner pod
is restarted while createVolume is in progress, say it created the subvolume
but didn't yet set the metadata.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
6bcb8ecc68 cephfs: set PV/PVC details on the subvolume as metadata on create
This helps Monitoring solutions without access to Kubernetes clusters to
display the details of the PV/PVC/NameSpace in their dashboard.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
ecf03eb6ae cephfs: add set/Get/List/Remove metadata utility functions
Add utility functions to set/Get/List/Remove PV/PVC/PVCNamespace metadata
on subvolume.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Madhu Rajanna
8c5563a9bc rbd: remove checkHealthyPrimary check
After Failover of workloads to the secondary
cluster when the primary cluster is down,
RBD Image is not marked healthy, and VR
resources are not promoted to the Primary,
In VolumeReplication, the `CURRENT STATE`
remains Unknown and doesn't change to Primary.

This happens because the primary cluster went down,
and we have force promoted the image on the
secondary cluster. and the image stays in
up+stopping_replay or could be any other states.
Currently assumption was that the image will
always be `up+stopped`. But the image will be in
`up+stopped` only for planned failover and it
could be in any other state if its a forced
failover. For this reason, removing
checkHealthyPrimary from the PromoteVolume RPC call.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-27 09:04:27 +00:00
Niels de Vos
011d4fc81c cleanup: create k8s.io/mount-utils Mounter only once
Recently the k8s.io/mount-utils package added more runtime dectection.
When creating a new Mounter, the detect is run every time. This is
unfortunate, as it logs a message like the following:

```
mount_linux.go:283] Detected umount with safe 'not mounted' behavior
```

This message might be useful, so it probably good to keep it.

In Ceph-CSI there are various locations where Mounter instances are
created. Moving that to the DefaultNodeServer type reduces it to a
single place. Some utility functions need to accept the additional
parameter too, so that has been modified as well.

See-also: kubernetes/kubernetes#109676
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-21 07:14:43 +00:00
takeaki-matsumoto
1025871021 cephfs: Support mount option on nodeplugin
add mount options on nodeplugin side

Signed-off-by: takeaki-matsumoto <takeaki.matsumoto@linecorp.com>
2022-07-18 22:04:12 +00:00
Madhu Rajanna
ceb88d6498 cephfs: remove extra check for restore size
Looks like cephfs snapshot size is buggy and its
getting removed in ceph fs. we cannot get the size
of the snapshot during CreateVolume call, so we cannot
do any size check at CreateVolume to check if the
restore size is smaller or not.

As we are removing this check it also fixes #3147
but we dont have any validation at CSI level for
smaller restore we need to depend on kubernetes
external-provisioner for it.

fixes: #3147

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-18 10:04:14 +00:00
Madhu Rajanna
f171143135 cephfs: round to cephfs size to multiple of 4Mib
Due to the bug in the df stat we need to round off
the subvolume size to align with 4Mib.

Note:- Minimum supported size in cephcsi is 1Mib,
we dont need to take care of Kib.

fixes #3240

More details at https://github.com/ceph/ceph/pull/46905

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-13 18:32:40 +00:00
Humble Chirammal
1856647506 cephfs: go with default permissions while creating subvolumes
While creating subvolumes, CephFS driver set the mode to `777`
and pass it along to go ceph apis which cause the subvolume
permission to be on 777, however if we create a subvolume
directly in the ceph cluster, the default permission bits are
set which is 755 for the subvolume. This commit try to stick
to the default behaviour even while creating the subvolume.

This also means that we can work with fsgrouppolicy set to
`File` in csiDriver object which is also addressed in this commit.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-07-13 06:49:58 +00:00
Benoît Knecht
507844c9b1 rbd: Use rados namespace when getting clone depth
When the Ceph user is restricted to a specific namespace in the pool, it is
crucial that evey interaction with the cluster is done within that namespace.
This wasn't the case in `getCloneDepth()`.

This issue was causing snapshot creation to fail with

> Failed to check and update snapshot content: failed to take snapshot of the
> volume X: "rpc error: code = Internal desc = rbd: ret=-1, Operation not
> permitted"

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2022-07-07 22:20:29 +00:00
Niels de Vos
14ba1498bf util: reduce systemd related errors while mounting
There are regular reports that identify a non-error as the cause of
failures. The Kubernetes mount-utils package has detection for systemd
based environments, and if systemd is unavailable, the following error
is logged:

    Cannot run systemd-run, assuming non-systemd OS
    systemd-run output: System has not been booted with systemd as init
    system (PID 1). Can't operate.
    Failed to create bus connection: Host is down, failed with: exit status 1

Because of the `failed` and `exit status 1` error message, users might
assume that the mounting failed. This does not need to be the case. The
container-images that the Ceph-CSI projects provides, do not use
systemd, so the error will get logged with each mount attempt.

By using the newer MountSensitiveWithoutSystemd() function from the
mount-utils package where we can, the number of confusing logs get
reduced.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-04 10:02:54 +00:00
Niels de Vos
a1ed6207f6 cephfs: report detailed error message on clone failure
go-ceph provides a new GetFailure() method to retrieve details errors
when cloning failed. This is now included in the `cephFSCloneState`
struct, which was a simple string before.

While modifying the `cephFSCloneState` struct, the constants have been
removed, as go-ceph provides them as well.

Fixes: #3140
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-06-30 19:33:41 +00:00
Yati Padia
5c40f1ef33 rbd: remove the clone in case of failure
This commit removes the clone incase
unsetAllMetadata or copyEncryptionConfig or
expand fails for createVolumeFromSnapshot
and CreateSnapshot.
It also removes the clone in case of
any failure in createCloneFromImage.

issue: #3103

Signed-off-by: Yati Padia <ypadia@redhat.com>
2022-06-30 05:50:16 +00:00
Prasanna Kumar Kalever
9fa3c8382b cleanup: reduce struct padding
internal/rbd/rbd_util.go:89:15: struct of size 312 bytes could be of
size 304 bytes:
``
struct{
	RbdImageName   	string,
	ImageID        	string,
	VolID          	string,
	Monitors       	string,
	JournalPool    	string,
	Pool           	string,
	RadosNamespace 	string,
	ClusterID      	string,
	RequestName    	string,
	NamePrefix     	string,
	ParentName     	string,
	ParentPool     	string,
	ClusterName    	string,
	Owner          	string,
	VolSize        	int64,
	StripeCount    	uint64,
	StripeUnit     	uint64,
	ObjectSize     	uint64,
	ImageFeatureSet	github.com/ceph/go-ceph/rbd.FeatureSet,
	encryption
*github.com/ceph/ceph-csi/internal/util.VolumeEncryption,
	CreatedAt
*google.golang.org/protobuf/types/known/timestamppb.Timestamp,
	conn
*github.com/ceph/ceph-csi/internal/util.ClusterConnection,
	ioctx          	*github.com/ceph/go-ceph/rados.IOContext,
	Primary        	bool,
	EnableMetadata 	bool,
}
`` (maligned)
type rbdImage struct {
              ^}`
make: *** [Makefile:118: go-lint] Error 1

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Prasanna Kumar Kalever
29a3f4acf6 cleanup: ReconcilePersistentVolume consider passing it by pointer
Address: hugeParam linter

internal/controller/persistentvolume/persistentvolume.go:59:7:
hugeParam: r is heavy (80 bytes); consider passing it by pointer
(gocritic)
[...]
internal/controller/persistentvolume/persistentvolume.go:135:7:
hugeParam: r is heavy (80 bytes); consider passing it by pointer
(gocritic)
func (r ReconcilePersistentVolume) reconcilePV(ctx context.Context, obj
runtime.Object) error {}

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Prasanna Kumar Kalever
caf4090657 rbd: provide option to disable setting metadata on rbd images
As we added support to set the metadata on the rbd images created for
the PVC and volume snapshot, by default metadata is set on all the images.

As we have seen we are hitting issues#2327 a lot of times with this,
we start to leave a lot of stale images. Currently, we rely on
`--extra-create-metadata=true` to decide to set the metadata or not,
we cannot set this option to false to disable setting metadata because we
use this for encryption too.

This changes is to provide an option to disable setting the image
metadata when starting cephcsi.

Fixes: #3009
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Madhu Rajanna
8a47904e8f rbd: add unit test for checkHealthyPrimary
Removed the code in checkHealthyPrimary which
makes the ceph call, passing it as input now.
Added unit test for checkHealthyPrimary function

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-28 13:17:11 +00:00
Madhu Rajanna
53e76fab69 rbd: fix checkHealthyPrimary to consider up+stopped state
we need to check for image should be in up+stopped state
not anyone of the state for that the we need to use
OR check not the AND check.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-28 13:17:11 +00:00
Madhu Rajanna
704cb5c941 revert: rbd: consider remote image health for primary
When the image is force promoted to primary on the
cluster the remote image might not be in replaying
state because due to the split brain state. This
PR reverts back the commit
c3c87f2ef3. Which we added
to check the remote image status.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-28 13:17:11 +00:00
Prasanna Kumar Kalever
1da446d2f2 rbd: healer detect Kubernetes version for right StagingTargetPath
Kubernetes 1.24 and newer use a different path for staging the volume.
That means the CSI-driver is requested to mount the volume at an other
location, compared to previous versions of Kubernetes. CSI-drivers
implementing the volumeHealer, must receive the correct path, otherwise
the after a nodeplugin restart the NBD mounts will bailout attempting
to NodeStageVolume() call and return an error.

See-also: kubernetes/kubernetes#107065

Fixes: #3176
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-24 12:23:29 +00:00
Madhu Rajanna
3acaa018db rbd: issue resync only if the force flag is set
During failover we do demote the volume on the primary
as the image is still not promoted yet on the remote cluster,
there are spurious split-brain errors reported by RBD,
the Cephcsi resync will attempt to resync from the "known"
secondary and that will cause data loss

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-23 13:28:18 +00:00
Madhu Rajanna
7a2dd4c3cf rbd: create token and use it for vault SA
create the token if kubernetes version in
1.24+ and use it for vault sa.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Rakshith R <rar@redhat.com>
2022-06-17 11:37:59 +00:00
Robert Vasek
fd7559a903 cephfs: added support for snapshot-backed volumes
This commit implements most of
docs/design/proposals/cephfs-snapshot-shallow-ro-vol.md design document;
specifically (de-)provisioning of snapshot-backed volumes, mounting such
volumes as well as mounting pre-provisioned snapshot-backed volumes.

Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
2022-06-16 09:44:27 +00:00
Robert Vasek
0807fd2e6c journal: added csi.volume.backingsnapshotid image attribute
Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
2022-06-16 09:44:27 +00:00
Madhu Rajanna
4b57cc3ec5 rbd: add support for rbd striping
RBD supports creating rbd images with
object size, stripe unit and stripe count
to support striping. This PR adds the support
for the same.

More details about striping at
https://docs.ceph.com/en/quincy/man/8/rbd/#striping

fixes: #3124

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-09 18:59:00 +00:00
Prasanna Kumar Kalever
09a8e5e9e6 rbd: unset cluster Name metadata
unsets the cluster name metadata key and value on the RBD image

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-08 16:23:59 +00:00
Prasanna Kumar Kalever
2880c25fd6 rbd: set cluster Name as metadata on the image
This change helps read the cluster name from the cmdline args,
the provisioner will set the same on the RBD images.

Fixes: #2973
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-08 16:23:59 +00:00
Prasanna Kumar Kalever
deb003e605 cleanup: use prefix instead of hardcoding csiParameterPrefix
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-08 16:23:59 +00:00
Madhu Rajanna
1952a9b4b3 ci: fix all linter errors found in golangci-lint
Fixing all the linter errors found in golang-ci
lint v1.46.2

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-03 12:55:54 +00:00
Madhu Rajanna
c9943320ac cephfs: skip NetNamespaceFilePath if the volume is pre-provisioned
In case of pre-provisioned volume the clusterID is
not set in the volume context as the clusterID is missing
we cannot extract the NetNamespaceFilePath from the
configuration file. For static volume and dynamically
provisioned volume the clusterID is set.

Note:- This is a special case to support mounting PV
without clusterID parameter.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-03 07:25:25 +00:00
Rakshith R
7688306f87 rbd: use vaultAuthPath variable name in error msg
Before the change, the error msg was the following:
```
failed to set VAULT_AUTH_MOUNT_PATH in Vault config: path is empty
```
`vaultAuthPath` is the actual variable name set by the
user. The error message will now be the following:
```
failed to set "vaultAuthPath" in vault config: path is empty
```

Signed-off-by: Rakshith R <rar@redhat.com>
2022-05-26 07:37:48 +00:00