By default, `cryptsetup luksFormat` uses Argon2i as Password-Based Key
Derivation Function (PBKDF), which not only has a CPU cost, but also a memory
cost (to make brute-force attacks harder).
The memory cost is based on the available system memory by default, which in
the context of Ceph CSI can be a problem for two reasons:
1. Pods can have a memory limit (much lower that the memory available on the
node, usually) which isn't taken into account by `cryptsetup`, so it can get
OOM-killed when formating a new volume;
2. The amount of memory that was used during `cryptsetup luksFormat` will then
be needed for `cryptsetup luksOpen`, so if the volume was formated on a node
with a lot of memory, but then needs to be opened on a different node with
less memory, `cryptsetup` will get OOM-killed.
This commit sets the PBKDF memory limit to a fixed value to ensure consistent
memory usage regardless of the specifications of the nodes where the volume
happens to be formatted in the first place.
The limit is set to a relatively low value (32 MiB) so that the `csi-rbdplugin`
container in the `nodeplugin` pod doesn't require an extravagantly high memory
limit in order to format/open volumes (particularly with operations happening
in parallel), while at the same time not being so low as to render it
completely pointless.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Add `mkfsOptions` to the StorageClass and pass them to the `mkfs`
command while creating the filesystem on the RBD device.
Fixes: #374
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Storing the default `mkfs` arguments in a map with key per filesystem
type makes this a little more modular. It prepares th code for fetching
the `mkfs` arguments from the VolumeContext.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
This commit makes use of crush location labels from node
labels to supply `crush_location` and `read_from_replica=localize`
options during rbd map cmd. Using these options, ceph
will be able to redirect reads to the closest OSD,
improving performance.
Signed-off-by: Rakshith R <rar@redhat.com>
The StagingTargetPath is an optional entry in
NodeExpandVolumeRequest, We cannot expect it to be
set always and at the same time cephcsi depended
on the StaingTargetPath to retrieve some metadata
information.
This commit will check all the mount ref and identifies
the stagingTargetPath by checking the image-meta.json
file exists and this is a costly operation as we need to
loop through all the mounts and check image-meta.json
in each mount but this is happens only if the
StaingTargetPath is not set in the NodeExpandVolumeRequest
fixes#3623
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
set disableInUseChecks on rbd volume struct
as it will be used later to check whether
the rbd image is allowed to mount on multiple
nodes.
fixes: #3604
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
We should not call ExpandVolume for the BackingSnapshot
subvolume as there wont be any real subvolume created for
it and even if we call it the ExpandVolume will fail
fail as there is no real subvolume exists.
This commits fixes by adjusting the `if` check to ensure
that ExpandVolume will only be called either the
VolumeRequest is to create from a snapshot or volume
and BackingSnapshot is not true.
sample code here https://go.dev/play/p/PI2tNii5tTg
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add Ceph FS fscrypt support, similar to the RBD/ext4 fscrypt
integration. Supports encrypted PVCs, snapshots and clones.
Requires kernel and Ceph MDS support that is currently not in any
stable release.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
this commit remove the protobuf dependency locking in the module
description.
Also, ptypes.TimestampProto is deprecated and this commit
make use of the timestamppb.New() for the construction.
ParseTime() function has been removed and callers adjusted to the
same.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
We need to unset the metadata on the clone
and restore PVC if the parent PVC was created
when setmetadata was set to true and it was
set to false when restore and clone pvc was
created.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
`ceph osd blocklist range add/rm <ip>` cmd is outputting
"blocklisting cidr:10.1.114.75:0/32 until 202..." messages
incorrectly into stdErr. This commit ignores stdErr when err
is nil.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit adds code to setup encryption on a rbdVol
being repaired in a followup CreateVolume request.
This is fixes a bug wherein encryption metadata may not
have been set in previous request due to container restart.
Fixes: #3402
Signed-off-by: Rakshith R <rar@redhat.com>
Checking volume details for the existing volumeID
first. if details like OMAP, RBD Image, Pool doesnot
exists try to use clusterIDMapping to look for the
correct informations.
fixes: #2929
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Sometime the json unmarshal might
get success and return empty time
stamp. add a check to make sure the
time is not zero always.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
As per the csiaddon spec last sync time is
required parameter in the GetVolumeReplicationInfo
if we are failed to parse the description, return
not found error message instead of nil
which is empty response
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If a PV is reattached to a new PVC in a different
namespace we need to update the namespace name
in the rados object.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If a PV is reattached to a new PVC in a different
namespace we need to update the namespace name
in the rbd image metadata.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When we do stat on the targetpath, if there is
any error we can check is it due to corruption.
If yes, cephcsi can return abnormal in the
NodeGetVolumeStats so that consumer (CO/admin)
and detect and take further action.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When we do stat on the targetpath, if there is
any error we can check is it due to corruption.
If yes, cephcsi can return abnormal in the
NodeGetVolumeStats so that consumer (CO/admin)
and detect and take further action.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
To avoid subvolume leaks if the SetAllMetadata
operations fails delete the subvolume.
If any operation fails after creating the subvolume
we will remove the omap as the omap gets
removed we will need to remove the subvolume to
avoid stale resources.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Different places have different meaningful fallback. When parsing
from user we should default to block, when parsing stored config we
should default to invalid and handle that as an error.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Integrate basic fscrypt functionality into RBD initialization. To
activate file encryption instead of block introduce the new
'encryptionType' storage class key.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Call Mount.Setup with SingleUserWritable constant instead of 0o755,
which is silently ignored and causes the /.fscrypt/{policy,protector}/
directories to have mode 000.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Revert once our google/fscrypt dependency is upgraded to a version
that includes https://github.com/google/fscrypt/pull/359 gets accepted
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Use constant protector name 'ceph-csi' instead of constant prefix
concatenated with the volume ID. When cloning volumes the ID changes
and fscrypt protected directories become inunlockable due to the
protector name change
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
NewContextFrom{Mountpoint,Path} functions use cached
`/proc/self/mountinfo` to find mounted file systems by device ID.
Since we run fscrypt as a library in a long-lived process the cached
information is likely to be stale. Stale entries may map device IDs to
mount points of already destroyed RBDs and fail context creation.
Updating the cache beforehand prevents this.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Currently fscrypt supports policies version 1 and 2. 2 is the best
choice and was the only choice prior to this commit. This adds support
for kernels < 5.4, by selecting policy version 1 there.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Fetch password when keyFn is invoked, not when it is created. This
allows creation of the keyFn before actually creating the passphrase.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Fetch keys from KMS before doing anything else. This will catch KMS
errors before setting up any fscrypt metadata.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Integrate google/fscrypt into Ceph CSI KMS and encryption setup. Adds
dependencies to google/fscrypt and pkg/xattr. Be as generic as
possible to support integration with both RBD and Ceph FS.
Add the following public functions:
InitializeNode: per-node initialization steps. Must be called
before Unlock at least once.
Unlock: All steps necessary to unlock an encrypted directory including
setting it up initially.
IsDirectoryUnlocked: Test if directory is really encrypted
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
In preparation of fscrypt support for RBD filesystems, rename block
encryption related function to include the word 'block'. Add struct
fields and IsFileEncrypted.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Add registry similar to the providers one. This allows testers to
add and use GetKMSTestDummy() to create stripped down provider
instances suitable for use in unit tests.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Add GetSecret() to allow direct access to passphrases without KDF and
wrapping by a DEKStore.
This will be used by fscrypt, which has its own KDF and wrapping. It
will allow users to take a k8s secret, for example, and use that
directly as a password in fscrypt.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>