Commit Graph

264 Commits

Author SHA1 Message Date
Benoît Knecht
0ec6e10bf2 util: Limit cryptsetup PBKDF memory usage
By default, `cryptsetup luksFormat` uses Argon2i as Password-Based Key
Derivation Function (PBKDF), which not only has a CPU cost, but also a memory
cost (to make brute-force attacks harder).

The memory cost is based on the available system memory by default, which in
the context of Ceph CSI can be a problem for two reasons:

1. Pods can have a memory limit (much lower that the memory available on the
   node, usually) which isn't taken into account by `cryptsetup`, so it can get
   OOM-killed when formating a new volume;
2. The amount of memory that was used during `cryptsetup luksFormat` will then
   be needed for `cryptsetup luksOpen`, so if the volume was formated on a node
   with a lot of memory, but then needs to be opened on a different node with
   less memory, `cryptsetup` will get OOM-killed.

This commit sets the PBKDF memory limit to a fixed value to ensure consistent
memory usage regardless of the specifications of the nodes where the volume
happens to be formatted in the first place.

The limit is set to a relatively low value (32 MiB) so that the `csi-rbdplugin`
container in the `nodeplugin` pod doesn't require an extravagantly high memory
limit in order to format/open volumes (particularly with operations happening
in parallel), while at the same time not being so low as to render it
completely pointless.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 1852e977f8)
2023-04-27 16:47:12 +00:00
Rakshith R
95682522ee rbd: add capability to automatically enable read affinity
This commit makes use of crush location labels from node
labels to supply `crush_location` and `read_from_replica=localize`
options during rbd map cmd. Using these options, ceph
will be able to redirect reads to the closest OSD,
improving performance.

Signed-off-by: Rakshith R <rar@redhat.com>
2023-02-14 08:29:46 +00:00
Madhu Rajanna
3967e4dae9 cleanup: fix static checks
fix SA1019 static check to replace
io/utils with os package and sets
with generic sets

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2023-02-03 08:55:43 +00:00
Madhu Rajanna
e9e33fb851 cleanup: fix static checks
fix SA1019 static check to replace
io/utils with os package

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2023-02-02 14:53:59 +00:00
Marcel Lauhoff
2abfafdf3f util: Add EncryptionTypeNone and unit tests
Add type none to distinguish disabled encryption (positive result)
from invalid configuration (negative result).

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
f1f50e0218 fscrypt: fix metadata directory permissions
Call Mount.Setup with SingleUserWritable constant instead of 0o755,
which is silently ignored and causes the /.fscrypt/{policy,protector}/
directories to have mode 000.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
4e38bdac10 fscrypt: fsync encrypted dir after setting policy [workaround]
Revert once our google/fscrypt dependency is upgraded to a version
that includes https://github.com/google/fscrypt/pull/359 gets accepted

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
33c33a8b49 fscrypt: Use constant protector name
Use constant protector name 'ceph-csi' instead of constant prefix
concatenated with the volume ID. When cloning volumes the ID changes
and fscrypt protected directories become inunlockable due to the
protector name change

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
97cb1b6672 fscrypt: Update mount info before create context
NewContextFrom{Mountpoint,Path} functions use cached
`/proc/self/mountinfo` to find mounted file systems by device ID.
Since we run fscrypt as a library in a long-lived process the cached
information is likely to be stale. Stale entries may map device IDs to
mount points of already destroyed RBDs and fail context creation.
Updating the cache beforehand prevents this.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
a52314356e fscrypt: Determine best supported fscrypt policy on node init
Currently fscrypt supports policies version 1 and 2. 2 is the best
choice and was the only choice prior to this commit. This adds support
for kernels < 5.4, by selecting policy version 1 there.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
dd0e1988c0 fscrypt: Fetch passphrase when keyFn is invoked not created
Fetch password when keyFn is invoked, not when it is created. This
allows creation of the keyFn before actually creating the passphrase.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
a6a4282493 fscrypt: Unlock: Fetch keys early
Fetch keys from KMS before doing anything else. This will catch KMS
errors before setting up any fscrypt metadata.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
cfea8d7562 fscrypt: fscrypt integration
Integrate google/fscrypt into Ceph CSI KMS and encryption setup. Adds
dependencies to google/fscrypt and pkg/xattr. Be as generic as
possible to support integration with both RBD and Ceph FS.

Add the following public functions:

InitializeNode: per-node initialization steps. Must be called
before Unlock at least once.

Unlock: All steps necessary to unlock an encrypted directory including
setting it up initially.

IsDirectoryUnlocked: Test if directory is really encrypted

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
624905d60d kms: Add basic GetSecret() test
Add rudimentary test to ensure that we can get a valid passphrase from
the GetSecret() feature

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
0599089de0 util: Add util to fetch encryption type from vol options
Fetch encryption type from vol options. Make fallback type
configurable to support RBD (default block) and Ceph FS (default file)

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Marcel Lauhoff
fe4821435e util: Make encryption passphrase size a parameter
fscrypt support requires keys longer than 20 bytes. As a preparation,
make the new passphrase length configurable, but default to 20 bytes.

Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
2022-10-17 17:33:52 +00:00
Rakshith R
d39d2cffcc cleanup: use index instead of value while iterating
This commit cleans up for loop to use index to access
value instead of copying value into a new variable
while iterating.
```
internal/util/csiconfig.go:103:2: rangeValCopy: each \
iteration copies 136 bytes (consider pointers or indexing) \
(gocritic)
        for _, cluster := range config {
```

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-09 13:36:03 +00:00
Rakshith R
3d3c029471 nfs: add nodeserver within cephcsi
This commit adds nfs nodeserver capable of
mounting nfs volumes, even with pod networking
using NSenter design similar to rbd and cephfs.
NodePublish, NodeUnpublish, NodeGetVolumeStats
and NodeGetCapabilities have been implemented.

The nodeserver implementation has been inspired
from https://github.com/kubernetes-csi/csi-driver-nfs,
which was previously used for mounted cephcsi exported
nfs volumes. The current implementation is also
backward compatible for the previously created
PVCs.

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-09 13:36:03 +00:00
Niels de Vos
011d4fc81c cleanup: create k8s.io/mount-utils Mounter only once
Recently the k8s.io/mount-utils package added more runtime dectection.
When creating a new Mounter, the detect is run every time. This is
unfortunate, as it logs a message like the following:

```
mount_linux.go:283] Detected umount with safe 'not mounted' behavior
```

This message might be useful, so it probably good to keep it.

In Ceph-CSI there are various locations where Mounter instances are
created. Moving that to the DefaultNodeServer type reduces it to a
single place. Some utility functions need to accept the additional
parameter too, so that has been modified as well.

See-also: kubernetes/kubernetes#109676
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-21 07:14:43 +00:00
takeaki-matsumoto
1025871021 cephfs: Support mount option on nodeplugin
add mount options on nodeplugin side

Signed-off-by: takeaki-matsumoto <takeaki.matsumoto@linecorp.com>
2022-07-18 22:04:12 +00:00
Madhu Rajanna
f171143135 cephfs: round to cephfs size to multiple of 4Mib
Due to the bug in the df stat we need to round off
the subvolume size to align with 4Mib.

Note:- Minimum supported size in cephcsi is 1Mib,
we dont need to take care of Kib.

fixes #3240

More details at https://github.com/ceph/ceph/pull/46905

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-13 18:32:40 +00:00
Niels de Vos
14ba1498bf util: reduce systemd related errors while mounting
There are regular reports that identify a non-error as the cause of
failures. The Kubernetes mount-utils package has detection for systemd
based environments, and if systemd is unavailable, the following error
is logged:

    Cannot run systemd-run, assuming non-systemd OS
    systemd-run output: System has not been booted with systemd as init
    system (PID 1). Can't operate.
    Failed to create bus connection: Host is down, failed with: exit status 1

Because of the `failed` and `exit status 1` error message, users might
assume that the mounting failed. This does not need to be the case. The
container-images that the Ceph-CSI projects provides, do not use
systemd, so the error will get logged with each mount attempt.

By using the newer MountSensitiveWithoutSystemd() function from the
mount-utils package where we can, the number of confusing logs get
reduced.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-04 10:02:54 +00:00
Prasanna Kumar Kalever
caf4090657 rbd: provide option to disable setting metadata on rbd images
As we added support to set the metadata on the rbd images created for
the PVC and volume snapshot, by default metadata is set on all the images.

As we have seen we are hitting issues#2327 a lot of times with this,
we start to leave a lot of stale images. Currently, we rely on
`--extra-create-metadata=true` to decide to set the metadata or not,
we cannot set this option to false to disable setting metadata because we
use this for encryption too.

This changes is to provide an option to disable setting the image
metadata when starting cephcsi.

Fixes: #3009
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Madhu Rajanna
7a2dd4c3cf rbd: create token and use it for vault SA
create the token if kubernetes version in
1.24+ and use it for vault sa.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Rakshith R <rar@redhat.com>
2022-06-17 11:37:59 +00:00
Prasanna Kumar Kalever
2880c25fd6 rbd: set cluster Name as metadata on the image
This change helps read the cluster name from the cmdline args,
the provisioner will set the same on the RBD images.

Fixes: #2973
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-08 16:23:59 +00:00
Prasanna Kumar Kalever
deb003e605 cleanup: use prefix instead of hardcoding csiParameterPrefix
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-08 16:23:59 +00:00
Madhu Rajanna
1952a9b4b3 ci: fix all linter errors found in golangci-lint
Fixing all the linter errors found in golang-ci
lint v1.46.2

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-03 12:55:54 +00:00
Prasanna Kumar Kalever
bac33262ae rbd: add unset volume/snapshot metadata utility functions
Added
GetVolumeMetadataKeys()
GetSnaoshotMetadataKeys()
unsetVolumeMetadata() and
unsetSnapshotMetadata()

functions.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-05-12 15:54:09 +00:00
Marcus Röder
a95a6213eb util: support systems using the new cgroup v2 structure
With cgroup v2, the location of the pids.max file changed and so did the
/proc/self/cgroup file

new /proc/self/cgroup file
`
0::/user.slice/user-500.slice/session-14.scope
`

old file:
`
11:pids:/user.slice/user-500.slice/session-2.scope
10:blkio:/user.slice
9:net_cls,net_prio:/
8:perf_event:/
...
`

There is no directory per subsystem (e.g. /sys/fs/cgroup/pids) any more, all
files are now in one directory.

fixes: https://github.com/ceph/ceph-csi/issues/3085

Signed-off-by: Marcus Röder <m.roeder@yieldlab.de>
2022-05-07 20:38:48 +00:00
Madhu Rajanna
d2bc9743f7 cephfs: add netNamespaceFilePath for CephFS
as same host directory is not shared between
the cephfs and the rbd plugin pod. we need
to keep the netNamespaceFilePath separately
for both cephfs and rbd. CephFS plugin will
use this path to execute mount -t commands.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-19 12:28:46 +00:00
Madhu Rajanna
eb4bfb7326 cleanup: use block comment for ClusterInfo example
Adjusted the mix of tabs and the spaces and also
used block comment for better readability.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-19 12:28:46 +00:00
Madhu Rajanna
b4acbd08a5 rbd: move radosNamespace to RBD section
As radosNamespace is more specific to
RBD not the general ceph configuration. Now
we introduced a new RBD section for RBD specific
options, Moving the radosNamespace to RBD section
and keeping the radosNamespace still under the
global ceph level configration for backward
compatibility.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-19 12:28:46 +00:00
Madhu Rajanna
766346868e util: Add RBD specific options in clusterInfo
As the netNamespaceFilePath can be separate for
both cephfs and rbd adding the netNamespaceFilePath
path for RBD, This will help us to keep RBD and
CephFS specific options separately.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-19 12:28:46 +00:00
Madhu Rajanna
c245436ec4 util: fix logging in ExecuteCommandWithNSEnter
log the nsenter and its argument after executing
the command with the nsenter CLI.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-14 12:17:21 +00:00
Niels de Vos
28369702d2 nfs: use go-ceph API for creating/deleting exports
Recent versions of Ceph allow calling the NFS-export management
functions over the go-ceph API.

This seems incompatible with older versions that have been tested with
the `ceph nfs` commands that this commit replaces.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-04-14 08:01:45 +00:00
Humble Chirammal
959df4dbac doc: correct typos in struct field comments and release.md
corrected strings in the release guide and util server.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-04-11 06:23:25 +00:00
Prasanna Kumar Kalever
41fe2c7dda rbd: set metadata on the snapshot
Set snapshot-name/snapshot-namespace/snapshotcontent-name details
on RBD backend snapshot image as metadata on snapshot

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-04-08 15:43:14 +00:00
Prasanna Kumar Kalever
ae5925f04c rbd: update PV/PVC metadata on a reattach of PV
Example if a PVC was delete by setting `persistentVolumeReclaimPolicy` as
`Retain` on PV, and PV is reattached to a new PVC, we make sure to update
PV/PVC image metadata on a PV reattach.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-04-08 15:43:14 +00:00
Prasanna Kumar Kalever
4d750ed0e5 rbd: add set/Get VolumeMetadata() utility function
Define and use PV and PVC metadata keys used by external provisioner.
The CSI external-provisioner (v1.6.0+) introduces the
--extra-create-metadata flag, which automatically sets map<string, string>
parameters in the CSI CreateVolumeRequest.

Add utility functions to set/Get PV/PVC/PVCNamespace metadata on image

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-04-08 15:43:14 +00:00
Madhu Rajanna
7b2aef0d81 util: add support for the nsenter
add support to run rbd map and mount -t
commands with the nsenter.

complete design of pod/multus network
is added here https://github.com/rook/rook/
blob/master/design/ceph/multus-network.md#csi-pods

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-04-08 10:23:21 +00:00
Thibaut Blanchard
e874c9c11b rbd: fix topology snapshot pool
Restoring a snapshot with a new PVC results with a wrong
dataPoolName in case of initial volume linked
to a storageClass with topology constraints and erasure coding.

Signed-off-by: Thibaut Blanchard <thibaut.blanchard@gmail.com>
2022-03-30 04:40:30 +00:00
Robert Vasek
f6ae612003 util: added reference tracker
RT, reference tracker, is key-based implementation of a reference counter.
Unlike an integer-based counter, RT counts references by tracking unique
keys. This allows accounting in situations where idempotency must be
preserved. It guarantees there will be no duplicit increments or decrements
of the counter.

Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
2022-03-27 19:24:26 +00:00
Madhu Rajanna
366c2ace31 util: add helper to get pvcnamespace from input
added helper function to return the pvc namespace
name from the input parameters.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-21 08:54:43 +00:00
Madhu Rajanna
772fe8d6c8 util: add helper function to strip kube parameters
added helper function to strip the kubernetes
specific parameters from the volumeContext as
volumeContext is storaged in the PV volumeAttributes

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-03-21 08:54:43 +00:00
Robert Vasek
80dda7cc30 cephfs: detect corrupt ceph-fuse mounts and try to remount
Mounts managed by ceph-fuse may get corrupted by e.g. the ceph-fuse process
exiting abruptly, or its parent container being terminated, taking down its
child processes with it.

This commit adds checks to NodeStageVolume and NodePublishVolume procedures
to detect whether a mountpoint in staging_target_path and/or target_path is
corrupted, and remount is performed if corruption is detected.

Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
2022-03-10 06:05:52 +00:00
Rakshith R
3203673d17 cleanup: remove ceph.conf WA options which are already fixed
This commit removes ceph.conf WA options:
```
     # Workaround for http://tracker.ceph.com/issues/23446
     fuse_set_user_groups = false

     # ceph-fuse which uses libfuse2 by default has write buffer size of 2KiB
     # adding 'fuse_big_writes = true' option by default to override this limit
     # see https://github.com/ceph/ceph-csi/issues/1928
     fuse_big_writes = true
```
Since they are already fixed.

Refer: https://tracker.ceph.com/issues/44885
Refer: https://tracker.ceph.com/issues/23446
Closes: #2825

Signed-off-by: Rakshith R <rar@redhat.com>
2022-02-04 15:42:32 +00:00
Madhu Rajanna
28fef9b379 cleanup: remove thick provisioning code
This commit removes the thick provisioning
code as thick provisioning is deprecated in
cephcsi 3.5.0.

fixes: #2795

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-28 11:17:15 +00:00
Madhu Rajanna
9c841c83d4 cleanup: rename errorPair to pairError
to fix the errname check renaming the
struct.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-24 05:25:11 +00:00
Madhu Rajanna
c67bacdb11 cleanup: use %s instead of %w for t.Errorf
As t.Errorf does not support error-wrapping
directive using %s.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-24 05:25:11 +00:00
Madhu Rajanna
813f6c30cc cleanup: use WriteString instead of Write
use WriteString instead of Write  for the temp
files.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-01-24 05:25:11 +00:00