Currently rbd CSI plugin uses formatAndMount of
mount.SafeFormatAndMount. This does not allow to pass or use
specific formatting arguments with it. This patch introduce
RBD specific formatting options with both xfs and ext4,
for example: -E no-discard with ext4 and -k option with
XFS to boost formatting performance of RBD device.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
(cherry picked from commit 0e6617e1ff)
if any on going opearation is seen,we
have to return Abort error message
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 6aac399075)
The storage class already takes MountOptions(MountFlags), these are the
bind mount options. Some of these options may not be recognised by the
cephfs mount. Hence added a new parameterin Storage Class for
- cephfs kernel mount options,
- ceph-fuse mount options
Ceph kernel mount options are different from ceph-fuse options, hence
added two different parameters.
Signed-off-by: Poornima G <pgurusid@redhat.com>
Ceph kernel client is more performant than ceph fuse client.
The kernel client has Quota support only in the kernel version >=4.17.
Hence use ceph kernel client when the kernel version is >=4.17.
Signed-off-by: Poornima G <pgurusid@redhat.com>
Sometime rbd images are mapped even if the
connection timeout error occurs, this will
try to unmap if the received error message
is connection timeout.This will fix stale maps
and rbd image deletion issue
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
if both controller and nodeserver flags are set/unset
cephcsi will start both server,
if only one flag is set, it will start relavent
service.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This will help user to check whats
the actual error. if the config file
is having issue or the clusterid is
not valid.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Image deletion takes time proportional to the size of the
image. Hence, ceph manager is enhanced to support async
deletion of an image, or rather passing the task of
deleting an image to the ceph manager.
This commit leverages the ceph manager enhancement in the CSI code.
NOTE: This is tested against a ceph cluster that is running
Ceph master version of the code. Once other releases
catch up in terms of the feature, the optimization would be
available to the CSI driver as well.
Fixes: #523
Signed-off-by: ShyamsundarR <srangana@redhat.com>
The container runtime CRI-O limits the number of PIDs to 1024 by
default. When many PVCs are requested at the same time, it is possible
for the provisioner to start too many threads (or go routines) and
executing 'rbd' commands can start to fail. In case a go routine can not
get started, the process panics.
The PID limit can be changed by passing an argument to kubelet, but this
will affect all pids running on a host. Changing the parameters to
kubelet is also not a very elegant solution.
Instead, the provisioner pod can change the configuration itself. The
pod is running in privileged mode and can write to /sys/fs/cgroup where
the limit is configured.
With this change, the limit is configured to 'max', just as if there is
no limit at all. The logs of the csi-rbdplugin in the provisioner pod
will reflect the change it makes when starting the service:
$ oc -n rook-ceph logs -c csi-rbdplugin csi-rbdplugin-provisioner-0
..
I0726 13:59:19.737678 1 cephcsi.go:127] Initial PID limit is set to 1024
I0726 13:59:19.737746 1 cephcsi.go:136] Reconfigured PID limit to -1 (max)
..
It is possible to pass a different limit on the commandline of the
cephcsi executable. The following flag has been added:
--pidlimit=<int> the PID limit to configure through cgroups
This accepts special values -1 (max) and 0 (default, do not
reconfigure). Other integers will be the limit that gets configured in
cgroups.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In unstage we now adhere to the transaction (or order of steps)
done in Stage. To enable this we stash the image meta data
into a local file on the staging path for use with unstage
request.
This helps in unmapping a stale map, in case the mount or
other steps in the transaction are complete.
Signed-off-by: ShyamsundarR <srangana@redhat.com>
This change also starts mapping nbd based access using ther rbd CLI
as, it is a prerequisite to get device listing for nbd as well.
Signed-off-by: ShyamsundarR <srangana@redhat.com>
This commit moves the mounting of a block volumes and filesystems
to a sub-file (already the case) or a sub-dir within the staging
path.
This enables using the staging path to store any additional data
regarding the mount. For example, this will be extended in the
future to store the fsid of the cluster, and maybe the pool name
to map unmap requests to the right image.
Also, this fixes the noted hack in the code, to determine in a
common manner if there is a mount on the passed in staging path.
Signed-off-by: ShyamsundarR <srangana@redhat.com>
once we map the rbd image on a node
we will get the device name its mapped
in the map output itself,no need to
check the devicepath post rbd mapping
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It's CO responsibility to create the
stagingPath as per the CSI spec.
The CO SHALL ensure
// that the path is directory and that the process serving the
// request has `read` and `write` permission to that directory. The
// CO SHALL be responsible for creating the directory if it does not
// exist.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
... and not that of the FS subvolume group `csi`.
There is no reason for setting the mode of FS subvolume group `csi`
(a CephFS subdirectory) as 777. It's default mode is 755. It's
sufficient to set the mode of FS subvolumes within the subvolume group
to `777`.
Signed-off-by: Ramana Raja <rraja@redhat.com>
... instead of that of the `csi` subvolume group. The pool layout
specified via storage class's `pool` setting is a subvolume property
and not a subvolume group property. The `csi` subvolume group
may have subvolumes of different storage classes with different
pool layouts.
Fixes: #499
Signed-off-by: Ramana Raja <rraja@redhat.com>
if mapping of rbd device is passed and mounting
device to stagingpath fails or if chmod on targetpath fails
,which may leave up stale mapping if
unstage is called
this will be fixed by unmapping if somthing fails
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Every Ceph CLI that is invoked at present passes the key via the
--key option, and hence is exposed to key being displayed on
the host using a ps command or such means.
This commit addresses this issue by stashing the key in a tmp
file, which is again created on a tmpfs (or empty dir backed by
memory). Further using such tmp files as arguments to the --keyfile
option for every CLI that is invoked.
This prevents the key from being visible as part of the argument list
of the invoked program on the system.
Fixes: #318
Signed-off-by: ShyamsundarR <srangana@redhat.com>
Currently, provisioner creates user for every volume and nodeplugin
uses this user to mount that volume. But nodeplugin and provisioner
already have admin credentials, hence using the admin credentials
to mount the volume and getting rid of user creation for each volume.
Signed-off-by: Poornima G <pgurusid@redhat.com>
in NodeStage RPC call we have to map the
device to the node plugin and make sure the
the device will be mounted to the global path
in nodeUnstage request unmount the device from
global path and unmap the device
if the volume mode is block we will be creating
a file inside a stageTargetPath and it will be
considered as the global path
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
File descriptors in use to parse errors from a few command
invocations were incorrect. This led to inability to detect
certain errors cases and act accordingly.
One of the easiest noticeable issues was when an image is deleted
but its RADOS keys and maps are still intact. In such cases
the DeleteVolume call always errored out unable to find the
image rather than, proceed with cleaning up the RADOS objects
and returning a success.
The original method of using stdout was incorrect, as the command
was tested from within a shell script and the scripts STDIN/OUT/ERR
was redirected to understand behavior. This is now tested using just
the CLI in question, and also examining Ceph code, and further
testing a couple of edge conditions by deleting backing images
for PVs
Signed-off-by: ShyamsundarR <srangana@redhat.com>
update driver version and add git commit
to the image. This will help us to identify
what latest git commit image contains.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Currently CephFs provisioner mounts the ceph filesystem
and creates a subdirectory as a part of provisioning the
volume. Ceph now supports commands to provision fs subvolumes,
hance modify the provisioner to use ceph mgr commands to
(de)provision fs subvolumes.
Signed-off-by: Poornima G <pgurusid@redhat.com>
This commit adds support to mount and delete volumes provisioned by older
plugin versions (1.0.0) in order to support backward compatibility to 1.0.0
created volumes.
It adds back the ability to specify where older meta data was specified, using
the metadatastorage option to the plugin. Further, using the provided meta data
to mount and delete the older volumes.
It also supports a variety of ways in which monitor information may have been
specified (in the storage class, or in the secret), to keep the monitor
information current.
Testing done:
- Mount/Delete 1.0.0 plugin created volume with monitors in the StorageClass
- Mount/Delete 1.0.0 plugin created volume with monitors in the secret with
a key "monitors"
- Mount/Delete 1.0.0 plugin created volume with monitors in the secret with
a user specified key
- PVC creation and deletion with the current version (to ensure at the minimum
no broken functionality)
- Tested some negative cases, where monitor information is missing in secrets
or present with a different key name, to understand if failure scenarios work
as expected
Updates #378
Follow-up work:
- Documentation on how to upgrade to 1.1 plugin and retain above functionality
for older volumes
Signed-off-by: ShyamsundarR <srangana@redhat.com>
As detailed in issue #279, current lock scheme has hash
buckets that are count of CPUs. This causes a lot of contention
when parallel requests are made to the CSI plugin. To reduce
lock contention, this commit introduces granular locks per
identifier.
The commit also changes the timeout for gRPC requests to Create
and Delete volumes, as the current timeout is 10s (kubernetes
documentation says 15s but code defaults are 10s). A virtual
setup takes about 12-15s to complete a request at times, that leads
to unwanted retries of the same request, hence the increased
timeout to enable operation completion with minimal retries.
Tests to create PVCs before and after these changes look like so,
Before:
Default master code + sidecar provisioner --timeout option set
to 30 seconds
20 PVCs
Creation: 3 runs, 396/391/400 seconds
Deletion: 3 runs, 218/271/118 seconds
- Once was stalled for more than 8 minutes and cancelled the run
After:
Current commit + sidecar provisioner --timeout option set to 30 sec
20 PVCs
Creation: 3 runs, 42/59/65 seconds
Deletion: 3 runs, 32/32/31 seconds
Fixes: #279
Signed-off-by: ShyamsundarR <srangana@redhat.com>
Also reduced code duplication in fetching pool list from Ceph.
DeleteSnapshot like DeleteVolume, should return a success when it
detects that the snapshot keys are missing from the RADOS OMaps that
store the snapshot UUID to request name mapping.
This was missing in the code, and is now added.
Signed-off-by: ShyamsundarR <srangana@redhat.com>