add support to run rbd map and mount -t
commands with the nsenter.
complete design of pod/multus network
is added here https://github.com/rook/rook/
blob/master/design/ceph/multus-network.md#csi-pods
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
OIDC token file path has been modified from
`/var/run/secrets/token` to `/run/secrets/tokens`.
This has been done to ensure compliance with
FHS 3.0.
refer:
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s13.html
Signed-off-by: Rakshith R <rar@redhat.com>
With Amazon STS and kubernetes cluster is configured with
OIDC identity provider, credentials to access Amazon KMS
can be fetched using oidc-token(serviceaccount token).
Each tenant/namespace needs to create a secret with aws region,
role and CMK ARN.
Ceph-CSI will assume the given role with oidc token and access
aws KMS, with given CMK to encrypt/decrypt DEK which will stored
in the image metdata.
Refer: https://docs.aws.amazon.com/STS/latest/APIReference/welcome.htmlResolves: #2879
Signed-off-by: Rakshith R <rar@redhat.com>
Mounts managed by ceph-fuse may get corrupted by e.g. the ceph-fuse process
exiting abruptly, or its parent container being terminated, taking down its
child processes with it.
This commit adds checks to NodeStageVolume and NodePublishVolume procedures
to detect whether a mountpoint in staging_target_path and/or target_path is
corrupted, and remount is performed if corruption is detected.
Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
It was decided that latest ceph CSI versions would drop support for
older Kubernetes versions, making this check useless. So it was removed.
Removing this version check allows for the deployment of the CephFS
resizer component when using the helm chart on non vanilla kubernetes
clusters whose API server version are in the form of `1.x.y-abc+def-ghi`.
Signed-off-by: Benjamin Guillon <benjamin.guillon@cc.in2p3.fr>
avoid specifying the image feature dependencies
and add a link to rbd official document for
reference to the image feature dependencies.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Makes the rbd images features in the storageclass
as optional so that default image features of librbd
can be used. and also kept the option to user
to specify the image features in the storageclass.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as deep-flatten is long supported in ceph and its
enabled by default in the librbd, providing an option
to enable it in cephcsi for the rbd images we are
creating.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add selinuxMount flag to enable/disable /etc/selinux host mount inside pods
to support selinux-enabled filesystems
Signed-off-by: Francesco Astegiano <francesco.astegiano@gmail.com>
to show what ports containers are exposing add port sections to nodeplugin
and provisioner helm templates
Signed-off-by: Deividas Burškaitis <deividas.burskaitis@oxylabs.io>
removes namespace from non-namespaced storageclass
object.
fixes: #2714
Replacement for #2715 as we didnt receive any update
and PR is already closed.
Co-authored-by: jhrcz-ls
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit removes the thick provisioning
code as thick provisioning is deprecated in
cephcsi 3.5.0.
fixes: #2795
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
updating external resizer image version
from 1.3.0 to latest available release i.e
1.4.0
1.4.0 changelog link
https://github.com/kubernetes-csi/
external-resizer/blob/master/CHANGELOG/CHANGELOG-1.4.md
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit updates sidecars to the latest available version
which is compatible with kubernetes 1.23 and csi spec 1.5
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Deployments place all sockets for communicating with CSI components in
the shared `/csi` directory. The CSI-Addons socket was introduced
recently, but not configured to be in the same location (by default
placed in `/tmp`).
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When topology is disabled, the ClusterRoleBinding is not created in the Helm
chart. However, the nodeplugin needs access to volumeattachments for the volume
healer.
Signed-off-by: Steven Reitsma <steven@properchaos.nl>
When generating csiconfiguration from values the config.json key gets merged with cluster-mapping.json
as the config.json toYaml element supresses a newline.
This fixes the situation where configuration is generated as shown;
```
data:
config.json: |-
[{"clusterID":"....","monitors":["..."]}]cluster-mapping.json: |-
[]
```
Signed-off-by: Toby Jackson <toby@warmfusion.co.uk>
Version field for helm Chart.yaml needs to have SemVer 2
compatible value, therefore use "<MAJOR-VERSION>-canary"
on "devel" branch.
Refer: https://helm.sh/docs/topics/charts/#the-chartyaml-file
Signed-off-by: Rakshith R <rar@redhat.com>
This change allows the user to choose not to fallback to NBD mounter
when some ImageFeatures are absent with krbd driver, rather just fail
the NodeStage call.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
we dont need securityContext for the cephfs provisioner
pod as its not doing any special operations like mount,
selinux operations etc .
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
cephfs deployment doesnot need extra permission like
privileged,Capabilities and reduce unwanted volumes.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
we dont need securityContext for the cephfs provisioner
pod as its not doing any special operations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Currently, we delete the ceph client log file on unmap/detach.
This patch provides additional alternatives for users who would like to
persist the log files.
Strategies:
-----------
`remove`: delete log file on unmap/detach
`compress`: compress the log file to gzip on unmap/detach
`preserve`: preserve the log file in text format
Note that the default strategy will be remove on unmap, and these options
can be tweaked from the storage class
Compression size details example:
On Map: (with debug-rbd=20)
---------
$ ls -lh
-rw-r--r-- 1 root root 526K Sep 1 18:15
rbd-nbd-0001-0024-fed5480a-f00f-417a-a51d-31d8a8144c03-0000000000000003-d2e89c87-0b4d-11ec-8ea6-160f128e682d.log
On unmap:
---------
$ ls -lh
-rw-r--r-- 1 root root 33K Sep 1 18:15
rbd-nbd-0001-0024-fed5480a-f00f-417a-a51d-31d8a8144c03-0000000000000003-d2e89c87-0b4d-11ec-8ea6-160f128e682d.gz
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
cephLogDir: is a storage class option that is passed to rbd-nbd daemon.
cephLogDirHostPath: is a nodeplugin daemonset level option that helps in
using the right host-path while bind-mounting
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Problem:
--------
1. rbd-nbd by default logs to /var/log/ceph/ceph-client.admin.log,
Unfortunately, container doesn't have /var/log/ceph directory hence
rbd-nbd is not logging now.
2. Rbd-nbd logs are not persistent across nodeplugin restarts.
Solution:
--------
Provide a host path so that log directory is made available, and the
logs persist on the hostnode across container restarts.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
- mount host's /etc/selinux in node plugins
- process mount options in all code paths for cephfs volume options
Signed-off-by: Alexandre Lossent <alexandre.lossent@cern.ch>
This change resolves a typo for installing the CSIDriver
resource in Kubernetes clusters before 1.18,
where the apiVersion is incorrect.
See also:
https://kubernetes-csi.github.io/docs/csi-driver-object.html
[ndevos: replace v1betav1 in examples with v1beta1]
Signed-off-by: Thomas Kooi <t.j.kooi@avisi.nl>
Problem:
-------
For rbd nbd userspace mounter backends, after a restart of the nodeplugin
all the mounts will start seeing IO errors. This is because, for rbd-nbd
backends there will be a userspace mount daemon running per volume, post
restart of the nodeplugin pod, there is no way to restore the daemons
back to life.
Solution:
--------
The volume healer is a one-time activity that is triggered at the startup
time of the rbd nodeplugin. It navigates through the list of volume
attachments on the node and acts accordingly.
For now, it is limited to nbd type storage only, but it is flexible and
can be extended in the future for other backend types as needed.
From a few feets above:
This solves a severe problem for nbd backed csi volumes. The healer while
going through the list of volume attachments on the node, if finds the
volume is in attached state and is of type nbd, then it will attempt to
fix the rbd-nbd volumes by sending a NodeStageVolume request with the
required volume attributes like secrets, device name, image attributes,
and etc.. which will finally help start the required rbd-nbd daemons in
the nodeplugin csi-rbdplugin container. This will allow reattaching the
backend images with the right nbd device, thus allowing the applications
to perform IO without any interruptions even after a nodeplugin restart.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Nodeplugin needs below cluster roles:
persistentvolumes: get
volumeattachments: list, get
These additional permissions are needed by the volume healer. Volume healer
aims at fixing the volume health issues at the very startup time of the
nodeplugin. As part of its operations, volume healer has to run through
the list of volume attachments and understand details about each
persistentvolume.
The later commits will use these additional cluster roles.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
The provisioner and node-plugin have the capability to connect to
Hashicorp Vault with a ServiceAccount from the Namespace where the PVC
is created. This requires permissions to read the contents of the
ServiceAccount from an other Namespace than where Ceph-CSI is deployed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit updates the helm chart documentations
with the configurations available while deploying
these helm charts.
Signed-off-by: Yati Padia <ypadia@redhat.com>
Current implementation of semvercompare fails against
pre-release versions. This commit fixes it by using
the entire version string at which csidriver api became GA.
s|">=1.18"|">=1.18.0-beta.1"
Fixes: #2039
Signed-off-by: Rakshith R <rar@redhat.com>
csidriver object can be created on the kubernetes
for below reason.
If a CSI driver creates a CSIDriver object,
Kubernetes users can easily discover the CSI
Drivers installed on their cluster
(simply by issuing kubectl get CSIDriver)
Ref: https://kubernetes-csi.github.io/docs/csi-driver-object.html#what-is-the-csidriver-object
attachRequired is always required to be set to
true to avoid issue on RWO PVC.
more details about it at https://github.com/rook/rook/pull/4332
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
set system-cluster-critical priorityclass on
provisioner pods. the system-cluster-critical is
having lowest priority compared to node-critical.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
set system-node-critical priority on the plugin
pods, as its the highest priority and this need to
be applied on plugin pods as its critical for
storage in cluster.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as provisioner need to get the configmap from
different namespace to check tenant configuration.
added the clusterrole get access for the same.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
PR #1736 made the kubelet path configurable. It also introduced a change in
the path to the CSI socket. By default the path is now
`/var/lib/kubelet/cephfs.csi.ceph.com/csi.sock` instead of
`/var/lib/kubelet/plugins/cephfs.csi.ceph.com/csi.sock`. This PR
restores the old default.
Signed-off-by: Matthias Neugebauer <matthias.neugebauer@uni-muenster.de>
Tenants can have their own ConfigMap that contains connection parameters
to the Vault Service where the PV encyption keys are located. It is
possible for a Tenant to use a different Vault Service than the one
configured by the Storage Admin who deployed Ceph-CSI.
For this, the node-plugin needs to be able to read the ConfigMap from
the Tenants namespace.
See-also: docs/design/proposals/encryption-with-vault-tokens.md
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In order to fetch the Kubernetes Secret with the Vault Token for a
Tenant, the ClusterRole needs to allow reading Secrets from all
Kubernetes Namespaces (each Tenant has their own Namespace).
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This argument in csi-provisioner sidecar allows us to receive pv/pvc
name/namespace metadata in the createVolume() request.
For ex:
csi.storage.k8s.io/pvc/name
csi.storage.k8s.io/pvc/namespace
csi.storage.k8s.io/pv/name
This is a useful information which can be used depend on the use case we
have at our driver. The features like vault token enablement for multi
tenancy, RBD mirroring ..etc can consume this based on the need.
Refer: #1305
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
external-provisioner is exposing a new argument
to set the default fstype while starting the provisioner
sidecar, if the fstype is not specified in the storageclass
the default fstype will be applied for the pvc created from
the storageclass.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This PR makes the changes in csi templates and
upgrade documentation required for updating
csi sidecar images.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
removed unwanted extra arguments from the helm templates
and added a single value kubeletDir to make the kubelet
root-dir configurable.
previously used variables like socketDir,registrationDir
and pluginDir is removed now because if we have the kubelet
path we can derive all other required path for cephcsi to
work properly.
fixes: #1475
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
updated deployment template for the new controller and
also added `update` configmap RBAC for the controller
as the controller uses the configmap for the leader
election.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
An rbd image can have a maximum number of
snapshots defined by maxsnapshotsonimage
On the limit is reached the cephcsi will
start flattening the older snapshots and
returns the ABORT error message, The Request
comes after this as to wait till all the
images are flattened (this will increase the
PVC creation time. Instead of waiting till
the maximum snapshots on an RBD image, we can
have a soft limit, once the limit reached
cephcsi will start flattening the task to
break the chain. With this PVC creation time
will only be affected when the hard limit
(minsnapshotsonimage) reached.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
MD014 - Dollar signs used before commands without showing output
The dollar signs are unnecessary, it is easier to copy and paste and
less noisy if the dollar signs are omitted. Especially when the
command doesn't list the output, but if the command follows output
we can use `$ ` (dollar+space) mainly to differentiate between
command and its ouput.
scenario 1: when command doesn't follow output
```console
cd ~/work
```
scenario 2: when command follow output (use dollar+space)
```console
$ ls ~/work
file1 file2 dir1 dir2 ...
```
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
instead of keeping the log level at 5, which
is required only for tracing the errors. this commit
adds an option for users to configure the log level
for all containers.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When replication count is >1 of the provisioner, the added anti-affinity rules
will prevent provisioner operators from scheduling on the same nodes. The
kubernetes scheduler will spread the pods across nodes to improve availability
during node failures.
Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
The lifecycle preStop hook fails on container stop / exit
because /bin/sh is not present in the driver registrar container
image.
the driver-registrar will remove the socket file
before stopping. we dont need to have any preStop hook
to remove the socket as it was not working as expected
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The aggregate clusterrole were designed for the scenario where
the rules are not completely owned by one component.
the aggregate rules can be removed and simplify
certain issues around upgrades.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as v1.0.0 is deprecated we need to remove the support
for it in the Next coming (v3.0.0) release. This PR
removes the support for the same.
closes#882
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Added maxsnapshotsonimage flag to flatten
the older rbd images on the chain to avoid
issue in krbd.The limit is in krbd since it
only allocate 1 4KiB page to handle all the
snapshot ids for an image.
The max limit is 510 as per
https://github.com/torvalds/linux/blob/
aaa2faab4ed8e5fe0111e04d6e168c028fe2987f/drivers/block/rbd.c#L98
in cephcsi we arekeeping the default to 450 to reserve 10%
to avoid issues.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added skipForceFlatten flag to skip
the image deptha and skip image flattening.
This will be very useful if the kernel is
not listed in cephcsi which supports deep
flatten fauture.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added Hardlimit and Softlimit flags for cephcsi
arguments. When the Softlimit is reached cephcsi
will start a background task to flatten the rbd
image and return success and if the hardlimit
is reached it will start a background task
to flatten the rbd image and return ready
to use as false to make sure that the image
will not be used until it is flatten.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
--retry-interval-start:
This is initial retry interval for failures. 1 second is used by default.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>