Problem:
-------
For rbd nbd userspace mounter backends, after a restart of the nodeplugin
all the mounts will start seeing IO errors. This is because, for rbd-nbd
backends there will be a userspace mount daemon running per volume, post
restart of the nodeplugin pod, there is no way to restore the daemons
back to life.
Solution:
--------
The volume healer is a one-time activity that is triggered at the startup
time of the rbd nodeplugin. It navigates through the list of volume
attachments on the node and acts accordingly.
For now, it is limited to nbd type storage only, but it is flexible and
can be extended in the future for other backend types as needed.
From a few feets above:
This solves a severe problem for nbd backed csi volumes. The healer while
going through the list of volume attachments on the node, if finds the
volume is in attached state and is of type nbd, then it will attempt to
fix the rbd-nbd volumes by sending a NodeStageVolume request with the
required volume attributes like secrets, device name, image attributes,
and etc.. which will finally help start the required rbd-nbd daemons in
the nodeplugin csi-rbdplugin container. This will allow reattaching the
backend images with the right nbd device, thus allowing the applications
to perform IO without any interruptions even after a nodeplugin restart.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Nodeplugin needs below cluster roles:
persistentvolumes: get
volumeattachments: list, get
These additional permissions are needed by the volume healer. Volume healer
aims at fixing the volume health issues at the very startup time of the
nodeplugin. As part of its operations, volume healer has to run through
the list of volume attachments and understand details about each
persistentvolume.
The later commits will use these additional cluster roles.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
The provisioner and node-plugin have the capability to connect to
Hashicorp Vault with a ServiceAccount from the Namespace where the PVC
is created. This requires permissions to read the contents of the
ServiceAccount from an other Namespace than where Ceph-CSI is deployed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
csidriver object can be created on the kubernetes
for below reason.
If a CSI driver creates a CSIDriver object,
Kubernetes users can easily discover the CSI
Drivers installed on their cluster
(simply by issuing kubectl get CSIDriver)
Ref: https://kubernetes-csi.github.io/docs/csi-driver-object.html#what-is-the-csidriver-object
attachRequired is always required to be set to
true to avoid issue on RWO PVC.
more details about it at https://github.com/rook/rook/pull/4332
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
use the latest version of csi-snapshotter sidecar image at the
provisioner templates
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
set system-cluster-critical priorityclass on
provisioner pods. the system-cluster-critical is
having lowest priority compared to node-critical.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
set system-node-critical priority on the plugin
pods, as its the highest priority and this need to
be applied on plugin pods as its critical for
storage in cluster.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as provisioner need to get the configmap from
different namespace to check tenant configuration.
added the clusterrole get access for the same.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Tenants can have their own ConfigMap that contains connection parameters
to the Vault Service where the PV encyption keys are located. It is
possible for a Tenant to use a different Vault Service than the one
configured by the Storage Admin who deployed Ceph-CSI.
For this, the node-plugin needs to be able to read the ConfigMap from
the Tenants namespace.
See-also: docs/design/proposals/encryption-with-vault-tokens.md
Signed-off-by: Niels de Vos <ndevos@redhat.com>
if the kms encryption configmap is not mounted
as a volume to the CSI pods, add the code to
read the configuration from the kubernetes. Later
the code to fetch the configmap will be moved to
the new sidecar which is will talk to respective
CO to fetch the encryption configurations.
The k8s configmap uses the standard vault spefic
names to add the configurations. this will be converted
back to the CSI configurations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In order to fetch the Kubernetes Secret with the Vault Token for a
Tenant, the ClusterRole needs to allow reading Secrets from all
Kubernetes Namespaces (each Tenant has their own Namespace).
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This argument in csi-provisioner sidecar allows us to receive pv/pvc
name/namespace metadata in the createVolume() request.
For ex:
csi.storage.k8s.io/pvc/name
csi.storage.k8s.io/pvc/namespace
csi.storage.k8s.io/pv/name
This is a useful information which can be used depend on the use case we
have at our driver. The features like vault token enablement for multi
tenancy, RBD mirroring ..etc can consume this based on the need.
Refer: #1305
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
external-provisioner is exposing a new argument
to set the default fstype while starting the provisioner
sidecar, if the fstype is not specified in the storageclass
the default fstype will be applied for the pvc created from
the storageclass.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
with csi-provisioner v2.x the topology based
provisioning will not have any backward compatibility
with older version of kubernetes, if the nodes are
not labeled with topology keys, the pvc creation
is going to get fail with error `accessibility
requirements: no available topology found`, disabling
the topology based provisioning by default, if user want
to use it he can always enable it.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This PR makes the changes in csi templates and
upgrade documentation required for updating
csi sidecar images.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
updated deployment template for the new controller and
also added `update` configmap RBAC for the controller
as the controller uses the configmap for the leader
election.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When running a simple build with only the required arguments, the
following warning are reported:
$ buildah bud --build-arg=BASE_IMAGE=ceph/ceph:v15 --build-arg=GO_ARCH=amd64 -f ./deploy/cephcsi/image/Dockerfile .
...
STEP 15: COPY . ${SRC_DIR}
STEP 16: RUN make cephcsi
cephcsi image settings: quay.io/cephcsi/cephcsi version canary
make: git: Command not found
make: git: Command not found
if [ ! -d ./vendor ]; then (go mod tidy && go mod vendor); fi
make: git: Command not found
...
STEP 23: COMMIT
Getting image source signatures
...
Writing manifest to image destination
Storing signatures
--> 239b19c4049
git is used to detect the current commit, and store it in the binary
that is built. Without the commit, the "Git Commit:" in the output is
empty, making it impossible to get the exact version:
$ podman run --rm 239b19c4049 --version
Cephcsi Version: canary
Git Commit:
Go Version: go1.15
Compiler: gc
Platform: linux/amd64
Kernel: 5.8.4-200.fc32.x86_64
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The added anti-affinity rules prevent provisioner operators from scheduling on
the same nodes. The kubernetes scheduler will spread the pods across nodes to
improve availability during node failures.
Signed-off-by: Nico Berlee <nico.berlee@on2it.net>
The lifecycle preStop hook fails on container stop / exit
because /bin/sh is not present in the driver registrar container
image.
the driver-registrar will remove the socket file
before stopping. we dont need to have any preStop hook
to remove the socket as it was not working as expected
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The aggregate clusterrole were designed for the scenario where
the rules are not completely owned by one component.
the aggregate rules can be removed and simplify
certain issues around upgrades.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as v1.0.0 is deprecated we need to remove the support
for it in the Next coming (v3.0.0) release. This PR
removes the support for the same.
closes#882
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
`make image-cephcsi` will fail when Golang is not installed. There is no
strict requirement for Golang to be available, it is only used to gather
the architecture of the OS where the image is built. It is possible to
build the image successfully with `make image-cephcsi GOARCH=amd64`.
In case Golang is not installed, GOARCH can not be detected
automatically. This will cause a failure while installing Golang in the
container image. Because the failure is not very clear, display a
warning in the case the GO_ARCH (from ${GOARCH} in the Makefile) is not
set.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
added Hardlimit and Softlimit flags for cephcsi
arguments. When the Softlimit is reached cephcsi
will start a background task to flatten the rbd
image and return success and if the hardlimit
is reached it will start a background task
to flatten the rbd image and return ready
to use as false to make sure that the image
will not be used until it is flatten.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
For building a arm64 container image on amd64, it is needed to configure
the system specifically for that. In order to prevent including a amd64
executable in a arm64 image, a check has been added.
When running an arm64 executable on a amd64 system, an error should
occur when cross architecture containers are not supported. This can be
triggered when running `make image-cephcsi GO_ARCH=arm64` on a amd64
system.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Considering this parameter is available for other sidecars we should
have a parity between the sidecars. Adding it for the same reason
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
When building with go-ceph, there are several dynamically linked
libraries used directly (libcephfs, librados and librbd), and many more
indirectly.
By adding an additional RUN statement to check if all libraries are
available in the final image, problems related to missing libraries
should be caught before publishing/consuming the image.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
unlike other containers, image pull policy of
csi-snapshotter was set to "Always", which can
be changed to pull only if not present.
Signed-off-by: Yug Gupta <ygupta@redhat.com>
with the help of qemu-user-static we can run
different architecute contains.
more info at https://github.com/multiarch/qemu-user-static
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add support for multi architecture build
for cephcsi. with multistage docker build
we we build cephcsi binary for both arm64
and amd64 architecture.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>