This change allows the user to choose not to fallback to NBD mounter
when some ImageFeatures are absent with krbd driver, rather just fail
the NodeStage call.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Thick-provisioning was introduced to make accounting of assigned space
for volumes easier. When thick-provisioned volumes are the only consumer
of the Ceph cluster, this works fine. However, it is unlikely that this
is the case. Instead, accounting of the requested (thin-provisioned)
size of volumes is much more practical as different types of volumes can
be tracked.
OpenShift already provides cluster-wide quotas, which can combine
accounting of requested volumes by grouping different StorageClasses.
In addition to the difficult practise of allowing only thick-provisioned
RBD backed volumes, the performance makes thick-provisioning
troublesome. As volumes need to be completely allocated, data needs to
be written to the volume. This can take a long time, depending on the
size of the volume. Provisioning, cloning and snapshotting becomes very
much noticeable, and because of the additional time consumption, more
prone to failures.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
For static volume, the user will manually mounts
already existing image as a volume to the application
pods. As its a rbd Image, if the PVC is of type
fileSystem the image will be mapped, formatted
and mounted on the node,
If the user resizes the image on the ceph cluster.
User cannot not automatically resize the filesystem
created on the rbd image. Even if deletes and
recreates the kubernetes objects, the new size
will not be visible on the node.
With this changes During the NodeStageVolumeRequest
the nodeplugin will check the size of the mapped rbd
image on the node using the devicePath. and also
the rbd image size on the ceph cluster.
If the size is not matching it will do the file
system resize on the node as part of the
NodeStageVolumeRequest RPC call.
The user need to do below operation to see new size
* Resize the rbd image in ceph cluster
* Scale down all the application pods using the static
PVC.
* Make sure no application pods which are using the
static PVC is running on a node.
* Scale up all the application pods.
Validate the new size in application pod mounted
volume.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
cephLogDir: is a storage class option that is passed to rbd-nbd daemon.
cephLogDirHostPath: is a nodeplugin daemonset level option that helps in
using the right host-path while bind-mounting
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
As we have deprecated earlier versions than v3.3.0, it is not required
to keep the upgrade docs for the same. The upgrade doc for v3.2.0 to
v3.3.0 has been kept intact.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
added design doc to handle volumeID mapping in case
of the failover in the Disaster Recovery.
update #2118
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Mainly removed rbd-nbd mounter specified at the pre-upgrade
considerations affecting the restarts.
Also updated the 3.3 tags to 3.4
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Add documenation for Disaster Recovery
which steps to Failover and Failback in case
of a planned migration or a Disaster.
Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
In addition to the single ServiceAccount KMS support for Hashicorp
Vault, Ceph-CSI can now use a ServiceAccount per Tenant as well. This
adds the user-documentation with references to the example deployment
files.
Closes: #2222
Signed-off-by: Niels de Vos <ndevos@redhat.com>
A new KMS that supports Hashicorp Vault with the Kubernetes Auth backend
and ServiceAccounts per Tenant (Kubernetes Namespace).
Updates: #2222
Signed-off-by: Niels de Vos <ndevos@redhat.com>
As Travis CI `https://travis-ci.org/` is getting
shutdown date on June 15th. Either we need to move
to new place https://www.travis-ci.com/ or we can
switch to github action to push image and the helm
charts when a PR is merged.
fixes: #1781
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
At present the cert keys are not unique which is not correct.
The keys in the secret should be unique and this patch address
the same.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
As we have v3.3 as the latest release
updating the upgrade doc in the devel
branch to point to the same.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In the case of the Async DR, the volumeID will
not be the same if the clusterID or the PoolID
is different, With Earlier implementation, it
is expected that the new volumeID mapping is
stored in the rados omap pool. In the case of the
ControllerExpand or the DeleteVolume Request,
the only volumeID will be sent it's not possible
to find the corresponding poolID in the new cluster.
With This Change, it works as below
The csi-rbdplugin-controller will watch for the PV
objects, when there are any PV objects created it
will check the omap already exists, If the omap doesn't
exist it will generate the new volumeID and it checks for
the volumeID mapping entry in the PV annotation, if the
mapping does not exist, it will add the new entry
to the PV annotation.
The cephcsi will check for the PV annotations if the
omap does not exist if the mapping exists in the PV
annotation, it will use the new volumeID for further
operations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Current rbd plugin only supports the layering feature
for rbd image. Add exclusive-lock and journaling image
features for the rbd.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: woohhan <woohyung_han@tmax.co.kr>
Update the emcrypted PVC implementation doc with references to the new
EncryptedKMS, DEKStore and VolumeEncryption types.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add an option to the StorageClass to support creating fully allocated
(thick provisioned) RBD images
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When a volume was provisioned by an old Ceph-CSI provisioner, the
metadata of the RBD image will contain `requiresEncryption` to indicate
a passphrase needs to be created. New Ceph-CSI provisioners create the
passphrase in the CreateVolume request, and set `encryptionPrepared`
instead.
When a new node-plugin detects that `requiresEncryption` is set in the
RBD image metadata, it will fallback to the old behaviour.
In case `encryptionPrepared` is read from the RBD image metadata, the
passphrase is used to cryptsetup/format the image.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
currently, the keys for kms certificates/keys in a
secret is ca.cert, tls.cert and
tls.key, this commit changes the key from ca.cert
and tls.cert to cert and tls.key to key.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Added a option to pass the client certificate
and the client certificate key for the vault token
based encryption.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The yaml files for RBD encryption are located in examples/kms/vault, and
not in the examples/rbd directory.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In addition to the Vault KMS support (uses Kubernetes ServiceAccount),
there is the new Vault Tokens KMS feature.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Design for adding a new KMS type "VaultTokens" that can be used to
configure a Hashicorp Vault service where each tenant has their own
personal token to manage encryptions keys for PVCs.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
RBD Snapshot doc was the part of the README.md file. Hence,
renamed the cephfs-snap-clone.md file to snap-clone.md file
and moved the rbd snapshot document there.
Signed-off-by: yati1998 <ypadia@redhat.com>
We do not have `text` in the new section of the MarkDown Rules. Hence
dropping them.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Update the coding guide about MD014, i.e.
Dollar signs used before commands without showing output
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
MD014 - Dollar signs used before commands without showing output
The dollar signs are unnecessary, it is easier to copy and paste and
less noisy if the dollar signs are omitted. Especially when the
command doesn't list the output, but if the command follows output
we can use `$ ` (dollar+space) mainly to differentiate between
command and its ouput.
scenario 1: when command doesn't follow output
```console
cd ~/work
```
scenario 2: when command follow output (use dollar+space)
```console
$ ls ~/work
file1 file2 dir1 dir2 ...
```
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Added a document which contains the steps
and RBD CLI commands we execute when we create
a kubernetes snapshot, delete kubernetes snapshot,
Restore a snapshot to a new PVC,Kubernetes volume
cloning and kubernetes PVC deletion.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as v1.0.0 is deprecated we need to remove the support
for it in the Next coming (v3.0.0) release. This PR
removes the support for the same.
closes#882
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Added maxsnapshotsonimage flag to flatten
the older rbd images on the chain to avoid
issue in krbd.The limit is in krbd since it
only allocate 1 4KiB page to handle all the
snapshot ids for an image.
The max limit is 510 as per
https://github.com/torvalds/linux/blob/
aaa2faab4ed8e5fe0111e04d6e168c028fe2987f/drivers/block/rbd.c#L98
in cephcsi we arekeeping the default to 450 to reserve 10%
to avoid issues.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
updated upgrade documentation to remove
the snapshot created by alpha driver before
upgrade of CSI driver as beta snapshot is not
backward compatible with the alpha snapshot.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added skipForceFlatten flag to skip
the image deptha and skip image flattening.
This will be very useful if the kernel is
not listed in cephcsi which supports deep
flatten fauture.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added Hardlimit and Softlimit flags for cephcsi
arguments. When the Softlimit is reached cephcsi
will start a background task to flatten the rbd
image and return success and if the hardlimit
is reached it will start a background task
to flatten the rbd image and return ready
to use as false to make sure that the image
will not be used until it is flatten.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Updated storageclass and snapshotclass
to include the name prefix for naming
subvolumes and snapshots.
Fixes: #1087
Signed-off-by: Yug Gupta <ygupta@redhat.com>
If the PR is having trivial changes or the reviewer is
confident enough that PR doesn't need a second review,
the reviewer can set `ready-to-merge` label on the PR.
The bot will merge the PR if it's having one approval and the
label `ready-to-merge`
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
With the change in #382, support for static PV for CephFS was added.
This change is to update the already existing doc for the same.
Issue: #669
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
Added step to identify alpha snapshot CRD.
Added step to delete alpha CRD and link
for installing beta CRD.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The commitlint CI job uses the configuration from .commitlintrc.yaml
which contains the different components that Ceph-CSI uses. A short
description of each component has been added, so that contributors
understand what component to mention in the prefix of the subject in
commit messages.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Added document on the standard user need to follow
when writting the commit message and to include
sign-off in commit message.
source: https://probot.github.io/apps/dco/
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Updated golang version to 1.13.x and
also updated user to set GO111MODULE=on
and CGO_ENABLED=1 when doing development
in cephcsi
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The internal/ directory in Go has a special meaning, and indicates that
those packages are not meant for external consumption. Ceph-CSI does
provide public APIs for other projects to consume. There is no plan to
keep the API of the internally used packages stable.
Closes: #903
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In (standard, non-privileged) container environments the /sys/fs/cgroup
mountpoint is not available. This would cause the tests to fail, as
TestGetPIDLimit() tries to write to the cgroup configuration.
The test will work when run as root on a privileged container or
directly on a host (as Travis CI does).
Setting the CEPH_CSI_RUN_ALL_TESTS environment variable to a non-empty
value will cause the test to be executed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
As kubernetes CSI sidecar is exposing the
GRPC mertics we can make use of the same in
ceph-csi we dont need to expose our own.
update: #881
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This makes it possible to build on any platform that supports Linux
containers. The container image used for building is created once, or on
updating the `scripts/Dockerfile.build` and is cached afterwards.
To build the executable in a container, use `make containerized-build`
and everything will be done automatically. The executable will also be
available on the usual location.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This PR updates the upgrade doc to handle the
node drain issue what we have seen in
https://github.com/ceph/ceph-csi/issues/756
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
this allows administrators to override the naming prefix for both volumes and snapshots
created by the rbd plugin.
Signed-off-by: Reinier Schoof <reinier@skoef.nl>
PR #282 introduces the mount cache to
solve cephfs fuse mount issue when cephfs plugin pod
restarts .This is not working as intended. This PR removes
the code for maintainability.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
* moves KMS type from StorageClass into KMS configuration itself
* updates omapval used to identify KMS to only it's ID without the type
why?
1. when using multiple KMS configurations (not currently supported)
automated parsing of kms configuration will be failing because some
entries in configs won't comply with the requested type
2. less options are needed in the StorageClass and less data used to
identify the KMS
Signed-off-by: Vasyl Purchel vasyl.purchel@workday.com
Signed-off-by: Andrea Baglioni andrea.baglioni@workday.com
- adds proposal document for PVC encryption from PR448
- adds per-volume encription by generating encryption passphrase
for each volume and storing it in a KMS
- adds HashiCorp Vault integration as a KMS for encryption passphrases
- avoids encrypting volume second time if it was already encrypted but
no file system created
- avoids unnecessary checks if volume is a mapped device when encryption
was not requested
- prevents resizing encrypted volumes (it is not currently supported)
- prevents creating snapshots from encrypted volumes to prevent attack
on encryption key (security guard until re-encryption of volumes
implemented)
Signed-off-by: Vasyl Purchel vasyl.purchel@workday.com
Signed-off-by: Andrea Baglioni andrea.baglioni@workday.comFixes#420Fixes#744
Adds encryption in StorageClass as a parameter. Encryption passphrase is
stored in kubernetes secrets per StorageClass. Implements rbd volume
encryption relying on dm-crypt and cryptsetup using LUKS extension
The change is related to proposal made earlier. This is a first part of
the full feature that adds encryption with passphrase stored in secrets.
Signed-off-by: Vasyl Purchel vasyl.purchel@workday.com
Signed-off-by: Andrea Baglioni andrea.baglioni@workday.com
Signed-off-by: Ioannis Papaioannou ioannis.papaioannou@workday.com
Signed-off-by: Paul Mc Auley paul.mcauley@workday.com
Signed-off-by: Sergio de Carvalho sergio.carvalho@workday.com
The storage class already takes MountOptions(MountFlags), these are the
bind mount options. Some of these options may not be recognised by the
cephfs mount. Hence added a new parameterin Storage Class for
- cephfs kernel mount options,
- ceph-fuse mount options
Ceph kernel mount options are different from ceph-fuse options, hence
added two different parameters.
Signed-off-by: Poornima G <pgurusid@redhat.com>
The container runtime CRI-O limits the number of PIDs to 1024 by
default. When many PVCs are requested at the same time, it is possible
for the provisioner to start too many threads (or go routines) and
executing 'rbd' commands can start to fail. In case a go routine can not
get started, the process panics.
The PID limit can be changed by passing an argument to kubelet, but this
will affect all pids running on a host. Changing the parameters to
kubelet is also not a very elegant solution.
Instead, the provisioner pod can change the configuration itself. The
pod is running in privileged mode and can write to /sys/fs/cgroup where
the limit is configured.
With this change, the limit is configured to 'max', just as if there is
no limit at all. The logs of the csi-rbdplugin in the provisioner pod
will reflect the change it makes when starting the service:
$ oc -n rook-ceph logs -c csi-rbdplugin csi-rbdplugin-provisioner-0
..
I0726 13:59:19.737678 1 cephcsi.go:127] Initial PID limit is set to 1024
I0726 13:59:19.737746 1 cephcsi.go:136] Reconfigured PID limit to -1 (max)
..
It is possible to pass a different limit on the commandline of the
cephcsi executable. The following flag has been added:
--pidlimit=<int> the PID limit to configure through cgroups
This accepts special values -1 (max) and 0 (default, do not
reconfigure). Other integers will be the limit that gets configured in
cgroups.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This change also starts mapping nbd based access using ther rbd CLI
as, it is a prerequisite to get device listing for nbd as well.
Signed-off-by: ShyamsundarR <srangana@redhat.com>
Use Deployment with leader election instead of StatefulSet
Deployment behaves better when a node gets disconnected
from the rest of the cluster - new provisioner leader
is elected in ~15 seconds, while it may take up to
5 minutes for StatefulSet to start a new replica.
Refer: kubernetes-csi/external-provisioner@52d1fbc
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit adds support to mount and delete volumes provisioned by older
plugin versions (1.0.0) in order to support backward compatibility to 1.0.0
created volumes.
It adds back the ability to specify where older meta data was specified, using
the metadatastorage option to the plugin. Further, using the provided meta data
to mount and delete the older volumes.
It also supports a variety of ways in which monitor information may have been
specified (in the storage class, or in the secret), to keep the monitor
information current.
Testing done:
- Mount/Delete 1.0.0 plugin created volume with monitors in the StorageClass
- Mount/Delete 1.0.0 plugin created volume with monitors in the secret with
a key "monitors"
- Mount/Delete 1.0.0 plugin created volume with monitors in the secret with
a user specified key
- PVC creation and deletion with the current version (to ensure at the minimum
no broken functionality)
- Tested some negative cases, where monitor information is missing in secrets
or present with a different key name, to understand if failure scenarios work
as expected
Updates #378
Follow-up work:
- Documentation on how to upgrade to 1.1 plugin and retain above functionality
for older volumes
Signed-off-by: ShyamsundarR <srangana@redhat.com>