Thick-provisioning was introduced to make accounting of assigned space
for volumes easier. When thick-provisioned volumes are the only consumer
of the Ceph cluster, this works fine. However, it is unlikely that this
is the case. Instead, accounting of the requested (thin-provisioned)
size of volumes is much more practical as different types of volumes can
be tracked.
OpenShift already provides cluster-wide quotas, which can combine
accounting of requested volumes by grouping different StorageClasses.
In addition to the difficult practise of allowing only thick-provisioned
RBD backed volumes, the performance makes thick-provisioning
troublesome. As volumes need to be completely allocated, data needs to
be written to the volume. This can take a long time, depending on the
size of the volume. Provisioning, cloning and snapshotting becomes very
much noticeable, and because of the additional time consumption, more
prone to failures.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
After moving moving image to trash, if `trash remove` step fails,
then external-provisioner will issue subsequent requests, in which
image will be absent in pool( will be in trash) and omap cleanup will
be done with stale image left in trash with no `trash remove` step on it.
To avoid this scenario list trash images and find corresponding id for given
image name and add a task to flatten when we encounter a ErrImageNotFound.
Fixes: #1728
Signed-off-by: Rakshith R <rar@redhat.com>
During PVC snapshot/clone both kms config and passphrase needs to copied,
while for PVC restore only passphrase needs to be copied to dest rbdvol
since destination storageclass may have another kms config.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit adds the logic to detect a passed in volumeID
is a migrated volume ID and if yes, the driver connect to the
backend cluster and clean/delete the image. The logic
only applied if its a migration volume ID. The migration volume ID
carry the information like mons, pool and image name which is
good enough for the driver to identify and connect to the backend
cluster for its operations.
migration volID format:
<mig>_mons-<monsHash>_image-<imageUID>_<poolHash>
Details on the hash values:
* MonsHash: this carry a hash value (md5sum) which will be acted as the
`clusterID` for the operations in this context.
* ImageUID: this is the unique UUID generated by kubernetes for the created
volume.
* PoolHash: this is an encoded string of pool name.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commit adds capability to genVolFromVolumeOptions() to fetch
mapped clusted-id & mon ips for mirrored PVC on secondary cluster
which may have different cluster-id.
This is required for NodeStageVolume().
We also don't need to check for mapping during volume create requests,
so it can be disabled by passing a bool checkClusterIDMapping as false.
GetMonsAndClusterID() is modified to accept bool checkClusterIDMapping
based on which clustermapping is checked to fetch mapped cluster-id and
mon-ips.
Signed-off-by: Rakshith R <rar@redhat.com>
all the error check scenarios of genVolFromVolID() and unreserving
omap entries based on the error made deleteVolume method complex,
this patch create a new function which handle the error check and
unrerving omap entries accordingly and finally return the response
to deletevolume/caller.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Moving the log functions into its own internal/util/log package makes it
possible to split out the humongous internal/util packages in further
smaller pieces. This reduces the inter-dependencies between utility
functions and components, preventing circular dependencies which are not
allowed in Go.
Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
If the image is in a secondary state and its
up+replaying means its an healthy secondary
and the image is primary somewhere in the remote cluster
and the local image is getting replayed. Delete the
OMAP data generated as we cannot delete the
secondary image. When the image on the primary
cluster gets deleted/mirroring disabled, the image on
all the remote (secondary) clusters will get
auto-deleted. This helps in garbage collecting
the OMAP, PVC and PV objects after failback operation.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit fixes snapshot id idempotency issue by
always returning an error when flattening is in progress
and not using `readyToUse:false` response.
Signed-off-by: Rakshith R <rar@redhat.com>
Volume generated from snap using genrateVolFromSnap
already copies volume ID correctly, therefore removing
`vol.VolID = rbdVol.VolID` which wrongly copies parent
Volume ID instead leading to error from copyEncryption()
on parent and clone volume ID being equal.
Signed-off-by: Rakshith R <rar@redhat.com>
Previously in ControllerExpandVolume() we had a check for encrypted
volumes and we use to fail for all expand requests on an encrypted
volume. Also for Block VolumeMode PVCs NodeExpandVolume used to be
ignored/skipped.
With these changes, we add support for the expansion of encrypted volumes.
Also for raw Block VolumeMode PVCs with Encryption we call NodeExpandVolume.
That said,
With LUKS1, cryptsetup utility doesn't prompt for a passphrase on resizing
the crypto mapper device. This is because LUKS1 devices don't use kernel
keyring for volume keys.
Whereas, LUKS2 devices use kernel keyring for volume key by default, i.e.
cryptsetup utility asks for a passphrase if it detects volume key was
previously passed to dm-crypt via kernel keyring service, we are overriding
the default by --disable-keyring option during cryptsetup open command.
So that at the time of crypto mapper device resize we will not be
prompted for any passphrase.
Fixes: #1469
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
nlreturn linter requires a new line before return
and branch statements except when the return is alone
inside a statement group (such as an if statement) to
increase code clarity. This commit addresses such issues.
Updates: #1586
Signed-off-by: Rakshith R <rar@redhat.com>
We have many declarations and invocations..etc with long lines which are
very difficult to follow while doing code reading. This address the issues
in 'internal/rbd/*server.go' and 'internal/rbd/driver.go' files to restrict
the line length to 120 chars.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commit adds checks for missing `imageFeatures` parameter
in createvolumerequest and nodestagerequest(only for static PVs).
Missing `imageFeatures` parameter is ignored in case of non-static
PVs to ensure backwards compatibility with older versions which
did not have `imageFeatures` as required parameter.
Signed-off-by: Rakshith R <rar@redhat.com>
When cloning a volume from a (CSI) snapshot, we use DeepCopy() and do
not need an RBD snapshot as source.
Suggested-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case restoring a snapshot of a thick-PVC failed during DeepCopy(),
the image will exist, but have partial contents. Only when the image has
the thick-provisioned metadata set, it has completed DeepCopy().
When the metadata is missing, the image is deleted, and an error is
returned to the caller. Kubernetes will automatically retry provisioning
on the ABORTED error, and the restoring will get restarted from the
beginning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
the parent volume(CreateVolume) and the clone volume
(CreateSnapshot) are both indepedent and parent volume
can be deleted anytime. To check the thick provision
during Snapshot restore(CreateVolume from snapshot)
we need the thick provision metadata so for the same
reason setting the thick provision metadata on the
clone image we are creating at the CreateSnapshot time.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added validation to allow only Restore of Thick PVC
snapshot to a thick clone and creation of thick clone
from thick PVC.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In case cloning a thick-PVC failed during DeepCopy(), the image will
exist, but have partial contents. Only when the image has the
thick-provisioned metadata set, it has completed DeepCopy().
When the metadata is missing, the image is deleted, and an error is
returned to the caller. Kubernetes will automatically retry provisioning
on the ABORTED error, and the cloning will get restarted from the
beginning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Thick-provisioned images are independent, cloned images or snapshots are
deep-flattened during creation. There is no need to try and flatten them
again.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit addresses the following issue:
'nolint:gocyclo // complexity needs to be reduced.'
is unused for linter "gocyclo" (nolintlint)
Updates:#2025
Signed-off-by: Yati Padia <ypadia@redhat.com>
Validate Snapshot request to check if the
passed pool name is not empty.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
as we are supporting the creation of clone to a new
pool we need to pass the correct parent volume
to cleanup the snapshot on parent volume.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
added support to create image in different pool.
if the snapshot/rbd image exists in one pool we
can create a clone the clone of the rbd image to
a different pool.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
CreateVolume will fail in below cases
* If the snapshot is encrypted and requested volume
is not encrypted
* If the snapshot is not encrypted and requested
volume is encrypted
* If the parent volume is encrypted and requested volume
is not encrypted
* If the parent volume is not encrypted and requested
volume is encrypted
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Move the repairing of a volume/snapshot from CreateVolume to its own
function. This reduces the complexity of the code, and makes the
procedure easier to understand. Further enhancements to repairing an
exsiting volume can be done in the new function.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit removes calling of .String() when logging
since `%s`,`%v` or `%q` will call an existing .String() function
automatically.
Fixes: #2051
Signed-off-by: Rakshith R <rar@redhat.com>
At present we return the volume connect error if the clone
from snapshot fails when rbdvolume is encrypted, which is incorrect.
This patch correctly return the failed copy encryption error to the
caller
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
It is possible that when a provisioner restarts after a snapshot was
cloned, but before the newly restored image had its encryption metadata
set, the new image is not marked as encrypted. This will prevent
attaching/mounting the image, as the encryption key will not be fetched,
or is not available in the DEKStore.
By actively repairing the encryption configuration when needed, this
problem should be addressed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
buildCreateVolumeResponse() exists exactly for the need to create a
csi.CreateVolumeResponse based on an rbdVolume. Calling this helper
reduces the code duplication in CreateVolume().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The rbdVolume that needs its encryption configured is constructed in the
Exists() method. It is suitable to move the copyEncryptionConfig() call
there as well, so that the object is completely constructed in a single
place.
Golang-ci:gocyclo complained about the increased complexity of the
Exists() function. Moving the repairing of the ImageID into its own
helper function makes the code a little easier to understand.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Introduce helper function cloneFromSnapshot() that takes care of the
procedures that are needed when an existing snapshot has been found.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Without this, the rbdVolume can not connect to the Ceph cluster and
configure the (optional) encryption.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The ControllerServer should not need to care about support for
encryption, ideally it is transparantly handled by the rbdVolume type
and its internal API.
Deleting the DEK was one of the last remainders that was explicitly done
inside the ControllerServer.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
added ReplicationServer struct for the replication related
operation it also embed the ControllerServer which
already implements the helper functions like locking/unlocking etc.
removed getVolumeFromID and cleanup functions for better
code readability and easy maintaince.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The rbdSnapshot and rbdVolume structs have many common attributes. In
order to combine these into an rbdImage struct that implements shared
functionality, having the same attribute for the ID makes things much
easier.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Use DEKStore API for Fetching and Storing passphrases.
Drop the fallback for the old KMS interface that is now provided as
DEKStore. The original implementation has been re-used for the DEKStore
interface.
This also moves GetCryptoPassphrase/StoreNewCryptoPassphrase functions
to methods of VolumeEncryption.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Prepare for grouping encryption related functions together. The main
rbdVolume object should not be cluttered with KMS or DEK procedures.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add an option to the StorageClass to support creating fully allocated
(thick provisioned) RBD images
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This adds internal/rbd/encryption.go which will be used to include other
encryption functionality to support additional KMS related functions. It
will work together with the shared API from internal/util/kms.go.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Have the provisioner create the passphrase for the volume, instead of
doign it lazily at the time the volume is used for the 1st time. This
prevents potential races where pods on different nodes try to store
different passphrases at the (almost) same time.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
An rbd image can have a maximum number of
snapshots defined by maxsnapshotsonimage
On the limit is reached the cephcsi will
start flattening the older snapshots and
returns the ABORT error message, The Request
comes after this as to wait till all the
images are flattened (this will increase the
PVC creation time. Instead of waiting till
the maximum snapshots on an RBD image, we can
have a soft limit, once the limit reached
cephcsi will start flattening the task to
break the chain. With this PVC creation time
will only be affected when the hard limit
(minsnapshotsonimage) reached.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
`cap` builtin function returns the capacity of a type. Its not
good practice to use this builtin function for other variable
names, removing it here
Ref# https://golang.org/pkg/builtin/#cap
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
cephcsi uses cli for fetching snap list as well as to check the
snapshot namespace, replaced that with go-ceph calls.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
It doesnot make sense to allow the creation of empty
volumes which is going to be accessed with readonly mode,
this commit allows the creation of volume which is having
readonly capabilities only if the content source is set
for the volume.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
At CSI spec < 1.2.0, there was no volumecapability in the
expand request. However its available from v1.2+ which allows
us to declare the node operations based on the volume mode.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Make sure to operate within the namespace if any given
when dealing with rbd images and snapshots and their journals.
Signed-off-by: Mehdy Khoshnoody <mehdy.khoshnoody@gmail.com>
if we are not able to fetch the cluster-ID from
the createSnapshot request and also if we are
not able to get the monitor information from
the cluster-ID return error instead of using
the parent image information.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Replaced command execution with go-ceph Resize() function.
Volsize is being updated before waiting for resize() to return,
fixed it to get updated only after resize() is successful.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
While adding the context.Context to the resizeRBDimage() function, it
became a little ugly. So renaming the function to resize() and making it
a method of the rbdVolume type.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The new rbdVolume.isInUse() method will replace the rbdStatus()
function. This removes one more rbd command execution in the
DeleteVolume path.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This change replaces the sentinel errors in rbd module with
standard errors created with errors.New().
Related: #1203
Signed-off-by: Sven Anderson <sven@redhat.com>
The sentinel error code had additional fields in the errors, that are
used nowhere. This leads to unneccesarily complicated code. This
change replaces the sentinel errors in utils with standard errors
created with errors.New() and adds a simple JoinErrors() function to
be able to combine sentinel errors from different code tiers.
Related: #1203
Signed-off-by: Sven Anderson <sven@redhat.com>
Take operation locks on the resources before operating
on the resouces. This allows us to do parallel operations
for some RPC calls such as Clone and Restore of PVC.
This operations will only be blocked if the image is
expanding or Snapshot and RBD image is getting deleted.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add explanation to nolint directives.
Issue reported:
whyNoLint: include an explanation for nolint directive (gocritic)
Signed-off-by: Yug <yuggupta27@gmail.com>
If the requested volume size and the snapshot or the
parent volume from which the clone is to be created
is not equal cephcsi returns an error message.
updates #1188
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the snapshots on the parent image exceeds
maxSnapshotsOnImage count, we need to flatten
all the temporary cloned images to over come the
krbd issue of maximum number of snapshots on
an image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as v1.0.0 is deprecated we need to remove the support
for it in the Next coming (v3.0.0) release. This PR
removes the support for the same.
closes#882
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Added support for RBD PVC to PVC cloning, below
commands are executed to create a PVC-PVC clone from
RBD side.
* Check the depth(n) of the cloned image if n>=(hard limit -2)
or ((soft limit-2) Add a task to flatten the image and return
about (to avoid image leak) **Note** will try to flatten the
temp clone image in the chain if available
* Reserve the key and values in omap (this will help us to
avoid the leak as it's not reserved earlier as we have returned
ABORT (the request may not come back))
* Create a snapshot of rbd image
* Clone the snapshot (temp clone)
* Delete the snapshot
* Snapshot the temp clone
* Clone the snapshot (final clone)
* Delete the snapshot
```bash
1) check the image depth of the parent image if flatten required
add a task to flatten image and return ABORT to avoid leak
(hardlimit-2 and softlimit-2 check will be done)
2) Reserve omap keys
2) rbd snap create <RBD image for src k8s volume>@<random snap name>
3) rbd clone --rbd-default-clone-format 2 --image-feature
layering,deep-flatten <RBD image for src k8s volume>@<random snap>
<RBD image for temporary snap image>
4) rbd snap rm <RBD image for src k8s volume>@<random snap name>
5) rbd snap create <cloned RBD image created in snapshot process>@<random snap name>
6) rbd clone --rbd-default-clone-format 2 --image-feature <k8s dst vol config>
<RBD image for temporary snap image>@<random snap name> <RBD image for k8s dst vol>
7)rbd snap rm <RBD image for src k8s volume>@<random snap name>
```
* Delete temporary clone image created as part of clone(delete if present)
* Delete rbd image
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
we need to take lock on parent rbd image when
we are creating a snapshot from it, if the user
tries to delete/resize the rbd image when we are
taking snapshots,we may face issues. if the volume
lock is present on the rbd image, the user cannot
resize the rbd image nor delete the rbd image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
flatten cloned images to remove the link
with the snapshot as the parent snapshot can
be removed from the trash.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
By fixing the golangci-lint runs, this now gets reported as a problem.
Instead of addressing the compexity of the DeleteVolume() method here,
mark it as a TODO.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
currently, various calls to deleteImage does not
need the rbdStatus check. That is only required
when calling from DeleteVolume. This PR optimizes the
rbd image deletion by removing the status check.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This Adds a support for create,delete snapshot
and creating a new rbd image from the snapshot.
* Create a snapshot
* Create a temporary snapshot from the parent volume
* Clone a new image from a temporary snapshot with options
--rbd-default-clone-format 2 --image-feature layering,deep-flatten
* Delete temporary snapshot created
* Create a new snapshot from cloned image
* Check the image chain depth, if the Softlimit is reached Add a
task Flatten the cloned image and return success. if the depth
is reached hard limit Add a task Flatten the cloned image and
return snapshot status ready as false
```bash
1) rbd snap create <RBD image for src k8s volume>@<random snap name>
2) rbd clone --rbd-default-clone-format 2 --image-feature
layering,deep-flatten <RBD image for src k8s volume>@<random snap>
<RBD image for temporary snap image>
3) rbd snap rm <RBD image for src k8s volume>@<random snap name>
4) rbd snap rm <RBD image for temporary snap image>@<random snap name>
5) check the depth, if the depth is greater than configured hard
limit add a task to flatten the cloned image return snapshot status
ready as false if the depth is greater than soft limit add a task
to flatten the image and return success
```
* Create a clone from snapshot
* Clone a new image from the snapshot with user-provided options
* Check the depth(n) of the cloned image if n>=(hard limit)
Add task to flatten the image and return ABORT (to avoid image leak)
```bash
1) rbd clone --rbd-default-clone-format 2 --image-feature
<k8s dst vol config> <RBD image for temporary snap image>@<random snap name>
<RBD image for k8s dst vol>
2) check the depth, if the depth is greater than configured hard limit
add a task to flatten the cloned image return ABORT error if the depth is
greater than soft limit add a task to flatten the image and return success
```
* Delete snapshot or pvc
* Move the temporary cloned image to the trash
* Add task to remove the image from the trash
```bash
1) rbd trash mv <cloned image>
2) ceph rbd task trash remove <cloned image>
```
With earlier implementation to delete the image, we used to add
a task to remove the image with new changes this cannot be done
as the image may contain snapshots or linking.so we will be
doing below steps to delete an image(this will be
applicable for both normal image and cloned image)
* Move the rbd image to the trash
* Add task to remove the image from the trash
```bash
1) rbd trash mv <image>
2) ceph rbd task trash remove <image>
```
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
we dont need to call the snapshot CLI functions
to get snapshot details. as these details are not
requried with new snapshot design.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
As we are using v2 cloning we dont need to
do protect the snapshot before cloning and
unprotect it before deleting.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
By using switch/case it is easier to follow the error checking of the
genVolFromVolID() function. In case a new error is added as a return of
the function, it will be simpler to add checking for it.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The gocyclo linter complains about the high complexity of the
CreateVolume() function:
> pkg/rbd/controllerserver.go:133:1: cyclomatic complexity 21 of func `(*ControllerServer).CreateVolume` is high (> 20) (gocyclo)
By splitting it up and separeting the creation of an exisint CSI Volume
object in buildCreateVolumeResponse(), the gocyclic linter does not
complain any longer.
Signed-off-by: Niels de Vos <ndevos@redhat.com>