This helps Monitoring solutions without access to Kubernetes clusters to
display the details of the PV/PVC/NameSpace in their dashboard.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Below are the 3 different cases where we need
the PVC namespace for encryption
* CreateVolume:- Read the namespace from the
createVolume parameters and store it in the omap
* NodeStage:- Read the namespace from the omap
not from the volumeContext
* Regenerate:- Read the pvc namespace from the claimRef
not from the volumeAttributes.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
remove kubernetes csi prefixed parameters
from the volumeContext as we dont want
to store it in the PV VolumeAttributes.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit ensures that parent image is flattened before
creating volume.
- If the data source is a PVC, the underlying image's parent
is flattened(which would be a temp clone or snapshot).
hard & soft limit is reduced by 2 to account for depth that
will be added by temp & final clone.
- If the data source is a Snapshot, the underlying image is
itself flattened.
hard & soft limit is reduced by 1 to account for depth that
will be added by the clone which will be restored from the
snapshot.
Flattening step for resulting PVC image restored from snapshot is removed.
Flattening step for temp clone & final image is removed when pvc clone is
being created.
Fixes: #2190
Signed-off-by: Rakshith R <rar@redhat.com>
Makes the rbd images features in the storageclass
as optional so that default image features of librbd
can be used. and also kept the option to user
to specify the image features in the storageclass.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Return the dataPool used to create the image instead of the default one
provided by the createVolumeRequest.
In case of topologyConstrainedDataPools, they may differ.
Don't add datapool if it's not present
Signed-off-by: Sébastien Bernard <sebastien.bernard@sfr.com>
This commit removes the thick provisioning
code as thick provisioning is deprecated in
cephcsi 3.5.0.
fixes: #2795
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Currently most of the internal methods have the
rbdVolume as the received. As these methods
are completely internal and requires only
the fields of the rbdImage use rbdImage
as the receiver instead of rbdVolume.
updates #2742
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as per the CSI standard the size is optional parameter,
as we are allowing the clone to a bigger size
today we need to block the clone to a smaller size
as its a have side effects like data corruption etc.
Note:- Even though this check is present in kubernetes
sidecar as CSI is CO independent adding the check
here.
updates: #2718
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as per the CSI standard the size is optional parameter,
as we are allowing the restore to a bigger size
today we need to block the restore to a smaller size
as its a have side effects like data corruption.
Note:- Even though this check is present in kubernetes
sidecar as CSI is CO independent adding the check
here.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
SINGLE_NODE_WRITER capability ambiguity has been fixed in csi spec v1.5
which allows the SP drivers to declare more granular WRITE capability in form
of SINGLE_NODE_SINGLE_WRITER or SINGLE_NODE_MULTI_WRITER.
These are not really new capabilities rather capabilities introduced to
get the desired functionality from CO side based on the capabilities SP
driver support for various CSI operations, this new capabilities also help
to address new access mode RWOP (readwriteoncepod).
This commit adds a helper function which identity the request is of
multiwriter mode and also validates whether it is filesystem mode or
block mode. Based on the inspection it fails to allow multi write
requests for filesystem mode and only allow multi write request against
block mode.
This commit also adds unit tests for isMultiWriterBlock function which
validates various accesstypes and accessmodes.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
With introduction of go-ceph rbd admin task api, credentials are
no longer required to be passed as cli cmd is not invoked.
Signed-off-by: Rakshith R <rar@redhat.com>
after creating the rbd image log the image
details corresponding for the request along
with the request name.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
moved ParentName, ParentPool and ImageFeatureSet
fields to the rbdImage struct as these are the
first citizens on the rbdImage.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the requested volume size is greater than
the snapshot size, resize the cloned volume
after creating a clone from a snapshot.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the requested volume size is greater than
the parent volume size, resize the cloned volume
after creating a final clone from a parent volume.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added a check to consider ErrImageNotFound error
during DeleteSnapshot operation, if the error
is ErrImageNotFound we need to ensure that image
is removed from the trash and also the rados
OMAP data is removed.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as we are moving the VolSize to rbdImage struct
we should reuse the same instead of maintaining
one more field in rbdSnapshot struct.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
when doing the internal operation to get the
latest details the rbd image size is also getting
updated and this will update the volume size also
without actual requested size we cannot do the
resize operation for bigger clones. This commit
adds a new field called RequestedVolSize to rbdVolume
struct to hold the user requested size.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added a new helper function called cleanupThickClone
to cleanup the snapshot and clone if the thick
provisioning is not fully completed.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
remove the bigger size validation when
creating a volume from a snapshot or when
creation a clone from a volume as we resized
the volume after cloning.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
IsBlockMultiNode() is a new helper that takes a slice of
VolumeCapability objects and checks if it includes multi-node access
and/or block-mode support.
This can then easily be used in other services that need checking for
these particular capabilities, and preventing multi-node block-mode
access.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
genVolFromVolID() is used by the CSI Controller service to create an
rbdVolume object from a CSI volume_id. This function is useful for
CSI-Addons Services as well, so rename it to GenVolFromVolID().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
parseAndDeleteMigratedVolume() prviously clubbed the logic of
parsing of migration volume handle and then continued with the
deletion of the volume. however this commit split this
logic into two, ie parsing has been done in parseMigrationVolID()
and DeleteMigratedVolume() deletes the backend volume.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commit adds a couple of helper functions to parse the migration
request secret and set it for further csi driver operations.
More details:
The intree secret has a data field called "key" which is the base64
admin secret key. The ceph CSI driver currently expect the secret to
contain data field "UserKey" for the equivalant. The CSI driver also
expect the "UserID" field which is not available in the in-tree secret
by deafult. This missing userID will be filled (if the username differ
than 'admin') in the migration secret as 'adminId' field in the
migration request, this commit adds the logic to parse this migration
secret as below:
"key" field value will be picked up from the migraion secret to "UserKey"
field.
"adminId" field value will be picked up from the migration secret to "UserID"
field
if `adminId` field is nil or not set, `UserID` field will be filled with
default value ie `admin`.The above logic get activated only when the secret
is a migration secret, otherwise skipped to the normal workflow as we have
today.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Thick-provisioning was introduced to make accounting of assigned space
for volumes easier. When thick-provisioned volumes are the only consumer
of the Ceph cluster, this works fine. However, it is unlikely that this
is the case. Instead, accounting of the requested (thin-provisioned)
size of volumes is much more practical as different types of volumes can
be tracked.
OpenShift already provides cluster-wide quotas, which can combine
accounting of requested volumes by grouping different StorageClasses.
In addition to the difficult practise of allowing only thick-provisioned
RBD backed volumes, the performance makes thick-provisioning
troublesome. As volumes need to be completely allocated, data needs to
be written to the volume. This can take a long time, depending on the
size of the volume. Provisioning, cloning and snapshotting becomes very
much noticeable, and because of the additional time consumption, more
prone to failures.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
After moving moving image to trash, if `trash remove` step fails,
then external-provisioner will issue subsequent requests, in which
image will be absent in pool( will be in trash) and omap cleanup will
be done with stale image left in trash with no `trash remove` step on it.
To avoid this scenario list trash images and find corresponding id for given
image name and add a task to flatten when we encounter a ErrImageNotFound.
Fixes: #1728
Signed-off-by: Rakshith R <rar@redhat.com>
During PVC snapshot/clone both kms config and passphrase needs to copied,
while for PVC restore only passphrase needs to be copied to dest rbdvol
since destination storageclass may have another kms config.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit adds the logic to detect a passed in volumeID
is a migrated volume ID and if yes, the driver connect to the
backend cluster and clean/delete the image. The logic
only applied if its a migration volume ID. The migration volume ID
carry the information like mons, pool and image name which is
good enough for the driver to identify and connect to the backend
cluster for its operations.
migration volID format:
<mig>_mons-<monsHash>_image-<imageUID>_<poolHash>
Details on the hash values:
* MonsHash: this carry a hash value (md5sum) which will be acted as the
`clusterID` for the operations in this context.
* ImageUID: this is the unique UUID generated by kubernetes for the created
volume.
* PoolHash: this is an encoded string of pool name.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commit adds capability to genVolFromVolumeOptions() to fetch
mapped clusted-id & mon ips for mirrored PVC on secondary cluster
which may have different cluster-id.
This is required for NodeStageVolume().
We also don't need to check for mapping during volume create requests,
so it can be disabled by passing a bool checkClusterIDMapping as false.
GetMonsAndClusterID() is modified to accept bool checkClusterIDMapping
based on which clustermapping is checked to fetch mapped cluster-id and
mon-ips.
Signed-off-by: Rakshith R <rar@redhat.com>
all the error check scenarios of genVolFromVolID() and unreserving
omap entries based on the error made deleteVolume method complex,
this patch create a new function which handle the error check and
unrerving omap entries accordingly and finally return the response
to deletevolume/caller.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Moving the log functions into its own internal/util/log package makes it
possible to split out the humongous internal/util packages in further
smaller pieces. This reduces the inter-dependencies between utility
functions and components, preventing circular dependencies which are not
allowed in Go.
Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
If the image is in a secondary state and its
up+replaying means its an healthy secondary
and the image is primary somewhere in the remote cluster
and the local image is getting replayed. Delete the
OMAP data generated as we cannot delete the
secondary image. When the image on the primary
cluster gets deleted/mirroring disabled, the image on
all the remote (secondary) clusters will get
auto-deleted. This helps in garbage collecting
the OMAP, PVC and PV objects after failback operation.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit fixes snapshot id idempotency issue by
always returning an error when flattening is in progress
and not using `readyToUse:false` response.
Signed-off-by: Rakshith R <rar@redhat.com>
Volume generated from snap using genrateVolFromSnap
already copies volume ID correctly, therefore removing
`vol.VolID = rbdVol.VolID` which wrongly copies parent
Volume ID instead leading to error from copyEncryption()
on parent and clone volume ID being equal.
Signed-off-by: Rakshith R <rar@redhat.com>
Previously in ControllerExpandVolume() we had a check for encrypted
volumes and we use to fail for all expand requests on an encrypted
volume. Also for Block VolumeMode PVCs NodeExpandVolume used to be
ignored/skipped.
With these changes, we add support for the expansion of encrypted volumes.
Also for raw Block VolumeMode PVCs with Encryption we call NodeExpandVolume.
That said,
With LUKS1, cryptsetup utility doesn't prompt for a passphrase on resizing
the crypto mapper device. This is because LUKS1 devices don't use kernel
keyring for volume keys.
Whereas, LUKS2 devices use kernel keyring for volume key by default, i.e.
cryptsetup utility asks for a passphrase if it detects volume key was
previously passed to dm-crypt via kernel keyring service, we are overriding
the default by --disable-keyring option during cryptsetup open command.
So that at the time of crypto mapper device resize we will not be
prompted for any passphrase.
Fixes: #1469
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
nlreturn linter requires a new line before return
and branch statements except when the return is alone
inside a statement group (such as an if statement) to
increase code clarity. This commit addresses such issues.
Updates: #1586
Signed-off-by: Rakshith R <rar@redhat.com>
We have many declarations and invocations..etc with long lines which are
very difficult to follow while doing code reading. This address the issues
in 'internal/rbd/*server.go' and 'internal/rbd/driver.go' files to restrict
the line length to 120 chars.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commit adds checks for missing `imageFeatures` parameter
in createvolumerequest and nodestagerequest(only for static PVs).
Missing `imageFeatures` parameter is ignored in case of non-static
PVs to ensure backwards compatibility with older versions which
did not have `imageFeatures` as required parameter.
Signed-off-by: Rakshith R <rar@redhat.com>
When cloning a volume from a (CSI) snapshot, we use DeepCopy() and do
not need an RBD snapshot as source.
Suggested-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case restoring a snapshot of a thick-PVC failed during DeepCopy(),
the image will exist, but have partial contents. Only when the image has
the thick-provisioned metadata set, it has completed DeepCopy().
When the metadata is missing, the image is deleted, and an error is
returned to the caller. Kubernetes will automatically retry provisioning
on the ABORTED error, and the restoring will get restarted from the
beginning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
the parent volume(CreateVolume) and the clone volume
(CreateSnapshot) are both indepedent and parent volume
can be deleted anytime. To check the thick provision
during Snapshot restore(CreateVolume from snapshot)
we need the thick provision metadata so for the same
reason setting the thick provision metadata on the
clone image we are creating at the CreateSnapshot time.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added validation to allow only Restore of Thick PVC
snapshot to a thick clone and creation of thick clone
from thick PVC.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In case cloning a thick-PVC failed during DeepCopy(), the image will
exist, but have partial contents. Only when the image has the
thick-provisioned metadata set, it has completed DeepCopy().
When the metadata is missing, the image is deleted, and an error is
returned to the caller. Kubernetes will automatically retry provisioning
on the ABORTED error, and the cloning will get restarted from the
beginning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Thick-provisioned images are independent, cloned images or snapshots are
deep-flattened during creation. There is no need to try and flatten them
again.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit addresses the following issue:
'nolint:gocyclo // complexity needs to be reduced.'
is unused for linter "gocyclo" (nolintlint)
Updates:#2025
Signed-off-by: Yati Padia <ypadia@redhat.com>
Validate Snapshot request to check if the
passed pool name is not empty.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
as we are supporting the creation of clone to a new
pool we need to pass the correct parent volume
to cleanup the snapshot on parent volume.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
added support to create image in different pool.
if the snapshot/rbd image exists in one pool we
can create a clone the clone of the rbd image to
a different pool.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
CreateVolume will fail in below cases
* If the snapshot is encrypted and requested volume
is not encrypted
* If the snapshot is not encrypted and requested
volume is encrypted
* If the parent volume is encrypted and requested volume
is not encrypted
* If the parent volume is not encrypted and requested
volume is encrypted
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Move the repairing of a volume/snapshot from CreateVolume to its own
function. This reduces the complexity of the code, and makes the
procedure easier to understand. Further enhancements to repairing an
exsiting volume can be done in the new function.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit removes calling of .String() when logging
since `%s`,`%v` or `%q` will call an existing .String() function
automatically.
Fixes: #2051
Signed-off-by: Rakshith R <rar@redhat.com>
At present we return the volume connect error if the clone
from snapshot fails when rbdvolume is encrypted, which is incorrect.
This patch correctly return the failed copy encryption error to the
caller
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
It is possible that when a provisioner restarts after a snapshot was
cloned, but before the newly restored image had its encryption metadata
set, the new image is not marked as encrypted. This will prevent
attaching/mounting the image, as the encryption key will not be fetched,
or is not available in the DEKStore.
By actively repairing the encryption configuration when needed, this
problem should be addressed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
buildCreateVolumeResponse() exists exactly for the need to create a
csi.CreateVolumeResponse based on an rbdVolume. Calling this helper
reduces the code duplication in CreateVolume().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The rbdVolume that needs its encryption configured is constructed in the
Exists() method. It is suitable to move the copyEncryptionConfig() call
there as well, so that the object is completely constructed in a single
place.
Golang-ci:gocyclo complained about the increased complexity of the
Exists() function. Moving the repairing of the ImageID into its own
helper function makes the code a little easier to understand.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Introduce helper function cloneFromSnapshot() that takes care of the
procedures that are needed when an existing snapshot has been found.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Without this, the rbdVolume can not connect to the Ceph cluster and
configure the (optional) encryption.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The ControllerServer should not need to care about support for
encryption, ideally it is transparantly handled by the rbdVolume type
and its internal API.
Deleting the DEK was one of the last remainders that was explicitly done
inside the ControllerServer.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
added ReplicationServer struct for the replication related
operation it also embed the ControllerServer which
already implements the helper functions like locking/unlocking etc.
removed getVolumeFromID and cleanup functions for better
code readability and easy maintaince.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The rbdSnapshot and rbdVolume structs have many common attributes. In
order to combine these into an rbdImage struct that implements shared
functionality, having the same attribute for the ID makes things much
easier.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Use DEKStore API for Fetching and Storing passphrases.
Drop the fallback for the old KMS interface that is now provided as
DEKStore. The original implementation has been re-used for the DEKStore
interface.
This also moves GetCryptoPassphrase/StoreNewCryptoPassphrase functions
to methods of VolumeEncryption.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Prepare for grouping encryption related functions together. The main
rbdVolume object should not be cluttered with KMS or DEK procedures.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add an option to the StorageClass to support creating fully allocated
(thick provisioned) RBD images
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This adds internal/rbd/encryption.go which will be used to include other
encryption functionality to support additional KMS related functions. It
will work together with the shared API from internal/util/kms.go.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Have the provisioner create the passphrase for the volume, instead of
doign it lazily at the time the volume is used for the 1st time. This
prevents potential races where pods on different nodes try to store
different passphrases at the (almost) same time.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
An rbd image can have a maximum number of
snapshots defined by maxsnapshotsonimage
On the limit is reached the cephcsi will
start flattening the older snapshots and
returns the ABORT error message, The Request
comes after this as to wait till all the
images are flattened (this will increase the
PVC creation time. Instead of waiting till
the maximum snapshots on an RBD image, we can
have a soft limit, once the limit reached
cephcsi will start flattening the task to
break the chain. With this PVC creation time
will only be affected when the hard limit
(minsnapshotsonimage) reached.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
`cap` builtin function returns the capacity of a type. Its not
good practice to use this builtin function for other variable
names, removing it here
Ref# https://golang.org/pkg/builtin/#cap
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
cephcsi uses cli for fetching snap list as well as to check the
snapshot namespace, replaced that with go-ceph calls.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
It doesnot make sense to allow the creation of empty
volumes which is going to be accessed with readonly mode,
this commit allows the creation of volume which is having
readonly capabilities only if the content source is set
for the volume.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
At CSI spec < 1.2.0, there was no volumecapability in the
expand request. However its available from v1.2+ which allows
us to declare the node operations based on the volume mode.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Make sure to operate within the namespace if any given
when dealing with rbd images and snapshots and their journals.
Signed-off-by: Mehdy Khoshnoody <mehdy.khoshnoody@gmail.com>
if we are not able to fetch the cluster-ID from
the createSnapshot request and also if we are
not able to get the monitor information from
the cluster-ID return error instead of using
the parent image information.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Replaced command execution with go-ceph Resize() function.
Volsize is being updated before waiting for resize() to return,
fixed it to get updated only after resize() is successful.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>