after creating the clone get the current
image details like size, creationTime,
imageFeatures etc from the ceph cluster.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
moved ParentName, ParentPool and ImageFeatureSet
fields to the rbdImage struct as these are the
first citizens on the rbdImage.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the volume with a bigger size is created
from a snapshot or from another volume we
need to exapand the filesystem also in the
csidriver as nodeExpand request is not triggered
for this one, During NodeStageVolume we can
expand the filesystem by checking filesystem
needs expansion or not.
If its a encrypted device, check the device
size of rbd device and the LUKS device if required
the device will be expanded before
expanding the filesystem.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the requested volume size is greater than
the snapshot size, resize the cloned volume
after creating a clone from a snapshot.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the requested volume size is greater than
the parent volume size, resize the cloned volume
after creating a final clone from a parent volume.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added a check to consider ErrImageNotFound error
during DeleteSnapshot operation, if the error
is ErrImageNotFound we need to ensure that image
is removed from the trash and also the rados
OMAP data is removed.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
we need actual size of the rbdVolume
created for the snapshot, as we are not
storing the size of the snapshot in OMAP
we need to fetch the size from ceph cluster
and update the same on rbdSnapshot struct.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as we are moving the VolSize to rbdImage struct
we should reuse the same instead of maintaining
one more field in rbdSnapshot struct.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
move the Volsize to the rbdImage struct
as size is more applicable for rbdImage
as rbdImage is used for both rbdVolume
and rbdSnapshot.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as we are no longer supporting the v1.x
version of cephcsi. removing the json tag
used to store rbd volume details in configmap.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
when doing the internal operation to get the
latest details the rbd image size is also getting
updated and this will update the volume size also
without actual requested size we cannot do the
resize operation for bigger clones. This commit
adds a new field called RequestedVolSize to rbdVolume
struct to hold the user requested size.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added a new helper function called cleanupThickClone
to cleanup the snapshot and clone if the thick
provisioning is not fully completed.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
remove the bigger size validation when
creating a volume from a snapshot or when
creation a clone from a volume as we resized
the volume after cloning.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
dummy image rbdVolume struct is derived
from the actual one rbdVolume of the
volumeID sent in the EnableVolumeReplication
request. and the dummy rbdVolume struct contains
the image id of the actual volume because
of that when we are repairing the dummy
image the image is sent to trash but not
deleted due to the wrong image ID. resetting
the image id will makes sure the image id
is fetching from ceph cluster and same
image id will be used for manager operation.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
IsBlockMultiNode() is a new helper that takes a slice of
VolumeCapability objects and checks if it includes multi-node access
and/or block-mode support.
This can then easily be used in other services that need checking for
these particular capabilities, and preventing multi-node block-mode
access.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
we have added clusterID mapping to identify the volumes
in case of a failover in Disaster recovery in #1946.
with #2314 we are moving to a configuration in
configmap for clusterID and poolID mapping.
and with #2314 we have all the required information
to identify the image mappings.
This commit removes the workaround implementation done
in #1946.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The rbd package contains several functions that can be used by
CSI-Addons Service implmentations. Unfortunately it is not possible to
do this, as the rbd-driver needs to import the csi-addons/rbd package to
provide the CSI-Addons server. This causes a circular import when
services use the rbd package:
- rbd/driver.go import csi-addons/rbd
- csi-addons/rbd import rbd (including the driver)
By moving rbd/driver.go into its own package, the circular import can be
prevented.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
HexStringToInteger() used to return a uint64, but everywhere else uint
is used. Having HexStringToInteger() return a uint as well makes it a
little easier to use when setting it with SetGlobalInt().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the rbd-driver starts, it initializes some global (yuck!) variables
in the rbd package. Because the rbd-driver is moved out into its own
package, these variables can not easily be set anymore.
Introcude SetGlobalInt(), SetGlobalBool() and InitJournals() so that the
rbd-driver can configure the rbd package.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The rbd-driver calls rbd.runVolumeHealer() which is not available
outside the rbd package. By moving the rbd-driver into its own package,
RunVolumeHealer() needs to be exported.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
NodeServer.mounter is internal to the NodeServer type, but it needs to
be initialized by the rbd-driver. The rbd-driver is moved to its own
package, so .Mounter needs to be available from there in order to set
it.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
genVolFromVolID() is used by the CSI Controller service to create an
rbdVolume object from a CSI volume_id. This function is useful for
CSI-Addons Services as well, so rename it to GenVolFromVolID().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The dummy image will be created with 1Mib size.
during the snapshot transfer operation the 1Mib
will be transferred even if the dummy image doesnot
contains any data. adding the new image features
`fast-diff,layering,obj-map,exclusive-lock`on the
dummy image will ensure that only the diff is
transferred to the remote cluster.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
we added a workaround for rbd scheduling by creating
a dummy image in #2656. with the fix we are creating
a dummy image of the size of the first actual rbd
image which is sent in EnableVolumeReplication request
if the actual rbd image size is 1TiB we are creating
a dummy image of 1TiB which is not good. even though
its a thin provisioned rbd images this is causing
issue for the transfer of the snapshot during
the mirroring operation.
This commit recreates the rbd image with 1MiB size
which is the smaller supported size in rbd.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
rbd mirroring CLI calls are async and it doesn't wait
for the operation to be completed. ex:- `rbd mirror image enable`
it will enable the mirroring on the image but it doesn't
ensure that the image is mirroring enabled and healthy
primary. The same goes for the promote volume also.
This commits adds a check-in PromoteVolume to make sure
the image in a healthy state i.e `up+stopped`.
note:- not considering any intermediate states to make
sure the image is completely healthy before responding
success to the RPC call.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Journal-based RADOS block device mirroring ensures point-in-time
consistent replicas of all changes to an image, including reads and
writes, block device resizing, snapshots, clones, and flattening.
Journaling-based mirroring records all modifications to an image in the
order in which they occur. This ensures that a crash-consistent mirror
of an image is available.
Mirroring when configured in journal mode, mirroring will
utilize the RBD journaling image feature to replicate the image
contents. If the RBD journaling image feature is not yet enabled on the
image, it will be automatically enabled.
Fixes: #2018
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Depending on the way Ceph-CSI is deployed, the capabilities will be
configured for the GetCapabilities procedure. The other procedures are
more straight-forward.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
After adding the new CSI-Addons Server, golang-ci complains that
driver.Run() is too complex. By moving the profiling checks and starting
of the go-routines in their own function, golang-ci is happy again.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
currently we are fist operating on the dummy
image to refresh the pool and then we are adding
the scheduling. we think the scheduling should
be added first and than we should refresh the
pool. If we do this all the existing schedules
will be considered from the scheduler.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
with shallow copy of rbdVol to dummyVol
the image name update of the dummyVol is getting
reflected on the rbdVol which we dont want.
do deep copy to avoid this problem.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Uses the below schema to supply mounter specific map/unmapOptions to the
nodeplugin based on the discussion we all had at
https://github.com/ceph/ceph-csi/pull/2636
This should specifically be really helpful with the `tryOthermonters`
set to true, i.e with fallback mechanism settings turned ON.
mapOption: "kbrd:v1,v2,v3;nbd:v1,v2,v3"
- By omitting `krbd:` or `nbd:`, the option(s) apply to
rbdDefaultMounter which is krbd.
- A user can _override_ the options for a mounter by specifying `krbd:`
or `nbd:`.
mapOption: "v1,v2,v3;nbd:v1,v2,v3"
is effectively the same as the 1st example.
- Sections are split by `;`.
- If users want to specify common options for both `krbd` and `nbd`,
they should mention them twice.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
The dummy mirror image needs to be disabled and then
reenabled for mirroring, to ensure a newly promoted
primary is now starting to schedule snapshots.
Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
currently we have a bug in rbd mirror scheduling module.
After doing failover and failback the scheduling is not
getting updated and the mirroring snapshots are not
getting created periodically as per the scheduling
interval. This PR workarounds this one by doing below
operations
* Create a dummy (unique) image per cluster and this image
should be easily identified.
* During Promote operation on any image enable the
mirroring on the dummy image. when we enable the mirroring
on the dummy image the pool will get updated and the
scheduling will be reconfigured.
* During Demote operation on any image disable the mirroring
on the dummy image. the disable need to be done to enable
the mirroring again when we get the promote request to make
the image as primary
* When the DR is no more needed, this image need to be
manually cleanup as for now as we dont want to add a check
in the existing DeleteVolume code path for delete dummy image
as it impact the performance of the DeleteVolume workflow.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Moved to add scheduling to the promote
operation as scheduling need to be added
when the image is promoted and this is
the correct method of adding the scheduling
to make the scheduling take place.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
instead of logging the volumeID and the pool
name. log the poolname and image name for better
debugging.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
On line 341 a `transaction` is created. This is passed to the deferred
`undoStagingTransaction()` function when an error in the
`NodeStageVolume` procedure is detected. So far, so good.
However, on line 356 a new `transaction` is returned. This new
`transaction` is not used for the defer call.
By removing the empty `transaction` that is used in the defer call, and
calling `undoStagingTransaction()` on an error of `stageTransaction()`,
the code is a little simpler, and the cleanup of the transaction should
be done correctly now.
Updates: #2610
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Problem:
On remap/attach of device (i.e. nodeplugin restart), there is no way
for rbd-nbd to defend if the backend storage is matching with the initial
backend storage.
Say, if an initial map request for backend "pool1/image1" got mapped to
/dev/nbd0 and the userspace process is terminated (on nodeplugin restart).
A next remap/attach (nodeplugin start) request within reattach-timeout is
allowed to use /dev/nbd0 for a different backend "pool1/image2"
For example, an operation like below could be dangerous:
$ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4"
$ sudo pkill -15 rbd-nbd <-- nodeplugin terminate
$ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs"
Solution:
rbd-nbd/kernel now provides a way to keep some metadata in sysfs to identify
between the device and the backend, so that when a remap/attach request is
made, rbd-nbd can compare and avoid such dangerous operations.
With the provided solution, as part of the initial map request, backend
cookie (ceph-csi VOLID) can be stored in the sysfs per device config, so
that on a remap/attach request rbd-nbd will check and validate if the
backend per device cookie matches with the initial map backend with the help
of cookie.
At Ceph-csi we use VOLID as device cookie, which will be unique, we pass
the VOLID as cookie at map and use the same at the time of attach, that
way rbd-nbd can identify backends and their matching devices.
Requires:
https://github.com/ceph/ceph/pull/41323https://lkml.org/lkml/2021/4/29/274
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
This change allows the user to choose not to fallback to NBD mounter
when some ImageFeatures are absent with krbd driver, rather just fail
the NodeStage call.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Currently, we recognize and warn for the provided image features based on
our prior intelligence at ceph-csi (i.e based on supportedFeatures map
and validateImageFeatures) at image/PV creation time. It might be very
much possible that the cluster is heterogeneous i.e. the PV creation and
application container might both be on different nodes with different
kernel versions (krbd driver versions).
This PR adds a mechanism to check for the supported krbd features during
mount time, if the krbd driver doesn't have the specified image feature
then it will fall back to rbd-nbd mounter.
Fixes: #478
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
this commit make use of the migration request secret parsing and set
the required fields for further nodestage operations
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>