At present the KMS structs are exported and ideally we should be
able to work without exporting the same.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Currently, we are using methods and all the methods
makes a network call to fetch details from the ceph
clusters, its difficult to write test cases for
these functions, if we move to the interfaces
we can make use of mock to write unit testing
for the caller functions.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
To be consistent with other components and also to explictly
state it belong to `ibm keyprotect` service introducing this
change
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
currently we are overriding the permission to `0o777` at time of node
stage which is not the correct action. That said, this permission
change causes an extra permission correction at time of nodestaging
by the CO while the FSGROUP change policy has been set to
`OnRootMismatch`.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
as ioutil.ReadFile is deprecated and
suggestion is to use os.ReadFile as
per https://pkg.go.dev/io/ioutil updating
the same.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as ioutil.WriteFile is deprecated and
suggestion is to use os.WriteFile as
per https://pkg.go.dev/io/ioutil updating
the same.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Currently most of the internal methods have the
rbdVolume as the received. As these methods
are completely internal and requires only
the fields of the rbdImage use rbdImage
as the receiver instead of rbdVolume.
updates #2742
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
During CreateVolume from snapshot/volume,
its difficult to identify if the clone is
failed and a new clone is created. In case
of clone failure logging the error message
for better debugging.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as per the CSI standard the size is optional parameter,
as we are allowing the clone to a bigger size
today we need to block the clone to a smaller size
as its a have side effects like data corruption etc.
Note:- Even though this check is present in kubernetes
sidecar as CSI is CO independent adding the check
here.
updates: #2718
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as per the CSI standard the size is optional parameter,
as we are allowing the restore to a bigger size
today we need to block the restore to a smaller size
as its a have side effects like data corruption.
Note:- Even though this check is present in kubernetes
sidecar as CSI is CO independent adding the check
here.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Currently, as a workaround, we are calling
the resize volume on the cloned, restore volumes
to adjust the cloned, restored volumes.
With this fix, we are calling the resize volume
only if there is a size mismatch with requested
and the volume from which the new volume needs
to be created.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
SINGLE_NODE_WRITER capability ambiguity has been fixed in csi spec v1.5
which allows the SP drivers to declare more granular WRITE capability in form
of SINGLE_NODE_SINGLE_WRITER or SINGLE_NODE_MULTI_WRITER.
These are not really new capabilities rather capabilities introduced to
get the desired functionality from CO side based on the capabilities SP
driver support for various CSI operations, this new capabilities also help
to address new access mode RWOP (readwriteoncepod).
This commit adds a helper function which identity the request is of
multiwriter mode and also validates whether it is filesystem mode or
block mode. Based on the inspection it fails to allow multi write
requests for filesystem mode and only allow multi write request against
block mode.
This commit also adds unit tests for isMultiWriterBlock function which
validates various accesstypes and accessmodes.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
SINGLE_NODE_WRITER capability ambiguity has been fixed in csi spec v1.5
which allows the SP drivers to declare more granular WRITE capability.
These are not really new capabilities rather capabilities introduced to
get the desired functionality from CO side based on the capabilities SP
driver support for various CSI operations.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
This commit adds optional BaseURL and TokenURL configuration to
key protect/hpcs configuration and client connections, if not
provided default values are used.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
implement UnfenceClusterNetwork grpc call
which allows to unblock the access to a
CIDR block by removing it from network fence.
Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
implement FenceClusterNetwork grpc call which
allows to blocks access to a CIDR block by
creating a network fence.
Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
Convert the CIDR block into a range of IPs,
and then add network fencing via "ceph osd blocklist"
for each IP in that range.
Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
This commit removes rbdVol.getTrashPath() function
since it is no longer being used due to introduction
of go-ceph rbd admin task api for deletion.
Signed-off-by: Rakshith R <rar@redhat.com>
With introduction of go-ceph rbd admin task api, credentials are
no longer required to be passed as cli cmd is not invoked.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit removes `rv.Connect(cr)` since the rbdVolume should
have an active connection in this stage of the function call.
`rv.getCloneDepth(ctx)` will work after a connect to the cluster.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit adds support to go-ceph rbd task api
`trash remove` and `flatten` instead of using cli
cmds.
Fixes: #2186
Signed-off-by: Rakshith R <rar@redhat.com>
considering IBM has different crypto services (ex: SKLM) in place, its
good to keep the configmap key names with below format
`IBM_KP_...` instead of `KP_..`
so that in future, if we add more crypto services from IBM we can keep
similar schema specific to that specific service from IBM.
Ex: `IBM_SKLM_...`
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
The CSI Controller (provisioner) can call `rbd sparsify` to reduce the
space consumption of the volume.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
use ExecCommandWithTimeout with timeout
of 1 minute for the promote operation.
If the command doesnot returns error/response
in 1 minute the process will be killed
and error will be returned to the user.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added ExecCommandWithTimeout helper function
to execute the commands with the timeout option,
if the command does not return any response with
in the timeout time the process will be terminated
and error will be returned back to the user.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
after creating the rbd image log the image
details corresponding for the request along
with the request name.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as getImageInfo is already called inside
cloneRbdImageFromSnapshot function right
after creating the clone. remove the extra
API call to get the details again.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
after creating the clone get the current
image details like size, creationTime,
imageFeatures etc from the ceph cluster.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
moved ParentName, ParentPool and ImageFeatureSet
fields to the rbdImage struct as these are the
first citizens on the rbdImage.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the volume with a bigger size is created
from a snapshot or from another volume we
need to exapand the filesystem also in the
csidriver as nodeExpand request is not triggered
for this one, During NodeStageVolume we can
expand the filesystem by checking filesystem
needs expansion or not.
If its a encrypted device, check the device
size of rbd device and the LUKS device if required
the device will be expanded before
expanding the filesystem.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the requested volume size is greater than
the snapshot size, resize the cloned volume
after creating a clone from a snapshot.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the requested volume size is greater than
the parent volume size, resize the cloned volume
after creating a final clone from a parent volume.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added a check to consider ErrImageNotFound error
during DeleteSnapshot operation, if the error
is ErrImageNotFound we need to ensure that image
is removed from the trash and also the rados
OMAP data is removed.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
we need actual size of the rbdVolume
created for the snapshot, as we are not
storing the size of the snapshot in OMAP
we need to fetch the size from ceph cluster
and update the same on rbdSnapshot struct.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as we are moving the VolSize to rbdImage struct
we should reuse the same instead of maintaining
one more field in rbdSnapshot struct.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
move the Volsize to the rbdImage struct
as size is more applicable for rbdImage
as rbdImage is used for both rbdVolume
and rbdSnapshot.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as we are no longer supporting the v1.x
version of cephcsi. removing the json tag
used to store rbd volume details in configmap.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
when doing the internal operation to get the
latest details the rbd image size is also getting
updated and this will update the volume size also
without actual requested size we cannot do the
resize operation for bigger clones. This commit
adds a new field called RequestedVolSize to rbdVolume
struct to hold the user requested size.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added a new helper function called cleanupThickClone
to cleanup the snapshot and clone if the thick
provisioning is not fully completed.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
remove the bigger size validation when
creating a volume from a snapshot or when
creation a clone from a volume as we resized
the volume after cloning.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
dummy image rbdVolume struct is derived
from the actual one rbdVolume of the
volumeID sent in the EnableVolumeReplication
request. and the dummy rbdVolume struct contains
the image id of the actual volume because
of that when we are repairing the dummy
image the image is sent to trash but not
deleted due to the wrong image ID. resetting
the image id will makes sure the image id
is fetching from ceph cluster and same
image id will be used for manager operation.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit adds the support for HPCS/Key Protect IBM KMS service
to Ceph CSI service. EncryptDEK() and DecryptDEK() of RBD volumes are
done with the help of key protect KMS server by wrapping and unwrapping
the DEK and by using the DEKStoreMetadata.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
The ceph changes are done on the both server and the
client side this change is not enough for remove
setting the size of cloned volumes.
this caused the regression like #2719#2720#2721#2722.
This reverts commit 3565a342d5.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
IsBlockMultiNode() is a new helper that takes a slice of
VolumeCapability objects and checks if it includes multi-node access
and/or block-mode support.
This can then easily be used in other services that need checking for
these particular capabilities, and preventing multi-node block-mode
access.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
we have added clusterID mapping to identify the volumes
in case of a failover in Disaster recovery in #1946.
with #2314 we are moving to a configuration in
configmap for clusterID and poolID mapping.
and with #2314 we have all the required information
to identify the image mappings.
This commit removes the workaround implementation done
in #1946.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The rbd package contains several functions that can be used by
CSI-Addons Service implmentations. Unfortunately it is not possible to
do this, as the rbd-driver needs to import the csi-addons/rbd package to
provide the CSI-Addons server. This causes a circular import when
services use the rbd package:
- rbd/driver.go import csi-addons/rbd
- csi-addons/rbd import rbd (including the driver)
By moving rbd/driver.go into its own package, the circular import can be
prevented.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
HexStringToInteger() used to return a uint64, but everywhere else uint
is used. Having HexStringToInteger() return a uint as well makes it a
little easier to use when setting it with SetGlobalInt().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the rbd-driver starts, it initializes some global (yuck!) variables
in the rbd package. Because the rbd-driver is moved out into its own
package, these variables can not easily be set anymore.
Introcude SetGlobalInt(), SetGlobalBool() and InitJournals() so that the
rbd-driver can configure the rbd package.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The rbd-driver calls rbd.runVolumeHealer() which is not available
outside the rbd package. By moving the rbd-driver into its own package,
RunVolumeHealer() needs to be exported.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
NodeServer.mounter is internal to the NodeServer type, but it needs to
be initialized by the rbd-driver. The rbd-driver is moved to its own
package, so .Mounter needs to be available from there in order to set
it.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
genVolFromVolID() is used by the CSI Controller service to create an
rbdVolume object from a CSI volume_id. This function is useful for
CSI-Addons Services as well, so rename it to GenVolFromVolID().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
k8s.io/utils/mount has moved to k8s.io/mount-utils, and Ceph-CSI uses
that already in most locations. Only internal/util/util.go still imports
the old path.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The dummy image will be created with 1Mib size.
during the snapshot transfer operation the 1Mib
will be transferred even if the dummy image doesnot
contains any data. adding the new image features
`fast-diff,layering,obj-map,exclusive-lock`on the
dummy image will ensure that only the diff is
transferred to the remote cluster.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
we added a workaround for rbd scheduling by creating
a dummy image in #2656. with the fix we are creating
a dummy image of the size of the first actual rbd
image which is sent in EnableVolumeReplication request
if the actual rbd image size is 1TiB we are creating
a dummy image of 1TiB which is not good. even though
its a thin provisioned rbd images this is causing
issue for the transfer of the snapshot during
the mirroring operation.
This commit recreates the rbd image with 1MiB size
which is the smaller supported size in rbd.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
rbd mirroring CLI calls are async and it doesn't wait
for the operation to be completed. ex:- `rbd mirror image enable`
it will enable the mirroring on the image but it doesn't
ensure that the image is mirroring enabled and healthy
primary. The same goes for the promote volume also.
This commits adds a check-in PromoteVolume to make sure
the image in a healthy state i.e `up+stopped`.
note:- not considering any intermediate states to make
sure the image is completely healthy before responding
success to the RPC call.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Journal-based RADOS block device mirroring ensures point-in-time
consistent replicas of all changes to an image, including reads and
writes, block device resizing, snapshots, clones, and flattening.
Journaling-based mirroring records all modifications to an image in the
order in which they occur. This ensures that a crash-consistent mirror
of an image is available.
Mirroring when configured in journal mode, mirroring will
utilize the RBD journaling image feature to replicate the image
contents. If the RBD journaling image feature is not yet enabled on the
image, it will be automatically enabled.
Fixes: #2018
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Depending on the way Ceph-CSI is deployed, the capabilities will be
configured for the GetCapabilities procedure. The other procedures are
more straight-forward.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
After adding the new CSI-Addons Server, golang-ci complains that
driver.Run() is too complex. By moving the profiling checks and starting
of the go-routines in their own function, golang-ci is happy again.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add a new CSI-Addons Server and empty Identity Service for the RBD
plugin. The implementation of the Identity Service procedure calls will
be done in other PRs.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
currently we are fist operating on the dummy
image to refresh the pool and then we are adding
the scheduling. we think the scheduling should
be added first and than we should refresh the
pool. If we do this all the existing schedules
will be considered from the scheduler.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
with shallow copy of rbdVol to dummyVol
the image name update of the dummyVol is getting
reflected on the rbdVol which we dont want.
do deep copy to avoid this problem.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Uses the below schema to supply mounter specific map/unmapOptions to the
nodeplugin based on the discussion we all had at
https://github.com/ceph/ceph-csi/pull/2636
This should specifically be really helpful with the `tryOthermonters`
set to true, i.e with fallback mechanism settings turned ON.
mapOption: "kbrd:v1,v2,v3;nbd:v1,v2,v3"
- By omitting `krbd:` or `nbd:`, the option(s) apply to
rbdDefaultMounter which is krbd.
- A user can _override_ the options for a mounter by specifying `krbd:`
or `nbd:`.
mapOption: "v1,v2,v3;nbd:v1,v2,v3"
is effectively the same as the 1st example.
- Sections are split by `;`.
- If users want to specify common options for both `krbd` and `nbd`,
they should mention them twice.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
The dummy mirror image needs to be disabled and then
reenabled for mirroring, to ensure a newly promoted
primary is now starting to schedule snapshots.
Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
currently we have a bug in rbd mirror scheduling module.
After doing failover and failback the scheduling is not
getting updated and the mirroring snapshots are not
getting created periodically as per the scheduling
interval. This PR workarounds this one by doing below
operations
* Create a dummy (unique) image per cluster and this image
should be easily identified.
* During Promote operation on any image enable the
mirroring on the dummy image. when we enable the mirroring
on the dummy image the pool will get updated and the
scheduling will be reconfigured.
* During Demote operation on any image disable the mirroring
on the dummy image. the disable need to be done to enable
the mirroring again when we get the promote request to make
the image as primary
* When the DR is no more needed, this image need to be
manually cleanup as for now as we dont want to add a check
in the existing DeleteVolume code path for delete dummy image
as it impact the performance of the DeleteVolume workflow.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Moved to add scheduling to the promote
operation as scheduling need to be added
when the image is promoted and this is
the correct method of adding the scheduling
to make the scheduling take place.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
instead of logging the volumeID and the pool
name. log the poolname and image name for better
debugging.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If we hit any error while running the cryptosetup
commands we are logging only the error message.
with only error message it is difficult to analyze
the problem, logging the stdError will help us to
check what is the problem.
updates: #2610
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
On line 341 a `transaction` is created. This is passed to the deferred
`undoStagingTransaction()` function when an error in the
`NodeStageVolume` procedure is detected. So far, so good.
However, on line 356 a new `transaction` is returned. This new
`transaction` is not used for the defer call.
By removing the empty `transaction` that is used in the defer call, and
calling `undoStagingTransaction()` on an error of `stageTransaction()`,
the code is a little simpler, and the cleanup of the transaction should
be done correctly now.
Updates: #2610
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This log line is seen frequently in the logs and its better to be at
Warning loglevel rather than Error based on its severity
E1109 08:30:45.612395 38328 util.go:247] kernel 4.19.202 does not support required features
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Problem:
On remap/attach of device (i.e. nodeplugin restart), there is no way
for rbd-nbd to defend if the backend storage is matching with the initial
backend storage.
Say, if an initial map request for backend "pool1/image1" got mapped to
/dev/nbd0 and the userspace process is terminated (on nodeplugin restart).
A next remap/attach (nodeplugin start) request within reattach-timeout is
allowed to use /dev/nbd0 for a different backend "pool1/image2"
For example, an operation like below could be dangerous:
$ sudo rbd-nbd map --try-netlink rbd-pool/ext4-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="bfc444b4-64b1-418f-8b36-6e0d170cfc04" TYPE="ext4"
$ sudo pkill -15 rbd-nbd <-- nodeplugin terminate
$ sudo rbd-nbd attach --try-netlink --device /dev/nbd0 rbd-pool/xfs-image
/dev/nbd0
$ sudo blkid /dev/nbd0
/dev/nbd0: UUID="d29bf343-6570-4069-a9ea-2fa156ced908" TYPE="xfs"
Solution:
rbd-nbd/kernel now provides a way to keep some metadata in sysfs to identify
between the device and the backend, so that when a remap/attach request is
made, rbd-nbd can compare and avoid such dangerous operations.
With the provided solution, as part of the initial map request, backend
cookie (ceph-csi VOLID) can be stored in the sysfs per device config, so
that on a remap/attach request rbd-nbd will check and validate if the
backend per device cookie matches with the initial map backend with the help
of cookie.
At Ceph-csi we use VOLID as device cookie, which will be unique, we pass
the VOLID as cookie at map and use the same at the time of attach, that
way rbd-nbd can identify backends and their matching devices.
Requires:
https://github.com/ceph/ceph/pull/41323https://lkml.org/lkml/2021/4/29/274
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
This change allows the user to choose not to fallback to NBD mounter
when some ImageFeatures are absent with krbd driver, rather just fail
the NodeStage call.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>