By returning a connected rbdVolume in parseVolCreateRequest(), the
CreateVolume() function can be simplified a little. There is no need to
call the additional Connect() and detect failures with it.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Not all snapshot objects are free'd correctly after they were allocated.
It is possible that some connections to the Ceph cluster were never
closed. This does not need to be a noticeable problem, as connections
are re-used where possible, but it isn't clean either.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Just like GenVolFromVolID() the genSnapFromSnapID() function can return
a snapshot. There is no need to allocated an empty snapshot and pass
that to the genSnapFromSnapID() function.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
if rbd storage class is created with topologyconstraintspools
replicated pool was still mandatory, making the pool optional if the
topologyconstraintspools is requested
Closes: https://github.com/ceph/ceph-csi/issues/4380
Signed-off-by: parth-gr <partharora1010@gmail.com>
Added unit test for
validateVolumeGroupSnapshotRequest API which
validates the input VolumeGroupSnapshotRequest
request
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
adding UnimplementedGroupControllerServer to
the DefaultControllerServer struct to avoid
build errors when some non mandatory RPC's
are not implemented.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
implemented DeleteVolumeGroupSnapshot RPC which
does below operations
* Basic request validation
* Get the snapshotId's and volumeId's
mapping reserved for the UUID
* Delete snapshot and remove its mapping
from the omap
* Repeat above steps until all the mapping
are removed
* Remove the reserved uuid from the omap
* Reset the filesystem quiesce, This might be
required as cephfs doesnt provide any options to
remove the quiesce, if we get any request with same
ID again we can reuse the quiesce API for same set-id
* Return success if the received error is
Pool not found or key not found.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
implemented CreateVolumeGroupSnapshot RPC which
does below operations
* Basic request validation
* Reserve the UUID for the group name
* Quiesce the filesystem for all the subvolumes
from the input volumeId's
* Take the snapshot for all the input volumeId's
* Add the mapping between volumeId's and snapshot
Id's in omap
* Release the quiesce for the filesystem for
all the subvolumes from the input volumeId's
Undo all the operations if anything fails.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
volumegroup.go holders all the helpers
to extra the group details from the request
and also to extra group details from the
groupID.
This also provide helpers to reserve group
for the request Name and also an undo function
incase if somethings goes wrong and we need to
cleanup the reserved omap entries.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Adding a lock for the volumegroup so
that we can take care of serializing
the same requests to ensure same requests
are not served in parallel.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added validateCreateVolumeGroupSnapshotRequest
to validate the CreateVolumeGroupSnapshotRequest
request and ensure that all the requirement
options are set. if not, reject the RPC request.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Same group jounral config need to be reused
for multiple connection where different monitors
and users are used, for that reason create a unique
connection each time.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The only encoding version that exists is `1`. There is no need to have
multiple constants for that version across different packages. Because
there is only one version, `GenerateVolID()` does not really require it,
and it can use a default version.
If there is a need in the future to support an other encoding version,
this can be revisited with a cleaner solution.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
The VolumeGroupJournal interface does not need to return anything except
for a potential error. Any instance that implements the
VolumeGroupJournal interface can be used to call all functions.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Padding a passphrase with null chars to arrive at a 32-byte length
later forces a user to also pass null chars via the term when
attempting to manually unlock a subvolume via the fscrypt cli tools.
This also had a side-effect of truncating any longer length passphrase
down to a shorter 32-byte length.
fixup for:
cfea8d7562dd0e1988c0
Signed-off-by: Michael Fritch <mfritch@suse.com>
fscrypt will infinitely retry the keyFn during an auth failure,
preventing the csi driver from progressing when configured with
an invalid passphrase
See also:
8c12cd64ab/actions/callback.go (L102-L106)
Signed-off-by: Michael Fritch <mfritch@suse.com>
This commit logs sitestatues and description in
GetVolumeReplicationInfo RPC call for better
debuging.
Fixes: #4430
Signed-off-by: Yati Padia <ypadia@redhat.com>
currently we are not logging the RequestID
for the replication RPC calls. This PR
adds the replication case to the getReqID
function.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added ValidateGroupControllerServiceRequest
helper function which can be used to validate the
group controller service request.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added CreateVolumeGroupSnapshotRequest and
DeleteVolumeGroupSnapshotRequest to the
getReqID so that we can get the ReqID for
the logging.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added GroupControllerGetCapabilities RPC
to the default controller server which returns
the group capabilities which are already set.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Added helper function to add the group
controller capabilities which needs to
be included by csi driver that wants to
implement group controller.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Implement the required function to store/retrieve
the details from the omap for the volumegroup.
This adds a new omap object that contains the
mapping of the RequestName and all the volumeID
and its corresponding snapshotID belonging to a
group.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Added a implementation for the listOmapVals
which list the object keys and values from
the rados omap.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
By reading the contents of /proc/filesystems, and checking if "ceph" is
included there, running "modprobe ceph" can be skipped.
Fixes: #4376
Signed-off-by: Niels de Vos <ndevos@ibm.com>
consider fsName optional for static volume
as it is not required to be set during mount
operation with fuse and kernel client.
fixes: #4311
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The `rbdGetDeviceList()` function uses two very similar types for
converting krbd and NBD device information from JSON. There is no need
to use this distinction, and callers of `rbdGetDeviceList()` should not
need to care about it either.
By introducing a `deviceInfo` interface with Get-functions, the
`rbdGetDeviceList()` function becomes a little simpler, with a clearly
defined API for the returned list.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
This is to pre-emptively add check for EAGAIN error returned from
ceph as part of https://github.com/ceph/ceph/pull/52670 if all the
clone threads are busy and return csi compatible error.
Fixes: #3996
Signed-off-by: karthik-us <ksubrahm@redhat.com>
The ceph fs subvolume resize support is available
in all the active ceph releases. Hence removing the
code to check the supportability of the feature.
Signed-off-by: karthik-us <ksubrahm@redhat.com>
This commit makes use of crush location labels from node
labels to supply `crush_location` and `read_from_replica=localize`
options during mount. Using these options, cephfs
will be able to redirect reads to the closest OSD,
improving performance.
Signed-off-by: Praveen M <m.praveen@ibm.com>
Snapshot procedures do not seem to contain the `Req-ID:` prefix in the
logs anymore (or weren't they there at all?) for some reason. This adds
them back.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Implemented the capability to include kernel mount options and
fuse mount options for individual clusters within the ceph-csi-config
ConfigMap.This allows users to configure the kernel/fuse mount options
for each cluster separately. The mount options specified in the ConfigMap
will supersede those provided via command line arguments.
Signed-off-by: Praveen M <m.praveen@ibm.com>
This commit adds GetCephFSMountOptions util method which returns
KernelMountOptions and fuseMountOptions for cluster `clusterID`.
Signed-off-by: Praveen M <m.praveen@ibm.com>
Implemented the capability to include read affinity options
for individual clusters within the ceph-csi-config ConfigMap.
This allows users to configure the crush location for each
cluster separately. The read affinity options specified in
the ConfigMap will supersede those provided via command line arguments.
Signed-off-by: Praveen M <m.praveen@ibm.com>
If any operations like Resize, Deleting
snapshot fails, we need to remove
both snapshot and the clone to avoid
resource leak.
closes: #4218
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The ReplicationServer is not used anymore, the functionality has moved
to CSI-Addons and the `internal/csi-addons/rbd` package. These last
references were not activated anywhere, so can be removed without any
impact.
See-also: #3314
Signed-off-by: Niels de Vos <ndevos@ibm.com>
When FilesystemNodeGetVolumeStats() succeeds, the volume must be
healthy. This can be included in the VolumeCondition CSI message by
default.
Checks that detect an abnormal VolumeCondition should prevent calling
FilesystemNodeGetVolumeStats() as it is possible that the function will
hang.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
The HealthChecker is configured to use the Staging path pf the volume,
with a `.csi/` subdirectory. In the future this directory could be a
directory that is not under the Published directory.
Fixes: #4219
Signed-off-by: Niels de Vos <ndevos@ibm.com>
re-arrange the struct members to
fix below lint issue
```
struct of size 336 bytes could be of size 328 bytes
```
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit eliminates the code for protecting and unprotecting
snapshots, as the functionality to protect and unprotect snapshots
is being deprecated.
Signed-off-by: Praveen M <m.praveen@ibm.com>
this commit adds client eviction to cephfs, based
on the IPs in cidr block, it evicts those IPs from
the network.
Signed-off-by: Riya Singhal <rsinghal@redhat.com>
Issue:
The RoundOffCephFSVolSize() function omits the fractional
part when calculating the size for cephfs volumes, leading
to the created volume capacity to be lesser than the requested
volume capacity.
Fix:
Consider the fractional part during the size calculation so the
rounded off volume size will be greater than or equal to the
requested volume size.
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Fixes: #4179
Multiple go-routines may simultaneously create the
subVolumeGroupCreated map or write into it
for a particular group.
This commit safeguards subVolumeGroupCreated map
from concurrent creation/writes while allowing for multiple
readers.
Signed-off-by: Rakshith R <rar@redhat.com>
Multiple go-routines may simultaneously check for a clusterID's
presence in clusterAdditionalInfo and create an entry if it is
absent. This set of operation needs to be serialized.
Therefore, this commit safeguards clusterAdditionalInfo map
from concurrent writes with a mutex to prevent the above problem.
Signed-off-by: Rakshith R <rar@redhat.com>
This PR updates the snapshot RbdImageName in
`createSnapshot` method. This resolves the
incorrect statement logged during snapshot creation.
Signed-off-by: Praveen M <m.praveen@ibm.com>
This commit updates the snapshot RbdImageName with the clone
RbdImageName before snapshot creation. This will fix the
incorrect log statement.
Signed-off-by: Praveen M <m.praveen@ibm.com>
During ResyncVolume call, discard not found
error from GetMetadata API. If the image gets
resynced the local image creation time will be
lost, if the key is not present in the image
metadata then we can assume that the image is
already resynced.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add support to create RWX clone from the
ROX clone, in ceph no subvolume clone is
created when ROX clone is created from a
snapshot just a internal ref counter is
added. This PR allows creating a RWX clone
from a ROX clone which allows users to create
RW copy of PVC where cephcsi will identify
the snapshot created for the ROX volume and
creates a subvolume from the CephFS snapshot.
updates: #3603
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
During the Demote volume store
the image creation timestamp.
During Resync do below operation
* Check image creation timestamp
stored during Demote operation
and current creation timestamp during Resync
and check both are equal and its for
force resync then issue resync
* If the image on both sides is
not in unknown state, check
last_snapshot_timestamp on the
local mirror description, if its present
send volumeReady as false or else return
error message.
If both the images are in up+unknown the
send volumeReady as true.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Not sure why but go-lint is failing
with below error and this fix is required
to make it pass
```
directive `//nolint:staticcheck // See comment above.`
is unused for linter "staticcheck" (nolintlint)
```
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
MetricsBindAddress is replaced by Metrics in the
controller-runtime manager options in version 0.16.0
as part of
e59161ee8f
Updating the same here.
Signed-off-by: karthik-us <ksubrahm@redhat.com>
Set VolumeOptions.Pool parameter to empty for Snapshot-backed volumes.
This Pool parameter is optional and only used as 'pool-layout' parameter
during subvolume and subvolume clone create request in cephcsi
and not used for Snapshot-backed volume at all.
It is not saved anywhere for use in subsequent operations after create too.
Therefore, We can set it to empty and not error out.
Signed-off-by: rakshith-r <rar@redhat.com>
This commit makes sure sparsify() is not run when rbd
image is in use.
Running rbd sparsify with workload doing io and too
frequently is not desirable.
When a image is in use fstrim is run and sparsify will
be run only when image is not mapped.
Signed-off-by: Rakshith R <rar@redhat.com>
The clients parameter in the storage class is used to limit access to
the export to the set of hostnames, networks or ip addresses specified.
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
When a volume has AccessType=Block and is encrypted with LUKS, a resize
of the filesystem on the (decrypted) block-device is attempted. This
should not be done, as the application that requested the Block volume
is the only authoritive reader/writer of the data.
In particular VirtualMachines that use RBD volumes as a disk, usually
have a partition table on the disk, instead of only a single filesystem.
The `resizefs` command will not be able to resize the filesystem on the
block-device, as it is a partition table.
When `resizefs` fails during NodeStageVolume, the volume is unstaged and
an error is returned.
Resizing an encrypted block-device requires `cryptsetup resize` so that
the LUKS header on the RBD-image is updated with the correct size. But
there is no need to call `resizefs` in this case.
Fixes: #3945
Signed-off-by: Niels de Vos <ndevos@ibm.com>
This commit modifies code to handle last sync duration being
empty & 0,returning nil & 0 on encountering it respectively.
Earlier both case return 0. Test case is added too.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit get more information from the description
like lastsyncbytes and lastsyncduration and send them
as a response of getvolumereplicationinfo request.
Signed-off-by: Yati Padia <ypadia@redhat.com>
this commit migrates the replication controller server
from internal/rbd and adds it to csi-addons.
Signed-off-by: riya-singhal31 <rsinghal@redhat.com>
this commit removes grpc import from replication.go
and replaced it with usual errors and passed gRPC
responses in csi-addons
Signed-off-by: riya-singhal31 <rsinghal@redhat.com>
There is no release for sigs.k8s.io/controller-runtime that supports
Kubernetes v1.27. The main branch has all the required modifications, so
we can use that for the time being.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
golangci-lint reports that `grpc_middleware.WithUnaryServerChain` is
deprecated and `google.golang.org/grpc.ChainUnaryInterceptor` should be
used instead.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
CephNFS can enable different security flavours for exported volumes.
This can be configured in the optional `secTypes` parameter in the
StorageClass.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
By default, `cryptsetup luksFormat` uses Argon2i as Password-Based Key
Derivation Function (PBKDF), which not only has a CPU cost, but also a memory
cost (to make brute-force attacks harder).
The memory cost is based on the available system memory by default, which in
the context of Ceph CSI can be a problem for two reasons:
1. Pods can have a memory limit (much lower that the memory available on the
node, usually) which isn't taken into account by `cryptsetup`, so it can get
OOM-killed when formating a new volume;
2. The amount of memory that was used during `cryptsetup luksFormat` will then
be needed for `cryptsetup luksOpen`, so if the volume was formated on a node
with a lot of memory, but then needs to be opened on a different node with
less memory, `cryptsetup` will get OOM-killed.
This commit sets the PBKDF memory limit to a fixed value to ensure consistent
memory usage regardless of the specifications of the nodes where the volume
happens to be formatted in the first place.
The limit is set to a relatively low value (32 MiB) so that the `csi-rbdplugin`
container in the `nodeplugin` pod doesn't require an extravagantly high memory
limit in order to format/open volumes (particularly with operations happening
in parallel), while at the same time not being so low as to render it
completely pointless.
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
Add `mkfsOptions` to the StorageClass and pass them to the `mkfs`
command while creating the filesystem on the RBD device.
Fixes: #374
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Storing the default `mkfs` arguments in a map with key per filesystem
type makes this a little more modular. It prepares th code for fetching
the `mkfs` arguments from the VolumeContext.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
This commit makes use of crush location labels from node
labels to supply `crush_location` and `read_from_replica=localize`
options during rbd map cmd. Using these options, ceph
will be able to redirect reads to the closest OSD,
improving performance.
Signed-off-by: Rakshith R <rar@redhat.com>
The StagingTargetPath is an optional entry in
NodeExpandVolumeRequest, We cannot expect it to be
set always and at the same time cephcsi depended
on the StaingTargetPath to retrieve some metadata
information.
This commit will check all the mount ref and identifies
the stagingTargetPath by checking the image-meta.json
file exists and this is a costly operation as we need to
loop through all the mounts and check image-meta.json
in each mount but this is happens only if the
StaingTargetPath is not set in the NodeExpandVolumeRequest
fixes#3623
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
set disableInUseChecks on rbd volume struct
as it will be used later to check whether
the rbd image is allowed to mount on multiple
nodes.
fixes: #3604
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
We should not call ExpandVolume for the BackingSnapshot
subvolume as there wont be any real subvolume created for
it and even if we call it the ExpandVolume will fail
fail as there is no real subvolume exists.
This commits fixes by adjusting the `if` check to ensure
that ExpandVolume will only be called either the
VolumeRequest is to create from a snapshot or volume
and BackingSnapshot is not true.
sample code here https://go.dev/play/p/PI2tNii5tTg
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add Ceph FS fscrypt support, similar to the RBD/ext4 fscrypt
integration. Supports encrypted PVCs, snapshots and clones.
Requires kernel and Ceph MDS support that is currently not in any
stable release.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
this commit remove the protobuf dependency locking in the module
description.
Also, ptypes.TimestampProto is deprecated and this commit
make use of the timestamppb.New() for the construction.
ParseTime() function has been removed and callers adjusted to the
same.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
We need to unset the metadata on the clone
and restore PVC if the parent PVC was created
when setmetadata was set to true and it was
set to false when restore and clone pvc was
created.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
`ceph osd blocklist range add/rm <ip>` cmd is outputting
"blocklisting cidr:10.1.114.75:0/32 until 202..." messages
incorrectly into stdErr. This commit ignores stdErr when err
is nil.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit adds code to setup encryption on a rbdVol
being repaired in a followup CreateVolume request.
This is fixes a bug wherein encryption metadata may not
have been set in previous request due to container restart.
Fixes: #3402
Signed-off-by: Rakshith R <rar@redhat.com>
Checking volume details for the existing volumeID
first. if details like OMAP, RBD Image, Pool doesnot
exists try to use clusterIDMapping to look for the
correct informations.
fixes: #2929
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Sometime the json unmarshal might
get success and return empty time
stamp. add a check to make sure the
time is not zero always.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
As per the csiaddon spec last sync time is
required parameter in the GetVolumeReplicationInfo
if we are failed to parse the description, return
not found error message instead of nil
which is empty response
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If a PV is reattached to a new PVC in a different
namespace we need to update the namespace name
in the rados object.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>