Commit Graph

1348 Commits

Author SHA1 Message Date
Niels de Vos
437d90c84d rbd: do not start the healer for NBD on non-Kubernetes platforms
When running on Docker Swarm, the RBD-healer fails with an error like:

> healer had failures, err failed to get cluster config: unable to load
> in-cluster configuration, KUBERNETES_SERVICE_HOST and
> KUBERNETES_SERVICE_PORT must be defined

Before starting the healer, check if we're running on Kubernetes, so
that non-Kubernetes platforms do not get confusing warnings.

Updates: #3769
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-04-02 13:59:11 +00:00
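The platform check described in 437d90c84d can be sketched as below. This is a minimal illustration, not the code from the commit: it assumes that "running on Kubernetes" is detected through the two environment variables named in the error message, and the function names `runningOnKubernetes` and `maybeStartHealer` are hypothetical.

```go
package main

// runningOnKubernetes reports whether both in-cluster environment variables
// are set. getenv is injected so the check is easy to test; real code would
// pass os.Getenv. (Hypothetical helper, not the ceph-csi function.)
func runningOnKubernetes(getenv func(string) string) bool {
	return getenv("KUBERNETES_SERVICE_HOST") != "" &&
		getenv("KUBERNETES_SERVICE_PORT") != ""
}

// maybeStartHealer calls start only when running on Kubernetes and reports
// whether the healer was started.
func maybeStartHealer(getenv func(string) string, start func()) bool {
	if !runningOnKubernetes(getenv) {
		return false
	}
	start()
	return true
}
```

On Docker Swarm neither variable is set, so `maybeStartHealer(os.Getenv, startHealer)` would return `false` without logging a warning.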
monoamin
71decb822d rbd: Register FenceController only once
Running cephcsi in docker swarm currently requires serving both
the nodeserver and controllerserver over the same socket.
This leads to errors like

> FATAL: [core] grpc: Server.RegisterService found duplicate
> service registration for "fence.FenceController"

...since `FenceController` is registered once per server type.

This commit proposes a simple fix: register `FenceController` only once
when at least one of `IsControllerServer` or `IsNodeServer` is `true`.

Signed-off-by: monoamin <precision1998@gmail.com>
2025-04-01 16:21:40 +00:00
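The "register only once" condition from 71decb822d might look like the sketch below. The `serverConfig` type and the `register` callback are assumptions made for illustration; only the two flag names come from the commit message.

```go
package main

// serverConfig mirrors the two flags named in the commit message.
// (Hypothetical type; the real configuration struct has more fields.)
type serverConfig struct {
	IsControllerServer bool
	IsNodeServer       bool
}

// registerFenceController invokes register at most once, even when both
// server types share one gRPC server. Registering once per enabled server
// type is what would trigger the duplicate-registration error.
func registerFenceController(cfg serverConfig, register func()) {
	if cfg.IsControllerServer || cfg.IsNodeServer {
		register()
	}
}
```

With both flags set, as on Docker Swarm, the callback still runs a single time.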
Nikhil-Ladha
23fce43925 rbd: cleanup volume info from group even if image is not part of group
Continue to clean up the volume info, such as the omap data and
mappings, from the group even if the image is no longer part of
the group.

Signed-off-by: Nikhil-Ladha <nikhilladha1999@gmail.com>
2025-04-01 12:34:03 +00:00
Niels de Vos
3f33e87e70 rbd: improve the description for GetID() and GetName() interfaces
The `GetID()` and `GetName()` functions can be confusing, as names and
IDs are not always distinctive enough. The name is used to reference an
object that exists in a pool. The ID is formatted as a CSI handle and can
be used to locate the entry for the object in the journal.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-03-27 14:09:44 +00:00
Niels de Vos
63df17171a rbd: use the existing VolumeGroup if contents are matching
When a VolumeGroup has been created through the CSI-Addons API, the
VolumeGroupSnapshot CSI API will now use the existing VolumeGroup. There
are checks in place to validate that the Volumes in the VolumeGroup
match the Volumes in the VolumeGroupSnapshot request.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-03-27 14:09:44 +00:00
Niels de Vos
e489413dbd rbd: introduce functions for comparing Volumes in a VolumeGroup
CompareVolumesInGroup() verifies that all the volumes are part of the
given VolumeGroup. It does so by obtaining the VolumeGroupID for each
volume with GetVolumeGroupByID().

The helper VolumesInSameGroup() verifies that all volumes belong to the
same (or no) VolumeGroup. It can be called by CSI(-Addons) procedures
before acting on a VolumeGroup.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-03-27 14:09:44 +00:00
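The "same (or no) VolumeGroup" check from e489413dbd can be illustrated with the sketch below. The `volume` type and its `GroupID` field are hypothetical stand-ins; the real helper resolves group membership through the RBD backend.

```go
package main

import "errors"

// volume is a hypothetical stand-in for an RBD volume. GroupID is empty
// when the volume does not belong to any VolumeGroup.
type volume struct {
	ID      string
	GroupID string
}

var errMixedGroups = errors.New("volumes belong to different groups")

// volumesInSameGroup returns the group ID shared by all volumes ("" when
// none of them is in a group), or errMixedGroups when they differ.
func volumesInSameGroup(vols []volume) (string, error) {
	if len(vols) == 0 {
		return "", nil
	}
	group := vols[0].GroupID
	for _, v := range vols[1:] {
		if v.GroupID != group {
			return "", errMixedGroups
		}
	}
	return group, nil
}
```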
Niels de Vos
32285c8365 rbd: add MakeVolumeGroupID() utility function
The Manager.MakeVolumeGroupID() function can be used to build a CSI
VolumeGroupID from the backend (pool and name of the RBD-group). This
will be used when checking if an RBD-image belongs to a group already.
It is also possible to resolve the VolumeGroup by passing the
VolumeGroupID to the existing Manager.GetVolumeGroupByID() function.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-03-27 14:09:44 +00:00
Niels de Vos
a8ee0fe304 rbd: add Manager.getVolumeGroupNamePrefix()
The `prefix` is passed to several functions, but it can easily be
obtained with a small helper function. This makes calling the functions
a little simpler.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-03-27 14:09:44 +00:00
Niels de Vos
45c91ab0f1 rbd: prevent panic in CreateVolumeGroup if volumeID was not found
When an incorrect volumeID is passed while creating a VolumeGroup, the
`.Destroy()` function caused a panic. By appending each volume to the
volumes slice, the slice won't contain any `nil` volumes anymore.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-03-27 14:09:44 +00:00
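The difference between a pre-sized slice and an appended one, which is the core of 45c91ab0f1, is sketched below. The pre-allocated variant is an assumption about the earlier pattern, and all names are hypothetical.

```go
package main

type volume struct{ id string }

// collectPreallocated sizes the slice up front. When a lookup fails and the
// loop stops early, the remaining slots stay nil, so a later cleanup loop
// calling a method on every entry can dereference a nil pointer.
func collectPreallocated(known map[string]*volume, ids []string) []*volume {
	vols := make([]*volume, len(ids))
	for i, id := range ids {
		v := known[id]
		if v == nil {
			break // error path: later slots remain nil
		}
		vols[i] = v
	}
	return vols
}

// collectAppended only ever appends volumes that were found, so the slice
// never contains nil entries.
func collectAppended(known map[string]*volume, ids []string) []*volume {
	vols := make([]*volume, 0, len(ids))
	for _, id := range ids {
		v := known[id]
		if v == nil {
			break
		}
		vols = append(vols, v)
	}
	return vols
}
```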
Praveen M
add4b36900 cleanup: move Destroy() method to journalledObject interface
The VolumeGroup interface has more than 10 methods, which causes
golangci-lint to fail. Move the `Destroy()` method to a base
interface, journalledObject.

Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-03-27 09:59:12 +00:00
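Moving a method into an embedded base interface, as add4b36900 does, looks roughly like the sketch below. The method signatures are simplified assumptions (the real `Destroy()` may take arguments), and `fakeGroup` exists only to show that implementations do not need to change.

```go
package main

// journalledObject is the base interface that now carries Destroy().
// (Simplified, hypothetical signature.)
type journalledObject interface {
	Destroy()
}

// volumeGroup embeds journalledObject, so Destroy() is still part of its
// method set without being declared directly in this interface.
type volumeGroup interface {
	journalledObject
	GetName() string
}

// fakeGroup is a toy implementation used only for illustration.
type fakeGroup struct {
	name      string
	destroyed bool
}

func (g *fakeGroup) Destroy()        { g.destroyed = true }
func (g *fakeGroup) GetName() string { return g.name }
```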
Praveen M
8d9f353f15 rbd: check for volume group existence
Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-03-27 09:59:12 +00:00
Praveen M
5cbc14454a cleanup: move internal/rbd/errors.go to internal/rbd/errors package
Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-03-27 09:59:12 +00:00
mageekchiu
0c60fd28ea cephfs: upgrading mount syntax
The old syntax is almost deprecated, and there are reasons to upgrade it:
- the old syntax lacks the fsid (critical for debugging and observability)
- mds_namespace is deprecated, so it might be inappropriate to continue using it
- the kernel tries the new syntax first and then the old one, which is wasteful

Signed-off-by: mageekchiu <qiukang@mail.ustc.edu.cn>
2025-03-25 14:39:22 +00:00
Praveen M
0ed0af120b rbd: retain intermediate RBD snapshot on temp image
Currently, Ceph-CSI deletes the intermediate RBD snapshot on the
temporary cloned image (`csi-vol-xxxx-temp@csi-vol-xxxx`),
which is the parent of the final clone image.

Parent-child mirroring requires both the parent and child
images to be present (i.e., not in trash).

This commit enhances the `createRBDClone` function by
introducing a `deleteSnap` parameter. If `deleteSnap` is true,
the snapshot is deleted after the clone is created.

This is required to support mirroring of a child image with its
parent image.

Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-03-18 13:42:11 +00:00
Rakshith R
6f802589aa rbd: add one depth for softlimit of snapshot for restore PVC
Currently, while preparing a volume for snapshot,
the depthToAvoidFlatten is set to 2. This accounts for one
level for the snapshot and another since the parent of the
volume is flattened.
This commit changes the depth to 3 to also account for a
future PVC restore, since
- a snapshot alone is useless and is very likely to be restored
  at some point in time.
- this ensures the snapshot is not flattened when the restore does occur.
- flattening the snapshot in the above case would make the snapshot
  no longer eligible for the changed block tracking (snap diff)
  operation.
- this maintains similarity with the PVC-PVC clone operation, which
  currently has depthToAvoidFlatten set to 1.

Signed-off-by: Rakshith R <rar@redhat.com>
2025-03-14 15:12:27 +00:00
Niels de Vos
7f7988be0d rbd: cleanup NodeServer.createTargetMountPath()
The inverse checking and returning of is-a-mounted-path makes it
difficult to understand the function. It is easier to follow the code
when the function just returns what it says it does, so a comment
describing the function has been added too.

Some errors were returned directly, others were converted to gRPC
errors. This has been corrected now too, and the caller converts the
plain error to a gRPC error now.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-03-14 10:27:13 +00:00
Niels de Vos
79cf0321dd util: do not use mount-utils.IsLikelyNotMountPoint anymore
`IsLikelyNotMountPoint()` is an optimized version of `IsMountPoint()`
which cannot detect all types of mounts (anymore). The slower
`IsMountPoint()` is safer to use. This can cause a slight
performance regression when there are many mountpoints on the
system, but correctness is more important than speed while mounting.

Fixes: #4633
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-03-14 10:27:13 +00:00
Rakshith R
796e6b6c44 rbd: use ListChildrenAttributes() instead of ListChildren()
This commit modifies listSnapAndChildren() to make use of
ListChildrenAttributes() instead of ListChildren(), which
allows us to filter out images in trash.
This commit also orders the alive images so that temp clone
images are followed by images backing volume snapshots, so
that temp clone images are flattened first.

Signed-off-by: Rakshith R <rar@redhat.com>
2025-03-12 08:51:02 +00:00
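The filtering and ordering described in 796e6b6c44 can be sketched as follows. The `childImage` type is a hypothetical stand-in for the attributes returned by the listing call, and recognising temp clones by a `-temp` name suffix is an assumption based on the `csi-vol-xxxx-temp` naming mentioned elsewhere in this log.

```go
package main

import (
	"sort"
	"strings"
)

// childImage is a hypothetical stand-in for a listed child image.
type childImage struct {
	Name  string
	Trash bool
}

// aliveChildrenTempFirst drops images that are in the trash and orders the
// rest so that temp clone images come before all other images, keeping the
// original relative order within each class.
func aliveChildrenTempFirst(children []childImage) []string {
	alive := make([]string, 0, len(children))
	for _, c := range children {
		if !c.Trash {
			alive = append(alive, c.Name)
		}
	}
	isTemp := func(name string) bool { return strings.HasSuffix(name, "-temp") }
	sort.SliceStable(alive, func(i, j int) bool {
		return isTemp(alive[i]) && !isTemp(alive[j])
	})
	return alive
}
```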
Niels de Vos
15da101b1b util: move kernel version functions to pkg/util/kernel
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-03-07 16:05:04 +00:00
Niels de Vos
542ed3de63 util: move EncryptionType(s) to pkg/util/crypto
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-03-07 16:05:04 +00:00
Zerotens
5b587c9484 rbd: fix encrypted PVC with metadata KMS cannot be deleted
Signed-off-by: Zerotens <12968743+zerotens@users.noreply.github.com>
2025-02-25 13:51:42 +00:00
Niels de Vos
43b150f14d rbd: return gRPC code Aborted when the RBD-image is in-use on delete
According to the error scheme documented in the CSI specification, the
Aborted error code should initiate retries, whereas the Internal
error code does not require this behaviour.

When an RBD-image is still in-use, it can not be removed. The
DeleteVolume procedure should be retried and will succeed once the
RBD-image is not in-use anymore.

Fixes: #5166
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-02-24 11:19:17 +00:00
Niels de Vos
ac8cda5e37 rbd: add validation to ToCSI() for rbdVolume and rbdSnapshot
After an unfortunately timed restart of the provisioner, a volume that got
cloned did not get a `rbdVolume.VolID` set. The `.VolID` is used as the
CSI Volume Handle, and is a required attribute.

The `rbdVolume` and `rbdSnapshot` structs have a `.ToCSI()` function
that can do the validation of required attributes. This is now added,
including unit-tests.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-02-20 10:14:29 +00:00
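Validating a required attribute during conversion, as ac8cda5e37 describes, might look like the sketch below. Both struct types are simplified, hypothetical stand-ins for `rbdVolume` and the CSI Volume message; only the idea that `VolID` is required comes from the commit message.

```go
package main

import "errors"

// rbdVolumeSketch is a simplified, hypothetical stand-in for rbdVolume.
type rbdVolumeSketch struct {
	VolID     string
	SizeBytes int64
}

// csiVolume is a simplified, hypothetical stand-in for the CSI Volume message.
type csiVolume struct {
	VolumeID      string
	CapacityBytes int64
}

var errMissingVolID = errors.New("volume is missing the required VolID")

// toCSI converts the volume and refuses to return a CSI object that has no
// volume handle, so the problem surfaces at the conversion boundary.
func (v rbdVolumeSketch) toCSI() (*csiVolume, error) {
	if v.VolID == "" {
		return nil, errMissingVolID
	}
	return &csiVolume{VolumeID: v.VolID, CapacityBytes: v.SizeBytes}, nil
}
```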
Niels de Vos
b3faa04504 rbd: always include the SourceVolumeID when returning a Snapshot
`doSnapshotClone()` returns a new `rbdVolume` object from a temporary
snapshot. This conversion drops the `SourceVolumeID` attribute, as a
`rbdVolume` does not have that.

After converting the `rbdVolume` back to a `rbdSnapshot`, the
`SourceVolumeID` attribute can be set again, and the `ToCSI()` function
can create an appropriate CSI Snapshot struct.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-02-20 10:14:29 +00:00
ecosysbin
0907ba98c4 rbd: Update return error message
Issue: When deleting a PV fails, the error message shows '*** Directory not empty ***'

The actual reason for the failure is 'access denied'.

This commit ensures ceph-csi returns the right error message.

Signed-off-by: ecosysbin <14729934+ecosysbin@user.noreply.gitee.com>
2025-02-19 15:23:21 +00:00
Rakshith R
b05d467679 rbd: fix bug in rbdVol.Exists() in PVC-PVC clone case
This commit fixes a bug in rbdVol.Exists() which caused
VolID not to be set in PVC-PVC clone case.

Signed-off-by: Rakshith R <rar@redhat.com>
2025-02-18 13:05:28 +00:00
Yite Gu
7595e20969 rbd: support QoS based on capacity for rbd volume
1. QoS provides settings for rbd volume read/write IOPS
   and read/write bandwidth.
2. All QoS parameters are placed in the SC and are sent
   from the SC to Cephcsi through the PVC create request.
3. The following QoS parameters need to be provided in the SC:
   - BaseReadIops
   - BaseWriteIops
   - BaseReadBytesPerSecond
   - BaseWriteBytesPerSecond
   - ReadIopsPerGB
   - WriteIopsPerGB
   - ReadBpsPerGB
   - WriteBpsPerGB
   - BaseVolSizeBytes
   Four of these are base QoS parameters. When a user requests
   a volume capacity equal to or less than BaseVolSizeBytes, the base
   QoS limit is used. For the portion of capacity exceeding BaseVolSizeBytes,
   QoS is increased in steps set per GB. If the per-GB step parameter
   is not provided, only the base QoS limit is used and it is not tied
   to the capacity.
4. If the PVC has a resize request, the QoS limit is adjusted
   according to the QoS parameters after resizing.

Signed-off-by: Yite Gu <guyite@bytedance.com>
2025-02-17 18:25:33 +00:00
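The capacity-based limit described in 7595e20969 can be worked through with the sketch below. It is not the code from the commit: it assumes that "GB" means GiB (2^30 bytes) and that a partially filled GB above the base size is rounded down, neither of which is stated in the commit message. The function name `qosLimit` is hypothetical.

```go
package main

const gib = int64(1) << 30

// qosLimit returns the limit for one QoS dimension (for example read IOPS).
// base is the base limit, perGB the step per GB above baseVolSizeBytes.
// When perGB is zero (not provided) or the volume fits within the base
// size, only the base limit applies.
func qosLimit(base, perGB, baseVolSizeBytes, volSizeBytes int64) int64 {
	if perGB == 0 || volSizeBytes <= baseVolSizeBytes {
		return base
	}
	// Assumption: only whole GiB above the base size count as steps.
	extraGB := (volSizeBytes - baseVolSizeBytes) / gib
	return base + extraGB*perGB
}
```

Under these assumptions, a 25 GiB volume with a base of 1000 IOPS for the first 10 GiB and 100 IOPS per additional GB gets 1000 + 15 × 100 = 2500 IOPS. After a resize, the same function would be called again with the new size.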
Praveen M
e4d41c42d6 rbd: get volumegroup in secondary cluster
Currently, `GetVolumeGroup()` fetches the RBD group from the
pool using the clusterID & poolID encoded in the VolumeGroupHandle.
However, this approach may fail in a secondary mirrored cluster,
where the clusterID & poolID could differ.

This commit ensures that `GetVolumeGroup` leverages the
clusterIDMapping and RBDPoolIDMapping to locate the RBD group in the
appropriate pool if it is not found in the pool corresponding
to the poolID encoded in the VolumeGroupHandle.

Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-02-17 13:33:21 +00:00
Praveen M
cbd73f296d cleanup: move ShouldRetryVolumeGeneration from internal/rbd to internal/util
Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-02-17 13:33:21 +00:00
Praveen M
6414e94401 cleanup: move ErrImageNotFound from rbd/errors to util/errors
Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-02-17 13:33:21 +00:00
Niels de Vos
b1834552c1 cleanup: drop deprecated Rbd prefix from go-ceph rbd.ImageOption*
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-01-30 13:27:28 +00:00
Niels de Vos
c905dd863c rbd: format log message correctly
When a `dataPool` is passed while creating a volume, there is a
`%!s(MISSING)` piece added to a debug log message. When concatenating
strings, the `%s` formatter is not needed.

Updates: #5103
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-01-30 13:27:28 +00:00
Praveen M
a3457da727 rbd: controller to regenerate volume group omap data
This commit adds a new controller that watches for the
VolumeGroupReplicationContent and regenerates the OMAP data if
it doesn't exist.

Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-01-28 17:19:32 +00:00
Praveen M
f83a9f7eb8 rbd: add RegenerateVolumeGroupJournal method for Manager interface
This commit adds `RegenerateVolumeGroupJournal` to the Manager
interface. RegenerateVolumeGroupJournal regenerates the omap
data for the volume group.

This performs the following operations:
  - Extracts clusterID and Mons from the cluster mapping
  - Retrieves pool and journalPool parameters from the VolumeGroupReplicationClass
  - Reserves omap data
  - Adds the volumeIDs mapping to the reserved volume group omap object
  - Generates a new volume group handle

Returns the generated volume group handle.

Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-01-28 17:19:32 +00:00
Praveen M
df4d2eb915 journal: pass groupUUID to be used for omap name reserve
This commit adds a groupUUID param to `ReserveName`, to be used for
the OMAP name reservation instead of auto-generating one.
This is useful for mirroring and metro-DR, ensuring that mirrored
resources have consistent OMAP names across mirrored clusters.

Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-01-28 17:19:32 +00:00
Praveen M
ce767fe891 rbd: rename volumeNamePrefix to volumeGroupNamePrefix
Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-01-28 17:19:32 +00:00
Niels de Vos
ecd15970de cleanup: rename csiID to driverInstance
The attribute and variable `csiID` is used for at least two different
things:

 - name of the driver instance, used for journalling metadata
 - objects of the CSIIdentifier struct, composing a volume-handle

By changing the name of the `csiID` attribute for driver instances to
`driverInstance`, any confusion should be prevented.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-01-28 10:19:58 +00:00
Niels de Vos
af0a223edb csiaddons: use rbd.Manager within ReclaimSpaceControllerServer
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-01-28 10:19:58 +00:00
Niels de Vos
6560eee3d8 csiaddons: use rbd.Manager for encryption key rotation
Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-01-28 10:19:58 +00:00
Niels de Vos
2dd235849e rbd: add sub-types for large Volume type
Introduce `snapshottableVolume` and `csiAddonsVolume` types which group
related functions together.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-01-28 10:19:58 +00:00
Niraj Yadav
c308e096da rbd: Use assume_storage_prezeroed when formatting
Instead of passing lazy_itable_init=1 and lazy_journal_init=1 to
mkfs.ext4, pass assume_storage_prezeroed=1 which is
stronger and allows the filesystem to skip inode table zeroing
completely instead of simply doing it lazily.

The support for this flag is checked by trying to format a fake
temporary image with mkfs.ext4 and checking its STDERR.

Closes: #4948
Signed-off-by: Niraj Yadav <niryadav@redhat.com>
2025-01-24 11:58:33 +00:00
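The choice between the two sets of `mkfs.ext4` extended options from c308e096da can be sketched as below. The function name is hypothetical, and falling back to the lazy options when `assume_storage_prezeroed` is unsupported is an assumption; the commit message only states that support is probed beforehand. How the probe inspects STDERR is not shown here.

```go
package main

// ext4ExtendedOptions returns the -E argument pair for mkfs.ext4.
// prezeroedSupported is the result of a separate probe (not shown).
func ext4ExtendedOptions(prezeroedSupported bool) []string {
	if prezeroedSupported {
		// Skips inode table zeroing completely.
		return []string{"-E", "assume_storage_prezeroed=1"}
	}
	// Assumed fallback: defer the zeroing instead of skipping it.
	return []string{"-E", "lazy_itable_init=1,lazy_journal_init=1"}
}
```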
Praveen M
8a66575825 rbd: use correct radosnamespace
Issue: When an RBD image is created in a non-default namespace,
the OMAP data for the PersistentVolume fails to regenerate
because it still attempts to locate the RBD image in the default
namespace.

This commit ensures the correct radosNamespace is retrieved from
the ceph-csi-config.

Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-01-21 16:12:23 +00:00
Praveen M
0cfb2b012b rbd: correct default encryption type
Problem: When the encryptionType is not specified in the StorageClass,
the default type (block) is used and stored in OMAP. However, during
OMAP regeneration in a secondary cluster, the default type is incorrectly
set to none. This discrepancy leads to errors during PVC cloning,
with the message: `cannot create encrypted volume from unencrypted volume.`

Solution: Update the default encryption type to consistently use
block instead of none.

Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-01-17 11:07:26 +00:00
Niels de Vos
e89fe5ad1f rbd: add context in reported errors by GetVolumeReplicationInfo
Logged errors are much more helpful when there is some context around
the message about what went wrong.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
2025-01-15 08:36:39 +00:00
yati1998
4101b63e02 rbd: add check to getVolumeReplicationInfo
This commit adds a check to getVolumeReplicationInfo
to handle the status-not-found error while getting the
remote status.
This allows the failover to proceed even if the remote site
status is not found.

Signed-off-by: yati1998 <ypadia@redhat.com>
2025-01-14 17:25:10 +00:00
Praveen M
eebfd15e78 rbd: rename groupNamePrefix to volumeGroupNamePrefix
CephFS uses the parameter `volumeGroupNamePrefix` for creating VolumeGroups.
This commit renames `groupNamePrefix` to `volumeGroupNamePrefix` for RBD
VolumeGroup creation to ensure consistent naming.

Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-01-09 11:59:16 +00:00
Niraj Yadav
e96404b297 cephfs: use userid and keys for provisioning
This patch modifies the code to use userID and
userKey for provisioning of both static and dynamic
PVs.

In case user credentials are not found, admin credentials
are used as a fallback, for backwards compatibility.

Signed-off-by: Niraj Yadav <niryadav@redhat.com>
2025-01-08 13:48:36 +00:00
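The credential fallback from e96404b297 can be sketched as follows. The `userID` and `userKey` names come from the commit message; the `adminID` and `adminKey` secret keys, the `credentials` type and the function name are assumptions made for this illustration.

```go
package main

import "errors"

// credentials is a hypothetical pair of Ceph user name and key.
type credentials struct {
	ID  string
	Key string
}

var errNoCredentials = errors.New("no usable credentials in secrets")

// pickCredentials prefers the user credentials and falls back to the admin
// credentials, which keeps older secrets working.
func pickCredentials(secrets map[string]string) (credentials, error) {
	if id, key := secrets["userID"], secrets["userKey"]; id != "" && key != "" {
		return credentials{ID: id, Key: key}, nil
	}
	// Assumed key names for the admin fallback.
	if id, key := secrets["adminID"], secrets["adminKey"]; id != "" && key != "" {
		return credentials{ID: id, Key: key}, nil
	}
	return credentials{}, errNoCredentials
}
```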
Praveen M
54a8b50957 ci: non-constant format string (govet)
Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-01-08 11:56:24 +00:00
Praveen M
96408c01c8 ci: address return value is not checked (errcheck)
Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-01-08 11:56:24 +00:00
Praveen M
d46029ca1f ci: address arguments have the wrong order (staticcheck)
Signed-off-by: Praveen M <m.praveen@ibm.com>
2025-01-08 11:56:24 +00:00