When running on Docker Swarm, the RBD-healer fails with an error like:
> healer had failures, err failed to get cluster config: unable to load
> in-cluster configuration, KUBERNETES_SERVICE_HOST and
> KUBERNETES_SERVICE_PORT must be defined
Before starting the healer, check if we're running on Kubernetes, so
that non-Kubernetes platforms do not get confusing warnings.
Updates: #3769
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Running cephcsi in docker swarm currently requires serving both
the nodeserver and controllerserver over the same socket.
This leads to errors like
> FATAL: [core] grpc: Server.RegisterService found duplicate
> service registration for "fence.FenceController"
...since `FenceController` is registered once per server type.
This commit proposes a simple fix by registering `FenceController` only
once, when at least one of `IsControllerServer` or `IsNodeServer` is `true`.
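The registration order can be sketched as follows. This is a stdlib-only illustration: `serviceRegistry` is a hypothetical stand-in that mimics the duplicate check of a gRPC server, and the service names `csi.v1.Controller` and `csi.v1.Node` are used only as labels.

```go
package main

import "fmt"

// serviceRegistry mimics the duplicate check of a gRPC server: registering
// the same service name twice returns an error.
type serviceRegistry struct {
	services map[string]bool
}

func newServiceRegistry() *serviceRegistry {
	return &serviceRegistry{services: map[string]bool{}}
}

func (r *serviceRegistry) register(name string) error {
	if r.services[name] {
		return fmt.Errorf("found duplicate service registration for %q", name)
	}
	r.services[name] = true
	return nil
}

// registerServices registers the per-type services, and the shared
// FenceController only once when either server type is enabled.
func registerServices(r *serviceRegistry, isControllerServer, isNodeServer bool) error {
	if isControllerServer {
		if err := r.register("csi.v1.Controller"); err != nil {
			return err
		}
	}
	if isNodeServer {
		if err := r.register("csi.v1.Node"); err != nil {
			return err
		}
	}
	if isControllerServer || isNodeServer {
		return r.register("fence.FenceController")
	}
	return nil
}

func main() {
	r := newServiceRegistry()
	// Both server types on one socket, as in the Docker Swarm setup.
	fmt.Println(registerServices(r, true, true), len(r.services))
}
```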
Signed-off-by: monoamin <precision1998@gmail.com>
We should continue to clean up the volume info, like the
OMAP data and the mappings from the group, if the image is not
part of the group anymore.
Signed-off-by: Nikhil-Ladha <nikhilladha1999@gmail.com>
The `GetID()` and `GetName()` functions can be confusing, as names and
IDs are not always distinctive enough. The name is used to reference an
object that exists in a pool. The ID is formatted as a CSI-handle and can
be used to locate the entry for the object in the journal.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
When a VolumeGroup has been created through the CSI-Addons API, the
VolumeGroupSnapshot CSI API will now use the existing VolumeGroup. There
are checks in place to validate that the Volumes in the VolumeGroup
match the Volumes in the VolumeGroupSnapshot request.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
CompareVolumesInGroup() verifies that all the volumes are part of the
given VolumeGroup. It does so by obtaining the VolumeGroupID for each
volume with GetVolumeGroupByID().
The helper VolumesInSameGroup() verifies that all volumes belong to the
same (or no) VolumeGroup. It can be called by CSI(-Addons) procedures
before acting on a VolumeGroup.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
The Manager.MakeVolumeGroupID() function can be used to build a CSI
VolumeGroupID from the backend (pool and name of the RBD-group). This
will be used when checking if an RBD-image belongs to a group already.
It is also possible to resolve the VolumeGroup by passing the
VolumeGroupID to the existing Manager.GetVolumeGroupByID() function.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
The `prefix` is passed to several functions, but it can easily be
obtained with a small helper function. This makes calling the functions
a little simpler.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
When an incorrect volumeID is passed while creating a VolumeGroup, the
`.Destroy()` function caused a panic. By appending each volume to the
volumes slice, the slice won't contain any `nil` volumes anymore.
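The difference between indexed assignment into a pre-sized slice and `append` can be sketched like this. `lookupVolume()` and the `volume` type are hypothetical stand-ins; the point is only that appending never leaves `nil` entries behind when the loop stops early.

```go
package main

import (
	"errors"
	"fmt"
)

type volume struct{ id string }

// Destroy dereferences the receiver, so calling it on a nil *volume panics.
func (v *volume) Destroy() { fmt.Println("destroying", v.id) }

// lookupVolume is a hypothetical resolver that rejects an empty volumeID.
func lookupVolume(id string) (*volume, error) {
	if id == "" {
		return nil, errors.New("invalid volumeID")
	}
	return &volume{id: id}, nil
}

// resolveVolumes appends each resolved volume, so even on error the slice
// only contains non-nil volumes that can be destroyed safely.
func resolveVolumes(ids []string) ([]*volume, error) {
	volumes := make([]*volume, 0, len(ids))
	for _, id := range ids {
		vol, err := lookupVolume(id)
		if err != nil {
			return volumes, err
		}
		volumes = append(volumes, vol)
	}
	return volumes, nil
}

func main() {
	volumes, err := resolveVolumes([]string{"vol-1", "", "vol-3"})
	if err != nil {
		fmt.Println("error:", err)
	}
	// With make([]*volume, len(ids)) and volumes[i] = vol, the entries from
	// the failing ID onwards would stay nil and this loop would panic.
	for _, v := range volumes {
		v.Destroy()
	}
}
```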
Signed-off-by: Niels de Vos <ndevos@ibm.com>
The VolumeGroup interface has more than 10 methods, which causes
golangci-lint to fail. Move the `Destroy()` method to a base
interface, journalledObject.
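Go interface embedding makes this possible without changing any callers. A sketch, where only two of the remaining VolumeGroup methods are shown (as hypothetical examples) and `rbdGroup` is a hypothetical implementation:

```go
package main

import "fmt"

// journalledObject is the base interface that now holds Destroy().
type journalledObject interface {
	Destroy()
}

// VolumeGroup embeds journalledObject instead of declaring Destroy() itself.
// Only two of its other methods are shown, as hypothetical examples.
type VolumeGroup interface {
	journalledObject
	GetName() string
	GetID() string
}

type rbdGroup struct {
	name, id  string
	destroyed bool
}

func (g *rbdGroup) Destroy()        { g.destroyed = true }
func (g *rbdGroup) GetName() string { return g.name }
func (g *rbdGroup) GetID() string   { return g.id }

// Compile-time check that *rbdGroup still satisfies the full interface.
var _ VolumeGroup = &rbdGroup{}

func main() {
	g := &rbdGroup{name: "group-a", id: "group-a-id"}
	var vg VolumeGroup = g
	vg.Destroy() // Destroy() is reachable through the embedded interface
	fmt.Println(vg.GetName(), g.destroyed)
}
```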
Signed-off-by: Praveen M <m.praveen@ibm.com>
The old mount syntax is almost deprecated, and there are reasons to upgrade:
- the old syntax lacks the fsid (critical for debugging and observability)
- mds_namespace is deprecated, so it might be inappropriate to continue using it
- the kernel tries the new syntax first and then the old one, which is a waste
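For comparison, the old form puts the monitors into the device string (`<mon1>,<mon2>:/<path>`) and selects the file system with `mds_namespace=<fs_name>`, while the new form carries the fsid and file system name in the device string and moves the monitors into the `mon_addr` option. A sketch of building the new form; the helper names are hypothetical and the values in `main()` are examples only.

```go
package main

import (
	"fmt"
	"strings"
)

// newMountSource builds the device string of the new kernel mount syntax,
// which carries the fsid: <user>@<fsid>.<fs_name>=<path>
func newMountSource(user, fsid, fsName, path string) string {
	return fmt.Sprintf("%s@%s.%s=%s", user, fsid, fsName, path)
}

// monAddrOption passes the monitors as a mount option; with the new syntax
// multiple addresses are separated by "/".
func monAddrOption(mons []string) string {
	return "mon_addr=" + strings.Join(mons, "/")
}

func main() {
	mons := []string{"10.0.0.1:6789", "10.0.0.2:6789"}
	src := newMountSource("csi-user", "0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0", "myfs", "/volumes/csi")
	// mount -t ceph <src> <target> -o mon_addr=...
	fmt.Println("mount -t ceph", src, "/mnt -o", monAddrOption(mons))
}
```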
Signed-off-by: mageekchiu <qiukang@mail.ustc.edu.cn>
Currently, Ceph-CSI deletes the intermediate RBD snapshot on
temporary cloned images (`csi-vol-xxxx-temp@csi-vol-xxxx`),
which is the parent of the final clone image.
The parent-child mirroring requires both the parent and child
images to be present (i.e., not in the trash).
This commit enhances the `createRBDClone` function by
introducing a `deleteSnap` parameter. If `deleteSnap` is true,
the snapshot is deleted after the clone is created.
This is required to support mirroring of child image with its
parent image.
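A simplified sketch of where `deleteSnap` changes the flow. The signature is hypothetical (the real function works on volume and snapshot objects); here the RBD operations are only recorded as strings instead of calling librbd.

```go
package main

import "fmt"

// createRBDClone is a hypothetical, simplified sketch: it records the RBD
// operations as strings instead of calling librbd, to show where the new
// deleteSnap parameter changes the flow.
func createRBDClone(tempImage, snapName, cloneName string, deleteSnap bool) []string {
	steps := []string{
		fmt.Sprintf("snap create %s@%s", tempImage, snapName),
		fmt.Sprintf("clone %s@%s -> %s", tempImage, snapName, cloneName),
	}
	if deleteSnap {
		// Previous behaviour: remove the intermediate snapshot right away.
		steps = append(steps, fmt.Sprintf("snap rm %s@%s", tempImage, snapName))
	}
	// With deleteSnap=false the snapshot stays, so the parent of the clone
	// remains available for parent-child mirroring.
	return steps
}

func main() {
	for _, step := range createRBDClone("csi-vol-1-temp", "csi-vol-2", "csi-vol-2", false) {
		fmt.Println(step)
	}
}
```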
Signed-off-by: Praveen M <m.praveen@ibm.com>
Currently, while preparing a volume for snapshot,
depthToAvoidFlatten is set to 2. This accounts for one level
for the snapshot and another since the parent of the volume
is flattened.
This commit changes the depth to 3 to also account for a
future PVC restore, since
- a snapshot alone is useless and it is very likely to be
  restored at some point in time.
- this ensures the snapshot is not flattened when the restore
  does occur.
- flattening the snapshot in the above case would make the
  snapshot no longer eligible for the changed block tracking
  (snap diff) operation.
- this maintains similarity with the PVC-PVC clone operation,
  which currently has depthToAvoidFlatten set to 1.
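To illustrate what reserving depth means, here is a sketch with a hypothetical rule, not the actual Ceph-CSI check: an image is flattened when its clone depth plus the reserved levels would exceed an assumed hard limit. Reserving one more level flattens the source one step earlier, so the later snapshot and its restore fit under the limit.

```go
package main

import "fmt"

// needsFlatten is a hypothetical rule, not the actual Ceph-CSI check: an
// image is flattened when its current clone depth plus the levels reserved
// by depthToAvoidFlatten would exceed the hard clone-depth limit.
func needsFlatten(currentDepth, depthToAvoidFlatten, hardMaxCloneDepth int) bool {
	return currentDepth+depthToAvoidFlatten > hardMaxCloneDepth
}

func main() {
	const hardMax = 8 // assumed limit, for illustration only
	fmt.Println(needsFlatten(6, 2, hardMax)) // false: 6+2 = 8 does not exceed 8
	fmt.Println(needsFlatten(6, 3, hardMax)) // true: 6+3 = 9 exceeds 8
}
```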
Signed-off-by: Rakshith R <rar@redhat.com>
The inverse checking and returning of is-a-mounted-path makes it
difficult to understand the function. It is easier to follow the code
when the function just returns what its name says, hence a comment was
added to the function too.
Some errors were returned directly, while others were converted to gRPC
errors. This has been corrected too: the function now returns plain
errors, and the caller converts them to gRPC errors.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
`IsLikelyNotMountPoint()` is an optimized version of `IsMountPoint()`,
but it cannot detect all types of mounts (anymore). The slower
`IsMountPoint()` is safer to use. This can cause a slight
performance regression when there are many mountpoints on the
system, but correctness is more important than speed while mounting.
Fixes: #4633
Signed-off-by: Niels de Vos <ndevos@ibm.com>
This commit modifies listSnapAndChildren() to make use of
ListChildrenAttributes() instead of ListChildren() which
allows us to filter out images in trash.
This commit also orders the alive images so that temp clone
images come before the images backing volume snapshots, so
that the temp clone images are flattened first.
Signed-off-by: Rakshith R <rar@redhat.com>
According to the error scheme documented in the CSI specification, the
Aborted error code should initiate retries, whereas the Internal
error code does not require this behaviour.
When an RBD-image is still in-use, it can not be removed. The
DeleteVolume procedure should be retried and will succeed once the
RBD-image is not in-use anymore.
Fixes: #5166
Signed-off-by: Niels de Vos <ndevos@ibm.com>
After an unfortunately timed restart of the provisioner, a volume that got
cloned did not get its `rbdVolume.VolID` set. The `.VolID` is used as the
CSI Volume Handle, and is a required attribute.
The `rbdVolume` and `rbdSnapshot` structs have a `.ToCSI()` function
that can do the validation of required attributes. This is now added,
including unit-tests.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
`doSnapshotClone()` returns a new `rbdVolume` object from a temporary
snapshot. This conversion drops the `SourceVolumeID` attribute, as a
`rbdVolume` does not have that.
After converting the `rbdVolume` back to a `rbdSnapshot`, the
`SourceVolumeID` attribute can be set again, and the `ToCSI()` function
can create an appropriate CSI Snapshot struct.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Issue: When deleting a PV fails, the error message shows '*** Directory not empty ***',
while the actual reason for the failure is 'access denied'.
This commit ensures that ceph-csi returns the correct error message.
Signed-off-by: ecosysbin <14729934+ecosysbin@user.noreply.gitee.com>
1. QoS provides settings for RBD volume read/write IOPS
and read/write bandwidth.
2. All QoS parameters are placed in the SC, and are sent
from the SC to Cephcsi through the PVC create request.
3. The following QoS parameters need to be provided in the SC:
- BaseReadIops
- BaseWriteIops
- BaseReadBytesPerSecond
- BaseWriteBytesPerSecond
- ReadIopsPerGB
- WriteIopsPerGB
- ReadBpsPerGB
- WriteBpsPerGB
- BaseVolSizeBytes
There are 4 base QoS parameters among them. When users request
a volume capacity equal to or less than BaseVolSizeBytes, the base
QoS limit is used. For the portion of capacity exceeding
BaseVolSizeBytes, QoS is increased in steps set per GB. If the
per-GB step parameter is not provided, only the base QoS limit is
used, independent of the capacity.
4. If the PVC has a resize request, adjust the QoS limit
according to the QoS parameters after resizing.
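The capacity-based calculation for a single limit (for example read IOPS from BaseReadIops, ReadIopsPerGB and BaseVolSizeBytes) can be sketched as below. Assumptions, not taken from the implementation: "GB" is treated as GiB (2^30 bytes), and only full GiB above BaseVolSizeBytes are counted.

```go
package main

import "fmt"

const gib = int64(1) << 30

// qosLimit computes one limit: up to baseVolSizeBytes only the base limit
// applies; beyond that, perGB is added for every full GiB (assumption).
// A perGB of 0 means the limit does not depend on the capacity.
func qosLimit(base, perGB, baseVolSizeBytes, volSizeBytes int64) int64 {
	if perGB == 0 || volSizeBytes <= baseVolSizeBytes {
		return base
	}
	extraGB := (volSizeBytes - baseVolSizeBytes) / gib
	return base + perGB*extraGB
}

func main() {
	// BaseReadIops=3000, ReadIopsPerGB=10, BaseVolSizeBytes=10 GiB
	fmt.Println(qosLimit(3000, 10, 10*gib, 20*gib)) // 3100
	fmt.Println(qosLimit(3000, 10, 10*gib, 5*gib))  // 3000
}
```

On a resize, the same function would simply be re-evaluated with the new volume size.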
Signed-off-by: Yite Gu <guyite@bytedance.com>
Currently, `GetVolumeGroup()` fetches the RBD group from the
pool using the clusterID & poolID encoded in the VolumeGroupHandle.
However, this approach may fail in a secondary mirrored cluster,
where the clusterID & poolID could differ.
This commit ensures that `GetVolumeGroup` leverages the
clusterIDMapping and RBDPoolIDMapping to locate the RBD group in the
appropriate pool if it is not found in the pool corresponding
to the poolID encoded in the VolumeGroupHandle.
Signed-off-by: Praveen M <m.praveen@ibm.com>
When a `dataPool` is passed while creating a volume, a
`%!s(MISSING)` piece is added to a debug log message. When concatenating
strings, the `%s` formatter is not needed.
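Go's `fmt` package prints `%!s(MISSING)` when a verb has no matching argument. A small reproduction; the message text is hypothetical, and the format is kept in a variable with an empty argument slice spread into it only so that `go vet` does not reject the deliberately broken call.

```go
package main

import "fmt"

// buggyMessage reproduces the problem: dataPool is concatenated onto a
// format string that still contains "%s", but no argument is passed.
func buggyMessage(dataPool string) string {
	format := "creating image with data pool %s " + dataPool
	var noArgs []any
	return fmt.Sprintf(format, noArgs...)
}

// fixedMessage concatenates the strings directly, without any verb.
func fixedMessage(dataPool string) string {
	return "creating image with data pool " + dataPool
}

func main() {
	fmt.Println(buggyMessage("ec-data")) // creating image with data pool %!s(MISSING) ec-data
	fmt.Println(fixedMessage("ec-data")) // creating image with data pool ec-data
}
```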
Updates: #5103
Signed-off-by: Niels de Vos <ndevos@ibm.com>
This commit adds a new controller that watches for
VolumeGroupReplicationContent objects and regenerates the OMAP
data if it doesn't exist.
Signed-off-by: Praveen M <m.praveen@ibm.com>
This commit adds `RegenerateVolumeGroupJournal` to the Manager
interface. RegenerateVolumeGroupJournal regenerates the OMAP
data for the volume group.
It performs the following operations:
- Extracts the clusterID and Mons from the cluster mapping
- Retrieves the pool and journalPool parameters from the VolumeGroupReplicationClass
- Reserves the OMAP data
- Adds the volumeIDs mapping to the reserved volume group OMAP object
- Generates a new volume group handle
It returns the generated volume group handle.
Signed-off-by: Praveen M <m.praveen@ibm.com>
This commit adds a groupUUID parameter to `ReserveName`, to be used for
the OMAP name reservation instead of auto-generating one.
This is useful for mirroring and metro-DR, ensuring that mirrored
resources have consistent OMAP names across mirrored clusters.
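The selection logic can be sketched as below. `reserveNameUUID()` is a hypothetical helper, and the stdlib-only version-4 UUID generator stands in for whatever UUID library the journal actually uses.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUID returns a random (version 4) UUID string.
func newUUID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// reserveNameUUID picks the UUID used for the OMAP name: the given groupUUID
// when set (e.g. the UUID already used on the peer cluster), otherwise a
// freshly generated one.
func reserveNameUUID(groupUUID string) (string, error) {
	if groupUUID != "" {
		return groupUUID, nil
	}
	return newUUID()
}

func main() {
	// Mirrored cluster: reuse the UUID of the group from the peer cluster.
	fmt.Println(reserveNameUUID("6f1c2f4e-0b7a-4c5e-9d3a-2f4b8c1d0e9f"))
	// No groupUUID given: a new UUID is generated.
	fmt.Println(reserveNameUUID(""))
}
```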
Signed-off-by: Praveen M <m.praveen@ibm.com>
The attribute and variable `csiID` is used for at least two different
things:
- name of the driver instance, used for journalling metadata
- objects of the CSIIdentifier struct, composing a volume-handle
By changing the name of the `csiID` attribute for driver instances to
`driverInstance`, any confusion should be prevented.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Instead of passing lazy_itable_init=1 and lazy_journal_init=1 to
mkfs.ext4, pass assume_storage_prezeroed=1, which is
stronger and allows the filesystem to skip inode table zeroing
completely instead of merely doing it lazily.
The support for this flag is checked by trying to format a fake
temporary image with mkfs.ext4 and checking its STDERR.
Closes: #4948
Signed-off-by: Niraj Yadav <niryadav@redhat.com>
Issue: When an RBD image is created in a non-default namespace,
the OMAP data for the PersistentVolume fails to regenerate
because it still attempts to locate the RBD image in the default
namespace.
This commit ensures the correct radosNamespace is retrieved from
the ceph-csi-config.
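A sketch of looking up the radosNamespace for a clusterID. It assumes the ceph-csi-config JSON is a list of entries with `clusterID`, `monitors` and an `rbd.radosNamespace` field; `radosNamespaceFor()` is a hypothetical helper, and an empty result means the default namespace.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// clusterInfo holds the subset of a ceph-csi-config entry used here.
type clusterInfo struct {
	ClusterID string   `json:"clusterID"`
	Monitors  []string `json:"monitors"`
	RBD       struct {
		RadosNamespace string `json:"radosNamespace"`
	} `json:"rbd"`
}

// radosNamespaceFor returns the RBD radosNamespace configured for clusterID
// ("" means the default namespace).
func radosNamespaceFor(configJSON []byte, clusterID string) (string, error) {
	var clusters []clusterInfo
	if err := json.Unmarshal(configJSON, &clusters); err != nil {
		return "", err
	}
	for _, c := range clusters {
		if c.ClusterID == clusterID {
			return c.RBD.RadosNamespace, nil
		}
	}
	return "", errors.New("clusterID not found in ceph-csi-config")
}

func main() {
	config := []byte(`[{"clusterID": "cluster-1", "monitors": ["10.0.0.1:6789"], "rbd": {"radosNamespace": "tenant-a"}}]`)
	ns, err := radosNamespaceFor(config, "cluster-1")
	fmt.Println(ns, err) // tenant-a <nil>
}
```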
Signed-off-by: Praveen M <m.praveen@ibm.com>
Problem: When the encryptionType is not specified in the StorageClass,
the default type (block) is used and stored in OMAP. However, during
OMAP regeneration in a secondary cluster, the default type is incorrectly
set to none. This discrepancy leads to errors during PVC cloning,
with the message: `cannot create encrypted volume from unencrypted volume.`
Solution: Update the default encryption type to consistently use
block instead of none.
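The fix amounts to both code paths sharing one default. A sketch with a hypothetical helper `encryptionTypeFromParams()`, assuming the type is read from an `encryptionType` parameter:

```go
package main

import "fmt"

type encryptionType string

const (
	encryptionTypeBlock encryptionType = "block"
	encryptionTypeFile  encryptionType = "file"
)

// encryptionTypeFromParams is a hypothetical helper used by both
// provisioning and OMAP regeneration, so both fall back to the same
// default (block) when no encryptionType is given.
func encryptionTypeFromParams(params map[string]string) encryptionType {
	if t := params["encryptionType"]; t != "" {
		return encryptionType(t)
	}
	return encryptionTypeBlock
}

func main() {
	fmt.Println(encryptionTypeFromParams(map[string]string{}))                         // block
	fmt.Println(encryptionTypeFromParams(map[string]string{"encryptionType": "file"})) // file
}
```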
Signed-off-by: Praveen M <m.praveen@ibm.com>
This commit adds a check to getVolumeReplicationInfo
to handle a status-not-found error while getting the
remote status.
This allows the failover to proceed even if the remote
site status is not found.
Signed-off-by: yati1998 <ypadia@redhat.com>
CephFS uses the parameter `volumeGroupNamePrefix` for creating VolumeGroups.
This commit renames `groupNamePrefix` to `volumeGroupNamePrefix` for RBD
VolumeGroup creation to ensure consistent naming.
Signed-off-by: Praveen M <m.praveen@ibm.com>
This patch modifies the code to use userID and
userKey for provisioning of both static and dynamic
PVs.
In case user credentials are not found, the admin credentials
are used as a fallback, for backwards compatibility.
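The fallback order can be sketched as below. The `userID`/`userKey` keys come from the text above; the `adminID`/`adminKey` key names and the `credentialsFromSecret()` helper are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

type credentials struct{ ID, Key string }

// credentialsFromSecret prefers userID/userKey and falls back to the admin
// credentials (key names assumed) for backwards compatibility.
func credentialsFromSecret(secrets map[string]string) (credentials, error) {
	if id, key := secrets["userID"], secrets["userKey"]; id != "" && key != "" {
		return credentials{ID: id, Key: key}, nil
	}
	if id, key := secrets["adminID"], secrets["adminKey"]; id != "" && key != "" {
		return credentials{ID: id, Key: key}, nil
	}
	return credentials{}, errors.New("no user or admin credentials found in secret")
}

func main() {
	creds, err := credentialsFromSecret(map[string]string{"adminID": "admin", "adminKey": "admin-key"})
	fmt.Println(creds.ID, err) // admin <nil>
}
```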
Signed-off-by: Niraj Yadav <niryadav@redhat.com>