The Volume interface will make it easier to work with the rbdImage
struct, as the functions are cleaner defined. This benefits work that is
needed for VolumeGroups and other CSI-Addons procedures.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
In the future we'll introduce a more standard interface for objects like
Volumes and Snapshots. It is useful to have the context passed as 1st
argument to all functions of those objects, including their Destroy()
function.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
This commit adds support for flattenMode option
for replication.
If the flattenMode is set to "force" in
volumereplicationclass parameters, cephcsi will
add a task to flatten the image if it has parent.
This enable cephcsi to then mirror such images after
flattening them.
The error message when the image's parent is
in trash or unmirrored is improved as well.
Signed-off-by: Rakshith R <rar@redhat.com>
ensure a clean and isolated environment for testing purposes.
Signed-off-by: Mayank Pal <mayankpal9654@gmail.com>
ci: Use temporary directory for unit tests
remove err = os.Mkdir('/etc/ceph-csi-config', 0o600)
Signed-off-by: Mayank Pal <mayankpal9654@gmail.com>
ci: Use temporary directory for unit tests
remove err = os.Mkdir('/etc/ceph-csi-config', 0o600)
Signed-off-by: Mayank Pal <mayankpal9654@gmail.com>
ci: Use temporary directory for unit tests
remove if err
Signed-off-by: Mayank Pal <mayankpal9654@gmail.com>
golangci-lint reports these:
The copy of the 'for' variable "kmsID" can be deleted (Go 1.22+)
(copyloopvar)
Signed-off-by: Niels de Vos <ndevos@ibm.com>
This commit modifies a test case to check creation of
PVC-PVC clone of a restored PVC when parent snapshot
is deleted.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit adds ParentInTrash parameter in rbdImage struct
and makes use of it in getParent() function in order to avoid
error in case the parent is present but in trash.
Signed-off-by: Rakshith R <rar@redhat.com>
Currently we are assuming that only one
rbd mirror daemon running on the ceph cluster
but that is not true for many cases and it
can be more that one, this PR make this as a
configurable parameter.
fixes: #4312
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit replaces the user implemented function
`CheckSliceContains()` with `slices.Contains()`
function introduced in Go 1.21.
Signed-off-by: Praveen M <m.praveen@ibm.com>
This commit removes the `topologyConstrainedPools` parameter
from PV volumeAttributes as it is not required.
Signed-off-by: Praveen M <m.praveen@ibm.com>
Everytime a connection is copied with the .Copy() function, it needs to
be destroyed once the object is not needed anymore. This was not done
consistently, a few more locations require the freeing of the connection
resources.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
By returning a connected rbdVolume in parseVolCreateRequest(), the
CreateVolume() function can be simplified a little. There is no need to
call the additional Connect() and detect failures with it.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Not all snapshot objects are free'd correctly after they were allocated.
It is possible that some connections to the Ceph cluster were never
closed. This does not need to be a noticeable problem, as connections
are re-used where possible, but it isn't clean either.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Just like GenVolFromVolID() the genSnapFromSnapID() function can return
a snapshot. There is no need to allocated an empty snapshot and pass
that to the genSnapFromSnapID() function.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
if rbd storage class is created with topologyconstraintspools
replicated pool was still mandatory, making the pool optional if the
topologyconstraintspools is requested
Closes: https://github.com/ceph/ceph-csi/issues/4380
Signed-off-by: parth-gr <partharora1010@gmail.com>
The only encoding version that exists is `1`. There is no need to have
multiple constants for that version across different packages. Because
there is only one version, `GenerateVolID()` does not really require it,
and it can use a default version.
If there is a need in the future to support an other encoding version,
this can be revisited with a cleaner solution.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
The `rbdGetDeviceList()` function uses two very similar types for
converting krbd and NBD device information from JSON. There is no need
to use this distinction, and callers of `rbdGetDeviceList()` should not
need to care about it either.
By introducing a `deviceInfo` interface with Get-functions, the
`rbdGetDeviceList()` function becomes a little simpler, with a clearly
defined API for the returned list.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
This commit makes use of crush location labels from node
labels to supply `crush_location` and `read_from_replica=localize`
options during mount. Using these options, cephfs
will be able to redirect reads to the closest OSD,
improving performance.
Signed-off-by: Praveen M <m.praveen@ibm.com>
Implemented the capability to include read affinity options
for individual clusters within the ceph-csi-config ConfigMap.
This allows users to configure the crush location for each
cluster separately. The read affinity options specified in
the ConfigMap will supersede those provided via command line arguments.
Signed-off-by: Praveen M <m.praveen@ibm.com>
This PR updates the snapshot RbdImageName in
`createSnapshot` method. This resolves the
incorrect statement logged during snapshot creation.
Signed-off-by: Praveen M <m.praveen@ibm.com>
This commit updates the snapshot RbdImageName with the clone
RbdImageName before snapshot creation. This will fix the
incorrect log statement.
Signed-off-by: Praveen M <m.praveen@ibm.com>
During the Demote volume store
the image creation timestamp.
During Resync do below operation
* Check image creation timestamp
stored during Demote operation
and current creation timestamp during Resync
and check both are equal and its for
force resync then issue resync
* If the image on both sides is
not in unknown state, check
last_snapshot_timestamp on the
local mirror description, if its present
send volumeReady as false or else return
error message.
If both the images are in up+unknown the
send volumeReady as true.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit makes sure sparsify() is not run when rbd
image is in use.
Running rbd sparsify with workload doing io and too
frequently is not desirable.
When a image is in use fstrim is run and sparsify will
be run only when image is not mapped.
Signed-off-by: Rakshith R <rar@redhat.com>
When a volume has AccessType=Block and is encrypted with LUKS, a resize
of the filesystem on the (decrypted) block-device is attempted. This
should not be done, as the application that requested the Block volume
is the only authoritive reader/writer of the data.
In particular VirtualMachines that use RBD volumes as a disk, usually
have a partition table on the disk, instead of only a single filesystem.
The `resizefs` command will not be able to resize the filesystem on the
block-device, as it is a partition table.
When `resizefs` fails during NodeStageVolume, the volume is unstaged and
an error is returned.
Resizing an encrypted block-device requires `cryptsetup resize` so that
the LUKS header on the RBD-image is updated with the correct size. But
there is no need to call `resizefs` in this case.
Fixes: #3945
Signed-off-by: Niels de Vos <ndevos@ibm.com>
this commit migrates the replication controller server
from internal/rbd and adds it to csi-addons.
Signed-off-by: riya-singhal31 <rsinghal@redhat.com>
this commit removes grpc import from replication.go
and replaced it with usual errors and passed gRPC
responses in csi-addons
Signed-off-by: riya-singhal31 <rsinghal@redhat.com>
Add `mkfsOptions` to the StorageClass and pass them to the `mkfs`
command while creating the filesystem on the RBD device.
Fixes: #374
Signed-off-by: Niels de Vos <ndevos@ibm.com>
Storing the default `mkfs` arguments in a map with key per filesystem
type makes this a little more modular. It prepares th code for fetching
the `mkfs` arguments from the VolumeContext.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
This commit makes use of crush location labels from node
labels to supply `crush_location` and `read_from_replica=localize`
options during rbd map cmd. Using these options, ceph
will be able to redirect reads to the closest OSD,
improving performance.
Signed-off-by: Rakshith R <rar@redhat.com>
The StagingTargetPath is an optional entry in
NodeExpandVolumeRequest, We cannot expect it to be
set always and at the same time cephcsi depended
on the StaingTargetPath to retrieve some metadata
information.
This commit will check all the mount ref and identifies
the stagingTargetPath by checking the image-meta.json
file exists and this is a costly operation as we need to
loop through all the mounts and check image-meta.json
in each mount but this is happens only if the
StaingTargetPath is not set in the NodeExpandVolumeRequest
fixes#3623
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
set disableInUseChecks on rbd volume struct
as it will be used later to check whether
the rbd image is allowed to mount on multiple
nodes.
fixes: #3604
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
this commit remove the protobuf dependency locking in the module
description.
Also, ptypes.TimestampProto is deprecated and this commit
make use of the timestamppb.New() for the construction.
ParseTime() function has been removed and callers adjusted to the
same.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
We need to unset the metadata on the clone
and restore PVC if the parent PVC was created
when setmetadata was set to true and it was
set to false when restore and clone pvc was
created.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit adds code to setup encryption on a rbdVol
being repaired in a followup CreateVolume request.
This is fixes a bug wherein encryption metadata may not
have been set in previous request due to container restart.
Fixes: #3402
Signed-off-by: Rakshith R <rar@redhat.com>
Checking volume details for the existing volumeID
first. if details like OMAP, RBD Image, Pool doesnot
exists try to use clusterIDMapping to look for the
correct informations.
fixes: #2929
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Sometime the json unmarshal might
get success and return empty time
stamp. add a check to make sure the
time is not zero always.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
As per the csiaddon spec last sync time is
required parameter in the GetVolumeReplicationInfo
if we are failed to parse the description, return
not found error message instead of nil
which is empty response
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If a PV is reattached to a new PVC in a different
namespace we need to update the namespace name
in the rados object.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If a PV is reattached to a new PVC in a different
namespace we need to update the namespace name
in the rbd image metadata.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When we do stat on the targetpath, if there is
any error we can check is it due to corruption.
If yes, cephcsi can return abnormal in the
NodeGetVolumeStats so that consumer (CO/admin)
and detect and take further action.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Different places have different meaningful fallback. When parsing
from user we should default to block, when parsing stored config we
should default to invalid and handle that as an error.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
Integrate basic fscrypt functionality into RBD initialization. To
activate file encryption instead of block introduce the new
'encryptionType' storage class key.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
In preparation of fscrypt support for RBD filesystems, rename block
encryption related function to include the word 'block'. Add struct
fields and IsFileEncrypted.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
fscrypt support requires keys longer than 20 bytes. As a preparation,
make the new passphrase length configurable, but default to 20 bytes.
Signed-off-by: Marcel Lauhoff <marcel.lauhoff@suse.com>
The error message return from the GRPC
should be of GRPC error messages only
not the normal go errors. This commits
returns GRPC error if setAllMetadata
fails.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If any operations fails after the volume creation
we will cleanup the omap objects, but it is missing
if setAllMetadata fails. This commits adds the code
to cleanup the rbd image if metadata operation fails.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
CephFS does not have a concept of "free inodes", inodes get allocated
on-demand in the filesystem.
This confuses alerting managers that expect a (high) number of free
inodes, and warnings get produced if the number of free inodes is not
high enough. This causes alerts to always get reported for CephFS.
To prevent the false-positive alerts from happening, the
NodeGetVolumeStats procedure for CephFS (and CephNFS) will not contain
inodes in the reply anymore.
See-also: https://bugzilla.redhat.com/2128263
Signed-off-by: Niels de Vos <ndevos@redhat.com>
To address the problem that snapshot
schedules are triggered for volumes
that are promoted, a dummy image was
disabled/enabled for replication.
This was done as a workaround, because the
promote operation was not triggering
the schedules for the image being promoted.
The bugs related to the same have been fixed in
RBD mirroring functionality and hence the
workaround #2656 can be removed from the code base.
ceph tracker https://tracker.ceph.com/issues/53914
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit gets the description from remote status
instead of local status.
Local status doesn't have ',' due to which we get
array index out of range panic.
Fixes: #3388
Signed-off-by: Yati Padia <ypadia@redhat.com>
Co-authored-by: shyam Ranganathan <srangana@redhat.com>
This commit implements getVolumeReplicationInfo
to get the last sync time and update it in volume
replication CR.
Signed-off-by: yati1998 <ypadia@redhat.com>
If the image is mirroring enabled
and primary consider it for mapping,
if the image is mirroring enabled but
not primary yet. return error message
until the image is marked as primary.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
csi-addons server will advertise replication capability and
replication service will run with csi-addons server too.
Signed-off-by: Rakshith R <rar@redhat.com>
Current code uses an !A && !B condition incorrectly to
test A:Up and B:status for a remote peer image.
This should be !A || !B as we require both conditions to
be in the specified state (Up: true, and status Unknown).
This is corrected by this commit, and further fixes:
- check and return ready only when a remote site is
found in the status output
- check if all peer sites are ready, if multiple are found
and return ready appropriately
Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
During ResyncVolume we check if the image
is in an error state, and we resync.
After resync, the image will move to
either the `Error` or the `Resyncing` state.
And if the image is in the above two
conditions, we will return a successful
response and Ready=false so that the
consumer can wait until the volume is
ready to use. If the image is in any
other state we return an error message
to indicate the syncing is not going on.
The whole resync and image state change
depends on the rbd mirror daemon. If the
mirror daemon is not running, the image
can be in Resyncing or Unknown state.
The Ramen marks the volume replication as
secondary, and once the resync starts, it
will delete the volume replication CR as a
cleanup process.
As we dont have a check for the rbd mirror
daemon, we are returning a resync success
response and Ready=false. Due to this false
response Ramen is assuming the resync started
and deleted the volume replication CR, and
because of this, the cluster goes into a bad
state and needs manual intervention.
fixes#3289
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
IsNotMountPoint() is deprecated and Mounter.IsMountPoint() is
recommended to be used instead.
Reported-by: golangci/staticcheck
Signed-off-by: Niels de Vos <ndevos@redhat.com>
previously, it was a requirement to have attacher sidecar for CSI
drivers and there had an implementation of dummy mode of operation.
However skipAttach implementation has been stabilized and the dummy
mode of operation is going to be removed from the external-attacher.
Considering this driver work on volumeattachment objects for NBD driver
use cases, we have to implement dummy controllerpublish and unpublish
and thus keep supporting our operations even in absence of dummy mode
of operation in the sidecar.
This commit make a NOOP controller publish and unpublish for RBD driver.
CephFS driver does not require attacher and it has already been made free
from the attachment operations.
Ref# https://github.com/ceph/ceph-csi/pull/3149
Ref# https://github.com/kubernetes-csi/external-attacher/issues/226
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
After Failover of workloads to the secondary
cluster when the primary cluster is down,
RBD Image is not marked healthy, and VR
resources are not promoted to the Primary,
In VolumeReplication, the `CURRENT STATE`
remains Unknown and doesn't change to Primary.
This happens because the primary cluster went down,
and we have force promoted the image on the
secondary cluster. and the image stays in
up+stopping_replay or could be any other states.
Currently assumption was that the image will
always be `up+stopped`. But the image will be in
`up+stopped` only for planned failover and it
could be in any other state if its a forced
failover. For this reason, removing
checkHealthyPrimary from the PromoteVolume RPC call.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Recently the k8s.io/mount-utils package added more runtime dectection.
When creating a new Mounter, the detect is run every time. This is
unfortunate, as it logs a message like the following:
```
mount_linux.go:283] Detected umount with safe 'not mounted' behavior
```
This message might be useful, so it probably good to keep it.
In Ceph-CSI there are various locations where Mounter instances are
created. Moving that to the DefaultNodeServer type reduces it to a
single place. Some utility functions need to accept the additional
parameter too, so that has been modified as well.
See-also: kubernetes/kubernetes#109676
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the Ceph user is restricted to a specific namespace in the pool, it is
crucial that evey interaction with the cluster is done within that namespace.
This wasn't the case in `getCloneDepth()`.
This issue was causing snapshot creation to fail with
> Failed to check and update snapshot content: failed to take snapshot of the
> volume X: "rpc error: code = Internal desc = rbd: ret=-1, Operation not
> permitted"
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
There are regular reports that identify a non-error as the cause of
failures. The Kubernetes mount-utils package has detection for systemd
based environments, and if systemd is unavailable, the following error
is logged:
Cannot run systemd-run, assuming non-systemd OS
systemd-run output: System has not been booted with systemd as init
system (PID 1). Can't operate.
Failed to create bus connection: Host is down, failed with: exit status 1
Because of the `failed` and `exit status 1` error message, users might
assume that the mounting failed. This does not need to be the case. The
container-images that the Ceph-CSI projects provides, do not use
systemd, so the error will get logged with each mount attempt.
By using the newer MountSensitiveWithoutSystemd() function from the
mount-utils package where we can, the number of confusing logs get
reduced.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit removes the clone incase
unsetAllMetadata or copyEncryptionConfig or
expand fails for createVolumeFromSnapshot
and CreateSnapshot.
It also removes the clone in case of
any failure in createCloneFromImage.
issue: #3103
Signed-off-by: Yati Padia <ypadia@redhat.com>