For NodeUnstageVolume its a two step process,
first unmount the volume and than unmap the volume.
Currently, we are logging only after rbd unmapping is done.
sometimes it becomes difficult to debug with above logging
whether more time is spent in unmount or unmap.
This commits adds one more debug log after unmount is done.
with this we can identify where exactly more time is spent
by looking at the logs.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It seems that writing more than 1 GiB per WriteSame() operation causes
an EINVAL (22) "Invalid argument" error. Splitting the writes in blocks
of maximum 1 GiB should prevent that from happening.
Not all volumes are of a size that is the multiple of the stripe-size.
WriteSame() needs to write full blocks of data, so in case there is a
small left-over, it will be filled with WriteAt().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Introduce initKMS() as a function of rbdVolume. KMS functionality does
not need to pollute general RBD image functions. Encryption functions
are now in internal/rbd.encryption.go, so move initKMS() there as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When and RBD image is expanded, the additional extents need to get
allocated when the image was thick-provisioned.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When images get resized/expanded, the additional space needs to be
allocated if the image was initially thick-provisioned. By marking the
image with a "thick-provisioned" key in the metadata, future operations
can check the need.
A missing "thick-provisioned" key indicates that the image has not been
thick-provisioned.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Write blocks of stripe-size to allocate RBD images when
Thick-Provisioning is enabled in the StorageClass.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add an option to the StorageClass to support creating fully allocated
(thick provisioned) RBD images
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When a volume was provisioned by an old Ceph-CSI provisioner, the
metadata of the RBD image will contain `requiresEncryption` to indicate
a passphrase needs to be created. New Ceph-CSI provisioners create the
passphrase in the CreateVolume request, and set `encryptionPrepared`
instead.
When a new node-plugin detects that `requiresEncryption` is set in the
RBD image metadata, it will fallback to the old behaviour.
In case `encryptionPrepared` is read from the RBD image metadata, the
passphrase is used to cryptsetup/format the image.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This adds internal/rbd/encryption.go which will be used to include other
encryption functionality to support additional KMS related functions. It
will work together with the shared API from internal/util/kms.go.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Have the provisioner create the passphrase for the volume, instead of
doign it lazily at the time the volume is used for the 1st time. This
prevents potential races where pods on different nodes try to store
different passphrases at the (almost) same time.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It seems that calls to addRbdManagerTask() do not include the
rados-namespace in the image location. Functions calling
addRbdManagerTask() construct the image location themselves, but should
use rbdVolume.String() to include all the attributes.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Make rbdplugin pod work in a non-initial network namespace (i.e. with
"hostNetwork: false") by skipping waiting for udev events when mapping
and unmapping images. CSI use case is very simple: all that is needed
is a device node which is immediately fed to mkfs, so we should be able
to tolerate udev not being finished with the device just fine.
Fixes: #1323
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
With the new support for passing --options, referring to ExecCommand()
argument slices as mapOptions and options is confusing.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
mount packges is moved from
k8s.io/utils/mount to a new repository
k8s.io/mount-utils. updated code to use
the same.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add a new method to the EncryptionKMS interface so that resources can be
freed when EncryptionKMS instances get freed.
With the move to using the libopenstorage API, a temporary file needs to
store the optional CA certificate. The Destroy() method of the
vaultConnection type now removes this file.
The rbdVolume uses the EncryptionKMS type now, so call the new Destroy()
method from withing rbdVolume.Destroy().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The Owner of an RBD image (Kubernetes Namespace, tenant) can be used to
identify additional configuration options. This will be used for
fetching the right Vault Token when encrypting/decrypting volumes.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
If cephcsi encounters any error after
reservation, as a cleanup operation
it should revert back the reservation.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the omap data already exits return nil.
so that omap generator will not try to reserve
anything again.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
earlier if the depth check fails the
complete vol struct was getting logged,
this commits logs only the pool and image
name.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In the case of Disaster Recovery failover, the
user expected to create the static PVC's. We have
planned not to go with the PVC name and namespace
for many reasons (as in kubernetes it's planned to
support PVC transfer to a new namespace with a
different name and with new features coming in
like data populator etc). For now, we are
planning to go with static PVC's to support
async mirroring.
During Async mirroring only the RBD images are
mirrored to the secondary site, and when the
user creates the static PVC's on the failover
we need to regenerate the omap data. The
volumeHandler in PV spec is an encoded string
which contains clusterID and poolID and image UUID,
The clusterID and poolID won't remain same on both
the clusters, for that cephcsi need to generate the
new volume handler and its to create a mapping
between new volume handler and old volume handler
with that whenever cephcsi gets csi requests it
check if the mapping exists it will pull the new
volume handler and continues other operations.
The new controller watches for the PVs created,
It checks if the omap exists if it doesn't it
will regenerate the entire omap data.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
in case of mirrored image, if the image is
primary a watcher will be added by the rbd
mirror deamon on the rbd image.
we have to consider 2 watcher to check image
is in use.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
incase of async mirroring the volume UUID is
retrieved from the volume name, instead of cephcsi
generating a new UUID it should reserve the passed
UUID it will be useful when we support both metro DR
and async mirroring on a kubernetes clusters.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
An rbd image can have a maximum number of
snapshots defined by maxsnapshotsonimage
On the limit is reached the cephcsi will
start flattening the older snapshots and
returns the ABORT error message, The Request
comes after this as to wait till all the
images are flattened (this will increase the
PVC creation time. Instead of waiting till
the maximum snapshots on an RBD image, we can
have a soft limit, once the limit reached
cephcsi will start flattening the task to
break the chain. With this PVC creation time
will only be affected when the hard limit
(minsnapshotsonimage) reached.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
`cap` builtin function returns the capacity of a type. Its not
good practice to use this builtin function for other variable
names, removing it here
Ref# https://golang.org/pkg/builtin/#cap
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
volJournal.Connect() got the error on err2 variable, however
the return was on variable err which hold the error return of
DecomposeCSIID() which is wrong. This cause the error return wrongly
parsed and pushed from the caller. From now on, we are reusing the
err variable to hold and revert the error of volJournal.Connect().
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
All the previous condition checks exit from the function and
when it reach to this block its obvious that error is non nil,
we dont need an extra check here.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
The error check condition in genVolFromID() is always false as far as
code reading/workflow goes. Removing it for the same reason.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
cephcsi uses cli for fetching snap list as well as to check the
snapshot namespace, replaced that with go-ceph calls.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
It doesnot make sense to allow the creation of empty
volumes which is going to be accessed with readonly mode,
this commit allows the creation of volume which is having
readonly capabilities only if the content source is set
for the volume.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Form kubernetes v1.19 onwards NodeRequest is getting volume path
in StagingTargetPath instead of VolumePath, cephcsi should also
use the same.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
updateVolWithImageInfo() is currently executing an rbd command
to get details about an RBD image. Replaced it with the
required go-ceph functions.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
in case of clone failure, we need to first delete
the clone and the snapshot from which we created
the clone, then as part of cleanup we need to remove
the temporary cloned image and the temporary snapshot
created on the parent image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
At CSI spec < 1.2.0, there was no volumecapability in the
expand request. However its available from v1.2+ which allows
us to declare the node operations based on the volume mode.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
if the image is created without flattening image-feature
the image will get few image-features by default, deep-flatten
is one of them. if the image doesnot have any parent
the rbd image flattening will fail, This commit discards
error message if the image doesnt have any parent.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
merging of https://github.com/ceph/ceph-csi/pull/1035
broken the cephcsi building. This commits fixes
the build issue.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Make sure to operate within the namespace if any given
when dealing with rbd images and snapshots and their journals.
Signed-off-by: Mehdy Khoshnoody <mehdy.khoshnoody@gmail.com>
if we are not able to fetch the cluster-ID from
the createSnapshot request and also if we are
not able to get the monitor information from
the cluster-ID return error instead of using
the parent image information.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Replaced command execution with go-ceph Resize() function.
Volsize is being updated before waiting for resize() to return,
fixed it to get updated only after resize() is successful.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
While adding the context.Context to the resizeRBDimage() function, it
became a little ugly. So renaming the function to resize() and making it
a method of the rbdVolume type.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Most consumers of util.ExecCommand() need to convert the returned []byte
format of stdout and/or stderr to string. By having util.ExecCommand()
return strings instead, the code gets a little simpler.
A few commands return JSON that needs to be parsed. These commands will
be replaced by go-ceph implementations later on. For now, convert the
strings back to []byte when needed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
All calls to util.ExecCommand() now pass the context.Context. In some
cases this is not possible or needed, and util.ExecCommand() will not
log the command.
This should make debugging easier when command executions fail.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Current versions of the mkfs.xfs binary enable reflink support by
default. This causes problems on systems where the kernel does not
support this feature. When the kernel the feature does not support, but
the filesystem has it enabled, the following error is logged in `dmesg`:
XFS: Superblock has unknown read-only compatible features (0x4) enabled
Introduce a check to see if mkfs.xfs supports the `-m reflink=` option.
In case it does, pass `-m reflink=0` while creating the filesystem.
The check is executed once during the first XFS filesystem creation. The
result of the check is cached until the nodeserver restarts.
Fixes: #966
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The new rbdVolume.isInUse() method will replace the rbdStatus()
function. This removes one more rbd command execution in the
DeleteVolume path.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This change replaces the sentinel errors in rbd module with
standard errors created with errors.New().
Related: #1203
Signed-off-by: Sven Anderson <sven@redhat.com>
The sentinel error code had additional fields in the errors, that are
used nowhere. This leads to unneccesarily complicated code. This
change replaces the sentinel errors in utils with standard errors
created with errors.New() and adds a simple JoinErrors() function to
be able to combine sentinel errors from different code tiers.
Related: #1203
Signed-off-by: Sven Anderson <sven@redhat.com>
Take operation locks on the resources before operating
on the resouces. This allows us to do parallel operations
for some RPC calls such as Clone and Restore of PVC.
This operations will only be blocked if the image is
expanding or Snapshot and RBD image is getting deleted.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Direct usage of numbers should be avoided.
Issue reported:
mnd: Magic number: X, in <argument> detected (gomnd)
Signed-off-by: Yug <yuggupta27@gmail.com>
Add explanation to nolint directives.
Issue reported:
whyNoLint: include an explanation for nolint directive (gocritic)
Signed-off-by: Yug <yuggupta27@gmail.com>
If the requested volume size and the snapshot or the
parent volume from which the clone is to be created
is not equal cephcsi returns an error message.
updates #1188
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>