For NodeUnstageVolume its a two step process,
first unmount the volume and than unmap the volume.
Currently, we are logging only after rbd unmapping is done.
sometimes it becomes difficult to debug with above logging
whether more time is spent in unmount or unmap.
This commits adds one more debug log after unmount is done.
with this we can identify where exactly more time is spent
by looking at the logs.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It seems that writing more than 1 GiB per WriteSame() operation causes
an EINVAL (22) "Invalid argument" error. Splitting the writes in blocks
of maximum 1 GiB should prevent that from happening.
Not all volumes are of a size that is the multiple of the stripe-size.
WriteSame() needs to write full blocks of data, so in case there is a
small left-over, it will be filled with WriteAt().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Introduce initKMS() as a function of rbdVolume. KMS functionality does
not need to pollute general RBD image functions. Encryption functions
are now in internal/rbd.encryption.go, so move initKMS() there as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When and RBD image is expanded, the additional extents need to get
allocated when the image was thick-provisioned.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When images get resized/expanded, the additional space needs to be
allocated if the image was initially thick-provisioned. By marking the
image with a "thick-provisioned" key in the metadata, future operations
can check the need.
A missing "thick-provisioned" key indicates that the image has not been
thick-provisioned.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Write blocks of stripe-size to allocate RBD images when
Thick-Provisioning is enabled in the StorageClass.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add an option to the StorageClass to support creating fully allocated
(thick provisioned) RBD images
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When a volume was provisioned by an old Ceph-CSI provisioner, the
metadata of the RBD image will contain `requiresEncryption` to indicate
a passphrase needs to be created. New Ceph-CSI provisioners create the
passphrase in the CreateVolume request, and set `encryptionPrepared`
instead.
When a new node-plugin detects that `requiresEncryption` is set in the
RBD image metadata, it will fallback to the old behaviour.
In case `encryptionPrepared` is read from the RBD image metadata, the
passphrase is used to cryptsetup/format the image.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This adds internal/rbd/encryption.go which will be used to include other
encryption functionality to support additional KMS related functions. It
will work together with the shared API from internal/util/kms.go.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Have the provisioner create the passphrase for the volume, instead of
doign it lazily at the time the volume is used for the 1st time. This
prevents potential races where pods on different nodes try to store
different passphrases at the (almost) same time.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It seems that calls to addRbdManagerTask() do not include the
rados-namespace in the image location. Functions calling
addRbdManagerTask() construct the image location themselves, but should
use rbdVolume.String() to include all the attributes.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Make rbdplugin pod work in a non-initial network namespace (i.e. with
"hostNetwork: false") by skipping waiting for udev events when mapping
and unmapping images. CSI use case is very simple: all that is needed
is a device node which is immediately fed to mkfs, so we should be able
to tolerate udev not being finished with the device just fine.
Fixes: #1323
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
With the new support for passing --options, referring to ExecCommand()
argument slices as mapOptions and options is confusing.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
mount packges is moved from
k8s.io/utils/mount to a new repository
k8s.io/mount-utils. updated code to use
the same.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add a new method to the EncryptionKMS interface so that resources can be
freed when EncryptionKMS instances get freed.
With the move to using the libopenstorage API, a temporary file needs to
store the optional CA certificate. The Destroy() method of the
vaultConnection type now removes this file.
The rbdVolume uses the EncryptionKMS type now, so call the new Destroy()
method from withing rbdVolume.Destroy().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The Owner of an RBD image (Kubernetes Namespace, tenant) can be used to
identify additional configuration options. This will be used for
fetching the right Vault Token when encrypting/decrypting volumes.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
If cephcsi encounters any error after
reservation, as a cleanup operation
it should revert back the reservation.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the omap data already exits return nil.
so that omap generator will not try to reserve
anything again.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
earlier if the depth check fails the
complete vol struct was getting logged,
this commits logs only the pool and image
name.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In the case of Disaster Recovery failover, the
user expected to create the static PVC's. We have
planned not to go with the PVC name and namespace
for many reasons (as in kubernetes it's planned to
support PVC transfer to a new namespace with a
different name and with new features coming in
like data populator etc). For now, we are
planning to go with static PVC's to support
async mirroring.
During Async mirroring only the RBD images are
mirrored to the secondary site, and when the
user creates the static PVC's on the failover
we need to regenerate the omap data. The
volumeHandler in PV spec is an encoded string
which contains clusterID and poolID and image UUID,
The clusterID and poolID won't remain same on both
the clusters, for that cephcsi need to generate the
new volume handler and its to create a mapping
between new volume handler and old volume handler
with that whenever cephcsi gets csi requests it
check if the mapping exists it will pull the new
volume handler and continues other operations.
The new controller watches for the PVs created,
It checks if the omap exists if it doesn't it
will regenerate the entire omap data.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
in case of mirrored image, if the image is
primary a watcher will be added by the rbd
mirror deamon on the rbd image.
we have to consider 2 watcher to check image
is in use.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
incase of async mirroring the volume UUID is
retrieved from the volume name, instead of cephcsi
generating a new UUID it should reserve the passed
UUID it will be useful when we support both metro DR
and async mirroring on a kubernetes clusters.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
An rbd image can have a maximum number of
snapshots defined by maxsnapshotsonimage
On the limit is reached the cephcsi will
start flattening the older snapshots and
returns the ABORT error message, The Request
comes after this as to wait till all the
images are flattened (this will increase the
PVC creation time. Instead of waiting till
the maximum snapshots on an RBD image, we can
have a soft limit, once the limit reached
cephcsi will start flattening the task to
break the chain. With this PVC creation time
will only be affected when the hard limit
(minsnapshotsonimage) reached.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>