Commit Graph

168 Commits

Author SHA1 Message Date
Niels de Vos
fe0f169875 rbd: write max 1gb per WriteSame() operation
It seems that writing more than 1 GiB per WriteSame() operation causes
an EINVAL (22) "Invalid argument" error. Splitting the writes in blocks
of maximum 1 GiB should prevent that from happening.

Not all volumes are of a size that is the multiple of the stripe-size.
WriteSame() needs to write full blocks of data, so in case there is a
small left-over, it will be filled with WriteAt().

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-03-11 10:57:31 +00:00
Niels de Vos
165a837bca rbd: move KMS initialization into rbdVol.initKMS()
Introduce initKMS() as a function of rbdVolume. KMS functionality does
not need to pollute general RBD image functions. Encryption functions
are now in internal/rbd.encryption.go, so move initKMS() there as well.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-24 13:16:11 +00:00
Niels de Vos
cf6dae86e9 rbd: move encryptDevice() to a method of rbdVolume
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-24 13:16:11 +00:00
Niels de Vos
fb065b0f39 rbd: move openEncryptedDevice() to a method of rbdVolume
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-24 13:16:11 +00:00
Niels de Vos
b5020657e6 rbd: add "--options notrim" when mapping a thick-provisioned image
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-19 11:55:40 +00:00
Niels de Vos
cc96bdaac3 rbd: allocate extents when expanding an image
When and RBD image is expanded, the additional extents need to get
allocated when the image was thick-provisioned.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-19 11:55:40 +00:00
Niels de Vos
294a0973bd rbd: mark images thick-provisioned in metadata
When images get resized/expanded, the additional space needs to be
allocated if the image was initially thick-provisioned. By marking the
image with a "thick-provisioned" key in the metadata, future operations
can check the need.

A missing "thick-provisioned" key indicates that the image has not been
thick-provisioned.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-19 11:55:40 +00:00
Niels de Vos
74d218df8d rbd: disable rbd_discard_on_zeroed_write_same for thick-allocation
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-19 11:55:40 +00:00
Niels de Vos
5522a05f59 rbd: thick-provision images on request
Write blocks of stripe-size to allocate RBD images when
Thick-Provisioning is enabled in the StorageClass.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-19 11:55:40 +00:00
Madhu Rajanna
c417a5d0ba rbd: add support for thick provisioning option
Add an option to the StorageClass to support creating fully allocated
(thick provisioned) RBD images

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-19 11:55:40 +00:00
Niels de Vos
4937e59c4d rbd: add backwards compatible encryption in NodeStageVolume
When a volume was provisioned by an old Ceph-CSI provisioner, the
metadata of the RBD image will contain `requiresEncryption` to indicate
a passphrase needs to be created. New Ceph-CSI provisioners create the
passphrase in the CreateVolume request, and set `encryptionPrepared`
instead.

When a new node-plugin detects that `requiresEncryption` is set in the
RBD image metadata, it will fallback to the old behaviour.

In case `encryptionPrepared` is read from the RBD image metadata, the
passphrase is used to cryptsetup/format the image.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-17 17:51:13 +00:00
Niels de Vos
ee79b22c97 rbd: move encryption function to encryption.go
This adds internal/rbd/encryption.go which will be used to include other
encryption functionality to support additional KMS related functions. It
will work together with the shared API from internal/util/kms.go.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-17 17:51:13 +00:00
Niels de Vos
9b6c2117f3 rbd: set encryption passphrase on CreateVolume
Have the provisioner create the passphrase for the volume, instead of
doign it lazily at the time the volume is used for the 1st time. This
prevents potential races where pods on different nodes try to store
different passphrases at the (almost) same time.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-17 17:51:13 +00:00
Niels de Vos
d534ee9ce8 rbd: include rados-namespace when calling addRbdManagerTask()
It seems that calls to addRbdManagerTask() do not include the
rados-namespace in the image location. Functions calling
addRbdManagerTask() construct the image location themselves, but should
use rbdVolume.String() to include all the attributes.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-12 12:02:14 +00:00
Niels de Vos
8d0b39e690 rbd: log error when scheduling flattening fails
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-02-12 12:02:14 +00:00
Niels de Vos
0b7521162c cleanup: rewrite ifElseChains to switch statements
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-01-27 13:03:56 +00:00
Ilya Dryomov
04644c1d58 rbd: enable mapping and unmapping from a network namespace
Make rbdplugin pod work in a non-initial network namespace (i.e. with
"hostNetwork: false") by skipping waiting for udev events when mapping
and unmapping images.  CSI use case is very simple: all that is needed
is a device node which is immediately fed to mkfs, so we should be able
to tolerate udev not being finished with the device just fine.

Fixes: #1323
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-01-07 15:34:05 +00:00
Ilya Dryomov
c2493686b7 rbd: introduce appendDeviceTypeAndOptions()
Factor out --device-type and --options formatting.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-01-07 15:34:05 +00:00
Ilya Dryomov
d3f31187fc rbd: rename ndbType parameter
Fix "ndb" typo.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-01-07 15:34:05 +00:00
Ilya Dryomov
5631b83dd0 rbd: rename mapOptions and options argument slices
With the new support for passing --options, referring to ExecCommand()
argument slices as mapOptions and options is confusing.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2021-01-07 15:34:05 +00:00
Seena Fallah
fdec9f65b8 rbd: fix namespace json parser for xbdDeviceInfo
rbd device list --format=json returns namespace as a namespace not radosNamespace

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2021-01-05 11:26:09 +00:00
Madhu Rajanna
9c7176dbb4 rbd: update mount packges in import path
mount packges is moved from
k8s.io/utils/mount to a new repository
k8s.io/mount-utils. updated code to use
the same.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-12-17 16:04:54 +00:00
Niels de Vos
8f91c672d4 util: add EncryptionKMS.Destroy()
Add a new method to the EncryptionKMS interface so that resources can be
freed when EncryptionKMS instances get freed.

With the move to using the libopenstorage API, a temporary file needs to
store the optional CA certificate. The Destroy() method of the
vaultConnection type now removes this file.

The rbdVolume uses the EncryptionKMS type now, so call the new Destroy()
method from withing rbdVolume.Destroy().

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-14 14:45:09 +00:00
Niels de Vos
f08182e2fc rbd: pass Owner to GetKMS()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-14 14:45:09 +00:00
Niels de Vos
16cb43f0f9 rbd: store csi.storage.k8s.io/pvc/namespace metadata as Owner
The Owner of an RBD image (Kubernetes Namespace, tenant) can be used to
identify additional configuration options. This will be used for
fetching the right Vault Token when encrypting/decrypting volumes.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 13:58:48 +00:00
Niels de Vos
f8ebc6aa3f cleanup: return error type in ensureEncryptionMetadataSet()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
d8e443ab49 cleanup: return error type in cleanupRBDImageMetadataStash()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
f262673b60 cleanup: return error type in lookupRBDImageMetadataStash()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
8e589587ae cleanup: return error type in stashRBDImageMetadata()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
57ce07f54e cleanup: return error type in updateVolWithImageInfo()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
610162b5f4 cleanup: return error type in genVolFromVolumeOptions()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
729e2419ef cleanup: return error type in detachRBDImageOrDeviceSpec()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
7eae69f10c cleanup: return error type in rbdGetDeviceList()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
4dde3fc9e0 cleanup: return error type in encryptDevice()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
d6fb8f302d cleanup: return error type in NodeServer.processEncryptedDevice()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
8019e4d1bc rbd: return CSI status-error on resize failure
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
65a10fd553 cleanup: standardize error format in NodeServer.NodeStageVolume()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Niels de Vos
cc3f146ad1 cleanup: return error type in rbdVolume.checkCloneImage()
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-12-09 08:35:35 +00:00
Madhu Rajanna
43fde0a30a cleanup: add a helper function storeImageID
added a helper function storeImageID to reduce
code duplication.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-12-07 11:03:27 +00:00
Madhu Rajanna
c40872df00 rbd: undo reservation incase of errors
If cephcsi encounters any error after
reservation, as a cleanup operation
it should revert back the reservation.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-12-07 11:03:27 +00:00
Madhu Rajanna
99dbe27921 rbd: return nil if the omap data exists
If the omap data already exits return nil.
so that omap generator will not try to reserve
anything again.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-12-07 11:03:27 +00:00
Madhu Rajanna
8ebb9a1ba0 cleanup: fix misspell words
fixed misspell words detected by  codespell

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-30 08:46:48 +01:00
Madhu Rajanna
39b1f2b4d3 cleanup: fix mispell words
fixed mispell words in the repo.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-29 12:47:46 +05:30
Madhu Rajanna
6091490393 rbd: improve logging in getCloneDepth
earlier if the depth check fails the
complete vol struct was getting logged,
this commits logs only the pool and image
name.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-28 18:50:00 +00:00
Madhu Rajanna
b3120926b9 rbd: remove extra Destory of parent volume
removed extra Destory of the parent volume.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-28 18:50:00 +00:00
Madhu Rajanna
68bd44beba rbd: add new controller to regenerate omap data
In the case of Disaster Recovery failover, the
user expected to create the static PVC's. We have
planned not to go with the PVC name and namespace
for many reasons (as in kubernetes it's planned to
support PVC transfer to a new namespace with a
different name and with new features coming in
like data populator etc). For now, we are
planning to go with static PVC's to support
async mirroring.

During Async mirroring only the RBD images are
mirrored to the secondary site, and when the
user creates the static PVC's on the failover
we need to regenerate the omap data. The
volumeHandler in PV spec is an encoded string
which contains clusterID and poolID and image UUID,
The clusterID and poolID won't remain same on both
the clusters, for that cephcsi need to generate the
new volume handler and its to create a mapping
between new volume handler and old volume handler
with that whenever cephcsi gets csi requests it
check if the mapping exists it will pull the new
volume handler and continues other operations.

The new controller watches for the PVs created,
It checks if the omap exists if it doesn't it
will regenerate the entire omap data.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-28 18:50:00 +00:00
Madhu Rajanna
14700b89d1 rbd: update inuse logic of a rbd image
in case of mirrored image, if the image is
primary a watcher will be added by the rbd
mirror deamon on the rbd image.
we have to consider 2 watcher to check image
is in use.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-28 18:50:00 +00:00
Madhu Rajanna
ba84f14241 journal: create object with provided UUID
incase of async mirroring the volume UUID is
retrieved from the volume name, instead of cephcsi
generating a new UUID it should reserve the passed
UUID it will be useful when we support both metro DR
and async mirroring on a kubernetes clusters.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-28 18:50:00 +00:00
Madhu Rajanna
8d3a44d0c4 rbd: add minsnapshotsonimage flag
An rbd image can have a maximum number of
snapshots defined by maxsnapshotsonimage
On the limit is reached the cephcsi will
start flattening the older snapshots and
returns the ABORT error message, The Request
comes after this as to wait till all the
images are flattened (this will increase the
PVC creation time.  Instead of waiting till
the maximum snapshots on an RBD image, we can
have a soft limit, once the limit reached
cephcsi will start flattening the task to
break the chain. With this PVC  creation time
will only be affected when the hard limit
(minsnapshotsonimage) reached.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-18 05:59:20 +00:00
Mudit Agarwal
0ecfd0e72c rbd: replace go-ceph GetParentInfo() with GetParent()
GetParent() is a newer and better version of
GetParentInfo() in go-ceph.

Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
2020-11-03 08:00:12 +00:00