Before RBD map operation, we do check the
watchers on the RBD image. In the case of
RWO volume. cephcsi makes sure only one
client is using the RBD image. If the rbd
image is mirrored, by default mirroring
daemon will add a watcher on the image
and as we are using go-ceph a watcher will
be added as we have opened the image So
we will have two watchers on an image if
mirroring is enabled. This holds when the
rbd mirror daemon is running, In case if
the mirror daemon is not running there will
be only one watcher on the rbd image
(which is placed by go-ceph image open)
we should not block the map operation if
the mirroring daemon is not running as
its Async mirroring. This commit adds a
check to make sure no more than 2 watchers
if the image is mirrored or no more than 1
watcher if it is not mirrored image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Since rbdImage is a common struct for
rbdVolume and rbdSnapshot, it description
was matching to only snapshot.
This commit makes the comments generic for
both volumes and snapshots.
Signed-off-by: Yug <yuggupta27@gmail.com>
By default, the write buffer size in libfuse2 is 2KiB
`fuse_big_writes = true` option is used to override this limit.
This commit makes `fuse_big_writes = true` option as default
in ceph.conf.
Closes: #1928
Signed-off-by: Rakshith R <rar@redhat.com>
If the pool or few keys are missing in the omap.
GetImageAttributes function returns nil error message and few
empty items in imageAttributes struct. if the image is not
found and the entiries are missing use
the volumeId present on the PV annotation for further operations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
incase if the image is promoted and demoted the
image state will be set to up+unknown if the image
on the remote cluster is still in demoted state.
when user changes the state from primary to secondary
and still the image is in demoted (secondary) state
in the remote cluster. the image state on both the cluster
will be on unknown state.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It helps to get a stack trace when debugging issues. Certain things are
considered bugs in the code (like missing attributes in a struct), and
might cause a panic in certain occasions.
In this case, a missing string will not panic, but the behaviour will
also not be correct (DEKs getting encrypted, but unable to decrypt).
Clearly logging this as a BUG is probably better than calling panic().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It is possible that when a provisioner restarts after a snapshot was
cloned, but before the newly restored image had its encryption metadata
set, the new image is not marked as encrypted. This will prevent
attaching/mounting the image, as the encryption key will not be fetched,
or is not available in the DEKStore.
By actively repairing the encryption configuration when needed, this
problem should be addressed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
buildCreateVolumeResponse() exists exactly for the need to create a
csi.CreateVolumeResponse based on an rbdVolume. Calling this helper
reduces the code duplication in CreateVolume().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The rbdVolume that needs its encryption configured is constructed in the
Exists() method. It is suitable to move the copyEncryptionConfig() call
there as well, so that the object is completely constructed in a single
place.
Golang-ci:gocyclo complained about the increased complexity of the
Exists() function. Moving the repairing of the ImageID into its own
helper function makes the code a little easier to understand.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Introduce helper function cloneFromSnapshot() that takes care of the
procedures that are needed when an existing snapshot has been found.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When a source volume is encrypted, the passphrase needs to be copied and
stored for the newly cloned volume.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Cloning volumes requires copying the DEK from the source to the newly
cloned volume. Introduce copyEncryptionConfig() as a helper for that.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The new StoreCryptoPassphrase() method makes it possible to store an
unencrypted passphrase newly encrypted in the DEKStore.
Cloning volumes will use this, as the passphrase from the original
volume will need to get copied as part of the metadata for the volume.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Without this, the rbdVolume can not connect to the Ceph cluster and
configure the (optional) encryption.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The ControllerServer should not need to care about support for
encryption, ideally it is transparantly handled by the rbdVolume type
and its internal API.
Deleting the DEK was one of the last remainders that was explicitly done
inside the ControllerServer.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the KMS configuration can not be found, it is useful to know what
configurations are available. This aids troubleshooting when typos in
the KMS ID are made.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Currently default ControllerGetCapabilities function is being
used which throws 'runtime error: invalid memory address or
nil pointer dereference' when `--controllerServer=true` is not
set in provisioner deployment args.
This commit adds a check to prevent it.
Fixes: 1925
Signed-off-by: Rakshith R <rar@redhat.com>
nolint directive needs to be followed by comma separated
list of linters. This commit changes to gocognit:gocyclo
which was not recognised to linters which show error for
the function.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit appends stderr to error in both kernel and
ceph-fuse mounter functions to better be able to debug
errors.
Signed-off-by: Rakshith R <rar@redhat.com>
there can be a change we can reconcile same
PV parallelly we can endup in generating and
deleting multiple omap keys. to be on safer
side taking lock to process one volumeHandle
at a time.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In the case of the Async DR, the volumeID will
not be the same if the clusterID or the PoolID
is different, With Earlier implementation, it
is expected that the new volumeID mapping is
stored in the rados omap pool. In the case of the
ControllerExpand or the DeleteVolume Request,
the only volumeID will be sent it's not possible
to find the corresponding poolID in the new cluster.
With This Change, it works as below
The csi-rbdplugin-controller will watch for the PV
objects, when there are any PV objects created it
will check the omap already exists, If the omap doesn't
exist it will generate the new volumeID and it checks for
the volumeID mapping entry in the PV annotation, if the
mapping does not exist, it will add the new entry
to the PV annotation.
The cephcsi will check for the PV annotations if the
omap does not exist if the mapping exists in the PV
annotation, it will use the new volumeID for further
operations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Refactored deeply nested if statement in vault_tokens.go to
reduce cognitive complexity by adding fetchTenantConfig function.
Signed-off-by: Rakshith R <rar@redhat.com>
It seems that newer versions of some tools/libraries identify encrypted
filesystems with `crypto_LUKS` instead of `crypt`.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case of the DR the image on the primary site cannot be
demoted as the cluster is down, during failover the image need
to be force promoted. RBD returns `Device or resource busy`
error message if the image cannot be promoted for above reason.
Return FailedPrecondition so that replication operator can send
request to force promote the image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
instead of fetching the force option from the
parameters. Use the Force field available in
the PromoteVolume Request.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as the org github.com/kube-storage is renamed
to github.com/csi-addons as the name kube-storage
was more generic.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The new Amazon Metadata KMS provider uses a CMK stored in AWS KMS to
encrypt/decrypt the DEK which is stored in the volume metadata.
Updates: #1921
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Amazon KMS expects a Secret with sensitive account and key information
in the Kubernetes Namespace where the Ceph-CSI Pods are running. It will
fetch the contents of the Secret itself.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
These functions can now be re-used easier. The Amazon KMS needs to know
the Namespace of the Pod for reading a Secret with more key/values.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Incase of resync the image will get deleted, gets
recreated and its a a time consuming operation.
It makes sense to return aborted error instead
of not found as we have omap data only the image
is missing in rbd pool.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Do resync if the image is in unknow or in error
state.
Check for the current image state for up+stopped
or up+replaying and also all peer site status
should be un up+stopped to confirm that resyncing
is done and image can be promoted and used.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added replication related operations as a method
of rbdImage as these methods can be easily used
when we introduce volumesnaphot mirroring operations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
the rbd mirror state can be in enabled,disabled
or disabling state. If the mirroring is not disabled
yet and still in disabling state. we need to
check for it and return abort error message if
the mirroring is still getting disabled.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added ReplicationServer struct for the replication related
operation it also embed the ControllerServer which
already implements the helper functions like locking/unlocking etc.
removed getVolumeFromID and cleanup functions for better
code readability and easy maintaince.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
After translating options from the ConfigMap into the common Vault
parameters, the generated configuration is not used. Instead, the
untranslated version of the configuration is passed on to the
vaultConnection initialization function, which then can detects missing
options.
By passing the right configuration to the initializatino function,
things work as intended.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When using .BinaryData, the contents of the configuration is not parsed
correctly. Whereas the parsing works fine whet .Data is used.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit fixes bug in unmount function which caused
unmountVolume to fail when targetPath was already unmounted.
Signed-off-by: Rakshith R <rar@redhat.com>
There is no need for each EncryptionKMS to implement the same GetID()
function. We have a VolumeEncryption type that is more suitable for
keeping track of the KMS-ID that was used to get the configuration of
the KMS.
This does not change any metadata that is stored anywhere.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
GetKMS() is the public API that initilizes the KMS providers on demand.
Each provider identifies itself with a KMS-Type, and adds its own
initialization function to a switch/case construct. This is not well
maintainable.
The new GetKMS() can be used the same way, but uses the new kmsManager
interface to create and configure the KMS provider instances.
All existing KMS providers are converted to use the new kmsManager
plugins API.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The KMSProvider struct is a simple, extendable type that can be used to
register KMS providers with an internal kmsManager.
Helper functions for creating and configuring KMS providers will also be
located in the new kms.go file. This makes things more modular and
better maintainable.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Current rbd plugin only supports the layering feature
for rbd image. Add exclusive-lock and journaling image
features for the rbd.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: woohhan <woohyung_han@tmax.co.kr>
Failed to delete voluesnapshot when backend subvolume
(pvc) and ceph fs subvolume snapshot is deleted
Fixes#1647
Signed-off-by: Yati Padia <ypadia@redhat.com>
rbdVolumes can have several resources that get allocated during its
usage. Only destroying the IOContext may not be suffiecient and can
cause resource leaks.
Use rbdVolume.Destroy() when the rbdVolume is not used anymore.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Connections are reference counted, so just assigning the connection to
an other object for re-use is not correct. This can cause connections to
be garbage collected while something else is still using it.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
as RBD is implementing the replication
we are registering it. For CephFS, its
not implementing the replication we are
passing nil so we dont want to register
it.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Because rbdVolume and rbdSnapshot are very similar, they can be based
off a common struct rbdImage that contains the common attributes and
functions.
This makes it possible to re-use functions for snapshots, and prevents
further duplication or code.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The rbdSnapshot and rbdVolume structs have many common attributes. In
order to combine these into an rbdImage struct that implements shared
functionality, having the same attribute for the ID makes things much
easier.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This new KMS is based on the (default) SecretsKMS, but instead of using
the passphrase for all volumes, the passphrase is used to
encrypt/decrypt a Data-Encryption-Key that is stored in the metadata of
the volume.
CC: Patrick Uiterwijk <puiterwijk@redhat.com> - for encryption guidance
Signed-off-by: Niels de Vos <ndevos@redhat.com>
By adding these methods, a KMS can explicitly encrypt/decrypt the DEK if
there is no transparent way of doing so.
Hashicorp Vault encrypts the DEK when it it stored, and decrypts it when
fetched. Therefor there is no need to do any encryption in this case.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
NewVolumeEncryption() will return an indication that an alternative
DEKStore needs to be configured in case the KMS does not support it.
setKMS() will also set the DEKStore if needed, so renaming it to
configureEncryption() makes things clearer.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Use DEKStore API for Fetching and Storing passphrases.
Drop the fallback for the old KMS interface that is now provided as
DEKStore. The original implementation has been re-used for the DEKStore
interface.
This also moves GetCryptoPassphrase/StoreNewCryptoPassphrase functions
to methods of VolumeEncryption.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
DEKStore is a new interface that will be used for Storing and Fetching
DEKs. The existing implementations for KMS already function as a
DEKStore, and will be updated to match the interface.
By splitting KMS and DEKStore into two components, the encryption
configuration for volumes becomes more modular. This makes it possible
to implement a DEKStore where the encrypted DEK for a volume is stored
in the metadata of the volume (RBD image).
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Prepare for grouping encryption related functions together. The main
rbdVolume object should not be cluttered with KMS or DEK procedures.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Prepared for an enhanced API to communicate with a KMS and keep the DEK
storage separate. The crypto.go file is already mixed with different
functions, so moving the KMS part into its own file, just like we have
for Hashicorp Vault KMS's.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
For NodeUnstageVolume its a two step process,
first unmount the volume and than unmap the volume.
Currently, we are logging only after rbd unmapping is done.
sometimes it becomes difficult to debug with above logging
whether more time is spent in unmount or unmap.
This commits adds one more debug log after unmount is done.
with this we can identify where exactly more time is spent
by looking at the logs.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It seems that writing more than 1 GiB per WriteSame() operation causes
an EINVAL (22) "Invalid argument" error. Splitting the writes in blocks
of maximum 1 GiB should prevent that from happening.
Not all volumes are of a size that is the multiple of the stripe-size.
WriteSame() needs to write full blocks of data, so in case there is a
small left-over, it will be filled with WriteAt().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Introduce initKMS() as a function of rbdVolume. KMS functionality does
not need to pollute general RBD image functions. Encryption functions
are now in internal/rbd.encryption.go, so move initKMS() there as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
when user provides an option for VolumeNamePrefix
create subvolume with the prefix which will be easy
for user to identify the subvolumes belongs to
the storageclass.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When and RBD image is expanded, the additional extents need to get
allocated when the image was thick-provisioned.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When images get resized/expanded, the additional space needs to be
allocated if the image was initially thick-provisioned. By marking the
image with a "thick-provisioned" key in the metadata, future operations
can check the need.
A missing "thick-provisioned" key indicates that the image has not been
thick-provisioned.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Write blocks of stripe-size to allocate RBD images when
Thick-Provisioning is enabled in the StorageClass.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add an option to the StorageClass to support creating fully allocated
(thick provisioned) RBD images
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When a volume was provisioned by an old Ceph-CSI provisioner, the
metadata of the RBD image will contain `requiresEncryption` to indicate
a passphrase needs to be created. New Ceph-CSI provisioners create the
passphrase in the CreateVolume request, and set `encryptionPrepared`
instead.
When a new node-plugin detects that `requiresEncryption` is set in the
RBD image metadata, it will fallback to the old behaviour.
In case `encryptionPrepared` is read from the RBD image metadata, the
passphrase is used to cryptsetup/format the image.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This adds internal/rbd/encryption.go which will be used to include other
encryption functionality to support additional KMS related functions. It
will work together with the shared API from internal/util/kms.go.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Storing a passphrase is now done while the volume is created. There is
no need to (re)generate a passphrase when it can not be found.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Have the provisioner create the passphrase for the volume, instead of
doign it lazily at the time the volume is used for the 1st time. This
prevents potential races where pods on different nodes try to store
different passphrases at the (almost) same time.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
"VAULT_SKIP_VERIFY" is a standard Hashicorp Vault environment variable
(a string) that needs to get converted to the "vaultCAVerify"
configuration option in the Ceph-CSI format.
The value of "VAULT_SKIP_VERIFY" means the reverse of "vaultCAVerify",
this part was missing in the original conversion too.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It seems that calls to addRbdManagerTask() do not include the
rados-namespace in the image location. Functions calling
addRbdManagerTask() construct the image location themselves, but should
use rbdVolume.String() to include all the attributes.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There are many usecases with adding the subvolume
path to the PV object. the volume context returned
in the createVolumeResponse is added to the PV object
by the external provisioner.
More Details about the usecases are in below link
https://github.com/rook/rook/issues/5471
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>