DEKStore is a new interface that will be used for Storing and Fetching
DEKs. The existing implementations for KMS already function as a
DEKStore, and will be updated to match the interface.
By splitting KMS and DEKStore into two components, the encryption
configuration for volumes becomes more modular. This makes it possible
to implement a DEKStore where the encrypted DEK for a volume is stored
in the metadata of the volume (RBD image).
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Prepare for grouping encryption related functions together. The main
rbdVolume object should not be cluttered with KMS or DEK procedures.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Prepared for an enhanced API to communicate with a KMS and keep the DEK
storage separate. The crypto.go file is already mixed with different
functions, so moving the KMS part into its own file, just like we have
for Hashicorp Vault KMS's.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
For NodeUnstageVolume its a two step process,
first unmount the volume and than unmap the volume.
Currently, we are logging only after rbd unmapping is done.
sometimes it becomes difficult to debug with above logging
whether more time is spent in unmount or unmap.
This commits adds one more debug log after unmount is done.
with this we can identify where exactly more time is spent
by looking at the logs.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It seems that writing more than 1 GiB per WriteSame() operation causes
an EINVAL (22) "Invalid argument" error. Splitting the writes in blocks
of maximum 1 GiB should prevent that from happening.
Not all volumes are of a size that is the multiple of the stripe-size.
WriteSame() needs to write full blocks of data, so in case there is a
small left-over, it will be filled with WriteAt().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Introduce initKMS() as a function of rbdVolume. KMS functionality does
not need to pollute general RBD image functions. Encryption functions
are now in internal/rbd.encryption.go, so move initKMS() there as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
when user provides an option for VolumeNamePrefix
create subvolume with the prefix which will be easy
for user to identify the subvolumes belongs to
the storageclass.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When and RBD image is expanded, the additional extents need to get
allocated when the image was thick-provisioned.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When images get resized/expanded, the additional space needs to be
allocated if the image was initially thick-provisioned. By marking the
image with a "thick-provisioned" key in the metadata, future operations
can check the need.
A missing "thick-provisioned" key indicates that the image has not been
thick-provisioned.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Write blocks of stripe-size to allocate RBD images when
Thick-Provisioning is enabled in the StorageClass.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add an option to the StorageClass to support creating fully allocated
(thick provisioned) RBD images
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When a volume was provisioned by an old Ceph-CSI provisioner, the
metadata of the RBD image will contain `requiresEncryption` to indicate
a passphrase needs to be created. New Ceph-CSI provisioners create the
passphrase in the CreateVolume request, and set `encryptionPrepared`
instead.
When a new node-plugin detects that `requiresEncryption` is set in the
RBD image metadata, it will fallback to the old behaviour.
In case `encryptionPrepared` is read from the RBD image metadata, the
passphrase is used to cryptsetup/format the image.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This adds internal/rbd/encryption.go which will be used to include other
encryption functionality to support additional KMS related functions. It
will work together with the shared API from internal/util/kms.go.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Storing a passphrase is now done while the volume is created. There is
no need to (re)generate a passphrase when it can not be found.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Have the provisioner create the passphrase for the volume, instead of
doign it lazily at the time the volume is used for the 1st time. This
prevents potential races where pods on different nodes try to store
different passphrases at the (almost) same time.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
"VAULT_SKIP_VERIFY" is a standard Hashicorp Vault environment variable
(a string) that needs to get converted to the "vaultCAVerify"
configuration option in the Ceph-CSI format.
The value of "VAULT_SKIP_VERIFY" means the reverse of "vaultCAVerify",
this part was missing in the original conversion too.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It seems that calls to addRbdManagerTask() do not include the
rados-namespace in the image location. Functions calling
addRbdManagerTask() construct the image location themselves, but should
use rbdVolume.String() to include all the attributes.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There are many usecases with adding the subvolume
path to the PV object. the volume context returned
in the createVolumeResponse is added to the PV object
by the external provisioner.
More Details about the usecases are in below link
https://github.com/rook/rook/issues/5471
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
the tenant/namespace is needed to read the certificates,
this commit sets the tenant in kms object.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
currently, the keys for kms certificates/keys in a
secret is ca.cert, tls.cert and
tls.key, this commit changes the key from ca.cert
and tls.cert to cert and tls.key to key.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
if are reading the kms data from the file.
than only we need to unmarshal. If we are reading
from the configmap it already returns the unmarshal
data.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The configuration option `EnvVaultInsecure` is expected to be a string,
not a boolean. By converting the bool back to a string (after
verification), it is now possible to skip the certificate validation
check by setting `vaultCAVerify: false` in the Vault configuration.
Fixes: #1852
Reported-by: Bryon Nevis <bryon.nevis@intel.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the KMS VaultTokens is configured through a Kubernetens ConfigMap,
the `VAULT_SKIP_VERIFY` option was not taken into account. The option
maps to the `vaultCAVerify` value in the configuration file, but has the
reverse meaning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This can happen when the subvolume is in snapshot-retained state.
We should not return error for such case as it is a valid situation.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
Currently cephcsi is returning an error
if the ENV variable is set, but it should not.
This commit fixes the the POD_NAMESPACE env
variable issue and as well as the KMS_CONFIG_NAME
ENV variable.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Make rbdplugin pod work in a non-initial network namespace (i.e. with
"hostNetwork: false") by skipping waiting for udev events when mapping
and unmapping images. CSI use case is very simple: all that is needed
is a device node which is immediately fed to mkfs, so we should be able
to tolerate udev not being finished with the device just fine.
Fixes: #1323
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
With the new support for passing --options, referring to ExecCommand()
argument slices as mapOptions and options is confusing.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Currently, we are using bool pointer to find out
the ceph cluster supports resize or not. This
commit replaces the bool pointer with enum.
Signed-off-by: Yati Padia <ypadia@redhat.com>
Fixes#1764
mount packges is moved from
k8s.io/utils/mount to a new repository
k8s.io/mount-utils. updated code to use
the same.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
if the kms encryption configmap is not mounted
as a volume to the CSI pods, add the code to
read the configuration from the kubernetes. Later
the code to fetch the configmap will be moved to
the new sidecar which is will talk to respective
CO to fetch the encryption configurations.
The k8s configmap uses the standard vault spefic
names to add the configurations. this will be converted
back to the CSI configurations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
A tenant can place a ConfigMap in their Kubernetes Namespace with
configuration options that differ from the global (by the Storage Admin
set) values.
The ConfigMap needs to be located in the Tenants namespace, as described
in the documentation
See-also: docs/design/proposals/encryption-with-vault-tokens.md
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Added a option to pass the client certificate
and the client certificate key for the vault token
based encryption.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Tenants (Kubernetes Namespaces) can use their own Vault Token to manage
the encryption keys for PVCs. The working is documented in #1743.
See-also: #1743Closes: #1500
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add a new method to the EncryptionKMS interface so that resources can be
freed when EncryptionKMS instances get freed.
With the move to using the libopenstorage API, a temporary file needs to
store the optional CA certificate. The Destroy() method of the
vaultConnection type now removes this file.
The rbdVolume uses the EncryptionKMS type now, so call the new Destroy()
method from withing rbdVolume.Destroy().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Make it possible to calle initConnection() multiple times. This enables
the VaultTokensKMS type to override global settings with options from a
per-tenant configuration.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This makes it possible to pass a more complex configuration to the
initialize functions for KMS's. The upcoming VaultTokensKMS can use
overrides for configiration options on a per tenant basis. Without this
change, it would not be possible to consume the JSON configuration file.
See-also: #1743
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The Owner of an RBD image (Kubernetes Namespace, tenant) can be used to
identify additional configuration options. This will be used for
fetching the right Vault Token when encrypting/decrypting volumes.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
if the user has created a static PV for a RBD
image which is not created by CSI driver, dont
generate the OMAP data.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If cephcsi encounters any error after
reservation, as a cleanup operation
it should revert back the reservation.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the omap data already exits return nil.
so that omap generator will not try to reserve
anything again.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
clusterAdditionalInfo map is holding a localClusterState
for checking ceph cluster supports resize and subvolumegroup
is created or not, currently we are checking if the key is present
in a map and localClusterStatelocalClusterState.resizeSupported
is set to false to call ceph fs subvolume resize to check command is
supported or not, if a structure is initialized all its members
are set to default value. so we will never going to check the
ceph fs subvolume resize command is supported in backend or not, we are
always using ceph fs subvolume create to resize subvolume. in some
ceph version ceph fs subvolume create wont work to resize a subvolume.
This commit changes the resizeSupported from bool to *bool for
proper handling of this scenario.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
go ceph returns NotImplementedError for invalid
commands,cephcsi is using errors.As to find out
the error.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
go ceph returns NotImplementedError for invalid
commands,cephcsi is using errors.As to find out
the error.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In order to re-use the configuration of Vault, split a new
vaultConnection type from the VaultKMS type.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
getCloneInfo() does not need to return a full CloneStatus struct that
only has one member. Instead, it can just return the value of the single
member, so the JSON type/struct does not need to be exposed.
This makes the API for getCloneInfo() a little simpler, so it can be
replaced by a go-ceph implementation later on.
As the function does not return any of the unused attributes anymore, it
is renamed to getCloneStatu() as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Instead of the hand-rolled Vault usage, use the libopenstorage/secrets
package that provides a nice API. The support for Vault becomes much
simpler and maintainable that way.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
earlier if the depth check fails the
complete vol struct was getting logged,
this commits logs only the pool and image
name.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In the case of Disaster Recovery failover, the
user expected to create the static PVC's. We have
planned not to go with the PVC name and namespace
for many reasons (as in kubernetes it's planned to
support PVC transfer to a new namespace with a
different name and with new features coming in
like data populator etc). For now, we are
planning to go with static PVC's to support
async mirroring.
During Async mirroring only the RBD images are
mirrored to the secondary site, and when the
user creates the static PVC's on the failover
we need to regenerate the omap data. The
volumeHandler in PV spec is an encoded string
which contains clusterID and poolID and image UUID,
The clusterID and poolID won't remain same on both
the clusters, for that cephcsi need to generate the
new volume handler and its to create a mapping
between new volume handler and old volume handler
with that whenever cephcsi gets csi requests it
check if the mapping exists it will pull the new
volume handler and continues other operations.
The new controller watches for the PVs created,
It checks if the omap exists if it doesn't it
will regenerate the entire omap data.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
in case of mirrored image, if the image is
primary a watcher will be added by the rbd
mirror deamon on the rbd image.
we have to consider 2 watcher to check image
is in use.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
incase of async mirroring the volume UUID is
retrieved from the volume name, instead of cephcsi
generating a new UUID it should reserve the passed
UUID it will be useful when we support both metro DR
and async mirroring on a kubernetes clusters.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
An rbd image can have a maximum number of
snapshots defined by maxsnapshotsonimage
On the limit is reached the cephcsi will
start flattening the older snapshots and
returns the ABORT error message, The Request
comes after this as to wait till all the
images are flattened (this will increase the
PVC creation time. Instead of waiting till
the maximum snapshots on an RBD image, we can
have a soft limit, once the limit reached
cephcsi will start flattening the task to
break the chain. With this PVC creation time
will only be affected when the hard limit
(minsnapshotsonimage) reached.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The function isCloneRetryError verifies
if the clone error is `pending` or
`in-progress` error.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
In certain cases, clone status can be 'pending'.
In that case, abort error message should be
returned similar to that during 'in-progress'
state.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
There is a type-check on BytesQuota after calling SubVolumeInfo() to see
if the value is supported. In case no quota is configured, the value
Infinite is returned. This can not be converted to an int64, so the
original code returned an error.
It seems that attaching/mounting sometimes fails with the following
error:
FailedMount: MountVolume.MountDevice failed for volume "pvc-0e8fdd18-873b-4420-bd27-fa6c02a49496" : rpc error: code = Internal desc = subvolume csi-vol-0d68d71a-1f5f-11eb-96d2-0242ac110012 has unsupported quota: infinite
By ignoring the quota of Infinite, and not setting a quota in the
Subvolume object, this problem should not happen again.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The implementation of getOMapValues assumed that the number of key-value
pairs assigned to the object would be close to the number of keys
being requested. When the number of keys on the object exceeded the
"listExcess" value the function would fail to read additional keys
even if they existed in the omap.
This change sets a large fixed "chunk size" value and keeps reading
key-value pairs as long as the callback gets called and increments
the numKeys counter.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
When using go-ceph and the volumeOptions.Connect() call, the credentials
are not needed once the connection is established.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reduce the number of calls to the `ceph fs` executable to improve
performance of CephFS volume resizing.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This prepares resizeVolume() so that the volumeOptions.conn can be used
for connecting with go-ceph and use the connection cache.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
We have below exported function in credentials.go which is not
called from anywhere in the repo. Removing it for the same reason.
```
// NewCredentials generates new credentials when id and key
// are provided.
func NewCredentials(id, key string) (*Credentials, error) {
...
```
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
`cap` builtin function returns the capacity of a type. Its not
good practice to use this builtin function for other variable
names, removing it here
Ref# https://golang.org/pkg/builtin/#cap
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
golang-ci suddenly complains about the following issue
internal/cephfs/util.go:41:1: directive `// nolint:unparam // todo:program values has to be revisited later` is unused for linter unparam (nolintlint)
// nolint:unparam // todo:program values has to be revisited later
^
Dropping the comment completely seems to fix it. Ideally
execCommandJSON() will get removed once the migration to go-ceph is
complete.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
volJournal.Connect() got the error on err2 variable, however
the return was on variable err which hold the error return of
DecomposeCSIID() which is wrong. This cause the error return wrongly
parsed and pushed from the caller. From now on, we are reusing the
err variable to hold and revert the error of volJournal.Connect().
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Previously the purgeVolume error was ignored due to wrong error variable
check in the createVolume. With this change it checks on the proper error.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
The allocated, and potentially connected, volumeOptions object in
newVolumeOptionsFromVolID() is not cleaned-up in case of errors. This
could cause resource leaks.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Without connection, follow-up oparations on the volumeOptions object
will cause a panic. This should fix a regression in CephFS testing.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
All the previous condition checks exit from the function and
when it reach to this block its obvious that error is non nil,
we dont need an extra check here.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
There is no need to pass all secrets on to newVolumeOptions(), it only
needs the credentials. As the caller of newVolumeOptions() already has
the credentials generated, just pass them along instead of the raw
secrets.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The credentials are not used anymore, the volume object is already
connected to the cluster when createVolume() is called.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The go-ceph CephFS Admin API needs a rados.Conn to connect the Admin
interface to the Ceph cluster. As the rados.Conn is internal to the
ClusterConnection type, add GetFSAdmin() to create a new FSAdmin
instance for a specific connection.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add the ClusterConnection to the volumeOptions type, so that future use
of go-ceph can connect to the Ceph cluster.
Once a volumeOptions object is not needed anymore, it needs to get
destroyed to free associated resources like the ClusterConnection.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The error check condition in genVolFromID() is always false as far as
code reading/workflow goes. Removing it for the same reason.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
cephcsi uses cli for fetching snap list as well as to check the
snapshot namespace, replaced that with go-ceph calls.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
return a proper error message to the user when
the subvolume has the snapshots and it cannot
be removed until the snapshots on the subvolume
have to be deleted.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When passing
fuseMountOptions: debug
in the StorageClass, the mount options passed on the ceph-fuse
commandline result in "-o nonempty ,debug". The additional space before
the ",debug" causes the mount command to fail.
Fixes: 1485
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The subvolume features consists the list of features, which if includes
"snapshot-autoprotect", result in query based approach to
detecting the need for protect/unprotect.
If "snapshot-autoprotect" feature is present, The UnprotectSnapshot
call should be treated as a no-op
Signed-off-by: Yug <yuggupta27@gmail.com>
The subvolume features consists the list of features, which if includes
"snapshot-autoprotect", result in query based approach to
detecting the need for protect/unprotect.
If "snapshot-autoprotect" feature is present, The ProtectSnapshot
call should be treated as a no-op
Signed-off-by: Yug <yuggupta27@gmail.com>
Use subvolume info to fetch the subvolume path.
If `subvolume info` command is not available,
use `getpath` command instead.
Signed-off-by: Yug <yuggupta27@gmail.com>
Snapshots can be retained even after subvolume deletion in
Ceph 14.2.12. Adding support for the same in ceph-csi.
Signed-off-by: Yug <yuggupta27@gmail.com>
It doesnot make sense to allow the creation of empty
volumes with readonly access, this commit allows the
creation of volume which is having readonly capabilities
only if the content source is set for the volume.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It doesnot make sense to allow the creation of empty
volumes which is going to be accessed with readonly mode,
this commit allows the creation of volume which is having
readonly capabilities only if the content source is set
for the volume.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This spec add the extra capability to node and controller
volume to report volume condition of a pv..etc.
Refer # https://github.com/ceph/ceph-csi/issues/1356
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Form kubernetes v1.19 onwards NodeRequest is getting volume path
in StagingTargetPath instead of VolumePath, cephcsi should also
use the same.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
updateVolWithImageInfo() is currently executing an rbd command
to get details about an RBD image. Replaced it with the
required go-ceph functions.
Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
in case of clone failure, we need to first delete
the clone and the snapshot from which we created
the clone, then as part of cleanup we need to remove
the temporary cloned image and the temporary snapshot
created on the parent image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
we should not return the CLI errors in GRPC errors
we need to return proper readable error messages
to the user for better understanding and better
debugging.
updates #1242
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
At CSI spec < 1.2.0, there was no volumecapability in the
expand request. However its available from v1.2+ which allows
us to declare the node operations based on the volume mode.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
currently the lock is not released which is
taken on the request name. this is causing issues
when the subvolume is requested for delete.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
execCommandErr returns both error and stderror
message. checking strings.HasPrefix is not helpful
as the stderr will be the first string. its good
to do string comparison and find out that error
is volume not found error.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
rename FatalLog to FatalLogMsg to keep functions
in parity functions ends with Msg if its only message
logging.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as we have 2 functions for logging. one for logging
with message and another one is for logging with
context. renamed ErrorLog to ErrorLogMsg to log
with messages.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
if the image is created without flattening image-feature
the image will get few image-features by default, deep-flatten
is one of them. if the image doesnot have any parent
the rbd image flattening will fail, This commit discards
error message if the image doesnt have any parent.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
merging of https://github.com/ceph/ceph-csi/pull/1035
broken the cephcsi building. This commits fixes
the build issue.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Make sure to operate within the namespace if any given
when dealing with rbd images and snapshots and their journals.
Signed-off-by: Mehdy Khoshnoody <mehdy.khoshnoody@gmail.com>