If the image is in a secondary state and its
up+replaying means its an healthy secondary
and the image is primary somewhere in the remote cluster
and the local image is getting replayed. Delete the
OMAP data generated as we cannot delete the
secondary image. When the image on the primary
cluster gets deleted/mirroring disabled, the image on
all the remote (secondary) clusters will get
auto-deleted. This helps in garbage collecting
the OMAP, PVC and PV objects after failback operation.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the image is in secondary state and its
up+replaying means its an healthy secondary
and the image is primary somewhere in the remote
cluster and the local image is getting replayed.
Return success for the Disabling mirroring as
we cannot disable the mirroring on the secondary
state, when the image on the remote site gets
disabled the image on all the remote (secondary)
will get auto deleted. This helps in garbage
collecting the volume replication kuberentes
artifacts
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit adds functionality of extracting encryption kmsID,
owner from volumeAttributes in RegenerateJournal() and adds utility
functions ParseEncryptionOpts and FetchEncryptionKMSID.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit refractors RegenerateJournal() to take in
volumeAttributes map[string]string as argument so it
can extract required attributes internally.
Signed-off-by: Rakshith R <rar@redhat.com>
For clusterMappingConfigFile using different
file name so that multiple unit test cases can
work without any data race.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
consider the empty mirroring mode when
validating the snapshot interval and
the scheduling time.
Even if the mirroring Mode is not set
validate the snapshot scheduling details
as cephcsi sets the mirroring mode to default
snapshot.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit fixes snapshot id idempotency issue by
always returning an error when flattening is in progress
and not using `readyToUse:false` response.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit fixes a bug in checkCloneImage() which was caused
by checking cloned image before checking on temp-clone image snap
in a subsequent request which lead to stale images. This was solved
by checking temp-clone image snap and flattening temp-clone if
needed.
This commit also fixes comparison bug in flattenCloneImage().
Signed-off-by: Rakshith R <rar@redhat.com>
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
rbd flatten functions is a CLI call and it expects
the creds as the input and copying of creds is
required when we generate the temp clone image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Volume generated from snap using genrateVolFromSnap
already copies volume ID correctly, therefore removing
`vol.VolID = rbdVol.VolID` which wrongly copies parent
Volume ID instead leading to error from copyEncryption()
on parent and clone volume ID being equal.
Signed-off-by: Rakshith R <rar@redhat.com>
Golang-ci complains about the following:
internal/util/vault_tokens.go:99:20: string `true` has 4 occurrences, but such constant `vaultDefaultDestroyKeys` already exists (goconst)
v.VaultCAVerify = "true"
^
This occurence of "true" can be replaced by vaultDefaultCAVerify so
address the warning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Hashicorp Vault does not completely remove the secrets in a kv-v2
backend when the keys are deleted. The metadata of the keys will be
kept, and it is possible to recover the contents of the keys afterwards.
With the new `vaultDestroyKeys` configuration parameter, this behaviour
can now be selected. By default the parameter will be set to `true`,
indicating that the keys and contents should completely be destroyed.
Setting it to any other value will make it possible to recover the
deleted keys.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Whenever Ceph-CSI receives a CSI/Replication
request it will first decode the
volumeHandle and try to get the required
OMAP details if it is not able to
retrieve, receives a `Not Found` error
message and Ceph-CSI will check for the
clusterID mapping. If the old volumeID
`0001-00013-site1-storage-0000000000000001
-b0285c97-a0ce-11eb-8c66-0242ac110002`
contains the `site1-storage` as the clusterID,
now Ceph-CSI will look for the corresponding
clusterID `site2-storage` from the above configmap.
If the clusterID mapping is found now Ceph-CSI
will look for the poolID mapping ie mapping between
`1` and `2`. Example:- pool with name exists on
both the clusters with different ID's Replicapool
with ID `1` on site1 and Replicapool with ID `2`
on site2. After getting the required mapping Ceph-CSI
has the required information to get more details
from the rados OMAP. If we have multiple clusterID mapping
it will loop through all the mapping and checks the
corresponding pool to get the OMAP data. If the clusterID
mapping does not exist Ceph-CSI will return an `Not Found`
error message to the caller.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added helper function to read the clusterID mapping
from the mounted file.
The clusterID mapping contains below mappings
* ClusterID mappings (to cluster to which we are failingover
and from which cluster failover happened)
* RBD PoolID mapping of between the clusters.
* CephFS FscID mapping between the clusters.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The VAULT_AUTH_MOUNT_PATH is a Vault configuration parameter that allows
a user to set a non default path for the Kubernetes ServiceAccount
integration. This can already be configured for the Vault KMS, and is
now added to the Vault Tenant SA KMS as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The new `vaultAuthNamespace` configuration parameter can be set to the
Vault Namespace where the authentication is setup in the service. Some
Hashicorp Vault deployments use sub-namespaces for their users/tenants,
with a 'root' namespace where the authentication is configured. This
requires passing of different Vault namespaces for different operations.
Example:
- the Kubernetes Auth mechanism is configured for in the Vault
Namespace called 'devops'
- a user/tenant has a sub-namespace called 'devops/website' where the
encryption passphrases can be placed in the key-value store
The configuration for this, then looks like:
vaultAuthNamespace: devops
vaultNamespace: devops/homepage
Note that Vault Namespaces are a feature of the Hashicorp Vault
Enterprise product, and not part of the Open Source version. This
prevents adding e2e tests that validate the Vault Namespace
configuration.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
- mount host's /etc/selinux in node plugins
- process mount options in all code paths for cephfs volume options
Signed-off-by: Alexandre Lossent <alexandre.lossent@cern.ch>
This commit uses `string.SplitN` instead of `string.Split`.
The path for pids.max has extra `:` symbols in it due to which
getCgroupPidsFile() splits the string into 5 tokens instead of
3 leading to loss of part of the path.
As a result, the below error is reported:
`Failed to get the PID limit, can not reconfigure: open
/sys/fs/cgroup/pids/system.slice/containerd.service/
kubepods-besteffort-pod183b9d14_aed1_4b66_a696_da0c738bc012.slice/pids.max:
no such file or directory`
SplitN takes an argument n and splits the string
accordingly which helps us to get the desired
file path.
Fixes: #2337
Co-authored-by: Yati Padia <ypadia@redhat.com>
Signed-off-by: Yati Padia <ypadia@redhat.com>
Currently we have a bug that we are not using rados
namespace when adding ceph manager command to
remove the image from the trash. This commit
adds the missing rados namespace when adding
ceph manager task.
without fix the image will be moved to trash
and no task will be added to remove from the
trash. it will become ceph responsibility to
remove the image from trash when it will cleanup
the trash.
workaroud: manually purge the trash
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
RBD image metadata keys that start with '.rbd' are expected to be
internal to RBD itself and are not mirrored to remote sites. Renaming
the keys (dropping the '.' prefix) and using the new MigrateMetadata()
function now makes the keys available on remote sites too.
Closes: #2219
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The new MigrateMetadata() function can be used to get the metadata of an
image with a deprecated and new key. Renaming metadata keys can be done
easily this way.
A default value will be set in the image metadata when it is missing
completely. But if the deprecated key was set, the data is stored under
the new key and the deprecated key is removed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Currently, getImageMirroringStatus() is using RBD CLI.
This commit converts RBD CLI to go-ceph API.
Fixes: #2120
Signed-off-by: Yati Padia <ypadia@redhat.com>
Previously in ControllerExpandVolume() we had a check for encrypted
volumes and we use to fail for all expand requests on an encrypted
volume. Also for Block VolumeMode PVCs NodeExpandVolume used to be
ignored/skipped.
With these changes, we add support for the expansion of encrypted volumes.
Also for raw Block VolumeMode PVCs with Encryption we call NodeExpandVolume.
That said,
With LUKS1, cryptsetup utility doesn't prompt for a passphrase on resizing
the crypto mapper device. This is because LUKS1 devices don't use kernel
keyring for volume keys.
Whereas, LUKS2 devices use kernel keyring for volume key by default, i.e.
cryptsetup utility asks for a passphrase if it detects volume key was
previously passed to dm-crypt via kernel keyring service, we are overriding
the default by --disable-keyring option during cryptsetup open command.
So that at the time of crypto mapper device resize we will not be
prompted for any passphrase.
Fixes: #1469
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
With Luks1 device:
$ cryptsetup status /dev/mapper/crypto-rbd0
/dev/mapper/crypto-rbd0 is active and is in use.
type: LUKS1
cipher: aes-xts-plain64
keysize: 512 bits
key location: dm-crypt
device: /dev/rbd0
sector size: 512
offset: 4096 sectors
size: 4190208 sectors
mode: read/write
With Luks2 device:
$ cryptsetup status /dev/mapper/crypto-rbd0
/dev/mapper/crypto-rbd0 is active and is in use.
type: LUKS2
cipher: aes-xts-plain64
keysize: 512 bits
key location: dm-crypt
device: /dev/rbd0
sector size: 512
offset: 32768 sectors
size: 4161536 sectors
mode: read/write
This could lead to failures with unmap in the NodeUnstageVolume path
for the encrypted volumes.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
This commit modifies the error of godot, cyclop,
paralleltest linter caused due to merged PRs.
Updates: #1586
Signed-off-by: Yati Padia <ypadia@redhat.com>
At present while acquiring the deleteLock on the volume, we check
for ongoing clone and snapshot creation operations on the same.
Considering snapshot and clone controllers does not allow parent
volume deletion on subjected operations, we can be free from this
extra check.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
It seems that the version of the key/value engine can not always be
detected for Hashicorp Vault. In certain cases, it is required to
configure the `VAULT_BACKEND` (or `vaultBackend`) option so that a
successful connection to the service can be made.
The `kv-v2` is the current default for development deployments of
Hashicorp Vault (what we use for automated testing). Production
deployments default to version 1 for now.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
nlreturn linter requires a new line before return
and branch statements except when the return is alone
inside a statement group (such as an if statement) to
increase code clarity. This commit addresses such issues.
Updates: #1586
Signed-off-by: Rakshith R <rar@redhat.com>
This commit resolves errorlint issues
which checks for the code that will cause
problems with the error wrapping scheme.
Updates: #1586
Signed-off-by: Yati Padia <ypadia@redhat.com>
revive linter checks for var-declaration
format.
For example:
"e2e/rbd_helper.go:441:36: var-declaration:
should drop = nil from declaration of
var noPVCValidation; it is the zero value (revive)
var noPVCValidation validateFunc = nil"
Updates: #1586
Signed-off-by: Yati Padia <ypadia@redhat.com>
snapshot controller make sure the pvc which is the source for the
snapshot request wont get deleted while snapshot is getting created,
so we dont need to check for any ongoing delete operation here on the
volume.
Subjected code path in snapshot controller:
```
pvc, err := ctrl.getClaimFromVolumeSnapshot(snapshot)
.
..
pvcClone.ObjectMeta.Finalizers = append(pvcClone.ObjectMeta.Finalizers, utils.PVCFinalizer)
_, err = ctrl.client.CoreV1().PersistentVolumeClaims(pvcClone.Namespace).Update(..)
```
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Problem:
-------
For rbd nbd userspace mounter backends, after a restart of the nodeplugin
all the mounts will start seeing IO errors. This is because, for rbd-nbd
backends there will be a userspace mount daemon running per volume, post
restart of the nodeplugin pod, there is no way to restore the daemons
back to life.
Solution:
--------
The volume healer is a one-time activity that is triggered at the startup
time of the rbd nodeplugin. It navigates through the list of volume
attachments on the node and acts accordingly.
For now, it is limited to nbd type storage only, but it is flexible and
can be extended in the future for other backend types as needed.
From a few feets above:
This solves a severe problem for nbd backed csi volumes. The healer while
going through the list of volume attachments on the node, if finds the
volume is in attached state and is of type nbd, then it will attempt to
fix the rbd-nbd volumes by sending a NodeStageVolume request with the
required volume attributes like secrets, device name, image attributes,
and etc.. which will finally help start the required rbd-nbd daemons in
the nodeplugin csi-rbdplugin container. This will allow reattaching the
backend images with the right nbd device, thus allowing the applications
to perform IO without any interruptions even after a nodeplugin restart.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
As part of stage transaction if the mounter is of type nbd, then capture
device path after a successful rbd-nbd map.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
clone controller make sure there is no delete operation happens
on the source PVC which has been referred as the datasource of
clone PVC, we are safe to operate without looking at delete
operation lock in this case.
Subjected code in the controller:
...
if claim.Spec.DataSource != nil && rc.clone {
err = p.setCloneFinalizer(ctx, claim)
...
}
if !checkFinalizer(claim, pvcCloneFinalizer) {
claim.Finalizers = append(claim.Finalizers, pvcCloneFinalizer)
_, err := p.client.CoreV1().PersistentVolumeClaims(claim.Namespace).Update(..claim..)
}
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Considering kubelet make sure the stage and publish operations
are serialized, we dont need any extra locking in nodePublish
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Considering kubelet make sure the stage and publish operations
are serialized, we dont need any extra locking in nodePublish
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Considering kubelet make sure the unstage and unpublish operations
are serialized, we dont need any extra locking in nodeUnpublish
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Considering kubelet make sure the unstage and unpublish operations
are serialized, we dont need any extra locking in nodeUnpublish
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
parseTenantConfig() only allowed configuring a defined set of options,
and KMSs were not able to re-use the implementation. Now, the function
parses the ConfigMap from the Tenants Namespace and returns a map with
options that the KMS supports.
The map that parseTenantConfig() returns can be inspected by the KMS,
and applied to the vaultTenantConnection type by calling parseConfig().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This new KMS uses a Kubernetes ServiceAccount from a Tenant (Namespace)
to connect to Hashicorp Vault. The provisioner and node-plugin will
check for the configured ServiceAccount and use the token that is
located in one of the linked Secrets. Subsequently the Vault connection
is configured to use the Kubernetes token from the Tenant.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This makes the Tenant configuration for Hashicorp Vault KMS connections
more modular. Additional KMS implementations that use Hashicorp Vault
with per-Tenant options can re-use the new vaultTenantConnection.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit resolves parallel test issues
and also excludes internal/util/conn_pool_test.go
as those test can't run in parallel.
Updates: #1586
Signed-off-by: Yati Padia <ypadia@redhat.com>
This commit resolves godot linter issue
which says "Comment should end in a period (godot)".
Updates: #1586
Signed-off-by: Yati Padia <ypadia@redhat.com>
This commit adds t.Helper() to the test helper
function. With this call go test prints correct
lines of code for failed tests. Otherwise,
printed lines will be inside helpers functions.
For more details check: https://github.com/kulti/thelper
Updates: #1586
Signed-off-by: Yati Padia <ypadia@redhat.com>
* Make kernel version parsing to support more (valid) version strings
* Put version string parsing into a separate, testable function
* Fixes#2248 (Kernel Subversion Parsing Failure)
Signed-off-by: Jonas Zeiger <jonas.zeiger@talpidae.net>
This commit adds capability to `metadata` encryption
to be able to fetch `encryptionPassphrase` from user
specified secret name and namespace(if not specified,
will default to namespace where PVC was created).
This behavior is followed if `secretName` key is found
in the encryption configuration else defaults to fetching
`encryptionPassphrase` from storageclass secrets.
Closes: 2107
Signed-off-by: Rakshith R <rar@redhat.com>
setting metadata in isThickProvisioned method
helps us to avoid checking thick metakey and
deprecated metakey for both thick and thin
provisioned images and also this will easily
help us to migrated the deprecated key to new key.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
isThickProvisioned is already method of the rbdImage
to keep similar thick provisioner related functions
common making isThickProvisioned as method of rbdImage.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
instead of checking the parent is thick provisioned
or not we can decide based on the rbdVol generated
from the request. If the request is to create a Thick
Image. set metadata without checking the parent.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
As image metadata key starting with '.rbd' will not
be copied when we do clone or mirroring, deprecating
the old key for the same reason use
'csi.ceph.com/thick-provisioned' to set image metadata.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Mirror-snapshots can also be automatically created on a
periodic basis if mirror-snapshot schedules are defined.
The mirror-snapshot can be scheduled globally, per-pool,
or per-image levels. Multiple mirror-snapshot schedules
can be defined at any level.
To create a mirror-snapshot schedule with rbd, specify
the mirror snapshot schedule add command along with an
optional pool or image name; interval; and optional start time:
The interval can be specified in days, hours, or minutes
using d, h, m suffix respectively. The optional start-time
can be specified using the ISO 8601 time format. For example:
```
$ rbd --cluster site-a mirror snapshot schedule
add --pool image-pool --image image1 24h 14:00:00-05:00
```
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
New go-ceph admin package api's expects to
pass the rados connection as argument.
added new method called GetRBDAdmin
to get admin connection to administrate
rbd volumes.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit updates controller-runtime to v0.9.2 and
makes changes in persistentvolume.go to add context to
various functions and function calls made here instead of
context.TODO().
Signed-off-by: Rakshith R <rar@redhat.com>
Updated kubernetes packages to latest release.
resizefs package has been included into k8s.io/mount-utils
package. updated code to use the same.
Updates: #1968
Signed-off-by: Rakshith R <rar@redhat.com>
We have many declarations and invocations..etc with long lines which are
very difficult to follow while doing code reading. This address the issues
in 'internal/util' package files to restrict the line length to 120 chars.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
We have many declarations and invocations..etc with long lines which are
very difficult to follow while doing code reading. This address the issues
in below files, and restrict the line length to 120 chars.
-internal/rbd/rbd_attach.go
-internal/rbd/rbd_journal.go
-internal/rbd/rbd_util.go
-internal/rbd/replicationcontrollerserver.go
-internal/rbd/snapshot.go
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
We have many declarations and invocations..etc with long lines which are
very difficult to follow while doing code reading. This address the issues
in 'internal/rbd/*server.go' and 'internal/rbd/driver.go' files to restrict
the line length to 120 chars.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
We have many declarations and invocations..etc with long lines which are
very difficult to follow while doing code reading. This address the issues
in 'internal/journal' package to restrict the line length to 120 chars.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
We have many declarations and invocations..etc with long lines which are
very difficult to follow while doing code reading. This address the issues
in 'internal/csi-common' package to restrict the line length to 120 chars.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
We have many declarations and invocations..etc with long lines which are
very difficult to follow while doing code reading. This address the issues
in 'internal/controller' package to restrict the line length to 120 chars.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
We have many declarations and invocations..etc
with long lines which are very difficult to follow while doing
code reading. This address the issues in 'internal/cephfs' package to
restrict the line length to 120 chars.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Added helper func isNotMountPoint to check mountPoint,
validate error and reduce complexity of NodeStageVolume.
Signed-off-by: Rakshith R <rar@redhat.com>
This commit adds checks for missing `imageFeatures` parameter
in createvolumerequest and nodestagerequest(only for static PVs).
Missing `imageFeatures` parameter is ignored in case of non-static
PVs to ensure backwards compatibility with older versions which
did not have `imageFeatures` as required parameter.
Signed-off-by: Rakshith R <rar@redhat.com>
The Go linter paralleltest checks that the t.Parallel
gets called for the test method and for the range of
test cases within the test.
Updates: #2025
Signed-off-by: Yati Padia <ypadia@redhat.com>
While cleaning up snapshots, not all object may exist after a partial
provisioning attempt. In case objects are missing, do not try to delete
them.
Fixes: #2192
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When cloning a volume from a (CSI) snapshot, we use DeepCopy() and do
not need an RBD snapshot as source.
Suggested-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case restoring a snapshot of a thick-PVC failed during DeepCopy(),
the image will exist, but have partial contents. Only when the image has
the thick-provisioned metadata set, it has completed DeepCopy().
When the metadata is missing, the image is deleted, and an error is
returned to the caller. Kubernetes will automatically retry provisioning
on the ABORTED error, and the restoring will get restarted from the
beginning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Wrapcheck is a simple Go linter to check that errors
from external packages are wrapped during return to
help identify the error source during debugging.
This commit addresses the wrapcheck error
Updates:#2025
Signed-off-by: Yati Padia <ypadia@redhat.com>
the parent volume(CreateVolume) and the clone volume
(CreateSnapshot) are both indepedent and parent volume
can be deleted anytime. To check the thick provision
during Snapshot restore(CreateVolume from snapshot)
we need the thick provision metadata so for the same
reason setting the thick provision metadata on the
clone image we are creating at the CreateSnapshot time.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
RbdSnapName holds the actual RBD image name which
got created during the CreateSnapshot operation.
RbdImageName holds the name of the parent from
which the snapshot is created. and the parent
is independent of snapshot and it can be deleted
any time for the same reason using the RbdSnapName
to check the rbd image details.
generate a temporary volume from the snapshot which
replaces the rbdImageName with RbdSnapName and use
it to check the image metadata.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added validation to allow only Restore of Thick PVC
snapshot to a thick clone and creation of thick clone
from thick PVC.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
isThickProvisioned can be used for both snapshot
and clone validation if isThickProvisioned is method
of common rbdImage structure.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In case cloning a thick-PVC failed during DeepCopy(), the image will
exist, but have partial contents. Only when the image has the
thick-provisioned metadata set, it has completed DeepCopy().
When the metadata is missing, the image is deleted, and an error is
returned to the caller. Kubernetes will automatically retry provisioning
on the ABORTED error, and the cloning will get restarted from the
beginning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Not all Linux kernels support the deep-flatten feature. Disabling the
feature makes it possible to map RBD images on older kernels (like what
minikube uses).
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Thick-provisioned images are independent, cloned images or snapshots are
deep-flattened during creation. There is no need to try and flatten them
again.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
To create a full-allocated RBD image from a snapshot/clone DeepCopy()
can be used. This is needed when the parent of the new volume is
thick-provisioner, so that the new volume is independent of the parent
and thick-provisioned as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
isCompatibleEncryption is used to validate the
requested volume and the existing volume and
the destination volume name wont be generated yet
and logging the destination volume prints the empty
image name with pool name.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
actual error will be present in the stdErr not the error
when we try to add a task to flatten the rbd image. This
commits corrects the error checking when the image does
not have a parent.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit addresses the following issue:
'nolint:gocyclo // complexity needs to be reduced.'
is unused for linter "gocyclo" (nolintlint)
Updates:#2025
Signed-off-by: Yati Padia <ypadia@redhat.com>
This commit returns actual error returned by the go-ceph API
to the function GetPoolName(..) instead of just returning
ErrPoolNotFound everytime there is error getting the pool id.
There is a issue reported in which the snapshot creation
takes much more time to reach True state
(i.e., between 2-7 mins) and keeps trying to create with
below error though pool is present:
rpc error: code = NotFound desc = pool not found: pool ID (21)
not found in Ceph cluster.
Since we cannot interpret the actual error for the delay in
snapshot creation, it is required to return the actual error
as well so that we can uderstand the reason.
Signed-off-by: Yati Padia <ypadia@redhat.com>
While traversing image chain, the parent
image can be present in a different pool
that the one child is in. So, updating
pool name in the next itteration to
that of the Parent.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
Validate Snapshot request to check if the
passed pool name is not empty.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
if the parent and child clones are in
different namespaces we need to open a new
ioctx for pools.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
when clones are created in different pool,we
need to retrieve the parent pool to get the
information of the parent image.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
as we are supporting the creation of clone to a new
pool we need to pass the correct parent volume
to cleanup the snapshot on parent volume.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
added support to create image in different pool.
if the snapshot/rbd image exists in one pool we
can create a clone the clone of the rbd image to
a different pool.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
In golang world, if you split an empty string that does not contain
the seperator, you get an array with one empty string. This results
in volumes failing to mount with "invalid feature " (note extra space
because it's trying to check if 'empty string' is a valid feature).
This patch checks if the string is empty, and if so, it just decides
to skip the entire validation and returning nothing.
Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>
CreateVolume will fail in below cases
* If the snapshot is encrypted and requested volume
is not encrypted
* If the snapshot is not encrypted and requested
volume is encrypted
* If the parent volume is encrypted and requested volume
is not encrypted
* If the parent volume is not encrypted and requested
volume is encrypted
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
if it is an error from the IsMountPoint
function and the error is not IsNotExist return
it as a internal server error.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Currently we are relaying on the error output from
the umount command we run on the nodes when mounting
the volume but we are not checking for all the error
message to verify the volume is mounted or not.
This commits uses IsMountPoint function in util
to check the mountpoint.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as callers are already taking care of returing
the GRPC error code return the actual error
from the IsMountPoint function.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
when using pre-provisioned volumes, pass these parameters:
- kernelMountOptions
- fuseMountOptions
- subVolumeGroup
in spec.csi.volumeAttributes in PV declaration
Signed-off-by: YingshuoTao <frigid.blues@gmail.com>
Move the repairing of a volume/snapshot from CreateVolume to its own
function. This reduces the complexity of the code, and makes the
procedure easier to understand. Further enhancements to repairing an
exsiting volume can be done in the new function.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit addresses ifshort linter issues which
checks if short syntax for if-statements is possible.
updates: #1586
Signed-off-by: Rakshith R <rar@redhat.com>
This commit removes calling of .String() when logging
since `%s`,`%v` or `%q` will call an existing .String() function
automatically.
Fixes: #2051
Signed-off-by: Rakshith R <rar@redhat.com>
The NodeGetVolumeStats procedure can now be used to fetch the capacity
of the RBD block-device. By default this is a thin-provisioned device,
which means that the capacity is not reserved in the Ceph cluster. This
makes it possible to over-provision the cluster.
In order to detect the amount of storage used by the RBD block-device
(when thin-provisioned), it is required to connect to the Ceph cluster.
Unfortunately, the NodeGetVolumeStats CSI procedure does not provide
enough parameters to connect to the Ceph cluster and fetch more details
about the RBD image.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The CephFS NodeServer should handle the CephFS specific requests. This
is not something that the NodeServer for RBD should handle.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
flatten the image if the deep-flatten feature
is present on the images in the chain or if the
images in chain is not zero, as we cannot check
the deep-flatten feature the images which are
in trash.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
For flatten we call checkImageChainHasFeature
which internally calls to getImageInfo returns
the parent name even if the parent is in the trash,
when we try to open the parent image to get its
information it fails as the image not found.
we should treat error as nil if the parent is not found.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The new gosec 2.7.0 complains like:
G304 (CWE-22): Potential file inclusion via variable (Confidence: HIGH, Severity: MEDIUM)
Updates: #2025
Signed-off-by: Niels de Vos <ndevos@redhat.com>
To recover from split brain (up+error) state the image need to be
demoted and requested for resync on site-a and then the image on site-b
should gets demoted.The volume should be marked to ready=true when the
image state on both the clusters are up+unknown because during the last
snapshot syncing the data gets copied first and then image state on the
site-a changes to up+unknown.
If the image state on both the sites are up+unknown consider that
complete data is synced as the last snapshot
gets exchanged between the clusters.
* create 10 GB of file and validate the data after resync
* Do Failover when the site-a goes down
* Force promote the image and write data in GiB
* Once the site-a comes back, Demote the image and issue resync
* Demote the image on site-b
* The status will get reflected on the other site when the last
snapshot sync happens
* The image will go to up+unknown state. and complete data will
be copied to site a
* Promote the image on site-a and use it
```bash
csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006:
global_id: e7f9ec55-06ab-46cb-a1ae-784be75ed96d
state: up+unknown
description: remote image demoted
service: a on minicluster1
last_update: 2021-04-28 07:11:56
peer_sites:
name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
state: up+unknown
description: remote image demoted
last_update: 2021-04-28 07:11:41
```
* Do Failover when the site-a goes down
* Force promote the image on site-b and write data in GiB
* Demote the image on site-b
* Once the site-a comes back, Demote the image on site-a
* The images on the both site will go to split brain state
```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf
state: up+error
description: split-brain
service: a on minicluster2
last_update: 2021-04-28 07:25:41
peer_sites:
name: abbda0f0-0117-4425-8cb2-deb4c853da47-rook-ceph
state: up+error
description: split-brain
last_update: 2021-04-28 07:25:26
```
* Issue resync
* The images cannot be resynced because when we issue resync
on site a the image on site-b was in demoted state
* To recover from this state (promote and then demote the
image on site-b after sometime)
```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf
state: up+unknown
description: remote image demoted
service: a on minicluster1
last_update: 2021-04-28 07:32:56
peer_sites:
name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
state: up+unknown
description: remote image demoted
last_update: 2021-04-28 07:32:41
```
* Once the data is copied we can see that the image state
is moved to up+unknown on both sites
* Promote the image on site-a and use it
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
when a Snapshot is encrypted during a CreateSnapshot
operation, the encryption key gets created in the KMS
when we delete the Snapshot the key from the KMS
should also gets deleted.
When we create a volume from snapshot we are copying
required information but we missed to copy the
encryption information, This commit adds the missing
information to delete the encryption key.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
At present we return the volume connect error if the clone
from snapshot fails when rbdvolume is encrypted, which is incorrect.
This patch correctly return the failed copy encryption error to the
caller
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Before RBD map operation, we do check the
watchers on the RBD image. In the case of
RWO volume. cephcsi makes sure only one
client is using the RBD image. If the rbd
image is mirrored, by default mirroring
daemon will add a watcher on the image
and as we are using go-ceph a watcher will
be added as we have opened the image So
we will have two watchers on an image if
mirroring is enabled. This holds when the
rbd mirror daemon is running, In case if
the mirror daemon is not running there will
be only one watcher on the rbd image
(which is placed by go-ceph image open)
we should not block the map operation if
the mirroring daemon is not running as
its Async mirroring. This commit adds a
check to make sure no more than 2 watchers
if the image is mirrored or no more than 1
watcher if it is not mirrored image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Since rbdImage is a common struct for
rbdVolume and rbdSnapshot, it description
was matching to only snapshot.
This commit makes the comments generic for
both volumes and snapshots.
Signed-off-by: Yug <yuggupta27@gmail.com>
By default, the write buffer size in libfuse2 is 2KiB
`fuse_big_writes = true` option is used to override this limit.
This commit makes `fuse_big_writes = true` option as default
in ceph.conf.
Closes: #1928
Signed-off-by: Rakshith R <rar@redhat.com>
If the pool or few keys are missing in the omap.
GetImageAttributes function returns nil error message and few
empty items in imageAttributes struct. if the image is not
found and the entiries are missing use
the volumeId present on the PV annotation for further operations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
incase if the image is promoted and demoted the
image state will be set to up+unknown if the image
on the remote cluster is still in demoted state.
when user changes the state from primary to secondary
and still the image is in demoted (secondary) state
in the remote cluster. the image state on both the cluster
will be on unknown state.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It helps to get a stack trace when debugging issues. Certain things are
considered bugs in the code (like missing attributes in a struct), and
might cause a panic in certain occasions.
In this case, a missing string will not panic, but the behaviour will
also not be correct (DEKs getting encrypted, but unable to decrypt).
Clearly logging this as a BUG is probably better than calling panic().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It is possible that when a provisioner restarts after a snapshot was
cloned, but before the newly restored image had its encryption metadata
set, the new image is not marked as encrypted. This will prevent
attaching/mounting the image, as the encryption key will not be fetched,
or is not available in the DEKStore.
By actively repairing the encryption configuration when needed, this
problem should be addressed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>