Whenever Ceph-CSI receives a CSI/Replication
request it will first decode the
volumeHandle and try to get the required
OMAP details if it is not able to
retrieve, receives a `Not Found` error
message and Ceph-CSI will check for the
clusterID mapping. If the old volumeID
`0001-00013-site1-storage-0000000000000001
-b0285c97-a0ce-11eb-8c66-0242ac110002`
contains the `site1-storage` as the clusterID,
now Ceph-CSI will look for the corresponding
clusterID `site2-storage` from the above configmap.
If the clusterID mapping is found now Ceph-CSI
will look for the poolID mapping ie mapping between
`1` and `2`. Example:- pool with name exists on
both the clusters with different ID's Replicapool
with ID `1` on site1 and Replicapool with ID `2`
on site2. After getting the required mapping Ceph-CSI
has the required information to get more details
from the rados OMAP. If we have multiple clusterID mapping
it will loop through all the mapping and checks the
corresponding pool to get the OMAP data. If the clusterID
mapping does not exist Ceph-CSI will return an `Not Found`
error message to the caller.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added helper function to read the clusterID mapping
from the mounted file.
The clusterID mapping contains below mappings
* ClusterID mappings (to cluster to which we are failingover
and from which cluster failover happened)
* RBD PoolID mapping of between the clusters.
* CephFS FscID mapping between the clusters.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit uses trailer-exists instead
of signed-off-by to verify the sign-off-by
message.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Suggested-by: Ade Attwood
The VAULT_AUTH_MOUNT_PATH is a Vault configuration parameter that allows
a user to set a non default path for the Kubernetes ServiceAccount
integration. This can already be configured for the Vault KMS, and is
now added to the Vault Tenant SA KMS as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The new `vaultAuthNamespace` configuration parameter can be set to the
Vault Namespace where the authentication is setup in the service. Some
Hashicorp Vault deployments use sub-namespaces for their users/tenants,
with a 'root' namespace where the authentication is configured. This
requires passing of different Vault namespaces for different operations.
Example:
- the Kubernetes Auth mechanism is configured for in the Vault
Namespace called 'devops'
- a user/tenant has a sub-namespace called 'devops/website' where the
encryption passphrases can be placed in the key-value store
The configuration for this, then looks like:
vaultAuthNamespace: devops
vaultNamespace: devops/homepage
Note that Vault Namespaces are a feature of the Hashicorp Vault
Enterprise product, and not part of the Open Source version. This
prevents adding e2e tests that validate the Vault Namespace
configuration.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
- mount host's /etc/selinux in node plugins
- process mount options in all code paths for cephfs volume options
Signed-off-by: Alexandre Lossent <alexandre.lossent@cern.ch>
expandPVCSize() uses the namespace of the PVC that was checked. In case
the .Get() call fails, the PVC will not have its namespace set, and
subsequent tries will fail with errors like:
Error getting pvc in namespace: '': etcdserver: request timed out
waiting for PVC (9 seconds elapsed)
Error getting pvc in namespace: '': an empty namespace may not be set when a resource name is provided
By using the original namespace of the PVC stored in a separate variable
as is done with the name of the PVC, this problem should not occur
anymore.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
currently PR merging is blocked due to
commitlint issue. disabling commitlint
or the release branches now. more details at
https://github.com/ceph/ceph-csi/pull/2342
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In case listing the Kubernetes Services fails, the following error is
returned immediately:
failed to create configmap with error failed to list services: etcdserver: request timed out
Wrapping the listing of the Services in a PollImmediate() routine, adds
a retry in case of common temporary issues.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
added design doc to handle volumeID mapping in case
of the failover in the Disaster Recovery.
update #2118
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
registry.centos.org is not officially maintained by the CentOS
infrastructure team. The container images on quay.io are the official
once and we should use those instead.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit uses `string.SplitN` instead of `string.Split`.
The path for pids.max has extra `:` symbols in it due to which
getCgroupPidsFile() splits the string into 5 tokens instead of
3 leading to loss of part of the path.
As a result, the below error is reported:
`Failed to get the PID limit, can not reconfigure: open
/sys/fs/cgroup/pids/system.slice/containerd.service/
kubepods-besteffort-pod183b9d14_aed1_4b66_a696_da0c738bc012.slice/pids.max:
no such file or directory`
SplitN takes an argument n and splits the string
accordingly which helps us to get the desired
file path.
Fixes: #2337
Co-authored-by: Yati Padia <ypadia@redhat.com>
Signed-off-by: Yati Padia <ypadia@redhat.com>
Until we have a real fix, just to avoid occasionally file system entering
into read-only on nodeplugin restart, lets sync data from the application
pod.
Updates: #2204
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
commitlint 13.1.0 is causing issues when
PR is backported from devel branch to release
branch
https://github.com/ceph/ceph-csi/pull/2332#issuecomment-888325775
Lets revert back to commitlint 12.1.4 where we have
not seen any issue with backports to release
branch.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
There are reports where CephFS deploying failed with etcdserver
timeouts:
INFO: Running '/usr/bin/kubectl --server=https://192.168.39.187:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-ea434921 create --namespace=cephcsi-e2e-ea434921 -f -'
INFO: rc: 1
FAIL: failed to create CephFS provisioner rbac with error error running /usr/bin/kubectl --server=https://192.168.39.187:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-ea434921 create --namespace=cephcsi-e2e-ea434921 -f -:
Command stdout:
role.rbac.authorization.k8s.io/cephfs-external-provisioner-cfg created
rolebinding.rbac.authorization.k8s.io/cephfs-csi-provisioner-role-cfg created
stderr:
Error from server: error when creating "STDIN": etcdserver: request timed out
Error from server: error when creating "STDIN": etcdserver: request timed out
Error from server: error when creating "STDIN": etcdserver: request timed out
error:
exit status 1
By using retryKubectlInput() helper function, a retry will be done, and
the failure should not be fatal any longer.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Mainly removed rbd-nbd mounter specified at the pre-upgrade
considerations affecting the restarts.
Also updated the 3.3 tags to 3.4
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
The new `backport-to-release-v3.4` label can be added to PRs and Mergify
will create a backport once the PR for the devel branch has been merged.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Currently we have a bug that we are not using rados
namespace when adding ceph manager command to
remove the image from the trash. This commit
adds the missing rados namespace when adding
ceph manager task.
without fix the image will be moved to trash
and no task will be added to remove from the
trash. it will become ceph responsibility to
remove the image from trash when it will cleanup
the trash.
workaroud: manually purge the trash
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Add documenation for Disaster Recovery
which steps to Failover and Failback in case
of a planned migration or a Disaster.
Signed-off-by: Yug Gupta <yuggupta27@gmail.com>
RBD image metadata keys that start with '.rbd' are expected to be
internal to RBD itself and are not mirrored to remote sites. Renaming
the keys (dropping the '.' prefix) and using the new MigrateMetadata()
function now makes the keys available on remote sites too.
Closes: #2219
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The new MigrateMetadata() function can be used to get the metadata of an
image with a deprecated and new key. Renaming metadata keys can be done
easily this way.
A default value will be set in the image metadata when it is missing
completely. But if the deprecated key was set, the data is stored under
the new key and the deprecated key is removed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Currently, getImageMirroringStatus() is using RBD CLI.
This commit converts RBD CLI to go-ceph API.
Fixes: #2120
Signed-off-by: Yati Padia <ypadia@redhat.com>
When running 'make containerized-test' the following error gets
reported:
yamllint -s -d '{extends: default, rules: {line-length: {allow-non-breakable-inline-mappings: true}},ignore: charts/*/templates/*.yaml}' ./scripts/golangci.yml
./scripts/golangci.yml
179:81 error line too long (84 > 80 characters) (line-length)
The golangci.yml.in is used to generate golangci.yml, addressing the
line-length there resolves the issue.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
framework.RunKubectl() returns an error that does not end with
"etcdserver: request timed out", but contains the text somewhere in the
middle:
error running /usr/bin/kubectl --server=https://192.168.39.57:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-a44ec4b4 create -f -:
Command stdout:
stderr:
Error from server: error when creating "STDIN": etcdserver: request timed out
error:
exit status 1
isRetryableAPIError() should return `true` for this case as well, so
instead of using HasSuffix(), we'll use Contains().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Previously in ControllerExpandVolume() we had a check for encrypted
volumes and we use to fail for all expand requests on an encrypted
volume. Also for Block VolumeMode PVCs NodeExpandVolume used to be
ignored/skipped.
With these changes, we add support for the expansion of encrypted volumes.
Also for raw Block VolumeMode PVCs with Encryption we call NodeExpandVolume.
That said,
With LUKS1, cryptsetup utility doesn't prompt for a passphrase on resizing
the crypto mapper device. This is because LUKS1 devices don't use kernel
keyring for volume keys.
Whereas, LUKS2 devices use kernel keyring for volume key by default, i.e.
cryptsetup utility asks for a passphrase if it detects volume key was
previously passed to dm-crypt via kernel keyring service, we are overriding
the default by --disable-keyring option during cryptsetup open command.
So that at the time of crypto mapper device resize we will not be
prompted for any passphrase.
Fixes: #1469
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
With Luks1 device:
$ cryptsetup status /dev/mapper/crypto-rbd0
/dev/mapper/crypto-rbd0 is active and is in use.
type: LUKS1
cipher: aes-xts-plain64
keysize: 512 bits
key location: dm-crypt
device: /dev/rbd0
sector size: 512
offset: 4096 sectors
size: 4190208 sectors
mode: read/write
With Luks2 device:
$ cryptsetup status /dev/mapper/crypto-rbd0
/dev/mapper/crypto-rbd0 is active and is in use.
type: LUKS2
cipher: aes-xts-plain64
keysize: 512 bits
key location: dm-crypt
device: /dev/rbd0
sector size: 512
offset: 32768 sectors
size: 4161536 sectors
mode: read/write
This could lead to failures with unmap in the NodeUnstageVolume path
for the encrypted volumes.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
This commit modifies the error of godot, cyclop,
paralleltest linter caused due to merged PRs.
Updates: #1586
Signed-off-by: Yati Padia <ypadia@redhat.com>
This commit disables the forbidigo linter as
this linter forbids the use of fmt.Printf
but we need to use it in various part of
our codebase.
Updates: #1586
Signed-off-by: Yati Padia <ypadia@redhat.com>
This commit disables the exhaustivestruct linter
as it is meant to be used only for special cases.
We don't need to enable this for our project.
Fixes: #2224
Signed-off-by: Yati Padia <ypadia@redhat.com>
This PR updates the static check tools to
the latest version.
Further needs to resolve all the errors after
updating the version.
Updates: #1586
Signed-off-by: Yati Padia <ypadia@redhat.com>