Commit Graph

3663 Commits

Author SHA1 Message Date
Niels de Vos
b132696e54 rbd: note that thick-provisioning is deprecated
Thick-provisioning was introduced to make accounting of assigned space
for volumes easier. When thick-provisioned volumes are the only consumer
of the Ceph cluster, this works fine. However, it is unlikely that this
is the case. Instead, accounting of the requested (thin-provisioned)
size of volumes is much more practical as different types of volumes can
be tracked.

OpenShift already provides cluster-wide quotas, which can combine
accounting of requested volumes by grouping different StorageClasses.

In addition to the difficult practice of allowing only thick-provisioned
RBD-backed volumes, the performance makes thick-provisioning
troublesome. As volumes need to be completely allocated, data needs to
be written to the volume. This can take a long time, depending on the
size of the volume. Provisioning, cloning and snapshotting become
noticeably slower and, because of the additional time consumption, more
prone to failures.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-10-27 06:54:07 +00:00
Madhu Rajanna
0838845c6a cleanup: remove FIXME from ResyncVolume
As the complexity of ResyncVolume has been
reduced, remove the FIXME, which is no longer
valid.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-26 12:00:36 +00:00
Madhu Rajanna
2017b8c621 rbd: log mirror daemon state for replication
Log the mirror daemon state in the local and
remote cluster for better debugging.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-26 12:00:36 +00:00
Madhu Rajanna
7472338334 rbd: remove unwanted const
For comparing the image states, use the states
defined in go-ceph and avoid creating
duplicate consts in cephcsi.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-26 12:00:36 +00:00
Madhu Rajanna
b92a6f5ccb rbd: log the remote site details during resync
Log the remote site details during resyncing
for better debugging.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-26 12:00:36 +00:00
Madhu Rajanna
1fd2f28fee rbd: check local image state for resyncing
Below are the local states of the mirrored image:

"unknown"  -> If the image is in an error state
means data is completely synced
"error" -> If the image is in an error state
means it needs resync
"syncing"
"starting_replay"
"replaying"
"stopping_replay"
"stopped"

If the resync has started successfully, the
image will be in the "replaying" state. We can
therefore treat "replaying" as the state that
reports a resync in progress.

We are discarding the intermediate states like
"syncing", "starting_replay" and "stopping_replay".

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-26 12:00:36 +00:00
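
The state handling described in 1fd2f28fee can be sketched roughly as follows. This is a minimal illustration and not the ceph-csi implementation: the `resyncAction` type, its constants and `classifyLocalState` are hypothetical names, and the sketch deliberately leaves "unknown" and "stopped" uninterpreted because the commit message does not say how they are used.

```go
package main

import "fmt"

// resyncAction is a hypothetical label for what a caller might do next.
type resyncAction string

const (
	actionNeedsResync resyncAction = "needs-resync"
	actionResyncing   resyncAction = "resync-in-progress"
	actionIgnore      resyncAction = "ignore"
)

// classifyLocalState maps a local mirror image state string to an action,
// following the rules stated in the commit message.
func classifyLocalState(state string) resyncAction {
	switch state {
	case "error":
		// An image in the error state needs a resync.
		return actionNeedsResync
	case "replaying":
		// A successfully started resync shows up as "replaying".
		return actionResyncing
	default:
		// "syncing", "starting_replay" and "stopping_replay" are the
		// discarded intermediate states; "unknown" and "stopped" are
		// not interpreted in this sketch.
		return actionIgnore
	}
}

func main() {
	for _, s := range []string{"error", "replaying", "syncing"} {
		fmt.Printf("%s -> %s\n", s, classifyLocalState(s))
	}
}
```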
dependabot[bot]
c8e78089f7 rebase: bump github.com/aws/aws-sdk-go from 1.41.5 to 1.41.10
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.41.5 to 1.41.10.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.41.5...v1.41.10)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-26 06:55:16 +00:00
Rakshith R
41d894f98a e2e: add test cases for EnsureImageCleanup
This tests PVC, PVC smart clone and snapshot deletion when
the underlying images are in trash.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-20 18:25:31 +00:00
Rakshith R
12cd05a408 rbd: add EnsureImageCleanup to snapshot deletion
Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-20 18:25:31 +00:00
Rakshith R
1849076aab rbd: add EnsureImageCleanup to ensure image cleanup from trash
After moving an image to trash, if the `trash remove` step fails,
then external-provisioner will issue subsequent requests, in which
the image will be absent from the pool (it will be in trash) and omap
cleanup will be done, leaving a stale image in trash with no
`trash remove` step run on it.

To avoid this scenario, list the trash images, find the corresponding id
for the given image name, and add a task to flatten when we encounter an
ErrImageNotFound.

Fixes: #1728

Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-20 18:25:31 +00:00
dependabot[bot]
5280b67327 rebase: bump github.com/hashicorp/vault/api from 1.1.1 to 1.2.0
Bumps [github.com/hashicorp/vault/api](https://github.com/hashicorp/vault) from 1.1.1 to 1.2.0.
- [Release notes](https://github.com/hashicorp/vault/releases)
- [Changelog](https://github.com/hashicorp/vault/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/vault/compare/v1.1.1...v1.2.0)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/vault/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-20 13:57:39 +00:00
Niels de Vos
9bd9f5e91d rebase: update github.com/hashicorp/vault/sdk to latest
The github.com/hashicorp/vault/sdk was listed in the replace section,
most likely because using a newer version failed. By adding a missing
tagged version to the `exclude` section in go.mod, updating the package
works fine.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-10-20 13:57:39 +00:00
Niels de Vos
6d3e25f069 util: NodeGetVolumeStatsResponse.Usage may not contain negative values
Following the CSI specification, values that are included in the
VolumeUsage MUST NOT be negative. However, CephFS seems to return -1 for
the number of inodes that are available. Instead of returning a
negative value, set it to 0 so that it will not get included in the
encoded JSON response.

Updates: #2579
See-also: 5b0d454015/spec.md (L2477-L2487)
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-10-20 07:18:48 +00:00
dependabot[bot]
6ffb91c047 rebase: bump github.com/aws/aws-sdk-go from 1.41.0 to 1.41.5
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.41.0 to 1.41.5.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.41.0...v1.41.5)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 17:52:53 +00:00
dependabot[bot]
a66012a5d4 rebase: bump github.com/ceph/go-ceph from 0.11.0 to 0.12.0
Bumps [github.com/ceph/go-ceph](https://github.com/ceph/go-ceph) from 0.11.0 to 0.12.0.
- [Release notes](https://github.com/ceph/go-ceph/releases)
- [Changelog](https://github.com/ceph/go-ceph/blob/master/docs/release-process.md)
- [Commits](https://github.com/ceph/go-ceph/compare/v0.11.0...v0.12.0)

---
updated-dependencies:
- dependency-name: github.com/ceph/go-ceph
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-19 13:27:19 +00:00
Robert Vasek
fedbb01ec3 doc: add proposal doc for CephFS snapshots as shallow RO volumes
This patch adds a proposal document for "CephFS snapshots
as shallow RO volumes".

Updates: #2142
Signed-off-by: Robert Vasek <robert.vasek@cern.ch>
2021-10-19 11:35:02 +00:00
Niels de Vos
85c84910d3 e2e: add a monitor container to the vault Pod
The command `vault monitor` can be used to stream logging from the Vault
service. This is very helpful while debugging Vault configuration
failures.

By adding a 2nd container to the Vault deployment, it is now possible to
get the messages from the Vault service by running

    $ kubectl logs -c monitor <vault-pod-0123abcd>

This will be very useful when the e2e tests do not delete the deployment
after a failure and fetch the logs from all containers.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-10-19 03:37:42 +00:00
Madhu Rajanna
0d51f6d833 rbd: check local image description for split-brain
In some corner cases, like `re-player shutdown`, the
local image will not be in the error state. It is
also worth considering the `description` field to
confirm split-brain.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-18 11:22:03 +00:00
dependabot[bot]
7c4b29bd57 rebase: bump sigs.k8s.io/controller-runtime from 0.10.1 to 0.10.2
Bumps [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) from 0.10.1 to 0.10.2.
- [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases)
- [Commits](https://github.com/kubernetes-sigs/controller-runtime/compare/v0.10.1...v0.10.2)

---
updated-dependencies:
- dependency-name: sigs.k8s.io/controller-runtime
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-18 07:17:30 +00:00
Niels de Vos
cff0e04e3c build: remove unneeded empty YAML document from deployment artifacts
The generated files under the deploy/ directory contain an empty YAML
document that may cause confusion for some versions of kubectl. Dropping
the unneeded `---` start of the file for the header should make parsing
of the deployment artifacts a little less error prone.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-10-15 16:08:59 +00:00
Niels de Vos
c443320126 deploy: move rbd/ceph-csi-config ConfigMap to API
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-10-15 16:08:59 +00:00
Niels de Vos
584d43a132 deploy: move rbd/CSIDriver to API
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-10-15 16:08:59 +00:00
Rakshith R
f9c369918c ci: disable rook deployed csi drivers to speed up e2e
Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-15 11:15:51 +00:00
Rakshith R
0abd2e785c e2e: increase E2E_TIMEOUT to 120m
This commit increases E2E_TIMEOUT to 120m, to avoid
frequent test failures due to timeouts.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-15 08:35:07 +00:00
dependabot[bot]
3934599b0e rebase: bump github.com/onsi/ginkgo from 1.16.4 to 1.16.5
Bumps [github.com/onsi/ginkgo](https://github.com/onsi/ginkgo) from 1.16.4 to 1.16.5.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v1.16.4...v1.16.5)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-13 09:21:21 +00:00
dependabot[bot]
574852e27c rebase: bump github.com/aws/aws-sdk-go from 1.40.55 to 1.41.0
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.40.55 to 1.41.0.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.40.55...v1.41.0)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-12 07:18:32 +00:00
Humble Chirammal
819f4f9048 e2e: adjust migration tests to use clusterID in the volume context
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-10-11 10:06:30 +00:00
Humble Chirammal
c584fa20da rbd: use clusterID from volumeContext at nodestage
Previously we were retrieving the clusterID using the monitors field
in the volume context in the node stage code path. However, it is
possible to use the clusterID directly from the volume context. This
commit also removes the getClusterIDFromMigrationVolume() function,
which was used previously, and its tests.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-10-11 10:06:30 +00:00
Humble Chirammal
4e61156dc4 rbd: change iteration variable name in the migration test to be specific
At present we reuse or overload the variable name in the test execution.
This commit uses a different variable name that is initialized in each run.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-10-11 10:06:30 +00:00
Prasanna Kumar Kalever
a01b9821ee e2e: set rbd-nbd mounter tests cephLogStrategy to preserve
This is to preserve the rbd-nbd logs post unmap, so that the CI can dump
the available logs from logdir.

Fixes: #2451
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-10-08 14:00:42 +00:00
Madhu Rajanna
90ecd2d7e8 rbd: use go-ceph to get mirroring info
Use the go-ceph API to get image mirroring info.

closes #2558

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-07 08:02:06 +00:00
dependabot[bot]
b9beb2106b rebase: bump github.com/aws/aws-sdk-go from 1.40.50 to 1.40.55
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.40.50 to 1.40.55.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.40.50...v1.40.55)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-10-06 17:28:25 +00:00
Madhu Rajanna
8ebc0659ab rbd: perform resize of file system for static volume
For a static volume, the user manually mounts an
already existing image as a volume to the application
pods. As it is an rbd image, if the PVC is of type
fileSystem the image will be mapped, formatted
and mounted on the node.
If the user resizes the image on the ceph cluster,
the filesystem created on the rbd image is not
resized automatically. Even if the user deletes and
recreates the kubernetes objects, the new size
will not be visible on the node.

With this change, during the NodeStageVolumeRequest
the nodeplugin will check the size of the mapped rbd
image on the node using the devicePath, and also
the rbd image size on the ceph cluster.

If the sizes do not match, it will do the file
system resize on the node as part of the
NodeStageVolumeRequest RPC call.

The user needs to do the operations below to see the new size:
* Resize the rbd image in the ceph cluster.
* Scale down all the application pods using the static
PVC.
* Make sure no application pods which are using the
static PVC are running on a node.
* Scale up all the application pods.

Validate the new size in the volume mounted in the
application pod.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-06 13:15:00 +00:00
Madhu Rajanna
fe9020260d rbd: move flattening to helper function
In the NodeStage operation we are flattening
the image to support mounting on older
clients. This commit moves it to a helper
function to reduce code complexity.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-06 13:15:00 +00:00
Madhu Rajanna
cda2abca5d rbd: use NewMetricsBlock to get size
Instead of the lsblk command, use the NewMetricsBlock
function from the kubernetes package to get
the size.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-10-06 13:15:00 +00:00
Niels de Vos
97525f5e74 ci: add make go-test-api to GitHub Action
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-10-05 11:26:50 +00:00
Niels de Vos
5ea99fdd5b build: add yamlgen to build deployment files
This initial version of yamlgen generates deploy/scc.yaml based on the
deployment artifact that is provided by the new api/deploy/ocp package.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-10-05 11:26:50 +00:00
Niels de Vos
36e099d939 deploy: add API for getting OpenShift SecurityContextConstraints
This new Go module allows other projects (like Rook) to consume
deployment details (the SCC to get started) directly from Ceph-CSI.

Based-on: https://github.com/rook/rook/blob/3c93eb3/cluster/examples/kubernetes/ceph/operator-openshift.yaml#L47-L90
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-10-05 11:26:50 +00:00
Rakshith R
f60b097f5f e2e: add testcase for thick encrypted PVC restore
Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-05 07:46:57 +00:00
Rakshith R
ded75eb099 rbd: copyEncryptionConfig for thickProvisioned snap restore too
This commit adds a bugfix to copy the encryption passphrase for a
thick-provisioned PVC restored from a snapshot.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-05 07:46:57 +00:00
Rakshith R
b471cac6bd e2e: add nolint:param to retryKubectlArgs
Currently only the kubectlCreate arg is used with retryKubectlArgs(),
but it may be used later on.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-05 07:46:57 +00:00
Rakshith R
dac4e76ae1 e2e: add testcase for PVC restore from vaultKMS to vaultTenantSAKMS
Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-05 07:46:57 +00:00
Rakshith R
f63ed2ca5a e2e: modify validatePVCSnapshot() to use restoreSCName & restoreKMS
Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-05 07:46:57 +00:00
Rakshith R
59b7a26175 rbd: modify copyEncryptionConfig to accept copyOnlyPassphrase arg
During PVC snapshot/clone both the kms config and the passphrase need to
be copied, while for PVC restore only the passphrase needs to be copied to
the dest rbdvol since the destination storageclass may have another kms config.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-10-05 07:46:57 +00:00
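
The behaviour that 59b7a26175 describes for the `copyOnlyPassphrase` argument can be sketched as below. The `encryptionConfig` struct and the free-standing `copyEncryptionConfig` function are simplified, hypothetical stand-ins; the real function in ceph-csi works on RBD volume objects and talks to a KMS, which is not modelled here.

```go
package main

import "fmt"

// encryptionConfig is a hypothetical, simplified view of the encryption
// settings attached to a volume.
type encryptionConfig struct {
	KMSID      string // identifies the kms config in use
	Passphrase string // the passphrase protecting the volume data
}

// copyEncryptionConfig copies encryption settings from src to dst.
// The passphrase is always copied. The kms config is copied only when
// copyOnlyPassphrase is false (snapshot/clone); for a restore the
// destination keeps the kms config from its own storageclass.
func copyEncryptionConfig(src, dst *encryptionConfig, copyOnlyPassphrase bool) {
	dst.Passphrase = src.Passphrase
	if !copyOnlyPassphrase {
		dst.KMSID = src.KMSID
	}
}

func main() {
	// Made-up values for illustration only.
	src := &encryptionConfig{KMSID: "kms-a", Passphrase: "example-passphrase"}

	clone := &encryptionConfig{}
	copyEncryptionConfig(src, clone, false)
	fmt.Printf("clone:   %+v\n", *clone)

	restore := &encryptionConfig{KMSID: "kms-b"}
	copyEncryptionConfig(src, restore, true)
	fmt.Printf("restore: %+v\n", *restore)
}
```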
Humble Chirammal
3c9d7e3cd5 rbd: detect migration volID in DeleteVolume() and delete rbd image
This commit adds the logic to detect whether a passed-in volumeID
is a migrated volume ID and, if so, the driver connects to the
backend cluster and cleans up/deletes the image. The logic is
only applied if it is a migration volume ID. The migration volume ID
carries information like mons, pool and image name, which is
good enough for the driver to identify and connect to the backend
cluster for its operations.

migration volID format:
<mig>_mons-<monsHash>_image-<imageUID>_<poolHash>

Details on the hash values:

* MonsHash: this carries a hash value (md5sum) which acts as the
`clusterID` for the operations in this context.

* ImageUID: this is the unique UUID generated by kubernetes for the created
volume.

* PoolHash: this is an encoded string of the pool name.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-10-04 16:06:31 +00:00
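
To make the volID format from 3c9d7e3cd5 concrete, here is a minimal parsing sketch. It makes two assumptions that the commit message does not state: that the `<mig>` placeholder is the literal string `mig`, and that none of the fields contains an underscore. The `migrationVolID` struct and `parseMigrationVolID` are hypothetical names, not the ceph-csi implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// migrationVolID holds the three fields named in the commit message.
// The struct itself is hypothetical, not a ceph-csi type.
type migrationVolID struct {
	MonsHash string
	ImageUID string
	PoolHash string
}

// parseMigrationVolID splits an ID of the form
// <mig>_mons-<monsHash>_image-<imageUID>_<poolHash> into its fields.
func parseMigrationVolID(volID string) (migrationVolID, error) {
	var id migrationVolID

	parts := strings.Split(volID, "_")
	if len(parts) != 4 {
		return id, fmt.Errorf("expected 4 fields separated by '_', got %d", len(parts))
	}
	// Assumption: the <mig> placeholder is the literal string "mig".
	if parts[0] != "mig" {
		return id, fmt.Errorf("missing migration prefix in %q", volID)
	}
	if !strings.HasPrefix(parts[1], "mons-") || !strings.HasPrefix(parts[2], "image-") {
		return id, fmt.Errorf("unexpected field labels in %q", volID)
	}

	id.MonsHash = strings.TrimPrefix(parts[1], "mons-")
	id.ImageUID = strings.TrimPrefix(parts[2], "image-")
	id.PoolHash = parts[3]

	return id, nil
}

func main() {
	// Made-up example values, not taken from a real cluster.
	volID := "mig_mons-0123456789abcdef0123456789abcdef" +
		"_image-11111111-2222-3333-4444-555555555555_706f6f6c"

	id, err := parseMigrationVolID(volID)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", id)
}
```

Returning an error for anything that does not match the format lets a caller fall back to its regular volume ID handling, which matches the statement that the logic is only applied to migration volume IDs.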
Humble Chirammal
b778fe51a4 e2e: add test for migration volID detection and delete of image
This commit adds a test for the migration delete volID detection scenario
by passing a custom volID, with the entries in the configmap changed
to simulate the situation. The staticPV function was also changed to
accept the annotation map, which makes it more generally usable.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-10-04 16:06:31 +00:00
Humble Chirammal
1171111a94 e2e: deletePodWithLabel fails on unparam linter
This commit addresses the unparam linter error on the deletePodWithLabel
function.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-10-04 16:06:31 +00:00
Humble Chirammal
a4a2dc93c1 e2e: change createCustomConfigmap to be more general
createCustomConfigmap helps to create a custom cluster entry in
the configmap; however, this was coupled with filling in the
subvolumegroup in the cluster configuration. This commit makes it more
general, and filling in the subvolumegroup is now controlled with a flag.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-10-04 16:06:31 +00:00
Rakshith R
90d246fc55 doc: update dev standup meeting link
Signed-off-by: Rakshith R <rar@redhat.com>
2021-09-30 13:27:11 +00:00
Niels de Vos
dfc8f64bdd ci: require passing of k8s-e2e-external-storage jobs for merge
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-09-29 12:45:02 +05:30