Commit Graph

2012 Commits

Author SHA1 Message Date
Niels de Vos
4bcc934873 cleanup: address pylint "consider-using-with" in tracevol.py
pylint started to report errors like the following:

    troubleshooting/tools/tracevol.py:97:10: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)

There probably has been an update of Pylint in the test-container that
is more strict than previous versions.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
(cherry picked from commit 8447a1feab)
2021-09-08 09:13:38 +00:00
Rakshith R
e8520edf8d ci: add support to create extra disks through minikube
Signed-off-by: Rakshith R <rar@redhat.com>
(cherry picked from commit 1b64a0a505)
2021-09-08 09:13:38 +00:00
Rakshith R
b6d1cc2317 rebase: update minikube to v1.23.0
See-also: https://github.com/kubernetes/minikube/releases/tag/v1.23.0

Signed-off-by: Rakshith R <rar@redhat.com>
(cherry picked from commit 08c10c9f94)
2021-09-08 09:13:38 +00:00
Rakshith R
1948dce69b ci: internally create & delete cephcsi namespace in install-helm.sh
This ensures the kubectl call is retried with kubectl_retry function.

Updates: #2309

Signed-off-by: Rakshith R <rar@redhat.com>
(cherry picked from commit 7fba62dd47)
2021-08-11 11:02:46 +00:00
Rakshith R
c9eb7bce7c ci: use kubectl_retry in install-helm.sh script
Signed-off-by: Rakshith R <rar@redhat.com>
(cherry picked from commit eb8c1cd5ab)

# Conflicts:
#	scripts/install-helm.sh
2021-08-11 11:02:46 +00:00
Rakshith R
661602d731 ci: modify kubectl_retry() to handle NotFound on delete cmd
Signed-off-by: Rakshith R <rar@redhat.com>
(cherry picked from commit 2b19197e2f)
2021-08-11 11:02:46 +00:00
Rakshith R
416782d878 ci: move kubectl_retry() to utils.sh to be able to import it
Signed-off-by: Rakshith R <rar@redhat.com>
(cherry picked from commit a15892a87a)
2021-08-11 11:02:46 +00:00
Niels de Vos
a5211dcf0e util: allow configuring VAULT_BACKEND for Vault connection
It seems that the version of the key/value engine can not always be
detected for Hashicorp Vault. In certain cases, it is required to
configure the `VAULT_BACKEND` (or `vaultBackend`) option so that a
successful connection to the service can be made.

The `kv-v2` is the current default for development deployments of
Hashicorp Vault (what we use for automated testing). Production
deployments default to version 1 for now.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
(cherry picked from commit 82557e3f34)
2021-08-10 10:08:16 +00:00
Niels de Vos
c61e6b3f8c e2e: disable iss validation in Hashicorp Vault
Testing encrypted PVCs does not work anymore since Kubernetes v1.21. It
seems that disabling the iss validation in Hashicorp Vault is a
relatively simple workaround that we can use instead of the more complex
securing of the environment that should be done in production
deployments.

Updates: #1963
See-also: external-secrets/kubernetes-external-secrets#721
Signed-off-by: Niels de Vos <ndevos@redhat.com>
(cherry picked from commit fd9fee74de)
2021-06-29 11:43:11 +00:00
Niels de Vos
14e5a5cfa2 cleanup: prevent panic in cleanUpSnapshot
While cleaning up snapshots, not all objects may exist after a partial
provisioning attempt. In case objects are missing, do not try to delete
them.

Fixes: #2192
Signed-off-by: Niels de Vos <ndevos@redhat.com>
(cherry picked from commit 0ee0c12027)
2021-06-25 12:12:52 +00:00
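
The guard described in 14e5a5cfa2 can be sketched as follows. This is a minimal illustration, not the ceph-csi code: the `resource` type and the `cleanUpAll` helper are hypothetical names, and a nil pointer stands in for an object that was never created during a partial provisioning attempt.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a snapshot, clone or volume object.
type resource struct {
	name    string
	deleted bool
}

// remove marks the resource as deleted; a second call reports an error.
func (r *resource) remove() error {
	if r.deleted {
		return fmt.Errorf("%s: already removed", r.name)
	}
	r.deleted = true
	return nil
}

// cleanUpAll removes every non-nil resource and returns how many were removed.
// Nil entries represent objects that were never created and are skipped
// instead of being dereferenced.
func cleanUpAll(resources ...*resource) (int, error) {
	removed := 0
	for _, r := range resources {
		if r == nil {
			continue
		}
		if err := r.remove(); err != nil {
			return removed, err
		}
		removed++
	}
	return removed, nil
}
```

Calling `cleanUpAll(nil, snap, nil)` removes only `snap` and never touches the missing objects.
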
Yug
d8d46575a1 helm: remove function keyword
Getting rid of the function keyword for two reasons:
1. Defining a function without the 'function' keyword is more
   portable, as it is compatible with Bourne/Korn/POSIX scripts.
2. It keeps the coding style consistent throughout the file.

Signed-off-by: Yug <yuggupta27@gmail.com>
(cherry picked from commit e47738fa75)
2021-06-24 18:23:15 +00:00
Yug
ad5eb89243 helm: add support to pass --namespace option
The current approach uses hard-coded command line
arguments, which is not very robust.
To maintain backward compatibility, the script
also keeps working with the previous approach.

Signed-off-by: Yug <yuggupta27@gmail.com>
(cherry picked from commit cc72de4b1c)
2021-06-24 18:23:15 +00:00
Yug
c5bd77e5fe e2e: provide an option if tests run on helm deployment
Add an e2eArg `helmTest` to specify whether the tests are running
on a ceph-csi deployment installed via helm.
For testing in CI, StorageClass and secret deployment
is enabled on helm installation.

Signed-off-by: Yug <yuggupta27@gmail.com>
(cherry picked from commit a4548c3983)
2021-06-24 18:23:15 +00:00
Madhu Rajanna
f59051aa2e rbd: set thick provision metadata on clone volume
The parent volume (CreateVolume) and the clone volume
(CreateSnapshot) are independent of each other, and the parent
volume can be deleted at any time. To check for thick provisioning
during snapshot restore (CreateVolume from snapshot) we need
the thick provision metadata. For that reason, set the thick
provision metadata on the clone image that is created at
CreateSnapshot time.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 591ba3f580)
2021-06-21 09:38:26 +00:00
Madhu Rajanna
a17eb07947 rbd: use RbdSnapName to check the image details
RbdSnapName holds the actual RBD image name which
got created during the CreateSnapshot operation.
RbdImageName holds the name of the parent from
which the snapshot is created. The parent is
independent of the snapshot and can be deleted at
any time; for that reason, use RbdSnapName
to check the RBD image details.

Generate a temporary volume from the snapshot which
replaces the rbdImageName with RbdSnapName and use
it to check the image metadata.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 6d14eeee70)
2021-06-21 09:38:26 +00:00
Madhu Rajanna
96fce60b4e rbd: add validation for thick restore/clone
Added validation to allow only the restore of a thick PVC
snapshot to a thick clone, and the creation of a thick clone
from a thick PVC.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 7966d2e5c1)
2021-06-21 09:38:26 +00:00
Madhu Rajanna
c5cafe3128 rbd: make isThickProvisioned method of rbdImage
isThickProvisioned can be used for both snapshot
and clone validation if it is a method
of the common rbdImage structure.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit fc442221e4)
2021-06-21 09:38:26 +00:00
Madhu Rajanna
9fc0999a82 rbd: check stdErr for does not have a parent error
When we try to add a task to flatten the rbd image, the actual
error is present in stdErr, not in the returned error. This
commit corrects the error checking when the image does
not have a parent.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 05b8433b89)
2021-06-18 12:48:50 +00:00
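
A rough sketch of the check described in 9fc0999a82, assuming the command runner hands back both the captured stderr text and an error. The helper name `isNoParentError`, the `errCommandFailed` value and the exact message wording (taken from the commit title) are assumptions made for illustration.

```go
package main

import (
	"errors"
	"strings"
)

// noParentMessage is the text searched for in stderr. The wording is
// assumed from the commit title, not copied from the rbd tooling.
const noParentMessage = "does not have a parent"

// errCommandFailed is a hypothetical error returned by a command runner.
var errCommandFailed = errors.New("command failed")

// isNoParentError reports whether a failed flatten attempt failed only
// because the image has no parent. The decision is based on the stderr
// text, since the returned error does not carry the message.
func isNoParentError(stdErr string, err error) bool {
	return err != nil && strings.Contains(stdErr, noParentMessage)
}
```
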
Madhu Rajanna
d4b1e09815 rebase: update go-ceph to v0.10.0
This commit updates the go-ceph to latest
release. More details about release at
https://github.com/ceph/go-ceph/releases/tag/v0.10.0

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-06-18 07:09:17 +00:00
Humble Chirammal
65dc573302 cleanup: correct typo in travis scripts
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-06-18 03:32:30 +00:00
Humble Chirammal
9f5a2b5c8f cleanup: fix codespell error in internal/utils package
Codespell checker reports the error below:
```
Resulting CLI options  --check-filenames --check-hidden --skip .git,./vendor --ignore-words-list ExtraVersion,extraversion,ba
1
Error: ./internal/util/aws_metadata.go:96: Kubenetes ==> Kubernetes
```
This commit addresses the same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-06-18 03:32:30 +00:00
Humble Chirammal
d5576fd8ae cleanup: correct createORDeleteCephfsResources() function name
Along with correcting the name of the function, other typos are also
addressed

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-06-18 03:32:30 +00:00
Humble Chirammal
b082689f69 e2e: use proper variable name for rbd mount options
The variable naming for rbd mount options has been changed
to rbdMountOptions to be consistent with other variable naming schema

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-06-18 03:32:30 +00:00
Humble Chirammal
921975f45e e2e: correct gosec marker for credentials rule
The marker for the hardcoded credentials check was set incorrectly,
and this patch addresses the same

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-06-18 03:32:30 +00:00
Humble Chirammal
fad1b602e1 cleanup: correct createORdeleteRbdResources() function name
This patch addresses a typo in the mentioned function name

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-06-18 03:32:30 +00:00
Mohammed Naser
6b86391bb2 rbd: Backout if image features is empty
In golang world, if you split an empty string that does not contain
the separator, you get a slice with one empty string. This results
in volumes failing to mount with "invalid feature " (note extra space
because it's trying to check if 'empty string' is a valid feature).

This patch checks if the string is empty, and if so, it skips
the entire validation and returns nothing.

Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>
(cherry picked from commit 671d6a7767)
2021-06-10 17:04:13 +00:00
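
The behaviour described in 6b86391bb2 is easy to reproduce with `strings.Split`. The sketch below contrasts a naive split with a guarded one; both function names are hypothetical and the real validation in ceph-csi may be structured differently.

```go
package main

import "strings"

// naiveSplit shows the pitfall: for an empty input, strings.Split
// returns a slice holding a single empty string, not an empty slice.
func naiveSplit(features string) []string {
	return strings.Split(features, ",")
}

// parseImageFeatures returns nil for an empty input so that callers
// never try to validate an empty feature name.
func parseImageFeatures(features string) []string {
	if features == "" {
		return nil
	}
	return strings.Split(features, ",")
}
```

With this guard, `parseImageFeatures("")` yields no entries, while `naiveSplit("")` yields one empty entry that would later be rejected as an invalid feature.
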
Mohammed Naser
f476a376f7 rbd: Add failing test when no features are provided
Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>
(cherry picked from commit f193ebfbb1)
2021-06-10 17:04:13 +00:00
Madhu Rajanna
f11722a472 rbd: fail fast in create volume for mismatched encryption
CreateVolume will fail in the cases below

* If the snapshot is encrypted and requested volume
is not encrypted
* If the snapshot is not encrypted and requested
volume is encrypted

* If the parent volume is encrypted and requested volume
is not encrypted
* If the parent volume is not encrypted and requested
volume is encrypted

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 7b5c78ec7c)
2021-06-07 16:32:00 +00:00
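
The four cases listed in f11722a472 reduce to one comparison: the source (snapshot or parent volume) and the requested volume must agree on encryption. A minimal sketch, with `checkEncryptionMatch` and its parameters as hypothetical names:

```go
package main

import "fmt"

// checkEncryptionMatch fails fast when the encryption setting of the
// source (a snapshot or a parent volume) differs from the one requested
// for the new volume. sourceKind is only used in the error message.
func checkEncryptionMatch(sourceKind string, sourceEncrypted, requestedEncrypted bool) error {
	if sourceEncrypted == requestedEncrypted {
		return nil
	}
	return fmt.Errorf("%s encrypted=%t but requested volume encrypted=%t",
		sourceKind, sourceEncrypted, requestedEncrypted)
}
```

Running such a check before any image is created means a mismatched request is rejected without leaving partially provisioned resources behind.
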
Niels de Vos
a0ca713d79 rbd: repair thick-provisioned images on CreateVolume restart
Signed-off-by: Niels de Vos <ndevos@redhat.com>
(cherry picked from commit 7cbad9305f)
2021-06-01 16:08:39 +00:00
Niels de Vos
58d606ab8d cleanup: split repairExistingVolume() from CreateVolume()
Move the repairing of a volume/snapshot from CreateVolume to its own
function. This reduces the complexity of the code, and makes the
procedure easier to understand. Further enhancements to repairing an
existing volume can be done in the new function.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
(cherry picked from commit 96a8ea3e88)
2021-06-01 16:08:39 +00:00
Madhu Rajanna
a884b99c6e rbd: fix image details logging
log only the required details of
the image.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 0ce6ad1152)
2021-05-07 09:25:48 +00:00
Madhu Rajanna
4bec4c1818 e2e: pvc mounting when snap and parent pvc is deleted
Added an E2E test for the following case

* Create PVC
* Create Snapshot from PVC
* Delete PVC
* Create Clone from Snapshot
* Delete Snapshot
* Mount clone to Application
* Delete Application and PVC Clone

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit fa36a46682)
2021-05-07 09:25:48 +00:00
Madhu Rajanna
5e9f007ffd rbd: flatten image if the depth is not zero
Flatten the image if the deep-flatten feature
is present on the images in the chain, or if the
depth of the image chain is not zero, as we cannot
check the deep-flatten feature of images that are
in the trash.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 67d73cd6e9)
2021-05-07 09:25:48 +00:00
Madhu Rajanna
38bd4e613e rbd: discard image not found error
For flatten we call checkImageChainHasFeature,
which internally calls getImageInfo. getImageInfo returns
the parent name even if the parent is in the trash;
when we then try to open the parent image to get its
information, it fails because the image is not found.
We should treat the error as nil if the parent is not found.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit e15e2e5081)
2021-05-07 09:25:48 +00:00
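
One way to express the rule from 38bd4e613e in Go is a small filter built on `errors.Is`. The sentinel errors and helper names below are hypothetical and only illustrate the pattern of discarding a wrapped not-found error while passing every other error through.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors for this sketch.
var (
	errImageNotFound    = errors.New("image not found")
	errPermissionDenied = errors.New("permission denied")
)

// wrapOpenError adds context the way an image-open helper might.
func wrapOpenError(image string, err error) error {
	return fmt.Errorf("failed to open %q: %w", image, err)
}

// ignoreNotFound turns a (possibly wrapped) not-found error into nil
// and returns every other error unchanged.
func ignoreNotFound(err error) error {
	if errors.Is(err, errImageNotFound) {
		return nil
	}
	return err
}
```
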
Madhu Rajanna
75fa1927fc rbd: mark image ready when image state is up+unknown
To recover from the split-brain (up+error) state, the image needs to be
demoted and a resync requested on site-a, and then the image on site-b
should get demoted. The volume should be marked ready=true when the
image state on both clusters is up+unknown, because during the last
snapshot sync the data gets copied first and then the image state on
site-a changes to up+unknown.

If the image state on both sites is up+unknown, consider the
data completely synced, as the last snapshot
gets exchanged between the clusters.

* Create a 10 GB file and validate the data after resync

* Do Failover when the site-a goes down
* Force promote the image and write data in GiB
* Once the site-a comes back, Demote the image and issue resync
* Demote the image on site-b
* The status will get reflected on the other site when the last
  snapshot sync happens
* The image will go to the up+unknown state, and the complete data will
  be copied to site-a
* Promote the image on site-a and use it

```bash
csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006:
  global_id:   e7f9ec55-06ab-46cb-a1ae-784be75ed96d
  state:       up+unknown
  description: remote image demoted
  service:     a on minicluster1
  last_update: 2021-04-28 07:11:56
  peer_sites:
    name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
    state: up+unknown
    description: remote image demoted
    last_update: 2021-04-28 07:11:41
```

* Do Failover when the site-a goes down
* Force promote the image on site-b and write data in GiB
* Demote the image on site-b
* Once the site-a comes back, Demote the image on site-a
* The images on both sites will go into the split-brain state

```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
  global_id:   115c3df9-3d4f-4c04-93a7-531b82155ddf
  state:       up+error
  description: split-brain
  service:     a on minicluster2
  last_update: 2021-04-28 07:25:41
  peer_sites:
    name: abbda0f0-0117-4425-8cb2-deb4c853da47-rook-ceph
    state: up+error
    description: split-brain
    last_update: 2021-04-28 07:25:26
```
* Issue resync
* The images cannot be resynced because when we issue resync
  on site-a the image on site-b was in demoted state
* To recover from this state, promote and then demote the
  image on site-b after some time

```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
  global_id:   115c3df9-3d4f-4c04-93a7-531b82155ddf
  state:       up+unknown
  description: remote image demoted
  service:     a on minicluster1
  last_update: 2021-04-28 07:32:56
  peer_sites:
    name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
    state: up+unknown
    description: remote image demoted
    last_update: 2021-04-28 07:32:41
```
* Once the data is copied we can see that the image state
  has moved to up+unknown on both sites
* Promote the image on site-a and use it

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 07a916b84d)
2021-05-05 15:07:18 +00:00
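
The readiness rule from 75fa1927fc can be sketched as a predicate over the local and peer mirror states shown in the status output of that commit. The `siteStatus` type and the `resyncComplete` function are hypothetical; requiring at least one peer is an additional assumption of this sketch.

```go
package main

// stateUpUnknown is the mirror state reported once both images are demoted.
const stateUpUnknown = "up+unknown"

// siteStatus is a hypothetical, reduced view of one site's mirror status.
type siteStatus struct {
	State       string
	Description string
}

// resyncComplete reports whether the volume can be marked ready: the
// local image and every peer must report up+unknown. An empty peer list
// is treated as "not ready" (an assumption of this sketch).
func resyncComplete(local siteStatus, peers []siteStatus) bool {
	if local.State != stateUpUnknown || len(peers) == 0 {
		return false
	}
	for _, peer := range peers {
		if peer.State != stateUpUnknown {
			return false
		}
	}
	return true
}
```

A split-brain status (`up+error` on either side) therefore never marks the volume ready.
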
Madhu Rajanna
1c59f0683e rbd: delete encryption key from KMS
When a snapshot is encrypted during a CreateSnapshot
operation, the encryption key gets created in the KMS.
When we delete the snapshot, the key should also
get deleted from the KMS.

When we create a volume from a snapshot we copy the
required information, but we missed copying the
encryption information. This commit adds the missing
information needed to delete the encryption key.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit c3bae17fce)
2021-04-30 09:37:23 +00:00
Madhu Rajanna
f547f76315 revert: deploy: update templates for v3.3.1
This reverts commit a07260f191,
which had template changes for the v3.3.1 release.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-22 17:09:38 +05:30
Madhu Rajanna
a07260f191 deploy: update templates for v3.3.1
updated required templates for v3.3.1
release.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-22 13:35:31 +05:30
Humble Chirammal
1367cb445f rbd: return crypt error for the rpc return
At present we return the volume connect error if the clone
from snapshot fails when rbdvolume is encrypted, which is incorrect.
This patch correctly returns the failed copy encryption error to the
caller

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
(cherry picked from commit 798437d0c4)
2021-04-22 12:55:50 +05:30
Madhu Rajanna
76fb7f6441 build: remove helm init from deploy.sh
Since helm v3.x there is no helm init
command. Remove the helm init call, which was causing
helm chart push failures in the release and devel
branches.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 6508726276)
2021-04-22 12:32:05 +05:30
Madhu Rajanna
969d3796fa build: install helm version from build.env
Install the helm package based on the version
specified in the build.env

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit aa77b677a3)
2021-04-22 12:32:05 +05:30
Madhu Rajanna
599f3fd8e4 rbd: modified logic to check image watchers
Before the RBD map operation, we check the
watchers on the RBD image. In the case of an
RWO volume, cephcsi makes sure only one
client is using the RBD image. If the rbd
image is mirrored, the mirroring daemon
adds a watcher on the image by default,
and because we are using go-ceph, another
watcher is added when we open the image. So
we will have two watchers on an image if
mirroring is enabled. This holds when the
rbd mirror daemon is running. If the
mirror daemon is not running, there will
be only one watcher on the rbd image
(which is placed by the go-ceph image open).
We should not block the map operation if
the mirroring daemon is not running, as
mirroring is asynchronous. This commit adds a
check to make sure there are no more than 2 watchers
if the image is mirrored, or no more than 1
watcher if it is not a mirrored image.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 52290333e6)
2021-04-20 11:54:30 +05:30
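
The watcher limit described in 599f3fd8e4 can be written as a small threshold check. This is a sketch under the assumptions stated in that commit message (one watcher from the go-ceph image open, one more from the mirror daemon when it is running); `maxAllowedWatchers` and `imageInUse` are hypothetical names.

```go
package main

// maxAllowedWatchers returns the number of watchers that do not indicate
// another client: one for our own image open, plus one for the mirror
// daemon when the image is mirrored.
func maxAllowedWatchers(mirrored bool) int {
	if mirrored {
		return 2
	}
	return 1
}

// imageInUse reports whether the watcher count exceeds the allowed limit.
// Using an upper bound, not an exact match, keeps the map operation
// unblocked when the mirror daemon is not running.
func imageInUse(watchers int, mirrored bool) bool {
	return watchers > maxAllowedWatchers(mirrored)
}
```
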
Madhu Rajanna
c0533d1b17 revert: update templates for v3.3.0 release
This commit reverts back the changes done
for v3.3.0 release. With this change a
release canary tagged image and helm charts
will get pushed.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-16 14:37:11 +05:30
Madhu Rajanna
8122750c58 build: update required files for release-v3.3
updated the required templates and upgrade
document for release 3.3

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-15 19:06:49 +05:30
Madhu Rajanna
eea52847bc rbd: check volumeID in PV if image not found
If the pool or a few keys are missing in the omap, the
GetImageAttributes function returns a nil error and a few
empty items in the imageAttributes struct. If the image is not
found and the entries are missing, use
the volumeID present in the PV annotation for further operations.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-15 17:13:06 +05:30
Madhu Rajanna
cfc88c9910 rbd: discard up+unknown state in ResyncVolume
If the image is promoted and then demoted, the
image state will be set to up+unknown if the image
on the remote cluster is still in demoted state.

When the user changes the state from primary to secondary
and the image is still in demoted (secondary) state
in the remote cluster, the image state on both clusters
will be up+unknown.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-15 17:13:06 +05:30
Rakshith R
31634ede3d cleanup: update mergify.yml to use merge_bot_account option
New version of mergifyio requires the use of `merge_bot_account`
instead of the `bot_account` configuration option.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-04-15 12:00:45 +05:30
Rakshith R
3795704340 ci: update feature gates setting from minikube.sh
BlockVolume, CSIBlockVolume(GA since k8s v1.18) & VolumeSnapshotDataSource
(GA since k8s v1.20) default to true and don't need to be set to true in
feature gates setting.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-04-15 05:27:16 +00:00
Niels de Vos
8b8480017b logging: report issues in rbdImage.DEKStore API with stacks
It helps to get a stack trace when debugging issues. Certain things are
considered bugs in the code (like missing attributes in a struct), and
might cause a panic on certain occasions.

In this case, a missing string will not panic, but the behaviour will
also not be correct (DEKs getting encrypted, but unable to decrypt).
Clearly logging this as a BUG is probably better than calling panic().

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-04-14 03:59:28 +00:00
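
The idea in 8b8480017b, reporting a bug with a stack trace instead of calling panic(), can be sketched with `runtime/debug.Stack` from the Go standard library. The `bugReport` helper is a hypothetical name, and the real change may use the project's own logging helpers instead.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// bugReport formats a message marked as a BUG, followed by the stack of
// the calling goroutine, so the caller can log it and continue instead
// of panicking.
func bugReport(msg string) string {
	return fmt.Sprintf("BUG: %s\n%s", msg, debug.Stack())
}
```

The returned string can be passed to any logger; execution continues after the report is written.
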
Niels de Vos
35d58a7d5a e2e: only test a single encrypted clone/snapshot
The default number for cloning and snapshot/restore is 10 volumes. This
adds to the time the test suite runs. There is no need to validate 10
copies of the encrypted volume; a single copy is sufficient.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-04-14 03:59:28 +00:00