This commit disables mon,mgr and mds liveness probe
which on failing caused `crashLoopBackOff` state.
Updates: #2094
Signed-off-by: Rakshith R <rar@redhat.com>
Mergify.io has removed bot_account from its free open source plan.
This commit removes bot_account option from comment, merge and rebase
actions default and documenting the implications going forward.
Signed-off-by: Rakshith R <rar@redhat.com>
Quite a few PRs have the `build:` prefix. It would be good to teach
Mergify to set the label for those.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Added an E2E test to test below case
* Create PVC
* Create Snapshot from PVC
* Delete PVC
* Create Clone from Snapshot
* Delete Snapshot
* Mount clone to Application
* Delete Application and PVC Clone
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
flatten the image if the deep-flatten feature
is present on the images in the chain or if the
images in chain is not zero, as we cannot check
the deep-flatten feature the images which are
in trash.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
For flatten we call checkImageChainHasFeature
which internally calls to getImageInfo returns
the parent name even if the parent is in the trash,
when we try to open the parent image to get its
information it fails as the image not found.
we should treat error as nil if the parent is not found.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When the configuration for Mergify itself gets updated, there is not
need to run the e2e tests.
It seems that the `(ci)` part of matching a title for ci/testing PRs
would match partial words like 'capacity'. This is not intended, so
rephrasing the regex and adding `e2e` as match too.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The new gosec 2.7.0 complains like:
G304 (CWE-22): Potential file inclusion via variable (Confidence: HIGH, Severity: MEDIUM)
Updates: #2025
Signed-off-by: Niels de Vos <ndevos@redhat.com>
To recover from split brain (up+error) state the image need to be
demoted and requested for resync on site-a and then the image on site-b
should gets demoted.The volume should be marked to ready=true when the
image state on both the clusters are up+unknown because during the last
snapshot syncing the data gets copied first and then image state on the
site-a changes to up+unknown.
If the image state on both the sites are up+unknown consider that
complete data is synced as the last snapshot
gets exchanged between the clusters.
* create 10 GB of file and validate the data after resync
* Do Failover when the site-a goes down
* Force promote the image and write data in GiB
* Once the site-a comes back, Demote the image and issue resync
* Demote the image on site-b
* The status will get reflected on the other site when the last
snapshot sync happens
* The image will go to up+unknown state. and complete data will
be copied to site a
* Promote the image on site-a and use it
```bash
csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006:
global_id: e7f9ec55-06ab-46cb-a1ae-784be75ed96d
state: up+unknown
description: remote image demoted
service: a on minicluster1
last_update: 2021-04-28 07:11:56
peer_sites:
name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
state: up+unknown
description: remote image demoted
last_update: 2021-04-28 07:11:41
```
* Do Failover when the site-a goes down
* Force promote the image on site-b and write data in GiB
* Demote the image on site-b
* Once the site-a comes back, Demote the image on site-a
* The images on the both site will go to split brain state
```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf
state: up+error
description: split-brain
service: a on minicluster2
last_update: 2021-04-28 07:25:41
peer_sites:
name: abbda0f0-0117-4425-8cb2-deb4c853da47-rook-ceph
state: up+error
description: split-brain
last_update: 2021-04-28 07:25:26
```
* Issue resync
* The images cannot be resynced because when we issue resync
on site a the image on site-b was in demoted state
* To recover from this state (promote and then demote the
image on site-b after sometime)
```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf
state: up+unknown
description: remote image demoted
service: a on minicluster1
last_update: 2021-04-28 07:32:56
peer_sites:
name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
state: up+unknown
description: remote image demoted
last_update: 2021-04-28 07:32:41
```
* Once the data is copied we can see that the image state
is moved to up+unknown on both sites
* Promote the image on site-a and use it
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Adding labels to Pull-Requests can be done by Mergify. It is very useful
to filter PRs on their labels so that experts in certain areas can
identify PRs to review.
Adding labels is currently a complete manual process, this adds some
automation for it. There is no intention of it being complete, this is
mostly for getting started and trying things out.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The matching with regexes on the `base=` configuration in Mergify does
not seem to work as intended. Possibly this is due the addtional quotes
around the regex.
Fixes: 8d7c66363 (ci: apply standard Mergify rules for release branches too)
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It is not always possible to automatically backport PRs to release
branches. That also means that these backport PRs are not created by the
mergify[bot] account. Because of this, it is needed to manually merge
PRs, as Mergify refuses to do it.
By changing the `base=` option to match a regular expression that
includes both `devel` and `release-*` branches, Mergify should assist
with merging PRs to release branches too.
Note that the `ci/centos` branch is different, as runs other tests than
the normal branches.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
when a Snapshot is encrypted during a CreateSnapshot
operation, the encryption key gets created in the KMS
when we delete the Snapshot the key from the KMS
should also gets deleted.
When we create a volume from snapshot we are copying
required information but we missed to copy the
encryption information, This commit adds the missing
information to delete the encryption key.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
At present the cert keys are not unique which is not correct.
The keys in the secret should be unique and this patch address
the same.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
All backport prs are authored by mergify[bot] not ceph-csi-bot
and there is no support for bot_account to create backport pr
currently.
Fixes: #1994
Signed-off-by: Rakshith R <rar@redhat.com>
from helm v3.x version there is no helm init
command. Removing the helm init which was causing
helm chart pushing issue in release and devel
branch.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
At present we return the volume connect error if the clone
from snapshot fails when rbdvolume is encrypted, which is incorrect.
This patch correctly return the failed copy encryption error to the
caller
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
As we have v3.3 as the latest release
updating the upgrade doc in the devel
branch to point to the same.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Before RBD map operation, we do check the
watchers on the RBD image. In the case of
RWO volume. cephcsi makes sure only one
client is using the RBD image. If the rbd
image is mirrored, by default mirroring
daemon will add a watcher on the image
and as we are using go-ceph a watcher will
be added as we have opened the image So
we will have two watchers on an image if
mirroring is enabled. This holds when the
rbd mirror daemon is running, In case if
the mirror daemon is not running there will
be only one watcher on the rbd image
(which is placed by go-ceph image open)
we should not block the map operation if
the mirroring daemon is not running as
its Async mirroring. This commit adds a
check to make sure no more than 2 watchers
if the image is mirrored or no more than 1
watcher if it is not mirrored image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Since rbdImage is a common struct for
rbdVolume and rbdSnapshot, it description
was matching to only snapshot.
This commit makes the comments generic for
both volumes and snapshots.
Signed-off-by: Yug <yuggupta27@gmail.com>
By default, the write buffer size in libfuse2 is 2KiB
`fuse_big_writes = true` option is used to override this limit.
This commit makes `fuse_big_writes = true` option as default
in ceph.conf.
Closes: #1928
Signed-off-by: Rakshith R <rar@redhat.com>
as we have a new release-v3.3 branch. adding
the mergify rules for auto merge and auto backport
based on backport-to-release-v3.3 label.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the pool or few keys are missing in the omap.
GetImageAttributes function returns nil error message and few
empty items in imageAttributes struct. if the image is not
found and the entiries are missing use
the volumeId present on the PV annotation for further operations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
incase if the image is promoted and demoted the
image state will be set to up+unknown if the image
on the remote cluster is still in demoted state.
when user changes the state from primary to secondary
and still the image is in demoted (secondary) state
in the remote cluster. the image state on both the cluster
will be on unknown state.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
BlockVolume, CSIBlockVolume(GA since k8s v1.18) & VolumeSnapshotDataSource
(GA since k8s v1.20) default to true and don't need to be set to true in
feature gates setting.
Signed-off-by: Rakshith R <rar@redhat.com>
It helps to get a stack trace when debugging issues. Certain things are
considered bugs in the code (like missing attributes in a struct), and
might cause a panic in certain occasions.
In this case, a missing string will not panic, but the behaviour will
also not be correct (DEKs getting encrypted, but unable to decrypt).
Clearly logging this as a BUG is probably better than calling panic().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The default number for cloning and snapshot/restore is 10 volumes. This
adds to the time the test suite runs. There is no need to validate 10
copies of the encrypted volume, a single copy is sufficient.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This moves validatePVCSnapshot() into its own function, so that it
follows the same format as validatePVCClone() does already.
Signed-off-by: Niels de Vos <ndevos@redhat.com>