Commit Graph

3365 Commits

Author SHA1 Message Date
Yati Padia
774e8e4042 util: enable golang profiling
Add support for golang profiling.
Standard tools like go tool pprof and curl
work. example:
$ go tool pprof http://localhost:8080/debug/pprof/profile
$ go tool pprof http://localhost:8080/debug/pprof/heap
$ curl http://localhost:8080/debug/pprof/heap?debug=1

https://golang.org/pkg/net/http/pprof/ contains
more details about the pprof interface.

Fixes: #1699

Signed-off-by: Yati Padia <ypadia@redhat.com>
2021-05-25 10:41:22 +00:00
Humble Chirammal
9aa3520c9d build: update go version to 1.16 in go.mod
Make go version latest in the repo

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-05-25 09:03:52 +00:00
Niels de Vos
2b9f6c3598 e2e: fetch volume metrics from Kubelet
Test if metrics are available at all. The actual values are a little
difficult to validate.

BlockMode volumes support metrics since Kubernetes 1.22.

See-also: kubernetes/kubernetes#97972
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-25 06:41:04 +00:00
Niels de Vos
25d0a1cfc0 rbd: add support for block-devices in NodeGetVolumeStats()
The NodeGetVolumeStats procedure can now be used to fetch the capacity
of the RBD block-device. By default this is a thin-provisioned device,
which means that the capacity is not reserved in the Ceph cluster. This
makes it possible to over-provision the cluster.

In order to detect the amount of storage used by the RBD block-device
(when thin-provisioned), it is required to connect to the Ceph cluster.
Unfortunately, the NodeGetVolumeStats CSI procedure does not provide
enough parameters to connect to the Ceph cluster and fetch more details
about the RBD image.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-25 06:41:04 +00:00
Niels de Vos
c0ab4c03e6 cephfs: move NodeGetVolumeStats() to CephFS NodeServer
The CephFS NodeServer should handle the CephFS specific requests. This
is not something that the NodeServer for RBD should handle.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-25 06:41:04 +00:00
Rakshith R
a4e4750fdc deploy: disable mon,mgr and mds liveness probe
This commit disables mon,mgr and mds liveness probe
which on failing caused `crashLoopBackOff` state.

Updates: #2094

Signed-off-by: Rakshith R <rar@redhat.com>
2021-05-24 16:12:20 +00:00
Humble Chirammal
151b8c665f e2e: update upgrade-version to v3.3.1
making default version of upgrade test to v3.3.1

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-05-24 16:12:20 +00:00
Humble Chirammal
d56978739f deploy: update Rook version to v1.6.2
Rook v1.6.2 is available and this patch updates the version to the
same:

https://github.com/rook/rook/releases/tag/v1.6.2

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-05-24 16:12:20 +00:00
Rakshith R
5cf3b4ab80 cleanup: update mergify.yml to remove bot_account option
Mergify.io has removed bot_account from its free open source plan.
This commit removes bot_account option from comment, merge and rebase
actions default and documenting the implications going forward.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-05-20 13:38:00 +05:30
Niels de Vos
150de619ee ci: have mergify set the component/build label on PRs
Quite a few PRs have the `build:` prefix. It would be good to teach
Mergify to set the label for those.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-17 09:56:56 +05:30
Madhu Rajanna
5ccdb35da0 doc: update readme for latest releases
as we have done new releases for v3.3.x and
v3.2.x updating the readme for the same.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-05-10 09:42:18 +00:00
Madhu Rajanna
0ce6ad1152 rbd: fix image details logging
log only the required details of
the image.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-05-07 07:57:37 +00:00
Madhu Rajanna
fa36a46682 e2e: pvc mounting when snap and parent pvc is deleted
Added an E2E test to test below case

* Create PVC
* Create Snapshot from PVC
* Delete PVC
* Create Clone from Snapshot
* Delete Snapshot
* Mount clone to Application
* Delete Application and PVC Clone

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-05-07 07:57:37 +00:00
Madhu Rajanna
67d73cd6e9 rbd: flatten image if the depth is not zero
flatten the image if the deep-flatten feature
is present on the images in the chain or if the
images in chain is not zero, as we cannot check
the deep-flatten feature the images which are
in trash.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-05-07 07:57:37 +00:00
Madhu Rajanna
e15e2e5081 rbd: discard image not found error
For flatten we call checkImageChainHasFeature
which internally calls to getImageInfo returns
the parent name even if the parent is in the trash,
when we try to open the parent image to get its
information it fails as the image not found.
we should treat error as nil if the parent is not found.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-05-07 07:57:37 +00:00
Niels de Vos
86cfc3dd0c ci: update mergify config for labelling PRs
When the configuration for Mergify itself gets updated, there is not
need to run the e2e tests.

It seems that the `(ci)` part of matching a title for ci/testing PRs
would match partial words like 'capacity'. This is not intended, so
rephrasing the regex and adding `e2e` as match too.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-07 07:43:40 +05:30
Niels de Vos
f11a041f56 cleanup: address gosec complaint about creating a file
The new gosec 2.7.0 complains like:

    G304 (CWE-22): Potential file inclusion via variable (Confidence: HIGH, Severity: MEDIUM)

Updates: #2025
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-05 16:05:23 +00:00
Madhu Rajanna
07a916b84d rbd: mark image ready when image state is up+unknown
To recover from split brain (up+error) state the image need to be
demoted and requested for resync on site-a and then the image on site-b
should gets demoted.The volume should be marked to ready=true when the
image state on both the clusters are up+unknown because during the last
snapshot syncing the data gets copied first and then image state on the
site-a changes to up+unknown.

If the image state on both the sites are up+unknown consider that
complete data is synced as the last snapshot
gets exchanged between the clusters.

* create 10 GB of file and validate the data after resync

* Do Failover when the site-a goes down
* Force promote the image and write data in GiB
* Once the site-a comes back, Demote the image and issue resync
* Demote the image on site-b
* The status will get reflected on the other site when the last
  snapshot sync happens
* The image will go to up+unknown state. and complete data will
  be copied to site a
* Promote the image on site-a and use it

```bash
csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006:
  global_id:   e7f9ec55-06ab-46cb-a1ae-784be75ed96d
  state:       up+unknown
  description: remote image demoted
  service:     a on minicluster1
  last_update: 2021-04-28 07:11:56
  peer_sites:
    name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
    state: up+unknown
    description: remote image demoted
    last_update: 2021-04-28 07:11:41
 ```

* Do Failover when the site-a goes down
* Force promote the image on site-b and write data in GiB
* Demote the image on site-b
* Once the site-a comes back, Demote the image on site-a
* The images on the both site will go to split brain state

```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
  global_id:   115c3df9-3d4f-4c04-93a7-531b82155ddf
  state:       up+error
  description: split-brain
  service:     a on minicluster2
  last_update: 2021-04-28 07:25:41
  peer_sites:
    name: abbda0f0-0117-4425-8cb2-deb4c853da47-rook-ceph
    state: up+error
    description: split-brain
    last_update: 2021-04-28 07:25:26
```
* Issue resync
* The images cannot be resynced because when we issue resync
  on site a the image on site-b was in demoted state
* To recover from this state (promote and then demote the
  image on site-b after sometime)

```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
  global_id:   115c3df9-3d4f-4c04-93a7-531b82155ddf
  state:       up+unknown
  description: remote image demoted
  service:     a on minicluster1
  last_update: 2021-04-28 07:32:56
  peer_sites:
    name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
    state: up+unknown
    description: remote image demoted
    last_update: 2021-04-28 07:32:41
```
* Once the data is copied we can see that  the image state
  is moved to up+unknown on both sites
* Promote the image on site-a and use it

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-05-05 13:38:29 +00:00
Niels de Vos
90fd12629f ci: use mergify to add labels to PRs
Adding labels to Pull-Requests can be done by Mergify. It is very useful
to filter PRs on their labels so that experts in certain areas can
identify PRs to review.

Adding labels is currently a complete manual process, this adds some
automation for it. There is no intention of it being complete, this is
mostly for getting started and trying things out.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-05 17:53:13 +05:30
Niels de Vos
70f4f3d5f6 ci: drop quotes for Mergify regex matching
The matching with regexes on the `base=` configuration in Mergify does
not seem to work as intended. Possibly this is due the addtional quotes
around the regex.

Fixes: 8d7c66363 (ci: apply standard Mergify rules for release branches too)
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-04 14:08:10 +05:30
Niels de Vos
8d7c66363d ci: apply standard Mergify rules for release branches too
It is not always possible to automatically backport PRs to release
branches. That also means that these backport PRs are not created by the
mergify[bot] account. Because of this, it is needed to manually merge
PRs, as Mergify refuses to do it.

By changing the `base=` option to match a regular expression that
includes both `devel` and `release-*` branches, Mergify should assist
with merging PRs to release branches too.

Note that the `ci/centos` branch is different, as runs other tests than
the normal branches.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-05-04 12:05:45 +05:30
Prasanna Kumar Kalever
d59903efe1 e2e: update rook ceph cluster version
use pacific release

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-05-03 13:12:46 +00:00
Madhu Rajanna
c3bae17fce rbd: delete encryption key from KMS
when a Snapshot is encrypted during a CreateSnapshot
operation, the encryption key gets created in the KMS
when we delete the Snapshot the key from the KMS
should also gets deleted.

When we create a volume from snapshot we are copying
required information but we missed to copy the
encryption information, This commit adds the missing
information to delete the encryption key.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-30 08:05:47 +00:00
Rakshith R
e34e3c39aa helm: update external-snapshotter image to v4.0.0
update external-snapshotter image to v4.0.0.
Updating helm charts was forgotten in #1916.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-04-29 13:41:48 +00:00
Humble Chirammal
9dc2b1122d doc: correct the keys in certificate secrets
At present the cert keys are not unique which is not correct.
The keys in the secret should be unique and this patch address
the same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-04-29 08:51:29 +00:00
Humble Chirammal
074c937a08 cleanup: correct typo in vault_tokens.go
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-04-29 08:51:29 +00:00
Prasanna Kumar Kalever
8fafc4256e build: use latest pacific release of ceph
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2021-04-27 13:41:57 +00:00
Mudit Agarwal
ec105bd782 cephfs: expand clone error messages
Adding "snapshot clone" in the clone error messages.

Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
2021-04-26 13:38:55 +00:00
Niels de Vos
c932176802 testing: enable nodeExpansion for k8s-storage rbd-rwo verification
Updates: #2015
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-04-26 05:49:39 +00:00
Matthias Neugebauer
3505731c42 Replace deprecated GitVersion with Version
This replaces the deprecated `GitVersion` with `Version`.

See a499b4b179/pkg/chartutil/capabilities.go (L71-L74)

Signed-off-by: Matthias Neugebauer <matthias.neugebauer@uni-muenster.de>
2021-04-26 04:17:13 +00:00
Rakshith R
e005099549 cleanup: fix author in mergifyio backport conditions
All backport prs are authored by mergify[bot] not ceph-csi-bot
and there is no support for bot_account to create backport pr
currently.
Fixes: #1994

Signed-off-by: Rakshith R <rar@redhat.com>
2021-04-22 15:03:32 +05:30
Madhu Rajanna
6508726276 build: remove helm init from deploy.sh
from helm v3.x version there is no helm init
command. Removing the helm init which was causing
helm chart pushing issue in release and devel
branch.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-22 06:26:22 +00:00
Madhu Rajanna
aa77b677a3 build: install helm version from build.env
Install the helm package based on the version
specified in the build.env

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-22 06:26:22 +00:00
Humble Chirammal
798437d0c4 rbd: return crypt error for the rpc return
At present we return the volume connect error if the clone
from snapshot fails when rbdvolume is encrypted, which is incorrect.
This patch correctly return the failed copy encryption error to the
caller

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-04-21 16:10:20 +00:00
Humble Chirammal
0166817de4 cleanup: correct typo in travis scripts
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-04-21 09:22:30 +00:00
Madhu Rajanna
029b5004aa doc: update upgrade doc for v3.3.0
As we have v3.3 as the latest release
updating the upgrade doc in the devel
branch to point to the same.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-21 06:39:07 +00:00
Madhu Rajanna
52290333e6 rbd: modified logic to check image watchers
Before RBD map operation, we do check the
watchers on the RBD image. In the case of
RWO volume. cephcsi makes sure only one
client is using the RBD image. If the rbd
image is mirrored, by default mirroring
daemon will add a watcher on the image
and as we are using go-ceph a watcher will
be added as we have opened the image So
we will have two watchers on an image if
mirroring is enabled. This holds when the
rbd mirror daemon is running, In case if
the mirror daemon is not running there will
be only one watcher on the rbd image
(which is placed by go-ceph image open)
we should not block the map operation if
the mirroring daemon is not running as
its Async mirroring. This commit adds a
check to make sure no more than 2 watchers
if the image is mirrored or no more than 1
watcher if it is not mirrored image.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-19 16:30:55 +00:00
Niels de Vos
27247d1444 e2e: remove losetup workaround for minikube
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-04-19 13:30:57 +00:00
Niels de Vos
e5787f24b0 rebase: update minikube to version 1.19.0
See-also: https://github.com/kubernetes/minikube/releases/tag/v1.19.0
Fixes: #1972
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2021-04-19 11:15:57 +00:00
Yug
6a46f381c2 cleanup: update description to generic
Since rbdImage is a common struct for
rbdVolume and rbdSnapshot, it description
was matching to only snapshot.
This commit makes the comments generic for
both volumes and snapshots.

Signed-off-by: Yug <yuggupta27@gmail.com>
2021-04-19 07:32:35 +00:00
Rakshith R
9f2cf498b6 cephfs: enable ceph-fuse big_writes by default
By default, the write buffer size in libfuse2 is 2KiB
`fuse_big_writes = true` option is used to override this limit.
This commit makes `fuse_big_writes = true` option as default
in ceph.conf.

Closes: #1928

Signed-off-by: Rakshith R <rar@redhat.com>
2021-04-19 07:08:57 +00:00
Humble Chirammal
54845b63c0 cleanup: better or corrected variable name in grpc prometheous code
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-04-16 10:22:35 +00:00
Humble Chirammal
0fae0e53b6 cleanup: various source code comment corrections
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2021-04-16 10:22:35 +00:00
Madhu Rajanna
b0c9b3e752 doc: rename master to devel
renamed master to devel in image
compatibility list.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-16 06:10:46 +00:00
Madhu Rajanna
6c21be278a doc: update readme for v3.3.0 release
updated readme with available released
image tags.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-16 06:10:46 +00:00
Madhu Rajanna
a3b3858a97 ci: add mergify rules for release 3.3
as we have a new release-v3.3 branch. adding
the mergify rules for auto merge and auto backport
based on backport-to-release-v3.3 label.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-15 17:39:49 +05:30
Madhu Rajanna
d94a7ca7e1 revert: cleanup: update mergify.yml to use merge_bot_account option
This reverts commit 31634ede3d.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-15 17:34:08 +05:30
Madhu Rajanna
eea52847bc rbd: check volumeID in PV if image not found
If the pool or few keys are missing in the omap.
GetImageAttributes function returns nil error message and few
empty items in imageAttributes struct. if the image is not
found and  the entiries are missing use
the volumeId present on the PV annotation for further operations.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-15 17:13:06 +05:30
Madhu Rajanna
cfc88c9910 rbd: discard up+unknown state in ResyncVolume
incase if the image is promoted and demoted the
image state will be set to up+unknown if the image
on the remote cluster is still in demoted state.

when user changes the state from primary to secondary
and still the image is in demoted (secondary) state
in the remote cluster. the image state on both the cluster
will be on unknown state.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2021-04-15 17:13:06 +05:30
Rakshith R
31634ede3d cleanup: update mergify.yml to use merge_bot_account option
New version of mergifyio requires the use `merge_bot_account`
instead of `bot_accout` configuration option.

Signed-off-by: Rakshith R <rar@redhat.com>
2021-04-15 12:00:45 +05:30