Commit Graph

4179 Commits

Author SHA1 Message Date
Shyamsundar Ranganathan
c2280011d1 rbd: Report remote peer readiness if Up and status.Unknown
Current code uses an !A && !B condition incorrectly to
test A:Up and B:status for a remote peer image.

This should be !A || !B as we require both conditions to
be in the specified state (Up: true, and status Unknown).

This is corrected by this commit, and further fixes:
- check and return ready only when a remote site is
found in the status output
- check if all peer sites are ready, if multiple are found
and return ready appropriately

Signed-off-by: Shyamsundar Ranganathan <srangana@redhat.com>
2022-08-09 05:32:15 +00:00
Madhu Rajanna
8d7b6ee59f rbd: consider mirror deamon state for ResyncVolume
During ResyncVolume we check if the image
is in an error state, and we resync.
After resync, the image will move to
either the `Error` or the `Resyncing` state.
And if the image is in the above two
conditions, we will return a successful
response and Ready=false so that the
consumer can wait until the volume is
ready to use. If the image is in any
other state we return an error message
to indicate the syncing is not going on.
The whole resync and image state change
depends on the rbd mirror daemon. If the
mirror daemon is not running, the image
can be in Resyncing or Unknown state.
The Ramen marks the volume replication as
secondary, and once the resync starts, it
will delete the volume replication CR as a
cleanup process.

As we dont have a check for the rbd mirror
daemon, we are returning a resync success
response and Ready=false. Due to this false
response Ramen is assuming the resync started
and deleted the volume replication CR, and
because of this, the cluster goes into a bad
state and needs manual intervention.

fixes #3289

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-08-08 13:26:15 +00:00
Rakshith R
1ea4a1b790 ci: fix invalid mergifyio configuration
Comment out `comment: ` settings, since it
does not have any options set, otherwise
throws the following error.
```
The current Mergify configuration is invalid
required key not provided @ defaults → actions → comment → message
```

Signed-off-by: Rakshith R <rar@redhat.com>
2022-08-08 13:35:30 +05:30
Humble Chirammal
c9773db3f3 ci: remove check for snapshot controller installation and cleanup
At present, the check is performed to validate the version of kube
is v1.17 and this commit remove the same.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-08-05 12:02:45 +00:00
Humble Chirammal
5aabd4e1d2 deploy: remove the snapshot controller installation check
no need to have 1.17 kube version check anymore  before we install
snapshot controller.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-08-05 12:02:45 +00:00
Madhu Rajanna
297b14ed54 ci: update minikube to v1.26.1
update minikube to latest patch release.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-08-04 17:26:56 +00:00
Niels de Vos
83df1eae53 rebase: k8s.io/mount-utils/IsNotMountPoint() is deprecated
IsNotMountPoint() is deprecated and Mounter.IsMountPoint() is
recommended to be used instead.

Reported-by: golangci/staticcheck
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-04 09:53:07 +00:00
Niels de Vos
10b2277330 util: use k8s.io/mount-utils/NewWithoutSystemd() to prevent logging
NewWithoutSystemd() has been introduced in the k8s.io/mount-utils
package so that systemd is not called while executing functions. This
offers consumers the ability to prevent confusing and scary messages
from getting logged.

See-also: kubernetes/kubernetes#111218
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-04 09:53:07 +00:00
Niels de Vos
3a200b6976 rbd: use IsLikelyNotMountPoint() to prevent systemd log messages
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-04 09:53:07 +00:00
Niels de Vos
533994daff rebase: update k8s.io/mount-utils to current master
kubernetes/kubernetes#111083 has been merged and synced into
k8s.io/mount-utils. This should remove any systemd log messages while
calling NodeStageVolume and NodeGetVolumeStats.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-04 09:53:07 +00:00
Pedro Alvarez
3c3cbc8005 doc: update relative path to ceph-config.yaml file
Signed-off-by: Pedro Alvarez <pedro.alvarez@softiron.com>
2022-08-04 07:16:56 +00:00
Niels de Vos
0a173a8a9e nfs: make DeleteVolume (more) idempotent
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-03 19:43:16 +00:00
Niels de Vos
a6cd56ae7e e2e: correct failure logging for NFS
Some of the steps still refer to CephFS, likely missed some replacements
while copy/pasting. The logging is a little confusing when messages
claim something with CephFS failed, but the test is about NFS.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-08-03 19:43:16 +00:00
Yati Padia
f0074a3ebf deploy: enable HonorPVReclaimPolicy feature gate
This commit enables the HonorPVReclaimPolicy feature
gate.

fixes: #3230

Signed-off-by: Yati Padia <ypadia@redhat.com>
2022-08-03 19:43:16 +00:00
Humble Chirammal
bc9ad3d9f1 rbd: add dummy attacher implementation
previously, it was a requirement to have attacher sidecar for CSI
drivers and there had an implementation of dummy mode of operation.
However skipAttach implementation has been stabilized and the dummy
mode of operation is going to be removed from the external-attacher.
Considering this driver  work on volumeattachment objects for NBD driver
use cases, we have to implement dummy controllerpublish and unpublish
and thus keep supporting our operations even in absence of dummy mode
of operation in the sidecar.

This commit make a NOOP controller publish and unpublish for RBD driver.

CephFS driver does not require attacher and it has already been made free
from the attachment operations.

    Ref# https://github.com/ceph/ceph-csi/pull/3149
    Ref# https://github.com/kubernetes-csi/external-attacher/issues/226

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-08-03 00:25:49 +00:00
Prasanna Kumar Kalever
b4f44a43d5 doc: Add documentation about --setmetadata option
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
c0a566b5ed deploy: add setmetadata=true in the templates
setmetadata on the volume by default, otherwise e2e will fail

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
30244bf11b cephfs: snapshots honor --setmetadata option
`--setmetadata` is false by default, honoring it
will keep the metadata disabled by default

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
14d6211d6d cephfs: subvolumes honor --setmetadata option
`--setmetadata` is false by default, honoring it
will keep the metadata disabled by default

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
cf97e377fa e2e: validate clusterName metadata
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Prasanna Kumar Kalever
de7128b3a2 cephfs: Add clusterName as metadata on snapshots
Example:
sh-4.4$ ceph fs subvolume snapshot metadata ls myfs csi-vol-ba248f9e-0e75-11ed-b774-8e97192ff5ec \
			csi-snap-ce24e3bb-0e75-11ed-b774-8e97192ff5ec --group_name csi
{
    "csi.ceph.com/cluster/name": "\"K8s-cluster-1\"",
    "csi.storage.k8s.io/volumesnapshot/name": "cephfs-pvc-snapshot",
    "csi.storage.k8s.io/volumesnapshot/namespace": "rook-ceph",
    "csi.storage.k8s.io/volumesnapshotcontent/name": "snapcontent-2e89e1b2-e6e9-48fe-b365-edb493d7022e"
}

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-08-01 07:15:29 +00:00
Anthony D'Atri
56d7d3cd15 doc: minor cleanup
Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
2022-08-01 04:29:34 +00:00
dependabot[bot]
f4d6e51c4b rebase: bump k8s.io/kubernetes from 1.24.2 to 1.24.3
Bumps [k8s.io/kubernetes](https://github.com/kubernetes/kubernetes) from 1.24.2 to 1.24.3.
- [Release notes](https://github.com/kubernetes/kubernetes/releases)
- [Commits](https://github.com/kubernetes/kubernetes/compare/v1.24.2...v1.24.3)

---
updated-dependencies:
- dependency-name: k8s.io/kubernetes
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-30 15:24:00 +00:00
dependabot[bot]
48dc0c95a6 rebase: bump github.com/aws/aws-sdk-go from 1.44.28 to 1.44.62
Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.44.28 to 1.44.62.
- [Release notes](https://github.com/aws/aws-sdk-go/releases)
- [Changelog](https://github.com/aws/aws-sdk-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go/compare/v1.44.28...v1.44.62)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-30 01:38:31 +00:00
Niels de Vos
a04a0ecc9f ci: retry command in Pod on "unable to upgrade connection" error
Sometimes executing a command in a Pod fails with "unable to upgrade
connection". This is most likely a temporary situation, and retrying
hopefully reduces the number of spurious failures because of it.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-29 16:39:26 +00:00
Prasanna Kumar Kalever
856d7c264c cephfs: handle metadata op-failures with unsupported ceph versions
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
c32e71b31c e2e: CephFS validate restore and clone metadata
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
eb55096ebd e2e: add test case for snapshot metadata validation
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
f84265fdf5 deploy: add --extra-create-metadata arg to csi-snapshotter sidecar
This argument in csi-snapshotter sidecar allows us to receive
snapshot-name/snapshot-namespace/snapshotcontent-name metadata in the
CreateSnapshot() request.

For ex:

csi.storage.k8s.io/volumesnapshot/name
csi.storage.k8s.io/volumesnapshot/namespace
csi.storage.k8s.io/volumesnapshotcontent/name

This is a useful information which can be used depend on the use case we
have at our driver. The features like adding metadata to snapshot image
can consume this based on the need.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
5f36f7e8bd cephfs: update subvolume snapshot metadata if snapshot already exists.
Make sure to set metadata when subvolume snapshot exist, i.e. if the
provisioner pod is restarted while createSnapShot is in progress, say it
created the subvolume snapshot but didn't yet set the metadata.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
7c9259a45e cephfs: set metadata on the subvolume snapshot on create
Set snapshot-name/snapshot-namespace/snapshotcontent-name details
on subvolume snapshots as metadata on create.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Prasanna Kumar Kalever
8c0dd482fa cephfs: add set/Remove subvolume snapshot metadata utility functions
Add utility functions to set/Remove
snapshot-name/snapshot-namespace/snapshotcontent-name metadata on
subvolume snapshots.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 19:37:23 +00:00
Humble Chirammal
76ddf8e306 deploy: introduce new log level for sidecar controllers
At present we have single log level configuration for all the containers
running for our CSI pods, which has been defaulted to log Level 5.
However this cause many logs to be spitted in a cluster and cause log
spamming to an extent. This commit introduce one more log level control
for CSI pods called sidecarLogLevel which defaults to log Level 1.

The sidecar controllers like snapshotter, resizer, attacher..etc has
been configured with this new log level and driver pods are with old
configruation value.

This allow us to have different configuration options for sidecar
constrollers and driver pods.

With this, we will also have a choice of different configuation setting
instead of locking onto one variable for the containers deployed via CSI driver.

To summarize the CSI containers maintained by Ceph CSI driver has log
level 5 and controllers/sidecars not maintained by Ceph CSI driver has
log level 1 configuration.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-07-28 08:31:37 +00:00
Prasanna Kumar Kalever
51099d60fe cephfs: handle metadata op-failures with unsupported ceph versions
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
2390a43415 e2e: add tests to validate cluster name
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
3ddb8c289c doc: add documentation about --clustername option
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
cc9e8aa7b6 deploy: add cluster name in the templates
added in helm charts which should help users.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
11d51ed9b0 cephfs: unset cluster Name metadata
unsets the cluster name metadata key and value on the subvolume

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
21d811096b cephfs: set cluster Name as metadata on the subvolume
This change helps read the cluster name from the cmdline args,
the provisioner will set the same on the subvolume.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
25ce21f496 e2e: add test cases for subvolume metadata validation
create a PVC and check PVC/PV metadata on cephFS subvolume

Fixes: #2875
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
466bdf97b2 cephfs: set metadata on restart of provisioner pod
Make sure to set metadata when subvolume exist, i.e. if the provisioner pod
is restarted while createVolume is in progress, say it created the subvolume
but didn't yet set the metadata.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
6bcb8ecc68 cephfs: set PV/PVC details on the subvolume as metadata on create
This helps Monitoring solutions without access to Kubernetes clusters to
display the details of the PV/PVC/NameSpace in their dashboard.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
ecf03eb6ae cephfs: add set/Get/List/Remove metadata utility functions
Add utility functions to set/Get/List/Remove PV/PVC/PVCNamespace metadata
on subvolume.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Madhu Rajanna
8c5563a9bc rbd: remove checkHealthyPrimary check
After Failover of workloads to the secondary
cluster when the primary cluster is down,
RBD Image is not marked healthy, and VR
resources are not promoted to the Primary,
In VolumeReplication, the `CURRENT STATE`
remains Unknown and doesn't change to Primary.

This happens because the primary cluster went down,
and we have force promoted the image on the
secondary cluster. and the image stays in
up+stopping_replay or could be any other states.
Currently assumption was that the image will
always be `up+stopped`. But the image will be in
`up+stopped` only for planned failover and it
could be in any other state if its a forced
failover. For this reason, removing
checkHealthyPrimary from the PromoteVolume RPC call.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-27 09:04:27 +00:00
dependabot[bot]
33d4d54dbe rebase: bump google.golang.org/grpc from 1.47.0 to 1.48.0
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.47.0 to 1.48.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.47.0...v1.48.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-27 04:04:54 +00:00
Niels de Vos
04889e66db ci: verify that Ceph Mgr is running
The Ceph v17.2.2 container-image fails to start Ceph Mgr. This causes
issues while the e2e test suite is running. It is better to check if
Ceph Mgr is available, before continuing with the rest of the CI job.

Updates: #3259
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-26 12:47:51 +00:00
Madhu Rajanna
3ddec80346 ci: update mergify rules for kubernetes 1.24
Updating mergify rules to consider CI run on
Kubernetes 1.24 and discard CI run on kubernetes
1.21 as we no longer need it.

updates: #3086

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-25 09:37:02 +02:00
Madhu Rajanna
8de063394b e2e: add deadcode nolint for k8sVersionGreaterEquals
k8sVersionGreaterEquals is not used anywhere but it
will be used in future if we need to have a kubernetes
version check. adding nolint for it now to skip it
from static check.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-25 07:35:41 +00:00
Madhu Rajanna
efabe70a46 e2e: remove kubernetes 1.22 check
We run CI jobs on kubernetes 1.22 by default
and we dont need to have a check to make sure
we have atleast Kubernetes 1.22 for few tests.
As we have CI runs on 1.22 by default, Removing
unwanted check.

updates: #3086
depends-on #3255

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-25 07:35:41 +00:00
Niels de Vos
011d4fc81c cleanup: create k8s.io/mount-utils Mounter only once
Recently the k8s.io/mount-utils package added more runtime dectection.
When creating a new Mounter, the detect is run every time. This is
unfortunate, as it logs a message like the following:

```
mount_linux.go:283] Detected umount with safe 'not mounted' behavior
```

This message might be useful, so it probably good to keep it.

In Ceph-CSI there are various locations where Mounter instances are
created. Moving that to the DefaultNodeServer type reduces it to a
single place. Some utility functions need to accept the additional
parameter too, so that has been modified as well.

See-also: kubernetes/kubernetes#109676
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-21 07:14:43 +00:00