Commit Graph

4246 Commits

Author SHA1 Message Date
Prasanna Kumar Kalever
51099d60fe cephfs: handle metadata op-failures with unsupported ceph versions
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
2390a43415 e2e: add tests to validate cluster name
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
3ddb8c289c doc: add documentation about --clustername option
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
cc9e8aa7b6 deploy: add cluster name in the templates
added in helm charts which should help users.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
11d51ed9b0 cephfs: unset cluster Name metadata
unsets the cluster name metadata key and value on the subvolume

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
21d811096b cephfs: set cluster Name as metadata on the subvolume
This change helps read the cluster name from the cmdline args,
the provisioner will set the same on the subvolume.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
25ce21f496 e2e: add test cases for subvolume metadata validation
create a PVC and check PVC/PV metadata on cephFS subvolume

Fixes: #2875
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
466bdf97b2 cephfs: set metadata on restart of provisioner pod
Make sure to set metadata when subvolume exist, i.e. if the provisioner pod
is restarted while createVolume is in progress, say it created the subvolume
but didn't yet set the metadata.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
6bcb8ecc68 cephfs: set PV/PVC details on the subvolume as metadata on create
This helps Monitoring solutions without access to Kubernetes clusters to
display the details of the PV/PVC/NameSpace in their dashboard.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Prasanna Kumar Kalever
ecf03eb6ae cephfs: add set/Get/List/Remove metadata utility functions
Add utility functions to set/Get/List/Remove PV/PVC/PVCNamespace metadata
on subvolume.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-07-28 04:07:52 +00:00
Madhu Rajanna
8c5563a9bc rbd: remove checkHealthyPrimary check
After Failover of workloads to the secondary
cluster when the primary cluster is down,
RBD Image is not marked healthy, and VR
resources are not promoted to the Primary,
In VolumeReplication, the `CURRENT STATE`
remains Unknown and doesn't change to Primary.

This happens because the primary cluster went down,
and we have force promoted the image on the
secondary cluster. and the image stays in
up+stopping_replay or could be any other states.
Currently assumption was that the image will
always be `up+stopped`. But the image will be in
`up+stopped` only for planned failover and it
could be in any other state if its a forced
failover. For this reason, removing
checkHealthyPrimary from the PromoteVolume RPC call.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-27 09:04:27 +00:00
dependabot[bot]
33d4d54dbe rebase: bump google.golang.org/grpc from 1.47.0 to 1.48.0
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.47.0 to 1.48.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.47.0...v1.48.0)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-27 04:04:54 +00:00
Niels de Vos
04889e66db ci: verify that Ceph Mgr is running
The Ceph v17.2.2 container-image fails to start Ceph Mgr. This causes
issues while the e2e test suite is running. It is better to check if
Ceph Mgr is available, before continuing with the rest of the CI job.

Updates: #3259
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-26 12:47:51 +00:00
Madhu Rajanna
3ddec80346 ci: update mergify rules for kubernetes 1.24
Updating mergify rules to consider CI run on
Kubernetes 1.24 and discard CI run on kubernetes
1.21 as we no longer need it.

updates: #3086

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-25 09:37:02 +02:00
Madhu Rajanna
8de063394b e2e: add deadcode nolint for k8sVersionGreaterEquals
k8sVersionGreaterEquals is not used anywhere but it
will be used in future if we need to have a kubernetes
version check. adding nolint for it now to skip it
from static check.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-25 07:35:41 +00:00
Madhu Rajanna
efabe70a46 e2e: remove kubernetes 1.22 check
We run CI jobs on kubernetes 1.22 by default
and we dont need to have a check to make sure
we have atleast Kubernetes 1.22 for few tests.
As we have CI runs on 1.22 by default, Removing
unwanted check.

updates: #3086
depends-on #3255

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-25 07:35:41 +00:00
Niels de Vos
011d4fc81c cleanup: create k8s.io/mount-utils Mounter only once
Recently the k8s.io/mount-utils package added more runtime dectection.
When creating a new Mounter, the detect is run every time. This is
unfortunate, as it logs a message like the following:

```
mount_linux.go:283] Detected umount with safe 'not mounted' behavior
```

This message might be useful, so it probably good to keep it.

In Ceph-CSI there are various locations where Mounter instances are
created. Moving that to the DefaultNodeServer type reduces it to a
single place. Some utility functions need to accept the additional
parameter too, so that has been modified as well.

See-also: kubernetes/kubernetes#109676
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-21 07:14:43 +00:00
Rakshith R
5ed305850f build: resolve a fixme and disable tcmu repo
Cmd to disable apache arrow repo is removed, since
it is no longer needed.
Cmd to disable tcmu repo is added to make build pass.

refer: https://github.com/ceph/ceph-container/issues/2034

Signed-off-by: Rakshith R <rar@redhat.com>
2022-07-20 09:29:35 +00:00
Yati Padia
b0b0e083ad cephfs: add update rbac rule to pv resource
This commit adds the update rbac rule to persistent
volume resource as the ci was failing with below error:
cannot update resource "persistentvolumes" in API group
"" at the cluster scope

Signed-off-by: Yati Padia <ypadia@redhat.com>
2022-07-19 14:42:21 +00:00
Yati Padia
776821f17f deploy: update csi-provisioner to latest version
This commits updates csi-provisioner sidecar to
latest version i.e., v3.2.0.

fixes: #3184

Signed-off-by: Yati Padia <ypadia@redhat.com>
2022-07-19 14:42:21 +00:00
dependabot[bot]
30668c0549 rebase: bump github.com/aws/aws-sdk-go-v2/service/sts
Bumps [github.com/aws/aws-sdk-go-v2/service/sts](https://github.com/aws/aws-sdk-go-v2) from 1.16.7 to 1.16.9.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Changelog](https://github.com/aws/aws-sdk-go-v2/blob/main/CHANGELOG.md)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/v1.16.7...service/ivs/v1.16.9)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/sts
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-19 07:40:39 +00:00
takeaki-matsumoto
1025871021 cephfs: Support mount option on nodeplugin
add mount options on nodeplugin side

Signed-off-by: takeaki-matsumoto <takeaki.matsumoto@linecorp.com>
2022-07-18 22:04:12 +00:00
Madhu Rajanna
ceb88d6498 cephfs: remove extra check for restore size
Looks like cephfs snapshot size is buggy and its
getting removed in ceph fs. we cannot get the size
of the snapshot during CreateVolume call, so we cannot
do any size check at CreateVolume to check if the
restore size is smaller or not.

As we are removing this check it also fixes #3147
but we dont have any validation at CSI level for
smaller restore we need to depend on kubernetes
external-provisioner for it.

fixes: #3147

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-18 10:04:14 +00:00
dependabot[bot]
f8c8ff6c70 rebase: bump k8s.io/klog/v2 from 2.60.1 to 2.70.1
Bumps [k8s.io/klog/v2](https://github.com/kubernetes/klog) from 2.60.1 to 2.70.1.
- [Release notes](https://github.com/kubernetes/klog/releases)
- [Changelog](https://github.com/kubernetes/klog/blob/main/RELEASE.md)
- [Commits](https://github.com/kubernetes/klog/compare/v2.60.1...v2.70.1)

---
updated-dependencies:
- dependency-name: k8s.io/klog/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-15 14:37:18 +00:00
Madhu Rajanna
f171143135 cephfs: round to cephfs size to multiple of 4Mib
Due to the bug in the df stat we need to round off
the subvolume size to align with 4Mib.

Note:- Minimum supported size in cephcsi is 1Mib,
we dont need to take care of Kib.

fixes #3240

More details at https://github.com/ceph/ceph/pull/46905

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-07-13 18:32:40 +00:00
Humble Chirammal
1856647506 cephfs: go with default permissions while creating subvolumes
While creating subvolumes, CephFS driver set the mode to `777`
and pass it along to go ceph apis which cause the subvolume
permission to be on 777, however if we create a subvolume
directly in the ceph cluster, the default permission bits are
set which is 755 for the subvolume. This commit try to stick
to the default behaviour even while creating the subvolume.

This also means that we can work with fsgrouppolicy set to
`File` in csiDriver object which is also addressed in this commit.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-07-13 06:49:58 +00:00
dependabot[bot]
4b709310e2 rebase: bump github.com/IBM/keyprotect-go-client from 0.7.0 to 0.8.0
Bumps [github.com/IBM/keyprotect-go-client](https://github.com/IBM/keyprotect-go-client) from 0.7.0 to 0.8.0.
- [Release notes](https://github.com/IBM/keyprotect-go-client/releases)
- [Commits](https://github.com/IBM/keyprotect-go-client/compare/v0.7.0...v0.8.0)

---
updated-dependencies:
- dependency-name: github.com/IBM/keyprotect-go-client
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-12 17:32:46 +00:00
dependabot[bot]
d6719c4d62 rebase: bump github.com/stretchr/testify from 1.7.2 to 1.8.0
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.7.2 to 1.8.0.
- [Release notes](https://github.com/stretchr/testify/releases)
- [Commits](https://github.com/stretchr/testify/compare/v1.7.2...v1.8.0)

---
updated-dependencies:
- dependency-name: github.com/stretchr/testify
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-12 12:44:18 +00:00
dependabot[bot]
acaf19c66e rebase: bump github.com/hashicorp/vault/api from 1.6.0 to 1.7.2
Bumps [github.com/hashicorp/vault/api](https://github.com/hashicorp/vault) from 1.6.0 to 1.7.2.
- [Release notes](https://github.com/hashicorp/vault/releases)
- [Changelog](https://github.com/hashicorp/vault/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/vault/compare/v1.6.0...v1.7.2)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/vault/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-08 17:11:49 +00:00
Benoît Knecht
507844c9b1 rbd: Use rados namespace when getting clone depth
When the Ceph user is restricted to a specific namespace in the pool, it is
crucial that evey interaction with the cluster is done within that namespace.
This wasn't the case in `getCloneDepth()`.

This issue was causing snapshot creation to fail with

> Failed to check and update snapshot content: failed to take snapshot of the
> volume X: "rpc error: code = Internal desc = rbd: ret=-1, Operation not
> permitted"

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
2022-07-07 22:20:29 +00:00
dependabot[bot]
aed7d8d4e4 rebase: bump k8s.io/kubernetes from 1.24.1 to 1.24.2
Bumps [k8s.io/kubernetes](https://github.com/kubernetes/kubernetes) from 1.24.1 to 1.24.2.
- [Release notes](https://github.com/kubernetes/kubernetes/releases)
- [Commits](https://github.com/kubernetes/kubernetes/compare/v1.24.1...v1.24.2)

---
updated-dependencies:
- dependency-name: k8s.io/kubernetes
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-07 14:39:56 +00:00
Humble Chirammal
08b42e5d67 nfs: make use of latest sidecars in the deployment
The sidecars in the NFS deployment has latest versions which is
also updated for RBD and CephFS drivers. This commit update
the versions in the NFS deployment too.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2022-07-05 18:06:37 +00:00
Carsten Buchberger
b262f06c33 helm: enable host networking for provisioner
Adds the possibility in the helm-chart to enable hostNetworking
for provider pods.

Signed-off-by: Carsten Buchberger <c.buchberger@witcom.de>
2022-07-04 15:14:59 +00:00
Niels de Vos
14ba1498bf util: reduce systemd related errors while mounting
There are regular reports that identify a non-error as the cause of
failures. The Kubernetes mount-utils package has detection for systemd
based environments, and if systemd is unavailable, the following error
is logged:

    Cannot run systemd-run, assuming non-systemd OS
    systemd-run output: System has not been booted with systemd as init
    system (PID 1). Can't operate.
    Failed to create bus connection: Host is down, failed with: exit status 1

Because of the `failed` and `exit status 1` error message, users might
assume that the mounting failed. This does not need to be the case. The
container-images that the Ceph-CSI projects provides, do not use
systemd, so the error will get logged with each mount attempt.

By using the newer MountSensitiveWithoutSystemd() function from the
mount-utils package where we can, the number of confusing logs get
reduced.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-07-04 10:02:54 +00:00
Niels de Vos
a1ed6207f6 cephfs: report detailed error message on clone failure
go-ceph provides a new GetFailure() method to retrieve details errors
when cloning failed. This is now included in the `cephFSCloneState`
struct, which was a simple string before.

While modifying the `cephFSCloneState` struct, the constants have been
removed, as go-ceph provides them as well.

Fixes: #3140
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-06-30 19:33:41 +00:00
Yati Padia
5c40f1ef33 rbd: remove the clone in case of failure
This commit removes the clone incase
unsetAllMetadata or copyEncryptionConfig or
expand fails for createVolumeFromSnapshot
and CreateSnapshot.
It also removes the clone in case of
any failure in createCloneFromImage.

issue: #3103

Signed-off-by: Yati Padia <ypadia@redhat.com>
2022-06-30 05:50:16 +00:00
Niels de Vos
dbbda5473b e2e: pass non-empty Namespace/Name in deletePVCAndPV()
When getting the PVC or PV failed, the returned object may contain empty
values. If that happens, a retry uses the empty values for Namespace and
Name, which will never be successful.

Instead, use the Namespace and Name attributes from the original object,
and not from the object returned by the Get() call.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-06-30 00:27:27 +00:00
Niels de Vos
2df55a55a3 e2e: use exclusive-lock together with lock_on_read
When using `lock_on_read`, the RBD image needs to have the
`exclusive-lock` feature enabled too.

Fixes: #3221
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-06-29 17:40:17 +00:00
Prasanna Kumar Kalever
29ddfb501b rebase: update minikube to v1.26.0
A new stable release of minikube is available, lets switch to it.
https://github.com/kubernetes/minikube/releases/tag/v1.26.0

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-29 17:40:17 +00:00
Prasanna Kumar Kalever
b56511c0c8 e2e: reduce defaultCloneCount to 3
CI is failing very frequently hitting resource leaks issue,
until we solve the root cause for resource leaks reducing the clone
count from 10 to 3.

related: #2327
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-29 11:08:09 +00:00
Prasanna Kumar Kalever
9fa3c8382b cleanup: reduce struct padding
internal/rbd/rbd_util.go:89:15: struct of size 312 bytes could be of
size 304 bytes:
``
struct{
	RbdImageName   	string,
	ImageID        	string,
	VolID          	string,
	Monitors       	string,
	JournalPool    	string,
	Pool           	string,
	RadosNamespace 	string,
	ClusterID      	string,
	RequestName    	string,
	NamePrefix     	string,
	ParentName     	string,
	ParentPool     	string,
	ClusterName    	string,
	Owner          	string,
	VolSize        	int64,
	StripeCount    	uint64,
	StripeUnit     	uint64,
	ObjectSize     	uint64,
	ImageFeatureSet	github.com/ceph/go-ceph/rbd.FeatureSet,
	encryption
*github.com/ceph/ceph-csi/internal/util.VolumeEncryption,
	CreatedAt
*google.golang.org/protobuf/types/known/timestamppb.Timestamp,
	conn
*github.com/ceph/ceph-csi/internal/util.ClusterConnection,
	ioctx          	*github.com/ceph/go-ceph/rados.IOContext,
	Primary        	bool,
	EnableMetadata 	bool,
}
`` (maligned)
type rbdImage struct {
              ^}`
make: *** [Makefile:118: go-lint] Error 1

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Prasanna Kumar Kalever
29a3f4acf6 cleanup: ReconcilePersistentVolume consider passing it by pointer
Address: hugeParam linter

internal/controller/persistentvolume/persistentvolume.go:59:7:
hugeParam: r is heavy (80 bytes); consider passing it by pointer
(gocritic)
[...]
internal/controller/persistentvolume/persistentvolume.go:135:7:
hugeParam: r is heavy (80 bytes); consider passing it by pointer
(gocritic)
func (r ReconcilePersistentVolume) reconcilePV(ctx context.Context, obj
runtime.Object) error {}

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Prasanna Kumar Kalever
af0bdaf2cb doc: Add documentation about --setmetadata option
Fixes: #2874
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Prasanna Kumar Kalever
dc738b96b4 deploy: add setmetadata=true in the templates
setmetadata on the volume by default, otherwise e2e will fail

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Prasanna Kumar Kalever
caf4090657 rbd: provide option to disable setting metadata on rbd images
As we added support to set the metadata on the rbd images created for
the PVC and volume snapshot, by default metadata is set on all the images.

As we have seen we are hitting issues#2327 a lot of times with this,
we start to leave a lot of stale images. Currently, we rely on
`--extra-create-metadata=true` to decide to set the metadata or not,
we cannot set this option to false to disable setting metadata because we
use this for encryption too.

This changes is to provide an option to disable setting the image
metadata when starting cephcsi.

Fixes: #3009
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2022-06-28 19:12:53 +00:00
Madhu Rajanna
8a47904e8f rbd: add unit test for checkHealthyPrimary
Removed the code in checkHealthyPrimary which
makes the ceph call, passing it as input now.
Added unit test for checkHealthyPrimary function

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-28 13:17:11 +00:00
Madhu Rajanna
53e76fab69 rbd: fix checkHealthyPrimary to consider up+stopped state
we need to check for image should be in up+stopped state
not anyone of the state for that the we need to use
OR check not the AND check.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-28 13:17:11 +00:00
Madhu Rajanna
704cb5c941 revert: rbd: consider remote image health for primary
When the image is force promoted to primary on the
cluster the remote image might not be in replaying
state because due to the split brain state. This
PR reverts back the commit
c3c87f2ef3. Which we added
to check the remote image status.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2022-06-28 13:17:11 +00:00
Niels de Vos
34ff13984a ci: prevent panic in retest action on nil strings
In case a PullRequest does not have a MergeableState set, it will be
`nil`. Dereferencing the pointer will cause a Go panic, and the action
won't work as intended.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2022-06-27 15:00:50 +00:00
Rakshith R
bf9e14d2c9 ci: retest only one pr at a time & rebase if necessary
This commit improves retest action by rebasing the pr
if it behind devel branch and adding retests to only one
pr at a time.
refer:
https://docs.github.com/en/graphql/reference/enums#mergestatestatus

Signed-off-by: Rakshith R <rar@redhat.com>
2022-06-27 06:16:13 +00:00