Commit Graph

40 Commits

Author SHA1 Message Date
ShyamsundarR
e5e332eded Use correct file descriptor to parse errors
File descriptors in use to parse errors from a few command
invocations were incorrect. This led to inability to detect
certain errors cases and act accordingly.

One of the easiest noticeable issues was when an image is deleted
but its RADOS keys and maps are still intact. In such cases
the DeleteVolume call always errored out unable to find the
image rather than, proceed with cleaning up the RADOS objects
and returning a success.

The original method of using stdout was incorrect, as the command
was tested from within a shell script and the scripts STDIN/OUT/ERR
was redirected to understand behavior. This is now tested using just
the CLI in question, and also examining Ceph code, and further
testing a couple of edge conditions by deleting backing images
for PVs

Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-07-16 07:51:10 +00:00
Poornima G
32ea550e3a Modify CephFs provisioner to use the ceph mgr commands
Currently CephFs provisioner mounts the ceph filesystem
and creates a subdirectory as a part of provisioning the
volume. Ceph now supports commands to provision fs subvolumes,
hance modify the provisioner to use ceph mgr commands to
(de)provision fs subvolumes.

Signed-off-by: Poornima G <pgurusid@redhat.com>
2019-07-12 05:42:41 +00:00
ShyamsundarR
c4a3675cec Move locks to more granular locking than CPU count based
As detailed in issue #279, current lock scheme has hash
buckets that are count of CPUs. This causes a lot of contention
when parallel requests are made to the CSI plugin. To reduce
lock contention, this commit introduces granular locks per
identifier.

The commit also changes the timeout for gRPC requests to Create
and Delete volumes, as the current timeout is 10s (kubernetes
documentation says 15s but code defaults are 10s). A virtual
setup takes about 12-15s to complete a request at times, that leads
to unwanted retries of the same request, hence the increased
timeout to enable operation completion with minimal retries.

Tests to create PVCs before and after these changes look like so,

Before:
Default master code + sidecar provisioner --timeout option set
to 30 seconds

20 PVCs
Creation: 3 runs, 396/391/400 seconds
Deletion: 3 runs, 218/271/118 seconds
  - Once was stalled for more than 8 minutes and cancelled the run

After:
Current commit + sidecar provisioner --timeout option set to 30 sec
20 PVCs
Creation: 3 runs, 42/59/65 seconds
Deletion: 3 runs, 32/32/31 seconds

Fixes: #279
Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-07-01 14:10:14 +00:00
ShyamsundarR
bc39c523b7 Fix returning success from DeleteSnapshot for stale requests
Also reduced code duplication in fetching pool list from Ceph.

DeleteSnapshot like DeleteVolume, should return a success when it
detects that the snapshot keys are missing from the RADOS OMaps that
store the snapshot UUID to request name mapping.

This was missing in the code, and is now added.

Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-07-01 10:54:53 +00:00
ShyamsundarR
c5762b6b5c Modify RBD plugin to use a single ID and move the id and key into the secret
RBD plugin needs only a single ID to manage images and operations against a
pool, mentioned in the storage class. The current scheme of 2 IDs is hence not
needed and removed in this commit.

Further, unlike CephFS plugin, the RBD plugin splits the user id and the key
into the storage class and the secret respectively. Also the parameter name
for the key in the secret is noted in the storageclass making it a variant and
hampers usability/comprehension. This is also fixed by moving the id and the key
to the secret and not retaining the same in the storage class, like CephFS.

Fixes #270

Testing done:
- Basic PVC creation and mounting

Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-06-24 13:46:14 +00:00
Madhu Rajanna
a38986fce0 Enable all static-checks in golangci-lint
* Enable all static-checks in golangci-lint
* Update golangci-lint version
* Fix issue found in golangci-lint

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2019-06-10 15:56:17 +05:30
Madhu Rajanna
7d3a6105c7 Fix misspell words
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2019-06-10 12:52:13 +05:30
ShyamsundarR
b9cd0e18ad Make CephFS plugin stateless reusing RADOS based journal scheme
This is a part of the stateless set of commits for CephCSI.

This commit removes the dependency on config maps to store cephFS provisioned
volumes, and instead relies on RADOS based objects and keys, and required
CSI VolumeID encoding to detect the provisioned volumes.

Changes:
- Provide backward compatibility to provisioned volumes by older plugin versions (1.0.0 or older)
- Remove Create/Delete support for statically provisioned volumes (fixes #382)
- Added namespace support to RADOS OMaps and used the same to store RADOS CSI objects and keys in the CephFS metadata pool
- Added support to mention fsname for CephFS provisioning (fixes #359)
- Changed field name in CSI Identifier to 'location', to denote a pool or fscid
- Updated mounter cache to use new scheme
- Required Helm manifests are updated
- Required documentation and other manifests are updated
- Made driver option 'metadatastorage' as optional, as fresh installs do not need to specify the same

Testing done:
- Create/Mount/Delete PVC
- Create/Delete 5 PVCs
- Mount version 1.0.0 PVC
- Delete version 1.0.0 PV
- Mount Statically defined PV/PVC/Pod
- Mount Statically defined version 1.0.0 PV/PVC/Pod
- Delete Statically defined version 1.0.0 PV/PVC/Pod
- Node restart when mounted to test mountcache
- Use InstanceID other than 'default'
- RBD basic round of tests, as namespace is added to OMaps
- csitest against ceph-fs plugin
  - NOTE: CephFS plugin still does not detect and address already created
  volumes but of a different size
- Test not providing any value to the metadata storage parameter

Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-05-30 06:20:35 -04:00
ShyamsundarR
1406f29dcd Refactor voljournal to aid reuse with CephFS
and to also inmprove the code reuse in rbd itself.

Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-05-30 09:58:40 +00:00
ShyamsundarR
d02e50aa9b Removed config maps and replaced with rados omaps
Existing config maps are now replaced with rados omaps that help
store information regarding the requested volume names and the rbd
image names backing the same.

Further to detect cluster, pool and which image a volume ID refers
to, changes to volume ID encoding has been done as per provided
design specification in the stateless ceph-csi proposal.

Additional changes and updates,
- Updated documentation
- Updated manifests
- Updated Helm chart
- Addressed a few csi-test failures

Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-05-19 12:29:33 +00:00
wilmardo
891daa9375 Replaces the references to the Kubernete Authors with the Ceph-CSI authors 2019-04-03 11:14:08 +02:00
Róbert Vašek
d0d5da83c9
Merge pull request #282 from huaizong/improve-remount-pv-path-when-exit-v2
remount old mount point when csi plugin unexpect exit
2019-04-02 08:36:07 +02:00
王怀宗
af330fe68e 1. fix mountcache race conflict
2. support user-defined cache dir
    3. if not define mountcachedir disable mountcache
2019-03-27 16:04:58 +08:00
ShyamsundarR
ba2e5cff51 Address remenant subject reference and code style reviews
Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-03-26 16:19:24 +00:00
ShyamsundarR
fc0cf957be Updated code and docs to reflect correct terminology
- Updated instances of fsid with clusterid
- Updated instances of credentials/subject with user/key

Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-03-26 16:19:24 +00:00
ShyamsundarR
c9c1c871fc Removed a couple of debug logs
Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-03-26 16:19:24 +00:00
ShyamsundarR
2064e674a4 Addressed using k8s client APIs to fetch secrets
Based on the review comments addressed the following,
- Moved away from having to update the pod with volumes
when a new Ceph cluster is added for provisioning via the
CSI driver

- The above now used k8s APIs to fetch secrets
  - TBD: Need to add a watch mechanisim such that these
secrets can be cached and updated when changed

- Folded the Cephc configuration and ID/key config map
and secrets into a single secret

- Provided the ability to read the same config via mapped
or created files within the pod

Tests:
- Ran PV creation/deletion/attach/use using new scheme
StorageClass
- Ran PV creation/deletion/attach/use using older scheme
to ensure nothing is broken
- Did not execute snapshot related tests

Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-03-26 16:19:24 +00:00
ShyamsundarR
97f8c4b677 Provide options to pass in Ceph cluster-id
This commit provides the option to pass in Ceph cluster-id instead
of a MON list from the storage class.

This helps in moving towards a stateless CSI implementation.

Tested the following,
- PV provisioning and staging using cluster-id in storage class
- PV provisioning and staging using MON list in storage class

Did not test,
- snapshot operations in either forms of the storage class

Signed-off-by: ShyamsundarR <srangana@redhat.com>
2019-03-26 16:19:24 +00:00
王怀宗
b318964af5 issue #91
issue #217

Goal

we try to solve when csi exit unexpect, the pod use cephfs pv can not auto recovery because lost mount relation until pod be killed and reschedule to other node. i think this is may be a problem. may be csi plugin can do more thing to remount the old path so when pod may be auto recovery when pod exit and restart, the old mount path can use.

NoGoal

Pod should exit and restart when csi plugin pod exit and mount point lost. if pod not exit will get error of **transport endpoint is not connected**.

implment logic

csi-plugin start:

	1. load all MountCachEntry  from node local dir
	2. check if volID exist in cluster, if no we ignore this entry, if yes continue
	3. check if stagingPath exist, if yes we mount the path
	4. check if all targetPath exist, if yes we binmount to staging path

NodeServer:

1. NodeStageVolume: add MountCachEntry on local dir include readonly attr and ceph secret
2. NodeStagePublishVolume: add pod bind mount path to MountCachEntry  and persist local dir
3. NodeStageunPublishVolume: remove pod bind mount path From MountCachEntry  and persist local dir
4. NodeStageunStageVolume: remove MountCachEntry  from local dir
2019-03-25 22:47:39 +08:00
Róbert Vašek
a4dd845735
Merge pull request #223 from Madhu-1/fix-222-1.0
update driver name as per csi spec
2019-03-14 06:38:13 +01:00
Madhu Rajanna
d61a87b42e Fix driver name as per CSI spec
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2019-03-13 12:04:30 +05:30
Madhu Rajanna
16279eda78 Roundup volume size to Mib for rbd
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2019-03-04 19:17:28 +05:30
Madhu Rajanna
6f4f148d3b remove glog
Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
2019-02-27 14:17:19 +05:30
gman
e5dbea15d3 util/cachepersister: check and return CacheEntryNotFound error in Get() 2019-02-25 18:05:20 +01:00
gman
0235b9c249 k8s metadata cache: delete shouldn't fail on NotFound errors 2019-02-20 20:20:44 +01:00
Madhu Rajanna
fd4c019aba cleanup: remove duplicate code
Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
2019-02-19 13:44:10 +05:30
gman
8223ae325b addressed review comments 2019-02-14 13:55:51 +00:00
gman
892d65d387 added StripSecretInArgs in pkg/util 2019-02-14 13:55:51 +00:00
gman
6099f142f0 moved klog initialization into pkg/util package 2019-02-12 16:31:55 +01:00
Humble Chirammal
c0712db08a migrate util package to klog from glog.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
2019-02-05 12:09:04 +00:00
Madhu Rajanna
03d93219d7 Fix metalinter issue
Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
2019-01-29 11:37:03 +05:30
Madhu Rajanna
50ba8ed446 Fix gometalinter issues
Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
2019-01-29 11:24:36 +05:30
Madhu Rajanna
ca2e475296 Fix gometalinter issues
Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
2019-01-29 11:23:50 +05:30
Madhu Rajanna
7a0c233c27 Fix issues found in gometalinter
Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
2019-01-29 11:20:35 +05:30
Madhu Rajanna
25642fe404 Add method comments
Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
2019-01-29 11:20:35 +05:30
Madhu Rajanna
5eb1974e38 Fix vetshadow issues
Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
2019-01-25 14:15:25 +05:30
Madhu Rajanna
9f76f6bd59 Remove dead code
Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
2019-01-25 14:10:52 +05:30
Madhu Rajanna
20af5afcab Fix golint issues
Signed-off-by: Madhu Rajanna <mrajanna@redhat.com>
2019-01-16 18:33:38 +05:30
mickymiek
32cb974b8c gofmt 2019-01-14 20:15:09 +00:00
mickymiek
62d65ad0cb cm metadata persist for rbd and cephfs 2019-01-14 20:15:09 +00:00