Commit Graph

1708 Commits

Author SHA1 Message Date
Madhu Rajanna
30af703a2f rbd: add controller to main
initialize and start the rbd controller when
we the driver type is controller.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-28 18:50:00 +00:00
Madhu Rajanna
68bd44beba rbd: add new controller to regenerate omap data
In the case of Disaster Recovery failover, the
user expected to create the static PVC's. We have
planned not to go with the PVC name and namespace
for many reasons (as in kubernetes it's planned to
support PVC transfer to a new namespace with a
different name and with new features coming in
like data populator etc). For now, we are
planning to go with static PVC's to support
async mirroring.

During Async mirroring only the RBD images are
mirrored to the secondary site, and when the
user creates the static PVC's on the failover
we need to regenerate the omap data. The
volumeHandler in PV spec is an encoded string
which contains clusterID and poolID and image UUID,
The clusterID and poolID won't remain same on both
the clusters, for that cephcsi need to generate the
new volume handler and its to create a mapping
between new volume handler and old volume handler
with that whenever cephcsi gets csi requests it
check if the mapping exists it will pull the new
volume handler and continues other operations.

The new controller watches for the PVs created,
It checks if the omap exists if it doesn't it
will regenerate the entire omap data.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-28 18:50:00 +00:00
Madhu Rajanna
5af3fe5deb rebase: add controller runtime dependency
this commits add the controller runtime
and its dependency to the vendor.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-28 18:50:00 +00:00
Madhu Rajanna
14700b89d1 rbd: update inuse logic of a rbd image
in case of mirrored image, if the image is
primary a watcher will be added by the rbd
mirror deamon on the rbd image.
we have to consider 2 watcher to check image
is in use.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-28 18:50:00 +00:00
Madhu Rajanna
ba84f14241 journal: create object with provided UUID
incase of async mirroring the volume UUID is
retrieved from the volume name, instead of cephcsi
generating a new UUID it should reserve the passed
UUID it will be useful when we support both metro DR
and async mirroring on a kubernetes clusters.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-28 18:50:00 +00:00
Niels de Vos
1f18e876f0 e2e: use docker.io/library as prefix for official images
Docker Hub offers a way to pull official images without any project
prefix, like "docker.io/vault:latest". This does a redirect to the
images located under "docker.io/library".

By using the full qualified image name, a redirect gets removed while
pulling the images. This reduces the likelyhood of hittin Docker Hub
pull rate-limits.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-26 13:51:02 +00:00
Niels de Vos
954ac97d22 ci: add more logging during Rook deployment
It seems that the new log_errors() function does not get triggered when
the script hits `exit 1` conditions in functions. The functions should
return a non-0 value, not cause an exit of the script.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-24 18:12:29 +00:00
Niels de Vos
ed033153ea ci: gather logs when deploying Rook fails
Log a few commands that help troubleshooting Rook deployment issues.
This might need to get extended with more commands.

Updates: #1636
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-24 14:53:38 +00:00
Niels de Vos
eaeee8ac3d deploy: use docker.io for unqualified image names
Images that have an unqualified name (no explicit registry) come from
Docker Hub. This can be made explicit by adding docker.io as prefix. In
addition, the default :latest tag has been added too.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-24 10:27:33 +00:00
Niels de Vos
7a69c5e238 build: add ROOK_VERSION to build.env
The CentOS CI jobs use Rook v1.3.9, this version should be places in
build.env just like other versions that the CI jobs detect.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-23 11:15:23 +00:00
Niels de Vos
4fd0294eb7 e2e: pull centos image from registry.centos.org
The BlockVolume PVC tests consume the example files that refer to
"centos:latest" without registry. This means that the images will get
pulled from Docker Hub, which has rate limits preventing CI jobs from
pulling images.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-19 16:00:33 +00:00
Madhu Rajanna
7d229c2369 build: update imagepullpolicy for vault
this allows the image to be reused instead of pulling
it again.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-19 16:00:33 +00:00
Madhu Rajanna
168526a906 e2e: use centos as image for normal user validation
Reduce the number of images that get pulled from Docker Hub. Use the
official CentOS container registry instead.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-19 16:00:33 +00:00
Madhu Rajanna
8d86bac9b7 e2e: set imagePullPolicy to ifNotPresent
If the imagePullPolicy is not set and the image
tag is empty or latest the image is always pulled.
This commit sets the policy to pull image if not
present.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-18 13:43:52 +00:00
Madhu Rajanna
8d3a44d0c4 rbd: add minsnapshotsonimage flag
An rbd image can have a maximum number of
snapshots defined by maxsnapshotsonimage
On the limit is reached the cephcsi will
start flattening the older snapshots and
returns the ABORT error message, The Request
comes after this as to wait till all the
images are flattened (this will increase the
PVC creation time.  Instead of waiting till
the maximum snapshots on an RBD image, we can
have a soft limit, once the limit reached
cephcsi will start flattening the task to
break the chain. With this PVC  creation time
will only be affected when the hard limit
(minsnapshotsonimage) reached.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2020-11-18 05:59:20 +00:00
Niels de Vos
880b5bb427 ci: use the Fedora container registry for cephcsi:test
This reduces the dependency on Docker, where image pull rate limits are
seen in the CI.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-17 09:28:02 +00:00
Prasanna Kumar Kalever
817edfd1c7 cleanup: remove the use of text in markdown
We do not have `text` in the new section of the MarkDown Rules. Hence
dropping them.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2020-11-11 13:18:05 +00:00
Prasanna Kumar Kalever
8475a3b97e doc: update about a markdown rule in coding guide
Update the coding guide about MD014, i.e.
Dollar signs used before commands without showing output

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2020-11-11 13:18:05 +00:00
Prasanna Kumar Kalever
2945f7b669 cleanup: stick to standards when using dollar-sign in md
MD014 - Dollar signs used before commands without showing output
The dollar signs are unnecessary, it is easier to copy and paste and
less noisy if the dollar signs are omitted. Especially when the
command doesn't list the output, but if the command follows output
we can use `$ ` (dollar+space) mainly to differentiate between
command and its ouput.

scenario 1: when command doesn't follow output
```console
cd ~/work
```

scenario 2: when command follow output (use dollar+space)
```console
$ ls ~/work
file1 file2 dir1 dir2 ...
```

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2020-11-11 13:18:05 +00:00
Prasanna Kumar Kalever
fcaa332921 doc: improve e2e tests guide
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
2020-11-11 13:18:05 +00:00
Yug
3ac6bbd87c cephfs: Add isCloneRetryError function
The function isCloneRetryError verifies
if the clone error is `pending` or
`in-progress` error.

Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
2020-11-09 07:29:12 +00:00
Yug
acbedc52bf cephfs: Add 'pending' state for clone status
In certain cases, clone status can be 'pending'.
In that case, abort error message should be
returned similar to that during 'in-progress'
state.

Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
2020-11-09 07:29:12 +00:00
Niels de Vos
565038fdfd cephfs: ignore quota when SubVolumeInfo() returns Infinite
There is a type-check on BytesQuota after calling SubVolumeInfo() to see
if the value is supported. In case no quota is configured, the value
Infinite is returned. This can not be converted to an int64, so the
original code returned an error.

It seems that attaching/mounting sometimes fails with the following
error:

    FailedMount: MountVolume.MountDevice failed for volume "pvc-0e8fdd18-873b-4420-bd27-fa6c02a49496" : rpc error: code = Internal desc = subvolume csi-vol-0d68d71a-1f5f-11eb-96d2-0242ac110012 has unsupported quota: infinite

By ignoring the quota of Infinite, and not setting a quota in the
Subvolume object, this problem should not happen again.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-06 14:58:26 +00:00
John Mulligan
8a41cd03a5 journal: fix reading omaps from objects with large key counts
The implementation of getOMapValues assumed that the number of key-value
pairs assigned to the object would be close to the number of keys
being requested. When the number of keys on the object exceeded the
"listExcess" value the function would fail to read additional keys
even if they existed in the omap.
This change sets a large fixed "chunk size" value and keeps reading
key-value pairs as long as the callback gets called and increments
the numKeys counter.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
2020-11-06 06:42:22 +00:00
Niels de Vos
7caca1137f e2e: do not use Failf() to abort tests in a go-routine (util)
There are several go-routines where Failf() is called, which will cause
a Golang panic inside the Ginko test framework. Instead of aborting the
go-routine, capture the error and check for failures once all
go-routines have finished.

The CephFS tests have been updated already, this changs only affects the
validatePVCClone() utility function.

Updates: #1359
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-06 03:31:39 +00:00
Niels de Vos
45d64ab7d0 e2e: do not use Failf() to abort tests in a go-routine (rbd)
There are several go-routines where Failf() is called, which will cause
a Golang panic inside the Ginko test framework. Instead of aborting the
go-routine, capture the error and check for failures once all
go-routines have finished.

The CephFS tests have been updated already, this changs only affects the
RBD tests.

Updates: #1359
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-06 03:31:39 +00:00
Niels de Vos
96eafecad8 e2e: do not use Failf() to abort tests in a go-routine
There are several go-routines where Failf() is called, which will cause
a Golang panic inside the Ginko test framework. Instead of aborting the
go-routine, capture the error and check for failures once all
go-routines have finished.

Updates: #1359
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-05 11:26:20 +00:00
Niels de Vos
cba466c163 ci: do not run "make mod-check" for containerized-tests
An updated CI job will run "make mod-check" in parallel with the full
containerized-test and containerized-build targets. This will hopefully
reduced the time that is needed for the whole
ci/centos/containerized-tests job.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-04 05:36:07 +00:00
Niels de Vos
afd6994e19 ci: allow passing USE_PULLED_IMAGE=yes to use pre-pulled images
When passing USE_PULLED_IMAGE=yes to the containerized-test or
containerized-build make targets, it is now possible to use pre-pulled
container images. This saves time as the container images will not get
created from scratch.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-11-04 05:36:07 +00:00
Mudit Agarwal
0ecfd0e72c rbd: replace go-ceph GetParentInfo() with GetParent()
GetParent() is a newer and better version of
GetParentInfo() in go-ceph.

Signed-off-by: Mudit Agarwal <muagarwa@redhat.com>
2020-11-03 08:00:12 +00:00
Niels de Vos
eefaf09ade doc: add common bot commands to GitHub PR template
By placing the common bot commands and their description in the PR
template, developers are reminded on their usage. The idea comes from
the Ceph project where this is done too.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 11:25:34 +00:00
Niels de Vos
523d813b4e doc: allow in-line HTML in MarkDown documents
The GitHub style for Pull Request and Issue templates add HTML tags for
some advanced usage. The MarkDown linter should not give warnings when
these are used.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 11:25:34 +00:00
Niels de Vos
ecc33a9f86 cleanup: no need to validate conf.Vtype twice
conf.Vtype was verified already, no need to do it a 2nd time.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 09:37:36 +00:00
Niels de Vos
4c91f07c78 cleanup: do not panic when validateMaxSnaphostFlag() detects an error
When the cephcsi executable detects an error when calling
validateMaxSnaphostFlag(), it panics due to klog.Fatalln(). The error
that validateMaxSnaphostFlag() logs should be understandable enough, so
that users know what to investigate. A Go panic on a user error is not
very userfriendly, and does not provide any additional usefil
information.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 09:37:36 +00:00
Niels de Vos
3e305970df cleanup: do not panic when validateCloneDepthFlag() detects an error
When the cephcsi executable receives an error when calling
validateCloneDepthFlag(), it panics due to klog.Fatalln(). The errors
that validateCloneDepthFlag() logs should be understandable enough, so
that users know what to investigate. A Go panic on a user error is not
very userfriendly, and does not provide any additional usefil
information.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 09:37:36 +00:00
Niels de Vos
86a8d29bd1 cleanup: do not panic when the metricspath is not a valid URL
When the cephcsi executable receives an error when calling
util.ValidateURL() on the optional "metricspath". The error that
util.ValidateURL() returns should be understandable enough, so that
users know what to investigate. A Go panic on a user error is not very
userfriendly, and does not provide any additional usefil information.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 09:37:36 +00:00
Niels de Vos
23817c1a83 cleanup: do not panic on invalid drivername
When the cephcsi executable receives an error when calling
util.ValidateDriverName(), it panics due to klog.Fatalln(). The error
that util.ValidateDriverName() returns should be understandable enough,
so that users know what to investigate. A Go panic on a user error is
not very userfriendly, and does not provide any additional usefil
information.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 09:37:36 +00:00
Niels de Vos
79e91fa894 cleanup: prevent Go panic on missing driver type
When running the 'cephcsi' executable without arguments, a Go panic is
reported:

    $ ./_output/cephcsi
    F1026 13:59:04.302740 3409054 cephcsi.go:126] driver type not specified
    goroutine 1 [running]:
    k8s.io/klog/v2.stacks(0xc000010001, 0xc0000520a0, 0x48, 0x9a)
    	/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
    k8s.io/klog/v2.(*loggingT).output(0x2370360, 0xc000000003, 0x0, 0x0, 0xc000194770, 0x20cb265, 0xa, 0x7e, 0x413500)
    	/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:945 +0x191
    k8s.io/klog/v2.(*loggingT).println(0x2370360, 0x3, 0x0, 0x0, 0xc000163e08, 0x1, 0x1)
    	/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:699 +0x11a
    k8s.io/klog/v2.Fatalln(...)
    	/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:1456
    main.main()
    	/go/src/github.com/ceph/ceph-csi/cmd/cephcsi.go:126 +0xafa

Just logging the error and exiting should be sufficient. This stack-trace
from the Go panic does not add any useful information.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 09:37:36 +00:00
Niels de Vos
7101a6dc8e cleanup: add logAndExit() for cephcsi:main() to call instead of panic
The main() function of the cephcsi executable calls klog.Fatalln() to
report certain errors. This causes the executable to panic which is not
helpful to users that only need the error message.

By introducing logAndExit(), there is no need to call klog.Fatalln()
anymore.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 09:37:36 +00:00
Niels de Vos
9732cf16a1 cephfs: drop unused Credentials from resizeVolume()
When using go-ceph and the volumeOptions.Connect() call, the credentials
are not needed once the connection is established.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 08:02:12 +00:00
Niels de Vos
5baed6190c cephfs: implement resizeVolume() with go-ceph
Reduce the number of calls to the `ceph fs` executable to improve
performance of CephFS volume resizing.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 08:02:12 +00:00
Niels de Vos
d431402101 cephfs: make resizeVolume() a method of volumeOptions
This prepares resizeVolume() so that the volumeOptions.conn can be used
for connecting with go-ceph and use the connection cache.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 08:02:12 +00:00
Niels de Vos
48108bc549 e2e: retry deploying on API-server timeouts
The upgrade-tests-cephfs fails relative regularly with the following
error during intial deployment:

    timeout waiting for deployment csi-cephfsplugin-provisioner with error error waiting for deployment "csi-cephfsplugin-provisioner" status to match expectation: etcdserver: request timed out

By detecting if the API-server returned a non-fatal error, the test does
not need to abort, but can wait for completion. PollImmediate() will
still return ErrWaitTimeout once the timeout elapsed.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-28 06:56:41 +00:00
Niels de Vos
b26d33b7c1 build: install git as when building from Dockerfile
When running a simple build with only the required arguments, the
following warning are reported:

    $ buildah bud --build-arg=BASE_IMAGE=ceph/ceph:v15 --build-arg=GO_ARCH=amd64 -f ./deploy/cephcsi/image/Dockerfile .
    ...
    STEP 15: COPY . ${SRC_DIR}
    STEP 16: RUN make cephcsi
    cephcsi image settings: quay.io/cephcsi/cephcsi version canary
    make: git: Command not found
    make: git: Command not found
    if [ ! -d ./vendor ]; then (go mod tidy && go mod vendor); fi
    make: git: Command not found
    ...
    STEP 23: COMMIT
    Getting image source signatures
    ...
    Writing manifest to image destination
    Storing signatures
    --> 239b19c4049

git is used to detect the current commit, and store it in the binary
that is built. Without the commit, the "Git Commit:" in the output is
empty, making it impossible to get the exact version:

    $ podman run --rm 239b19c4049 --version
    Cephcsi Version: canary
    Git Commit:
    Go Version: go1.15
    Compiler: gc
    Platform: linux/amd64
    Kernel: 5.8.4-200.fc32.x86_64

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-27 21:46:38 +00:00
Madhu Rajanna
fdbd487741 ci: fix shellcheck in test-go
Fixed shellcheck in test-go script

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-27 17:04:09 +00:00
Niels de Vos
6e3d68f575 ci: use major Kubernetes versions again for Mergify checks
With the update to minikube v1.14.1 downloading binaries for the recent
Kubernetes patch releases works again. The CI jobs have been updated to
use the major versions, and so should Mergify.

Fixes: #1588
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-27 19:35:02 +05:30
Niels de Vos
3c1aff2174 rebase: update minikube to v1.14.1
Minikube 1.14.1 contains a fix for downloading Kubernetes binaries with
version 1.19.3 and 1.18.10. When this version of minikube is used, we
can return to passing major versions to CI jobs (1.19 and 1.18).

Updates: #1588
See-also: kubernetes/minikube#9500
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-27 03:35:31 +00:00
Niels de Vos
381ea22641 ci: create ceph-csi-config ConfigMap for external storage tests
The StorageClasses that get deployed for the Kubernetes e2e external
storage tests reference a ConfigMap that contains the connection details
for the Ceph cluster. Without this ConfigMap, Ceph-CSI will not function
correctly.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-23 11:40:28 +00:00
Niels de Vos
6a46b8f17f cephfs: implement getSubVolumeInfo() with go-ceph
Fixes: #1551
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-23 10:58:35 +00:00
Niels de Vos
429f7acd8f cephfs: make getSubVolumeInfo() a method of volumeOptions
Signed-off-by: Niels de Vos <ndevos@redhat.com>
2020-10-23 10:58:35 +00:00