In the case of Disaster Recovery failover, the user is expected to
create the static PVCs. We have decided not to rely on the PVC name and
namespace for several reasons (Kubernetes plans to support transferring
a PVC to a new namespace with a different name, and new features such as
data populators are coming). For now, we plan to use static PVCs to
support async mirroring.
During async mirroring only the RBD images are mirrored to the secondary
site, so when the user creates the static PVCs on failover the omap data
needs to be regenerated. The volumeHandle in the PV spec is an encoded
string that contains the clusterID, the poolID and the image UUID. The
clusterID and poolID will not be the same on both clusters, so cephcsi
needs to generate a new volume handle and create a mapping between the
new and the old volume handle. Whenever cephcsi receives a CSI request,
it checks whether a mapping exists; if it does, it uses the new volume
handle and continues with the other operations.
The new controller watches for newly created PVs. It checks whether the
omap exists and, if it does not, regenerates the entire omap data.
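The handle lookup described above can be sketched as follows. This is a
minimal illustration and not the cephcsi implementation: the
handleMapping type, its resolve method and the handle strings are all
hypothetical, and an in-memory map stands in for wherever the mapping is
really stored.

```go
package main

import "fmt"

// handleMapping maps an old volume handle (from the primary cluster) to the
// volume handle generated on the failover cluster. Hypothetical type; an
// in-memory map stands in for the real storage.
type handleMapping map[string]string

// resolve returns the new volume handle when a mapping exists, and the
// handle that was passed in otherwise.
func (m handleMapping) resolve(handle string) string {
	if mapped, ok := m[handle]; ok {
		return mapped
	}
	return handle
}

func main() {
	m := handleMapping{"old-handle": "new-handle"}
	fmt.Println(m.resolve("old-handle"))
	fmt.Println(m.resolve("unmapped-handle"))
}
```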
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In case of a mirrored image, if the image is primary, a watcher will be
added by the rbd-mirror daemon on the RBD image. We have to consider
2 watchers when checking whether the image is in use.
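A minimal sketch of such a check, under stated assumptions: the helper
names are hypothetical, and it is assumed that one watcher belongs to
the caller that has the image open while checking.

```go
package main

import "fmt"

// expectedWatchers returns how many watchers are expected on an image that
// no workload is using. Assumption: one watcher belongs to the caller that
// opened the image, and the rbd-mirror daemon adds one more when the image
// is a mirrored primary.
func expectedWatchers(mirroredPrimary bool) int {
	if mirroredPrimary {
		return 2
	}
	return 1
}

// inUse reports whether the image has more watchers than expected.
func inUse(watchers int, mirroredPrimary bool) bool {
	return watchers > expectedWatchers(mirroredPrimary)
}

func main() {
	fmt.Println(inUse(2, false)) // an extra watcher on a plain image
	fmt.Println(inUse(2, true))  // only the mirror daemon and the caller
}
```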
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In case of async mirroring, the volume UUID is retrieved from the volume
name. Instead of generating a new UUID, cephcsi should reserve the
passed UUID. This will be useful when we support both metro DR and async
mirroring on a Kubernetes cluster.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Docker Hub offers a way to pull official images without any project
prefix, like "docker.io/vault:latest". This does a redirect to the
images located under "docker.io/library".
By using the fully qualified image name, a redirect gets removed while
pulling the images. This reduces the likelihood of hitting Docker Hub
pull rate-limits.
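The expansion from a short name to a fully qualified one can be sketched
as below. The qualifyImage function is hypothetical and simplified: it
treats a first path component that contains "." or ":" (or equals
"localhost") as a registry, and it does not handle digests.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifyImage expands a Docker Hub short name into a fully qualified name
// and adds the :latest tag when no tag is given. Simplified sketch: image
// digests ("@sha256:...") are not handled.
func qualifyImage(name string) string {
	parts := strings.SplitN(name, "/", 2)
	switch {
	case len(parts) == 1:
		// official image without any prefix, like "vault:latest"
		name = "docker.io/library/" + name
	case parts[0] == "docker.io" && !strings.Contains(parts[1], "/"):
		// official image without project prefix, like "docker.io/vault"
		name = "docker.io/library/" + parts[1]
	case strings.ContainsAny(parts[0], ".:") || parts[0] == "localhost":
		// a registry is already part of the name
	default:
		// project/image on Docker Hub, like "ceph/ceph:v15"
		name = "docker.io/" + name
	}
	// only the last path component can carry a tag
	if !strings.Contains(name[strings.LastIndex(name, "/")+1:], ":") {
		name += ":latest"
	}
	return name
}

func main() {
	for _, img := range []string{"vault:latest", "centos", "ceph/ceph:v15"} {
		fmt.Println(img, "->", qualifyImage(img))
	}
}
```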
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It seems that the new log_errors() function does not get triggered when
the script hits `exit 1` conditions in functions. The functions should
return a non-0 value, not cause an exit of the script.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Log a few commands that help troubleshooting Rook deployment issues.
This might need to get extended with more commands.
Updates: #1636
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Images that have an unqualified name (no explicit registry) come from
Docker Hub. This can be made explicit by adding docker.io as prefix. The
default :latest tag has been added as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The CentOS CI jobs use Rook v1.3.9, this version should be placed in
build.env just like other versions that the CI jobs detect.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The BlockVolume PVC tests consume the example files that refer to
"centos:latest" without registry. This means that the images will get
pulled from Docker Hub, which has rate limits preventing CI jobs from
pulling images.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reduce the number of images that get pulled from Docker Hub. Use the
official CentOS container registry instead.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the imagePullPolicy is not set and the image tag is empty or latest,
the image is always pulled. This commit sets the policy to pull the
image only if it is not present.
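The defaulting rule that makes an explicit policy necessary can be
sketched like this. The defaultPullPolicy function is a hypothetical
helper that only mirrors the rule stated above, it is not Kubernetes
code.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPullPolicy returns the pull policy that applies when no
// imagePullPolicy is set: an image without a tag, or with the "latest" tag,
// is always pulled; any other tag is pulled only if not present.
func defaultPullPolicy(image string) string {
	// only the last path component can carry a tag
	last := image[strings.LastIndex(image, "/")+1:]
	i := strings.LastIndex(last, ":")
	if i < 0 || last[i+1:] == "latest" {
		return "Always"
	}
	return "IfNotPresent"
}

func main() {
	for _, img := range []string{"centos", "centos:latest", "ceph/ceph:v15"} {
		fmt.Println(img, "->", defaultPullPolicy(img))
	}
}
```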
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
An RBD image can have a maximum number of snapshots, defined by
maxsnapshotsonimage. Once that limit is reached, cephcsi starts
flattening the older snapshots and returns an ABORT error message.
Requests that follow have to wait until all the images are flattened,
which increases the PVC creation time. Instead of waiting until the
maximum number of snapshots on an RBD image is reached, we can have a
soft limit (minsnapshotsonimage): once it is reached, cephcsi starts
flattening tasks to break the chain. With this, the PVC creation time
is only affected when the hard limit (maxsnapshotsonimage) is reached.
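The soft and hard limit decision can be sketched as follows. The action
type, the snapshotAction function and the use of ">=" for both
comparisons are assumptions made for illustration only.

```go
package main

import "fmt"

// action describes what to do with a request, depending on the number of
// snapshots on the image. Hypothetical type for this sketch.
type action int

const (
	proceed             action = iota // below the soft limit
	flattenInBackground               // soft limit reached, request continues
	flattenAndAbort                   // hard limit reached, caller has to retry
)

// snapshotAction compares the snapshot count with the soft and hard limits.
func snapshotAction(count, softLimit, hardLimit uint) action {
	switch {
	case count >= hardLimit:
		return flattenAndAbort
	case count >= softLimit:
		return flattenInBackground
	default:
		return proceed
	}
}

func main() {
	// illustrative limits: soft 3, hard 5
	for _, count := range []uint{1, 3, 5} {
		fmt.Println(count, snapshotAction(count, 3, 5))
	}
}
```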
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
We do not have `text` in the new section of the MarkDown Rules. Hence
dropping them.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Update the coding guide about MD014, i.e.
Dollar signs used before commands without showing output
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
MD014 - Dollar signs used before commands without showing output
The dollar signs are unnecessary: commands are easier to copy and paste
and less noisy when the dollar signs are omitted, especially when the
command's output is not shown. If the command is followed by its output,
we can use `$ ` (dollar+space) to differentiate between the command and
its output.
scenario 1: when the command is not followed by output
```console
cd ~/work
```
scenario 2: when the command is followed by output (use dollar+space)
```console
$ ls ~/work
file1 file2 dir1 dir2 ...
```
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
The function isCloneRetryError checks whether the clone error is a
`pending` or an `in-progress` error.
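A minimal sketch of such a check is shown below. The two sentinel errors
are hypothetical names introduced for this example, the errors that the
function really compares against are not shown in this message.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors for the two clone states that should make
// the caller retry.
var (
	errClonePending    = errors.New("clone is pending")
	errCloneInProgress = errors.New("clone is in progress")
)

// isCloneRetryError reports whether err, or an error it wraps, is one of
// the two retryable clone errors.
func isCloneRetryError(err error) bool {
	return errors.Is(err, errClonePending) || errors.Is(err, errCloneInProgress)
}

func main() {
	wrapped := fmt.Errorf("creating clone: %w", errClonePending)
	fmt.Println(isCloneRetryError(wrapped))
	fmt.Println(isCloneRetryError(errors.New("permission denied")))
}
```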
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
In certain cases, the clone status can be 'pending'. In that case, an
abort error message should be returned, similar to the one returned
during the 'in-progress' state.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
There is a type-check on BytesQuota after calling SubVolumeInfo() to see
if the value is supported. In case no quota is configured, the value
Infinite is returned. This can not be converted to an int64, so the
original code returned an error.
It seems that attaching/mounting sometimes fails with the following
error:
FailedMount: MountVolume.MountDevice failed for volume "pvc-0e8fdd18-873b-4420-bd27-fa6c02a49496" : rpc error: code = Internal desc = subvolume csi-vol-0d68d71a-1f5f-11eb-96d2-0242ac110012 has unsupported quota: infinite
By ignoring the quota of Infinite, and not setting a quota in the
Subvolume object, this problem should not happen again.
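The type-check can be sketched with local stand-in types. The byteCount
and specialSize types and the quotaBytes function below are hypothetical
and only mimic the shape of the problem, they are not the go-ceph types.

```go
package main

import "fmt"

// Local stand-ins for the quota values, hypothetical and not the go-ceph
// types: a quota is either a number of bytes or the special value infinite.
type byteCount uint64

type specialSize string

const infinite = specialSize("infinite")

// quotaBytes returns the quota in bytes and whether a quota is set at all.
// An infinite quota is not an error, it just means that no quota is set.
func quotaBytes(quota interface{}) (int64, bool, error) {
	switch q := quota.(type) {
	case byteCount:
		return int64(q), true, nil
	case specialSize:
		if q == infinite {
			return 0, false, nil
		}
	}
	return 0, false, fmt.Errorf("unsupported quota: %v", quota)
}

func main() {
	fmt.Println(quotaBytes(byteCount(1024)))
	fmt.Println(quotaBytes(infinite))
}
```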
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The implementation of getOMapValues assumed that the number of key-value
pairs assigned to the object would be close to the number of keys
being requested. When the number of keys on the object exceeded the
"listExcess" value the function would fail to read additional keys
even if they existed in the omap.
This change sets a large fixed "chunk size" value and keeps reading
key-value pairs as long as the callback gets called and increments
the numKeys counter.
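The chunked read loop can be sketched as follows. The lister type and
mapLister are hypothetical stand-ins for the omap listing call, and the
chunk size used in the example is an arbitrary small number.

```go
package main

import (
	"fmt"
	"sort"
)

// lister is a hypothetical stand-in for the omap listing call: it calls cb
// for at most max key-value pairs whose keys sort after startAfter.
type lister func(startAfter string, max int, cb func(key string, value []byte))

// mapLister builds a lister on top of an in-memory map.
func mapLister(data map[string][]byte) lister {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return func(startAfter string, max int, cb func(string, []byte)) {
		sent := 0
		for _, k := range keys {
			if k <= startAfter || sent >= max {
				continue
			}
			cb(k, data[k])
			sent++
		}
	}
}

// getValues reads the object chunk by chunk and keeps the requested keys.
// It keeps reading for as long as the callback got called at least once.
func getValues(list lister, chunk int, keys []string) map[string][]byte {
	want := make(map[string]bool, len(keys))
	for _, k := range keys {
		want[k] = true
	}
	result := map[string][]byte{}
	startAfter := ""
	for {
		numKeys := 0
		list(startAfter, chunk, func(key string, value []byte) {
			if want[key] {
				result[key] = value
			}
			startAfter = key
			numKeys++
		})
		if numKeys == 0 {
			return result
		}
	}
}

func main() {
	data := map[string][]byte{"a": []byte("1"), "b": []byte("2"), "c": []byte("3")}
	got := getValues(mapLister(data), 2, []string{"c"})
	fmt.Println(string(got["c"]))
}
```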
Signed-off-by: John Mulligan <jmulligan@redhat.com>
There are several go-routines where Failf() is called, which will cause
a Golang panic inside the Ginkgo test framework. Instead of aborting the
go-routine, capture the error and check for failures once all
go-routines have finished.
The CephFS tests have been updated already, this change only affects the
validatePVCClone() utility function.
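The pattern of capturing errors inside go-routines and checking them
afterwards can be sketched as below. The runAll and countFailures
helpers and the errSimulated error are hypothetical, the e2e tests
themselves are not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSimulated is a hypothetical error used to simulate a failing step.
var errSimulated = errors.New("simulated failure")

// runAll runs work(i) in n go-routines and returns the captured errors.
// Each go-routine writes only to its own slot, so no lock is needed.
func runAll(n int, work func(i int) error) []error {
	errs := make([]error, n)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer wg.Done()
			errs[i] = work(i)
		}(i)
	}
	wg.Wait()
	return errs
}

// countFailures counts the non-nil errors. It is meant to be called from
// the main test go-routine, where failing the test is safe.
func countFailures(errs []error) int {
	failed := 0
	for _, err := range errs {
		if err != nil {
			failed++
		}
	}
	return failed
}

func main() {
	errs := runAll(4, func(i int) error {
		if i%2 == 1 {
			return errSimulated
		}
		return nil
	})
	fmt.Println("failures:", countFailures(errs))
}
```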
Updates: #1359
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There are several go-routines where Failf() is called, which will cause
a Golang panic inside the Ginkgo test framework. Instead of aborting the
go-routine, capture the error and check for failures once all
go-routines have finished.
The CephFS tests have been updated already, this change only affects the
RBD tests.
Updates: #1359
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There are several go-routines where Failf() is called, which will cause
a Golang panic inside the Ginkgo test framework. Instead of aborting the
go-routine, capture the error and check for failures once all
go-routines have finished.
Updates: #1359
Signed-off-by: Niels de Vos <ndevos@redhat.com>
An updated CI job will run "make mod-check" in parallel with the full
containerized-test and containerized-build targets. This will hopefully
reduce the time that is needed for the whole
ci/centos/containerized-tests job.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When passing USE_PULLED_IMAGE=yes to the containerized-test or
containerized-build make targets, it is now possible to use pre-pulled
container images. This saves time as the container images will not get
created from scratch.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
By placing the common bot commands and their description in the PR
template, developers are reminded of their usage. The idea comes from
the Ceph project where this is done too.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The GitHub style for Pull Request and Issue templates adds HTML tags for
some advanced usage. The MarkDown linter should not give warnings when
these are used.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the cephcsi executable detects an error when calling
validateMaxSnaphostFlag(), it panics due to klog.Fatalln(). The error
that validateMaxSnaphostFlag() logs should be understandable enough, so
that users know what to investigate. A Go panic on a user error is not
very user-friendly, and does not provide any additional useful
information.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the cephcsi executable receives an error when calling
validateCloneDepthFlag(), it panics due to klog.Fatalln(). The errors
that validateCloneDepthFlag() logs should be understandable enough, so
that users know what to investigate. A Go panic on a user error is not
very user-friendly, and does not provide any additional useful
information.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the cephcsi executable receives an error when calling
util.ValidateURL() on the optional "metricspath", it panics due to
klog.Fatalln(). The error that util.ValidateURL() returns should be
understandable enough, so that users know what to investigate. A Go
panic on a user error is not very user-friendly, and does not provide
any additional useful information.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the cephcsi executable receives an error when calling
util.ValidateDriverName(), it panics due to klog.Fatalln(). The error
that util.ValidateDriverName() returns should be understandable enough,
so that users know what to investigate. A Go panic on a user error is
not very user-friendly, and does not provide any additional useful
information.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When running the 'cephcsi' executable without arguments, a Go panic is
reported:
$ ./_output/cephcsi
F1026 13:59:04.302740 3409054 cephcsi.go:126] driver type not specified
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc000010001, 0xc0000520a0, 0x48, 0x9a)
/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x2370360, 0xc000000003, 0x0, 0x0, 0xc000194770, 0x20cb265, 0xa, 0x7e, 0x413500)
/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:945 +0x191
k8s.io/klog/v2.(*loggingT).println(0x2370360, 0x3, 0x0, 0x0, 0xc000163e08, 0x1, 0x1)
/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:699 +0x11a
k8s.io/klog/v2.Fatalln(...)
/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:1456
main.main()
/go/src/github.com/ceph/ceph-csi/cmd/cephcsi.go:126 +0xafa
Just logging the error and exiting should be sufficient. This stack-trace
from the Go panic does not add any useful information.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The main() function of the cephcsi executable calls klog.Fatalln() to
report certain errors. This causes the executable to panic which is not
helpful to users that only need the error message.
By introducing logAndExit(), there is no need to call klog.Fatalln()
anymore.
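A helper of this kind can be sketched with the standard library only.
The body below is an assumption about what such a function does (log the
message, then exit with a non-zero status); the exit variable exists
only so that the sketch can be exercised without ending the process.

```go
package main

import (
	"fmt"
	"os"
)

// exit is a variable so that the sketch can be exercised without
// terminating the process. Hypothetical, for illustration only.
var exit = os.Exit

// logAndExit prints the message to stderr and exits with status 1, without
// the stack-trace that a panic would add.
func logAndExit(msg string) {
	fmt.Fprintln(os.Stderr, msg)
	exit(1)
}

func main() {
	driverType := "rbd" // would normally come from a command-line flag
	if driverType == "" {
		logAndExit("driver type not specified")
	}
	fmt.Println("driver type:", driverType)
}
```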
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When using go-ceph and the volumeOptions.Connect() call, the credentials
are not needed once the connection is established.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reduce the number of calls to the `ceph fs` executable to improve
performance of CephFS volume resizing.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This prepares resizeVolume() so that the volumeOptions.conn can be used
for connecting with go-ceph and use the connection cache.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The upgrade-tests-cephfs fails relatively regularly with the following
error during initial deployment:
timeout waiting for deployment csi-cephfsplugin-provisioner with error error waiting for deployment "csi-cephfsplugin-provisioner" status to match expectation: etcdserver: request timed out
By detecting if the API-server returned a non-fatal error, the test does
not need to abort, but can wait for completion. PollImmediate() will
still return ErrWaitTimeout once the timeout has elapsed.
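The retry behaviour can be sketched with the standard library only. All
names below are hypothetical: pollImmediate and errWaitTimeout are
simplified local stand-ins for the polling helper and its timeout error,
and isRetryableAPIError matches only the error text quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

var (
	// errWaitTimeout is a local stand-in for the timeout error.
	errWaitTimeout = errors.New("timed out waiting for the condition")
	// errEtcdTimeout simulates the non-fatal API-server error.
	errEtcdTimeout = errors.New("etcdserver: request timed out")

	pollInterval = 10 * time.Millisecond
	pollTimeout  = time.Second
)

// isRetryableAPIError is a hypothetical check for non-fatal errors.
func isRetryableAPIError(err error) bool {
	return err != nil && strings.Contains(err.Error(), "etcdserver: request timed out")
}

// pollImmediate calls cond right away and then once per interval, until it
// reports done, returns an error, or the timeout elapses.
func pollImmediate(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().Add(interval).After(deadline) {
			return errWaitTimeout
		}
		time.Sleep(interval)
	}
}

// waitForDeployment keeps polling when get returns a non-fatal error.
func waitForDeployment(get func() (bool, error)) error {
	return pollImmediate(pollInterval, pollTimeout, func() (bool, error) {
		ready, err := get()
		if isRetryableAPIError(err) {
			return false, nil // not fatal, wait for the next poll
		}
		return ready, err
	})
}

func main() {
	calls := 0
	err := waitForDeployment(func() (bool, error) {
		calls++
		if calls < 3 {
			return false, errEtcdTimeout
		}
		return true, nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```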
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When running a simple build with only the required arguments, the
following warnings are reported:
$ buildah bud --build-arg=BASE_IMAGE=ceph/ceph:v15 --build-arg=GO_ARCH=amd64 -f ./deploy/cephcsi/image/Dockerfile .
...
STEP 15: COPY . ${SRC_DIR}
STEP 16: RUN make cephcsi
cephcsi image settings: quay.io/cephcsi/cephcsi version canary
make: git: Command not found
make: git: Command not found
if [ ! -d ./vendor ]; then (go mod tidy && go mod vendor); fi
make: git: Command not found
...
STEP 23: COMMIT
Getting image source signatures
...
Writing manifest to image destination
Storing signatures
--> 239b19c4049
git is used to detect the current commit, and store it in the binary
that is built. Without the commit, the "Git Commit:" in the output is
empty, making it impossible to get the exact version:
$ podman run --rm 239b19c4049 --version
Cephcsi Version: canary
Git Commit:
Go Version: go1.15
Compiler: gc
Platform: linux/amd64
Kernel: 5.8.4-200.fc32.x86_64
Signed-off-by: Niels de Vos <ndevos@redhat.com>
With the update to minikube v1.14.1 downloading binaries for the recent
Kubernetes patch releases works again. The CI jobs have been updated to
use the major versions, and so should Mergify.
Fixes: #1588
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Minikube 1.14.1 contains a fix for downloading Kubernetes binaries with
version 1.19.3 and 1.18.10. When this version of minikube is used, we
can return to passing major versions to CI jobs (1.19 and 1.18).
Updates: #1588
See-also: kubernetes/minikube#9500
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The StorageClasses that get deployed for the Kubernetes e2e external
storage tests reference a ConfigMap that contains the connection details
for the Ceph cluster. Without this ConfigMap, Ceph-CSI will not function
correctly.
Signed-off-by: Niels de Vos <ndevos@redhat.com>