On occasion deploying Vault fails. It seems the vault-init-job batch job
does not use a fully-qualified image name for the "vault" container.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Use a fully-qualified image name, including the registry the image should
come from. This makes it possible to pull the image from the right
location, and to consume it in CI jobs without pulling it again.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
getCloneInfo() does not need to return a full CloneStatus struct that
only has one member. Instead, it can just return the value of the single
member, so the JSON type/struct does not need to be exposed.
This makes the API for getCloneInfo() a little simpler, so it can be
replaced by a go-ceph implementation later on.
As the function does not return any of the unused attributes anymore, it
has been renamed to getCloneStatus() as well.
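A minimal sketch of the simplified API, assuming the clone status JSON has
the shape {"status": {"state": "..."}}; the names below are illustrative
only:
```go
package cephfs

import "encoding/json"

// cloneStatus mirrors only the part of the JSON that is needed; it stays
// private to this file and is not exposed to callers.
type cloneStatus struct {
	Status struct {
		State string `json:"state"`
	} `json:"status"`
}

// getCloneStatus returns just the state string instead of the whole struct.
func getCloneStatus(raw []byte) (string, error) {
	var cs cloneStatus
	if err := json.Unmarshal(raw, &cs); err != nil {
		return "", err
	}
	return cs.Status.State, nil
}
```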
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Uses github.com/libopenstorage/secrets to communicate with Vault. This
removes the need for maintaining our own limited Vault APIs.
By adding the new dependency, several other packages got updated in the
process. Unused indirect dependencies have been removed from go.mod.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Once the Vault API has removed a secret, its contents are wiped. The key is
still available until it gets destroyed. This causes the e2e test to detect
an empty secret and assume that it has not been deleted yet.
By requesting the `data` field of the secret, an error is returned in case
the secret has been wiped. This makes it possible for the e2e test to
detect that the secret has been removed and scheduled for destruction.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Instead of the hand-rolled Vault usage, use the libopenstorage/secrets
package that provides a nice API. The support for Vault becomes much
simpler and more maintainable that way.
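A rough sketch of how the package gets used, assuming the vault.New()
constructor and the GetSecret() method of the secrets.Secrets interface as
provided by the vendored version; the package name, configuration contents
and error handling are simplified:
```go
package kms

import (
	"github.com/libopenstorage/secrets"
	"github.com/libopenstorage/secrets/vault"
)

// fetchPassphrase retrieves the data stored for a volume in Vault.
func fetchPassphrase(config map[string]interface{}, volumeID string) (map[string]interface{}, error) {
	// vault.New() returns an implementation of the secrets.Secrets
	// interface, replacing the hand-rolled HTTP calls to the Vault API.
	v, err := vault.New(config)
	if err != nil {
		return nil, err
	}
	var s secrets.Secrets = v
	// keyContext can carry extra options; none are needed for a plain lookup.
	return s.GetSecret(volumeID, nil)
}
```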
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Add "sys/mounts" so that VaultBackendKey does not need to be set. The
libopenstorage API detects the version for the key-value store in Vault
by reading "sys/mounts". Without permissions to read this endpoint, the
VaultBackendKey is required to be configured.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Updated the deployment template for the new controller and also added
`update` ConfigMap RBAC for the controller, as the controller uses a
ConfigMap for leader election.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Earlier, if the depth check failed, the complete vol struct was getting
logged; this commit logs only the pool and image name.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit adds an E2E test to verify the metadata created by the
controller. We are not checking the generated omap data, but we do verify
PVC resize and binding the PVC to an application.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In the case of Disaster Recovery failover, the user is expected to create
static PVCs. We have planned not to go with the PVC name and namespace for
many reasons (in Kubernetes it is planned to support transferring a PVC to
a new namespace with a different name, and new features like the data
populator are coming in). For now, we are planning to go with static PVCs
to support async mirroring.
During async mirroring only the RBD images are mirrored to the secondary
site, and when the user creates the static PVCs after failover we need to
regenerate the omap data. The volumeHandle in the PV spec is an encoded
string which contains the clusterID, poolID and image UUID. The clusterID
and poolID will not be the same on both clusters, so cephcsi needs to
generate a new volume handle and create a mapping between the new and the
old volume handle. With that, whenever cephcsi gets a CSI request it checks
whether the mapping exists, pulls the new volume handle and continues with
the other operations.
The new controller watches for PVs being created. It checks whether the
omap exists, and if it does not, it regenerates the entire omap data.
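A simplified sketch of the lookup this enables, with illustrative names
only (how the mapping is stored is out of scope here):
```go
package rbd

// resolveVolumeHandle returns the volume handle to use for an incoming CSI
// request. oldToNew maps volume handles from the failed-over cluster to the
// handles regenerated on this cluster.
func resolveVolumeHandle(oldToNew map[string]string, handle string) string {
	if newHandle, ok := oldToNew[handle]; ok {
		// a mapping exists: continue all further operations with the
		// regenerated volume handle
		return newHandle
	}
	// no mapping: the handle was created on this cluster, use it unchanged
	return handle
}
```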
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In case of a mirrored image, if the image is primary a watcher will be
added on the rbd image by the rbd-mirror daemon. We have to consider 2
watchers when checking whether the image is in use.
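A sketch of the adjusted check with illustrative names; the real code
inspects the list of watchers reported for the RBD image:
```go
package rbd

// isInUse reports whether an RBD image has an unexpected watcher. A
// mirrored primary image always has one watcher added by the rbd-mirror
// daemon, so only a second watcher means the image is mapped elsewhere.
func isInUse(watcherCount int, mirroredPrimary bool) bool {
	expectedWatchers := 0
	if mirroredPrimary {
		// the watcher held by the rbd-mirror daemon does not count as "in use"
		expectedWatchers = 1
	}
	return watcherCount > expectedWatchers
}
```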
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In case of async mirroring the volume UUID is retrieved from the volume
name. Instead of cephcsi generating a new UUID, it should reserve the
passed UUID. This will be useful when we support both metro DR and async
mirroring on a Kubernetes cluster.
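A sketch of the intent; the function name is illustrative and
github.com/google/uuid merely stands in for however the UUID is generated
in the real code:
```go
package journal

import "github.com/google/uuid"

// reserveUUID returns the UUID to reserve for a volume. When a UUID was
// parsed from an existing volume name (the async mirroring case), that UUID
// is reused instead of generating a new one.
func reserveUUID(passedUUID string) string {
	if passedUUID != "" {
		return passedUUID
	}
	return uuid.New().String()
}
```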
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Docker Hub offers a way to pull official images without any project
prefix, like "docker.io/vault:latest". This does a redirect to the
images located under "docker.io/library".
By using the fully-qualified image name, a redirect gets removed while
pulling the images. This reduces the likelihood of hitting Docker Hub
pull rate-limits.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It seems that the new log_errors() function does not get triggered when
the script hits `exit 1` conditions in functions. The functions should
return a non-0 value, not cause an exit of the script.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Log a few commands that help troubleshooting Rook deployment issues.
This might need to get extended with more commands.
Updates: #1636
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Images that have an unqualified name (no explicit registry) come from
Docker Hub. This can be made explicit by adding docker.io as a prefix. In
addition, the default :latest tag has been added.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The CentOS CI jobs use Rook v1.3.9; this version should be placed in
build.env just like the other versions that the CI jobs detect.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The BlockVolume PVC tests consume the example files that refer to
"centos:latest" without registry. This means that the images will get
pulled from Docker Hub, which has rate limits preventing CI jobs from
pulling images.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Reduce the number of images that get pulled from Docker Hub. Use the
official CentOS container registry instead.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
If the imagePullPolicy is not set and the image
tag is empty or latest, the image is always pulled.
This commit sets the policy to pull the image only
if it is not present.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
An rbd image can have a maximum number of snapshots, defined by
maxsnapshotsonimage. Once that limit is reached, cephcsi starts flattening
the older snapshots and returns the ABORT error message; the request that
comes after this has to wait until all the images are flattened (which
increases the PVC creation time). Instead of waiting until the maximum
number of snapshots on an RBD image is reached, we can have a soft limit
(minsnapshotsonimage); once the soft limit is reached, cephcsi starts the
flattening task to break the chain. With this, PVC creation time is only
affected when the hard limit (maxsnapshotsonimage) is reached.
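A sketch of the resulting behaviour with illustrative names, where
softLimit corresponds to minsnapshotsonimage, hardLimit to
maxsnapshotsonimage, and flatten stands in for scheduling the flatten task:
```go
package rbd

import "errors"

// checkSnapshotLimits starts flattening once the soft limit is crossed and
// only aborts the request (to be retried later) once the hard limit is hit.
func checkSnapshotLimits(snapCount, softLimit, hardLimit int, flatten func()) error {
	if snapCount >= hardLimit {
		flatten()
		return errors.New("rbd image has too many snapshots, flattening in progress")
	}
	if snapCount >= softLimit {
		// start breaking the chain in the background; PVC creation continues
		flatten()
	}
	return nil
}
```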
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
We do not have `text` in the new section of the MarkDown Rules. Hence
dropping them.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Update the coding guide about MD014, i.e.
Dollar signs used before commands without showing output
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
MD014 - Dollar signs used before commands without showing output
The dollar signs are unnecessary; it is easier to copy and paste, and less
noisy, if the dollar signs are omitted. This is especially true when the
command doesn't list the output, but if the command is followed by output
we can use `$ ` (dollar+space), mainly to differentiate between the
command and its output.
Scenario 1: when the command is not followed by output
```console
cd ~/work
```
Scenario 2: when the command is followed by output (use dollar+space)
```console
$ ls ~/work
file1 file2 dir1 dir2 ...
```
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
The function isCloneRetryError verifies
whether the clone error is a `pending` or
`in-progress` error.
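A minimal sketch; the sentinel errors are defined locally here for
illustration, the real code uses the error values defined elsewhere in the
CephFS package:
```go
package cephfs

import "errors"

// illustrative sentinel errors, standing in for the package's own definitions
var (
	ErrClonePending    = errors.New("clone from snapshot is pending")
	ErrCloneInProgress = errors.New("clone from snapshot is in progress")
)

// isCloneRetryError returns true when the clone is not finished yet and the
// request should be retried later.
func isCloneRetryError(err error) bool {
	return errors.Is(err, ErrClonePending) || errors.Is(err, ErrCloneInProgress)
}
```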
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
In certain cases, the clone status can be 'pending'.
In that case, an abort error message should be
returned, similar to the one returned during the
'in-progress' state.
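A sketch of the intended mapping with an illustrative helper name;
returning a gRPC Aborted error makes the CO retry the request later:
```go
package cephfs

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// cloneStateToError maps a CephFS clone state to a gRPC error: both
// "pending" and "in-progress" result in Aborted, so the request is retried.
func cloneStateToError(state string) error {
	switch state {
	case "complete":
		return nil
	case "pending", "in-progress":
		return status.Error(codes.Aborted, "clone from snapshot is "+state)
	default:
		return status.Error(codes.Internal, "unexpected clone state: "+state)
	}
}
```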
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
There is a type-check on BytesQuota after calling SubVolumeInfo() to see
if the value is supported. In case no quota is configured, the value
Infinite is returned. This can not be converted to an int64, so the
original code returned an error.
It seems that attaching/mounting sometimes fails with the following
error:
FailedMount: MountVolume.MountDevice failed for volume "pvc-0e8fdd18-873b-4420-bd27-fa6c02a49496" : rpc error: code = Internal desc = subvolume csi-vol-0d68d71a-1f5f-11eb-96d2-0242ac110012 has unsupported quota: infinite
By ignoring the quota of Infinite, and not setting a quota in the
Subvolume object, this problem should not happen again.
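A sketch of the fix, assuming go-ceph's cephfs admin types ByteCount and
Infinite; the helper name and returning 0 for the no-quota case are
illustrative:
```go
package cephfs

import (
	"fmt"

	fsAdmin "github.com/ceph/go-ceph/cephfs/admin"
)

// bytesQuotaFromInfo converts the BytesQuota of a SubVolumeInfo result into
// an int64, treating Infinite (no quota configured) as "no quota" instead of
// returning an error.
func bytesQuotaFromInfo(info *fsAdmin.SubVolumeInfo) (int64, error) {
	bc, ok := info.BytesQuota.(fsAdmin.ByteCount)
	if ok {
		return int64(bc), nil
	}
	if info.BytesQuota == fsAdmin.Infinite {
		// no quota is set on the subvolume, leave the quota unset
		return 0, nil
	}
	return 0, fmt.Errorf("subvolume has unsupported quota: %v", info.BytesQuota)
}
```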
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The implementation of getOMapValues assumed that the number of key-value
pairs assigned to the object would be close to the number of keys
being requested. When the number of keys on the object exceeded the
"listExcess" value the function would fail to read additional keys
even if they existed in the omap.
This change sets a large fixed "chunk size" value and keeps reading
key-value pairs as long as the callback gets called and increments
the numKeys counter.
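A sketch of the chunked reading loop, assuming go-ceph's rados
IOContext.ListOmapValues() callback API; the chunk size and helper name are
illustrative:
```go
package util

import "github.com/ceph/go-ceph/rados"

// listAllOmapValues keeps requesting chunks of key/value pairs until a chunk
// comes back empty, instead of assuming one request returns everything.
func listAllOmapValues(ioctx *rados.IOContext, oid string) (map[string][]byte, error) {
	const chunkSize = int64(512)
	results := map[string][]byte{}
	startAfter := ""
	for {
		numKeys := 0
		err := ioctx.ListOmapValues(oid, startAfter, "", chunkSize,
			func(key string, value []byte) {
				numKeys++
				startAfter = key
				results[key] = value
			})
		if err != nil {
			return nil, err
		}
		if numKeys == 0 {
			// the callback was not called anymore, all pairs have been read
			break
		}
	}
	return results, nil
}
```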
Signed-off-by: John Mulligan <jmulligan@redhat.com>
There are several go-routines where Failf() is called, which will cause
a Golang panic inside the Ginkgo test framework. Instead of aborting the
go-routine, capture the error and check for failures once all
go-routines have finished.
The CephFS tests have been updated already; this change only affects the
validatePVCClone() utility function.
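A sketch of the pattern (not of the actual e2e helper): errors from the
go-routines are collected and the test fails only once, from the main
go-routine, after all of them have finished:
```go
package e2e

import (
	"fmt"
	"sync"
)

// runInParallel runs fn for every index and returns an aggregated error
// after all go-routines have finished, so the caller can fail the test
// exactly once instead of calling Failf() inside a go-routine.
func runInParallel(count int, fn func(i int) error) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for i := 0; i < count; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if err := fn(i); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	if len(errs) != 0 {
		return fmt.Errorf("%d go-routines failed, first error: %w", len(errs), errs[0])
	}
	return nil
}
```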
Updates: #1359
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There are several go-routines where Failf() is called, which will cause
a Golang panic inside the Ginkgo test framework. Instead of aborting the
go-routine, capture the error and check for failures once all
go-routines have finished.
The CephFS tests have been updated already; this change only affects the
RBD tests.
Updates: #1359
Signed-off-by: Niels de Vos <ndevos@redhat.com>
There are several go-routines where Failf() is called, which will cause
a Golang panic inside the Ginkgo test framework. Instead of aborting the
go-routine, capture the error and check for failures once all
go-routines have finished.
Updates: #1359
Signed-off-by: Niels de Vos <ndevos@redhat.com>