When cloning a volume from a (CSI) snapshot, we use DeepCopy() and do
not need an RBD snapshot as source.
Suggested-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case restoring a snapshot of a thick-PVC failed during DeepCopy(),
the image will exist, but have partial contents. Only when the image has
the thick-provisioned metadata set, it has completed DeepCopy().
When the metadata is missing, the image is deleted, and an error is
returned to the caller. Kubernetes will automatically retry provisioning
on the ABORTED error, and the restoring will get restarted from the
beginning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Wrapcheck is a simple Go linter to check that errors
from external packages are wrapped during return to
help identify the error source during debugging.
This commit addresses the wrapcheck error
Updates:#2025
Signed-off-by: Yati Padia <ypadia@redhat.com>
the parent volume(CreateVolume) and the clone volume
(CreateSnapshot) are both indepedent and parent volume
can be deleted anytime. To check the thick provision
during Snapshot restore(CreateVolume from snapshot)
we need the thick provision metadata so for the same
reason setting the thick provision metadata on the
clone image we are creating at the CreateSnapshot time.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
RbdSnapName holds the actual RBD image name which
got created during the CreateSnapshot operation.
RbdImageName holds the name of the parent from
which the snapshot is created. and the parent
is independent of snapshot and it can be deleted
any time for the same reason using the RbdSnapName
to check the rbd image details.
generate a temporary volume from the snapshot which
replaces the rbdImageName with RbdSnapName and use
it to check the image metadata.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added validation to allow only Restore of Thick PVC
snapshot to a thick clone and creation of thick clone
from thick PVC.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
isThickProvisioned can be used for both snapshot
and clone validation if isThickProvisioned is method
of common rbdImage structure.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
In case cloning a thick-PVC failed during DeepCopy(), the image will
exist, but have partial contents. Only when the image has the
thick-provisioned metadata set, it has completed DeepCopy().
When the metadata is missing, the image is deleted, and an error is
returned to the caller. Kubernetes will automatically retry provisioning
on the ABORTED error, and the cloning will get restarted from the
beginning.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Not all Linux kernels support the deep-flatten feature. Disabling the
feature makes it possible to map RBD images on older kernels (like what
minikube uses).
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Thick-provisioned images are independent, cloned images or snapshots are
deep-flattened during creation. There is no need to try and flatten them
again.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
To create a full-allocated RBD image from a snapshot/clone DeepCopy()
can be used. This is needed when the parent of the new volume is
thick-provisioner, so that the new volume is independent of the parent
and thick-provisioned as well.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
isCompatibleEncryption is used to validate the
requested volume and the existing volume and
the destination volume name wont be generated yet
and logging the destination volume prints the empty
image name with pool name.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
actual error will be present in the stdErr not the error
when we try to add a task to flatten the rbd image. This
commits corrects the error checking when the image does
not have a parent.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit addresses the following issue:
'nolint:gocyclo // complexity needs to be reduced.'
is unused for linter "gocyclo" (nolintlint)
Updates:#2025
Signed-off-by: Yati Padia <ypadia@redhat.com>
This commit returns actual error returned by the go-ceph API
to the function GetPoolName(..) instead of just returning
ErrPoolNotFound everytime there is error getting the pool id.
There is a issue reported in which the snapshot creation
takes much more time to reach True state
(i.e., between 2-7 mins) and keeps trying to create with
below error though pool is present:
rpc error: code = NotFound desc = pool not found: pool ID (21)
not found in Ceph cluster.
Since we cannot interpret the actual error for the delay in
snapshot creation, it is required to return the actual error
as well so that we can uderstand the reason.
Signed-off-by: Yati Padia <ypadia@redhat.com>
While traversing image chain, the parent
image can be present in a different pool
that the one child is in. So, updating
pool name in the next itteration to
that of the Parent.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
Validate Snapshot request to check if the
passed pool name is not empty.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
if the parent and child clones are in
different namespaces we need to open a new
ioctx for pools.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
when clones are created in different pool,we
need to retrieve the parent pool to get the
information of the parent image.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
as we are supporting the creation of clone to a new
pool we need to pass the correct parent volume
to cleanup the snapshot on parent volume.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
added support to create image in different pool.
if the snapshot/rbd image exists in one pool we
can create a clone the clone of the rbd image to
a different pool.
Co-authored-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: Yug <yuggupta27@gmail.com>
In golang world, if you split an empty string that does not contain
the seperator, you get an array with one empty string. This results
in volumes failing to mount with "invalid feature " (note extra space
because it's trying to check if 'empty string' is a valid feature).
This patch checks if the string is empty, and if so, it just decides
to skip the entire validation and returning nothing.
Signed-off-by: Mohammed Naser <mnaser@vexxhost.com>
CreateVolume will fail in below cases
* If the snapshot is encrypted and requested volume
is not encrypted
* If the snapshot is not encrypted and requested
volume is encrypted
* If the parent volume is encrypted and requested volume
is not encrypted
* If the parent volume is not encrypted and requested
volume is encrypted
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
if it is an error from the IsMountPoint
function and the error is not IsNotExist return
it as a internal server error.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Currently we are relaying on the error output from
the umount command we run on the nodes when mounting
the volume but we are not checking for all the error
message to verify the volume is mounted or not.
This commits uses IsMountPoint function in util
to check the mountpoint.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as callers are already taking care of returing
the GRPC error code return the actual error
from the IsMountPoint function.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
when using pre-provisioned volumes, pass these parameters:
- kernelMountOptions
- fuseMountOptions
- subVolumeGroup
in spec.csi.volumeAttributes in PV declaration
Signed-off-by: YingshuoTao <frigid.blues@gmail.com>
Move the repairing of a volume/snapshot from CreateVolume to its own
function. This reduces the complexity of the code, and makes the
procedure easier to understand. Further enhancements to repairing an
exsiting volume can be done in the new function.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit addresses ifshort linter issues which
checks if short syntax for if-statements is possible.
updates: #1586
Signed-off-by: Rakshith R <rar@redhat.com>
This commit removes calling of .String() when logging
since `%s`,`%v` or `%q` will call an existing .String() function
automatically.
Fixes: #2051
Signed-off-by: Rakshith R <rar@redhat.com>
The NodeGetVolumeStats procedure can now be used to fetch the capacity
of the RBD block-device. By default this is a thin-provisioned device,
which means that the capacity is not reserved in the Ceph cluster. This
makes it possible to over-provision the cluster.
In order to detect the amount of storage used by the RBD block-device
(when thin-provisioned), it is required to connect to the Ceph cluster.
Unfortunately, the NodeGetVolumeStats CSI procedure does not provide
enough parameters to connect to the Ceph cluster and fetch more details
about the RBD image.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The CephFS NodeServer should handle the CephFS specific requests. This
is not something that the NodeServer for RBD should handle.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
flatten the image if the deep-flatten feature
is present on the images in the chain or if the
images in chain is not zero, as we cannot check
the deep-flatten feature the images which are
in trash.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
For flatten we call checkImageChainHasFeature
which internally calls to getImageInfo returns
the parent name even if the parent is in the trash,
when we try to open the parent image to get its
information it fails as the image not found.
we should treat error as nil if the parent is not found.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
The new gosec 2.7.0 complains like:
G304 (CWE-22): Potential file inclusion via variable (Confidence: HIGH, Severity: MEDIUM)
Updates: #2025
Signed-off-by: Niels de Vos <ndevos@redhat.com>
To recover from split brain (up+error) state the image need to be
demoted and requested for resync on site-a and then the image on site-b
should gets demoted.The volume should be marked to ready=true when the
image state on both the clusters are up+unknown because during the last
snapshot syncing the data gets copied first and then image state on the
site-a changes to up+unknown.
If the image state on both the sites are up+unknown consider that
complete data is synced as the last snapshot
gets exchanged between the clusters.
* create 10 GB of file and validate the data after resync
* Do Failover when the site-a goes down
* Force promote the image and write data in GiB
* Once the site-a comes back, Demote the image and issue resync
* Demote the image on site-b
* The status will get reflected on the other site when the last
snapshot sync happens
* The image will go to up+unknown state. and complete data will
be copied to site a
* Promote the image on site-a and use it
```bash
csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006:
global_id: e7f9ec55-06ab-46cb-a1ae-784be75ed96d
state: up+unknown
description: remote image demoted
service: a on minicluster1
last_update: 2021-04-28 07:11:56
peer_sites:
name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
state: up+unknown
description: remote image demoted
last_update: 2021-04-28 07:11:41
```
* Do Failover when the site-a goes down
* Force promote the image on site-b and write data in GiB
* Demote the image on site-b
* Once the site-a comes back, Demote the image on site-a
* The images on the both site will go to split brain state
```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf
state: up+error
description: split-brain
service: a on minicluster2
last_update: 2021-04-28 07:25:41
peer_sites:
name: abbda0f0-0117-4425-8cb2-deb4c853da47-rook-ceph
state: up+error
description: split-brain
last_update: 2021-04-28 07:25:26
```
* Issue resync
* The images cannot be resynced because when we issue resync
on site a the image on site-b was in demoted state
* To recover from this state (promote and then demote the
image on site-b after sometime)
```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf
state: up+unknown
description: remote image demoted
service: a on minicluster1
last_update: 2021-04-28 07:32:56
peer_sites:
name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
state: up+unknown
description: remote image demoted
last_update: 2021-04-28 07:32:41
```
* Once the data is copied we can see that the image state
is moved to up+unknown on both sites
* Promote the image on site-a and use it
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
when a Snapshot is encrypted during a CreateSnapshot
operation, the encryption key gets created in the KMS
when we delete the Snapshot the key from the KMS
should also gets deleted.
When we create a volume from snapshot we are copying
required information but we missed to copy the
encryption information, This commit adds the missing
information to delete the encryption key.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>