To recover from split brain (up+error) state the image need to be
demoted and requested for resync on site-a and then the image on site-b
should gets demoted.The volume should be marked to ready=true when the
image state on both the clusters are up+unknown because during the last
snapshot syncing the data gets copied first and then image state on the
site-a changes to up+unknown.
If the image state on both the sites are up+unknown consider that
complete data is synced as the last snapshot
gets exchanged between the clusters.
* create 10 GB of file and validate the data after resync
* Do Failover when the site-a goes down
* Force promote the image and write data in GiB
* Once the site-a comes back, Demote the image and issue resync
* Demote the image on site-b
* The status will get reflected on the other site when the last
snapshot sync happens
* The image will go to up+unknown state. and complete data will
be copied to site a
* Promote the image on site-a and use it
```bash
csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006:
global_id: e7f9ec55-06ab-46cb-a1ae-784be75ed96d
state: up+unknown
description: remote image demoted
service: a on minicluster1
last_update: 2021-04-28 07:11:56
peer_sites:
name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
state: up+unknown
description: remote image demoted
last_update: 2021-04-28 07:11:41
```
* Do Failover when the site-a goes down
* Force promote the image on site-b and write data in GiB
* Demote the image on site-b
* Once the site-a comes back, Demote the image on site-a
* The images on the both site will go to split brain state
```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf
state: up+error
description: split-brain
service: a on minicluster2
last_update: 2021-04-28 07:25:41
peer_sites:
name: abbda0f0-0117-4425-8cb2-deb4c853da47-rook-ceph
state: up+error
description: split-brain
last_update: 2021-04-28 07:25:26
```
* Issue resync
* The images cannot be resynced because when we issue resync
on site a the image on site-b was in demoted state
* To recover from this state (promote and then demote the
image on site-b after sometime)
```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf
state: up+unknown
description: remote image demoted
service: a on minicluster1
last_update: 2021-04-28 07:32:56
peer_sites:
name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
state: up+unknown
description: remote image demoted
last_update: 2021-04-28 07:32:41
```
* Once the data is copied we can see that the image state
is moved to up+unknown on both sites
* Promote the image on site-a and use it
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 07a916b84d)
when a Snapshot is encrypted during a CreateSnapshot
operation, the encryption key gets created in the KMS
when we delete the Snapshot the key from the KMS
should also gets deleted.
When we create a volume from snapshot we are copying
required information but we missed to copy the
encryption information, This commit adds the missing
information to delete the encryption key.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit c3bae17fce)
At present we return the volume connect error if the clone
from snapshot fails when rbdvolume is encrypted, which is incorrect.
This patch correctly return the failed copy encryption error to the
caller
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
(cherry picked from commit 798437d0c4)
Before RBD map operation, we do check the
watchers on the RBD image. In the case of
RWO volume. cephcsi makes sure only one
client is using the RBD image. If the rbd
image is mirrored, by default mirroring
daemon will add a watcher on the image
and as we are using go-ceph a watcher will
be added as we have opened the image So
we will have two watchers on an image if
mirroring is enabled. This holds when the
rbd mirror daemon is running, In case if
the mirror daemon is not running there will
be only one watcher on the rbd image
(which is placed by go-ceph image open)
we should not block the map operation if
the mirroring daemon is not running as
its Async mirroring. This commit adds a
check to make sure no more than 2 watchers
if the image is mirrored or no more than 1
watcher if it is not mirrored image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit 52290333e6)
If the pool or few keys are missing in the omap.
GetImageAttributes function returns nil error message and few
empty items in imageAttributes struct. if the image is not
found and the entiries are missing use
the volumeId present on the PV annotation for further operations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
incase if the image is promoted and demoted the
image state will be set to up+unknown if the image
on the remote cluster is still in demoted state.
when user changes the state from primary to secondary
and still the image is in demoted (secondary) state
in the remote cluster. the image state on both the cluster
will be on unknown state.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It helps to get a stack trace when debugging issues. Certain things are
considered bugs in the code (like missing attributes in a struct), and
might cause a panic in certain occasions.
In this case, a missing string will not panic, but the behaviour will
also not be correct (DEKs getting encrypted, but unable to decrypt).
Clearly logging this as a BUG is probably better than calling panic().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
It is possible that when a provisioner restarts after a snapshot was
cloned, but before the newly restored image had its encryption metadata
set, the new image is not marked as encrypted. This will prevent
attaching/mounting the image, as the encryption key will not be fetched,
or is not available in the DEKStore.
By actively repairing the encryption configuration when needed, this
problem should be addressed.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
buildCreateVolumeResponse() exists exactly for the need to create a
csi.CreateVolumeResponse based on an rbdVolume. Calling this helper
reduces the code duplication in CreateVolume().
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The rbdVolume that needs its encryption configured is constructed in the
Exists() method. It is suitable to move the copyEncryptionConfig() call
there as well, so that the object is completely constructed in a single
place.
Golang-ci:gocyclo complained about the increased complexity of the
Exists() function. Moving the repairing of the ImageID into its own
helper function makes the code a little easier to understand.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Introduce helper function cloneFromSnapshot() that takes care of the
procedures that are needed when an existing snapshot has been found.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When a source volume is encrypted, the passphrase needs to be copied and
stored for the newly cloned volume.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Cloning volumes requires copying the DEK from the source to the newly
cloned volume. Introduce copyEncryptionConfig() as a helper for that.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Without this, the rbdVolume can not connect to the Ceph cluster and
configure the (optional) encryption.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The ControllerServer should not need to care about support for
encryption, ideally it is transparantly handled by the rbdVolume type
and its internal API.
Deleting the DEK was one of the last remainders that was explicitly done
inside the ControllerServer.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In the case of the Async DR, the volumeID will
not be the same if the clusterID or the PoolID
is different, With Earlier implementation, it
is expected that the new volumeID mapping is
stored in the rados omap pool. In the case of the
ControllerExpand or the DeleteVolume Request,
the only volumeID will be sent it's not possible
to find the corresponding poolID in the new cluster.
With This Change, it works as below
The csi-rbdplugin-controller will watch for the PV
objects, when there are any PV objects created it
will check the omap already exists, If the omap doesn't
exist it will generate the new volumeID and it checks for
the volumeID mapping entry in the PV annotation, if the
mapping does not exist, it will add the new entry
to the PV annotation.
The cephcsi will check for the PV annotations if the
omap does not exist if the mapping exists in the PV
annotation, it will use the new volumeID for further
operations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It seems that newer versions of some tools/libraries identify encrypted
filesystems with `crypto_LUKS` instead of `crypt`.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
In case of the DR the image on the primary site cannot be
demoted as the cluster is down, during failover the image need
to be force promoted. RBD returns `Device or resource busy`
error message if the image cannot be promoted for above reason.
Return FailedPrecondition so that replication operator can send
request to force promote the image.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
instead of fetching the force option from the
parameters. Use the Force field available in
the PromoteVolume Request.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
as the org github.com/kube-storage is renamed
to github.com/csi-addons as the name kube-storage
was more generic.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Incase of resync the image will get deleted, gets
recreated and its a a time consuming operation.
It makes sense to return aborted error instead
of not found as we have omap data only the image
is missing in rbd pool.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Do resync if the image is in unknow or in error
state.
Check for the current image state for up+stopped
or up+replaying and also all peer site status
should be un up+stopped to confirm that resyncing
is done and image can be promoted and used.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added replication related operations as a method
of rbdImage as these methods can be easily used
when we introduce volumesnaphot mirroring operations.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
the rbd mirror state can be in enabled,disabled
or disabling state. If the mirroring is not disabled
yet and still in disabling state. we need to
check for it and return abort error message if
the mirroring is still getting disabled.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added ReplicationServer struct for the replication related
operation it also embed the ControllerServer which
already implements the helper functions like locking/unlocking etc.
removed getVolumeFromID and cleanup functions for better
code readability and easy maintaince.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
There is no need for each EncryptionKMS to implement the same GetID()
function. We have a VolumeEncryption type that is more suitable for
keeping track of the KMS-ID that was used to get the configuration of
the KMS.
This does not change any metadata that is stored anywhere.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Current rbd plugin only supports the layering feature
for rbd image. Add exclusive-lock and journaling image
features for the rbd.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Signed-off-by: woohhan <woohyung_han@tmax.co.kr>
rbdVolumes can have several resources that get allocated during its
usage. Only destroying the IOContext may not be suffiecient and can
cause resource leaks.
Use rbdVolume.Destroy() when the rbdVolume is not used anymore.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Connections are reference counted, so just assigning the connection to
an other object for re-use is not correct. This can cause connections to
be garbage collected while something else is still using it.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
as RBD is implementing the replication
we are registering it. For CephFS, its
not implementing the replication we are
passing nil so we dont want to register
it.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Because rbdVolume and rbdSnapshot are very similar, they can be based
off a common struct rbdImage that contains the common attributes and
functions.
This makes it possible to re-use functions for snapshots, and prevents
further duplication or code.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The rbdSnapshot and rbdVolume structs have many common attributes. In
order to combine these into an rbdImage struct that implements shared
functionality, having the same attribute for the ID makes things much
easier.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
NewVolumeEncryption() will return an indication that an alternative
DEKStore needs to be configured in case the KMS does not support it.
setKMS() will also set the DEKStore if needed, so renaming it to
configureEncryption() makes things clearer.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Use DEKStore API for Fetching and Storing passphrases.
Drop the fallback for the old KMS interface that is now provided as
DEKStore. The original implementation has been re-used for the DEKStore
interface.
This also moves GetCryptoPassphrase/StoreNewCryptoPassphrase functions
to methods of VolumeEncryption.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Prepare for grouping encryption related functions together. The main
rbdVolume object should not be cluttered with KMS or DEK procedures.
Signed-off-by: Niels de Vos <ndevos@redhat.com>