75fa1927fc
To recover from the split-brain (up+error) state, the image needs to be
demoted and a resync requested on site-a, and then the image on site-b
should be demoted. The volume should be marked ready=true when the
image state on both clusters is up+unknown, because during the last
snapshot sync the data gets copied first and only then does the image
state on site-a change to up+unknown.
If the image state on both sites is up+unknown, consider the data
completely synced, as the last snapshot has been exchanged between the
clusters.
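
The readiness rule described above can be sketched in a few lines. This only illustrates the rule as worded in this message; it is not the driver's actual code, and the function name and signature are hypothetical.

```python
from typing import Sequence

# State that both sites report once the last snapshot has been exchanged.
READY_STATE = "up+unknown"


def is_volume_ready(local_state: str, peer_states: Sequence[str]) -> bool:
    """Return True only when the local image and every peer report up+unknown."""
    if not peer_states:
        # Without peer information nothing can be said about the remote copy.
        return False
    return local_state == READY_STATE and all(
        state == READY_STATE for state in peer_states
    )
```

With the states from the split-brain case (`up+error` on both sites) the function returns `False`, so the volume would not be reported as ready.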
* Create a 10 GB file and validate the data after resync
* Fail over when site-a goes down
* Force promote the image and write GiBs of data
* Once site-a comes back, demote the image and issue a resync
* Demote the image on site-b
* The status is reflected on the other site when the last
snapshot sync happens
* The image goes to the up+unknown state and the complete data is
copied to site-a
* Promote the image on site-a and use it
```bash
csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006:
  global_id: e7f9ec55-06ab-46cb-a1ae-784be75ed96d
  state: up+unknown
  description: remote image demoted
  service: a on minicluster1
  last_update: 2021-04-28 07:11:56
  peer_sites:
    name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
    state: up+unknown
    description: remote image demoted
    last_update: 2021-04-28 07:11:41
```
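
Status output like the block above can also be checked by a script instead of by eye. The following is a minimal sketch that parses the plain-text form shown here; the function name is hypothetical, and it assumes the output keeps this `key: value` layout with a `peer_sites:` section.

```python
SAMPLE = """\
csi-vol-5633715e-a7eb-11eb-bebb-0242ac110006:
  global_id: e7f9ec55-06ab-46cb-a1ae-784be75ed96d
  state: up+unknown
  description: remote image demoted
  service: a on minicluster1
  last_update: 2021-04-28 07:11:56
  peer_sites:
    name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
    state: up+unknown
    description: remote image demoted
    last_update: 2021-04-28 07:11:41
"""


def parse_mirror_status(text: str) -> dict:
    """Parse the status text into a dict; indentation is ignored."""
    status = {"image": None, "peer_sites": []}
    in_peers = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon only, so timestamps stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if status["image"] is None and value == "":
            status["image"] = key  # first line: "<image name>:"
        elif key == "peer_sites":
            in_peers = True
        elif in_peers:
            if key == "name":
                status["peer_sites"].append({})  # a "name" line starts a peer
            if status["peer_sites"]:
                status["peer_sites"][-1][key] = value
        else:
            status[key] = value
    return status


parsed = parse_mirror_status(SAMPLE)
print(parsed["state"], [peer["state"] for peer in parsed["peer_sites"]])
```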
* Fail over when site-a goes down
* Force promote the image on site-b and write GiBs of data
* Demote the image on site-b
* Once site-a comes back, demote the image on site-a
* The images on both sites go to the split-brain state
```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
  global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf
  state: up+error
  description: split-brain
  service: a on minicluster2
  last_update: 2021-04-28 07:25:41
  peer_sites:
    name: abbda0f0-0117-4425-8cb2-deb4c853da47-rook-ceph
    state: up+error
    description: split-brain
    last_update: 2021-04-28 07:25:26
```
* Issue a resync
* The images cannot be resynced, because the image on site-b was in
the demoted state when the resync was issued on site-a
* To recover from this state, promote the image on site-b and then
demote it after some time
```bash
csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006:
  global_id: 115c3df9-3d4f-4c04-93a7-531b82155ddf
  state: up+unknown
  description: remote image demoted
  service: a on minicluster1
  last_update: 2021-04-28 07:32:56
  peer_sites:
    name: e47e29f4-96e8-44ed-b6c6-edf15c5a91d6-rook-ceph
    state: up+unknown
    description: remote image demoted
    last_update: 2021-04-28 07:32:41
```
* Once the data is copied, the image state moves to up+unknown
on both sites
* Promote the image on site-a and use it
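
The two scenarios differ mainly in the order of the demote and resync steps. As a rough sketch, the order that worked in the first scenario could be written down as `rbd mirror image` invocations. The helper below only builds the command lines and does not run them; the cluster names `site-a` and `site-b` (passed via `--cluster`) and the pool name in the usage line are placeholders, not values from this commit.

```python
def failback_commands(image_spec, site_a="site-a", site_b="site-b"):
    """Return the rbd invocations for failing back to site-a, in order."""

    def rbd(cluster, action):
        # image_spec is "<pool>/<image>"; cluster selects the Ceph config to use.
        return ["rbd", "--cluster", cluster, "mirror", "image", action, image_spec]

    return [
        rbd(site_a, "demote"),   # site-a is back: demote its stale primary
        rbd(site_a, "resync"),   # request resync while site-b is still primary
        rbd(site_b, "demote"),   # only now demote site-b
        rbd(site_a, "status"),   # repeat until both sites report up+unknown
        rbd(site_a, "promote"),  # make site-a primary again
    ]


# "replicapool" is a hypothetical pool name used for illustration.
for cmd in failback_commands("replicapool/csi-vol-37effcb5-a7f1-11eb-bebb-0242ac110006"):
    print(" ".join(cmd))
```

Demoting site-b before the resync is issued on site-a corresponds to the second scenario, which ended in split-brain.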
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
(cherry picked from commit
Ceph CSI
This repo contains the Ceph Container Storage Interface (CSI) drivers for RBD and CephFS, and the Kubernetes sidecar deployment YAMLs of the provisioner, attacher, resizer, driver-registrar and snapshotter that support CSI functionality.
Overview
Ceph CSI plugins implement an interface between a CSI-enabled Container Orchestrator (CO) and Ceph clusters. They allow dynamically provisioning Ceph volumes and attaching them to workloads.
Independent CSI plugins are provided to support RBD and CephFS backed volumes.
- For details about configuration and deployment of the RBD plugin, please refer to the rbd doc; for CephFS plugin configuration and deployment, please refer to the cephfs doc.
- For example usage of the RBD and CephFS CSI plugins, see the examples in examples/.
- For stale resource cleanup, please refer to the cleanup doc.
NOTE:
- Ceph CSI Arm64 support is experimental.
Project status
Status: GA
Supported CO platforms
Ceph CSI drivers are currently developed and tested exclusively on Kubernetes environments. There is work in progress to make this CO independent and thus support other orchestration environments in the future.
NOTE: csiv0.3 is deprecated with the release of csi v1.1.0
Support Matrix
Ceph-CSI features and available versions
Plugin | Features | Feature Status | CSI Driver Version | CSI Spec Version | Ceph Cluster Version | Kubernetes Version |
---|---|---|---|---|---|---|
RBD | Dynamically provision, de-provision Block mode RWO volume | GA | >= v1.0.0 | >= v1.0.0 | Nautilus (>=14.0.0) | >= v1.14.0 |
RBD | Dynamically provision, de-provision Block mode RWX volume | GA | >= v1.0.0 | >= v1.0.0 | Nautilus (>=14.0.0) | >= v1.14.0 |
RBD | Dynamically provision, de-provision File mode RWO volume | GA | >= v1.0.0 | >= v1.0.0 | Nautilus (>=14.0.0) | >= v1.14.0 |
RBD | Provision File Mode ROX volume from snapshot | Alpha | >= v3.0.0 | >= v1.0.0 | Nautilus (>=v14.2.2) | >= v1.17.0 |
RBD | Provision File Mode ROX volume from another volume | Alpha | >= v3.0.0 | >= v1.0.0 | Nautilus (>=v14.2.2) | >= v1.16.0 |
RBD | Provision Block Mode ROX volume from snapshot | Alpha | >= v3.0.0 | >= v1.0.0 | Nautilus (>=v14.2.2) | >= v1.17.0 |
RBD | Provision Block Mode ROX volume from another volume | Alpha | >= v3.0.0 | >= v1.0.0 | Nautilus (>=v14.2.2) | >= v1.16.0 |
RBD | Creating and deleting snapshot | Alpha | >= v1.0.0 | >= v1.0.0 | Nautilus (>=14.0.0) | >= v1.17.0 |
RBD | Provision volume from snapshot | Alpha | >= v1.0.0 | >= v1.0.0 | Nautilus (>=14.0.0) | >= v1.17.0 |
RBD | Provision volume from another volume | Alpha | >= v1.0.0 | >= v1.0.0 | Nautilus (>=14.0.0) | >= v1.16.0 |
RBD | Expand volume | Beta | >= v2.0.0 | >= v1.1.0 | Nautilus (>=14.0.0) | >= v1.15.0 |
RBD | Metrics Support | Beta | >= v1.2.0 | >= v1.1.0 | Nautilus (>=14.0.0) | >= v1.15.0 |
RBD | Topology Aware Provisioning Support | Alpha | >= v2.1.0 | >= v1.1.0 | Nautilus (>=14.0.0) | >= v1.14.0 |
CephFS | Dynamically provision, de-provision File mode RWO volume | Beta | >= v1.1.0 | >= v1.0.0 | Nautilus (>=14.2.2) | >= v1.14.0 |
CephFS | Dynamically provision, de-provision File mode RWX volume | Beta | >= v1.1.0 | >= v1.0.0 | Nautilus (>=v14.2.2) | >= v1.14.0 |
CephFS | Dynamically provision, de-provision File mode ROX volume | Alpha | >= v3.0.0 | >= v1.0.0 | Nautilus (>=v14.2.2) | >= v1.14.0 |
CephFS | Creating and deleting snapshot | Alpha | >= v3.1.0 | >= v1.0.0 | Octopus (>=v15.2.3) | >= v1.17.0 |
CephFS | Provision volume from snapshot | Alpha | >= v3.1.0 | >= v1.0.0 | Octopus (>=v15.2.3) | >= v1.17.0 |
CephFS | Provision volume from another volume | Alpha | >= v3.1.0 | >= v1.0.0 | Octopus (>=v15.2.3) | >= v1.16.0 |
CephFS | Expand volume | Beta | >= v2.0.0 | >= v1.1.0 | Nautilus (>=v14.2.2) | >= v1.15.0 |
CephFS | Metrics | Beta | >= v1.2.0 | >= v1.1.0 | Nautilus (>=v14.2.2) | >= v1.15.0 |
NOTE: The Alpha status reflects possible non-backward compatible changes in the future, and is thus not recommended for production use.
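
The `>=` entries in the matrix are minimum versions, and they need a numeric comparison rather than a string one (as strings, "v1.9.0" sorts after "v1.14.0"). A minimal sketch of such a check follows; the helper names are hypothetical, only two rows of the matrix are copied in as examples, and version strings with suffixes such as "-rc1" are not handled.

```python
def parse_version(version: str) -> tuple:
    """Turn "v1.17.0" or "14.2.2" into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def meets_minimum(actual: str, minimum: str) -> bool:
    """Return True when actual is at least minimum, compared numerically."""
    return parse_version(actual) >= parse_version(minimum)


# Minimum Kubernetes versions for two rows of the matrix above.
MIN_KUBERNETES = {
    ("RBD", "Expand volume"): "v1.15.0",
    ("CephFS", "Creating and deleting snapshot"): "v1.17.0",
}


def supported(plugin: str, feature: str, kubernetes_version: str) -> bool:
    """Check a cluster's Kubernetes version against the matrix entry."""
    return meets_minimum(kubernetes_version, MIN_KUBERNETES[(plugin, feature)])


print(supported("CephFS", "Creating and deleting snapshot", "v1.16.3"))
```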
CSI spec and Kubernetes version compatibility
Please refer to the matrix in the Kubernetes documentation.
Ceph CSI Container images and release compatibility
Ceph CSI Release/Branch | Container image name | Image Tag |
---|---|---|
Master (Branch) | quay.io/cephcsi/cephcsi | canary |
v3.2.0 (Release) | quay.io/cephcsi/cephcsi | v3.2.0 |
v3.1.2 (Release) | quay.io/cephcsi/cephcsi | v3.1.2 |
v3.1.1 (Release) | quay.io/cephcsi/cephcsi | v3.1.1 |
v3.1.0 (Release) | quay.io/cephcsi/cephcsi | v3.1.0 |
v3.0.0 (Release) | quay.io/cephcsi/cephcsi | v3.0.0 |
v2.1.2 (Release) | quay.io/cephcsi/cephcsi | v2.1.2 |
v2.1.1 (Release) | quay.io/cephcsi/cephcsi | v2.1.1 |
v2.1.0 (Release) | quay.io/cephcsi/cephcsi | v2.1.0 |
v2.0.1 (Release) | quay.io/cephcsi/cephcsi | v2.0.1 |
v2.0.0 (Release) | quay.io/cephcsi/cephcsi | v2.0.0 |
v1.2.2 (Release) | quay.io/cephcsi/cephcsi | v1.2.2 |
v1.2.1 (Release) | quay.io/cephcsi/cephcsi | v1.2.1 |
v1.2.0 (Release) | quay.io/cephcsi/cephcsi | v1.2.0 |
v1.1.0 (Release) | quay.io/cephcsi/cephcsi | v1.1.0 |
v1.0.0 (Branch) | quay.io/cephcsi/cephfsplugin | v1.0.0 |
v1.0.0 (Branch) | quay.io/cephcsi/rbdplugin | v1.0.0 |
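
Image name and tag from the table combine into a pull reference of the form `name:tag`. The sketch below encodes that mapping, including the two separate plugin images listed for v1.0.0; the function name is hypothetical and it does not check that a release actually exists.

```python
DEFAULT_IMAGE = "quay.io/cephcsi/cephcsi"

# v1.0.0 is listed with one image per plugin instead of the combined image.
LEGACY_IMAGES = {
    "v1.0.0": ["quay.io/cephcsi/cephfsplugin", "quay.io/cephcsi/rbdplugin"],
}


def image_refs(release: str) -> list:
    """Return the image reference(s) for a release, or for "master"."""
    tag = "canary" if release == "master" else release
    names = LEGACY_IMAGES.get(release, [DEFAULT_IMAGE])
    return [f"{name}:{tag}" for name in names]


print(image_refs("v3.2.0"))
```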
Contributing to this repo
Please follow the development-guide and coding style guidelines if you are interested in contributing to this repo.
Troubleshooting
Please submit an issue at: Issues
Weekly Bug Triage call
We conduct weekly bug triage calls in our Slack channel on Tuesdays. More details are available here
Dev standup
A regular dev standup takes place every other Monday, Tuesday and Thursday at 12:00 PM UTC. Convert to your local timezone by executing the command date -d "12:00 UTC" in a terminal.
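
The `-d` date-string option is a GNU `date` feature and may not behave the same on other systems. As an alternative, the same conversion can be sketched in Python; the function name is hypothetical, and passing no timezone falls back to the system's local zone.

```python
from datetime import datetime, time, timedelta, timezone


def standup_time(tz=None, day=None):
    """Return the 12:00 UTC standup as a datetime in tz (local zone if None)."""
    day = day or datetime.now(timezone.utc).date()
    utc_noon = datetime.combine(day, time(12, 0), tzinfo=timezone.utc)
    return utc_noon.astimezone(tz)


# Local time of today's standup.
print(standup_time().strftime("%H:%M %Z"))

# The same for a fixed UTC+05:30 offset.
print(standup_time(timezone(timedelta(hours=5, minutes=30))).strftime("%H:%M"))
```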
Any changes to the meeting schedule will be added to the agenda doc.
Anyone who wants to discuss the direction of the project, design and implementation reviews, or general questions with the broader community is welcome and encouraged to join.
- Meeting link: https://redhat.bluejeans.com/702977652
- Current agenda
Contact
Please use the following to reach members of the community:
- Slack: Join our Slack channel to discuss anything related to this project. You can join the Slack by this invite link
- Forums: ceph-csi
- Twitter: @CephCsi