mirror of https://github.com/ceph/ceph-csi.git, synced 2025-06-14 18:53:35 +00:00
doc: few corrections or typo fixing in design documentation
- Fixes spelling mistakes.
- Grammatical error correction.
- Wrapping the text at 80 line count..etc

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
committed by mergify[bot]
parent 12e8e46bcf
commit 3196b798cc
@@ -6,50 +6,49 @@ snapshot contents and then mount that volume to workloads.
 
 CephFS exposes snapshots as special, read-only directories of a subvolume
 located in `<subvolume>/.snap`. cephfs-csi can already provision writable
-volumes with snapshots as their data source, where snapshot contents are
-cloned to the newly created volume. However, cloning a snapshot to volume
-is a very expensive operation in CephFS as the data needs to be fully copied.
-When the need is to only read snapshot contents, snapshot cloning is extremely
+volumes with snapshots as their data source, where snapshot contents are cloned
+to the newly created volume. However, cloning a snapshot to volume is a very
+expensive operation in CephFS as the data needs to be fully copied. When the
+need is to only read snapshot contents, snapshot cloning is extremely
 inefficient and wasteful.
 
-This proposal describes a way for cephfs-csi to expose CephFS snapshots
-as shallow, read-only volumes, without needing to clone the underlying
-snapshot data.
+This proposal describes a way for cephfs-csi to expose CephFS snapshots as
+shallow, read-only volumes, without needing to clone the underlying snapshot
+data.
 
 ## Use-cases
 
 What's the point of such read-only volumes?
 
 * **Restore snapshots selectively:** users may want to traverse snapshots,
-restoring data to a writable volume more selectively instead of restoring
-the whole snapshot.
-* **Volume backup:** users can't backup a live volume, they first need
-to snapshot it. Once a snapshot is taken, it still can't be backed-up,
-as backup tools usually work with volumes (that are exposed as file-systems)
+restoring data to a writable volume more selectively instead of restoring the
+whole snapshot.
+* **Volume backup:** users can't backup a live volume, they first need to
+snapshot it. Once a snapshot is taken, it still can't be backed-up, as backup
+tools usually work with volumes (that are exposed as file-systems)
 and not snapshots (which might have backend-specific format). What this means
 is that in order to create a snapshot backup, users have to clone snapshot
 data twice:
 
-1. first time, when restoring the snapshot into a temporary volume from
-where the data will be read,
-1. and second time, when transferring that volume into some backup/archive
-storage (e.g. object store).
+1. first time, when restoring the snapshot into a temporary volume from
+where the data will be read,
+1. and second time, when transferring that volume into some backup/archive
+storage (e.g. object store).
 
 The temporary backed-up volume will most likely be thrown away after the
 backup transfer is finished. That's a lot of wasted work for what we
-originally wanted to do! Having the ability to create volumes from
-snapshots cheaply would be a big improvement for this use case.
+originally wanted to do! Having the ability to create volumes from snapshots
+cheaply would be a big improvement for this use case.
 
 ## Alternatives
 
 * _Snapshots are stored in `<subvolume>/.snap`. Users could simply visit this
 directory by themselves._
 
-`.snap` is CephFS-specific detail of how snapshots are exposed.
-Users / tools may not be aware of this special directory, or it may not fit
-their workflow. At the moment, the idiomatic way of accessing snapshot
-contents in CSI drivers is by creating a new volume and populating it
-with snapshot.
+`.snap` is CephFS-specific detail of how snapshots are exposed. Users / tools
+may not be aware of this special directory, or it may not fit their workflow.
+At the moment, the idiomatic way of accessing snapshot contents in CSI drivers
+is by creating a new volume and populating it with snapshot.
 
 ## Design
 
@@ -57,21 +56,21 @@ Key points:
 
 * Volume source is a snapshot, volume access mode is `*_READER_ONLY`.
 * No actual new subvolumes are created in CephFS.
-* The resulting volume is a reference to the source subvolume snapshot.
-This reference would be stored in `Volume.volume_context` map. In order
-to reference a snapshot, we need subvol name and snapshot name.
-* Mounting such volume means mounting the respective CephFS subvolume
-and exposing the snapshot to workloads.
-* Let's call a *shallow read-only volume with a subvolume snapshot
-as its data source* just a *shallow volume* from here on out for brevity.
+* The resulting volume is a reference to the source subvolume snapshot. This
+reference would be stored in `Volume.volume_context` map. In order to
+reference a snapshot, we need subvol name and snapshot name.
+* Mounting such volume means mounting the respective CephFS subvolume and
+exposing the snapshot to workloads.
+* Let's call a *shallow read-only volume with a subvolume snapshot as its data
+source* just a *shallow volume* from here on out for brevity.
 
 ### Controller operations
 
-Care must be taken when handling life-times of relevant storage resources.
-When a shallow volume is created, what would happen if:
+Care must be taken when handling life-times of relevant storage resources. When
+a shallow volume is created, what would happen if:
 
-* _Parent subvolume of the snapshot is removed while the shallow volume
-still exists?_
+* _Parent subvolume of the snapshot is removed while the shallow volume still
+exists?_
 
 This shouldn't be a problem already. The parent volume has either
 `snapshot-retention` subvol feature in which case its snapshots remain
@@ -80,8 +79,8 @@ When a shallow volume is created, what would happen if:
 * _Source snapshot from which the shallow volume originates is removed while
 that shallow volume still exists?_
 
-We need to make sure this doesn't happen and some book-keeping
-is necessary. Ideally we could employ some kind of reference counting.
+We need to make sure this doesn't happen and some book-keeping is necessary.
+Ideally we could employ some kind of reference counting.
 
 #### Reference counting for shallow volumes
 
@@ -92,26 +91,26 @@ When creating a volume snapshot, a reference tracker (RT), represented by a
 RADOS object, would be created for that snapshot. It would store information
 required to track the references for the backing subvolume snapshot. Upon a
 `CreateSnapshot` call, the reference tracker (RT) would be initialized with a
-single reference record, where the CSI snapshot itself is the first reference
-to the backing snapshot. Each subsequent shallow volume creation would add a
-new reference record to the RT object. Each shallow volume deletion would
-remove that reference from the RT object. Calling `DeleteSnapshot` would remove
-the reference record that was previously added in `CreateSnapshot`.
+single reference record, where the CSI snapshot itself is the first reference to
+the backing snapshot. Each subsequent shallow volume creation would add a new
+reference record to the RT object. Each shallow volume deletion would remove
+that reference from the RT object. Calling `DeleteSnapshot` would remove the
+reference record that was previously added in `CreateSnapshot`.
 
 The subvolume snapshot would be removed from the Ceph cluster only once the RT
 object holds no references. Note that this behavior would permit calling
 `DeleteSnapshot` even if it is still referenced by shallow volumes.
 
 * `DeleteSnapshot`:
-* RT holds no references or the RT object doesn't exist:
-delete the backing snapshot too.
-* RT holds at least one reference: keep the backing snapshot.
+* RT holds no references or the RT object doesn't exist:
+delete the backing snapshot too.
+* RT holds at least one reference: keep the backing snapshot.
 * `DeleteVolume`:
-* RT holds no references: delete the backing snapshot too.
-* RT holds at least one reference: keep the backing snapshot.
+* RT holds no references: delete the backing snapshot too.
+* RT holds at least one reference: keep the backing snapshot.
 
-To enable creating shallow volumes from snapshots that were provisioned by
-older versions of cephfs-csi (i.e. before this feature is introduced),
+To enable creating shallow volumes from snapshots that were provisioned by older
+versions of cephfs-csi (i.e. before this feature is introduced),
 `CreateVolume` for shallow volumes would also create an RT object in case it's
 missing. It would be initialized to two: the source snapshot and the newly
 created shallow volume.
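The reference-counting rules described in the hunk above can be illustrated with a small in-memory model. This is only a sketch: the proposal stores the RT in a RADOS object, whereas the `refTracker` type, its methods, and the `snapshot:` / `volume:` key format below are hypothetical names chosen for illustration, not part of cephfs-csi.

```go
package main

import "fmt"

// refTracker is a hypothetical in-memory stand-in for the RT RADOS object.
// Each key identifies one holder of a reference to the backing snapshot.
type refTracker struct {
	refs map[string]struct{}
}

// newRefTracker creates a tracker pre-populated with the given references.
func newRefTracker(initial ...string) *refTracker {
	rt := &refTracker{refs: make(map[string]struct{})}
	for _, key := range initial {
		rt.refs[key] = struct{}{}
	}
	return rt
}

// add records a new reference (e.g. on shallow volume creation).
func (rt *refTracker) add(key string) {
	rt.refs[key] = struct{}{}
}

// remove drops a reference and reports whether no references remain,
// i.e. whether the backing snapshot may now be deleted.
func (rt *refTracker) remove(key string) bool {
	delete(rt.refs, key)
	return len(rt.refs) == 0
}

func main() {
	// CreateSnapshot: the CSI snapshot itself is the first reference.
	rt := newRefTracker("snapshot:snap-1")

	// CreateVolume (shallow): add a reference for the new volume.
	rt.add("volume:shallow-1")

	// DeleteSnapshot while a shallow volume still exists: keep the backing snapshot.
	fmt.Println("delete backing snapshot:", rt.remove("snapshot:snap-1"))

	// DeleteVolume of the last shallow volume: the backing snapshot can go too.
	fmt.Println("delete backing snapshot:", rt.remove("volume:shallow-1"))
}
```

For a snapshot provisioned before this feature existed, the same model would start from `newRefTracker("snapshot:<name>", "volume:<name>")`, matching the "initialized to two" rule above.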
@@ -141,17 +140,17 @@ Things to look out for:
 
 It doesn't consume any space on the filesystem. `Volume.capacity_bytes` is
 allowed to contain zero. We could use that.
-* _What should be the requested size when creating the volume (specified e.g.
-in PVC)?_
+* _What should be the requested size when creating the volume (specified e.g. in
+PVC)?_
 
 This one is tricky. CSI spec allows for
-`CreateVolumeRequest.capacity_range.{required_bytes,limit_bytes}` to be
-zero. On the other hand,
-`PersistentVolumeClaim.spec.resources.requests.storage` must be bigger
-than zero. cephfs-csi doesn't care about the requested size (the volume
-will be read-only, so it has no usable capacity) and would always set it
-to zero. This shouldn't case any problems for the time being, but still
-is something we should keep in mind.
+`CreateVolumeRequest.capacity_range.{required_bytes,limit_bytes}` to be zero.
+On the other hand,
+`PersistentVolumeClaim.spec.resources.requests.storage` must be bigger than
+zero. cephfs-csi doesn't care about the requested size (the volume will be
+read-only, so it has no usable capacity) and would always set it to zero. This
+shouldn't case any problems for the time being, but still is something we
+should keep in mind.
 
 `CreateVolume` and behavior when using volume as volume source (PVC-PVC clone):
 
@@ -167,8 +166,8 @@ Volume deletion is trivial.
 
 ### `CreateSnapshot`
 
-Snapshotting read-only volumes doesn't make sense in general, and should
-be rejected.
+Snapshotting read-only volumes doesn't make sense in general, and should be
+rejected.
 
 ### `ControllerExpandVolume`
 
@@ -194,8 +193,8 @@ whole subvolume first, and only then perform the binds to target paths.
 #### For case (a)
 
 Subvolume paths are normally retrieved by
-`ceph fs subvolume info/getpath <VOLUME NAME> <SUBVOLUME NAME> <SUBVOLUMEGROUP NAME>`,
-which outputs a path like so:
+`ceph fs subvolume info/getpath <VOLUME NAME> <SUBVOLUME NAME> <SUBVOLUMEGROUP NAME>`
+, which outputs a path like so:
 
 ```
 /volumes/<VOLUME NAME>/<SUBVOLUME NAME>/<UUID>
@@ -217,12 +216,12 @@ itself still exists or not.
 
 #### For case (b)
 
-For cases where subvolumes are managed externally and not by cephfs-csi, we
-must assume that the cephx user we're given can access only
+For cases where subvolumes are managed externally and not by cephfs-csi, we must
+assume that the cephx user we're given can access only
 `/volumes/<VOLUME NAME>/<SUBVOLUME NAME>/<UUID>` so users won't be able to
-benefit from snapshot retention. Users will need to be careful not to delete
-the parent subvolumes and snapshots while they are associated by these shallow
-RO volumes.
+benefit from snapshot retention. Users will need to be careful not to delete the
+parent subvolumes and snapshots while they are associated by these shallow RO
+volumes.
 
 ### `NodePublishVolume`, `NodeUnpublishVolume`
 
@@ -235,38 +234,38 @@ mount.
 
 ## Volume parameters, volume context
 
-This section provides a discussion around determinig what volume parameters and
+This section provides a discussion around determining what volume parameters and
 volume context parameters will be used to convey necessary information to the
 cephfs-csi driver in order to support shallow volumes.
 
 Volume parameters `CreateVolumeRequest.parameters`:
 
 * Should be "shallow" the default mode for all `CreateVolume` calls that have
-(a) snapshot as data source and (b) read-only volume access mode? If not,
-a new volume parameter should be introduced: e.g `isShallow: <bool>`. On the
+(a) snapshot as data source and (b) read-only volume access mode? If not, a
+new volume parameter should be introduced: e.g `isShallow: <bool>`. On the
 other hand, does it even makes sense for users to want to create full copies
 of snapshots and still have them read-only?
 
 Volume context `Volume.volume_context`:
 
-* Here we definitely need `isShallow` or similar. Without it we wouldn't be
-able to distinguish between a regular volume that just happens to have
-a read-only access mode, and a volume that references a snapshot.
+* Here we definitely need `isShallow` or similar. Without it we wouldn't be able
+to distinguish between a regular volume that just happens to have a read-only
+access mode, and a volume that references a snapshot.
 * Currently cephfs-csi recognizes `subvolumePath` for dynamically provisioned
 volumes and `rootPath` for pre-previsioned volumes. As mentioned in
-[`NodeStageVolume`, `NodeUnstageVolume` section](#NodeStageVolume-NodeUnstageVolume),
-snapshots cannot be mounted directly. How do we pass in path to the parent
+[`NodeStageVolume`, `NodeUnstageVolume` section](#NodeStageVolume-NodeUnstageVolume)
+, snapshots cannot be mounted directly. How do we pass in path to the parent
 subvolume?
-* a) Path to the snapshot is passed in via `subvolumePath` / `rootPath`,
-e.g.
-`/volumes/<VOLUME NAME>/<SUBVOLUME NAME>/<UUID>/.snap/<SNAPSHOT NAME>`.
-From that we can derive path to the subvolume: it's the parent of `.snap`
-directory.
-* b) Similar to a), path to the snapshot is passed in via `subvolumePath` /
-`rootPath`, but instead of trying to derive the right path we introduce
-another volume context parameter containing path to the parent subvolume
-explicitly.
-* c) `subvolumePath` / `rootPath` contains path to the parent subvolume and
-we introduce another volume context parameter containing name of the
-snapshot. Path to the snapshot is then formed by appending
-`/.snap/<SNAPSHOT NAME>` to the subvolume path.
+* a) Path to the snapshot is passed in via `subvolumePath` / `rootPath`,
+e.g.
+`/volumes/<VOLUME NAME>/<SUBVOLUME NAME>/<UUID>/.snap/<SNAPSHOT NAME>`.
+From that we can derive path to the subvolume: it's the parent of `.snap`
+directory.
+* b) Similar to a), path to the snapshot is passed in via `subvolumePath` /
+`rootPath`, but instead of trying to derive the right path we introduce
+another volume context parameter containing path to the parent subvolume
+explicitly.
+* c) `subvolumePath` / `rootPath` contains path to the parent subvolume and
+we introduce another volume context parameter containing name of the
+snapshot. Path to the snapshot is then formed by appending
+`/.snap/<SNAPSHOT NAME>` to the subvolume path.
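Options a) and c) in the last hunk differ only in which direction the path is derived. A minimal sketch of both directions follows, assuming plain slash-separated CephFS paths. The helper names `snapshotPath` and `subvolumeFromSnapshotPath` and the sample path are hypothetical and not taken from cephfs-csi.

```go
package main

import (
	"fmt"
	"path"
)

// snapDir is the special CephFS directory under which snapshots are exposed.
const snapDir = ".snap"

// snapshotPath implements option c): append /.snap/<SNAPSHOT NAME> to the
// path of the parent subvolume.
func snapshotPath(subvolumePath, snapshotName string) string {
	return path.Join(subvolumePath, snapDir, snapshotName)
}

// subvolumeFromSnapshotPath implements option a): given a full snapshot path,
// return the parent subvolume, i.e. the parent of the .snap directory.
func subvolumeFromSnapshotPath(p string) (string, error) {
	snapParent := path.Dir(path.Clean(p)) // expected to end in /.snap
	if path.Base(snapParent) != snapDir {
		return "", fmt.Errorf("%q is not a snapshot path", p)
	}
	return path.Dir(snapParent), nil
}

func main() {
	// Hypothetical subvolume path, shaped like the `getpath` output.
	subvol := "/volumes/group/subvol/uuid"

	snap := snapshotPath(subvol, "snap-1")
	fmt.Println(snap)

	parent, err := subvolumeFromSnapshotPath(snap)
	fmt.Println(parent, err)
}
```

Option b) avoids this derivation by carrying both paths explicitly in the volume context.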