doc: a few corrections and typo fixes in design documentation

- Fixes spelling mistakes.
- Corrects grammatical errors.
- Wraps text at 80 columns, etc.

Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
Humble Chirammal
2021-12-20 14:51:47 +05:30
committed by mergify[bot]
parent 12e8e46bcf
commit 3196b798cc
6 changed files with 245 additions and 240 deletions


@@ -6,50 +6,49 @@ snapshot contents and then mount that volume to workloads.
CephFS exposes snapshots as special, read-only directories of a subvolume
located in `<subvolume>/.snap`. cephfs-csi can already provision writable
volumes with snapshots as their data source, where snapshot contents are cloned
to the newly created volume. However, cloning a snapshot to a volume is a very
expensive operation in CephFS as the data needs to be fully copied. When the
need is to only read snapshot contents, snapshot cloning is extremely
inefficient and wasteful.
This proposal describes a way for cephfs-csi to expose CephFS snapshots as
shallow, read-only volumes, without needing to clone the underlying snapshot
data.
## Use-cases
What's the point of such read-only volumes?
* **Restore snapshots selectively:** users may want to traverse snapshots,
restoring data to a writable volume more selectively instead of restoring the
whole snapshot.
* **Volume backup:** users can't back up a live volume; they first need to
snapshot it. Once a snapshot is taken, it still can't be backed up, as backup
tools usually work with volumes (that are exposed as file-systems)
and not snapshots (which might have backend-specific format). What this means
is that in order to create a snapshot backup, users have to clone snapshot
data twice:
1. first time, when restoring the snapshot into a temporary volume from
where the data will be read,
1. and second time, when transferring that volume into some backup/archive
storage (e.g. object store).
The temporary backed-up volume will most likely be thrown away after the
backup transfer is finished. That's a lot of wasted work for what we
originally wanted to do! Having the ability to create volumes from snapshots
cheaply would be a big improvement for this use case.
## Alternatives
* _Snapshots are stored in `<subvolume>/.snap`. Users could simply visit this
directory by themselves._
`.snap` is a CephFS-specific detail of how snapshots are exposed. Users /
tools may not be aware of this special directory, or it may not fit their
workflow. At the moment, the idiomatic way of accessing snapshot contents in
CSI drivers is by creating a new volume and populating it with the snapshot.
## Design
@@ -57,21 +56,21 @@ Key points:
* Volume source is a snapshot, volume access mode is `*_READER_ONLY`.
* No actual new subvolumes are created in CephFS.
* The resulting volume is a reference to the source subvolume snapshot. This
reference would be stored in the `Volume.volume_context` map. In order to
reference a snapshot, we need subvol name and snapshot name.
* Mounting such a volume means mounting the respective CephFS subvolume and
exposing the snapshot to workloads.
* Let's call a *shallow read-only volume with a subvolume snapshot as its data
source* just a *shallow volume* from here on out for brevity.
### Controller operations
Care must be taken when handling life-times of relevant storage resources. When
a shallow volume is created, what would happen if:
* _Parent subvolume of the snapshot is removed while the shallow volume still
exists?_
This shouldn't be a problem already. The parent volume has either
`snapshot-retention` subvol feature in which case its snapshots remain
@@ -80,8 +79,8 @@ When a shallow volume is created, what would happen if:
* _Source snapshot from which the shallow volume originates is removed while
that shallow volume still exists?_
We need to make sure this doesn't happen and some book-keeping is necessary.
Ideally we could employ some kind of reference counting.
#### Reference counting for shallow volumes
@@ -92,26 +91,26 @@ When creating a volume snapshot, a reference tracker (RT), represented by a
RADOS object, would be created for that snapshot. It would store information
required to track the references for the backing subvolume snapshot. Upon a
`CreateSnapshot` call, the reference tracker (RT) would be initialized with a
single reference record, where the CSI snapshot itself is the first reference to
the backing snapshot. Each subsequent shallow volume creation would add a new
reference record to the RT object. Each shallow volume deletion would remove
that reference from the RT object. Calling `DeleteSnapshot` would remove the
reference record that was previously added in `CreateSnapshot`.
The subvolume snapshot would be removed from the Ceph cluster only once the RT
object holds no references. Note that this behavior would permit calling
`DeleteSnapshot` even if it is still referenced by shallow volumes.
* `DeleteSnapshot`:
* RT holds no references or the RT object doesn't exist:
delete the backing snapshot too.
* RT holds at least one reference: keep the backing snapshot.
* `DeleteVolume`:
* RT holds no references: delete the backing snapshot too.
* RT holds at least one reference: keep the backing snapshot.
To enable creating shallow volumes from snapshots that were provisioned by older
versions of cephfs-csi (i.e. before this feature is introduced),
`CreateVolume` for shallow volumes would also create an RT object in case it's
missing. It would be initialized to two: the source snapshot and the newly
created shallow volume.
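For illustration only, the reference-counting rules above can be sketched as a
tiny in-memory model (hypothetical names; the real RT would be a RADOS object
and the actual cephfs-csi implementation may differ):
```go
package main

import "fmt"

// refTracker is a simplified, in-memory stand-in for the RADOS-backed
// reference tracker (RT) object described above. Names are illustrative only.
type refTracker struct {
	refs map[string]struct{} // reference records, keyed by CSI snapshot / shallow volume ID
}

// newRefTracker models CreateSnapshot: the CSI snapshot itself is the first
// reference to the backing CephFS snapshot.
func newRefTracker(csiSnapshotID string) *refTracker {
	return &refTracker{refs: map[string]struct{}{csiSnapshotID: {}}}
}

func (rt *refTracker) add(ref string)    { rt.refs[ref] = struct{}{} } // CreateVolume (shallow)
func (rt *refTracker) remove(ref string) { delete(rt.refs, ref) }      // DeleteVolume / DeleteSnapshot

// canRemoveBackingSnapshot reports whether the backing subvolume snapshot may
// be deleted: only once the RT holds no references.
func (rt *refTracker) canRemoveBackingSnapshot() bool { return len(rt.refs) == 0 }

func main() {
	rt := newRefTracker("csi-snap-1")          // CreateSnapshot
	rt.add("shallow-vol-1")                    // CreateVolume with snapshot source
	rt.remove("csi-snap-1")                    // DeleteSnapshot: still referenced
	fmt.Println(rt.canRemoveBackingSnapshot()) // false -> keep the backing snapshot
	rt.remove("shallow-vol-1")                 // DeleteVolume
	fmt.Println(rt.canRemoveBackingSnapshot()) // true -> backing snapshot can be removed
}
```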
@@ -141,17 +140,17 @@ Things to look out for:
It doesn't consume any space on the filesystem. `Volume.capacity_bytes` is
allowed to contain zero. We could use that.
* _What should be the requested size when creating the volume (specified e.g. in
PVC)?_
This one is tricky. CSI spec allows for
`CreateVolumeRequest.capacity_range.{required_bytes,limit_bytes}` to be zero.
On the other hand,
`PersistentVolumeClaim.spec.resources.requests.storage` must be bigger than
zero. cephfs-csi doesn't care about the requested size (the volume will be
read-only, so it has no usable capacity) and would always set it to zero. This
shouldn't cause any problems for the time being, but still is something we
should keep in mind.
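To make the zero-capacity point concrete, a `CreateVolume` response for a
shallow volume might be populated roughly as below (a sketch using the CSI spec
Go bindings; the volume ID and the `isShallow` context key are placeholders
taken from the discussion further down, not settled API):
```go
package main

import (
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

func main() {
	// Sketch only: the volume is read-only and has no usable capacity of its
	// own, so capacity_bytes is deliberately reported as zero (permitted by
	// the CSI spec).
	resp := &csi.CreateVolumeResponse{
		Volume: &csi.Volume{
			VolumeId:      "shallow-vol-1", // hypothetical volume ID
			CapacityBytes: 0,
			VolumeContext: map[string]string{
				"isShallow": "true", // placeholder key, see the volume context section below
			},
		},
	}
	fmt.Println(resp.Volume.CapacityBytes) // 0
}
```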
`CreateVolume` and behavior when using volume as volume source (PVC-PVC clone):
@@ -167,8 +166,8 @@ Volume deletion is trivial.
### `CreateSnapshot`
Snapshotting read-only volumes doesn't make sense in general, and should be
rejected.
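As a rough sketch of such a rejection (hypothetical request type and error
message; only the gRPC status handling mirrors what CSI drivers commonly do):
```go
package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// createSnapshotRequest is a stripped-down, hypothetical stand-in for the CSI
// CreateSnapshotRequest; only the field relevant to this check is shown.
type createSnapshotRequest struct {
	sourceIsShallowVolume bool
}

// createSnapshot rejects attempts to snapshot a shallow (read-only) volume.
func createSnapshot(req createSnapshotRequest) error {
	if req.sourceIsShallowVolume {
		return status.Error(codes.InvalidArgument,
			"snapshots of shallow read-only volumes are not supported")
	}
	// ... regular snapshot creation would follow here ...
	return nil
}

func main() {
	err := createSnapshot(createSnapshotRequest{sourceIsShallowVolume: true})
	fmt.Println(err) // rpc error: code = InvalidArgument desc = ...
}
```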
### `ControllerExpandVolume`
@@ -194,8 +193,8 @@ whole subvolume first, and only then perform the binds to target paths.
#### For case (a)
Subvolume paths are normally retrieved by
`ceph fs subvolume info/getpath <VOLUME NAME> <SUBVOLUME NAME> <SUBVOLUMEGROUP NAME>`,
which outputs a path like so:
```
/volumes/<VOLUME NAME>/<SUBVOLUME NAME>/<UUID>
@@ -217,12 +216,12 @@ itself still exists or not.
#### For case (b)
For cases where subvolumes are managed externally and not by cephfs-csi, we must
assume that the cephx user we're given can access only
`/volumes/<VOLUME NAME>/<SUBVOLUME NAME>/<UUID>` so users won't be able to
benefit from snapshot retention. Users will need to be careful not to delete
the parent subvolumes and snapshots while they are still referenced by these
shallow RO volumes.
### `NodePublishVolume`, `NodeUnpublishVolume`
@@ -235,38 +234,38 @@ mount.
## Volume parameters, volume context
This section provides a discussion around determining what volume parameters and
volume context parameters will be used to convey necessary information to the
cephfs-csi driver in order to support shallow volumes.
Volume parameters `CreateVolumeRequest.parameters`:
* Should be "shallow" the default mode for all `CreateVolume` calls that have
(a) snapshot as data source and (b) read-only volume access mode? If not, a
new volume parameter should be introduced: e.g. `isShallow: <bool>`. On the
other hand, does it even make sense for users to want to create full copies
of snapshots and still have them read-only?
Volume context `Volume.volume_context`:
* Here we definitely need `isShallow` or similar. Without it we wouldn't be able
to distinguish between a regular volume that just happens to have a read-only
access mode, and a volume that references a snapshot.
* Currently cephfs-csi recognizes `subvolumePath` for dynamically provisioned
volumes and `rootPath` for pre-provisioned volumes. As mentioned in
[`NodeStageVolume`, `NodeUnstageVolume` section](#NodeStageVolume-NodeUnstageVolume),
snapshots cannot be mounted directly. How do we pass in the path to the parent
subvolume?
* a) Path to the snapshot is passed in via `subvolumePath` / `rootPath`,
e.g.
`/volumes/<VOLUME NAME>/<SUBVOLUME NAME>/<UUID>/.snap/<SNAPSHOT NAME>`.
From that we can derive path to the subvolume: it's the parent of `.snap`
directory.
* b) Similar to a), path to the snapshot is passed in via `subvolumePath` /
`rootPath`, but instead of trying to derive the right path we introduce
another volume context parameter containing path to the parent subvolume
explicitly.
* c) `subvolumePath` / `rootPath` contains path to the parent subvolume and
we introduce another volume context parameter containing name of the
snapshot. Path to the snapshot is then formed by appending
`/.snap/<SNAPSHOT NAME>` to the subvolume path.
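To make options a) and c) concrete, here is a minimal Go sketch of the path
handling (hypothetical helper names and example values, not the actual
cephfs-csi code):
```go
package main

import (
	"fmt"
	"path"
)

// Option a): the full snapshot path is passed in via subvolumePath / rootPath;
// the parent subvolume path is the parent of the ".snap" directory.
func subvolumeFromSnapshotPath(snapshotPath string) string {
	return path.Dir(path.Dir(snapshotPath))
}

// Option c): subvolumePath / rootPath carries the parent subvolume path and a
// separate parameter carries the snapshot name; the snapshot path is formed
// by appending "/.snap/<SNAPSHOT NAME>".
func snapshotPathFromSubvolume(subvolumePath, snapshotName string) string {
	return path.Join(subvolumePath, ".snap", snapshotName)
}

func main() {
	// Hypothetical example values following the layout described earlier.
	snapPath := "/volumes/myfs/csi-vol-1/uuid-1234/.snap/snap-1"
	fmt.Println(subvolumeFromSnapshotPath(snapPath))
	// -> /volumes/myfs/csi-vol-1/uuid-1234
	fmt.Println(snapshotPathFromSubvolume("/volumes/myfs/csi-vol-1/uuid-1234", "snap-1"))
	// -> /volumes/myfs/csi-vol-1/uuid-1234/.snap/snap-1
}
```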