implement grpc metrics for ceph-csi

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2025-06-13 02:33:34 +00:00 · 2019-08-21 14:58:02 +05:30
parent 01a78cace5
commit a81a3bf96b
46 changed files with 1363 additions and 158 deletions
--- a/docs/deploy-cephfs.md
+++ b/docs/deploy-cephfs.md
@@ -43,22 +43,24 @@ that should be resolved in v14.2.3.

 **Available command line arguments:**

-| Option              | Default value               | Description                                                                                                                                                                                                                                                                               |
-| ------------------- | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `--endpoint`        | `unix://tmp/csi.sock`       | CSI endpoint, must be a UNIX socket                                                                                                                                                                                                                                                       |
-| `--drivername`      | `cephfs.csi.ceph.com`       | Name of the driver (Kubernetes: `provisioner` field in StorageClass must correspond to this value)                                                                                                         |
-| `--nodeid`          | _empty_                     | This node's ID                                                                                                                                                                                                                                                                            |
-| `--type`            | _empty_                     | Driver type `[rbd | cephfs]` If the driver type is set to  `rbd` it will act as a `rbd plugin` or if it's set to `cephfs` will act as a `cephfs plugin`                                                                                                               |
-| `--volumemounter`   | _empty_                     | Default volume mounter. Available options are `kernel` and `fuse`. This is the mount method used if volume parameters don't specify otherwise. If left unspecified, the driver will first probe for `ceph-fuse` in system's path and will choose Ceph kernel client if probing failed. |
-| `--mountcachedir`   | _empty_                     | Volume mount cache info save dir. If left unspecified, the dirver will not record mount info, or it will save mount info and when driver restart it will remount volume it cached.                                                                                                     |
-| `--instanceid`      | "default"                   | Unique ID distinguishing this instance of Ceph CSI among other instances, when sharing Ceph clusters across CSI instances for provisioning                                                                                                                                             |
-| `--pluginpath`      | "/var/lib/kubelet/plugins/" | The location of cephcsi plugin on host                                                                                                                                                                                                                                                 |
-| `--metadatastorage` | _empty_                     | Points to where older (1.0.0 or older plugin versions) metadata about provisioned volumes are kept, as file or in as k8s configmap (`node` or `k8s_configmap` respectively)                                                                                                            |
-| `--pidlimit`        | _0_                         | Configure the PID limit in cgroups. The container runtime can restrict the number of processes/tasks which can cause problems while provisioning (or deleting) a large number of volumes. A value of `-1` configures the limit to the maximum, `0` does not configure limits at all.   |
-| `--livenessport`    | `8080`                      | TCP port for liveness requests                                                                                                                                                                                                                                                            |
-| `--livenesspath`    | `/metrics`                  | Path of prometheus endpoint where metrics will be available                                                                                                                                                                                                                               |
-| `--polltime`        | `60s`                       | Time interval in between each poll                                                                                                                                                                                                                                                        |
-| `--timeout`         | `3s`                        | Probe timeout in seconds                                                                                                                                                                                                                                                                  |
+| Option                | Default value               | Description                                                                                                                                                                                                                                                                            |
+| --------------------- | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `--endpoint`          | `unix://tmp/csi.sock`       | CSI endpoint, must be a UNIX socket                                                                                                                                                                                                                                                    |
+| `--drivername`        | `cephfs.csi.ceph.com`       | Name of the driver (Kubernetes: `provisioner` field in StorageClass must correspond to this value)                                                                                                                                                                                     |
+| `--nodeid`            | _empty_                     | This node's ID                                                                                                                                                                                                                                                                         |
+| `--type`              | _empty_                     | Driver type `[rbd \| cephfs]`. If the driver type is set to `rbd`, it acts as an `rbd plugin`; if it is set to `cephfs`, it acts as a `cephfs plugin` |
+| `--volumemounter`     | _empty_                     | Default volume mounter. Available options are `kernel` and `fuse`. This is the mount method used if volume parameters don't specify otherwise. If left unspecified, the driver will first probe for `ceph-fuse` in the system's path and will fall back to the Ceph kernel client if probing fails. |
+| `--mountcachedir`     | _empty_                     | Directory in which volume mount cache info is saved. If left unspecified, the driver will not record mount info; otherwise it saves mount info and remounts the cached volumes when the driver restarts. |
+| `--instanceid`        | "default"                   | Unique ID distinguishing this instance of Ceph CSI among other instances, when sharing Ceph clusters across CSI instances for provisioning                                                                                                                                             |
+| `--pluginpath`        | "/var/lib/kubelet/plugins/" | The location of the cephcsi plugin on the host |
+| `--metadatastorage`   | _empty_                     | Points to where older (1.0.0 or older plugin versions) metadata about provisioned volumes is kept, either as a file or as a k8s configmap (`node` or `k8s_configmap` respectively) |
+| `--pidlimit`          | _0_                         | Configure the PID limit in cgroups. The container runtime can restrict the number of processes/tasks which can cause problems while provisioning (or deleting) a large number of volumes. A value of `-1` configures the limit to the maximum, `0` does not configure limits at all.   |
+| `--metricsport`       | `8080`                      | TCP port for liveness/grpc metrics requests |
+| `--metricspath`       | `/metrics`                  | Path of the Prometheus endpoint where metrics will be available |
+| `--enablegrpcmetrics` | `false`                     | Enable grpc metrics collection and start the Prometheus server |
+| `--polltime`          | `60s`                       | Time interval between polls |
+| `--timeout`           | `3s`                        | Probe timeout in seconds                                                                                                                                                                                                                                                               |
+| `--histogramoption`   | `0.5,2,6`                   | Histogram option for grpc metrics, given as comma-separated values `start,factor,count` (e.g. `0.5,2,6` means start=0.5, factor=2, count=6) |
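
As a sketch of how the new metrics flags combine, the helper below prints an argument list for a CephFS plugin with gRPC metrics enabled. The function name `cephfs_metrics_args` is hypothetical, the flag names are the ones in the table above, and port 8091 is only an illustrative default chosen to match the CephFS gRPC metrics port given in the metrics documentation.

```bash
# Hypothetical helper: print CephFS plugin arguments with gRPC metrics enabled.
# Flag names come from the table above; the values are illustrative only.
cephfs_metrics_args() {
  local nodeid="$1"          # this node's ID
  local port="${2:-8091}"    # metrics port; 8091 is an illustrative choice
  printf '%s\n' \
    --type=cephfs \
    "--nodeid=${nodeid}" \
    --enablegrpcmetrics=true \
    "--metricsport=${port}" \
    --metricspath=/metrics \
    --histogramoption=0.5,2,6
}
```

For example, `mapfile -t args < <(cephfs_metrics_args "$(hostname)")` followed by `cephcsi "${args[@]}"` would pass these flags to the plugin, assuming the plugin binary is named `cephcsi` and is on the `PATH`; a real deployment will likely need further flags from the table.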

 **Available environmental variables:**

@@ -76,7 +78,7 @@ is used to define in which namespace you want the configmaps to be stored
 | `clusterID`                                                                                         | yes            | String representing a Ceph cluster, must be unique across all Ceph clusters in use for provisioning, cannot be greater than 36 bytes in length, and should remain immutable for the lifetime of the Ceph cluster in use |
 | `fsName`                                                                                            | yes            | CephFS filesystem name into which the volume shall be created                                                                                                                                                           |
 | `mounter`                                                                                           | no             | Mount method to be used for this volume. Available options are `kernel` for Ceph kernel client and `fuse` for Ceph FUSE driver. Defaults to "default mounter", see command line arguments.                              |
-| `pool`                                                                                              | no            | Ceph pool into which volume data shall be stored                                                                                                                                                                        |
+| `pool`                                                                                              | no             | Ceph pool into which volume data shall be stored                                                                                                                                                                        |
 | `csi.storage.k8s.io/provisioner-secret-name`, `csi.storage.k8s.io/node-stage-secret-name`           | for Kubernetes | Name of the Kubernetes Secret object containing Ceph client credentials. Both parameters should have the same value                                                                                                     |
 | `csi.storage.k8s.io/provisioner-secret-namespace`, `csi.storage.k8s.io/node-stage-secret-namespace` | for Kubernetes | Namespaces of the above Secret objects                                                                                                                                                                                  |

--- a/docs/deploy-rbd.md
+++ b/docs/deploy-rbd.md
@@ -27,32 +27,34 @@ make image-cephcsi

 **Available command line arguments:**

-| Option              | Default value         | Description                                                                                                                                                                  |
-| ------------------- | --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `--endpoint`        | `unix://tmp/csi.sock` | CSI endpoint, must be a UNIX socket                                                                                                                                          |
-| `--drivername`      | `rbd.csi.ceph.com`    | Name of the driver (Kubernetes: `provisioner` field in StorageClass must correspond to this value)                                                                           |
-| `--nodeid`          | _empty_               | This node's ID                                                                                                                                                               |
-| `--type`            | _empty_               | Driver type `[rbd | cephfs]` If the driver type is set to  `rbd` it will act as a `rbd plugin` or if it's set to `cephfs` will act as a `cephfs plugin`                      |
-| `--containerized`   | true                  | Whether running in containerized mode                                                                                                                                        |
-| `--instanceid`      | "default"             | Unique ID distinguishing this instance of Ceph CSI among other instances, when sharing Ceph clusters across CSI instances for provisioning                                   |
-| `--metadatastorage` | _empty_               | Points to where legacy (1.0.0 or older plugin versions) metadata about provisioned volumes are kept, as file or in as k8s configmap (`node` or `k8s_configmap` respectively) |
-| `--pidlimit`        | _0_                   | Configure the PID limit in cgroups. The container runtime can restrict the number of processes/tasks which can cause problems while provisioning (or deleting) a large number of volumes. A value of `-1` configures the limit to the maximum, `0` does not configure limits at all.   |
-| `--livenessport`    | `8080`                | TCP port for liveness requests                                                                                                                                               |
-| `--livenesspath`    | `"/metrics"`          | Path of prometheus endpoint where metrics will be available                                                                                                                  |
-| `--polltime`        | `"60s"`               | Time interval in between each poll                                                                                                                                           |
-| `--timeout`         | `"3s"`                | Probe timeout in seconds                                                                                                                                                     |
+| Option                | Default value         | Description                                                                                                                                                                                                                                                                          |
+| --------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `--endpoint`          | `unix://tmp/csi.sock` | CSI endpoint, must be a UNIX socket                                                                                                                                                                                                                                                  |
+| `--drivername`        | `rbd.csi.ceph.com`    | Name of the driver (Kubernetes: `provisioner` field in StorageClass must correspond to this value)                                                                                                                                                                                   |
+| `--nodeid`            | _empty_               | This node's ID                                                                                                                                                                                                                                                                       |
+| `--type`              | _empty_               | Driver type `[rbd \| cephfs]`. If the driver type is set to `rbd`, it acts as an `rbd plugin`; if it is set to `cephfs`, it acts as a `cephfs plugin` |
+| `--containerized`     | true                  | Whether running in containerized mode                                                                                                                                                                                                                                                |
+| `--instanceid`        | "default"             | Unique ID distinguishing this instance of Ceph CSI among other instances, when sharing Ceph clusters across CSI instances for provisioning                                                                                                                                           |
+| `--metadatastorage`   | _empty_               | Points to where legacy (1.0.0 or older plugin versions) metadata about provisioned volumes is kept, either as a file or as a k8s configmap (`node` or `k8s_configmap` respectively) |
+| `--pidlimit`          | _0_                   | Configure the PID limit in cgroups. The container runtime can restrict the number of processes/tasks which can cause problems while provisioning (or deleting) a large number of volumes. A value of `-1` configures the limit to the maximum, `0` does not configure limits at all. |
+| `--metricsport`       | `8080`                | TCP port for liveness/grpc metrics requests                                                                                                                                                                                                                                          |
+| `--metricspath`       | `"/metrics"`          | Path of the Prometheus endpoint where metrics will be available |
+| `--enablegrpcmetrics` | `false`               | Enable grpc metrics collection and start the Prometheus server |
+| `--polltime`          | `"60s"`               | Time interval between polls |
+| `--timeout`           | `"3s"`                | Probe timeout in seconds                                                                                                                                                                                                                                                             |
+| `--histogramoption`   | `0.5,2,6`             | Histogram option for grpc metrics, given as comma-separated values `start,factor,count` (e.g. `0.5,2,6` means start=0.5, factor=2, count=6) |
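
The `--histogramoption` value is three numbers: a start value, a growth factor and a bucket count. Assuming the plugin expands them into exponential bucket boundaries, the way Prometheus client libraries do with `ExponentialBuckets(start, factor, count)` (an assumption; the table does not say how the values are used), the default `0.5,2,6` yields upper bounds 0.5, 1, 2, 4, 8 and 16. The hypothetical helper below performs that expansion so a value can be previewed before it is passed to the plugin.

```bash
# Hypothetical helper: expand a --histogramoption value "start,factor,count"
# into exponential bucket upper bounds start * factor^i for i = 0..count-1.
# That this matches the plugin's own bucket layout is an assumption.
histogram_buckets() {
  local start factor count extra
  IFS=',' read -r start factor count extra <<<"$1"
  # Require exactly three comma-separated values.
  if [ -z "$start" ] || [ -z "$factor" ] || [ -z "$count" ] || [ -n "$extra" ]; then
    echo "expected start,factor,count, got '$1'" >&2
    return 1
  fi
  awk -v s="$start" -v f="$factor" -v n="$count" 'BEGIN {
    # Exponential buckets need a positive start, a factor > 1 and count >= 1.
    if (s + 0 <= 0 || f + 0 <= 1 || n + 0 < 1) {
      print "invalid histogram option" > "/dev/stderr"
      exit 1
    }
    b = s + 0
    for (i = 0; i < n + 0; i++) { printf "%g\n", b; b *= f }
  }'
}
```

Running `histogram_buckets 0.5,2,6` prints the six bounds listed above, one per line.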

 **Available volume parameters:**

-| Parameter                                                                                             | Required             | Description                                                                                                                                                                                                             |
-| ----------------------------------------------------------------------------------------------------- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `clusterID`                                                                                           | yes                  | String representing a Ceph cluster, must be unique across all Ceph clusters in use for provisioning, cannot be greater than 36 bytes in length, and should remain immutable for the lifetime of the Ceph cluster in use |
-| `pool`                                                                                                | yes                  | Ceph pool into which the RBD image shall be created                                                                                                                                                                     |
-| `imageFormat`                                                                                         | no                   | RBD image format. Defaults to `2`. See [man pages](http://docs.ceph.com/docs/mimic/man/8/rbd/#cmdoption-rbd-image-format)                                                                                               |
-| `imageFeatures`                                                                                       | no                   | RBD image features. Available for `imageFormat=2`. CSI RBD currently supports only `layering` feature. See [man pages](http://docs.ceph.com/docs/mimic/man/8/rbd/#cmdoption-rbd-image-feature)                          |
+| Parameter                                                                                           | Required             | Description                                                                                                                                                                                                             |
+| --------------------------------------------------------------------------------------------------- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `clusterID`                                                                                         | yes                  | String representing a Ceph cluster, must be unique across all Ceph clusters in use for provisioning, cannot be greater than 36 bytes in length, and should remain immutable for the lifetime of the Ceph cluster in use |
+| `pool`                                                                                              | yes                  | Ceph pool into which the RBD image shall be created                                                                                                                                                                     |
+| `imageFormat`                                                                                       | no                   | RBD image format. Defaults to `2`. See [man pages](http://docs.ceph.com/docs/mimic/man/8/rbd/#cmdoption-rbd-image-format)                                                                                               |
+| `imageFeatures`                                                                                     | no                   | RBD image features. Available for `imageFormat=2`. CSI RBD currently supports only `layering` feature. See [man pages](http://docs.ceph.com/docs/mimic/man/8/rbd/#cmdoption-rbd-image-feature)                          |
 | `csi.storage.k8s.io/provisioner-secret-name`, `csi.storage.k8s.io/node-stage-secret-name`           | yes (for Kubernetes) | name of the Kubernetes Secret object containing Ceph client credentials. Both parameters should have the same value                                                                                                     |
 | `csi.storage.k8s.io/provisioner-secret-namespace`, `csi.storage.k8s.io/node-stage-secret-namespace` | yes (for Kubernetes) | namespaces of the above Secret objects                                                                                                                                                                                  |
-| `mounter`                                                                                             | no                   | if set to `rbd-nbd`, use `rbd-nbd` on nodes that have `rbd-nbd` and `nbd` kernel modules to map rbd images                                                                                                              |
+| `mounter`                                                                                           | no                   | if set to `rbd-nbd`, use `rbd-nbd` on nodes that have `rbd-nbd` and `nbd` kernel modules to map rbd images                                                                                                              |
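
To show how these volume parameters fit together, the hypothetical helper below renders a StorageClass manifest for the RBD driver. The StorageClass name `csi-rbd-sc` and the default Secret name `csi-rbd-secret` are placeholders, `provisioner` must match `--drivername` (default `rbd.csi.ceph.com`), and `clusterID` and `pool` are quoted so that numeric-looking values stay strings.

```bash
# Hypothetical helper: render an RBD StorageClass from the parameters above.
# Usage: rbd_storageclass <clusterID> <pool> [secret-name] [secret-namespace]
rbd_storageclass() {
  local cluster_id="$1" pool="$2"
  local secret="${3:-csi-rbd-secret}" namespace="${4:-default}"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: "${cluster_id}"
  pool: "${pool}"
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: ${secret}
  csi.storage.k8s.io/provisioner-secret-namespace: ${namespace}
  csi.storage.k8s.io/node-stage-secret-name: ${secret}
  csi.storage.k8s.io/node-stage-secret-namespace: ${namespace}
reclaimPolicy: Delete
EOF
}
```

The output can be piped to `kubectl apply -f -`; both Secret parameters get the same value, as the table requires.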

 **NOTE:** An accompanying CSI configuration file needs to be provided to the
 running pods. Refer to [Creating CSI configuration](../examples/README.md#creating-csi-configuration)
--- a/docs/metrics.md
+++ b/docs/metrics.md
@@ -1,9 +1,13 @@
 # Metrics

-CSI deploys a sidecar container that is responsible for collecting metrics.
+- [Metrics](#metrics)
+  - [Liveness](#liveness)
+  - [GRPC metrics](#grpc-metrics)

 ## Liveness

+CSI deploys a sidecar container that is responsible for collecting metrics.
+
 Liveness metrics are intended to be collected by Prometheus but can be accessed
 through a GET request to a specific pod IP.

@@ -13,7 +17,7 @@ for example
 the expected output should be

 ```bash
-[root@worker2 /]# curl -X GET http://10.109.65.142:8080/metrics 2>/dev/null | grep csi
+curl -X GET http://10.109.65.142:8080/metrics 2>/dev/null | grep csi
 # HELP csi_liveness Liveness Probe
 # TYPE csi_liveness gauge
 csi_liveness 1
@@ -23,10 +27,23 @@ Prometheus can be deployed through the Prometheus operator described [here](http
 The [service-monitor](../examples/service-monitor.yaml) will tell Prometheus how
 to pull metrics out of CSI.

-Each CSI pod has a service to expose the end point to prometheus. By default rbd
+Each CSI pod has a service to expose the endpoint to Prometheus. By default rbd
 pods run on port 8080 and cephfs pods on port 8081.
 These can be changed if desired; if multiple Ceph clusters are deployed, more
 ports will be used for additional CSI pods.

-You may need to open the ports used in your firewall depending on how you
+Note: You may need to open these ports in your firewall, depending on how your
+cluster is set up.
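
For a quick check without Prometheus, the hypothetical helper below reads Prometheus text-format metrics on stdin and prints the value of the `csi_liveness` gauge shown in the sample output above (`1` in that example). It returns a non-zero status when the metric is missing, so it can be used in scripts.

```bash
# Hypothetical helper: print the csi_liveness gauge value from Prometheus
# text-format metrics on stdin; returns non-zero if the metric is absent.
csi_liveness_value() {
  # Comment lines start with "#", so $1 only equals the name on sample lines.
  awk '$1 == "csi_liveness" { print $2; found = 1 } END { exit !found }'
}

# Usage (replace the address with your pod IP and liveness port):
#   curl -s http://10.109.65.142:8080/metrics | csi_liveness_value
```
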
+
+## GRPC metrics
+
+gRPC metrics are intended to be collected by Prometheus but can be accessed
+through a GET request to a specific pod IP.
+
+Each CSI pod has a service to expose the endpoint to Prometheus. By default rbd
+pods run on port 8090 and cephfs pods on port 8091.
+These can be changed if desired; if multiple Ceph clusters are deployed, more
+ports will be used for additional CSI pods.
+
+Note: You may need to open these ports in your firewall, depending on how your
 cluster is set up.
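
As a sketch of reading the gRPC metrics by hand, the hypothetical helper below sums request counts per gRPC status code. It assumes the plugin exports server-side metrics with the naming used by the go-grpc-prometheus interceptors, i.e. a `grpc_server_handled_total` counter carrying a `grpc_code` label. That naming is not stated in this document, so check the actual output of your endpoint first.

```bash
# Hypothetical helper: sum grpc_server_handled_total per grpc_code from
# Prometheus text-format metrics on stdin (go-grpc-prometheus naming assumed).
grpc_handled_by_code() {
  awk '
    index($0, "grpc_server_handled_total{") == 1 &&
    match($0, /grpc_code="[^"]*"/) {
      # Drop the 11-char prefix grpc_code=" and the closing quote.
      code = substr($0, RSTART + 11, RLENGTH - 12)
      # The value follows the label set (label values assumed space-free).
      sum[code] += $2
    }
    END { for (c in sum) print c, sum[c] }
  ' | sort
}

# Usage (replace <pod-ip> with your pod IP; 8090 is the rbd default above):
#   curl -s http://<pod-ip>:8090/metrics | grpc_handled_by_code
```

Each output line is a status code followed by the number of handled requests with that code, which makes a rise in non-`OK` codes easy to spot.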