Migrate from dep to go module

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
2025-06-13 18:43:34 +00:00 · 2020-02-17 17:45:57 +05:30
parent a9174dd953
commit d5a0606c33
642 changed files with 54160 additions and 147015 deletions
--- a/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/README.md
+++ b/vendor/github.com/grpc-ecosystem/go-grpc-prometheus/README.md
@ -0,0 +1,247 @@
+# Go gRPC Interceptors for Prometheus monitoring 
+
+[![Travis Build](https://travis-ci.org/grpc-ecosystem/go-grpc-prometheus.svg)](https://travis-ci.org/grpc-ecosystem/go-grpc-prometheus)
+[![Go Report Card](https://goreportcard.com/badge/github.com/grpc-ecosystem/go-grpc-prometheus)](http://goreportcard.com/report/grpc-ecosystem/go-grpc-prometheus)
+[![GoDoc](http://img.shields.io/badge/GoDoc-Reference-blue.svg)](https://godoc.org/github.com/grpc-ecosystem/go-grpc-prometheus)
+[![SourceGraph](https://sourcegraph.com/github.com/grpc-ecosystem/go-grpc-prometheus/-/badge.svg)](https://sourcegraph.com/github.com/grpc-ecosystem/go-grpc-prometheus/?badge)
+[![codecov](https://codecov.io/gh/grpc-ecosystem/go-grpc-prometheus/branch/master/graph/badge.svg)](https://codecov.io/gh/grpc-ecosystem/go-grpc-prometheus)
+[![Apache 2.0 License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
+
+[Prometheus](https://prometheus.io/) monitoring for your [gRPC Go](https://github.com/grpc/grpc-go) servers and clients.
+
+A sister implementation for [gRPC Java](https://github.com/grpc/grpc-java) (same metrics, same semantics) is in [grpc-ecosystem/java-grpc-prometheus](https://github.com/grpc-ecosystem/java-grpc-prometheus).
+
+## Interceptors
+
+[gRPC Go](https://github.com/grpc/grpc-go) recently acquired support for Interceptors, i.e. middleware that is executed
+by a gRPC Server before the request is passed onto the user's application logic. It is a perfect way to implement
+common patterns: auth, logging and... monitoring.
+
+To use Interceptors in chains, please see [`go-grpc-middleware`](https://github.com/mwitkow/go-grpc-middleware).
+
+## Usage
+
+There are two types of interceptors: client-side and server-side. This package provides monitoring Interceptors for both.
+
+### Server-side
+
+```go
+import "github.com/grpc-ecosystem/go-grpc-prometheus"
+...
+    // Initialize your gRPC server's interceptor.
+    myServer := grpc.NewServer(
+        grpc.StreamInterceptor(grpc_prometheus.StreamServerInterceptor),
+        grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
+    )
+    // Register your gRPC service implementations.
+    myservice.RegisterMyServiceServer(s.server, &myServiceImpl{})
+    // After all your registrations, make sure all of the Prometheus metrics are initialized.
+    grpc_prometheus.Register(myServer)
+    // Register Prometheus metrics handler.    
+    http.Handle("/metrics", promhttp.Handler())
+...
+```
+
+### Client-side
+
+```go
+import "github.com/grpc-ecosystem/go-grpc-prometheus"
+...
+   clientConn, err = grpc.Dial(
+       address,
+		   grpc.WithUnaryInterceptor(grpc_prometheus.UnaryClientInterceptor),
+		   grpc.WithStreamInterceptor(grpc_prometheus.StreamClientInterceptor)
+   )
+   client = pb_testproto.NewTestServiceClient(clientConn)
+   resp, err := client.PingEmpty(s.ctx, &myservice.Request{Msg: "hello"})
+...
+```
+
+# Metrics
+
+## Labels
+
+All server-side metrics start with `grpc_server` as Prometheus subsystem name. All client-side metrics start with `grpc_client`. Both of them have mirror-concepts. Similarly all methods
+contain the same rich labels:
+  
+  * `grpc_service` - the [gRPC service](http://www.grpc.io/docs/#defining-a-service) name, which is the combination of protobuf `package` and
+    the `grpc_service` section name. E.g. for `package = mwitkow.testproto` and 
+     `service TestService` the label will be `grpc_service="mwitkow.testproto.TestService"`
+  * `grpc_method` - the name of the method called on the gRPC service. E.g.  
+    `grpc_method="Ping"`
+  * `grpc_type` - the gRPC [type of request](http://www.grpc.io/docs/guides/concepts.html#rpc-life-cycle). 
+    Differentiating between the two is important especially for latency measurements.
+
+     - `unary` is single request, single response RPC
+     - `client_stream` is a multi-request, single response RPC
+     - `server_stream` is a single request, multi-response RPC
+     - `bidi_stream` is a multi-request, multi-response RPC
+    
+
+Additionally for completed RPCs, the following labels are used:
+
+  * `grpc_code` - the human-readable [gRPC status code](https://github.com/grpc/grpc-go/blob/master/codes/codes.go).
+    The list of all statuses is to long, but here are some common ones:
+      
+      - `OK` - means the RPC was successful
+      - `IllegalArgument` - RPC contained bad values
+      - `Internal` - server-side error not disclosed to the clients
+      
+## Counters
+
+The counters and their up to date documentation is in [server_reporter.go](server_reporter.go) and [client_reporter.go](client_reporter.go) 
+the respective Prometheus handler (usually `/metrics`). 
+
+For the purpose of this documentation we will only discuss `grpc_server` metrics. The `grpc_client` ones contain mirror concepts.
+
+For simplicity, let's assume we're tracking a single server-side RPC call of [`mwitkow.testproto.TestService`](examples/testproto/test.proto),
+calling the method `PingList`. The call succeeds and returns 20 messages in the stream.
+
+First, immediately after the server receives the call it will increment the
+`grpc_server_started_total` and start the handling time clock (if histograms are enabled). 
+
+```jsoniq
+grpc_server_started_total{grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 1
+```
+
+Then the user logic gets invoked. It receives one message from the client containing the request 
+(it's a `server_stream`):
+
+```jsoniq
+grpc_server_msg_received_total{grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 1
+```
+
+The user logic may return an error, or send multiple messages back to the client. In this case, on 
+each of the 20 messages sent back, a counter will be incremented:
+
+```jsoniq
+grpc_server_msg_sent_total{grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 20
+```
+
+After the call completes, its status (`OK` or other [gRPC status code](https://github.com/grpc/grpc-go/blob/master/codes/codes.go)) 
+and the relevant call labels increment the `grpc_server_handled_total` counter.
+
+```jsoniq
+grpc_server_handled_total{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 1
+```
+
+## Histograms
+
+[Prometheus histograms](https://prometheus.io/docs/concepts/metric_types/#histogram) are a great way
+to measure latency distributions of your RPCs. However, since it is bad practice to have metrics
+of [high cardinality](https://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels)
+the latency monitoring metrics are disabled by default. To enable them please call the following
+in your server initialization code:
+
+```jsoniq
+grpc_prometheus.EnableHandlingTimeHistogram()
+```
+
+After the call completes, its handling time will be recorded in a [Prometheus histogram](https://prometheus.io/docs/concepts/metric_types/#histogram)
+variable `grpc_server_handling_seconds`. The histogram variable contains three sub-metrics:
+
+ * `grpc_server_handling_seconds_count` - the count of all completed RPCs by status and method 
+ * `grpc_server_handling_seconds_sum` - cumulative time of RPCs by status and method, useful for 
+   calculating average handling times
+ * `grpc_server_handling_seconds_bucket` - contains the counts of RPCs by status and method in respective
+   handling-time buckets. These buckets can be used by Prometheus to estimate SLAs (see [here](https://prometheus.io/docs/practices/histograms/))
+
+The counter values will look as follows:
+
+```jsoniq
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.005"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.01"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.025"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.05"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.1"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.25"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.5"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="1"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="2.5"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="5"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="10"} 1
+grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="+Inf"} 1
+grpc_server_handling_seconds_sum{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 0.0003866430000000001
+grpc_server_handling_seconds_count{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 1
+```
+
+
+## Useful query examples
+
+Prometheus philosophy is to provide raw metrics to the monitoring system, and
+let the aggregations be handled there. The verbosity of above metrics make it possible to have that
+flexibility. Here's a couple of useful monitoring queries:
+
+
+### request inbound rate
+```jsoniq
+sum(rate(grpc_server_started_total{job="foo"}[1m])) by (grpc_service)
+```
+For `job="foo"` (common label to differentiate between Prometheus monitoring targets), calculate the
+rate of requests per second (1 minute window) for each gRPC `grpc_service` that the job has. Please note
+how the `grpc_method` is being omitted here: all methods of a given gRPC service will be summed together.
+
+### unary request error rate
+```jsoniq
+sum(rate(grpc_server_handled_total{job="foo",grpc_type="unary",grpc_code!="OK"}[1m])) by (grpc_service)
+```
+For `job="foo"`, calculate the per-`grpc_service` rate of `unary` (1:1) RPCs that failed, i.e. the 
+ones that didn't finish with `OK` code.
+
+### unary request error percentage
+```jsoniq
+sum(rate(grpc_server_handled_total{job="foo",grpc_type="unary",grpc_code!="OK"}[1m])) by (grpc_service)
+ / 
+sum(rate(grpc_server_started_total{job="foo",grpc_type="unary"}[1m])) by (grpc_service)
+ * 100.0
+```
+For `job="foo"`, calculate the percentage of failed requests by service. It's easy to notice that
+this is a combination of the two above examples. This is an example of a query you would like to
+[alert on](https://prometheus.io/docs/alerting/rules/) in your system for SLA violations, e.g.
+"no more than 1% requests should fail".
+
+### average response stream size
+```jsoniq
+sum(rate(grpc_server_msg_sent_total{job="foo",grpc_type="server_stream"}[10m])) by (grpc_service)
+ /
+sum(rate(grpc_server_started_total{job="foo",grpc_type="server_stream"}[10m])) by (grpc_service)
+```
+For `job="foo"` what is the `grpc_service`-wide `10m` average of messages returned for all `
+server_stream` RPCs. This allows you to track the stream sizes returned by your system, e.g. allows 
+you to track when clients started to send "wide" queries that ret
+Note the divisor is the number of started RPCs, in order to account for in-flight requests.
+
+### 99%-tile latency of unary requests
+```jsoniq
+histogram_quantile(0.99, 
+  sum(rate(grpc_server_handling_seconds_bucket{job="foo",grpc_type="unary"}[5m])) by (grpc_service,le)
+)
+```
+For `job="foo"`, returns an 99%-tile [quantile estimation](https://prometheus.io/docs/practices/histograms/#quantiles)
+of the handling time of RPCs per service. Please note the `5m` rate, this means that the quantile
+estimation will take samples in a rolling `5m` window. When combined with other quantiles
+(e.g. 50%, 90%), this query gives you tremendous insight into the responsiveness of your system 
+(e.g. impact of caching).
+
+### percentage of slow unary queries (>250ms)
+```jsoniq
+100.0 - (
+sum(rate(grpc_server_handling_seconds_bucket{job="foo",grpc_type="unary",le="0.25"}[5m])) by (grpc_service)
+ / 
+sum(rate(grpc_server_handling_seconds_count{job="foo",grpc_type="unary"}[5m])) by (grpc_service)
+) * 100.0
+```
+For `job="foo"` calculate the by-`grpc_service` fraction of slow requests that took longer than `0.25` 
+seconds. This query is relatively complex, since the Prometheus aggregations use `le` (less or equal)
+buckets, meaning that counting "fast" requests fractions is easier. However, simple maths helps.
+This is an example of a query you would like to alert on in your system for SLA violations, 
+e.g. "less than 1% of requests are slower than 250ms".
+
+
+## Status
+
+This code has been used since August 2015 as the basis for monitoring of *production* gRPC micro services  at [Improbable](https://improbable.io).
+
+## License
+
+`go-grpc-prometheus` is released under the Apache 2.0 license. See the [LICENSE](LICENSE) file for details.