
Prometheus Metrics

Prometheus is an open-source monitoring system that collects and stores metrics as time series data. Its stable API is compatible with a range of systems and tools like Grafana.

Enterprise Feature

This feature is only available to our Enterprise tier customers. To get started, please reach out to our Enterprise team.

APIs

Groq exposes Prometheus metrics about your organization's usage through VictoriaMetrics. It supports most Prometheus querying API paths:

  • /api/v1/query
  • /api/v1/query_range
  • /api/v1/series
  • /api/v1/labels
  • /api/v1/label/<label_name>/values
  • /api/v1/status/tsdb

MetricsQL

Prometheus queries against Groq endpoints use MetricsQL, a query language that extends Prometheus's native PromQL.

Querying

Queries can be sent to the following endpoint:

https://api.groq.com/v1/metrics/prometheus

To authenticate, provide your Groq API key in a header of the form Authorization: Bearer <your-api-key>.
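
For example, here is a minimal sketch of an instant query in Python. It assumes the standard Prometheus query path (/api/v1/query, one of the paths listed above) is appended to the endpoint, and that your API key is available in a GROQ_API_KEY environment variable; both details are illustrative rather than taken from this page.

import os

import requests

# Groq's Prometheus-compatible metrics endpoint.
BASE_URL = "https://api.groq.com/v1/metrics/prometheus"

# Authenticate with your Groq API key (assumed to be exported as GROQ_API_KEY).
headers = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}

# Instant query: total request rate across all models and projects.
resp = requests.get(
    f"{BASE_URL}/api/v1/query",
    headers=headers,
    params={"query": "sum(model_project_id_status_code:requests:rate5m)"},
)
resp.raise_for_status()

# Standard Prometheus query response: results live under data.result.
for series in resp.json()["data"]["result"]:
    print(series["metric"], series["value"])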

Grafana

If you run Grafana, you can add Groq metrics as a Prometheus datasource:

  1. Add a new Prometheus datasource in Grafana by navigating to Settings -> Data Sources -> Add data source -> Prometheus.
  2. Enter the following URL under HTTP -> URL: https://api.groq.com/v1/metrics/prometheus
  3. Set the Authorization header to your Groq API key:
    • Go to Custom HTTP Headers -> Add Header
    • Header: Authorization
    • Value: Bearer <your-api-key>
  4. Save & Test.
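
If you provision Grafana data sources from configuration files rather than the UI, a rough equivalent might look like the sketch below (a file under Grafana's provisioning/datasources/ directory; the data source name is illustrative):

apiVersion: 1
datasources:
  - name: Groq Metrics
    type: prometheus
    access: proxy
    url: https://api.groq.com/v1/metrics/prometheus
    jsonData:
      httpHeaderName1: Authorization
    secureJsonData:
      httpHeaderValue1: Bearer <your-api-key>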

Available Metrics

All metrics are broken out by model and project ID. Some metrics are also broken out by status code and le (for use with histogram_quantile). Metric names are prefixed with their labels and suffixed with rate5m (a rate computed over a 5-minute window).
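
For example, filtering the request metric to a single model and status code might look like the following. The model label appears in the examples below; the status_code label name here is an assumption based on the metric naming convention, and the exact label names can be listed via /api/v1/labels.

model_project_id_status_code:requests:rate5m{model="llama-3.1-8b-instant", status_code="200"}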

In addition to using the APIs directly, you can view a handful of curated charts in our console on the Metrics page.

Groq provides the following metrics:

Request Metrics

  • model_project_id_status_code:requests:rate5m

Token Metrics

  • le_model_project_id:tokens_in_bucket:rate5m
  • le_model_project_id:tokens_out_bucket:rate5m
  • model_project_id:tokens_in:rate5m
  • model_project_id:tokens_out:rate5m

Latency Metrics

  • le_model_project_id:queue_latency_seconds_bucket:rate5m
    • Time from when a request is received until it begins processing, computed as first_input_token_at - request_start_at. In addition to any time spent waiting, this includes time spent on authorization, tokenization, etc., as well as internal network time between services, so you should not expect it to be 0.
  • le_model_project_id:ttft_latency_seconds_bucket:rate5m
    • Time from when a request is received until the first token of output is generated.
  • le_model_project_id:e2e_latency_seconds_bucket:rate5m
    • Total time from when a request is received until the last token is streamed. If not streaming, TTFT and end-to-end latency are the same.

Prompt Cache Metrics

  • le_model_project_id:prompt_cache_hits_bucket:rate5m
  • model_project_id:prompt_cache_hits:rate5m
  • model_project_id:prompt_cache_misses:rate5m

Examples

Total request rate across all models and projects:

sum(model_project_id_status_code:requests:rate5m)

P99 E2E latency across all projects, for a specific model:

histogram_quantile(0.99, sum by(le) (le_model_project_id:e2e_latency_seconds_bucket:rate5m{model="llama-3.1-8b-instant"}))
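
A few further sketches that follow the same patterns; these are illustrative compositions of the metrics above rather than officially published queries.

Output token throughput for a specific model, across all projects:

sum(model_project_id:tokens_out:rate5m{model="llama-3.1-8b-instant"})

P95 time-to-first-token latency across all projects, for a specific model:

histogram_quantile(0.95, sum by(le) (le_model_project_id:ttft_latency_seconds_bucket:rate5m{model="llama-3.1-8b-instant"}))

Approximate prompt cache hit ratio for a specific model, across all projects:

sum(model_project_id:prompt_cache_hits:rate5m{model="llama-3.1-8b-instant"}) / (sum(model_project_id:prompt_cache_hits:rate5m{model="llama-3.1-8b-instant"}) + sum(model_project_id:prompt_cache_misses:rate5m{model="llama-3.1-8b-instant"}))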
