Service Tiers

Groq offers multiple service tiers so you can tune for latency, throughput, and reliability. Select a tier by passing the service_tier parameter in your request.

  • performance: Our highest tier, providing reliable low latency for the most critical production applications. This tier is available to enterprise users. More info at Performance Tier.
  • on_demand: The default tier when service_tier is omitted. This is the standard tier you already know: the predictable high speeds of Groq's LPU, with occasional queue latency during peak times.
  • flex: Higher throughput, provided on a best-effort basis. You get higher limits but may receive over-capacity errors (see the fallback sketch after the example below). Check out Flex Processing for more info.
  • auto: Pass this if you don't want to think about tiers and want Groq to use the best tier available to you at any given moment.
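
For example, to let Groq route the request to the best tier currently available:
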
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

completion = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    service_tier="auto",  # let Groq choose the best tier available right now
    messages=[{"role": "user", "content": "Summarize the latest release highlights."}],
)

print(completion.choices[0].message.content)
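
Because flex requests are served on a best-effort basis, a client can fall back to on_demand when capacity runs out. The sketch below is one way to do that, not the only one; the APIStatusError handling and the 498 over-capacity status code are assumptions to verify against the Flex Processing docs, and ask_with_fallback is an illustrative helper.

import os

from groq import APIStatusError, Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def ask_with_fallback(prompt: str) -> str:
    try:
        # Try flex first for its higher limits.
        return client.chat.completions.create(
            model="openai/gpt-oss-120b",
            service_tier="flex",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
    except APIStatusError as err:
        # Assumed capacity signals: 429 (rate limited) and 498 (flex over
        # capacity). Verify the exact codes in the Flex Processing docs.
        if err.status_code not in (429, 498):
            raise
        # Fall back to the default tier at standard limits.
        return client.chat.completions.create(
            model="openai/gpt-oss-120b",
            service_tier="on_demand",
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content

print(ask_with_fallback("Summarize the latest release highlights."))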

Batch and Asynchronous Workloads

The Batch API has its own processing window and rate limits and does not accept the service_tier parameter. Use synchronous requests when you need explicit tier control; batch jobs run independently of your per-model synchronous limits.
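
For completeness, here is a minimal sketch of submitting a batch job, assuming the JSONL upload flow from Groq's Batch documentation; the batch_input.jsonl filename and custom_id value are illustrative, and note that neither the batch nor its request bodies carries a service_tier field.

import json
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

# One JSONL line per request; there is no service_tier field anywhere.
request = {
    "custom_id": "req-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": "Summarize the latest release highlights."}],
    },
}
with open("batch_input.jsonl", "w") as f:
    f.write(json.dumps(request) + "\n")

# Upload the file, then create the batch with its own completion window.
uploaded = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=uploaded.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)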