# https://console.groq.com llms-full.txt
## Script: Types.d (ts)
URL: https://console.groq.com/docs/scripts/types.d
```typescript
declare module "*.sh" {
const content: string;
export default content;
}
```
---
## Script: Code Examples (ts)
URL: https://console.groq.com/docs/scripts/code-examples
```typescript
export const getExampleCode = (
modelId: string,
content = "Explain why fast inference is critical for reasoning models",
) => ({
shell: `curl https://api.groq.com/openai/v1/chat/completions \\
-H "Authorization: Bearer $GROQ_API_KEY" \\
-H "Content-Type: application/json" \\
-d '{
"model": "${modelId}",
"messages": [
{
"role": "user",
"content": "${content}"
}
]
}'`,
javascript: `import Groq from "groq-sdk";
const groq = new Groq();
async function main() {
const completion = await groq.chat.completions.create({
model: "${modelId}",
messages: [
{
role: "user",
content: "${content}",
},
],
});
console.log(completion.choices[0]?.message?.content);
}
main().catch(console.error);`,
python: `from groq import Groq
client = Groq()
completion = client.chat.completions.create(
model="${modelId}",
messages=[
{
"role": "user",
"content": "${content}"
}
]
)
print(completion.choices[0].message.content)`,
json: `{
"model": "${modelId}",
"messages": [
{
"role": "user",
"content": "${content}"
}
]
}`,
});
```
---
## Groq API Reference
URL: https://console.groq.com/docs/api-reference
## Groq API Reference
---
## FlutterFlow + Groq: Fast & Powerful Cross-Platform Apps
URL: https://console.groq.com/docs/flutterflow
## FlutterFlow + Groq: Fast & Powerful Cross-Platform Apps
[**FlutterFlow**](https://flutterflow.io/) is a visual development platform to build high-quality, custom, cross-platform apps. By leveraging Groq's fast AI inference in FlutterFlow, you can build beautiful AI-powered apps to:
- **Build for Scale**: Collaborate efficiently to create robust apps that grow with your needs.
- **Iterate Fast**: Rapidly test, refine, and deploy your app, accelerating your development.
- **Fully Integrate Your Project**: Access databases, APIs, and custom widgets in one place.
- **Deploy Cross-Platform**: Launch on iOS, Android, web, and desktop from a single codebase.
### FlutterFlow + Groq Quick Start (10 minutes to hello world)
#### 1. Securely store your Groq API Key in FlutterFlow as an App State Variable
Go to the App Values tab in the FlutterFlow Builder, add `groqApiKey` as an app state variable, and enter your API key. It should have type `String` and be `persisted` (that way, the API key is remembered even if you close your application).

*Store your API key securely as an App State variable by selecting "secure persisted fields"*
#### 2. Create a call to the Groq API
Next, navigate to the API Calls tab.
Create a new API call named `Groq Completion`, set the method type to `POST`, and use the following API URL: https://api.groq.com/openai/v1/chat/completions
Now, add the following variables:
- `token` - This is your Groq API key, which you can get from the App Values tab.
- `model` - This is the model you want to use. For this example, we'll use `llama-3.3-70b-versatile`.
- `text` - This is the text you want to send to the Groq API.

#### 3. Define your API call header
Once you have added the relevant variables, define your API call header. You can reference the token variable you defined by putting it in square brackets ([]).
Define your API call header as follows: `Authorization: Bearer [token]`

#### 4. Define the body of your API call
You can drag and drop your variables into the JSON body, or include them in angle brackets.
Select JSON, and add the following:
- `model` - This is the model we defined in the variables section.
- `messages` - This is the message you want to send to the Groq API. Insert the `text` variable we defined in the variables section into the system message.
You can modify the system message to fit your specific use case. We are going to use a generic system message:
"Provide a helpful answer for the following question - text"

#### 5. Test your API call
By clicking on the “Response & Test” button, you can test your API call. Provide values for your variables, and hit “Test API call” to see the response.

#### 6. Save relevant JSON Paths of the response
Once you have your API response, you can save relevant JSON Paths of the response.
To save the content of the response from Groq, scroll down, click “Add JSON Path” for `$.choices[:].message.content`, and provide a name for it, such as “groqResponse”.
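For reference, the JSON path above selects the `content` field of each choice's message in a standard chat completions response, which looks roughly like this (abridged, with placeholder values):
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The model's answer appears here."
      }
    }
  ]
}
```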

#### 7. Connect the API call to your UI with an action
Now that you have added & tested your API call, let’s connect the API call to your UI with an action.
*If you are interested in following along, you can* [**clone the project**](https://app.flutterflow.io/project/groq-documentation-vc2rt1) *and include your own API Key. You can also follow along with this [3-minute video.](https://www.loom.com/share/053ee6ab744e4cf4a5179fac1405a800?sid=4960f7cd-2b29-4538-89bb-51aa5b76946c)*
On this page, we create a simple UI that includes a TextField for a user to input their question, a button to trigger our Groq Completion API call, and a Text widget to display the result from the API. We define a page state variable, `groqResult`, which will be updated with the result from the API. We then bind the Text widget to our page state variable `groqResult`, as shown below.

#### 8. Define an action that calls our API
Now that we have created our UI, we can add an action to our button that will call the API, and update our Text with the API’s response.
To do this, click on the button, open the action editor, and add an action to call the Groq Completion API.

To create our first action calling the Groq endpoint, create an action of type Backend API call, and set the "group or call name" to `Groq Completion`.
Then add two additional variables:
- `token` - This is your Groq API key, which you can get from the App State tab.
- `text` - This is the text you want to send to the Groq API, which you can get from the TextField widget.
Finally, rename the action output to `groqResponse`.

#### 9. Update the page state variable
Once the API call succeeds, we can update our page state variable `groqResult` to the contents of the API response from Groq, using the JSON path we created when defining the API call.
Click on the "+" button for True, and add an action of type "Update Page State".
Add a field for `groqResult`, and set the value to `groqResponse`, found under Action Output.
Select `JSON Body` for the API Response Options, `Predefined Path` for the Available Options, and `groqResponse` for the Path.


#### 10. Run your app in test mode
Now that we have connected our API call to the UI as an action, we can run our app in test mode.
*Watch a [video](https://www.loom.com/share/8f965557a51d43c7ba518280b9c4fd12?sid=006c88e6-a0f2-4c31-bf03-6ba7fc8178a3) of the app live in test mode.*


*Result from Test mode session*
**Challenge:** Add to the above example and create a chat interface, showing the history of the conversation, the current question, and a loading indicator.
### Additional Resources
For additional documentation and support, see the following:
- [FlutterFlow Documentation](https://docs.flutterflow.io/)
---
## Vision: Jsonmode (py)
URL: https://console.groq.com/docs/vision/scripts/jsonmode.py
```python
from groq import Groq
import os
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
completion = client.chat.completions.create(
model="meta-llama/llama-4-scout-17b-16e-instruct",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "List what you observe in this photo in JSON format."
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/d/da/SF_From_Marin_Highlands3.jpg"
}
}
]
}
],
temperature=1,
max_completion_tokens=1024,
top_p=1,
stream=False,
response_format={"type": "json_object"},
stop=None,
)
print(completion.choices[0].message)
```
---
## Vision: Vision (js)
URL: https://console.groq.com/docs/vision/scripts/vision
```javascript
import { Groq } from 'groq-sdk';
const groq = new Groq();
async function main() {
const chatCompletion = await groq.chat.completions.create({
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/f/f2/LPU-v1-die.jpg"
}
}
]
}
],
"model": "meta-llama/llama-4-scout-17b-16e-instruct",
"temperature":1,
"max_completion_tokens":1024,
"top_p":1,
"stream": false,
"stop": null
});
console.log(chatCompletion.choices[0].message.content);
}
main();
```
---
## Vision: Vision (json)
URL: https://console.groq.com/docs/vision/scripts/vision.json
```json
{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/f/f2/LPU-v1-die.jpg"
}
}
]
}
],
"model": "meta-llama/llama-4-scout-17b-16e-instruct",
"temperature":1,
"max_completion_tokens":1024,
"top_p":1,
"stream": false,
"stop": null
}
```
---
## Vision: Multiturn (py)
URL: https://console.groq.com/docs/vision/scripts/multiturn.py
```python
from groq import Groq
import os
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
completion = client.chat.completions.create(
model="meta-llama/llama-4-scout-17b-16e-instruct",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "What is in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/d/da/SF_From_Marin_Highlands3.jpg"
}
}
]
},
{
"role": "user",
"content": "Tell me more about the area."
}
],
temperature=1,
max_completion_tokens=1024,
top_p=1,
stream=False,
stop=None,
)
print(completion.choices[0].message)
```
---
## Function to encode the image
URL: https://console.groq.com/docs/vision/scripts/local.py
```python
from groq import Groq
import base64
import os
# Function to encode the image
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
# Path to your image
image_path = "sf.jpg"
# Getting the base64 string
base64_image = encode_image(image_path)
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}",
},
},
],
}
],
model="meta-llama/llama-4-scout-17b-16e-instruct",
)
print(chat_completion.choices[0].message.content)
```
---
## Vision: Vision (py)
URL: https://console.groq.com/docs/vision/scripts/vision.py
```python
from groq import Groq
import os
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
completion = client.chat.completions.create(
model="meta-llama/llama-4-scout-17b-16e-instruct",
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/f/f2/LPU-v1-die.jpg"
}
}
]
}
],
temperature=1,
max_completion_tokens=1024,
top_p=1,
stream=False,
stop=None,
)
print(completion.choices[0].message)
```
---
## Images and Vision
URL: https://console.groq.com/docs/vision
# Images and Vision
Groq API offers fast inference and low latency for multimodal models with vision capabilities for understanding and interpreting visual data from images. By analyzing the content of an image, multimodal models can generate human-readable text that provides insights about the visual data.
## Supported Models
Groq API supports powerful multimodal models that can be easily integrated into your applications to provide fast and accurate image processing for tasks such as visual question answering, caption generation,
and Optical Character Recognition (OCR).
## How to Use Vision
Use Groq API vision features via:
- **GroqCloud Console Playground**: Use [Llama 4 Scout](/playground?model=meta-llama/llama-4-scout-17b-16e-instruct) or [Llama 4 Maverick](/playground?model=meta-llama/llama-4-maverick-17b-128e-instruct) as the model and upload your image.
- **Groq API Request:** Call the [`chat.completions`](/docs/text-chat#generating-chat-completions-with-groq-sdk) API endpoint and set the model to `meta-llama/llama-4-scout-17b-16e-instruct` or `meta-llama/llama-4-maverick-17b-128e-instruct`.
See code examples below.
## How to Pass Images from URLs as Input
The following are code examples for passing your image to the model via a URL:
## How to Pass Locally Saved Images as Input
To pass locally saved images, we'll need to first encode our image to a base64 format string before passing it as the `image_url` in our API request as follows:
## Tool Use with Images
The `meta-llama/llama-4-scout-17b-16e-instruct` and `meta-llama/llama-4-maverick-17b-128e-instruct` models support tool use! The following cURL example defines a `get_current_weather` tool that the model can leverage to answer a user query that contains a question about the weather along with an image of a location from which the model can infer the location (i.e., New York City):
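The original cURL request is not reproduced in this dump, so here is a minimal Python sketch of an equivalent tool-use request. The `get_current_weather` definition, its parameters, and the image URL are illustrative assumptions rather than the exact values from the original example:
```python
# Hypothetical sketch of a tool-use request with an image; the exact tool
# schema and image in the original cURL example may differ.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. New York, NY",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's the weather like in this location?"},
                {
                    "type": "image_url",
                    # Placeholder URL; the original example used an image of New York City.
                    "image_url": {"url": "https://example.com/city-photo.jpg"},
                },
            ],
        }
    ],
    tools=tools,
    tool_choice="auto",
)
print(completion.choices[0].message.tool_calls)
```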
The following is the output from our example above that shows how our model inferred the state as New York from the given image and called our example function:
```json
[
{
"id": "call_q0wg",
"function": {
"arguments": "{\"location\": \"New York, NY\",\"unit\": \"fahrenheit\"}",
"name": "get_current_weather"
},
"type": "function"
}
]
```
## JSON Mode with Images
The `meta-llama/llama-4-scout-17b-16e-instruct` and `meta-llama/llama-4-maverick-17b-128e-instruct` models support JSON mode! The following Python example queries the model with an image and text (i.e. "Please pull out relevant information as a JSON object.") with `response_format`
set for JSON mode:
## Multi-turn Conversations with Images
The `meta-llama/llama-4-scout-17b-16e-instruct` and `meta-llama/llama-4-maverick-17b-128e-instruct` models support multi-turn conversations! The following Python example shows a multi-turn user conversation about an image:
## Venture Deeper into Vision
### Use Cases to Explore
Vision models can be used in a wide range of applications. Here are some ideas:
- **Accessibility Applications:** Develop an application that generates audio descriptions for images by using a vision model to generate text descriptions for images, which can then
be converted to audio with one of our audio endpoints.
- **E-commerce Product Description Generation:** Create an application that generates product descriptions for e-commerce websites.
- **Multilingual Image Analysis:** Create applications that can describe images in multiple languages.
- **Multi-turn Visual Conversations:** Develop interactive applications that allow users to have extended conversations about images.
These are just a few ideas to get you started. The possibilities are endless, and we're excited to see what you create with vision models powered by Groq for low latency and fast inference!
### Next Steps
Check out our [Groq API Cookbook](https://github.com/groq/groq-api-cookbook) repository on GitHub (and give us a ⭐) for practical examples and tutorials:
- [Image Moderation](https://github.com/groq/groq-api-cookbook/blob/main/tutorials/image_moderation.ipynb)
- [Multimodal Image Processing (Tool Use, JSON Mode)](https://github.com/groq/groq-api-cookbook/tree/main/tutorials/multimodal-image-processing)
We're always looking for contributions. If you have any cool tutorials or guides to share, submit a pull request for review to help our open-source community!
---
## E2B + Groq: Open-Source Code Interpreter
URL: https://console.groq.com/docs/e2b
## E2B + Groq: Open-Source Code Interpreter
[E2B](https://e2b.dev/) Code Interpreter is an open-source SDK that provides secure, sandboxed environments for executing code generated by LLMs via Groq API. Built specifically for AI data analysts,
coding applications, and reasoning-heavy agents, E2B enables you to both generate and execute code in a secure sandbox environment in real-time.
### Python Quick Start (3 minutes to hello world)
#### 1. Install the required packages:
```bash
pip install groq e2b-code-interpreter python-dotenv
```
#### 2. Configure your Groq and [E2B](https://e2b.dev/docs) API keys:
```bash
export GROQ_API_KEY="your-groq-api-key"
export E2B_API_KEY="your-e2b-api-key"
```
#### 3. Create your first simple and fast Code Interpreter application that generates and executes code to analyze data:
Running the code below will create a secure sandbox environment, generate Python code using `llama-3.3-70b-versatile` powered by Groq, execute the code, and display the results. When you go to your
[E2B Dashboard](https://e2b.dev/dashboard), you'll see your sandbox's data.
```python
from e2b_code_interpreter import Sandbox
from groq import Groq
import os
e2b_api_key = os.environ.get('E2B_API_KEY')
groq_api_key = os.environ.get('GROQ_API_KEY')
# Initialize Groq client
client = Groq(api_key=groq_api_key)
SYSTEM_PROMPT = """You are a Python data scientist. Generate simple code that:
1. Uses numpy to generate 5 random numbers
2. Prints only the mean and standard deviation in a clean format
Example output format:
Mean: 5.2
Std Dev: 1.8"""
def main():
# Create sandbox instance (by default, sandbox instances stay alive for 5 mins)
sbx = Sandbox()
# Get code from Groq
response = client.chat.completions.create(
model="llama-3.1-70b-versatile",
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": "Generate random numbers and show their mean and standard deviation"}
]
)
# Extract and run the code
code = response.choices[0].message.content
if "```python" in code:
code = code.split("```python")[1].split("```")[0]
print("\nGenerated Python code:")
print(code)
print("\nExecuting code in sandbox...")
execution = sbx.run_code(code)
print(execution.logs.stdout[0])
if __name__ == "__main__":
main()
```
**Challenge**: Try modifying the example to analyze your own dataset or solve a different data science problem!
For more detailed documentation and resources on building with E2B and Groq, see:
- [Tutorial: Code Interpreting with Groq (Python)](https://e2b.dev/blog/guide-code-interpreting-with-groq-and-e2b)
- [Tutorial: Code Interpreting with Groq (JavaScript)](https://e2b.dev/blog/guide-groq-js)
---
## Responses Api: Web Search (py)
URL: https://console.groq.com/docs/responses-api/scripts/web-search.py
```python
import openai
client = openai.OpenAI(
api_key="your-groq-api-key",
base_url="https://api.groq.com/openai/v1"
)
response = client.responses.create(
model="openai/gpt-oss-20b",
input="Analyze the current weather in San Francisco and provide a detailed forecast.",
tool_choice="required",
tools=[
{
"type": "browser_search"
}
]
)
print(response.output_text)
```
---
## Responses Api: Web Search (js)
URL: https://console.groq.com/docs/responses-api/scripts/web-search
```javascript
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.GROQ_API_KEY,
baseURL: "https://api.groq.com/openai/v1",
});
const response = await client.responses.create({
model: "openai/gpt-oss-20b",
input: "Analyze the current weather in San Francisco and provide a detailed forecast.",
tool_choice: "required",
tools: [
{
type: "browser_search"
}
]
});
console.log(response.output_text);
```
---
## Responses Api: Structured Outputs (py)
URL: https://console.groq.com/docs/responses-api/scripts/structured-outputs.py
```python
import os
from openai import OpenAI
client = OpenAI(
api_key=os.environ.get("GROQ_API_KEY"),
base_url="https://api.groq.com/openai/v1"
)
response = client.responses.create(
model="moonshotai/kimi-k2-instruct-0905",
instructions="Extract product review information from the text.",
input="I bought the UltraSound Headphones last week and I'm really impressed! The noise cancellation is amazing and the battery lasts all day. Sound quality is crisp and clear. I'd give it4.5 out of5 stars.",
text={
"format": {
"type": "json_schema",
"name": "product_review",
"schema": {
"type": "object",
"properties": {
"product_name": {"type": "string"},
"rating": {"type": "number"},
"sentiment": {
"type": "string",
"enum": ["positive", "negative", "neutral"]
},
"key_features": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["product_name", "rating", "sentiment", "key_features"],
"additionalProperties": False
}
}
}
)
print(response.output_text)
```
---
## Responses Api: Quickstart (py)
URL: https://console.groq.com/docs/responses-api/scripts/quickstart.py
```python
import openai
client = openai.OpenAI(
api_key="your-groq-api-key",
base_url="https://api.groq.com/openai/v1"
)
response = client.responses.create(
model="llama-3.3-70b-versatile",
input="Tell me a fun fact about the moon in one sentence.",
)
print(response.output_text)
```
---
## Responses Api: Reasoning (py)
URL: https://console.groq.com/docs/responses-api/scripts/reasoning.py
```python
import openai
client = openai.OpenAI(
api_key="your-groq-api-key",
base_url="https://api.groq.com/openai/v1"
)
response = client.responses.create(
model="openai/gpt-oss-20b",
input="How are AI models trained? Be brief.",
reasoning={
"effort": "low"
}
)
print(response.output_text)
```
---
## Responses Api: Code Interpreter (js)
URL: https://console.groq.com/docs/responses-api/scripts/code-interpreter
```javascript
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.GROQ_API_KEY,
baseURL: "https://api.groq.com/openai/v1",
});
const response = await client.responses.create({
model: "openai/gpt-oss-20b",
input: "What is1312 X3333? Output only the final answer.",
tool_choice: "required",
tools: [
{
type: "code_interpreter",
container: {
"type": "auto"
}
}
]
});
console.log(response.output_text);
```
---
## Responses Api: Quickstart (js)
URL: https://console.groq.com/docs/responses-api/scripts/quickstart
```javascript
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.GROQ_API_KEY,
baseURL: "https://api.groq.com/openai/v1",
});
const response = await client.responses.create({
model: "openai/gpt-oss-20b",
input: "Tell me a fun fact about the moon in one sentence.",
});
console.log(response.output_text);
```
---
## Responses Api: Structured Outputs (js)
URL: https://console.groq.com/docs/responses-api/scripts/structured-outputs
```javascript
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.GROQ_API_KEY,
baseURL: "https://api.groq.com/openai/v1",
});
const response = await openai.responses.create({
model: "moonshotai/kimi-k2-instruct-0905",
instructions: "Extract product review information from the text.",
input: "I bought the UltraSound Headphones last week and I'm really impressed! The noise cancellation is amazing and the battery lasts all day. Sound quality is crisp and clear. I'd give it4.5 out of5 stars.",
text: {
format: {
type: "json_schema",
name: "product_review",
schema: {
type: "object",
properties: {
product_name: { type: "string" },
rating: { type: "number" },
sentiment: {
type: "string",
enum: ["positive", "negative", "neutral"]
},
key_features: {
type: "array",
items: { type: "string" }
}
},
required: ["product_name", "rating", "sentiment", "key_features"],
additionalProperties: false
}
}
}
});
console.log(response.output_text);
```
---
## Responses Api: Reasoning (js)
URL: https://console.groq.com/docs/responses-api/scripts/reasoning
```javascript
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.GROQ_API_KEY,
baseURL: "https://api.groq.com/openai/v1",
});
const response = await client.responses.create({
model: "openai/gpt-oss-20b",
input: "How are AI models trained? Be brief.",
reasoning: {
effort: "low"
}
});
console.log(response.output_text);
```
---
## Responses Api: Code Interpreter (py)
URL: https://console.groq.com/docs/responses-api/scripts/code-interpreter.py
```python
import openai
client = openai.OpenAI(
api_key="your-groq-api-key",
base_url="https://api.groq.com/openai/v1"
)
response = client.responses.create(
model="openai/gpt-oss-20b",
input="What is1312 X3333? Output only the final answer.",
tool_choice="required",
tools=[
{
"type": "code_interpreter",
"container": {
"type": "auto"
}
}
]
)
print(response.output_text)
```
---
## Responses API
URL: https://console.groq.com/docs/responses-api
# Responses API
Groq's Responses API is fully compatible with OpenAI's Responses API, making it easy to integrate advanced conversational AI capabilities into your applications. The Responses API supports both text and image inputs while producing text outputs, stateful conversations, and function calling to connect with external systems.
The Responses API is currently in beta. Please let us know your feedback in our [Community](https://community.groq.com).
## Configuring OpenAI Client for Responses API
To use the Responses API with OpenAI's client libraries, configure your client with your Groq API key and set the base URL to `https://api.groq.com/openai/v1`:
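For example, in Python (mirroring the quickstart scripts above):
```python
import openai

# Standard OpenAI client pointed at Groq's OpenAI-compatible endpoint
client = openai.OpenAI(
    api_key="your-groq-api-key",
    base_url="https://api.groq.com/openai/v1"
)
```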
You can find your API key [here](/keys).
## Unsupported Features
Although Groq's Responses API is mostly compatible with OpenAI's Responses API, there are a few features we don't support just yet:
- `previous_response_id`
- `store`
- `truncation`
- `include`
- `safety_identifier`
- `prompt_cache_key`
## Built-In Tools
In addition to a model's regular [tool use capabilities](/docs/tool-use), the Responses API supports various built-in tools to extend your model's capabilities.
### Model Support
While all models support the Responses API, these built-in tools are only supported for the following models:
| Model ID | [Browser Search](/docs/browser-search) | [Code Execution](/docs/code-execution) |
|---------------------------------|--------------------------------|--------------------------------|
| [openai/gpt-oss-20b](/docs/model/openai/gpt-oss-20b) | ✅ | ✅ |
| [openai/gpt-oss-120b](/docs/model/openai/gpt-oss-120b) | ✅ | ✅ |
Here are examples using code execution and browser search:
### Code Execution Example
Enable your models to write and execute Python code for calculations, data analysis, and problem-solving - see our [code execution documentation](/docs/code-execution) for more details.
### Browser Search Example
Give your models access to real-time web content and up-to-date information - see our [browser search documentation](/docs/browser-search) for more details.
## Structured Outputs
Use structured outputs to ensure the model's response follows a specific JSON schema. This is useful for extracting structured data from text, ensuring consistent response formats, or integrating with downstream systems that expect specific data structures.
For a complete list of models that support structured outputs, see our [structured outputs documentation](/docs/structured-outputs).
## Reasoning
Use reasoning to let the model produce an internal chain of thought before generating a response. This is useful for complex problem solving, multi-step agentic workflow planning, and scientific analysis.
For a complete list of models that support reasoning, see our [reasoning documentation](/docs/reasoning).
The reasoning traces can be found in the `result.output` array as type "reasoning":
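As a minimal sketch of pulling those traces out (building on the reasoning example above, and assuming each output item exposes a `type` field as described; the exact shape of a reasoning item is not shown here, so this simply prints it):
```python
import openai

client = openai.OpenAI(
    api_key="your-groq-api-key",
    base_url="https://api.groq.com/openai/v1"
)
response = client.responses.create(
    model="openai/gpt-oss-20b",
    input="How are AI models trained? Be brief.",
    reasoning={"effort": "low"},
)
# Filter the output items down to the reasoning traces.
reasoning_items = [item for item in response.output if item.type == "reasoning"]
for item in reasoning_items:
    print(item)
```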
## Next Steps
Explore more advanced use cases in our built-in [browser search](/docs/browser-search) and [code execution](/docs/code-execution) documentation.
---
## Set an optional system message. This sets the behavior of the
URL: https://console.groq.com/docs/text-chat/scripts/basic-chat-completion.py
```python
from groq import Groq
client = Groq()
chat_completion = client.chat.completions.create(
messages=[
# Set an optional system message. This sets the behavior of the
# assistant and can be used to provide specific instructions for
# how it should behave throughout the conversation.
{
"role": "system",
"content": "You are a helpful assistant."
},
# Set a user message for the assistant to respond to.
{
"role": "user",
"content": "Explain the importance of fast language models",
}
],
# The language model which will generate the completion.
model="llama-3.3-70b-versatile"
)
# Print the completion returned by the LLM.
print(chat_completion.choices[0].message.content)
```
---
## Required parameters
URL: https://console.groq.com/docs/text-chat/scripts/streaming-chat-completion-with-stop.py
```python
from groq import Groq
client = Groq()
chat_completion = client.chat.completions.create(
#
# Required parameters
#
messages=[
# Set an optional system message. This sets the behavior of the
# assistant and can be used to provide specific instructions for
# how it should behave throughout the conversation.
{
"role": "system",
"content": "You are a helpful assistant."
},
# Set a user message for the assistant to respond to.
{
"role": "user",
"content": "Count to10. Your response must begin with \"1, \". example:1,2,3, ...",
}
],
# The language model which will generate the completion.
model="llama-3.3-70b-versatile",
#
# Optional parameters
#
# Controls randomness: lowering results in less random completions.
# As the temperature approaches zero, the model will become deterministic
# and repetitive.
temperature=0.5,
# The maximum number of tokens to generate. Requests can use up to
# 2048 tokens shared between prompt and completion.
max_completion_tokens=1024,
# Controls diversity via nucleus sampling: 0.5 means half of all
# likelihood-weighted options are considered.
top_p=1,
# A stop sequence is a predefined or user-specified text string that
# signals an AI to stop generating content, ensuring its responses
# remain focused and concise. Examples include punctuation marks and
# markers like "[end]".
# For this example, we will use ", 6" so that the LLM stops counting at 5.
# If multiple stop values are needed, an array of strings may be passed,
# stop=[", 6", ", six", ", Six"]
stop=", 6",
# If set, partial message deltas will be sent.
stream=False,
)
# Print the completion returned by the LLM.
print(chat_completion.choices[0].message.content)
```
---
## Text Chat: Prompt Engineering.doc (ts)
URL: https://console.groq.com/docs/text-chat/scripts/prompt-engineering.doc
```javascript
import { Groq } from "groq-sdk";
import { z } from "zod";
const client = new Groq();
// Define a schema for validation
const MovieSchema = z.object({
title: z.string(),
year: z.number().int(),
director: z.string(),
genre: z.array(z.string()),
runtime_minutes: z.number().int(),
rating: z.number().min(1).max(10),
box_office_millions: z.number(),
cast: z.array(
z.object({
actor: z.string(),
character: z.string()
})
)
});
type Movie = z.infer<typeof MovieSchema>;
// Example of a poorly designed prompt
const poorPrompt = `
Give me information about a movie in JSON format.
`;
// Example of a well-designed prompt
const effectivePrompt = `
You are a movie database API. Return information about a movie with the following
JSON structure:
{
"title": "string",
"year": number,
"director": "string",
"genre": ["string"],
"runtime_minutes": number,
"rating": number (1-10 scale),
"box_office_millions": number,
"cast": [
{
"actor": "string",
"character": "string"
}
]
}
The response must:
1. Include ALL fields shown above
2. Use only the exact field names shown
3. Follow the exact data types specified
4. Contain ONLY the JSON object and nothing else
IMPORTANT: Do not include any explanatory text, markdown formatting, or code blocks.
`;
// Function to run the completion and display results
async function getMovieData(prompt: string, title = "Example"): Promise<Movie | null> {
console.log(`\n--- ${title} ---`);
try {
const completion = await client.chat.completions.create({
model: "llama-3.3-70b-versatile",
response_format: { type: "json_object" },
messages: [
{ role: "system", content: prompt },
{ role: "user", content: "Tell me about The Matrix" },
],
});
const responseContent = completion.choices[0].message.content;
console.log("Raw response:");
console.log(responseContent);
// Try to parse as JSON
try {
const movieData = JSON.parse(responseContent || "");
console.log("\nSuccessfully parsed as JSON!");
// Validate against schema
try {
const validatedMovie = MovieSchema.parse(movieData);
console.log("All expected fields present and valid!");
return validatedMovie;
} catch (validationError) {
if (validationError instanceof z.ZodError) {
console.log("Schema validation failed:");
console.log(validationError.errors.map(e => `- ${e.path.join('.')}: ${e.message}`).join('\n'));
}
return null;
}
} catch (syntaxError) {
console.log("\nFailed to parse as JSON. Response is not valid JSON.");
return null;
}
} catch (error) {
console.error("Error:", error);
return null;
}
}
// Compare the results of both prompts
async function comparePrompts() {
await getMovieData(poorPrompt, "Poor Prompt Example");
await getMovieData(effectivePrompt, "Effective Prompt Example");
}
// Run the examples
comparePrompts();
```
---
## Required parameters
URL: https://console.groq.com/docs/text-chat/scripts/streaming-chat-completion.py
```python
from groq import Groq
client = Groq()
stream = client.chat.completions.create(
#
# Required parameters
#
messages=[
# Set an optional system message. This sets the behavior of the
# assistant and can be used to provide specific instructions for
# how it should behave throughout the conversation.
{
"role": "system",
"content": "You are a helpful assistant."
},
# Set a user message for the assistant to respond to.
{
"role": "user",
"content": "Explain the importance of fast language models",
}
],
# The language model which will generate the completion.
model="llama-3.3-70b-versatile",
#
# Optional parameters
#
# Controls randomness: lowering results in less random completions.
# As the temperature approaches zero, the model will become deterministic
# and repetitive.
temperature=0.5,
# The maximum number of tokens to generate. Requests can use up to
# 2048 tokens shared between prompt and completion.
max_completion_tokens=1024,
# Controls diversity via nucleus sampling: 0.5 means half of all
# likelihood-weighted options are considered.
top_p=1,
# A stop sequence is a predefined or user-specified text string that
# signals an AI to stop generating content, ensuring its responses
# remain focused and concise. Examples include punctuation marks and
# markers like "[end]".
stop=None,
# If set, partial message deltas will be sent.
stream=True,
)
# Print the incremental deltas returned by the LLM.
for chunk in stream:
print(chunk.choices[0].delta.content, end="")
```
---
## Required parameters
URL: https://console.groq.com/docs/text-chat/scripts/streaming-async-chat-completion.py
```python
import asyncio
from groq import AsyncGroq
async def main():
client = AsyncGroq()
stream = await client.chat.completions.create(
#
# Required parameters
#
messages=[
# Set an optional system message. This sets the behavior of the
# assistant and can be used to provide specific instructions for
# how it should behave throughout the conversation.
{
"role": "system",
"content": "You are a helpful assistant."
},
# Set a user message for the assistant to respond to.
{
"role": "user",
"content": "Explain the importance of fast language models"
}
],
# The language model which will generate the completion.
model="llama-3.3-70b-versatile",
#
# Optional parameters
#
# Controls randomness: lowering results in less random completions.
# As the temperature approaches zero, the model will become
# deterministic and repetitive.
temperature=0.5,
# The maximum number of tokens to generate. Requests can use up to
# 2048 tokens shared between prompt and completion.
max_completion_tokens=1024,
# Controls diversity via nucleus sampling: 0.5 means half of all
# likelihood-weighted options are considered.
top_p=1,
# A stop sequence is a predefined or user-specified text string that
# signals an AI to stop generating content, ensuring its responses
# remain focused and concise. Examples include punctuation marks and
# markers like "[end]".
stop=None,
# If set, partial message deltas will be sent.
stream=True,
)
# Print the incremental deltas returned by the LLM.
async for chunk in stream:
print(chunk.choices[0].delta.content, end="")
asyncio.run(main())
```
---
## Data model for LLM to generate
URL: https://console.groq.com/docs/text-chat/scripts/json-mode.py
```python
from typing import List, Optional
import json
from pydantic import BaseModel
from groq import Groq
groq = Groq()
# Data model for LLM to generate
class Ingredient(BaseModel):
name: str
quantity: str
quantity_unit: Optional[str]
class Recipe(BaseModel):
recipe_name: str
ingredients: List[Ingredient]
directions: List[str]
def get_recipe(recipe_name: str) -> Recipe:
chat_completion = groq.chat.completions.create(
messages=[
{
"role": "system",
"content": "You are a recipe database that outputs recipes in JSON.\n"
# Pass the json schema to the model. Pretty printing improves results.
f" The JSON object must use the schema: {json.dumps(Recipe.model_json_schema(), indent=2)}",
},
{
"role": "user",
"content": f"Fetch a recipe for {recipe_name}",
},
],
model="meta-llama/llama-4-scout-17b-16e-instruct",
temperature=0,
# Streaming is not supported in JSON mode
stream=False,
# Enable JSON mode by setting the response format
response_format={"type": "json_object"},
)
return Recipe.model_validate_json(chat_completion.choices[0].message.content)
def print_recipe(recipe: Recipe):
print("Recipe:", recipe.recipe_name)
print("\nIngredients:")
for ingredient in recipe.ingredients:
print(
f"- {ingredient.name}: {ingredient.quantity} {ingredient.quantity_unit or ''}"
)
print("\nDirections:")
for step, direction in enumerate(recipe.directions, start=1):
print(f"{step}. {direction}")
recipe = get_recipe("apple pie")
print_recipe(recipe)
```
---
## Text Chat: Basic Validation Zod.doc (ts)
URL: https://console.groq.com/docs/text-chat/scripts/basic-validation-zod.doc
```javascript
import { Groq } from "groq-sdk";
import { z } from "zod";
const client = new Groq();
// Define a schema with Zod
const ProductSchema = z.object({
id: z.string(),
name: z.string(),
price: z.number().positive(),
description: z.string(),
in_stock: z.boolean(),
tags: z.array(z.string()).default([]),
});
// Infer the TypeScript type from the Zod schema
type Product = z.infer<typeof ProductSchema>;
// Create a prompt that clearly defines the expected structure
const systemPrompt = `
You are a product catalog assistant. When asked about products,
always respond with valid JSON objects that match this structure:
{
"id": "string",
"name": "string",
"price": number,
"description": "string",
"in_stock": boolean,
"tags": ["string"]
}
Your response should ONLY contain the JSON object and nothing else.
`;
async function getStructuredResponse(): Promise<Product | undefined> {
try {
// Request structured data from the model
const completion = await client.chat.completions.create({
model: "openai/gpt-oss-20b",
response_format: { type: "json_object" },
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: "Tell me about a popular smartphone product" },
],
});
// Extract the response
const responseContent = completion.choices[0].message.content;
// Parse and validate JSON
const jsonData = JSON.parse(responseContent || "");
const validatedData = ProductSchema.parse(jsonData);
console.log("Validation successful! Structured data:");
console.log(JSON.stringify(validatedData, null, 2));
return validatedData;
} catch (error) {
if (error instanceof z.ZodError) {
console.error("Schema validation failed:", error.errors);
} else if (error instanceof SyntaxError) {
console.error("JSON parsing failed: The model did not return valid JSON");
} else {
console.error("Error:", error);
}
return undefined;
}
}
getStructuredResponse();
```
---
## Text Chat: Instructor Example (js)
URL: https://console.groq.com/docs/text-chat/scripts/instructor-example
```javascript
import Instructor from "@instructor-ai/instructor"; // npm install @instructor-ai/instructor
import { Groq } from "groq-sdk";
import { z } from "zod"; // npm install zod
// Set up the Groq client with Instructor
const client = new Groq();
const instructor = Instructor({
client,
mode: "TOOLS"
});
// Define your schema with Zod
const RecipeIngredientSchema = z.object({
name: z.string(),
quantity: z.string(),
unit: z.string().describe("The unit of measurement, like cup, tablespoon, etc."),
});
const RecipeSchema = z.object({
title: z.string(),
description: z.string(),
prep_time_minutes: z.number().int().positive(),
cook_time_minutes: z.number().int().positive(),
ingredients: z.array(RecipeIngredientSchema),
instructions: z.array(z.string()).describe("Step by step cooking instructions"),
});
async function getRecipe() {
try {
// Request structured data with automatic validation
const recipe = await instructor.chat.completions.create({
model: "openai/gpt-oss-20b",
response_model: {
name: "Recipe",
schema: RecipeSchema,
},
messages: [
{ role: "user", content: "Give me a recipe for chocolate chip cookies" },
],
max_retries: 2, // Instructor will retry if validation fails
});
// No need for try/catch or manual validation - instructor handles it!
console.log(`Recipe: ${recipe.title}`);
console.log(`Prep time: ${recipe.prep_time_minutes} minutes`);
console.log(`Cook time: ${recipe.cook_time_minutes} minutes`);
console.log("\nIngredients:");
recipe.ingredients.forEach((ingredient) => {
console.log(`- ${ingredient.quantity} ${ingredient.unit} ${ingredient.name}`);
});
console.log("\nInstructions:");
recipe.instructions.forEach((step, index) => {
console.log(`${index + 1}. ${step}`);
});
return recipe;
} catch (error) {
console.error("Error:", error);
}
}
// Run the example
getRecipe();
```
---
## Text Chat: Streaming Chat Completion (js)
URL: https://console.groq.com/docs/text-chat/scripts/streaming-chat-completion
```javascript
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
const stream = await getGroqChatStream();
for await (const chunk of stream) {
// Print the completion returned by the LLM.
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
}
export async function getGroqChatStream() {
return groq.chat.completions.create({
//
// Required parameters
//
messages: [
// Set an optional system message. This sets the behavior of the
// assistant and can be used to provide specific instructions for
// how it should behave throughout the conversation.
{
role: "system",
content: "You are a helpful assistant.",
},
// Set a user message for the assistant to respond to.
{
role: "user",
content: "Explain the importance of fast language models",
},
],
// The language model which will generate the completion.
model: "openai/gpt-oss-20b",
//
// Optional parameters
//
// Controls randomness: lowering results in less random completions.
// As the temperature approaches zero, the model will become deterministic
// and repetitive.
temperature: 0.5,
// The maximum number of tokens to generate. Requests can use up to
// 2048 tokens shared between prompt and completion.
max_completion_tokens: 1024,
// Controls diversity via nucleus sampling: 0.5 means half of all
// likelihood-weighted options are considered.
top_p: 1,
// A stop sequence is a predefined or user-specified text string that
// signals an AI to stop generating content, ensuring its responses
// remain focused and concise. Examples include punctuation marks and
// markers like "[end]".
stop: null,
// If set, partial message deltas will be sent.
stream: true,
});
}
main();
```
---
## pip install pydantic
URL: https://console.groq.com/docs/text-chat/scripts/complex-schema-example.py
```python
import os
from typing import List, Optional, Dict, Union
from pydantic import BaseModel, Field
from groq import Groq
import instructor
# Set up the client with instructor
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
instructor_client = instructor.patch(client)
# Define a complex nested schema
class Address(BaseModel):
street: str
city: str
state: str
zip_code: str
country: str
class ContactInfo(BaseModel):
email: str
phone: Optional[str] = None
address: Address
class ProductVariant(BaseModel):
id: str
name: str
price: float
inventory_count: int
attributes: Dict[str, str]
class ProductReview(BaseModel):
user_id: str
rating: float = Field(ge=1, le=5)
comment: str
date: str
class Product(BaseModel):
id: str
name: str
description: str
main_category: str
subcategories: List[str]
variants: List[ProductVariant]
reviews: List[ProductReview]
average_rating: float = Field(ge=1, le=5)
manufacturer: Dict[str, Union[str, ContactInfo]]
# System prompt with clear instructions about the complex structure
system_prompt = """
You are a product catalog API. Generate a detailed product with ALL required fields.
Your response must be a valid JSON object matching the following schema:
{
"id": "string",
"name": "string",
"description": "string",
"main_category": "string",
"subcategories": ["string"],
"variants": [
{
"id": "string",
"name": "string",
"price": number,
"inventory_count": number,
"attributes": {"key": "value"}
}
],
"reviews": [
{
"user_id": "string",
"rating": number (1-5),
"comment": "string",
"date": "string (YYYY-MM-DD)"
}
],
"average_rating": number (1-5),
"manufacturer": {
"name": "string",
"founded": "string",
"contact_info": {
"email": "string",
"phone": "string (optional)",
"address": {
"street": "string",
"city": "string",
"state": "string",
"zip_code": "string",
"country": "string"
}
}
}
}
"""
# Use instructor to create and validate in one step
product = instructor_client.chat.completions.create(
model="llama-3.3-70b-versatile",
response_model=Product,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": "Give me details about a high-end camera product"}
],
max_retries=3
)
# Print the validated complex object
print(f"Product: {product.name}")
print(f"Description: {product.description[:100]}...")
print(f"Variants: {len(product.variants)}")
print(f"Reviews: {len(product.reviews)}")
print(f"Manufacturer: {product.manufacturer.get('name')}")
print("\nManufacturer Contact:")
contact_info = product.manufacturer.get('contact_info')
if isinstance(contact_info, ContactInfo):
print(f" Email: {contact_info.email}")
print(f" Address: {contact_info.address.city}, {contact_info.address.country}")
```
---
## Text Chat: Basic Validation Zod (js)
URL: https://console.groq.com/docs/text-chat/scripts/basic-validation-zod
```javascript
import { Groq } from "groq-sdk";
import { z } from "zod";
const client = new Groq();
// Define a schema with Zod
const ProductSchema = z.object({
id: z.string(),
name: z.string(),
price: z.number().positive(),
description: z.string(),
in_stock: z.boolean(),
tags: z.array(z.string()).default([]),
});
// Create a prompt that clearly defines the expected structure
const systemPrompt = `
You are a product catalog assistant. When asked about products,
always respond with valid JSON objects that match this structure:
{
"id": "string",
"name": "string",
"price": number,
"description": "string",
"in_stock": boolean,
"tags": ["string"]
}
Your response should ONLY contain the JSON object and nothing else.
`;
async function getStructuredResponse() {
try {
// Request structured data from the model
const completion = await client.chat.completions.create({
model: "openai/gpt-oss-20b",
response_format: { type: "json_object" },
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: "Tell me about a popular smartphone product" },
],
});
// Extract the response
const responseContent = completion.choices[0].message.content;
// Parse and validate JSON
const jsonData = JSON.parse(responseContent || "");
const validatedData = ProductSchema.parse(jsonData);
console.log("Validation successful! Structured data:");
console.log(JSON.stringify(validatedData, null, 2));
return validatedData;
} catch (error) {
if (error instanceof z.ZodError) {
console.error("Schema validation failed:", error.errors);
} else if (error instanceof SyntaxError) {
console.error("JSON parsing failed: The model did not return valid JSON");
} else {
console.error("Error:", error);
}
}
}
// Run the example
getStructuredResponse();
```
---
## Text Chat: System Prompt (js)
URL: https://console.groq.com/docs/text-chat/scripts/system-prompt
```javascript
import { Groq } from "groq-sdk";
const groq = new Groq();
async function main() {
const response = await groq.chat.completions.create({
model: "llama-3.1-8b-instant",
messages: [
{
role: "system",
content: `You are a data analysis API that performs sentiment analysis on text.
Respond only with JSON using this format:
{
"sentiment_analysis": {
"sentiment": "positive|negative|neutral",
"confidence_score":0.95,
"key_phrases": [
{
"phrase": "detected key phrase",
"sentiment": "positive|negative|neutral"
}
],
"summary": "One sentence summary of the overall sentiment"
}
}`
},
{ role: "user", content: "Analyze the sentiment of this customer review: 'I absolutely love this product! The quality exceeded my expectations, though shipping took longer than expected.'" }
],
response_format: { type: "json_object" }
});
console.log(response.choices[0].message.content);
}
main();
```
---
## Text Chat: Json Mode (js)
URL: https://console.groq.com/docs/text-chat/scripts/json-mode
```javascript
import Groq from "groq-sdk";
const groq = new Groq();
// Define the JSON schema for recipe objects
// This is the schema that the model will use to generate the JSON object,
// which will be parsed into the Recipe class.
const schema = {
$defs: {
Ingredient: {
properties: {
name: { title: "Name", type: "string" },
quantity: { title: "Quantity", type: "string" },
quantity_unit: {
anyOf: [{ type: "string" }, { type: "null" }],
title: "Quantity Unit",
},
},
required: ["name", "quantity", "quantity_unit"],
title: "Ingredient",
type: "object",
},
},
properties: {
recipe_name: { title: "Recipe Name", type: "string" },
ingredients: {
items: { $ref: "#/$defs/Ingredient" },
title: "Ingredients",
type: "array",
},
directions: {
items: { type: "string" },
title: "Directions",
type: "array",
},
},
required: ["recipe_name", "ingredients", "directions"],
title: "Recipe",
type: "object",
};
// Ingredient class representing a single recipe ingredient
class Ingredient {
constructor(name, quantity, quantity_unit) {
this.name = name;
this.quantity = quantity;
this.quantity_unit = quantity_unit || null;
}
}
// Recipe class representing a complete recipe
class Recipe {
constructor(recipe_name, ingredients, directions) {
this.recipe_name = recipe_name;
this.ingredients = ingredients;
this.directions = directions;
}
}
// Generates a recipe based on the recipe name
export async function getRecipe(recipe_name) {
// Pretty printing improves completion results
const jsonSchema = JSON.stringify(schema, null, 4);
const chat_completion = await groq.chat.completions.create({
messages: [
{
role: "system",
content: `You are a recipe database that outputs recipes in JSON.\nThe JSON object must use the schema: ${jsonSchema}`,
},
{
role: "user",
content: `Fetch a recipe for ${recipe_name}`,
},
],
model: "openai/gpt-oss-20b",
temperature: 0,
stream: false,
response_format: { type: "json_object" },
});
const recipeJson = JSON.parse(chat_completion.choices[0].message.content);
// Map the JSON ingredients to the Ingredient class
const ingredients = recipeJson.ingredients.map((ingredient) => {
return new Ingredient(ingredient.name, ingredient.quantity, ingredient.quantity_unit);
});
// Return the recipe object
return new Recipe(recipeJson.recipe_name, ingredients, recipeJson.directions);
}
// Prints a recipe to the console with nice formatting
function printRecipe(recipe) {
console.log("Recipe:", recipe.recipe_name);
console.log();
console.log("Ingredients:");
recipe.ingredients.forEach((ingredient) => {
console.log(
`- ${ingredient.name}: ${ingredient.quantity} ${
ingredient.quantity_unit || ""
}`,
);
});
console.log();
console.log("Directions:");
recipe.directions.forEach((direction, step) => {
console.log(`${step + 1}. ${direction}`);
});
}
// Main function that generates and prints a recipe
export async function main() {
const recipe = await getRecipe("apple pie");
printRecipe(recipe);
}
main();
```
---
## Text Chat: Complex Schema Example.doc (ts)
URL: https://console.groq.com/docs/text-chat/scripts/complex-schema-example.doc
```typescript
import Instructor from "@instructor-ai/instructor"; // npm install @instructor-ai/instructor
import { Groq } from "groq-sdk";
import { z } from "zod"; // npm install zod
// Set up the client with Instructor
const groq = new Groq();
const instructor = Instructor({
client: groq,
mode: "TOOLS"
})
// Define a complex nested schema
const AddressSchema = z.object({
street: z.string(),
city: z.string(),
state: z.string(),
zip_code: z.string(),
country: z.string(),
});
const ContactInfoSchema = z.object({
email: z.string().email(),
phone: z.string().optional(),
address: AddressSchema,
});
const ProductVariantSchema = z.object({
id: z.string(),
name: z.string(),
price: z.number().positive(),
inventory_count: z.number().int().nonnegative(),
attributes: z.record(z.string()),
});
const ProductReviewSchema = z.object({
user_id: z.string(),
rating: z.number().min(1).max(5),
comment: z.string(),
date: z.string(),
});
const ManufacturerSchema = z.object({
name: z.string(),
founded: z.string(),
contact_info: ContactInfoSchema,
});
const ProductSchema = z.object({
id: z.string(),
name: z.string(),
description: z.string(),
main_category: z.string(),
subcategories: z.array(z.string()),
variants: z.array(ProductVariantSchema),
reviews: z.array(ProductReviewSchema),
average_rating: z.number().min(1).max(5),
manufacturer: ManufacturerSchema,
});
// Infer TypeScript types from Zod schemas
type Product = z.infer<typeof ProductSchema>;
// System prompt with clear instructions about the complex structure
const systemPrompt = `
You are a product catalog API. Generate a detailed product with ALL required fields.
Your response must be a valid JSON object matching the schema I will use to validate it.
`;
async function getComplexProduct(): Promise<Product | undefined> {
try {
// Use instructor to create and validate in one step
const product = await instructor.chat.completions.create({
model: "openai/gpt-oss-20b",
response_model: {
name: "Product",
schema: ProductSchema,
},
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: "Give me details about a high-end camera product" },
],
max_retries: 3,
});
// Print the validated complex object
console.log(`Product: ${product.name}`);
console.log(`Description: ${product.description.substring(0, 100)}...`);
console.log(`Variants: ${product.variants.length}`);
console.log(`Reviews: ${product.reviews.length}`);
console.log(`Manufacturer: ${product.manufacturer.name}`);
console.log(`\nManufacturer Contact:`);
console.log(` Email: ${product.manufacturer.contact_info.email}`);
console.log(` Address: ${product.manufacturer.contact_info.address.city}, ${product.manufacturer.contact_info.address.country}`);
return product;
} catch (error) {
console.error("Error:", error);
return undefined;
}
}
// Run the example
getComplexProduct();
```
---
## Set your API key
URL: https://console.groq.com/docs/text-chat/scripts/prompt-engineering.py
```python
import os
import json
from groq import Groq
# Set your API key
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
# Example of a poorly designed prompt
poor_prompt = """
Give me information about a movie in JSON format.
"""
# Example of a well-designed prompt
effective_prompt = """
You are a movie database API. Return information about a movie with the following
JSON structure:
{
"title": "string",
"year": number,
"director": "string",
"genre": ["string"],
"runtime_minutes": number,
"rating": number (1-10 scale),
"box_office_millions": number,
"cast": [
{
"actor": "string",
"character": "string"
}
]
}
The response must:
1. Include ALL fields shown above
2. Use only the exact field names shown
3. Follow the exact data types specified
4. Contain ONLY the JSON object and nothing else
IMPORTANT: Do not include any explanatory text, markdown formatting, or code blocks.
"""
# Function to run the completion and display results
def get_movie_data(prompt, title="Example"):
print(f"\n--- {title} ---")
completion = client.chat.completions.create(
model="llama-3.3-70b-versatile",
response_format={"type": "json_object"},
messages=[
{"role": "system", "content": prompt},
{"role": "user", "content": "Tell me about The Matrix"}
]
)
response_content = completion.choices[0].message.content
print("Raw response:")
print(response_content)
# Try to parse as JSON
try:
movie_data = json.loads(response_content)
print("\nSuccessfully parsed as JSON!")
# Check for expected fields
expected_fields = ["title", "year", "director", "genre",
"runtime_minutes", "rating", "box_office_millions", "cast"]
missing_fields = [field for field in expected_fields if field not in movie_data]
if missing_fields:
print(f"Missing fields: {', '.join(missing_fields)}")
else:
print("All expected fields present!")
except json.JSONDecodeError:
print("\nFailed to parse as JSON. Response is not valid JSON.")
# Compare the results of both prompts
get_movie_data(poor_prompt, "Poor Prompt Example")
get_movie_data(effective_prompt, "Effective Prompt Example")
```
---
## Text Chat: Basic Chat Completion (js)
URL: https://console.groq.com/docs/text-chat/scripts/basic-chat-completion
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
const completion = await getGroqChatCompletion();
console.log(completion.choices[0]?.message?.content || "");
}
export const getGroqChatCompletion = async () => {
return groq.chat.completions.create({
messages: [
// Set an optional system message. This sets the behavior of the
// assistant and can be used to provide specific instructions for
// how it should behave throughout the conversation.
{
role: "system",
content: "You are a helpful assistant.",
},
// Set a user message for the assistant to respond to.
{
role: "user",
content: "Explain the importance of fast language models",
},
],
model: "openai/gpt-oss-20b",
});
};
main();
---
## Required parameters
URL: https://console.groq.com/docs/text-chat/scripts/performing-async-chat-completion.py
```python
import asyncio
from groq import AsyncGroq
async def main():
client = AsyncGroq()
chat_completion = await client.chat.completions.create(
#
# Required parameters
#
messages=[
# Set an optional system message. This sets the behavior of the
# assistant and can be used to provide specific instructions for
# how it should behave throughout the conversation.
{
"role": "system",
"content": "You are a helpful assistant."
},
# Set a user message for the assistant to respond to.
{
"role": "user",
"content": "Explain the importance of fast language models"
}
],
# The language model which will generate the completion.
model="llama-3.3-70b-versatile",
#
# Optional parameters
#
# Controls randomness: lowering results in less random completions.
# As the temperature approaches zero, the model will become
# deterministic and repetitive.
temperature=0.5,
# The maximum number of tokens to generate. Requests can use up to
# 2048 tokens shared between prompt and completion.
max_completion_tokens=1024,
# Controls diversity via nucleus sampling: 0.5 means half of all
# likelihood-weighted options are considered.
top_p=1,
# A stop sequence is a predefined or user-specified text string that
# signals an AI to stop generating content, ensuring its responses
# remain focused and concise. Examples include punctuation marks and
# markers like "[end]".
stop=None,
# If set, partial message deltas will be sent.
stream=False,
)
# Print the completion returned by the LLM.
print(chat_completion.choices[0].message.content)
asyncio.run(main())
```
---
## pip install pydantic
URL: https://console.groq.com/docs/text-chat/scripts/basic-validation-zod.py
```python
import os
import json
from groq import Groq
from pydantic import BaseModel, Field, ValidationError
from typing import List
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
# Define a schema with Pydantic (Python's equivalent to Zod)
class Product(BaseModel):
id: str
name: str
price: float
description: str
in_stock: bool
tags: List[str] = Field(default_factory=list)
# Prompt design is critical for structured outputs
system_prompt = """
You are a product catalog assistant. When asked about products,
always respond with valid JSON objects that match this structure:
{
"id": "string",
"name": "string",
"price": number,
"description": "string",
"in_stock": boolean,
"tags": ["string"]
}
Your response should ONLY contain the JSON object and nothing else.
"""
# Request structured data from the model
completion = client.chat.completions.create(
model="llama-3.3-70b-versatile",
response_format={"type": "json_object"},
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": "Tell me about a popular smartphone product"}
]
)
# Extract and validate the response
try:
response_content = completion.choices[0].message.content
# Parse JSON
json_data = json.loads(response_content)
# Validate against schema
product = Product(**json_data)
print("Validation successful! Structured data:")
print(product.model_dump_json(indent=2))
except json.JSONDecodeError:
print("Error: The model did not return valid JSON")
except ValidationError as e:
print(f"Error: The JSON did not match the expected schema: {e}")
```
---
## Text Chat: Instructor Example.doc (ts)
URL: https://console.groq.com/docs/text-chat/scripts/instructor-example.doc
import Instructor from "@instructor-ai/instructor"; // npm install @instructor-ai/instructor
import { Groq } from "groq-sdk";
import { z } from "zod"; // npm install zod
// Set up the Groq client with Instructor
const client = new Groq();
const instructor = Instructor({
client,
mode: "TOOLS"
});
// Define your schema with Zod
const RecipeIngredientSchema = z.object({
name: z.string(),
quantity: z.string(),
unit: z.string().describe("The unit of measurement, like cup, tablespoon, etc."),
});
const RecipeSchema = z.object({
title: z.string(),
description: z.string(),
prep_time_minutes: z.number().int().positive(),
cook_time_minutes: z.number().int().positive(),
ingredients: z.array(RecipeIngredientSchema),
instructions: z.array(z.string()).describe("Step by step cooking instructions"),
});
// Infer TypeScript types from Zod schemas
type Recipe = z.infer<typeof RecipeSchema>;
async function getRecipe(): Promise<Recipe | undefined> {
try {
// Request structured data with automatic validation
const recipe = await instructor.chat.completions.create({
model: "openai/gpt-oss-20b",
response_model: {
name: "Recipe",
schema: RecipeSchema,
},
messages: [
{ role: "user", content: "Give me a recipe for chocolate chip cookies" },
],
max_retries: 2, // Instructor will retry if validation fails
});
// Instructor has already validated the response against the schema for us.
console.log(`Recipe: ${recipe.title}`);
console.log(`Prep time: ${recipe.prep_time_minutes} minutes`);
console.log(`Cook time: ${recipe.cook_time_minutes} minutes`);
console.log("\nIngredients:");
recipe.ingredients.forEach((ingredient) => {
console.log(`- ${ingredient.quantity} ${ingredient.unit} ${ingredient.name}`);
});
console.log("\nInstructions:");
recipe.instructions.forEach((step, index) => {
console.log(`${index + 1}. ${step}`);
});
return recipe;
} catch (error) {
console.error("Error:", error);
return undefined;
}
}
// Run the example
getRecipe();
---
## Text Chat: System Prompt (py)
URL: https://console.groq.com/docs/text-chat/scripts/system-prompt.py
from groq import Groq
client = Groq()
response = client.chat.completions.create(
model="llama-3.1-8b-instant",
messages=[
{
"role": "system",
"content": "You are a data analysis API that performs sentiment analysis on text. Respond only with JSON using this format: {\"sentiment_analysis\": {\"sentiment\": \"positive|negative|neutral\", \"confidence_score\":0.95, \"key_phrases\": [{\"phrase\": \"detected key phrase\", \"sentiment\": \"positive|negative|neutral\"}], \"summary\": \"One sentence summary of the overall sentiment\"}}"
},
{
"role": "user",
"content": "Analyze the sentiment of this customer review: 'I absolutely love this product! The quality exceeded my expectations, though shipping took longer than expected.'"
}
],
response_format={"type": "json_object"}
)
print(response.choices[0].message.content)
---
## Text Chat: Prompt Engineering (js)
URL: https://console.groq.com/docs/text-chat/scripts/prompt-engineering
```javascript
import { Groq } from "groq-sdk";
const client = new Groq();
// Example of a poorly designed prompt
const poorPrompt = `
Give me information about a movie in JSON format.
`;
// Example of a well-designed prompt
const effectivePrompt = `
You are a movie database API. Return information about a movie with the following
JSON structure:
{
"title": "string",
"year": number,
"director": "string",
"genre": ["string"],
"runtime_minutes": number,
"rating": number (1-10 scale),
"box_office_millions": number,
"cast": [
{
"actor": "string",
"character": "string"
}
]
}
The response must:
1. Include ALL fields shown above
2. Use only the exact field names shown
3. Follow the exact data types specified
4. Contain ONLY the JSON object and nothing else
IMPORTANT: Do not include any explanatory text, markdown formatting, or code blocks.
`;
// Function to run the completion and display results
async function getMovieData(prompt, title = "Example") {
console.log(`\n--- ${title} ---`);
try {
const completion = await client.chat.completions.create({
model: "openai/gpt-oss-20b",
response_format: { type: "json_object" },
messages: [
{ role: "system", content: prompt },
{ role: "user", content: "Tell me about The Matrix" },
],
});
const responseContent = completion.choices[0].message.content;
console.log("Raw response:");
console.log(responseContent);
// Try to parse as JSON
try {
const movieData = JSON.parse(responseContent || "");
console.log("\nSuccessfully parsed as JSON!");
// Check for expected fields
const expectedFields = ["title", "year", "director", "genre",
"runtime_minutes", "rating", "box_office_millions", "cast"];
const missingFields = expectedFields.filter(field => !(field in movieData));
if (missingFields.length > 0) {
console.log(`Missing fields: ${missingFields.join(', ')}`);
} else {
console.log("All expected fields present!");
}
return movieData;
} catch (syntaxError) {
console.log("\nFailed to parse as JSON. Response is not valid JSON.");
return null;
}
} catch (error) {
console.error("Error:", error);
return null;
}
}
// Compare the results of both prompts
async function comparePrompts() {
await getMovieData(poorPrompt, "Poor Prompt Example");
await getMovieData(effectivePrompt, "Effective Prompt Example");
}
// Run the examples
comparePrompts();
```
---
## pip install pydantic
URL: https://console.groq.com/docs/text-chat/scripts/instructor-example.py
```python
import os
from typing import List
from pydantic import BaseModel, Field
import instructor
from groq import Groq
# Set up instructor with Groq
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
# Patch the client with instructor
instructor_client = instructor.patch(client)
# Define your schema with Pydantic
class RecipeIngredient(BaseModel):
name: str
quantity: str
unit: str = Field(description="The unit of measurement, like cup, tablespoon, etc.")
class Recipe(BaseModel):
title: str
description: str
prep_time_minutes: int
cook_time_minutes: int
ingredients: List[RecipeIngredient]
instructions: List[str] = Field(description="Step by step cooking instructions")
# Request structured data with automatic validation
recipe = instructor_client.chat.completions.create(
model="llama-3.3-70b-versatile",
response_model=Recipe,
messages=[
{"role": "user", "content": "Give me a recipe for chocolate chip cookies"}
],
max_retries=2
)
# No need for try/except or manual validation - instructor handles it!
print(f"Recipe: {recipe.title}")
print(f"Prep time: {recipe.prep_time_minutes} minutes")
print(f"Cook time: {recipe.cook_time_minutes} minutes")
print("\nIngredients:")
for ingredient in recipe.ingredients:
print(f"- {ingredient.quantity} {ingredient.unit} {ingredient.name}")
print("\nInstructions:")
for i, step in enumerate(recipe.instructions, 1):
print(f"{i}. {step}")
```
---
## Text Chat: Streaming Chat Completion With Stop (js)
URL: https://console.groq.com/docs/text-chat/scripts/streaming-chat-completion-with-stop
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
const stream = await getGroqChatStream();
for await (const chunk of stream) {
// Print the completion returned by the LLM.
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
}
export async function getGroqChatStream() {
return groq.chat.completions.create({
//
// Required parameters
//
messages: [
// Set an optional system message. This sets the behavior of the
// assistant and can be used to provide specific instructions for
// how it should behave throughout the conversation.
{
role: "system",
content: "You are a helpful assistant.",
},
// Set a user message for the assistant to respond to.
{
role: "user",
content:
"Start at1 and count to10. Separate each number with a comma and a space",
},
],
// The language model which will generate the completion.
model: "llama-3.3-70b-versatile",
//
// Optional parameters
//
// Controls randomness: lowering results in less random completions.
// As the temperature approaches zero, the model will become deterministic
// and repetitive.
temperature: 0.5,
// The maximum number of tokens to generate. Requests can use up to
// 2048 tokens shared between prompt and completion.
max_completion_tokens: 1024,
// Controls diversity via nucleus sampling: 0.5 means half of all
// likelihood-weighted options are considered.
top_p: 1,
// A stop sequence is a predefined or user-specified text string that
// signals an AI to stop generating content, ensuring its responses
// remain focused and concise. Examples include punctuation marks and
// markers like "[end]".
//
// For this example, we will use ", 6" so that the LLM stops counting at 5.
// If multiple stop values are needed, an array of strings may be passed,
// stop: [", 6", ", six", ", Six"]
stop: ", 6",
// If set, partial message deltas will be sent.
stream: true,
});
}
main();
---
## Text Chat: Complex Schema Example (js)
URL: https://console.groq.com/docs/text-chat/scripts/complex-schema-example
```javascript
import Instructor from "@instructor-ai/instructor"; // npm install @instructor-ai/instructor
import { Groq } from "groq-sdk";
import { z } from "zod"; // npm install zod
// Set up the client with Instructor
const groq = new Groq();
const instructor = Instructor({
client: groq,
mode: "TOOLS"
})
// Define a complex nested schema
const AddressSchema = z.object({
street: z.string(),
city: z.string(),
state: z.string(),
zip_code: z.string(),
country: z.string(),
});
const ContactInfoSchema = z.object({
email: z.string().email(),
phone: z.string().optional(),
address: AddressSchema,
});
const ProductVariantSchema = z.object({
id: z.string(),
name: z.string(),
price: z.number().positive(),
inventory_count: z.number().int().nonnegative(),
attributes: z.record(z.string()),
});
const ProductReviewSchema = z.object({
user_id: z.string(),
rating: z.number().min(1).max(5),
comment: z.string(),
date: z.string(),
});
const ManufacturerSchema = z.object({
name: z.string(),
founded: z.string(),
contact_info: ContactInfoSchema,
});
const ProductSchema = z.object({
id: z.string(),
name: z.string(),
description: z.string(),
main_category: z.string(),
subcategories: z.array(z.string()),
variants: z.array(ProductVariantSchema),
reviews: z.array(ProductReviewSchema),
average_rating: z.number().min(1).max(5),
manufacturer: ManufacturerSchema,
});
// System prompt with clear instructions about the complex structure
const systemPrompt = `
You are a product catalog API. Generate a detailed product with ALL required fields.
Your response must be a valid JSON object matching the schema I will use to validate it.
`;
async function getComplexProduct() {
try {
// Use instructor to create and validate in one step
const product = await instructor.chat.completions.create({
model: "openai/gpt-oss-20b",
response_model: {
name: "Product",
schema: ProductSchema,
},
messages: [
{ role: "system", content: systemPrompt },
{ role: "user", content: "Give me details about a high-end camera product" },
],
max_retries: 3,
});
// Print the validated complex object
console.log(`Product: ${product.name}`);
console.log(`Description: ${product.description.substring(0, 100)}...`);
console.log(`Variants: ${product.variants.length}`);
console.log(`Reviews: ${product.reviews.length}`);
console.log(`Manufacturer: ${product.manufacturer.name}`);
console.log(`\nManufacturer Contact:`);
console.log(` Email: ${product.manufacturer.contact_info.email}`);
console.log(` Address: ${product.manufacturer.contact_info.address.city}, ${product.manufacturer.contact_info.address.country}`);
return product;
} catch (error) {
console.error("Error:", error);
}
}
// Run the example
getComplexProduct();
```
---
## Text Generation
URL: https://console.groq.com/docs/text-chat
# Text Generation
Generating text with Groq's Chat Completions API enables you to have natural, conversational interactions with Groq's large language models. It processes a series of messages and generates human-like responses that can be used for various applications including conversational agents, content generation, task automation, and generating structured data outputs like JSON for your applications.
## Chat Completions
Chat completions allow your applications to have dynamic interactions with Groq's models. You can send messages that include user inputs and system instructions, and receive responses that match the conversational context.
Chat models can handle both multi-turn discussions (conversations with multiple back-and-forth exchanges) and single-turn tasks where you need just one response.
For details about all available parameters, [visit the API reference page.](https://console.groq.com/docs/api-reference#chat-create)
### Getting Started with Groq SDK
To start using Groq's Chat Completions API, you'll need to install the [Groq SDK](/docs/libraries) and set up your [API key](https://console.groq.com/keys).
## Performing a Basic Chat Completion
The simplest way to use the Chat Completions API is to send a list of messages and receive a single response. Messages are provided in chronological order, with each message containing a role ("system", "user", or "assistant") and content.
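A minimal sketch in Python (assuming `GROQ_API_KEY` is set in your environment):
```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        # The optional system message sets the assistant's behavior.
        {"role": "system", "content": "You are a helpful assistant."},
        # The user message is what the assistant responds to.
        {"role": "user", "content": "Explain the importance of fast language models"},
    ],
)
print(completion.choices[0].message.content)
```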
## Streaming a Chat Completion
For a more responsive user experience, you can stream the model's response in real-time. This allows your application to display the response as it's being generated, rather than waiting for the complete response.
To enable streaming, set the parameter `stream=True`. The completion function will then return an iterator of completion deltas rather than a single, full completion.
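For example, a minimal streaming sketch in Python:
```python
from groq import Groq

client = Groq()

stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Explain the importance of fast language models"}],
    # stream=True returns an iterator of completion deltas instead of a full completion.
    stream=True,
)
for chunk in stream:
    # Each chunk carries a partial delta of the message content.
    print(chunk.choices[0].delta.content or "", end="")
```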
## Performing a Chat Completion with a Stop Sequence
Stop sequences allow you to control where the model should stop generating. When the model encounters any of the specified stop sequences, it will halt generation at that point. This is useful when you need responses to end at specific points.
## Performing an Async Chat Completion
For applications that need to maintain responsiveness while waiting for completions, you can use the asynchronous client. This lets you make non-blocking API calls using Python's asyncio framework.
### Streaming an Async Chat Completion
You can combine the benefits of streaming and asynchronous processing by streaming completions asynchronously. This is particularly useful for applications that need to handle multiple concurrent conversations.
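A minimal sketch combining both, using the async client:
```python
import asyncio
from groq import AsyncGroq

async def main():
    client = AsyncGroq()
    stream = await client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Explain the importance of fast language models"}],
        stream=True,
    )
    # Iterate over deltas asynchronously, leaving the event loop free for other work.
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

asyncio.run(main())
```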
## Structured Outputs and JSON
Need reliable, type-safe JSON responses that match your exact schema? Groq's Structured Outputs feature is designed so that model responses strictly conform to your JSON Schema without validation or retry logic.
For complete guides on implementing structured outputs with JSON Schema or using JSON Object Mode, see our [structured outputs documentation](/docs/structured-outputs); a minimal JSON Object Mode sketch follows the list below.
Key capabilities:
- **JSON Schema enforcement**: Responses match your schema exactly
- **Type-safe outputs**: No validation or retry logic needed
- **Programmatic refusal detection**: Handle safety-based refusals programmatically
- **JSON Object Mode**: Basic JSON output with prompt-guided structure
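As a minimal JSON Object Mode sketch (the same prompt-guided pattern used by the scripts above; the city schema here is illustrative):
```python
import json
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    # JSON Object Mode guarantees syntactically valid JSON; the prompt guides its shape.
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": 'Reply only with JSON shaped like {"city": "string", "population": number}.'},
        {"role": "user", "content": "Tell me about Tokyo"},
    ],
)
print(json.loads(completion.choices[0].message.content))
```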
---
## Built In Tools: Enable Specific Tools (js)
URL: https://console.groq.com/docs/compound/built-in-tools/scripts/enable-specific-tools
import Groq from "groq-sdk";
const groq = new Groq();
const response = await groq.chat.completions.create({
model: "groq/compound",
messages: [
{
role: "user",
content: "Search for recent AI developments and then visit the Groq website"
}
],
compound_custom: {
tools: {
enabled_tools: ["web_search", "visit_website"]
}
}
});
---
## Built In Tools: Code Execution Only (py)
URL: https://console.groq.com/docs/compound/built-in-tools/scripts/code-execution-only.py
from groq import Groq
client = Groq()
response = client.chat.completions.create(
model="groq/compound",
messages=[
{
"role": "user",
"content": "Calculate the square root of12345"
}
],
compound_custom={
"tools": {
"enabled_tools": ["code_interpreter"]
}
}
)
---
## Built In Tools: Enable Specific Tools (py)
URL: https://console.groq.com/docs/compound/built-in-tools/scripts/enable-specific-tools.py
from groq import Groq
client = Groq()
response = client.chat.completions.create(
model="groq/compound",
messages=[
{
"role": "user",
"content": "Search for recent AI developments and then visit the Groq website"
}
],
compound_custom={
"tools": {
"enabled_tools": ["web_search", "visit_website"]
}
}
)
---
## Built In Tools: Code Execution Only (js)
URL: https://console.groq.com/docs/compound/built-in-tools/scripts/code-execution-only
import Groq from "groq-sdk";
const groq = new Groq();
const response = await groq.chat.completions.create({
model: "groq/compound",
messages: [
{
role: "user",
content: "Calculate the square root of12345"
}
],
compound_custom: {
tools: {
enabled_tools: ["code_interpreter"]
}
}
});
---
## Built-in Tools
URL: https://console.groq.com/docs/compound/built-in-tools
# Built-in Tools
Compound systems come equipped with a comprehensive set of built-in tools that can be intelligently called to answer user queries. These tools not only expand the capabilities of language models by providing access to real-time information, computational power, and interactive environments, but also eliminate the need to build and maintain the underlying infrastructure for these tools yourself.
**Built-in tools with Compound systems are not HIPAA Covered Cloud Services under Groq's Business Associate Addendum at this time. These tools are also not available currently for use with regional / sovereign endpoints.**
## Default Tools
The tools enabled by default vary depending on your Compound system version:
| Version | Web Search | Code Execution | Visit Website |
|---------|------------|----------------|---------------|
| Newer than `2025-07-23` (Latest) | ✅ | ✅ | ✅ |
| `2025-07-23` (Default) | ✅ | ✅ | ❌ |
Default tools are enabled automatically; Compound systems intelligently decide when to use each tool based on the user's query.
For more information on how to set your Compound system version, see the [Compound System Versioning](/docs/compound#system-versioning) page.
## Available Tools
These are all the available built-in tools on Groq's Compound systems.
| Tool | Description | Identifier |
|------|-------------|------------|
| [Web Search](/docs/web-search) | Access real-time web content and up-to-date information with automatic citations | `web_search` |
| [Visit Website](/docs/visit-website) | Fetch and analyze content from specific web pages | `visit_website` |
| [Browser Automation](/docs/browser-automation) | Interact with web pages through automated browser actions | `browser_automation` |
| [Code Execution](/docs/code-execution) | Execute Python code automatically in secure sandboxed environments | `code_interpreter` |
| [Wolfram Alpha](/docs/wolfram-alpha) | Access computational knowledge and mathematical calculations | `wolfram_alpha` |
Jump to the [Configuring Tools](#configuring-tools) section to learn how to enable specific tools via their identifiers.
## Configuring Tools
You can customize which tools are available to Compound systems using the `compound_custom.tools.enabled_tools` parameter.
This allows you to restrict or specify exactly which tools should be available for a particular request.
For a list of available tool identifiers, see the [Available Tools](#available-tools) section.
### Example: Enable Specific Tools
Enable specific tools using the following code examples:
* Python
```python
# Mirrors the "Enable Specific Tools" script above
from groq import Groq
client = Groq()
response = client.chat.completions.create(
    model="groq/compound",
    messages=[
        {"role": "user", "content": "Search for recent AI developments and then visit the Groq website"}
    ],
    compound_custom={"tools": {"enabled_tools": ["web_search", "visit_website"]}}
)
```
* JavaScript
```javascript
// Mirrors the "Enable Specific Tools" script above
import Groq from "groq-sdk";
const groq = new Groq();
const response = await groq.chat.completions.create({
  model: "groq/compound",
  messages: [
    { role: "user", content: "Search for recent AI developments and then visit the Groq website" }
  ],
  compound_custom: {
    tools: { enabled_tools: ["web_search", "visit_website"] }
  }
});
```
* cURL
```bash
# Same request body as the Python/JavaScript examples above
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/compound",
    "messages": [{"role": "user", "content": "Search for recent AI developments and then visit the Groq website"}],
    "compound_custom": {"tools": {"enabled_tools": ["web_search", "visit_website"]}}
  }'
```
### Example: Code Execution Only
Restrict the system to code execution only using the following code examples:
* Python
```python
# Mirrors the "Code Execution Only" script above
from groq import Groq
client = Groq()
response = client.chat.completions.create(
    model="groq/compound",
    messages=[
        {"role": "user", "content": "Calculate the square root of 12345"}
    ],
    compound_custom={"tools": {"enabled_tools": ["code_interpreter"]}}
)
```
* JavaScript
```javascript
// Mirrors the "Code Execution Only" script above
import Groq from "groq-sdk";
const groq = new Groq();
const response = await groq.chat.completions.create({
  model: "groq/compound",
  messages: [
    { role: "user", content: "Calculate the square root of 12345" }
  ],
  compound_custom: {
    tools: { enabled_tools: ["code_interpreter"] }
  }
});
```
* cURL
```bash
# Same request body as the Python/JavaScript examples above
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq/compound",
    "messages": [{"role": "user", "content": "Calculate the square root of 12345"}],
    "compound_custom": {"tools": {"enabled_tools": ["code_interpreter"]}}
  }'
```
## Pricing
See the [Pricing](https://groq.com/pricing) page for detailed information on costs for each tool.
---
## Compound: Fact Checker.doc (ts)
URL: https://console.groq.com/docs/compound/scripts/fact-checker.doc
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
const user_query = "What were the main highlights from the latest Apple keynote event?"
// Or: "What's the current weather in San Francisco?"
// Or: "Summarize the latest developments in fusion energy research this week."
const completion = await groq.chat.completions.create({
messages: [
{
role: "user",
content: user_query,
},
],
// The *only* change needed: Specify the compound model!
model: "groq/compound",
});
console.log(`Query: ${user_query}`);
console.log(`Compound Response:\n${completion.choices[0]?.message?.content || ""}`);
// You might also inspect completion.choices[0].message.executed_tools
// if you want to see if/which tool was used, though it's not necessary.
}
main();
---
## Ensure your GROQ_API_KEY is set as an environment variable
URL: https://console.groq.com/docs/compound/scripts/fact-checker.py
import os
from groq import Groq
# Ensure your GROQ_API_KEY is set as an environment variable
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
user_query = "What were the main highlights from the latest Apple keynote event?"
# Or: "What's the current weather in San Francisco?"
# Or: "Summarize the latest developments in fusion energy research this week."
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": user_query,
}
],
# The *only* change needed: Specify the compound model!
model="groq/compound",
)
print(f"Query: {user_query}")
print(f"Compound Response:\n{chat_completion.choices[0].message.content}")
# You might also inspect chat_completion.choices[0].message.executed_tools
# if you want to see if/which tool was used, though it's not necessary.
---
## Compound: Executed Tools.doc (ts)
URL: https://console.groq.com/docs/compound/scripts/executed_tools.doc
import Groq from 'groq-sdk';
const groq = new Groq();
async function main() {
const response = await groq.chat.completions.create({
model: 'groq/compound',
messages: [
{
role: 'user',
content: 'What did Groq release last week?'
}
]
})
// Log the tools that were used to generate the response
console.log(response.choices[0].message.executed_tools)
}
main();
---
## Compound: Natural Language (js)
URL: https://console.groq.com/docs/compound/scripts/natural-language
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
// Example 1: Calculation
const computationQuery = "Calculate the monthly payment for a $30,000 loan over 5 years at 6% annual interest.";
// Example 2: Simple code execution
const codeQuery = "What is the output of this Python code snippet: `data = {'a': 1, 'b': 2}; print(data.keys())`";
// Choose one query to run
const selectedQuery = computationQuery;
const completion = await groq.chat.completions.create({
messages: [
{
role: "system",
content: "You are a helpful assistant capable of performing calculations and executing simple code when asked.",
},
{
role: "user",
content: selectedQuery,
}
],
// Use the compound model
model: "groq/compound-mini",
});
console.log(`Query: ${selectedQuery}`);
console.log(`Compound Mini Response:\n${completion.choices[0]?.message?.content || ""}`);
}
main();
---
## Compound: Code Debugger (js)
URL: https://console.groq.com/docs/compound/scripts/code-debugger
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
// Example 1: Error Explanation (might trigger search)
const debugQuerySearch = "I'm getting a 'Kubernetes CrashLoopBackOff' error on my pod. What are the common causes based on recent discussions?";
// Example 2: Code Check (might trigger code execution)
const debugQueryExec = "Will this Python code raise an error? `import numpy as np; a = np.array([1, 2]); b = np.array([3, 4, 5]); print(a+b)`";
// Choose one query to run
const selectedQuery = debugQueryExec;
const completion = await groq.chat.completions.create({
messages: [
{
role: "system",
content: "You are a helpful coding assistant. You can explain errors, potentially searching for recent information, or check simple code snippets by executing them.",
},
{
role: "user",
content: selectedQuery,
}
],
// Use the compound model
model: "groq/compound-mini",
});
console.log(`Query: ${selectedQuery}`);
console.log(`Compound Mini Response:\n${completion.choices[0]?.message?.content || ""}`);
}
main();
---
## Compound: Version (py)
URL: https://console.groq.com/docs/compound/scripts/version.py
from groq import Groq
client = Groq(
default_headers={
"Groq-Model-Version": "latest"
}
)
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "What is the weather today?",
}
],
model="groq/compound",
)
print(chat_completion.choices[0].message.content)
---
## Compound: Code Debugger.doc (ts)
URL: https://console.groq.com/docs/compound/scripts/code-debugger.doc
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
// Example 1: Error Explanation (might trigger search)
const debugQuerySearch = "I'm getting a 'Kubernetes CrashLoopBackOff' error on my pod. What are the common causes based on recent discussions?";
// Example 2: Code Check (might trigger code execution)
const debugQueryExec = "Will this Python code raise an error? `import numpy as np; a = np.array([1, 2]); b = np.array([3, 4, 5]); print(a+b)`";
// Choose one query to run
const selectedQuery = debugQueryExec;
const completion = await groq.chat.completions.create({
messages: [
{
role: "system",
content: "You are a helpful coding assistant. You can explain errors, potentially searching for recent information, or check simple code snippets by executing them.",
},
{
role: "user",
content: selectedQuery,
}
],
// Use the compound model
model: "groq/compound-mini",
});
console.log(`Query: ${selectedQuery}`);
console.log(`Compound Mini Response:\n${completion.choices[0]?.message?.content || ""}`);
}
main();
---
## Compound: Usage (js)
URL: https://console.groq.com/docs/compound/scripts/usage
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
const completion = await groq.chat.completions.create({
messages: [
{
role: "user",
content: "What is the current weather in Tokyo?",
},
],
// Change model to compound to use built-in tools
// model: "llama-3.3-70b-versatile",
model: "groq/compound",
});
console.log(completion.choices[0]?.message?.content || "");
// Print all tool calls
// console.log(completion.choices[0]?.message?.executed_tools || "");
}
main();
---
## Log the tools that were used to generate the response
URL: https://console.groq.com/docs/compound/scripts/executed_tools.py
import os
from groq import Groq
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
response = client.chat.completions.create(
model="groq/compound",
messages=[
{"role": "user", "content": "What did Groq release last week?"}
]
)
# Log the tools that were used to generate the response
print(response.choices[0].message.executed_tools)
---
## Compound: Executed Tools (js)
URL: https://console.groq.com/docs/compound/scripts/executed_tools
import Groq from 'groq-sdk';
const groq = new Groq();
async function main() {
const response = await groq.chat.completions.create({
model: 'groq/compound',
messages: [
{
role: 'user',
content: 'What did Groq release last week?'
}
]
})
// Log the tools that were used to generate the response
console.log(response.choices[0].message.executed_tools)
}
main();
---
## Compound: Fact Checker (js)
URL: https://console.groq.com/docs/compound/scripts/fact-checker
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
const user_query = "What were the main highlights from the latest Apple keynote event?"
// Or: "What's the current weather in San Francisco?"
// Or: "Summarize the latest developments in fusion energy research this week."
const completion = await groq.chat.completions.create({
messages: [
{
role: "user",
content: user_query,
},
],
// The *only* change needed: Specify the compound model!
model: "groq/compound",
});
console.log(`Query: ${user_query}`);
console.log(`Compound Response:\n${completion.choices[0]?.message?.content || ""}`);
// You might also inspect completion.choices[0].message.executed_tools
// if you want to see if/which tool was used, though it's not necessary.
}
main();
---
## Example 1: Calculation
URL: https://console.groq.com/docs/compound/scripts/natural-language.py
```python
import os
from groq import Groq
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
# Example 1: Calculation
computation_query = "Calculate the monthly payment for a $30,000 loan over 5 years at 6% annual interest."
# Example 2: Simple code execution
code_query = "What is the output of this Python code snippet: `data = {'a': 1, 'b': 2}; print(data.keys())`"
# Choose one query to run
selected_query = computation_query
chat_completion = client.chat.completions.create(
messages=[
{
"role": "system",
"content": "You are a helpful assistant capable of performing calculations and executing simple code when asked.",
},
{
"role": "user",
"content": selected_query,
}
],
# Use the compound model
model="groq/compound-mini",
)
print(f"Query: {selected_query}")
print(f"Compound Mini Response:\n{chat_completion.choices[0].message.content}")
```
---
## Compound: Natural Language.doc (ts)
URL: https://console.groq.com/docs/compound/scripts/natural-language.doc
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
// Example 1: Calculation
const computationQuery = "Calculate the monthly payment for a $30,000 loan over 5 years at 6% annual interest.";
// Example 2: Simple code execution
const codeQuery = "What is the output of this Python code snippet: `data = {'a': 1, 'b': 2}; print(data.keys())`";
// Choose one query to run
const selectedQuery = computationQuery;
const completion = await groq.chat.completions.create({
messages: [
{
role: "system",
content: "You are a helpful assistant capable of performing calculations and executing simple code when asked.",
},
{
role: "user",
content: selectedQuery,
}
],
// Use the compound model
model: "groq/compound-mini",
});
console.log(`Query: ${selectedQuery}`);
console.log(`Compound Mini Response:\n${completion.choices[0]?.message?.content || ""}`);
}
main();
---
## Compound: Usage.doc (ts)
URL: https://console.groq.com/docs/compound/scripts/usage.doc
import Groq from "groq-sdk";
const groq = new Groq();
export async function main() {
const completion = await groq.chat.completions.create({
messages: [
{
role: "user",
content: "What is the current weather in Tokyo?",
},
],
// Change model to compound to use built-in tools
// model: "llama-3.3-70b-versatile",
model: "groq/compound",
});
console.log(completion.choices[0]?.message?.content || "");
// Print all tool calls
// console.log(completion.choices[0]?.message?.executed_tools || "");
}
main();
---
## Compound: Version (js)
URL: https://console.groq.com/docs/compound/scripts/version
import { Groq } from "groq-sdk";
const groq = new Groq({
defaultHeaders: {
"Groq-Model-Version": "latest"
}
});
const chatCompletion = await groq.chat.completions.create({
messages: [
{
role: "user",
content: "What is the weather today?",
},
],
model: "groq/compound",
});
console.log(chatCompletion.choices[0].message.content);
---
## Change model to compound to use built-in tools
URL: https://console.groq.com/docs/compound/scripts/usage.py
from groq import Groq
client = Groq()
completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "What is the current weather in Tokyo?",
}
],
# Change model to compound to use built-in tools
# model: "llama-3.3-70b-versatile",
model="groq/compound",
)
print(completion.choices[0].message.content)
# Print all tool calls
# print(completion.choices[0].message.executed_tools)
---
## Example 1: Error Explanation (might trigger search)
URL: https://console.groq.com/docs/compound/scripts/code-debugger.py
```python
import os
from groq import Groq
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
# Example 1: Error Explanation (might trigger search)
debug_query_search = "I'm getting a 'Kubernetes CrashLoopBackOff' error on my pod. What are the common causes based on recent discussions?"
# Example 2: Code Check (might trigger code execution)
debug_query_exec = "Will this Python code raise an error? `import numpy as np; a = np.array([1, 2]); b = np.array([3, 4, 5]); print(a+b)`"
# Choose one query to run
selected_query = debug_query_exec
chat_completion = client.chat.completions.create(
messages=[
{
"role": "system",
"content": "You are a helpful coding assistant. You can explain errors, potentially searching for recent information, or check simple code snippets by executing them.",
},
{
"role": "user",
"content": selected_query,
}
],
# Use the compound model
model="groq/compound-mini",
)
print(f"Query: {selected_query}")
print(f"Compound Mini Response:\n{chat_completion.choices[0].message.content}")
```
---
## Search Settings: Page (mdx)
URL: https://console.groq.com/docs/compound/search-settings
No content to display.
---
## Key Technical Specifications
URL: https://console.groq.com/docs/compound/systems/compound
### Key Technical Specifications
* **Model Architecture**:
+ Compound is powered by [Llama 4 Scout](https://console.groq.com/docs/model/meta-llama/llama-4-scout-17b-16e-instruct) and [GPT-OSS 120B](/docs/model/openai/gpt-oss-120b) for intelligent reasoning and tool use.
* **Performance Metrics**:
+ Groq developed a new evaluation benchmark for measuring search capabilities called [RealtimeEval](https://github.com/groq/realtime-eval).
+ This benchmark is designed to evaluate tool-using systems on current events and live data.
+ On the benchmark, Compound outperformed GPT-4o-search-preview and GPT-4o-mini-search-preview significantly.
### Learn More About Agentic Tooling
Discover how to build powerful applications with real-time web search and code execution
### Key Use Cases
* **Realtime Web Search**:
+ Automatically access up-to-date information from the web using the built-in web search tool.
* **Code Execution**:
+ Execute Python code automatically using the code execution tool powered by [E2B](https://e2b.dev/).
* **Code Generation and Technical Tasks**:
+ Create AI tools for code generation, debugging, and technical problem-solving with high-quality multilingual support.
### Best Practices
* Use system prompts to improve steerability and reduce false refusals. Compound is designed to be highly steerable with appropriate system prompts.
* Consider implementing system-level protections like Llama Guard for input filtering and response validation.
* Deploy with appropriate safeguards when working in specialized domains or with critical content.
* Compound should not be used by customers for processing protected health information. It is not a HIPAA Covered Cloud Service under Groq's Business Associate Addendum for customers at this time.
### Quick Start
Experience the capabilities of `groq/compound` on Groq:
---
## Compound Beta: Page (mdx)
URL: https://console.groq.com/docs/compound/systems/compound-beta
No content to display.
---
## Compound Beta Mini: Page (mdx)
URL: https://console.groq.com/docs/compound/systems/compound-beta-mini
No content to display.
---
## Key Technical Specifications
URL: https://console.groq.com/docs/compound/systems/compound-mini
### Key Technical Specifications
Compound mini is powered by Llama 3.3 70B and GPT-OSS 120B for intelligent reasoning and tool use. Unlike groq/compound, it can only use one tool per request, but has an average of 3x lower latency.
Groq developed a new evaluation benchmark for measuring search capabilities called RealtimeEval. This benchmark is designed to evaluate tool-using systems on current events and live data. On the benchmark, Compound Mini outperformed GPT-4o-search-preview and GPT-4o-mini-search-preview significantly.
## Learn More About Agentic Tooling
Discover how to build powerful applications with real-time web search and code execution
### Key Use Cases
#### Realtime Web Search
Automatically access up-to-date information from the web using the built-in web search tool.
#### Code Execution
Execute Python code automatically using the code execution tool powered by E2B.
#### Code Generation and Technical Tasks
Create AI tools for code generation, debugging, and technical problem-solving with high-quality multilingual support.
### Best Practices
* Use system prompts to improve steerability and reduce false refusals. Compound mini is designed to be highly steerable with appropriate system prompts.
* Consider implementing system-level protections like Llama Guard for input filtering and response validation.
* Deploy with appropriate safeguards when working in specialized domains or with critical content.
### Quick Start
Experience the capabilities of `groq/compound-mini` on Groq:
---
## Systems
URL: https://console.groq.com/docs/compound/systems
# Systems
Groq offers two compound AI systems that intelligently use external tools to provide more accurate, up-to-date, and capable responses than traditional LLMs alone. Both systems support web search and code execution, but differ in their approach to tool usage.
- **[Compound](/docs/compound/systems/compound)** (`groq/compound`) - Full-featured system with up to 10 tool calls per request
- **[Compound Mini](/docs/compound/systems/compound-mini)** (`groq/compound-mini`) - Streamlined system with up to 1 tool call and an average of 3x lower latency
Groq's compound AI systems should not be used by customers for processing protected health information, as they are not HIPAA Covered Cloud Services under Groq's Business Associate Addendum at this time.
## Getting Started
Both systems use the same API interface - simply change the `model` parameter to `groq/compound` or `groq/compound-mini` to get started.
## System Comparison
| Feature | Compound | Compound Mini |
|---------|---------------|-------------------|
| **Tool Calls per Request** | Up to 10 | Up to 1 |
| **Average Latency** | Standard | 3x Lower |
| **Token Speed** | ~350 tps | ~350 tps |
| **Best For** | Complex multi-step tasks | Quick single-step queries |
## Key Differences
### Compound
- **Multiple Tool Calls**: Can perform up to **10 server-side tool calls** before returning an answer
- **Complex Workflows**: Ideal for tasks requiring multiple searches, code executions, or iterative problem-solving
- **Comprehensive Analysis**: Can gather information from multiple sources and perform multi-step reasoning
- **Use Cases**: Research tasks, complex data analysis, multi-part coding challenges
### Compound Mini
- **Single Tool Call**: Performs up to **1 server-side tool call** before returning an answer
- **Fast Response**: Average 3x lower latency compared to Compound
- **Direct Answers**: Perfect for straightforward queries that need one piece of current information
- **Use Cases**: Quick fact-checking, single calculations, simple web searches
## Available Tools
Both systems support the same set of tools:
- **Web Search** - Access real-time information from the web
- **Code Execution** - Execute Python code automatically
- **Visit Website** - Access and analyze specific website content
- **Browser Automation** - Interact with web pages through automated browser actions
- **Wolfram Alpha** - Access computational knowledge and mathematical calculations
For more information about tool capabilities, see the [Built-in Tools](/docs/compound/built-in-tools) page.
## When to Choose Which System
### Choose Compound When:
- You need comprehensive research across multiple sources
- Your task requires iterative problem-solving
- You're building complex analytical workflows
- You need multi-step code generation and testing
### Choose Compound Mini When:
- You need quick answers to straightforward questions
- Latency is a critical factor for your application
- You're building real-time applications
- Your queries typically require only one tool call
---
## Use Cases
URL: https://console.groq.com/docs/compound/use-cases
# Use Cases
Groq's compound systems excel at a wide range of use cases, particularly when real-time information is required.
## Real-time Fact Checker and News Agent
Your application needs to answer questions or provide information that requires up-to-the-minute knowledge, such as:
- Latest news
- Current stock prices
- Recent events
- Weather updates
Building and maintaining your own web scraping or search API integration is complex and time-consuming.
### Solution with Compound
Simply send the user's query to `groq/compound`. If the query requires current information beyond its training data, it will automatically trigger its built-in web search tool to fetch relevant, live data before formulating the answer.
### Why It's Great
- Get access to real-time information instantly without writing any extra code for search integration
- Leverage Groq's speed for a real-time, responsive experience
### Code Example
## Chart Generation
Need to quickly create data visualizations from natural language descriptions? Compound's code execution capabilities can help generate charts without writing visualization code directly.
### Solution with Compound
Describe the chart you want in natural language, and Compound will generate and execute the appropriate Python visualization code. The model automatically parses your request, generates the visualization code using libraries like matplotlib or seaborn, and returns the chart.
### Why It's Great
- Generate charts from simple natural language descriptions
- Supports common chart types (scatter, line, bar, etc.)
- Handles all visualization code generation and execution
- Customize data points, labels, colors, and layouts as needed
### Usage and Results
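A hedged sketch of such a request (the prompt and data are illustrative; the system decides on its own whether to invoke its code execution tool):
```python
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="groq/compound",
    messages=[
        {
            "role": "user",
            "content": "Create a bar chart of monthly sales: Jan 120, Feb 90, Mar 150. Label the axes.",
        }
    ],
)
# The response contains the model's output; executed_tools shows whether
# code execution was actually used for this request.
print(completion.choices[0].message.content)
```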
## Natural Language Calculator and Code Extractor
You want users to perform calculations, run simple data manipulations, or execute small code snippets using natural language commands within your application, without building a dedicated parser or execution environment.
### Solution with Compound
Frame the user's request as a task involving computation or code. `groq/compound-mini` can recognize these requests and use its secure code execution tool to compute the result.
### Why It's Great
- Effortlessly add computational capabilities
- Users can ask things like:
- "What's15% of $540?"
- "Calculate the standard deviation of [10,12,11,15,13]"
- "Run this python code: print('Hello from Compound!')"
## Code Debugging Assistant
Developers often need quick help understanding error messages or testing small code fixes. Searching documentation or running snippets requires switching contexts.
### Solution with Compound
Users can paste an error message and ask for explanations or potential causes. Compound Mini might use web search to find recent discussions or documentation about that specific error. Alternatively, users can provide a code snippet and ask "What's wrong with this code?" or "Will this Python code run: ...?". It can use code execution to test simple, self-contained snippets.
### Why It's Great
- Provides a unified interface for getting code help
- Potentially draws on live web data for new errors
- Executes code directly for validation
- Speeds up the debugging process
**Note**: `groq/compound-mini` uses one tool per turn, so it might search OR execute, not both simultaneously in one response.
---
## Compound
URL: https://console.groq.com/docs/compound
# Compound
While LLMs excel at generating text, Groq's Compound systems take the next step.
Compound is an advanced AI system designed to solve problems by taking action: it intelligently uses external tools, such as web search and code execution, alongside the powerful [GPT-OSS 120B](/docs/model/openai/gpt-oss-120b), [Llama 4 Scout](/docs/model/meta-llama/llama-4-scout-17b-16e-instruct), and [Llama 3.3 70B](/docs/model/llama-3.3-70b-versatile) models.
This allows it access to real-time information and interaction with external environments, providing more accurate, up-to-date, and capable responses than an LLM alone.
Groq's compound AI system should not be used by customers for processing protected health information as it is not a HIPAA Covered Cloud Service under Groq's Business Associate Addendum at this time. This system is also not available currently for use with regional / sovereign endpoints.
## Available Compound Systems
There are two compound systems available:
- [`groq/compound`](/docs/compound/systems/compound): supports multiple tool calls per request. This system is great for use cases that require multiple web searches or code executions per request.
- [`groq/compound-mini`](/docs/compound/systems/compound-mini): supports a single tool call per request. This system is great for use cases that require a single web search or code execution per request. `groq/compound-mini` has an average of 3x lower latency than `groq/compound`.
Both systems support the following tools:
- [Web Search](/docs/web-search)
- [Visit Website](/docs/visit-website)
- [Code Execution](/docs/code-execution)
- [Browser Automation](/docs/browser-automation)
- [Wolfram Alpha](/docs/wolfram-alpha)
Custom [user-provided tools](/docs/tool-use) are not supported at this time.
## Quickstart
To use compound systems, change the `model` parameter to either `groq/compound` or `groq/compound-mini`:
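For example, a sketch mirroring the usage scripts above:
```python
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "What is the current weather in Tokyo?"}],
    model="groq/compound",  # the only change from a standard chat completion
)
print(completion.choices[0].message.content)
```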
And that's it!
When the API is called, it will intelligently decide when to use search or code execution to best answer the user's query.
These tool calls are performed on the server side, so no additional setup is required on your part to use built-in tools.
In the above example, the API will use its built-in web search tool to find the current weather in Tokyo.
Without compound systems, you might have needed to add your own custom tools to make API requests to a weather service, then perform multiple API calls to Groq to get a final result.
Instead, with compound systems, you can get a final result with a single API call.
## Executed Tools
To view the tools (search or code execution) used automatically by the compound system, check the `executed_tools` field in the response:
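For example (a sketch mirroring the executed-tools scripts above):
```python
from groq import Groq

client = Groq()

response = client.chat.completions.create(
    model="groq/compound",
    messages=[{"role": "user", "content": "What did Groq release last week?"}],
)
# Each entry describes a server-side tool call (e.g. web search or code execution).
print(response.choices[0].message.executed_tools)
```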
## Model Usage Details
The `usage_breakdown` field in responses provides detailed information about all the underlying models used during the compound system's execution.
```json
"usage_breakdown": {
"models": [
{
"model": "llama-3.3-70b-versatile",
"usage": {
"queue_time":0.017298032,
"prompt_tokens":226,
"prompt_time":0.023959775,
"completion_tokens":16,
"completion_time":0.061639794,
"total_tokens":242,
"total_time":0.085599569
}
},
{
"model": "openai/gpt-oss-120b",
"usage": {
"queue_time":0.019125835,
"prompt_tokens":903,
"prompt_time":0.033082052,
"completion_tokens":873,
"completion_time":1.776467372,
"total_tokens":1776,
"total_time":1.809549424
}
}
]
}
```
## System Versioning
Compound systems support versioning through the `Groq-Model-Version` header. In most cases, you won't need to change anything since you'll automatically be on the latest stable version. To view the latest changes to the compound systems, see the [Compound Changelog](/docs/changelog/compound).
### Available Systems and Versions
| System | Default Version (no header) | Latest Version (`Groq-Model-Version: latest`) |
|--------|--------------------------------|---------------------------------------------------|
| [`groq/compound`](/docs/compound/systems/compound) | `2025-07-23` (stable) | `2025-08-16` (prerelease) |
| [`groq/compound-mini`](/docs/compound/systems/compound-mini) | `2025-07-23` (stable) | `2025-08-16` (prerelease) |
### Version Details
- **Default (no header)**: Uses version `2025-07-23`, the latest stable version that has been fully tested and deployed
- **Latest** (`Groq-Model-Version: latest`): Uses version `2025-08-16`, the prerelease version with the newest features before they're rolled out to everyone
To use a specific version, pass the version in the `Groq-Model-Version` header:
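For example, pinning the stable version (mirroring the version scripts above, which use `latest`):
```python
from groq import Groq

# Pin every request from this client to the stable 2025-07-23 version.
client = Groq(default_headers={"Groq-Model-Version": "2025-07-23"})

completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "What is the weather today?"}],
    model="groq/compound",
)
print(completion.choices[0].message.content)
```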
## What's Next?
Now that you understand the basics of compound systems, explore these topics:
- **[Systems](/docs/compound/systems)** - Learn about the two compound systems and when to use each one
- **[Built-in Tools](/docs/compound/built-in-tools)** - Learn about the built-in tools available in Groq's Compound systems
- **[Search Settings](/docs/web-search#search-settings)** - Customize web search behavior with domain filtering
- **[Use Cases](/docs/compound/use-cases)** - Explore practical applications and detailed examples
---
## Script: Openai Compat (py)
URL: https://console.groq.com/docs/scripts/openai-compat.py
import os
import openai
client = openai.OpenAI(
base_url="https://api.groq.com/openai/v1",
api_key=os.environ.get("GROQ_API_KEY")
)
---
## Script: Openai Compat (js)
URL: https://console.groq.com/docs/scripts/openai-compat
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.GROQ_API_KEY,
baseURL: "https://api.groq.com/openai/v1"
});
---
## API Error Codes and Responses
URL: https://console.groq.com/docs/errors
# API Error Codes and Responses
Our API uses standard HTTP response status codes to indicate the success or failure of an API request. In cases of errors, the body of the response will contain a JSON object with details about the error. Below are the error codes you may encounter, along with their descriptions and example response bodies.
## Success Codes
- **200 OK**: The request was successfully executed. No further action is needed.
## Client Error Codes
- **400 Bad Request**: The server could not understand the request due to invalid syntax. Review the request format and ensure it is correct.
- **401 Unauthorized**: The request was not successful because it lacks valid authentication credentials for the requested resource. Ensure the request includes the necessary authentication credentials and the API key is valid.
- **404 Not Found**: The requested resource could not be found. Check the request URL and the existence of the resource.
- **413 Request Entity Too Large**: The request body is too large. Please reduce the size of the request body.
- **422 Unprocessable Entity**: The request was well-formed but could not be followed due to semantic errors. Verify the data provided for correctness and completeness.
- **429 Too Many Requests**: Too many requests were sent in a given timeframe. Implement request throttling and respect rate limits.
- **498 Custom: Flex Tier Capacity Exceeded**: A custom status code returned when the flex tier is at capacity and the request won't be processed. You can try again later.
- **499 Custom: Request Cancelled**: A custom status code shown on our logs page when a request is cancelled by the caller.
## Server Error Codes
- **500 Internal Server Error**: A generic error occurred on the server. Try the request again later or contact support if the issue persists.
- **502 Bad Gateway**: The server received an invalid response from an upstream server. This may be a temporary issue; retrying the request might resolve it.
- **503 Service Unavailable**: The server is not ready to handle the request, often due to maintenance or overload. Wait before retrying the request.
## Informational Codes
- **206 Partial Content**: Only part of the resource is being delivered, usually in response to range headers sent by the client. Ensure this is expected for the request being made.
## Error Object Explanation
When an error occurs, our API returns a structured error object containing detailed information about the issue. This section explains the components of the error object to aid in troubleshooting and error handling; a short handling sketch follows the component breakdown below.
## Error Object Structure
The error object follows a specific structure, providing a clear and actionable message alongside an error type classification:
```json
{
"error": {
"message": "String - description of the specific error",
"type": "invalid_request_error"
}
}
```
## Components
- **`error` (object):** The primary container for error details.
- **`message` (string):** A descriptive message explaining the nature of the error, intended to aid developers in diagnosing the problem.
- **`type` (string):** A classification of the error type, such as `"invalid_request_error"`, indicating the general category of the problem encountered.
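Here's a minimal handling sketch. It assumes the Python SDK raises OpenAI-style exception classes (`groq.APIStatusError` carrying the HTTP status code and response body); treat the exact class names as an assumption rather than a definitive reference:
```python
import groq
from groq import Groq

client = Groq()

try:
    completion = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(completion.choices[0].message.content)
except groq.APIStatusError as e:
    # Assumed shape: status_code maps to the HTTP codes above, and the JSON body
    # carries the error object described in this section.
    print(e.status_code)
    print(e.response.json()["error"]["message"])
```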
---
## FAQs
URL: https://console.groq.com/docs/billing-faqs
# FAQs
## Understanding Groq Billing Model
### How does Groq's billing cycle work?
Groq uses a monthly billing cycle, but for new users, we also apply progressive billing thresholds to help ease you into pay-as-you-go usage.
### How does Progressive Billing work?
When you first start using Groq on the Developer plan, your billing follows a progressive billing model. In this model, an invoice is automatically triggered and payment is deducted when your cumulative usage reaches specific thresholds: $1, $10, $100, $500, and $1,000.
**Special billing for customers in India:** Customers with a billing address in India have different progressive billing thresholds. For India customers, the thresholds are only $1, $10, and then $100 recurring. The $500 and $1,000 thresholds do not apply to India customers. Instead, after reaching the initial $1 and $10 thresholds, billing will continue to trigger every time usage reaches another $100 increment.
This helps you monitor early usage and ensures you're not surprised by a large first bill. These are one-time thresholds for most customers. Once you cross the $1,000 lifetime usage threshold, only monthly billing continues (this does not apply to India customers who continue with recurring $100 billing).
### What if I don't reach the next threshold?
If you don't reach the next threshold, your usage will be billed on your regular end-of-month invoice.
**Example:**
- You cross $1 → you're charged immediately.
- You then use $2 more for the entire month (lifetime usage = $3, still below $10).
- That $2 will be invoiced at the end of your monthly billing cycle, not immediately.
This ensures you're not repeatedly charged for small amounts and are charged only when hitting a lifetime cumulative threshold or when your billing period ends.
Once your lifetime usage crosses the $1,000 threshold, the progressive thresholds no longer apply. From this point forward, your account is billed solely on a monthly cycle. All future usage is accrued and billed once per month, with payment automatically deducted when the invoice is issued.
### When is payment withdrawn from my account?
Payment is withdrawn automatically from your connected payment method each time an invoice is issued. This can happen in two cases:
- **Progressive billing phase:** When your usage first crosses the $1, $10, $100, $500, or $1,000 thresholds. For customers in India, payment is withdrawn at $1, $10, and then every $100 thereafter (the $500 and $1,000 thresholds do not apply).
- **Monthly billing phase:** At the end of each monthly billing cycle.
> **Note:** We only bill you once your usage has reached at least $0.50. If you see a total charge of < $0.50 or you get an invoice for < $0.50, there is no action required on your end.
### Can I downgrade to the Free tier after I upgrade?
Yes. You can downgrade at any time in your account settings under [**Billing**](/settings/billing).
> **Note:** When you downgrade, we will issue a final invoice for any outstanding usage not yet billed.
## Monitoring Your Spending & Usage
### How can I view my current usage and spending in real time?
You can monitor your usage and charges in near real-time directly within your Groq Cloud dashboard. Simply navigate to [**Dashboard** → **Usage**](/dashboard/usage)
This dashboard allows you to:
- Track your current usage across models
- Understand how your consumption aligns with pricing per model
### Can I set spending limits or receive budget alerts?
Yes, Groq provides Spend Limits to help you control your API costs. You can set automated spending limits and receive proactive usage alerts as you approach your defined budget thresholds. [**More details here**](/docs/spend-limits)
## Invoices, Billing Info & Credits
### Where can I find my past invoices and payment history?
You can view and download all your invoices and receipts in the Groq Console:
[**Settings** → **Billing** → **Manage Billing**](/settings/billing/manage)
### Can I change my billing info and payment method?
You can update your billing details anytime from the Groq Console:
[**Settings** → **Billing** → **Manage Billing**](/settings/billing/manage)
### What payment methods do you accept?
Groq accepts credit cards (Visa, MasterCard, American Express, Discover), United States bank accounts, and SEPA debit accounts as payment methods.
### Are there promotional credits, or trial offers?
Yes! We occasionally offer promotional credits, such as during hackathons and special events. We encourage you to visit our [**Groq Community**](https://community.groq.com/) page to learn more and stay updated on announcements.
If you're building a startup, you may be eligible for the [**Groq for Startups**](https://groq.com/groq-for-startups) program, which unlocks $10,000 in credits to help you scale faster.
## Common Billing Questions & Troubleshooting
### How are refunds handled, if applicable?
Refunds are handled on a case-by-case basis. Due to the specific circumstances involved in each situation, we recommend reaching out directly to our customer support team at **support@groq.com** for assistance. They will review your case and provide guidance.
### What if a user believes there's an error in their bill?
Check your console's Usage and Billing tab first. If you still believe there's an issue:
Please contact our customer support team immediately at **support@groq.com**. They will investigate the specific circumstances of your billing dispute and guide you through the resolution process.
### Under what conditions can my account be suspended due to billing issues?
Account suspension or restriction due to billing issues typically occurs when there's a prolonged period of non-payment or consistently failed payment attempts. However, the exact conditions and resolution process are handled on a case-by-case basis. If your account is impacted, or if you have concerns, please reach out to our customer support team directly at **support@groq.com** for specific guidance regarding your account status.
### What happens if my payment fails? Why did my payment fail?
You may attempt to retry the payment up to two times. Before doing so, we recommend updating your payment method to ensure successful processing. If the issue persists, please contact our support team at support@groq.com for further assistance. Failed payments may result in service suspension. We will email you to remind you of your unpaid invoice.
### What should I do if my billing question isn't answered in the FAQ?
Feel free to contact **support@groq.com**
---
Need help? Contact our support team at **support@groq.com** with details about your billing questions.
---
## Projects
URL: https://console.groq.com/docs/projects
# Projects
Projects provide organizations with a powerful framework for managing multiple applications, environments, and teams within a single Groq account. By organizing your work into projects, you can isolate workloads to gain granular control over resources, costs, access permissions, and usage tracking on a per-project basis.
## Why Use Projects?
- **Isolation and Organization:** Projects create logical boundaries between different applications, environments (development, staging, production), and use cases. This prevents resource conflicts and enables clear separation of concerns across your organization.
- **Cost Control and Visibility:** Track spending, usage patterns, and resource consumption at the project level. This granular visibility enables accurate cost allocation, budget management, and ROI analysis for specific initiatives.
- **Team Collaboration:** Control who can access what resources through project-based permissions. Teams can work independently within their projects while maintaining organizational oversight and governance.
- **Operational Excellence:** Configure rate limits, monitor performance, and debug issues at the project level. This enables optimized resource allocation and simplified troubleshooting workflows.
## Project Structure
Projects inherit settings and permissions from your organization while allowing project-specific customization. Your organization-level role determines your maximum permissions within any project.
Each project acts as an isolated workspace containing:
- **API Keys:** Project-specific credentials for secure access
- **Rate Limits:** Customizable quotas for each available model
- **Usage Data:** Consumption metrics, costs, and request logs
- **Team Access:** Role-based permissions for project members
The following are the roles that are inherited from your organization along with their permissions within a project:
- **Owner:** Full access to creating, updating, and deleting projects, modifying limits for models within projects, managing API keys, viewing usage and spending data across all projects, and managing project access.
- **Developer:** Currently same as Owner.
- **Reader:** Read-only access to projects and usage metrics, logs, and spending data.
## Getting Started
### Creating Your First Project
**1. Access Projects**: Navigate to the **Projects** section at the top left-hand side of the Console. You will see a dropdown that looks like **Organization** / **Projects**.
**2. Create Project:** Click the right-hand **Projects** dropdown and click **Create Project** to create a new project by inputting a project name. You will also notice that there is an option to **Manage Projects** that will be useful later.
> **Note:** Create separate projects for development, staging, and production environments, and use descriptive, consistent naming conventions (e.g. "myapp-dev", "myapp-staging", "myapp-prod") to avoid conflicts and maintain clear project boundaries.
**3. Configure Settings**: Once you create a project, you will be able to see it in the dropdown and under **Manage Projects**. Click **Manage Projects** and click **View** to customize project rate limits.
> **Note:** Start with conservative limits for new projects, increase limits based on actual usage patterns and needs, and monitor usage regularly to adjust as needed.
**4. Generate API Keys:** Once you've configured your project and selected it in the dropdown, it will persist across the console. Any API keys generated will be specific to the project you have selected. Any logs will also be project-specific.
**5. Start Building:** Begin making API calls using your project-specific API credentials
### Project Selection
Use the project selector in the top navigation to switch between projects. All Console sections automatically filter to show data for the selected project:
- API Keys
- Batch Jobs
- Logs and Usage Analytics
## Rate Limit Management
### Understanding Rate Limits
Rate limits control the maximum number of requests your project can make to models within a specific time window. Rate limits are applied per project, meaning each project has its own separate quota that doesn't interfere with other projects in your organization.
Each project can be configured to have custom rate limits for every available model, which allows you to:
- Allocate higher limits to production projects
- Set conservative limits for experimental or development projects
- Customize limits based on specific use case requirements
Custom project rate limits can only be set to values equal to or lower than your organization's limits. Setting a custom rate limit for a project does not increase your organization's overall limits, it only allows you to set more restrictive limits for that specific project. Organization limits always take precedence and act as a ceiling for all project limits.
### Configuring Rate Limits
To configure rate limits for a project:
1. Navigate to **Projects** in your settings
2. Select the project you want to configure
3. Adjust the limits for each model as needed
### Example: Rate Limits Across Projects
Let's say you've created three projects for your application:
- myapp-prod for production
- myapp-staging for testing
- myapp-dev for development
**Scenario:**
- Organization Limit: 100 requests per minute
- myapp-prod: 80 requests per minute
- myapp-staging: 30 requests per minute
- myapp-dev: Using default organization limits
**Here's how the rate limits work in practice:**
1. myapp-prod
- Can make up to 80 requests per minute (custom project limit)
- Even if other projects are idle, cannot exceed 80 requests per minute
- Contributing to the organization's total limit of 100 requests per minute
2. myapp-staging
- Limited to 30 requests per minute (custom project limit)
- Cannot exceed this limit even if organization has capacity
- Contributing to the organization's total limit of 100 requests per minute
3. myapp-dev
- Inherits the organization limit of 100 requests per minute
- Actual available capacity depends on usage from other projects
- If myapp-prod is using 80 requests/min and myapp-staging is using 15 requests/min, myapp-dev can only use 5 requests/min
**What happens during high concurrent usage:**
If both myapp-prod and myapp-staging try to use their maximum configured limits simultaneously:
- myapp-prod attempts to use 80 requests/min
- myapp-staging attempts to use 30 requests/min
- Total attempted usage: 110 requests/min
- Organization limit: 100 requests/min
In this case, some requests will fail with rate limit errors because the combined usage exceeds the organization's limit. Even though each project is within its configured limits, the organization limit of 100 requests/min acts as a hard ceiling.
## Usage Tracking
Projects provide comprehensive usage tracking including:
- Monthly spend tracking: Monitor costs and spending patterns for each project
- Usage metrics: Track API calls, token usage, and request patterns
- Request logs: Access detailed logs for debugging and monitoring
Dashboard pages will automatically be filtered by your selected project. Access these insights by:
1. Selecting your project in the top left of the navigation bar
2. Navigating to the **Dashboard** to see your project-specific **Usage**, **Metrics**, and **Logs** pages
## Next Steps
- **Explore** the [Rate Limits](/docs/rate-limits) documentation for detailed rate limit configuration
- **Learn** about [Groq Libraries](/docs/libraries) to integrate Projects into your applications
- **Join** our [developer community](https://community.groq.com) for Projects tips and best practices
Ready to get started? Create your first project in the [Projects dashboard](https://console.groq.com/settings/projects) and begin organizing your Groq applications today.
---
## Compound: Page (mdx)
URL: https://console.groq.com/docs/agentic-tooling/groq/compound
No content to display.
---
## Compound Mini: Page (mdx)
URL: https://console.groq.com/docs/agentic-tooling/groq/compound-mini
No content to display.
---
## Compound Beta: Page (mdx)
URL: https://console.groq.com/docs/agentic-tooling/compound-beta
No content to display.
---
## Compound Beta Mini: Page (mdx)
URL: https://console.groq.com/docs/agentic-tooling/compound-beta-mini
No content to display.
---
## Agentic Tooling: Page (mdx)
URL: https://console.groq.com/docs/agentic-tooling
No content to display.
---
## Overview Refresh: Page (mdx)
URL: https://console.groq.com/docs/overview-refresh
No content to display.
---
## Understanding and Optimizing Latency on Groq
URL: https://console.groq.com/docs/production-readiness/optimizing-latency
# Understanding and Optimizing Latency on Groq
## Overview
Latency is a critical factor when building production applications with Large Language Models (LLMs). This guide helps you understand, measure, and optimize latency across your Groq-powered applications, providing a comprehensive foundation for production deployment.
## Understanding Latency in LLM Applications
### Key Metrics in Groq Console
Your Groq Console [dashboard](/dashboard) contains pages for metrics, usage, logs, and more. When you view your Groq API request logs, you'll see important data about your API requests. The following metrics are the ones relevant to latency:
- **Time to First Token (TTFT)**: Time from API request sent to first token received from the model
- **Latency**: Total server time from API request to completion
- **Input Tokens**: Number of tokens provided to the model (e.g. system prompt, user query, assistant message), directly affecting TTFT
- **Output Tokens**: Number of tokens generated, impacting total latency
- **Tokens/Second**: Generation speed of model outputs
### The Complete Latency Picture
The users of the applications you build experience total latency that includes both network transit and server-side processing:
`User-Experienced Latency = Network Latency + Server-side Latency`
Server-side latency is the portion shown in the console.
**Important**: Groq Console metrics show server-side latency only. Client-side network latency measurement examples are provided in the Network Latency Analysis section below.
## How Input Size Affects TTFT
Input token count is the primary driver of TTFT performance. Understanding this relationship allows developers to optimize prompt design and context management for predictable latency characteristics.
### The Scaling Pattern
TTFT demonstrates linear scaling characteristics across input token ranges:
- **Minimal inputs (100 tokens)**: Consistently fast TTFT across all model sizes
- **Standard contexts (1K tokens)**: TTFT remains highly responsive
- **Large contexts (10K tokens)**: TTFT increases but remains competitive
- **Maximum contexts (100K tokens)**: TTFT increases to process all the input tokens
### Model Architecture Impact on TTFT
Model architecture fundamentally determines input processing characteristics, with parameter count, attention mechanisms, and specialized capabilities creating distinct performance profiles.
**Parameter Scaling Patterns**:
- **8B models**: Minimal TTFT variance across context lengths, optimal for latency-critical applications
- **32B models**: Linear TTFT scaling with manageable overhead for balanced workloads
- **70B and above**: Exponential TTFT increases at maximum context, requiring context management
**Architecture-Specific Considerations**:
- **Reasoning models**: Additional computational overhead for chain-of-thought processing increases baseline latency by 10-40%
- **Mixture of Experts (MoE)**: Router computation adds fixed latency cost but maintains competitive TTFT scaling
- **Vision-language models**: Image encoding preprocessing significantly impacts TTFT independent of text token count
## Output Token Generation Dynamics
Sequential token generation represents the primary latency bottleneck in LLM inference. Unlike parallel input processing, each output token requires a complete forward pass through the model, creating linear scaling between output length and total generation time. Token generation demands significantly higher computational resources than input processing due to the autoregressive nature of transformer architectures.
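As a rough rule of thumb (a simplification that ignores network time and queueing), total request latency decomposes as:
`Total Latency ≈ TTFT + (Output Tokens ÷ Tokens/Second)`
Because generation time scales linearly with output length, constraining output (for example, with a token limit or prompts that ask for concise answers) is often the most effective lever for reducing total latency.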
## Infrastructure Optimization
### Network Latency Analysis
Network latency can significantly impact user-experienced performance. If client-measured total latency substantially exceeds server-side metrics returned in API responses, network optimization becomes critical.
**Diagnostic Approach**:
- Compare client-measured latency against server-side metrics
- Verify request routing and identify optimization opportunities
The `x-groq-region` header confirms which datacenter processed your request, enabling latency correlation with geographic proximity. This information helps you understand if your requests are being routed to the optimal datacenter for your location.
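One way to compare client-side and server-side latency is the following sketch, which uses `requests` to measure time to the first streamed chunk and reads the `x-groq-region` header mentioned above (the model choice is illustrative):
```python
import os
import time
import requests

start = time.perf_counter()
with requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "llama-3.3-70b-versatile",  # illustrative model choice
        "messages": [{"role": "user", "content": "Say hello"}],
        "stream": True,
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    # Which datacenter served the request, for geographic correlation
    print("Routed to:", resp.headers.get("x-groq-region"))
    for line in resp.iter_lines():
        if line:
            # Time to the first streamed chunk approximates client-side TTFT
            print(f"Client-side TTFT: {time.perf_counter() - start:.3f}s")
            break
```
If this client-measured TTFT substantially exceeds the server-side TTFT shown in the console for the same request, the difference is network latency.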
### Context Length Management
As shown above, TTFT scales with input length. End users can employ several prompting strategies to optimize context usage and reduce latency:
- **Prompt Chaining**: Decompose complex tasks into sequential subtasks where each prompt's output feeds the next (see the sketch after this list).
- **Zero-Shot vs Few-Shot Selection**: For concise, well-defined tasks, zero-shot prompting ("Classify this sentiment") minimizes context length while leveraging model capabilities.
- **Strategic Context Prioritization**: Place critical information at prompt beginning or end, as models perform best with information in these positions.
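For instance, here's a minimal prompt-chaining sketch (the model choice and prompts are illustrative):
```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Step 1: a short, focused prompt keeps input tokens (and TTFT) low
key_points = ask("List three key points about why low-latency inference matters.")
# Step 2: feed only the compact intermediate result into the next prompt
summary = ask(f"Write a two-sentence summary from these points:\n{key_points}")
print(summary)
```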
## Groq's Processing Options
### Service Tier Architecture
Groq offers three service tiers that influence latency characteristics and processing behavior:
**On-Demand Processing**: For real-time applications requiring guaranteed processing, the standard API delivers:
- Industry-leading low latency with consistent performance
- Streaming support for immediate perceived response
- Controlled rate limits to ensure fairness and consistent experience
**Flex Processing**: [Flex Processing](/docs/flex-processing) optimizes for throughput with higher request volumes in exchange for occasional failures.
**Auto Processing**: Auto Processing uses on-demand rate limits initially, then automatically falls back to flex tier processing if those limits are exceeded.
### Batch Processing
[Batch Processing](/docs/batch) enables cost-effective asynchronous processing with a completion window, optimized for scenarios where immediate responses aren't required.
**Latency Considerations**: While batch processing trades immediate response for efficiency, understanding its latency characteristics helps optimize workload planning:
- **Submission latency**: Minimal overhead for batch job creation and validation
- **Queue processing**: Variable based on system load and batch size
- **Completion notification**: Webhook or polling-based status updates
- **Result retrieval**: Standard API latency for downloading completed outputs
## Streaming Implementation
### Server-Sent Events Best Practices
Implement streaming to improve perceived latency:
**Key Benefits**:
- Users see immediate response initiation
- Better user engagement and experience
- Error handling during generation
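A minimal streaming sketch with the Python SDK (the model choice is illustrative; the same streaming pattern appears in the prefilling examples elsewhere in these docs):
```python
from groq import Groq

client = Groq()

# With stream=True, tokens are printed as they arrive instead of
# waiting for the full completion, improving perceived latency.
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # illustrative model choice
    messages=[{"role": "user", "content": "Explain why fast inference matters."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```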
## Next Steps
Go over to our [Production-Ready Checklist](/docs/production-readiness/production-ready-checklist) and start the process of getting your AI applications scaled up to all your users with consistent performance.
Building something amazing? Need help optimizing? Our team is here to help you achieve production-ready performance at scale. Join our [developer community](https://community.groq.com)!
---
## Production-Ready Checklist for Applications on GroqCloud
URL: https://console.groq.com/docs/production-readiness/production-ready-checklist
# Production-Ready Checklist for Applications on GroqCloud
Deploying LLM applications to production involves critical decisions that directly impact user experience, operational costs, and system reliability. **This comprehensive checklist** guides you through the essential steps to launch and scale your Groq-powered application with confidence.
From selecting the optimal model architecture and configuring processing tiers to implementing robust monitoring and cost controls, each section addresses the common pitfalls that can derail even the most promising LLM applications.
## Pre-Launch Requirements
### Model Selection Strategy
* Document latency requirements for each use case
* Test quality/latency trade-offs across model sizes
* Reference the Model Selection Workflow in the Latency Optimization Guide
### Prompt Engineering Optimization
* Optimize prompts for token efficiency using context management strategies
* Implement prompt templates with variable injection
* Test structured output formats for consistency
* Document optimization results and token savings
### Processing Tier Configuration
* Reference the Processing Tier Selection Workflow in the Latency Optimization Guide
* Implement retry logic for Flex Processing failures
* Design callback handlers for Batch Processing
## Performance Optimization
### Streaming Implementation
* Test streaming vs non-streaming latency impact and user experience
* Configure appropriate timeout settings
* Handle streaming errors gracefully
### Network and Infrastructure
* Measure baseline network latency to Groq endpoints
* Configure timeouts based on expected response lengths
* Set up retry logic with exponential backoff (see the sketch after this list)
* Monitor API response headers for routing information
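A minimal retry-with-exponential-backoff sketch, assuming the Python SDK's OpenAI-style exception classes (tune the retry budget and model choice to your workload):
```python
import random
import time

from groq import APIConnectionError, APIStatusError, Groq

client = Groq()

def complete_with_backoff(messages, max_retries: int = 5):
    """Retry transient failures (429 and 5xx) with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="llama-3.3-70b-versatile",  # illustrative model choice
                messages=messages,
            )
        except (APIStatusError, APIConnectionError) as e:
            status = getattr(e, "status_code", None)
            if status is not None and status not in (429, 500, 502, 503):
                raise  # don't retry client errors such as 400 or 401
            time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s... plus jitter
    raise RuntimeError("Exhausted retries against the Groq API")
```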
### Load Testing
* Test with realistic traffic patterns
* Validate linear scaling characteristics
* Test different processing tier behaviors
* Measure TTFT and generation speed under load
## Monitoring and Observability
### Key Metrics to Track
* **TTFT percentiles** (P50, P90, P95, P99)
* **End-to-end latency** (client to completion)
* **Token usage and costs** per endpoint
* **Error rates** by processing tier
* **Retry rates** for Flex Processing (less than 5% target)
### Alerting Setup
* Set up alerts for latency degradation (>20% increase)
* Monitor error rates (alert if >0.5%)
* Track cost increases (alert if >20% above baseline)
* Use Groq Console for usage monitoring
## Cost Optimization
### Usage Monitoring
* Track token efficiency metrics
* Monitor cost per request across different models
* Set up cost alerting thresholds
* Analyze high-cost endpoints weekly
### Optimization Strategies
* Leverage smaller models where quality permits
* Use Batch Processing for non-urgent workloads (50% cost savings)
* Implement intelligent processing tier selection
* Optimize prompts to reduce input/output tokens
## Launch Readiness
### Final Validation
* Complete end-to-end testing with production-like loads
* Test all failure scenarios and error handling
* Validate cost projections against actual usage
* Verify monitoring and alerting systems
* Test graceful degradation strategies
### Go-Live Preparation
* Define gradual rollout plan
* Document rollback procedures
* Establish performance baselines
* Define success metrics and SLAs
## Post-Launch Optimization
### First Week
* Monitor all metrics closely
* Address any performance issues immediately
* Fine-tune timeout and retry settings
* Gather user feedback on response quality and speed
### First Month
* Review actual vs projected costs
* Optimize high-frequency prompts based on usage patterns
* Evaluate processing tier effectiveness
* A/B test prompt optimizations
* Document optimization wins and lessons learned
## Key Performance Targets
| Metric | Target | Alert Threshold |
|--------|--------|-----------------|
| TTFT P95 | Model-dependent* | >20% increase |
| Error Rate | <0.1% | >0.5% |
| Flex Retry Rate | <5% | >10% |
| Cost per 1K tokens | Baseline | +20% |
*Reference [Artificial Analysis](https://artificialanalysis.ai/providers/groq) for current model benchmarks
## Resources
- [Groq API Documentation](/docs/api-reference)
- [Prompt Engineering Guide](/docs/prompting)
- [Understanding and Optimizing Latency on Groq](/docs/production-readiness/optimizing-latency)
- [Groq Developer Community](https://community.groq.com)
---
*This checklist should be customized based on your specific application requirements and updated based on production learnings.*
---
## Rate Limits
URL: https://console.groq.com/docs/rate-limits
# Rate Limits
Rate limits act as control measures to regulate how frequently users and applications can access our API within specified timeframes. These limits help ensure service stability, fair access, and protection against misuse so that we can serve reliable and fast inference for all.
## Understanding Rate Limits
Rate limits are measured in:
- **RPM:** Requests per minute
- **RPD:** Requests per day
- **TPM:** Tokens per minute
- **TPD:** Tokens per day
- **ASH:** Audio seconds per hour
- **ASD:** Audio seconds per day
Rate limits apply at the organization level, not the individual user level. You can hit any limit type depending on which threshold you reach first.
**Example:** Let's say your RPM = 50 and your TPM = 200K. If you were to send 50 requests with only 100 tokens each within a minute, you would reach your request limit even though you did not send 200K tokens within those 50 requests.
## Rate Limits
The following is a high-level summary; there may be exceptions to these limits. You can view the current, exact rate limits for your organization on the [limits page](/settings/limits) in your account settings.
## Rate Limit Headers
In addition to viewing your limits on your account's [limits](https://console.groq.com/settings/limits) page, you can also view rate limit information such as remaining requests and tokens in HTTP response headers as follows:
The following headers are set (values are illustrative):
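An illustrative set (header names follow Groq's rate limit documentation; the values below are placeholders):
```
retry-after: 2
x-ratelimit-limit-requests: 14400
x-ratelimit-limit-tokens: 18000
x-ratelimit-remaining-requests: 14370
x-ratelimit-remaining-tokens: 17997
x-ratelimit-reset-requests: 2m59.56s
x-ratelimit-reset-tokens: 7.66s
```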
## Handling Rate Limits
When you exceed rate limits, our API returns a `429 Too Many Requests` HTTP status code.
**Note**: `retry-after` is only set if you hit the rate limit and status code 429 is returned. The other headers are always included.
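A minimal sketch of honoring the `retry-after` header on a 429 response (endpoint and payload shape as in the examples elsewhere in these docs):
```python
import os
import time
import requests

def post_with_retry_after(payload: dict, max_retries: int = 3) -> requests.Response:
    # On 429, sleep for the server-suggested retry-after interval, then retry.
    headers = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}
    url = "https://api.groq.com/openai/v1/chat/completions"
    for _ in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        time.sleep(float(resp.headers.get("retry-after", 1)))
    return resp
```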
## Need Higher Rate Limits?
If you need higher rate limits, you can [request them here](https://groq.com/self-serve-support).
---
## Prefilling: Example2 (py)
URL: https://console.groq.com/docs/prefilling/scripts/example2.py
from groq import Groq
client = Groq()
completion = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[
{
"role": "user",
"content": "Extract the title, author, published date, and description from the following book as a JSON object:\n\n\"The Great Gatsby\" is a novel by F. Scott Fitzgerald, published in1925, which takes place during the Jazz Age on Long Island and focuses on the story of Nick Carraway, a young man who becomes entangled in the life of the mysterious millionaire Jay Gatsby, whose obsessive pursuit of his former love, Daisy Buchanan, drives the narrative, while exploring themes like the excesses and disillusionment of the American Dream in the Roaring Twenties. \n"
},
{
"role": "assistant",
"content": "```json"
}
],
stream=True,
stop="```",
)
for chunk in completion:
print(chunk.choices[0].delta.content or "", end="")
---
## Prefilling: Example2 (json)
URL: https://console.groq.com/docs/prefilling/scripts/example2.json
{
"messages": [
{
"role": "user",
"content": "Extract the title, author, published date, and description from the following book as a JSON object:\n\n\"The Great Gatsby\" is a novel by F. Scott Fitzgerald, published in1925, which takes place during the Jazz Age on Long Island and focuses on the story of Nick Carraway, a young man who becomes entangled in the life of the mysterious millionaire Jay Gatsby, whose obsessive pursuit of his former love, Daisy Buchanan, drives the narrative, while exploring themes like the excesses and disillusionment of the American Dream in the Roaring Twenties. \n"
},
{
"role": "assistant",
"content": "```json"
}
],
"model": "llama-3.3-70b-versatile",
"stop": "```"
}
---
## Prefilling: Example1 (json)
URL: https://console.groq.com/docs/prefilling/scripts/example1.json
{
"messages": [
{
"role": "user",
"content": "Write a Python function to calculate the factorial of a number."
},
{
"role": "assistant",
"content": "```python"
}
],
"model": "llama-3.3-70b-versatile",
"stop": "```"
}
---
## Prefilling: Example1 (py)
URL: https://console.groq.com/docs/prefilling/scripts/example1.py
from groq import Groq
client = Groq()
completion = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[
{
"role": "user",
"content": "Write a Python function to calculate the factorial of a number."
},
{
"role": "assistant",
"content": "```python"
}
],
stream=True,
stop="```",
)
for chunk in completion:
print(chunk.choices[0].delta.content or "", end="")
---
## Prefilling: Example1 (js)
URL: https://console.groq.com/docs/prefilling/scripts/example1
import { Groq } from 'groq-sdk';
const groq = new Groq();
async function main() {
const chatCompletion = await groq.chat.completions.create({
messages: [
{
role: "user",
content: "Write a Python function to calculate the factorial of a number."
},
{
role: "assistant",
content: "```python"
}
],
stream: true,
model: "openai/gpt-oss-20b",
stop: "```"
});
for await (const chunk of chatCompletion) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
}
main();
---
## Prefilling: Example2 (js)
URL: https://console.groq.com/docs/prefilling/scripts/example2
import { Groq } from 'groq-sdk';
const groq = new Groq();
async function main() {
const chatCompletion = await groq.chat.completions.create({
messages: [
{
role: "user",
content: "Extract the title, author, published date, and description from the following book as a JSON object:\n\n\"The Great Gatsby\" is a novel by F. Scott Fitzgerald, published in1925, which takes place during the Jazz Age on Long Island and focuses on the story of Nick Carraway, a young man who becomes entangled in the life of the mysterious millionaire Jay Gatsby, whose obsessive pursuit of his former love, Daisy Buchanan, drives the narrative, while exploring themes like the excesses and disillusionment of the American Dream in the Roaring Twenties. \n"
},
{
role: "assistant",
content: "```json"
}
],
stream: true,
model: "openai/gpt-oss-20b",
stop: "```"
});
for await (const chunk of chatCompletion) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
}
main();
---
## Assistant Message Prefilling
URL: https://console.groq.com/docs/prefilling
# Assistant Message Prefilling
When using the Groq API, you can gain more control over your model output by prefilling `assistant` messages. This technique gives you the ability to direct any text-to-text model powered by Groq to:
- Skip unnecessary introductions or preambles
- Enforce specific output formats (e.g., JSON, XML)
- Maintain consistency in conversations
## How to Prefill Assistant Messages
To prefill, simply include your desired starting text in the `assistant` message, and the model will generate a response that continues from it.
**Note:** For some models, adding a newline after the prefill `assistant` message leads to better results.
**💡 Tip:** Use the stop sequence (`stop`) parameter in combination with prefilling for even more concise results. We recommend using this for generating code snippets.
## Example Usage
**Example 1: Controlling output format for concise code snippets**
When trying the code below, first send a request without the prefill, then send another request with the prefill included to see the difference!
**Example 2: Extracting structured data from unstructured input**
---
## Changelog
URL: https://console.groq.com/docs/legacy-changelog
## Changelog
Welcome to the Groq Changelog, where you can follow ongoing developments to our API.
### April 5, 2025
- Shipped Meta's Llama 4 models. See more on our [models page](/docs/models).
### April 4, 2025
- Shipped new console home page. See yours [here](/home).
### March 26, 2025
- Shipped text-to-speech models `playai-tts` and `playai-tts-arabic`. See more on our [models page](/docs/models).
### March 13, 2025
- Batch processing is 50% off now until the end of April 2025! Learn how to submit a batch job [here](/docs/batch).
### March 11, 2025
- Added support for word level timestamps. See more in our [speech-to-text docs](/docs/speech-to-text).
- Added [llms.txt](/llms.txt) and [llms-full.txt](/llms-full.txt) files to make it easy for you to use our docs as context for models and AI agents.
### March 5, 2025
- Shipped `qwen-qwq-32b`. See more on our [models page](/docs/models).
### February 25, 2025
- Shipped `mistral-saba-24b`. See more on our [models page](/docs/models).
### February 13, 2025
- Shipped `qwen-2.5-coder-32b`. See more on our [models page](/docs/models).
### February 10, 2025
- Shipped `qwen-2.5-32b`. See more on our [models page](/docs/models).
- Shipped `deepseek-r1-distill-qwen-32b`. See more on our [models page](/docs/models).
### February 5, 2025
- Updated integrations to include [Agno](/docs/agno).
### February 3, 2025
- Shipped `deepseek-r1-distill-llama-70b-specdec`. See more on our [models page](/docs/models).
### January 29, 2025
- Added support for tool use and JSON mode for `deepseek-r1-distill-llama-70b`.
### January 26, 2025
- Released `deepseek-r1-distill-llama-70b`. See more on our [models page](/docs/models).
### January 9, 2025
- Added [batch API docs](/docs/batch).
### January 7, 2025
- Updated integrations pages to include quick start guides and additional resources.
- Updated [deprecations](/docs/deprecations) for Llama 3.1 and Llama 3.0 Tool Use models.
- Updated [speech docs](/docs/speech-text).
### December 17, 2024
- Updated integrations to include [CrewAI](/docs/crewai).
- Updated [deprecations page](/docs/deprecations) to include `gemma-7b-it`.
### December 6, 2024
- Released `llama-3.3-70b-versatile` and `llama-3.3-70b-specdec`. See more on our [models page](https://console.groq.com/docs/models).
### November 15, 2024
- Released `llama-3.1-70b-specdec` model for customers. See more on our [models page](https://console.groq.com/docs/models).
### October 18, 2024
- Deprecated `llava-v1.5-7b-4096-preview` model.
### October 9, 2024
- Released `whisper-large-v3-turbo` model. See more on our [models page](https://console.groq.com/docs/models).
- Released `llama-3.2-90b-vision-preview` model. See more on our [models page](https://console.groq.com/docs/models).
- Updated integrations to include [xRx](https://console.groq.com/docs/xrx).
### September 27, 2024
- Released `llama-3.2-11b-vision-preview` model. See more on our [models page](https://console.groq.com/docs/models).
- Updated Integrations to include [JigsawStack](https://console.groq.com/docs/jigsawstack).
### September 25, 2024
- Released `llama-3.2-1b-preview` model. See more on our [models page](https://console.groq.com/docs/models).
- Released `llama-3.2-3b-preview` model. See more on our [models page](https://console.groq.com/docs/models).
- Released `llama-3.2-90b-text-preview` model. See more on our [models page](https://console.groq.com/docs/models).
### September 24, 2024
- Revamped tool use documentation with in-depth explanations and code examples.
- Upgraded code box style and design.
### September 3, 2024
- Released `llava-v1.5-7b-4096-preview` model.
- Updated Integrations to include [E2B](https://console.groq.com/docs/e2b).
### August 20, 2024
- Released 'distil-whisper-large-v3-en' model. See more on our [models page](https://console.groq.com/docs/models).
### August 8, 2024
- Moved 'llama-3.1-405b-reasoning' from preview to offline due to overwhelming demand. Stay tuned for updates on availability!
### August 1, 2024
- Released 'llama-guard-3-8b' model. See more on our [models page](https://console.groq.com/docs/models).
### July 23, 2024
- Released Llama 3.1 suite of models in preview ('llama-3.1-8b-instant', 'llama-3.1-70b-versatile', 'llama-3.1-405b-reasoning'). Learn more in [our blog post](https://groq.link/llama3405bblog).
### July 16, 2024
- Released 'Llama3-groq-70b-tool-use' and 'Llama3-groq-8b-tool-use' models in preview. Learn more in [our blog post](https://wow.groq.com/introducing-llama-3-groq-tool-use-models/).
### June 24, 2024
- Released 'whisper-large-v3' model.
### May 8, 2024
- Released 'whisper-large-v3' model as a private beta.
### April 19, 2024
- Released 'llama3-70b-8192' and 'llama3-8b-8192' models.
### April 10, 2024
- Upgraded Gemma to `gemma-1.1-7b-it`.
### April 3, 2024
- [Tool use](/docs/tool-use) released in beta.
### March 28, 2024
- Launched the [Groq API Cookbook](https://github.com/groq/groq-api-cookbook).
### March 21, 2024
- Added JSON mode and streaming to [Playground](https://console.groq.com/playground).
### March 8, 2024
- Released `gemma-7b-it` model.
### March 6, 2024
- Released [JSON mode](/docs/text-chat#json-mode-object-object), added `seed` parameter.
### Feb 26, 2024
- Released Python and Javascript LlamaIndex [integrations](/docs/llama-index).
### Feb 21, 2024
- Released Python and Javascript Langchain [integrations](/docs/langchain).
### Feb 16, 2024
- Beta launch
- Released GroqCloud [Javascript SDK](/docs/libraries).
### Feb 7, 2024
- Private Beta launch
- Released `llama2-70b` and `mixtral-8x7b` models.
- Released GroqCloud [Python SDK](/docs/libraries).
---
## MLflow + Groq: Open-Source GenAI Observability
URL: https://console.groq.com/docs/mlflow
## MLflow + Groq: Open-Source GenAI Observability
[MLflow](https://mlflow.org/) is an open-source platform developed by Databricks to assist in building better Generative AI (GenAI) applications.
MLflow provides a tracing feature that enhances model observability in your GenAI applications by capturing detailed information about the requests
you make to the models within your applications. Tracing provides a way to record the inputs, outputs, and metadata associated with each
intermediate step of a request, enabling you to easily pinpoint the source of bugs and unexpected behaviors.
The MLflow integration with Groq includes the following features:
- **Tracing Dashboards**: Monitor your interactions with models via Groq API with dashboards that include inputs, outputs, and metadata of spans
- **Automated Tracing**: A fully automated integration with Groq, which can be enabled by running `mlflow.groq.autolog()`
- **Easy Manual Trace Instrumentation**: Customize trace instrumentation through MLflow's high-level fluent APIs such as decorators, function wrappers and context managers
- **OpenTelemetry Compatibility**: MLflow Tracing supports exporting traces to an OpenTelemetry Collector, which can then be used to export traces to various backends such as Jaeger, Zipkin, and AWS X-Ray
- **Package and Deploy Agents**: Package and deploy your agents with Groq LLMs to an inference server with a variety of deployment targets
- **Evaluation**: Evaluate your agents using Groq LLMs with a wide range of metrics using a convenient API called `mlflow.evaluate()`
## Python Quick Start (2 minutes to hello world)
### 1. Install the required packages:
```bash
# The Groq integration is available in mlflow >= 2.20.0
pip install mlflow groq
```
### 2. Configure your Groq API key:
```bash
export GROQ_API_KEY="your-api-key"
```
### 3. (Optional) Start your MLflow server
```bash
# This process is optional, but it is recommended to use MLflow tracking server for better visualization and additional features
mlflow server
```
### 4. Create your first traced Groq application:
Let's enable MLflow auto-tracing with the Groq SDK. For more configurations, refer to the [documentation for `mlflow.groq`](https://mlflow.org/docs/latest/python_api/mlflow.groq.html).
```python
import mlflow
import groq
# Optional: Set a tracking URI and an experiment name if you have a tracking server
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Groq")
# Turn on auto tracing for Groq by calling mlflow.groq.autolog()
mlflow.groq.autolog()
client = groq.Groq()
# Use the create method to create a new message
message = client.chat.completions.create(
model="qwen-2.5-32b",
messages=[
{
"role": "user",
"content": "Explain the importance of low latency LLMs.",
}
],
)
print(message.choices[0].message.content)
```
### 5. Visualize model usage on the MLflow tracing dashboard:
Now traces for your Groq usage are captured by MLflow! Let's get insights into our application's activities by visiting the MLflow tracking server
we set in Step 4 above (`mlflow.set_tracking_uri("http://localhost:5000")`), which we can do by opening http://localhost:5000 in our browser.

## Additional Resources
For more configuration and detailed resources for managing your Groq applications with MLflow, see:
- [Getting Started with MLflow](https://mlflow.org/docs/latest/getting-started/index.html)
- [MLflow LLMs Overview](https://mlflow.org/docs/latest/llms/index.html)
- [MLflow Tracing for LLM Observability](https://mlflow.org/docs/latest/llms/tracing/index.html)
---
## Arize + Groq: Open-Source AI Observability
URL: https://console.groq.com/docs/arize
## Arize + Groq: Open-Source AI Observability
[Arize Phoenix](https://docs.arize.com/phoenix) developed by [Arize AI](https://arize.com/) is an open-source AI observability library that enables comprehensive tracing and monitoring for your AI
applications. By integrating Arize's observability tools with your Groq-powered applications, you can gain deep insights into your LLM workflow's performance and behavior with features including:
- **Automatic Tracing:** Capture detailed metrics about LLM calls, including latency, token usage, and exceptions
- **Real-time Monitoring:** Track application performance and identify bottlenecks in production
- **Evaluation Framework:** Utilize pre-built templates to assess LLM performance
- **Prompt Management:** Easily iterate on prompts and test changes against your data
### Python Quick Start (3 minutes to hello world)
#### 1. Install the required packages:
```bash
pip install arize-phoenix-otel openinference-instrumentation-groq groq
```
#### 2. Sign up for an [Arize Phoenix account](https://app.phoenix.arize.com).
#### 3. Configure your Groq and Arize Phoenix API keys:
```bash
export GROQ_API_KEY="your-groq-api-key"
export PHOENIX_API_KEY="your-phoenix-api-key"
```
#### 4. (Optional) [Create a new project](https://app.phoenix.arize.com/projects) or use the "default" project as your `project_name` below.
#### 5. Create your first traced Groq application:
In Arize Phoenix, **traces** capture the complete journey of an LLM request through your application, while **spans** represent individual operations within that trace. The instrumentation
automatically captures important metrics and metadata.
```python
import os
from phoenix.otel import register
from openinference.instrumentation.groq import GroqInstrumentor
from groq import Groq
# Configure environment variables for Phoenix
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
# Configure Phoenix tracer
tracer_provider = register(
project_name="default",
endpoint="https://app.phoenix.arize.com/v1/traces",
)
# Initialize Groq instrumentation
GroqInstrumentor().instrument(tracer_provider=tracer_provider)
# Create Groq client
client = Groq(api_key=os.getenv("GROQ_API_KEY"))
# Make an instrumented LLM call
chat_completion = client.chat.completions.create(
messages=[{
"role": "user",
"content": "Explain the importance of AI observability"
}],
model="llama-3.3-70b-versatile",
)
print(chat_completion.choices[0].message.content)
```
Running the above code will create an automatically instrumented Groq application! The traces will be available in your Phoenix dashboard within the `default` project, showing
detailed information about:
- **Application Latency:** Identify slow components and bottlenecks
- **Token Usage:** Track token consumption across different operations
- **Runtime Exceptions:** Capture and analyze errors and rate limits
- **LLM Parameters:** Monitor temperature, system prompts, and other settings
- **Response Analysis:** Examine LLM outputs and their characteristics
**Challenge**: Update an existing Groq-powered application you've built to add Arize Phoenix tracing!
For more detailed documentation and resources on building observable LLM applications with Groq and Arize, see:
- [Official Documentation: Groq Integration Guide](https://docs.arize.com/phoenix/tracing/integrations-tracing/groq)
- [Blog: Tracing with Groq](https://arize.com/blog/tracing-groq/)
- [Webinar: Tracing and Evaluating LLM Apps with Groq and Arize Phoenix](https://youtu.be/KjtrILr6JZI?si=iX8Udo-EYsK2JOvF)
---
## Structured Outputs: Step2 Example (js)
URL: https://console.groq.com/docs/structured-outputs/scripts/step2-example
import Groq from "groq-sdk";
const groq = new Groq();
const response = await groq.chat.completions.create({
model: "moonshotai/kimi-k2-instruct-0905",
messages: [
{ role: "system", content: "You are a helpful math tutor. Guide the user through the solution step by step." },
{ role: "user", content: "how can I solve8x +7 = -23" }
],
response_format: {
type: "json_schema",
json_schema: {
name: "math_response",
schema: {
type: "object",
properties: {
steps: {
type: "array",
items: {
type: "object",
properties: {
explanation: { type: "string" },
output: { type: "string" }
},
required: ["explanation", "output"],
additionalProperties: false
}
},
final_answer: { type: "string" }
},
required: ["steps", "final_answer"],
additionalProperties: false
}
}
}
});
const result = JSON.parse(response.choices[0].message.content || "{}");
console.log(result);
---
## Structured Outputs: Support Ticket Zod.doc (ts)
URL: https://console.groq.com/docs/structured-outputs/scripts/support-ticket-zod.doc
```javascript
import Groq from "groq-sdk";
import { z } from "zod";
const groq = new Groq();
const supportTicketSchema = z.object({
category: z.enum(["api", "billing", "account", "bug", "feature_request", "integration", "security", "performance"]),
priority: z.enum(["low", "medium", "high", "critical"]),
urgency_score: z.number(),
customer_info: z.object({
name: z.string(),
company: z.string().optional(),
tier: z.enum(["free", "paid", "enterprise", "trial"])
}),
technical_details: z.array(z.object({
component: z.string(),
error_code: z.string().optional(),
description: z.string()
})),
keywords: z.array(z.string()),
requires_escalation: z.boolean(),
estimated_resolution_hours: z.number(),
follow_up_date: z.string().datetime().optional(),
summary: z.string()
});
type SupportTicket = z.infer<typeof supportTicketSchema>;
const response = await groq.chat.completions.create({
model: "moonshotai/kimi-k2-instruct-0905",
messages: [
{
role: "system",
content: `You are a customer support ticket classifier for SaaS companies.
Analyze support tickets and categorize them for efficient routing and resolution.
Output JSON only using the schema provided.`,
},
{
role: "user",
content: `Hello! I love your product and have been using it for 6 months.
I was wondering if you could add a dark mode feature to the dashboard?
Many of our team members work late hours and would really appreciate this.
Also, it would be great to have keyboard shortcuts for common actions.
Not urgent, but would be a nice enhancement!
Best, Mike from StartupXYZ`
},
],
response_format: {
type: "json_schema",
json_schema: {
name: "support_ticket_classification",
schema: z.toJSONSchema(supportTicketSchema)
}
}
});
const rawResult = JSON.parse(response.choices[0].message.content || "{}");
const result = supportTicketSchema.parse(rawResult);
console.log(result);
```
---
## Structured Outputs: Sql Query Generation (js)
URL: https://console.groq.com/docs/structured-outputs/scripts/sql-query-generation
```javascript
import Groq from "groq-sdk";
const groq = new Groq();
const response = await groq.chat.completions.create({
model: "moonshotai/kimi-k2-instruct-0905",
messages: [
{
role: "system",
content: "You are a SQL expert. Generate structured SQL queries from natural language descriptions with proper syntax validation and metadata.",
},
{ role: "user", content: "Find all customers who made orders over $500 in the last30 days, show their name, email, and total order amount" },
],
response_format: {
type: "json_schema",
json_schema: {
name: "sql_query_generation",
schema: {
type: "object",
properties: {
query: { type: "string" },
query_type: {
type: "string",
enum: ["SELECT", "INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP"]
},
tables_used: {
type: "array",
items: { type: "string" }
},
estimated_complexity: {
type: "string",
enum: ["low", "medium", "high"]
},
execution_notes: {
type: "array",
items: { type: "string" }
},
validation_status: {
type: "object",
properties: {
is_valid: { type: "boolean" },
syntax_errors: {
type: "array",
items: { type: "string" }
}
},
required: ["is_valid", "syntax_errors"],
additionalProperties: false
}
},
required: ["query", "query_type", "tables_used", "estimated_complexity", "execution_notes", "validation_status"],
additionalProperties: false
}
}
}
});
const result = JSON.parse(response.choices[0].message.content || "{}");
console.log(result);
```
---
## Structured Outputs: Api Response Validation Response (json)
URL: https://console.groq.com/docs/structured-outputs/scripts/api-response-validation-response.json
```
{
"validation_result": {
"is_valid": false,
"status_code": 400,
"error_count": 2
},
"field_validations": [
{
"field_name": "user_id",
"field_type": "string",
"is_valid": true,
"error_message": "",
"expected_format": "string"
},
{
"field_name": "email",
"field_type": "string",
"is_valid": false,
"error_message": "Invalid email format",
"expected_format": "valid email address (e.g., user@example.com)"
},
{
"field_name": "created_at",
"field_type": "string",
"is_valid": true,
"error_message": "",
"expected_format": "ISO8601 datetime string"
},
{
"field_name": "status",
"field_type": "string",
"is_valid": true,
"error_message": "",
"expected_format": "string"
},
{
"field_name": "profile",
"field_type": "object",
"is_valid": true,
"error_message": "",
"expected_format": "object"
}
],
"data_quality_score": 0.7,
"suggested_fixes": [
"Fix email format validation to ensure proper email structure",
"Add proper error handling structure to response",
"Include metadata fields like timestamp and request_id",
"Add success/failure status indicators",
"Implement standardized error format"
],
"compliance_check": {
"follows_rest_standards": false,
"has_proper_error_handling": false,
"includes_metadata": false
},
"standardized_response": {
"success": false,
"data": {
"user_id": "12345",
"email": "invalid-email",
"created_at": "2024-01-15T10:30:00Z",
"status": "active",
"profile": {
"name": "John Doe",
"age": 25
}
},
"errors": [
"Invalid email format: invalid-email",
"Response lacks proper error handling structure"
],
"metadata": {
"timestamp": "2024-01-15T10:30:00Z",
"request_id": "req_12345",
"version": "1.0"
}
}
}
```
---
## Structured Outputs: Step2 Example (py)
URL: https://console.groq.com/docs/structured-outputs/scripts/step2-example.py
from groq import Groq
import json
client = Groq()
response = client.chat.completions.create(
model="moonshotai/kimi-k2-instruct-0905",
messages=[
{"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
{"role": "user", "content": "how can I solve8x +7 = -23"}
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "math_response",
"schema": {
"type": "object",
"properties": {
"steps": {
"type": "array",
"items": {
"type": "object",
"properties": {
"explanation": {"type": "string"},
"output": {"type": "string"}
},
"required": ["explanation", "output"],
"additionalProperties": False
}
},
"final_answer": {"type": "string"}
},
"required": ["steps", "final_answer"],
"additionalProperties": False
}
}
}
)
result = json.loads(response.choices[0].message.content)
print(json.dumps(result, indent=2))
---
## Structured Outputs: Product Review (py)
URL: https://console.groq.com/docs/structured-outputs/scripts/product-review.py
from groq import Groq
from pydantic import BaseModel
from typing import Literal
import json
client = Groq()
class ProductReview(BaseModel):
product_name: str
rating: float
sentiment: Literal["positive", "negative", "neutral"]
key_features: list[str]
response = client.chat.completions.create(
model="moonshotai/kimi-k2-instruct-0905",
messages=[
{"role": "system", "content": "Extract product review information from the text."},
{
"role": "user",
"content": "I bought the UltraSound Headphones last week and I'm really impressed! The noise cancellation is amazing and the battery lasts all day. Sound quality is crisp and clear. I'd give it4.5 out of5 stars.",
},
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "product_review",
"schema": ProductReview.model_json_schema()
}
}
)
review = ProductReview.model_validate(json.loads(response.choices[0].message.content))
print(json.dumps(review.model_dump(), indent=2))
---
## Structured Outputs: Support Ticket Pydantic (py)
URL: https://console.groq.com/docs/structured-outputs/scripts/support-ticket-pydantic.py
from groq import Groq
from pydantic import BaseModel, Field
from typing import List, Optional, Literal
from enum import Enum
import json
client = Groq()
class SupportCategory(str, Enum):
API = "api"
BILLING = "billing"
ACCOUNT = "account"
BUG = "bug"
FEATURE_REQUEST = "feature_request"
INTEGRATION = "integration"
SECURITY = "security"
PERFORMANCE = "performance"
class Priority(str, Enum):
LOW = "low"
MEDIUM = "medium"
HIGH = "high"
CRITICAL = "critical"
class CustomerTier(str, Enum):
FREE = "free"
PAID = "paid"
ENTERPRISE = "enterprise"
TRIAL = "trial"
class CustomerInfo(BaseModel):
name: str
company: Optional[str] = None
tier: CustomerTier
class TechnicalDetail(BaseModel):
component: str
error_code: Optional[str] = None
description: str
class SupportTicket(BaseModel):
category: SupportCategory
priority: Priority
urgency_score: float
customer_info: CustomerInfo
technical_details: List[TechnicalDetail]
keywords: List[str]
requires_escalation: bool
estimated_resolution_hours: float
follow_up_date: Optional[str] = Field(None, description="ISO datetime string")
summary: str
response = client.chat.completions.create(
model="moonshotai/kimi-k2-instruct-0905",
messages=[
{
"role": "system",
"content": """You are a customer support ticket classifier for SaaS companies.
Analyze support tickets and categorize them for efficient routing and resolution.
Output JSON only using the schema provided.""",
},
{
"role": "user",
"content": """Hello! I love your product and have been using it for6 months.
I was wondering if you could add a dark mode feature to the dashboard?
Many of our team members work late hours and would really appreciate this.
Also, it would be great to have keyboard shortcuts for common actions.
Not urgent, but would be a nice enhancement!
Best, Mike from StartupXYZ"""
},
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "support_ticket_classification",
"schema": SupportTicket.model_json_schema()
}
}
)
raw_result = json.loads(response.choices[0].message.content or "{}")
result = SupportTicket.model_validate(raw_result)
print(result.model_dump_json(indent=2))
---
## Structured Outputs: Task Creation Schema (json)
URL: https://console.groq.com/docs/structured-outputs/scripts/task-creation-schema.json
{
"name": "create_task",
"description": "Creates a new task in the project management system",
"strict": true,
"parameters": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "The task title or summary"
},
"priority": {
"type": "string",
"description": "Task priority level",
"enum": ["low", "medium", "high", "urgent"]
}
},
"additionalProperties": false,
"required": ["title", "priority"]
}
}
---
## Structured Outputs: Project Milestones Schema (json)
URL: https://console.groq.com/docs/structured-outputs/scripts/project-milestones-schema.json
{
"type": "object",
"properties": {
"milestones": {
"type": "array",
"items": {
"$ref": "#/$defs/milestone"
}
},
"project_status": {
"type": "string",
"enum": ["planning", "in_progress", "completed", "on_hold"]
}
},
"$defs": {
"milestone": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Milestone name"
},
"deadline": {
"type": "string",
"description": "Due date in ISO format"
},
"completed": {
"type": "boolean"
}
},
"required": ["title", "deadline", "completed"],
"additionalProperties": false
}
},
"required": ["milestones", "project_status"],
"additionalProperties": false
}
---
## Structured Outputs: File System Schema (json)
URL: https://console.groq.com/docs/structured-outputs/scripts/file-system-schema.json
{
"type": "object",
"properties": {
"file_system": {
"$ref": "#/$defs/file_node"
}
},
"$defs": {
"file_node": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "File or directory name"
},
"type": {
"type": "string",
"enum": ["file", "directory"]
},
"size": {
"type": "number",
"description": "Size in bytes (0 for directories)"
},
"children": {
"anyOf": [
{
"type": "array",
"items": {
"$ref": "#/$defs/file_node"
}
},
{
"type": "null"
}
]
}
},
"additionalProperties": false,
"required": ["name", "type", "size", "children"]
}
},
"additionalProperties": false,
"required": ["file_system"]
}
---
## Structured Outputs: Json Object Mode (js)
URL: https://console.groq.com/docs/structured-outputs/scripts/json-object-mode
import { Groq } from "groq-sdk";
const groq = new Groq();
async function main() {
const response = await groq.chat.completions.create({
model: "openai/gpt-oss-20b",
messages: [
{
role: "system",
content: `You are a data analysis API that performs sentiment analysis on text.
Respond only with JSON using this format:
{
"sentiment_analysis": {
"sentiment": "positive|negative|neutral",
"confidence_score":0.95,
"key_phrases": [
{
"phrase": "detected key phrase",
"sentiment": "positive|negative|neutral"
}
],
"summary": "One sentence summary of the overall sentiment"
}
}`
},
{ role: "user", content: "Analyze the sentiment of this customer review: 'I absolutely love this product! The quality exceeded my expectations, though shipping took longer than expected.'" }
],
response_format: { type: "json_object" }
});
const result = JSON.parse(response.choices[0].message.content || "{}");
console.log(result);
}
main();
---
## Structured Outputs: Organization Chart Schema (json)
URL: https://console.groq.com/docs/structured-outputs/scripts/organization-chart-schema.json
{
"name": "organization_chart",
"description": "Company organizational structure",
"strict": true,
"schema": {
"type": "object",
"properties": {
"employee_id": {
"type": "string",
"description": "Unique employee identifier"
},
"name": {
"type": "string",
"description": "Employee full name"
},
"position": {
"type": "string",
"description": "Job title or position",
"enum": ["CEO", "Manager", "Developer", "Designer", "Analyst", "Intern"]
},
"direct_reports": {
"type": "array",
"description": "Employees reporting to this person",
"items": {
"$ref": "#"
}
},
"contact_info": {
"type": "array",
"description": "Contact information for the employee",
"items": {
"type": "object",
"properties": {
"type": {
"type": "string",
"description": "Type of contact info",
"enum": ["email", "phone", "slack"]
},
"value": {
"type": "string",
"description": "The contact value"
}
},
"additionalProperties": false,
"required": ["type", "value"]
}
}
},
"required": [
"employee_id",
"name",
"position",
"direct_reports",
"contact_info"
],
"additionalProperties": false
}
}
---
## Structured Outputs: Email Classification (js)
URL: https://console.groq.com/docs/structured-outputs/scripts/email-classification
```javascript
import Groq from "groq-sdk";
const groq = new Groq();
const response = await groq.chat.completions.create({
model: "moonshotai/kimi-k2-instruct-0905",
messages: [
{
role: "system",
content: "You are an email classification expert. Classify emails into structured categories with confidence scores, priority levels, and suggested actions.",
},
{
role: "user",
content: "Subject: URGENT: Server downtime affecting production\n\nHi Team,\n\nOur main production server went down at2:30 PM EST. Customer-facing services are currently unavailable. We need immediate action to restore services. Please join the emergency call.\n\nBest regards,\nDevOps Team"
},
],
response_format: {
type: "json_schema",
json_schema: {
name: "email_classification",
schema: {
type: "object",
properties: {
category: {
type: "string",
enum: ["urgent", "support", "sales", "marketing", "internal", "spam", "notification"]
},
priority: {
type: "string",
enum: ["low", "medium", "high", "critical"]
},
confidence_score: {
type: "number",
minimum: 0,
maximum: 1
},
sentiment: {
type: "string",
enum: ["positive", "negative", "neutral"]
},
key_entities: {
type: "array",
items: {
type: "object",
properties: {
entity: { type: "string" },
type: {
type: "string",
enum: ["person", "organization", "location", "datetime", "system", "product"]
}
},
required: ["entity", "type"],
additionalProperties: false
}
},
suggested_actions: {
type: "array",
items: { type: "string" }
},
requires_immediate_attention: { type: "boolean" },
estimated_response_time: { type: "string" }
},
required: ["category", "priority", "confidence_score", "sentiment", "key_entities", "suggested_actions", "requires_immediate_attention", "estimated_response_time"],
additionalProperties: false
}
}
}
});
const result = JSON.parse(response.choices[0].message.content || "{}");
console.log(result);
```
---
## Structured Outputs: Email Classification Response (json)
URL: https://console.groq.com/docs/structured-outputs/scripts/email-classification-response.json
```
{
"category": "urgent",
"priority": "critical",
"confidence_score":0.95,
"sentiment": "negative",
"key_entities": [
{
"entity": "production server",
"type": "system"
},
{
"entity": "2:30 PM EST",
"type": "datetime"
},
{
"entity": "DevOps Team",
"type": "organization"
},
{
"entity": "customer-facing services",
"type": "system"
}
],
"suggested_actions": [
"Join emergency call immediately",
"Escalate to senior DevOps team",
"Activate incident response protocol",
"Prepare customer communication",
"Monitor service restoration progress"
],
"requires_immediate_attention": true,
"estimated_response_time": "immediate"
}
```
---
## Structured Outputs: Product Review (js)
URL: https://console.groq.com/docs/structured-outputs/scripts/product-review
import Groq from "groq-sdk";
const groq = new Groq();
const response = await groq.chat.completions.create({
model: "moonshotai/kimi-k2-instruct-0905",
messages: [
{ role: "system", content: "Extract product review information from the text." },
{
role: "user",
content: "I bought the UltraSound Headphones last week and I'm really impressed! The noise cancellation is amazing and the battery lasts all day. Sound quality is crisp and clear. I'd give it4.5 out of5 stars.",
},
],
response_format: {
type: "json_schema",
json_schema: {
name: "product_review",
schema: {
type: "object",
properties: {
product_name: { type: "string" },
rating: { type: "number" },
sentiment: {
type: "string",
enum: ["positive", "negative", "neutral"]
},
key_features: {
type: "array",
items: { type: "string" }
}
},
required: ["product_name", "rating", "sentiment", "key_features"],
additionalProperties: false
}
}
}
});
const result = JSON.parse(response.choices[0].message.content || "{}");
console.log(result);
---
## Structured Outputs: Email Classification (py)
URL: https://console.groq.com/docs/structured-outputs/scripts/email-classification.py
from groq import Groq
from pydantic import BaseModel
import json
client = Groq()
class KeyEntity(BaseModel):
entity: str
type: str
class EmailClassification(BaseModel):
category: str
priority: str
confidence_score: float
sentiment: str
key_entities: list[KeyEntity]
suggested_actions: list[str]
requires_immediate_attention: bool
estimated_response_time: str
response = client.chat.completions.create(
model="moonshotai/kimi-k2-instruct-0905",
messages=[
{
"role": "system",
"content": "You are an email classification expert. Classify emails into structured categories with confidence scores, priority levels, and suggested actions.",
},
{"role": "user", "content": "Subject: URGENT: Server downtime affecting production\\n\\nHi Team,\\n\\nOur main production server went down at2:30 PM EST. Customer-facing services are currently unavailable. We need immediate action to restore services. Please join the emergency call.\\n\\nBest regards,\\nDevOps Team"},
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "email_classification",
"schema": EmailClassification.model_json_schema()
}
}
)
email_classification = EmailClassification.model_validate(json.loads(response.choices[0].message.content))
print(json.dumps(email_classification.model_dump(), indent=2))
---
## Structured Outputs: Api Response Validation (py)
URL: https://console.groq.com/docs/structured-outputs/scripts/api-response-validation.py
from groq import Groq
from pydantic import BaseModel
import json
client = Groq()
class ValidationResult(BaseModel):
is_valid: bool
status_code: int
error_count: int
class FieldValidation(BaseModel):
field_name: str
field_type: str
is_valid: bool
error_message: str
expected_format: str
class ComplianceCheck(BaseModel):
follows_rest_standards: bool
has_proper_error_handling: bool
includes_metadata: bool
class Metadata(BaseModel):
timestamp: str
request_id: str
version: str
class StandardizedResponse(BaseModel):
success: bool
data: dict
errors: list[str]
metadata: Metadata
class APIResponseValidation(BaseModel):
validation_result: ValidationResult
field_validations: list[FieldValidation]
data_quality_score: float
suggested_fixes: list[str]
compliance_check: ComplianceCheck
standardized_response: StandardizedResponse
response = client.chat.completions.create(
model="moonshotai/kimi-k2-instruct-0905",
messages=[
{
"role": "system",
"content": "You are an API response validation expert. Validate and structure API responses with error handling, status codes, and standardized data formats for reliable integration.",
},
{"role": "user", "content": "Validate this API response: {\"user_id\": \"12345\", \"email\": \"invalid-email\", \"created_at\": \"2024-01-15T10:30:00Z\", \"status\": \"active\", \"profile\": {\"name\": \"John Doe\", \"age\":25}}"},
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "api_response_validation",
"schema": APIResponseValidation.model_json_schema()
}
}
)
api_response_validation = APIResponseValidation.model_validate(json.loads(response.choices[0].message.content))
print(json.dumps(api_response_validation.model_dump(), indent=2))
---
## Structured Outputs: Api Response Validation (js)
URL: https://console.groq.com/docs/structured-outputs/scripts/api-response-validation
```javascript
import Groq from "groq-sdk";
const groq = new Groq();
const response = await groq.chat.completions.create({
model: "moonshotai/kimi-k2-instruct-0905",
messages: [
{
role: "system",
content: "You are an API response validation expert. Validate and structure API responses with error handling, status codes, and standardized data formats for reliable integration.",
},
{
role: "user",
content: "Validate this API response: {\"user_id\": \"12345\", \"email\": \"invalid-email\", \"created_at\": \"2024-01-15T10:30:00Z\", \"status\": \"active\", \"profile\": {\"name\": \"John Doe\", \"age\":25}}"
},
],
response_format: {
type: "json_schema",
json_schema: {
name: "api_response_validation",
schema: {
type: "object",
properties: {
validation_result: {
type: "object",
properties: {
is_valid: { type: "boolean" },
status_code: { type: "integer" },
error_count: { type: "integer" }
},
required: ["is_valid", "status_code", "error_count"],
additionalProperties: false
},
field_validations: {
type: "array",
items: {
type: "object",
properties: {
field_name: { type: "string" },
field_type: { type: "string" },
is_valid: { type: "boolean" },
error_message: { type: "string" },
expected_format: { type: "string" }
},
required: ["field_name", "field_type", "is_valid", "error_message", "expected_format"],
additionalProperties: false
}
},
data_quality_score: {
type: "number",
minimum: 0,
maximum: 1
},
suggested_fixes: {
type: "array",
items: { type: "string" }
},
compliance_check: {
type: "object",
properties: {
follows_rest_standards: { type: "boolean" },
has_proper_error_handling: { type: "boolean" },
includes_metadata: { type: "boolean" }
},
required: ["follows_rest_standards", "has_proper_error_handling", "includes_metadata"],
additionalProperties: false
},
standardized_response: {
type: "object",
properties: {
success: { type: "boolean" },
data: { type: "object" },
errors: {
type: "array",
items: { type: "string" }
},
metadata: {
type: "object",
properties: {
timestamp: { type: "string" },
request_id: { type: "string" },
version: { type: "string" }
},
required: ["timestamp", "request_id", "version"],
additionalProperties: false
}
},
required: ["success", "data", "errors", "metadata"],
additionalProperties: false
}
},
required: ["validation_result", "field_validations", "data_quality_score", "suggested_fixes", "compliance_check", "standardized_response"],
additionalProperties: false
}
}
}
});
const result = JSON.parse(response.choices[0].message.content || "{}");
console.log(result);
```
---
## Structured Outputs: Json Object Mode (py)
URL: https://console.groq.com/docs/structured-outputs/scripts/json-object-mode.py
from groq import Groq
import json
client = Groq()
def main():
response = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[
{
"role": "system",
"content": """You are a data analysis API that performs sentiment analysis on text.
Respond only with JSON using this format:
{
"sentiment_analysis": {
"sentiment": "positive|negative|neutral",
"confidence_score":0.95,
"key_phrases": [
{
"phrase": "detected key phrase",
"sentiment": "positive|negative|neutral"
}
],
"summary": "One sentence summary of the overall sentiment"
}
}"""
},
{
"role": "user",
"content": "Analyze the sentiment of this customer review: 'I absolutely love this product! The quality exceeded my expectations, though shipping took longer than expected.'"
}
],
response_format={"type": "json_object"}
)
result = json.loads(response.choices[0].message.content)
print(json.dumps(result, indent=2))
if __name__ == "__main__":
main()
---
## Structured Outputs: Sql Query Generation (py)
URL: https://console.groq.com/docs/structured-outputs/scripts/sql-query-generation.py
from groq import Groq
from pydantic import BaseModel
import json
client = Groq()
class ValidationStatus(BaseModel):
is_valid: bool
syntax_errors: list[str]
class SQLQueryGeneration(BaseModel):
query: str
query_type: str
tables_used: list[str]
estimated_complexity: str
execution_notes: list[str]
validation_status: ValidationStatus
response = client.chat.completions.create(
model="moonshotai/kimi-k2-instruct-0905",
messages=[
{
"role": "system",
"content": "You are a SQL expert. Generate structured SQL queries from natural language descriptions with proper syntax validation and metadata.",
},
{"role": "user", "content": "Find all customers who made orders over $500 in the last30 days, show their name, email, and total order amount"},
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "sql_query_generation",
"schema": SQLQueryGeneration.model_json_schema()
}
}
)
sql_query_generation = SQLQueryGeneration.model_validate(json.loads(response.choices[0].message.content))
print(json.dumps(sql_query_generation.model_dump(), indent=2))
---
## Structured Outputs: Appointment Booking Schema (json)
URL: https://console.groq.com/docs/structured-outputs/scripts/appointment-booking-schema.json
{
"name": "book_appointment",
"description": "Books a medical appointment",
"strict": true,
"schema": {
"type": "object",
"properties": {
"patient_name": {
"type": "string",
"description": "Full name of the patient"
},
"appointment_type": {
"type": "string",
"description": "Type of medical appointment",
"enum": ["consultation", "checkup", "surgery", "emergency"]
}
},
"additionalProperties": false,
"required": ["patient_name", "appointment_type"]
}
}
---
## Structured Outputs: Payment Method Schema (json)
URL: https://console.groq.com/docs/structured-outputs/scripts/payment-method-schema.json
{
"type": "object",
"properties": {
"payment_method": {
"anyOf": [
{
"type": "object",
"description": "Credit card payment information",
"properties": {
"card_number": {
"type": "string",
"description": "The credit card number"
},
"expiry_date": {
"type": "string",
"description": "Card expiration date in MM/YY format"
},
"cvv": {
"type": "string",
"description": "Card security code"
}
},
"additionalProperties": false,
"required": ["card_number", "expiry_date", "cvv"]
},
{
"type": "object",
"description": "Bank transfer payment information",
"properties": {
"account_number": {
"type": "string",
"description": "Bank account number"
},
"routing_number": {
"type": "string",
"description": "Bank routing number"
},
"bank_name": {
"type": "string",
"description": "Name of the bank"
}
},
"additionalProperties": false,
"required": ["account_number", "routing_number", "bank_name"]
}
]
}
},
"additionalProperties": false,
"required": ["payment_method"]
}
---
## Structured Outputs
URL: https://console.groq.com/docs/structured-outputs
# Structured Outputs
Guarantee model responses strictly conform to your JSON schema for reliable, type-safe data structures.
## Introduction
Structured Outputs is a feature that makes your model responses strictly conform to your provided [JSON Schema](https://json-schema.org/overview/what-is-jsonschema) or throws an error if the model cannot produce a compliant response. The endpoint provides customers with the ability to obtain reliable data structures.
This feature's performance is dependent on the model's ability to produce a valid answer that matches your schema. If the model fails to generate a conforming response, the endpoint will return an error rather than an invalid or incomplete result.
Key benefits:
1. **Binary output:** Either returns valid JSON Schema-compliant output or throws an error
2. **Type-safe responses:** No need to validate or retry malformed outputs
3. **Programmatic refusal detection:** Detect safety-based model refusals programmatically
4. **Simplified prompting:** No complex prompts needed for consistent formatting
In addition to supporting Structured Outputs in our API, our SDKs also enable you to easily define your schemas with [Pydantic](https://docs.pydantic.dev/latest/) and [Zod](https://zod.dev/) to ensure further type safety. The examples below show how to extract structured information from unstructured text.
## Supported models
Structured Outputs is available with the following models:
| Model ID | Model |
|---------------------------------|--------------------------------|
| openai/gpt-oss-20b | [GPT-OSS 20B](/docs/model/openai/gpt-oss-20b) |
| openai/gpt-oss-120b | [GPT-OSS 120B](/docs/model/openai/gpt-oss-120b) |
| moonshotai/kimi-k2-instruct-0905 | [Kimi K2 Instruct](/docs/model/moonshotai/kimi-k2-instruct-0905) |
| meta-llama/llama-4-maverick-17b-128e-instruct | [Llama 4 Maverick](/docs/model/meta-llama/llama-4-maverick-17b-128e-instruct) |
| meta-llama/llama-4-scout-17b-16e-instruct | [Llama 4 Scout](/docs/model/meta-llama/llama-4-scout-17b-16e-instruct) |
For all other models, you can use [JSON Object Mode](#json-object-mode) to get a valid JSON object, though it may not match your schema.
**Note:** [streaming](/docs/text-chat#streaming-a-chat-completion) and [tool use](/docs/tool-use) are not currently supported with Structured Outputs.
### Getting a structured response from unstructured text
### SQL Query Generation
You can generate structured SQL queries from natural language descriptions, helping ensure proper syntax and including metadata about the query structure.
**Example Output**
```json
{
"query": "SELECT c.name, c.email, SUM(o.total_amount) as total_order_amount FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.order_date >= DATE_SUB(NOW(), INTERVAL30 DAY) AND o.total_amount >500 GROUP BY c.customer_id, c.name, c.email ORDER BY total_order_amount DESC",
"query_type": "SELECT",
"tables_used": ["customers", "orders"],
"estimated_complexity": "medium",
"execution_notes": [
"Query uses JOIN to connect customers and orders tables",
"DATE_SUB function calculates30 days ago from current date",
"GROUP BY aggregates orders per customer",
"Results ordered by total order amount descending"
],
"validation_status": {
"is_valid": true,
"syntax_errors": []
}
}
```
### Email Classification
You can classify emails into structured categories with confidence scores, priority levels, and suggested actions.
**Example Output**
```json
{
"category": "urgent",
"priority": "critical",
"confidence_score":0.95,
"sentiment": "negative",
"key_entities": [
{
"entity": "production server",
"type": "system"
},
{
"entity": "2:30 PM EST",
"type": "datetime"
},
{
"entity": "DevOps Team",
"type": "organization"
},
{
"entity": "customer-facing services",
"type": "system"
}
],
"suggested_actions": [
"Join emergency call immediately",
"Escalate to senior DevOps team",
"Activate incident response protocol",
"Prepare customer communication",
"Monitor service restoration progress"
],
"requires_immediate_attention": true,
"estimated_response_time": "immediate"
}
```
### API Response Validation
You can validate and structure API responses with error handling, status codes, and standardized data formats for reliable integration.
**Example Output**
```json
{
"validation_result": {
"is_valid": false,
"status_code":400,
"error_count":2
},
"field_validations": [
{
"field_name": "user_id",
"field_type": "string",
"is_valid": true,
"error_message": "",
"expected_format": "string"
},
{
"field_name": "email",
"field_type": "string",
"is_valid": false,
"error_message": "Invalid email format",
"expected_format": "valid email address (e.g., user@example.com)"
}
],
"data_quality_score":0.7,
"suggested_fixes": [
"Fix email format validation to ensure proper email structure",
"Add proper error handling structure to response"
],
"compliance_check": {
"follows_rest_standards": false,
"has_proper_error_handling": false,
"includes_metadata": false
}
}
```
## Schema Validation Libraries
When working with Structured Outputs, you can use popular schema validation libraries like [Zod](https://zod.dev/) for TypeScript and [Pydantic](https://docs.pydantic.dev/latest/) for Python. These libraries provide type safety, runtime validation, and seamless integration with JSON Schema generation.
### Support Ticket Classification
This example demonstrates how to classify customer support tickets using structured schemas with both Zod and Pydantic, ensuring consistent categorization and routing.
**Example Output**
```json
{
"category": "feature_request",
"priority": "low",
"urgency_score":2.5,
"customer_info": {
"name": "Mike",
"company": "StartupXYZ",
"tier": "paid"
},
"technical_details": [
{
"component": "dashboard",
"description": "Request for dark mode feature"
},
{
"component": "user_interface",
"description": "Request for keyboard shortcuts"
}
],
"keywords": ["dark mode", "dashboard", "keyboard shortcuts", "enhancement"],
"requires_escalation": false,
"estimated_resolution_hours":40,
"summary": "Feature request for dark mode and keyboard shortcuts from paying customer"
}
```
## Implementation Guide
### Schema Definition
Design your JSON Schema to constrain model responses. Reference the [examples](#examples) above and see [supported schema features](#schema-requirements) for technical limitations.
**Schema optimization tips:**
- Use descriptive property names and clear descriptions for complex fields
- Create evaluation sets to test schema effectiveness
- Include titles for important structural elements
### API Integration
Include the schema in your API request using the `response_format` parameter:
```json
response_format: { type: "json_schema", json_schema: { name: "schema_name", schema: … } }
```
Complete implementation example:
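A minimal end-to-end sketch in Python (the model and schema here are illustrative; the pattern follows the examples above):
```python
from groq import Groq
import json

client = Groq()  # reads GROQ_API_KEY from the environment

# A small schema following the optimization tips above:
# descriptive property names and clear descriptions.
task_schema = {
    "type": "object",
    "properties": {
        "task_name": {"type": "string", "description": "Short task summary"},
        "due_date": {"type": "string", "description": "Due date in ISO format"},
    },
    "required": ["task_name", "due_date"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="moonshotai/kimi-k2-instruct-0905",
    messages=[
        {"role": "system", "content": "Extract the task and due date from the user's message."},
        {"role": "user", "content": "Remind me to file the expense report by Friday, June 6."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "task", "schema": task_schema},
    },
)

# The response is guaranteed to parse and match the schema (or the call errors).
task = json.loads(response.choices[0].message.content)
print(task["task_name"], "-", task["due_date"])
```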
### Error Handling
Schema validation failures return HTTP 400 errors with the message `Generated JSON does not match the expected schema. Please adjust your prompt.`
**Resolution strategies:**
- Retry requests for transient failures
- Refine prompts for recurring schema mismatches
- Simplify complex schemas if validation consistently fails
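A sketch of the retry strategy (the exception type below is an assumption; match it to the error your SDK version raises for HTTP 400 responses):
```python
import time
import groq

def create_with_retry(client: groq.Groq, max_attempts: int = 3, **request):
    """Retry transient schema-validation failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return client.chat.completions.create(**request)
        except groq.BadRequestError:  # assumed mapping for HTTP 400
            if attempt == max_attempts:
                # Recurring mismatch: refine the prompt or simplify the schema.
                raise
            time.sleep(2 ** attempt)  # simple backoff before retrying
```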
### Best Practices
**User input handling:** Include explicit instructions for invalid or incompatible inputs. Models attempt schema adherence even with unrelated data, potentially causing hallucinations. Specify fallback responses (empty fields, error messages) for incompatible inputs.
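For instance, a fallback instruction might look like this sketch (the exact convention is up to you; the prompt below is hypothetical):
```python
# Hypothetical system prompt that tells the model what to do with
# incompatible input instead of hallucinating schema-shaped data.
system_prompt = """Extract product details matching the provided schema.
If the input is not a product description, fill every string field with ""
and every array with [], and do not invent values."""
```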
**Output quality:** Structured Outputs guarantees schema compliance, not semantic accuracy. For persistent errors, refine instructions, add system message examples, or decompose complex tasks. See the [prompt engineering guide](/docs/prompting) for optimization techniques.
## Schema Requirements
Structured Outputs supports a [JSON Schema](https://json-schema.org/docs) subset with specific constraints for performance and reliability.
### Supported Data Types
- **Primitives:** String, Number, Boolean, Integer
- **Complex:** Object, Array, Enum
- **Composition:** anyOf (union types)
### Mandatory Constraints
**Required fields:** All schema properties must be marked as `required`. Optional fields are not supported.
```json
{
"type": "object",
"properties": {
"task_name": {"type": "string"},
"due_date": {"type": "string", "format": "date"}
},
"required": ["task_name", "due_date"]
}
```
**Closed objects:** All objects must set `additionalProperties: false` to prevent undefined properties.
```json
{
"type": "object",
"properties": {
"name": {"type": "string"},
"email": {"type": "string", "format": "email"}
},
"additionalProperties": false
}
```
**Union types:** Each schema within `anyOf` must comply with all subset restrictions:
```json
{
"anyOf": [
{
"type": "object",
"properties": {"payment_method": {"type": "string", "enum": ["credit_card", "paypal"]}},
"required": ["payment_method"]
},
{
"type": "object",
"properties": {"payment_method": {"type": "string", "enum": ["bank_transfer"]}},
"required": ["payment_method"]
}
]
}
```
**Reusable subschemas:** Define reusable components with `$defs` and reference them using `$ref`:
```json
{
"$defs": {
"address": {
"type": "object",
"properties": {"street": {"type": "string"}, "city": {"type": "string"}},
"required": ["street", "city"]
}
},
"type": "object",
"properties": {"billing_address": {"$ref": "#/$defs/address"}},
"required": ["billing_address"]
}
```
**Root recursion:** Use `#` to reference the root schema:
```json
{
"$ref": "#"
}
```
**Explicit recursion** through definition references:
```json
{
"$defs": {
"node": {
"type": "object",
"properties": {"name": {"type": "string"}, "children": {"type": "array", "items": {"$ref": "#/$defs/node"}}}
}
}
}
```
## JSON Object Mode
JSON Object Mode provides basic JSON output validation without schema enforcement. Unlike Structured Outputs with `json_schema` mode, it guarantees valid JSON syntax but not schema compliance. The endpoint will either return valid JSON or throw an error if the model cannot produce valid JSON syntax. Use [Structured Outputs](#introduction) when available for your use case.
Enable JSON Object Mode by setting `response_format` to `{ "type": "json_object" }`.
**Requirements and limitations:**
- Include explicit JSON instructions in your prompt (system message or user input)
- Outputs are syntactically valid JSON but may not match your intended schema
- Combine with validation libraries and retry logic for schema compliance
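For example, a minimal sketch of that validate-and-retry pattern with Pydantic (the model choice and retry count are illustrative):
```python
from groq import Groq
from pydantic import BaseModel, ValidationError
import json

class Sentiment(BaseModel):
    sentiment: str
    confidence_score: float

client = Groq()

def analyze(text: str, max_attempts: int = 3) -> Sentiment:
    for _ in range(max_attempts):
        response = client.chat.completions.create(
            model="llama-3.3-70b-versatile",
            messages=[
                {
                    "role": "system",
                    "content": 'Respond only with JSON like {"sentiment": "positive|negative|neutral", "confidence_score": 0.95}',
                },
                {"role": "user", "content": text},
            ],
            response_format={"type": "json_object"},
        )
        try:
            # Valid JSON syntax is guaranteed; schema compliance is not,
            # so validate with Pydantic and retry on mismatch.
            return Sentiment.model_validate(json.loads(response.choices[0].message.content))
        except ValidationError:
            continue
    raise RuntimeError("No schema-compliant response after retries")

print(analyze("I absolutely love this product!"))
```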
### Sentiment Analysis Example
This example shows prompt-guided JSON generation for sentiment analysis, adaptable to classification, extraction, or summarization tasks:
**Example Output**
```json
{
"sentiment_analysis": {
"sentiment": "positive",
"confidence_score":0.84,
"key_phrases": [
{
"phrase": "absolutely love this product",
"sentiment": "positive"
},
{
"phrase": "quality exceeded my expectations",
"sentiment": "positive"
}
],
"summary": "The reviewer loves the product's quality, but was slightly disappointed with the shipping time."
}
}
```
**Response structure:**
- **sentiment**: Classification (positive/negative/neutral)
- **confidence_score**: Confidence level (0-1 scale)
- **key_phrases**: Extracted phrases with individual sentiment scores
- **summary**: Analysis overview and main findings
---
## Code Execution: Gpt Oss Quickstart (js)
URL: https://console.groq.com/docs/code-execution/scripts/gpt-oss-quickstart
import Groq from "groq-sdk";
const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
const response = await groq.chat.completions.create({
messages: [
{
role: "user",
content: "Calculate the square root of12345. Output only the final answer.",
},
],
model: "openai/gpt-oss-20b", // or "openai/gpt-oss-120b"
tool_choice: "required",
tools: [
{
type: "code_interpreter"
},
],
});
// Final output
console.log(response.choices[0].message.content);
// Reasoning + internal tool calls
console.log(response.choices[0].message.reasoning);
// Code execution tool call
console.log(response.choices[0].message.executed_tools?.[0]);
---
## or "openai/gpt-oss-120b"
URL: https://console.groq.com/docs/code-execution/scripts/gpt-oss-quickstart.py
from groq import Groq
client = Groq(api_key="your-api-key-here")
response = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "Calculate the square root of12345. Output only the final answer.",
}
],
model="openai/gpt-oss-20b", # or "openai/gpt-oss-120b"
tool_choice="required",
tools=[
{
"type": "code_interpreter"
}
],
)
# Final output
print(response.choices[0].message.content)
# Reasoning + internal tool calls
print(response.choices[0].message.reasoning)
# Code execution tool calls
print(response.choices[0].message.executed_tools[0])
---
## Code Execution: Calculation (js)
URL: https://console.groq.com/docs/code-execution/scripts/calculation
import Groq from "groq-sdk";
const groq = new Groq();
const chatCompletion = await groq.chat.completions.create({
messages: [
{
role: "user",
content: "Calculate the monthly payment for a $30,000 loan over 5 years at 6% annual interest rate using the standard loan payment formula. Use python code.",
},
],
model: "groq/compound-mini",
});
console.log(chatCompletion.choices[0]?.message?.content || "");
---
## Code Execution: Quickstart (py)
URL: https://console.groq.com/docs/code-execution/scripts/quickstart.py
import os
from groq import Groq
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
response = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "Calculate the square root of101 and show me the Python code you used",
}
],
model="groq/compound-mini",
)
# Final output
print(response.choices[0].message.content)
# Reasoning + internal tool calls
print(response.choices[0].message.reasoning)
# Code execution tool call
if response.choices[0].message.executed_tools:
print(response.choices[0].message.executed_tools[0])
---
## Code Execution: Debugging (js)
URL: https://console.groq.com/docs/code-execution/scripts/debugging
import Groq from "groq-sdk";
const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
const chatCompletion = await groq.chat.completions.create({
messages: [
{
role: "user",
content: "Will this Python code raise an error? `import numpy as np; a = np.array([1,2]); b = np.array([3,4,5]); print(a + b)`",
},
],
model: "groq/compound-mini",
});
console.log(chatCompletion.choices[0]?.message?.content || "");
---
## Code Execution: Quickstart (js)
URL: https://console.groq.com/docs/code-execution/scripts/quickstart
import Groq from "groq-sdk";
const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
const response = await groq.chat.completions.create({
messages: [
{
role: "user",
content: "Calculate the square root of101 and show me the Python code you used",
},
],
model: "groq/compound-mini",
});
// Final output
console.log(response.choices[0].message.content);
// Reasoning + internal tool calls
console.log(response.choices[0].message.reasoning);
// Code execution tool call
console.log(response.choices[0].message.executed_tools?.[0]);
---
## Code Execution: Calculation (py)
URL: https://console.groq.com/docs/code-execution/scripts/calculation.py
```python
import os
from groq import Groq
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "Calculate the monthly payment for a $30,000 loan over5 years at6% annual interest rate using the standard loan payment formula. Use python code.",
}
],
model="groq/compound-mini",
)
print(chat_completion.choices[0].message.content)
```
---
## Code Execution: Debugging (py)
URL: https://console.groq.com/docs/code-execution/scripts/debugging.py
import os
from groq import Groq
client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "Will this Python code raise an error? `import numpy as np; a = np.array([1,2]); b = np.array([3,4,5]); print(a + b)`",
}
],
model="groq/compound-mini",
)
print(chat_completion.choices[0].message.content)
---
## Code Execution
URL: https://console.groq.com/docs/code-execution
# Code Execution
Some models and systems on Groq have native support for automatic code execution, allowing them to perform calculations, run code snippets, and solve computational problems in real-time.
Only Python is currently supported for code execution.
The use of this tool with a supported model or system in GroqCloud is not a HIPAA Covered Cloud Service under Groq's Business Associate Addendum at this time. This tool is also not available currently for use with regional / sovereign endpoints.
## Supported Models and Systems
Built-in code execution is supported for the following models and systems:
| Model ID | Model |
|---------------------------------|--------------------------------|
| `openai/gpt-oss-20b` | [OpenAI GPT-OSS 20B](/docs/model/openai/gpt-oss-20b) |
| `openai/gpt-oss-120b` | [OpenAI GPT-OSS 120B](/docs/model/openai/gpt-oss-120b) |
| `groq/compound` | [Compound](/docs/compound/systems/compound) |
| `groq/compound-mini` | [Compound Mini](/docs/compound/systems/compound-mini) |
For a comparison between the `groq/compound` and `groq/compound-mini` systems and more information regarding additional capabilities, see the [Compound Systems](/docs/compound/systems#system-comparison) page.
## Quick Start (Compound)
To use code execution with [Groq's Compound systems](/docs/compound), change the `model` parameter to one of the supported models or systems.
*And that's it!*
When the API is called, it will intelligently decide when to use code execution to best answer the user's query. Code execution is performed on the server side in a secure sandboxed environment, so no additional setup is required on your part.
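As a minimal sketch (Python SDK, using `groq/compound-mini` like the examples above; the prompt is illustrative):
```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

response = client.chat.completions.create(
    model="groq/compound-mini",  # the system decides when to run code
    messages=[{"role": "user", "content": "What is 2 to the power of 100?"}],
)
print(response.choices[0].message.content)
```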
### Final Output
This is the final response from the model, containing the answer based on code execution results. The model combines computational results with explanatory text to provide a comprehensive response. Use this as the primary output for user-facing applications.
The square root of 101 is:
10.04987562112089
Here is the Python code I used:
```python
import math
print("The square root of 101 is: ")
print(math.sqrt(101))
```
### Reasoning and Internal Tool Calls
This shows the model's internal reasoning process and the Python code it executed to solve the problem. You can inspect this to understand how the model approached the computational task and what code it generated. This is useful for debugging and understanding the model's decision-making process.
python(import math; print("The square root of 101 is: "); print(math.sqrt(101)))
### Executed Tools Information
This contains the raw executed tools data, including the generated Python code, execution output, and metadata. You can use this to access the exact code that was run and its results programmatically.
111.1080555135405112450044
### Reasoning and Internal Tool Calls
This shows the model's internal reasoning process and the Python code it executed to solve the problem. You can inspect this to understand how the model approached the computational task and what code it generated.
We need sqrt(12345). Compute. math.sqrt returns 111.1080555... Let's compute with precision. Let's get more precise. We didn't get output because decimal sqrt needs context. Let's compute. It didn't output because .sqrt() might not be available for Decimal? Actually Decimal has sqrt method? There is sqrt in Decimal from Python 3.11? Actually it's decimal.Decimal.sqrt() available. But maybe need import Decimal. Let's try. It outputs nothing? Actually maybe need to print.
### Executed Tools Information
This contains the raw executed tools data, including the generated Python code, execution output, and metadata. You can use this to access the exact code that was run and its results programmatically.
---
## LlamaIndex + Groq
URL: https://console.groq.com/docs/llama-index
[LlamaIndex](https://www.llamaindex.ai/) is a data framework for LLM-based applications that benefit from context augmentation, such as Retrieval-Augmented Generation (RAG) systems. LlamaIndex provides the essential abstractions to more easily ingest, structure, and access private or domain-specific data, resulting in safe and reliable injection into LLMs for more accurate text generation.
For more information, read the LlamaIndex Groq integration documentation for [Python](https://docs.llamaindex.ai/en/stable/examples/llm/groq.html) and [JavaScript](https://ts.llamaindex.ai/modules/llms/available_llms/groq).
---
## 🦜️🔗 LangChain + Groq
URL: https://console.groq.com/docs/langchain
## 🦜️🔗 LangChain + Groq
While you could use the Groq SDK directly, [LangChain](https://www.langchain.com/) is a framework that makes it easy to build sophisticated applications
with LLMs. Combined with Groq API for fast inference speed, you can leverage LangChain components such as:
- **Chains:** Compose multiple operations into a single workflow, connecting LLM calls, prompts, and tools together seamlessly (e.g., prompt → LLM → output parser)
- **Prompt Templates:** Easily manage your prompts and templates with pre-built structures to consistently format queries that can be reused across different models
- **Memory:** Add state to your applications by storing and retrieving conversation history and context
- **Tools:** Extend your LLM applications with external capabilities like calculations, external APIs, or data retrieval
- **Agents:** Create autonomous systems that can decide which tools to use and how to approach complex tasks
### Quick Start (3 minutes to hello world)
####1. Install the package:
```bash
pip install langchain-groq
```
####2. Set up your API key:
```bash
export GROQ_API_KEY="your-groq-api-key"
```
####3. Create your first LangChain assistant:
Running the below code will create a simple chain that calls a model to extract product information from text and output it
as structured JSON. The chain combines a prompt that tells the model what information to extract, a parser that ensures the output follows a
specific JSON format, and `llama-3.3-70b-versatile` to do the actual text processing.
```python
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser
import json
# Initialize Groq LLM
llm = ChatGroq(
model_name="llama-3.3-70b-versatile",
temperature=0.7
)
# Define the expected JSON structure
parser = JsonOutputParser(pydantic_object={
"type": "object",
"properties": {
"name": {"type": "string"},
"price": {"type": "number"},
"features": {
"type": "array",
"items": {"type": "string"}
}
}
})
# Create a simple prompt
prompt = ChatPromptTemplate.from_messages([
("system", """Extract product details into JSON with this structure:
{{
"name": "product name here",
"price": number_here_without_currency_symbol,
"features": ["feature1", "feature2", "feature3"]
}}"""),
("user", "{input}")
])
# Create the chain that guarantees JSON output
chain = prompt | llm | parser
def parse_product(description: str) -> dict:
result = chain.invoke({"input": description})
print(json.dumps(result, indent=2))
# Example usage
description = """The Kees Van Der Westen Speedster is a high-end, single-group espresso machine known for its precision, performance,
and industrial design. Handcrafted in the Netherlands, it features dual boilers for brewing and steaming, PID temperature control for
consistency, and a unique pre-infusion system to enhance flavor extraction. Designed for enthusiasts and professionals, it offers
customizable aesthetics, exceptional thermal stability, and intuitive operation via a lever system. The pricing is approximately $14,499
depending on the retailer and customization options."""
parse_product(description)
```
**Challenge:** Make the above code your own! Try extending it to include memory with conversation history handling via LangChain to enable
users to ask follow-up questions.
For more information on how to build robust, realtime applications with LangChain and Groq, see:
- [Official Documentation: LangChain](https://python.langchain.com/docs/integrations/chat/groq)
- [Groq API Cookbook: Benchmarking a RAG Pipeline with LangChain and LLama](https://github.com/groq/groq-api-cookbook/blob/main/tutorials/benchmarking-rag-langchain/benchmarking_rag.ipynb)
- [Webinar: Build Blazing-Fast LLM Apps with Groq, Langflow, & LangChain](https://youtu.be/4ukqsKajWnk?si=ebbbnFH0DySdoWbX)
---
## Agno + Groq: Fast Agents
URL: https://console.groq.com/docs/agno
## Agno + Groq: Fast Agents
[Agno](https://github.com/agno-agi/agno) is a lightweight framework for building multi-modal Agents. It's easy to use, extremely fast and supports multi-modal inputs and outputs.
With Groq & Agno, you can build:
- **Agentic RAG**: Agents that can search different knowledge stores for RAG or dynamic few-shot learning.
- **Image Agents**: Agents that can understand images and make tool calls accordingly.
- **Reasoning Agents**: Agents that can reason using a reasoning model, then generate a result using another model.
- **Structured Outputs**: Agents that can generate pydantic objects adhering to a schema.
### Python Quick Start (2 minutes to hello world)
Agents are autonomous programs that use language models to achieve tasks. They solve problems by running tools and accessing knowledge and memory to improve their responses.
Let's build a simple web search agent, with a tool to search DuckDuckGo to get better results.
####1. Create a file called `web_search_agent.py` and add the following code:
```python web_search_agent.py
from agno.agent import Agent
from agno.models.groq import Groq
from agno.tools.duckduckgo import DuckDuckGoTools
# Initialize the agent with an LLM via Groq and DuckDuckGoTools
agent = Agent(
model=Groq(id="llama-3.3-70b-versatile"),
description="You are an enthusiastic news reporter with a flair for storytelling!",
tools=[DuckDuckGoTools()], # Add DuckDuckGo tool to search the web
show_tool_calls=True, # Shows tool calls in the response, set to False to hide
markdown=True # Format responses in markdown
)
# Prompt the agent to fetch a breaking news story from New York
agent.print_response("Tell me about a breaking news story from New York.", stream=True)
```
####2. Set up and activate your virtual environment:
```shell
python3 -m venv .venv
source .venv/bin/activate
```
####3. Install the Groq, Agno, and DuckDuckGo dependencies:
```shell
pip install -U groq agno duckduckgo-search
```
####4. Configure your Groq API Key:
```bash
GROQ_API_KEY="your-api-key"
```
####5. Run your Agno agent, which now extends your LLM's context with web search for up-to-date information and returns results in seconds:
```shell
python web_search_agent.py
```
### Multi-Agent Teams
Agents work best when they have a singular purpose, a narrow scope, and a small number of tools. When the number of tools grows beyond what the language model can handle or the tools belong to different
categories, use a **team of agents** to spread the load.
The following code expands upon our quick start and creates a team of two agents to provide analysis on financial markets:
```python agent_team.py
from agno.agent import Agent
from agno.models.groq import Groq
from agno.tools.duckduckgo import DuckDuckGoTools
from agno.tools.yfinance import YFinanceTools
web_agent = Agent(
name="Web Agent",
role="Search the web for information",
model=Groq(id="llama-3.3-70b-versatile"),
tools=[DuckDuckGoTools()],
instructions="Always include sources",
markdown=True,
)
finance_agent = Agent(
name="Finance Agent",
role="Get financial data",
model=Groq(id="llama-3.3-70b-versatile"),
tools=[YFinanceTools(stock_price=True, analyst_recommendations=True, company_info=True)],
instructions="Use tables to display data",
markdown=True,
)
agent_team = Agent(
team=[web_agent, finance_agent],
model=Groq(id="llama-3.3-70b-versatile"), # You can use a different model for the team leader agent
instructions=["Always include sources", "Use tables to display data"],
# show_tool_calls=True, # Uncomment to see tool calls in the response
markdown=True,
)
# Give the team a task
agent_team.print_response("What's the market outlook and financial performance of AI semiconductor companies?", stream=True)
```
### Additional Resources
For additional documentation and support, see the following:
- [Agno Documentation](https://docs.agno.com)
- [Groq via Agno Documentation](https://docs.agno.com/models/groq)
- [Groq via Agno examples](https://docs.agno.com/examples/models/groq/basic)
- [Various industry-ready examples](https://docs.agno.com/examples/introduction)
---
## Reasoning: Reasoning Gpt Oss High (js)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_gpt-oss-high
```javascript
import { Groq } from 'groq-sdk';
const groq = new Groq();
const chatCompletion = await groq.chat.completions.create({
"messages": [
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
"model": "openai/gpt-oss-20b",
"reasoning_effort": "high",
"include_reasoning": true,
"stream": false
});
console.log(chatCompletion.choices[0].message);
```
---
## Reasoning: Reasoning Parsed (py)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_parsed.py
from groq import Groq
client = Groq()
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
model="qwen/qwen3-32b",
stream=False,
reasoning_format="parsed"
)
print(chat_completion.choices[0].message)
---
## Reasoning: Reasoning Hidden (py)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_hidden.py
from groq import Groq
client = Groq()
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
model="qwen/qwen3-32b",
stream=False,
reasoning_format="hidden"
)
print(chat_completion.choices[0].message)
---
## Reasoning: Reasoning Gpt Oss (py)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_gpt-oss.py
from groq import Groq
client = Groq()
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
model="openai/gpt-oss-20b",
stream=False
)
print(chat_completion.choices[0].message)
---
## Reasoning: Reasoning Gpt Oss (js)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_gpt-oss
import { Groq } from 'groq-sdk';
const groq = new Groq();
const chatCompletion = await groq.chat.completions.create({
"messages": [
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
"model": "openai/gpt-oss-20b",
"stream": false
});
console.log(chatCompletion.choices[0].message);
---
## Reasoning: Reasoning Gpt Oss Excl (js)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_gpt-oss-excl
import { Groq } from 'groq-sdk';
const groq = new Groq();
const chatCompletion = await groq.chat.completions.create({
"messages": [
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
"model": "openai/gpt-oss-20b",
"stream": false,
"include_reasoning": false
});
console.log(chatCompletion.choices[0].message);
---
## Reasoning: R1 (js)
URL: https://console.groq.com/docs/reasoning/scripts/r1
import Groq from 'groq-sdk';
const client = new Groq();
const completion = await client.chat.completions.create({
model: "openai/gpt-oss-20b",
messages: [
{
role: "user",
content: "How many r's are in the word strawberry?"
}
],
temperature: 0.6,
max_completion_tokens: 1024,
top_p: 0.95,
stream: true
});
for await (const chunk of completion) {
process.stdout.write(chunk.choices[0].delta.content || "");
}
---
## Reasoning: Reasoning Hidden (js)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_hidden
import { Groq } from 'groq-sdk';
const groq = new Groq();
const chatCompletion = await groq.chat.completions.create({
"messages": [
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
"model": "qwen/qwen3-32b",
"stream": false,
"reasoning_format": "hidden"
});
console.log(chatCompletion.choices[0].message);
---
## Reasoning: Reasoning Parsed (js)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_parsed
import { Groq } from 'groq-sdk';
const groq = new Groq();
const chatCompletion = await groq.chat.completions.create({
"messages": [
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
"model": "qwen/qwen3-32b",
"stream": false,
"reasoning_format": "parsed"
});
console.log(chatCompletion.choices[0].message);
---
## Reasoning: Reasoning Gpt Oss High (py)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_gpt-oss-high.py
from groq import Groq
client = Groq()
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
model="openai/gpt-oss-20b",
reasoning_effort="high",
include_reasoning=True,
stream=False
)
print(chat_completion.choices[0].message)
---
## Reasoning: Reasoning Raw (js)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_raw
import { Groq } from 'groq-sdk';
const groq = new Groq();
const chatCompletion = await groq.chat.completions.create({
"messages": [
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
"model": "qwen/qwen3-32b",
"stream": false,
"reasoning_format": "raw"
});
console.log(chatCompletion.choices[0].message);
---
## Reasoning: Reasoning Raw (py)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_raw.py
from groq import Groq
client = Groq()
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
model="qwen/qwen3-32b",
stream=False,
reasoning_format="raw"
)
print(chat_completion.choices[0].message)
---
## Reasoning: Reasoning Gpt Oss Excl (py)
URL: https://console.groq.com/docs/reasoning/scripts/reasoning_gpt-oss-excl.py
from groq import Groq
client = Groq()
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "How do airplanes fly? Be concise."
}
],
model="openai/gpt-oss-20b",
stream=False,
include_reasoning=False
)
print(chat_completion.choices[0].message)
---
## Reasoning: R1 (py)
URL: https://console.groq.com/docs/reasoning/scripts/r1.py
from groq import Groq
client = Groq()
completion = client.chat.completions.create(
model="openai/gpt-oss-20b",
messages=[
{
"role": "user",
"content": "How many r's are in the word strawberry?"
}
],
temperature=0.6,
max_completion_tokens=1024,
top_p=0.95,
stream=True
)
for chunk in completion:
print(chunk.choices[0].delta.content or "", end="")
---
## Reasoning
URL: https://console.groq.com/docs/reasoning
# Reasoning
Reasoning models excel at complex problem-solving tasks that require step-by-step analysis, logical deduction, structured thinking, and solution validation. With Groq inference speed, these types of models
can deliver instant reasoning capabilities critical for real-time applications.
## Why Speed Matters for Reasoning
Reasoning models produce explicit reasoning chains as part of their token output and use them for decision-making, which makes low-latency, fast inference essential.
Complex problems often require multiple chains of reasoning tokens, where each step builds on previous results. Low latency compounds across these chains, turning minutes of reasoning into a response delivered in seconds.
## Supported Models
| Model ID | Model |
|---------------------------------|--------------------------------|
| `openai/gpt-oss-20b` | [OpenAI GPT-OSS 20B](/docs/model/openai/gpt-oss-20b) |
| `openai/gpt-oss-120b` | [OpenAI GPT-OSS 120B](/docs/model/openai/gpt-oss-120b) |
| `qwen/qwen3-32b` | [Qwen3 32B](/docs/model/qwen3-32b) |
## Reasoning Format
Groq API supports explicit reasoning formats through the `reasoning_format` parameter, giving you fine-grained control over how the model's
reasoning process is presented. This is particularly valuable for producing valid JSON outputs, debugging, and understanding the model's decision-making process.
**Note:** The format defaults to `raw`, except when JSON mode or tool use is enabled, in which case it defaults to `parsed`, since those modes do not support `raw`. If reasoning is
explicitly set to `raw` with JSON mode or tool use enabled, the API returns a 400 error.
### Options for Reasoning Format
| `reasoning_format` Options | Description |
|------------------|------------------------------------------------------------|
| `parsed` | Separates reasoning into a dedicated `message.reasoning` field while keeping the response concise. |
| `raw` | Includes reasoning within `<think>` tags in the main text content. |
| `hidden` | Returns only the final answer. |
### Including Reasoning in the Response
You can also control whether reasoning is included in the response by setting the `include_reasoning` parameter.
| `include_reasoning` Options | Description |
|------------------|------------------------------------------------------------|
| `true` | Includes the reasoning in a dedicated `message.reasoning` field. This is the default behavior. |
| `false` | Excludes reasoning from the response. |
**Note:** The `include_reasoning` parameter cannot be used together with `reasoning_format`. These parameters are mutually exclusive.
## Reasoning Effort
### Options for Reasoning Effort (Qwen3 32B)
The `reasoning_effort` parameter controls the level of effort the model will put into reasoning. This is only supported by [Qwen3 32B](/docs/model/qwen3-32b).
| `reasoning_effort` Options | Description |
|------------------|------------------------------------------------------------|
| `none` | Disable reasoning. The model will not use any reasoning tokens. |
| `default` | Enable reasoning. |
### Options for Reasoning Effort (GPT-OSS)
The `reasoning_effort` parameter controls the level of effort the model will put into reasoning. This is only supported by [GPT-OSS 20B](/docs/model/openai/gpt-oss-20b) and [GPT-OSS 120B](/docs/model/openai/gpt-oss-120b).
| `reasoning_effort` Options | Description |
|------------------|------------------------------------------------------------|
| `low` | Low effort reasoning. The model will use a small number of reasoning tokens. |
| `medium` | Medium effort reasoning. The model will use a moderate number of reasoning tokens. |
| `high` | High effort reasoning. The model will use a large number of reasoning tokens. |
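As a minimal sketch (assuming the Python SDK), setting a low reasoning effort on a GPT-OSS model looks like this:
```python
from groq import Groq

client = Groq()
completion = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Summarize the rules of chess in one paragraph."}],
    reasoning_effort="low",  # "low" | "medium" | "high" for GPT-OSS models
)
print(completion.choices[0].message.content)
```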
## Quick Start
Get started with reasoning models using this basic example that demonstrates how to make a simple API call for complex problem-solving tasks.
```bash
curl https://api.groq.com/openai/v1/chat/completions -s \
-H "Authorization: Bearer $GROQ_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-oss-20b",
"messages": [
{
"role": "user",
"content": "What is the weather like in Paris today?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current temperature for a given location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and country e.g. Bogotá, Colombia"
}
},
"required": [
"location"
],
"additionalProperties": false
},
"strict": true
}
}
]}'
```
## Recommended Configuration Parameters
| Parameter | Default | Range | Description |
|-----------|---------|--------|-------------|
| `messages` | - | - | Array of message objects. Important: Avoid system prompts - include all instructions in the user message! |
| `temperature` | 0.6 | 0.0 - 2.0 | Controls randomness in responses. Lower values make responses more deterministic. Recommended range: 0.5-0.7 to prevent repetitive or incoherent outputs |
| `max_completion_tokens` | 1024 | - | Maximum length of model's response. Default may be too low for complex reasoning - consider increasing for detailed step-by-step solutions |
| `top_p` | 0.95 | 0.0 - 1.0 | Controls diversity of token selection |
| `stream` | false | boolean | Enables response streaming. Recommended for interactive reasoning tasks |
| `stop` | null | string/array | Custom stop sequences |
| `seed` | null | integer | Set for reproducible results. Important for benchmarking - run multiple tests with different seeds |
| `response_format` | `{type: "text"}` | `{type: "json_object"}` or `{type: "text"}` | Set to `json_object` type for structured output. |
| `reasoning_format` | `raw` | `"parsed"`, `"raw"`, `"hidden"` | Controls how model reasoning is presented in the response. Must be set to either `parsed` or `hidden` when using tool calls or JSON mode. |
| `reasoning_effort` | `default` | `"none"`, `"default"`, `"low"`, `"medium"`, `"high"` | Controls the level of effort the model will put into reasoning. `none` and `default` are only supported by [Qwen3 32B](/docs/model/qwen3-32b). `low`, `medium`, and `high` are only supported by [GPT-OSS 20B](/docs/model/openai/gpt-oss-20b) and [GPT-OSS 120B](/docs/model/openai/gpt-oss-120b). |
## Accessing Reasoning Content
Accessing the reasoning content in the response depends on the model and the reasoning format you are using. See the examples below for more details and refer to the [Reasoning Format](#reasoning-format) section for more information.
### Non-GPT-OSS Models
When using the `raw` reasoning format, the reasoning content is accessible in the main text content of assistant responses within `<think>` tags. Making a request with `reasoning_format` set to `raw` lets you see the model's internal thinking process alongside the final answer.
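As a sketch (assuming the Python SDK and that raw reasoning is wrapped in `<think>` tags as described above), you can split the raw content yourself:
```python
import re

from groq import Groq

client = Groq()
completion = client.chat.completions.create(
    model="qwen/qwen3-32b",  # assumption: a non-GPT-OSS reasoning model
    messages=[{"role": "user", "content": "Is 3599 prime?"}],
    reasoning_format="raw",
)

content = completion.choices[0].message.content
# Separate the <think> block from the final answer.
match = re.search(r"<think>(.*?)</think>\s*(.*)", content, re.DOTALL)
if match:
    reasoning, answer = match.groups()
    print("Reasoning:", reasoning.strip())
    print("Answer:", answer.strip())
else:
    print(content)
```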
When using `parsed` reasoning format, the model's reasoning is separated into a dedicated `reasoning` field, making it easier to access both the final answer and the thinking process programmatically. This format is ideal for applications that need to process or display reasoning content separately from the main response.
When using `hidden` reasoning format, only the final answer is returned without any visible reasoning content. This is useful for applications where you want the benefits of reasoning models but don't need to expose the thinking process to end users. The model will still reason, but the reasoning content will not be returned in the response.
### GPT-OSS Models
With `openai/gpt-oss-20b` and `openai/gpt-oss-120b`, the `reasoning_format` parameter is not supported.
By default, these models will include reasoning content in the `reasoning` field of the assistant response.
You can also control whether reasoning is included in the response by setting the `include_reasoning` parameter.
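A minimal sketch for GPT-OSS models (assuming the Python SDK exposes the `reasoning` field on the message object):
```python
from groq import Groq

client = Groq()
completion = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "How many r's are in the word strawberry?"}],
)

message = completion.choices[0].message
print(message.reasoning)  # reasoning content (returned by default)
print(message.content)    # final answer

# To omit reasoning entirely, pass include_reasoning=False instead.
```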
## Optimizing Performance
### Temperature and Token Management
The model performs best with temperature settings between 0.5 and 0.7, with lower values (closer to 0.5) producing more consistent mathematical proofs and higher values allowing for more creative problem-solving approaches. Monitor and adjust your token usage based on the complexity of your reasoning tasks - while the default `max_completion_tokens` is 1024, complex proofs may require higher limits.
### Prompt Engineering
To ensure accurate, step-by-step reasoning while maintaining high performance:
- DeepSeek-R1 works best when all instructions are included directly in user messages rather than system prompts.
- Structure your prompts to request explicit validation steps and intermediate calculations.
- Avoid few-shot prompting; use zero-shot prompting instead.
---
## 🎨 Gradio + Groq: Easily Build Web Interfaces
URL: https://console.groq.com/docs/gradio
## 🎨 Gradio + Groq: Easily Build Web Interfaces
[Gradio](https://www.gradio.app/) is a powerful library for creating web interfaces for your applications that enables you to quickly build
interactive demos for your fast Groq apps with features such as:
- **Interface Builder:** Create polished UIs with just a few lines of code, supporting text, images, audio, and more
- **Interactive Demos:** Build demos that showcase your LLM applications with multiple input/output components
- **Shareable Apps:** Deploy and share your Groq-powered applications with a single click
### Quick Start (2 minutes to hello world)
#### 1. Install the packages:
```bash
pip install groq-gradio
```
#### 2. Set up your API key:
```bash
export GROQ_API_KEY="your-groq-api-key"
```
#### 3. Create your first Gradio chat interface:
The following code creates a simple chat interface with `llama-3.3-70b-versatile` that includes a clean UI.
```python
import os

import gradio as gr
import groq_gradio
from groq import Groq

# Initialize Groq client
client = Groq(
    api_key=os.environ.get("GROQ_API_KEY")
)

gr.load(
    name='llama-3.3-70b-versatile',  # The specific model powered by Groq to use
    src=groq_gradio.registry,  # Tells Gradio to use our custom interface registry as the source
    title='Groq-Gradio Integration',  # The title shown at the top of our UI
    description="Chat with the Llama 3.3 70B model powered by Groq.",  # Subtitle
    examples=["Explain quantum gravity to a 5-year-old.", "How many R are there in the word Strawberry?"]  # Pre-written prompts users can click to try
).launch()  # Creates and starts the web server!
```
**Challenge**: Enhance the above example to create a multi-modal chatbot that leverages text, audio, and vision models powered by Groq and
displayed on a customized UI built with Gradio blocks!
For more information on building robust applications with Gradio and Groq, see:
- [Official Documentation: Gradio](https://www.gradio.app/docs)
- [Tutorial: Automatic Voice Detection with Groq](https://www.gradio.app/guides/automatic-voice-detection)
- [Groq API Cookbook: Groq and Gradio for Realtime Voice-Powered AI Applications](https://github.com/groq/groq-api-cookbook/blob/main/tutorials/groq-gradio/groq-gradio-tutorial.ipynb)
- [Webinar: Building a Multimodal Voice Enabled Calorie Tracking App with Groq and Gradio](https://youtu.be/azXaioGdm2Q?si=sXPJW1IerbghsCKU)
---
## Spend Limits
URL: https://console.groq.com/docs/spend-limits
# Spend Limits
Control your API costs with automated spending limits and proactive usage alerts when approaching budget thresholds.
## Quick Start
**Set a spending limit in 4 steps:**
1. Go to [**Settings** → **Billing** → **Limits**](/settings/billing/limits)
2. Click **Add Limit** and enter your monthly budget in USD
3. Add alert thresholds at 50%, 75%, and 90% of your limit
4. Click **Save** to activate the limit
**Requirements:** Paid tier account with organization owner permissions.
## How It Works
Spend limits automatically protect your budget by blocking API access when you reach your monthly cap. The limit applies organization-wide across all API keys, so usage from any team member or application counts toward the same shared limit. If you hit your set limit, API calls from any key in your organization will return a 400 with code `blocked_api_access`. Usage alerts notify you via email before you hit the limit, giving you time to adjust usage or increase your budget.
This feature offers:
- **Near real-time tracking:** Your current spend updates every 10-15 minutes
- **Automatic monthly reset:** Limits reset at the beginning of each billing cycle (1st of the month)
- **Immediate blocking:** API access is blocked when a spend update detects you've hit your limit
> ⚠️ **Important:** There's a 10-15 minute delay in spend tracking. This means you might exceed your limit by a small amount during high usage periods.
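If you want your application to fail gracefully when the limit is hit, here is a minimal sketch (assuming the Python SDK, which raises `BadRequestError` for 400 responses; matching on the `blocked_api_access` code in the error body is an assumption):
```python
import groq

client = groq.Groq()
try:
    completion = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(completion.choices[0].message.content)
except groq.BadRequestError as e:
    # Assumption: the 400 error body carries the `blocked_api_access` code
    # when the organization's spend limit has been reached.
    if "blocked_api_access" in str(e):
        print("Spend limit reached - pause requests until the limit resets or is raised.")
    else:
        raise
```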
## Setting Up Spending Limits
### Create a Spending Limit
Navigate to [**Settings** → **Billing** → **Limits**](/settings/billing/limits) and click **Add Limit**.
Example Monthly Spending Limit: $500
Your API requests will be blocked when you reach $500 in monthly usage. The limit resets automatically on the 1st of each month.
### Add Usage Alerts
Set up email notifications before you hit your limit:
Alert at $250 (50% of limit)
Alert at $375 (75% of limit)
Alert at $450 (90% of limit)
**To add an alert:**
1. Click **Add Alert** in the Usage Alerts section
2. Enter the USD amount trigger
3. Click **Save**
Alerts appear as visual markers on your spending progress bar on the Groq Console Limits page under Billing.
### Manage Your Alerts
- **Edit:** Click the pencil icon next to any alert
- **Delete:** Click the trash icon to remove an alert
- **Multiple alerts:** Add as many thresholds as needed
## Email Notifications
All spending alerts and limit notifications are sent from **support@groq.com** to your billing email addresses.
**Update billing emails:**
1. Go to [**Settings** → **Billing** → **Manage**](/settings/billing)
2. Add or update email addresses
3. Return to the Limits page to confirm the changes
**Pro tip:** Add multiple team members to billing emails so important notifications don't get missed.
## Best Practices
### Setting Effective Limits
- **Start conservative:** Set your first limit 20-30% above your expected monthly usage to account for variability.
- **Monitor patterns:** Review your usage for 2-3 months, then adjust limits based on actual consumption patterns.
- **Leave buffer room:** Don't set limits exactly at your expected usage—unexpected spikes can block critical API access.
- **Use multiple thresholds:** Set alerts at 50%, 75%, and 90% of your limit to get progressive warnings.
## Troubleshooting
### Can't Access the Limits Page?
- **Check your account tier:** Spending limits are only available on paid plans, not free tier accounts.
- **Verify permissions:** You need organization owner permissions to manage spending limits.
- **Feature availability:** Contact us via support@groq.com if you're on a paid tier but don't see the spending limits feature.
### Not Receiving Alert Emails?
- **Verify email addresses:** Check that your billing emails are correct in [**Settings** → **Billing** → **Manage**](/settings/billing).
- **Check spam folders:** Billing alerts might be filtered by your email provider.
- **Test notifications:** Set a low-dollar test alert to verify email delivery is working.
### API Access Blocked?
- **Check your spending status:** The [limits page](/settings/billing/limits) shows your current spend against your limit.
- **Increase your limit:** You can raise your spending limit at any time to restore immediate access if you've hit your spend limit. You can also remove it to unblock your API access immediately.
- **Wait for reset:** If you've hit your limit, API access will restore on the 1st of the next month.
## FAQ
**Q: Can I set different limits for different API endpoints or API keys?**
A: No, spending limits are organization-wide and apply to your total monthly usage across all API endpoints and all API keys in your organization. All team members and applications using your organization's API keys share the same spending limit.
**Q: What happens to in-flight requests when I hit my limit?**
A: In-flight requests complete normally, but new requests are blocked immediately.
**Q: Can I set weekly or daily spending limits?**
A: Currently, only monthly limits are supported. Limits reset on the 1st of each month.
**Q: How accurate is the spending tracking?**
A: Spending is tracked in near real-time with a 10-15 minute delay. The delay prevents brief usage spikes from prematurely triggering limits.
**Q: Can I temporarily disable my spending limit?**
A: Yes, you can edit or remove your spending limit at any time from the limits page.
Need help? Contact our support team at support@groq.com with details about your configuration and any error messages.
---
## Browser Search: Quickstart (py)
URL: https://console.groq.com/docs/browser-search/scripts/quickstart.py
```python
from groq import Groq

client = Groq()
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "What happened in AI last week? Give me a concise, one paragraph summary of the most important events."
        }
    ],
    model="openai/gpt-oss-20b",
    temperature=1,
    max_completion_tokens=2048,
    top_p=1,
    stream=False,
    stop=None,
    tool_choice="required",
    tools=[
        {
            "type": "browser_search"
        }
    ]
)
print(chat_completion.choices[0].message.content)
```
---
## Browser Search: Quickstart (js)
URL: https://console.groq.com/docs/browser-search/scripts/quickstart
```javascript
import { Groq } from 'groq-sdk';

const groq = new Groq();
const chatCompletion = await groq.chat.completions.create({
  "messages": [
    {
      "role": "user",
      "content": "What happened in AI last week? Give me a concise, one paragraph summary of the most important events."
    }
  ],
  "model": "openai/gpt-oss-20b",
  "temperature": 1,
  "max_completion_tokens": 2048,
  "top_p": 1,
  "stream": false,
  "reasoning_effort": "medium",
  "stop": null,
  "tool_choice": "required",
  "tools": [
    {
      "type": "browser_search"
    }
  ]
});
console.log(chatCompletion.choices[0].message.content);
```
---
## Browser Search
URL: https://console.groq.com/docs/browser-search
# Browser Search
Some models on Groq have built-in support for interactive browser search, providing a more comprehensive approach to accessing real-time web content than traditional web search. Unlike [Web Search](/docs/web-search) which performs a single search and retrieves text snippets from webpages, browser search mimics human browsing behavior by navigating websites interactively, providing more detailed results.
For latency-sensitive use cases, we recommend using [Web Search](/docs/web-search) instead.
The use of this tool with a supported model or system in GroqCloud is not a HIPAA Covered Cloud Service under Groq's Business Associate Addendum at this time. This tool is also not currently available for use with regional / sovereign endpoints.
## Supported Models
Built-in browser search is supported for the following models:
| Model ID | Model |
|---------------------------------|--------------------------------|
| `openai/gpt-oss-20b` | [OpenAI GPT-OSS 20B](/docs/model/openai/gpt-oss-20b) |
| `openai/gpt-oss-120b` | [OpenAI GPT-OSS 120B](/docs/model/openai/gpt-oss-120b) |
**Note:** Browser search is not compatible with [structured outputs](/docs/structured-outputs).
## Quick Start
To use browser search, change the `model` parameter to one of the supported models.
When the API is called, it will use browser search to best answer the user's query. This tool call is performed on the server side, so no additional setup is required on your part to use this feature.
### Final Output
This is the final response from the model, containing snippets from the web pages that were searched, and the final response at the end. The model combines information from multiple sources to provide a comprehensive response.
## Pricing
Please see the [Pricing](https://groq.com/pricing) page for more information.
## Best Practices
When using browser search with reasoning models, consider setting `reasoning_effort` to `low` to optimize performance and token usage. Higher reasoning effort levels can result in extended browser sessions with more comprehensive web exploration, which may consume significantly more tokens than necessary for most queries. Using `low` reasoning effort provides a good balance between search quality and efficiency.
## Provider Information
Browser search functionality is powered by [Exa](https://exa.ai/), a search engine designed for AI applications. Exa provides comprehensive web browsing capabilities that go beyond traditional search by allowing models to navigate and interact with web content in a more human-like manner.
---
## Prompting: Seed (js)
URL: https://console.groq.com/docs/prompting/scripts/seed
```javascript
import { Groq } from "groq-sdk"

const groq = new Groq()
const response = await groq.chat.completions.create({
  messages: [
    { role: "system", content: "You are a creative storyteller." },
    { role: "user", content: "Write a brief opening line to a mystery novel." }
  ],
  model: "llama-3.1-8b-instant",
  temperature: 0.8, // Some creativity allowed
  seed: 700, // Deterministic seed
  max_tokens: 50
});
console.log(response.choices[0].message.content)
```
---
## Using a custom stop sequence for structured, concise output.
URL: https://console.groq.com/docs/prompting/scripts/stop.py
```python
# Using a custom stop sequence for structured, concise output.
# The model is instructed to produce '###' at the end of the desired content.
# The API will stop generation when '###' is encountered and will NOT include '###' in the response.
from groq import Groq

client = Groq()
chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Provide a 2-sentence summary of the concept of 'artificial general intelligence'. End your summary with '###'."
        }
        # Model's goal before stop sequence removal might be:
        # "Artificial general intelligence (AGI) refers to a type of AI that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks at a level comparable to that of a human being. This contrasts with narrow AI, which is designed for specific tasks. ###"
    ],
    model="llama-3.1-8b-instant",
    stop=["###"],
    max_tokens=100  # Ensure enough tokens for the summary + stop sequence
)
print(chat_completion.choices[0].message.content)
```
---
## Prompting: Stop (js)
URL: https://console.groq.com/docs/prompting/scripts/stop
```javascript
// Using a custom stop sequence for structured, concise output.
// The model is instructed to produce '###' at the end of the desired content.
// The API will stop generation when '###' is encountered and will NOT include '###' in the response.
import { Groq } from "groq-sdk"

const groq = new Groq()
const response = await groq.chat.completions.create({
  messages: [
    {
      role: "user",
      content: "Provide a 2-sentence summary of the concept of 'artificial general intelligence'. End your summary with '###'."
    }
    // Model's goal before stop sequence removal might be:
    // "Artificial general intelligence (AGI) refers to a type of AI that possesses the ability to understand, learn, and apply knowledge across a wide range of tasks at a level comparable to that of a human being. This contrasts with narrow AI, which is designed for specific tasks. ###"
  ],
  model: "llama-3.1-8b-instant",
  stop: ["###"],
  max_tokens: 100 // Ensure enough tokens for the summary + stop sequence
});
console.log(response.choices[0].message.content)
```
---
## Prompting: Roles (js)
URL: https://console.groq.com/docs/prompting/scripts/roles
```javascript
import Groq from "groq-sdk";

const groq = new Groq();

const systemPrompt = `
You are a helpful IT support chatbot for 'Tech Solutions'.
Your role is to assist employees with common IT issues, provide guidance on using company software, and help troubleshoot basic technical problems.
Respond clearly and patiently. If an issue is complex, explain that you will create a support ticket for a human technician.
Keep responses brief and ask a maximum of one question at a time.
`;

const completion = await groq.chat.completions.create({
  messages: [
    {
      role: "system",
      content: systemPrompt,
    },
    {
      role: "user",
      content: "My monitor isn't turning on.",
    },
    {
      role: "assistant",
      content: "Let's try to troubleshoot. Is the monitor properly plugged into a power source?",
    },
    {
      role: "user",
      content: "Yes, it's plugged in."
    }
  ],
  model: "openai/gpt-oss-20b",
});
console.log(completion.choices[0]?.message?.content);
```
---
## Prompting: Roles (py)
URL: https://console.groq.com/docs/prompting/scripts/roles.py
```python
from groq import Groq

client = Groq()

system_prompt = """
You are a helpful IT support chatbot for 'Tech Solutions'.
Your role is to assist employees with common IT issues, provide guidance on using company software, and help troubleshoot basic technical problems.
Respond clearly and patiently. If an issue is complex, explain that you will create a support ticket for a human technician.
Keep responses brief and ask a maximum of one question at a time.
"""

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": system_prompt,
        },
        {
            "role": "user",
            "content": "My monitor isn't turning on.",
        },
        {
            "role": "assistant",
            "content": "Let's try to troubleshoot. Is the monitor properly plugged into a power source?",
        },
        {
            "role": "user",
            "content": "Yes, it's plugged in."
        }
    ],
    model="llama-3.3-70b-versatile",
)
print(chat_completion.choices[0].message.content)
```
---
## Some creativity allowed
URL: https://console.groq.com/docs/prompting/scripts/seed.py
```python
from groq import Groq

client = Groq()
chat_completion = client.chat.completions.create(
    messages=[
        { "role": "system", "content": "You are a creative storyteller." },
        { "role": "user", "content": "Write a brief opening line to a mystery novel." }
    ],
    model="llama-3.1-8b-instant",
    temperature=0.8,  # Some creativity allowed
    seed=700,  # Deterministic seed
    max_tokens=100
)
print(chat_completion.choices[0].message.content)
```
---
## Prompt Engineering Patterns Guide
URL: https://console.groq.com/docs/prompting/patterns
# Prompt Engineering Patterns Guide
This guide provides a systematic approach to selecting appropriate prompt patterns for various tasks when working with open-source language models. Implementing the correct pattern significantly improves output reliability and performance.
## Why Patterns Matter
Prompt patterns serve distinct purposes in language model interactions:
- **Zero shot** provides instructions without examples, relying on the model's existing knowledge.
- **Few shot** demonstrates specific examples for the model to follow as templates.
- **Chain of Thought** breaks complex reasoning into sequential steps for methodical problem-solving.
Selecting the appropriate pattern significantly improves output accuracy, consistency, and reliability across applications.
## Pattern Chooser Table
The table below helps you quickly identify the most effective prompt pattern for your specific task, matching common use cases with optimal approaches to maximize model performance.
| Task Type | Recommended Pattern | Why it works |
| --- | --- | --- |
| Simple Q&A, definitions | [**Zero shot**](#zero-shot) | Model already knows; instructions suffice |
| Extraction / classification | [**Few shot (1-3 samples)**](#one-shot--few-shot) | Teaches exact labels & JSON keys |
| Creative writing | [**Zero shot + role**](#zero-shot) | Freedom + persona = coherent style |
| Multi-step math / logic | [**Chain of Thought**](#chain-of-thought) | Forces stepwise reasoning |
| Edge-case heavy tasks | [**Few shot (2-5 samples)**](#one-shot--few-shot) | Covers exceptions & rare labels |
| Mission-critical accuracy | [**Guided CoT + Self Consistency**](#guided-cot--self-consistency) | Multiple reasoned paths to a consensus |
| Tool-use / knowledge-heavy tasks | [**ReAct (Reasoning + Acting)**](#react-reasoning-and-acting) | Thinks, calls tools, repeats for grounded solutions. |
| Concise yet comprehensive summarization | [**Chain of Density (CoD)**](#chain-of-density-cod) | Stepwise compression keeps essentials, cuts fluff. |
| Accuracy-critical facts | [**Chain of Verification (CoVe)**](#chain-of-verification-cove) | Asks and answers its own checks, then fixes. |
## Customer Support Ticket Processing Use Case
Throughout this guide, we'll use the practical example of automating customer support ticket processing. This enterprise-relevant use case demonstrates how different prompt patterns can improve:
- Initial ticket triage and categorization
- Issue urgency assessment
- Information extraction from customer communications
- Resolution suggestions and draft responses
- Ticket summarization for team handoffs
Using AI to enhance support ticket processing can reduce agent workload, accelerate response times, ensure consistent handling, and enable better tracking of common issues. Each prompt pattern offers distinct advantages for specific aspects of the support workflow.
## Zero Shot
Zero shot prompting tells a large-language model **exactly what you want without supplying a single demonstration**. The model leans on the general-purpose knowledge it absorbed during pre-training to infer the right output. You provide instructions but no examples, allowing the model to apply its existing understanding to the task.
### When to use
| Use case | Why Zero Shot works |
| --- | --- |
| **Sentiment classification** | Model has seen millions of examples during training; instructions suffice |
| **Basic information extraction** (e.g., support ticket triage) | Simple extraction of explicit data points requires minimal guidance |
| **Urgent support ticket assessment** | Clear indicators of urgency are typically explicit in customer language |
| **Standard content formatting** | Straightforward style adjustments like formalization or simplification |
| **Language translation** | Well-established task with clear inputs and outputs |
| **Content outlines and summaries** | Follows common structural patterns; benefits from brevity |
### Support Ticket Zero Shot Example
This example demonstrates using zero shot prompting to quickly analyze a customer support ticket for essential information.
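A minimal sketch of the idea (the model name and ticket text are illustrative assumptions):
```python
from groq import Groq

client = Groq()

ticket = (
    "Hi, I can't log into my account since this morning. I keep getting an "
    "'invalid password' error even after resetting it. I need this fixed ASAP - "
    "I have a client demo at 2pm!"
)

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumption: any capable instruction-tuned model
    messages=[{
        "role": "user",
        "content": f"Classify this support ticket's category (billing, technical, account) "
                   f"and urgency (low, medium, high). Return only JSON with keys "
                   f"'category' and 'urgency'.\n\nTicket: {ticket}",
    }],
    temperature=0,  # deterministic labels
)
print(completion.choices[0].message.content)
```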
## One Shot & Few Shot
A **one shot prompt** includes exactly one worked example; a **few shot prompt** provides several (typically 3-8) examples. Both rely on the model's in-context learning to imitate the demonstrated input-to-output mapping. Because the demonstrations live in the prompt, you get the benefits of "training" without fine-tuning: you can swap tasks or tweak formats instantly by editing examples.
### When to use
| Use case | Why One/Few Shot works |
| --- | --- |
| **Structured output (JSON, SQL, XML)** | Examples nail the exact keys, quoting, or delimiters you need |
| **Support ticket categorization** with nuanced or custom labels | A few examples teach proper categorization schemes specific to your organization |
| **Domain-specific extraction** from technical support tickets | Demonstrations anchor the terminology and extraction patterns |
| **Edge-case handling** for unusual tickets | Show examples of tricky inputs to teach disambiguation strategies |
| **Consistent formatting** of support responses | Examples ensure adherence to company communication standards |
| **Custom urgency criteria** based on business rules | Examples demonstrate how to apply organization-specific Service Level Agreement (SLA) definitions |
### Support Ticket Few Shot Example
This example demonstrates using few shot prompting to extract detailed, structured information from support tickets according to a specific schema.
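A sketch of the approach (the example tickets, schema, and model name are illustrative assumptions):
```python
from groq import Groq

client = Groq()

few_shot_prompt = """Extract 'product', 'issue', and 'urgency' from support tickets as JSON.

Ticket: "Dashboard v2.3 crashes every time I export a report. Blocking our month-end close."
Output: {"product": "Dashboard v2.3", "issue": "crash on report export", "urgency": "high"}

Ticket: "Minor typo on the billing page - 'recieve' should be 'receive'."
Output: {"product": "billing page", "issue": "typo", "urgency": "low"}

Ticket: "The mobile app logs me out every few minutes since the last update."
Output:"""

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumption
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0,  # keep keys and labels deterministic
)
print(completion.choices[0].message.content)
```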
## Chain of Thought
Chain of Thought (CoT) is a prompt engineering technique that explicitly instructs the model to think through a problem step-by-step before producing the answer. In its simplest form you add a phrase like **"Let's think step by step."** This cue triggers the model to emit a sequence of reasoning statements (the "chain") followed by a conclusion. Zero shot CoT works effectively on arithmetic and commonsense questions, while few shot CoT supplies handcrafted exemplars for more complex domains.
### When to use
| Problem type | Why CoT helps |
| --- | --- |
| **Math & logic word problems** | Forces explicit arithmetic steps |
| **Multi-hop Q&A / retrieval** | Encourages sequential evidence gathering |
| **Complex support ticket analysis** | Breaks down issue diagnosis into logical components |
| **Content plans & outlines** | Structures longform content creation |
| **Policy / safety analysis** | Documents each step of reasoning for transparency |
| **Ticket priority determination** | Systematically assesses impact, urgency, and SLA considerations |
### Support Ticket Chain of Thought Example
This example demonstrates using CoT to systematically analyze a customer support ticket to extract detailed information and make reasoned judgments about the issue.
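A minimal sketch (the ticket scenario and model name are illustrative assumptions):
```python
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumption
    messages=[{
        "role": "user",
        "content": (
            "A customer reports that scheduled exports stopped running after they "
            "changed their workspace timezone. Think step by step: identify the "
            "likely cause, the affected component, and the priority, then state "
            "your final recommendation."
        ),
    }],
)
print(completion.choices[0].message.content)
```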
## Guided CoT & Self Consistency
Guided CoT provides a structured outline of reasoning steps for the model to follow. Rather than letting the model determine its own reasoning path, you explicitly define the analytical framework.
Self-Consistency replaces standard decoding in CoT with a sample-and-majority-vote strategy: the same CoT prompt is run multiple times with a higher temperature, the answer from each chain is extracted, then the most common answer is returned as the final result.
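A minimal Self-Consistency sketch (assuming answers can be compared as plain strings; the model name and question are illustrative assumptions):
```python
from collections import Counter

from groq import Groq

client = Groq()

QUESTION = (
    "A ticket was opened at 1:10 pm with a 2-hour SLA. It is now 2:55 pm. "
    "How many minutes remain before the SLA is breached?"
)

def ask() -> str:
    completion = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # assumption
        messages=[{
            "role": "user",
            "content": QUESTION + " Think step by step, then put the final answer "
                                  "on the last line as 'Answer: <number>'.",
        }],
        temperature=0.8,  # higher temperature diversifies the reasoning paths
    )
    text = completion.choices[0].message.content
    # Pull the final answer line out of the reasoning chain.
    for line in reversed(text.splitlines()):
        if line.strip().lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return text.strip()

# Sample several chains and return the majority answer.
answers = [ask() for _ in range(5)]
answer, votes = Counter(answers).most_common(1)[0]
print(f"{answer} ({votes}/5 chains agree)")
```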
### When to use
| Use case | Why it works |
| --- | --- |
| **Support ticket categorization** with complex business rules | Guided CoT ensures consistent application of classification criteria |
| **SLA breach determination** with multiple factors | Self-Consistency reduces calculation errors in deadline computations |
| **Risk assessment** of customer issues | Multiple reasoning paths help identify edge cases in potential impact analysis |
| **Customer sentiment analysis** in ambiguous situations | Consensus across multiple paths provides more reliable interpretation |
| **Root cause analysis** for technical issues | Guided steps ensure thorough investigation across all system components |
| **Draft response generation** for sensitive issues | Self-Consistency helps avoid inappropriate or inadequate responses |
## ReAct (Reasoning and Acting)
ReAct (Reasoning and Acting) is a prompt pattern that instructs an LLM to generate two interleaved streams:
1. **Thought / reasoning trace** - natural-language reflection on the current state
2. **Action** - a structured command that an external tool executes (e.g., `Search[query]`, `Calculator[expression]`, or `Call_API[args]`) followed by the tool's observation
Because the model can observe the tool's response and continue thinking, it forms a closed feedback loop. The model assesses the situation, takes an action to gather information, processes the results, and repeats if necessary.
### When to use
| Use case | Why ReAct works |
| --- | --- |
| **Support ticket triage requiring contextual knowledge** | Enables lookup of error codes, known issues, and solutions |
| **Ticket analysis needing real-time status checks** | Can verify current system status and outage information |
| **SLA calculation and breach determination** | Performs precise time calculations with Python execution |
| **Customer history-enriched responses** | Retrieves customer context from knowledge bases or documentation |
| **Technical troubleshooting with diagnostic tools** | Runs diagnostic scripts and interprets results |
| **Product-specific error resolution** | Searches documentation and knowledge bases for specific error codes |
## Chain of Verification (CoVe)
Chain of Verification (CoVe) prompting turns the model into its own fact-checker. It follows a four-phase process: first writing a draft analysis, then planning targeted verification questions, answering those questions independently to avoid bias, and finally producing a revised, "verified" response. This technique can reduce error rates significantly across knowledge-heavy tasks while adding only one extra round-trip latency.
### When to use
| Use case | Why CoVe works |
| --- | --- |
| **Support ticket categorization auditing** | Verifies proper categorization through targeted questions |
| **SLA calculation verification** | Double-checks time calculations and policy interpretation |
| **Technical troubleshooting validation** | Confirms logical connections between symptoms and causes |
| **Customer response quality assurance** | Ensures completeness and accuracy of draft responses |
| **Incident impact assessment** | Validates estimates of business impact through specific questions |
| **Error code interpretation** | Cross-checks error code explanations against known documentation |
## Chain of Density (CoD)
Chain of Density (CoD) is an iterative summarization technique that begins with a deliberately entity-sparse draft and progressively adds key entities while maintaining a fixed length. In each round, the model identifies 1-3 new entities it hasn't mentioned, then rewrites the summary, compressing existing text to make room for them. After several iterations, the summary achieves a higher entity-per-token density, reducing lead bias and often matching or exceeding human summaries in informativeness.
### When to use
| Use case | Why CoD works |
| --- | --- |
| **Support ticket executive summaries** | Creates highly informative briefs within strict length limits |
| **Agent handover notes** | Ensures all critical details are captured in a concise format |
| **Knowledge base entry creation** | Progressively incorporates technical details without increasing length |
| **Customer communication summaries** | Balances completeness with brevity for customer record notes |
| **SLA/escalation notifications** | Packs essential details into notification character limits |
| **Support team daily digests** | Summarizes multiple tickets with key details for management review |
---
## Model Migration Guide
URL: https://console.groq.com/docs/prompting/model-migration
# Model Migration Guide
Migrating prompts from commercial models (GPT, Claude, Gemini) to open-source ones like Llama often requires explicitly including instructions that might have been implicitly handled in proprietary systems. This migration typically involves adjusting prompting techniques to be more explicit, matching generation parameters, and testing outputs so you can iteratively adjust prompts until you reach the desired results.
## Migration Principles
1. **Surface hidden rules:** Proprietary model providers prepend their closed-source models with system messages that are not explicitly shared with the end user; you must create clear system messages to get consistent outputs.
2. **Start from parity, not aspiration:** Match parameters such as temperature, Top P, and max tokens first, then focus on adjusting your prompts.
3. **Automate the feedback loop:** We recommend using open-source tooling like prompt optimizers instead of manual trial-and-error.
## Aligning System Behavior and Tone
Closed-source models are often prepended with elaborate system prompts that enforce politeness, hedging, legal disclaimers, policies, and more, none of which are shown to the end user. To ensure consistency and lead open-source models to generate desired outputs, create a comprehensive system prompt.
## Sampling / Parameter Parity
No matter which model you're migrating from, having explicit control over temperature and other sampling parameters matters a lot. First, determine what temperature your source model defaults to (often 1.0). Then experiment to find what works best for your specific use case - many Llama deployments see better results with temperatures between 0.2-0.4. The key is to start with parity, measure the results, then adjust deliberately:
| Parameter | Closed-Source Models | Llama Models | Suggested Adjustments |
| --- | --- | --- | --- |
| `temperature` | 1.0 | 0.7 | Lower for factual answers and strict schema adherence (e.g., JSON) |
| `top_p` | 1.0 | 1.0 | Leave at 1.0 |
## Refactoring Prompts
In some cases, you'll need to refactor your prompts to use explicit [Prompt Patterns](/docs/prompting/patterns) since different models have varying pre- and post-training that can affect how they function. For example:
- Some models, such as [those that can reason](/docs/reasoning), might naturally break down complex problems, while others may need explicit instructions to "think step by step" using [Chain of Thought](/docs/prompting/patterns#chain-of-thought) prompting
- Where some models automatically verify facts, others might need [Chain of Verification](/docs/prompting/patterns#chain-of-verification-cove) to achieve similar accuracy
- When certain models explore multiple solution paths by default, you can achieve similar results with [Self-Consistency](/docs/prompting/patterns#self-consistency) voting across multiple completions
The key is being more explicit about the reasoning process you want. Instead of:
"Calculate the compound interest over 5 years"
Use:
"Let's solve this step by step:
1. First, write out the compound interest formula
2. Then, plug in our values
3. Calculate each year's interest separately
4. Sum the total and verify the math"
This explicit guidance helps open models match the sophisticated reasoning that closed models learn through additional training.
### Migrating from Claude (Anthropic)
Claude models from Anthropic are known for their conversational abilities, safety features, and detailed reasoning. When migrating from Claude to an open-source model like Llama, create a system prompt with the following instructions to maintain similar behavior:
| Instruction | Description |
| --- | --- |
| Set a clear persona | "I am a helpful, multilingual, and proactive assistant ready to guide this conversation." |
| Specify tone & style | "Be concise and warm. Avoid bullet or numbered lists unless explicitly requested." |
| Limit follow-up questions | "Ask at most one concise clarifying question when needed." |
| Embed reasoning directive | "For tasks that need analysis, think step-by-step in a Thought: section, then provide Answer: only." |
| Insert counting rule | "Enumerate each item with #1, #2 ... before giving totals." |
| Provide a brief accuracy notice | "Information on niche or very recent topics may be incomplete—verify externally." |
| Define refusal template | "If a request breaches guidelines, reply: 'I'm sorry, but I can't help with that.'" |
| Mirror user language | "Respond in the same language the user uses." |
| Reinforce empathy | "Express sympathy when the user shares difficulties; maintain a supportive tone." |
| Control token budget | Keep the final system block under 2,000 tokens to preserve user context. |
| Web search | Use [Agentic Tooling](/docs/agentic-tooling) for built-in web search. |
### Migrating from Grok (xAI)
Grok models from xAI are known for their conversational abilities, real-time knowledge, and engaging personality. When migrating from Grok to an open-source model like Llama, create a system prompt with the following instructions to maintain similar behavior:
| Instruction | Description |
| --- | --- |
| Language parity | "Detect the user's language and respond in the same language." |
| Structured style | "Write in short paragraphs; use numbered or bulleted lists for multiple points." |
| Formatting guard | "Do not output Markdown (or only the Markdown elements you permit)." |
| Length ceiling | "Keep the answer below 750 characters" and enforce `max_completion_tokens` in the API call. |
| Epistemic stance | "Adopt a neutral, evidence-seeking tone; challenge unsupported claims; express uncertainty when facts are unclear." |
| Draft-versus-belief rule | "Treat any supplied analysis text as provisional research, not as established fact." |
| No meta-references | "Do not mention the question, system instructions, tool names, or platform branding in the reply." |
| Real-time knowledge | Use [Agentic Tooling](/docs/agentic-tooling) for built-in web search. |
### Migrating from OpenAI
OpenAI models like GPT-4o are known for their versatility, tool use capabilities, and conversational style. When migrating from OpenAI models to open-source alternatives like Llama, include these key instructions in your system prompt:
| Instruction | Description |
| --- | --- |
| Define a flexible persona | "I am a helpful, adaptive assistant that mirrors your tone and formality throughout our conversation." |
| Add tone-mirroring guidance | "I will adjust my vocabulary, sentence length, and formality to match your style throughout our conversation." |
| Set follow-up-question policy | "When clarification is useful, I'll ask exactly one short follow-up question; otherwise, I'll answer directly." |
| Describe tool-usage rules (if using tools) | "I can use tools like search and code execution when needed, preferring search for factual queries and code execution for computational tasks." |
| State visual-aid preference | "I'll offer diagrams when they enhance understanding" |
| Limit probing | "I won't ask for confirmation after every step unless instructions are ambiguous." |
| Embed safety | "My answers must respect local laws and organizational policies; I'll refuse prohibited content." |
| Web search | Use [Agentic Tooling](/docs/agentic-tooling) for built-in web search capabilities |
| Code execution | Use [Agentic Tooling](/docs/agentic-tooling) for built-in code execution capabilities. |
| Tool use | Select a model that supports [tool use](/docs/tool-use). |
### Migrating from Gemini (Google)
When migrating from Gemini to an open-source model like Llama, include these key instructions in your system prompt:
| Instruction | Description |
| --- | --- |
| State the role plainly | Start with one line: "You are a concise, professional assistant." |
| Re-encode rules | Convert every MUST/SHOULD from the original into numbered bullet rules, each one sentence long. |
| Define [tool use](/docs/tool-use) | Add a short Tools section listing tool names and required JSON structure; provide one sample call. |
| Specify tone & length | Include explicit limits (e.g., "less than 150 words unless code is required; formal international English"). |
| Self-check footer | End with "Before sending, ensure JSON validity, correct tag usage, no system text leakage." |
| Content-block guidance | Define how rich output should be grouped: for example, Markdown headings for text, fenced blocks for code. |
| Behaviour checklist | Include numbered, one-sentence rules covering length limits, formatting, and answer structure. |
| Prefer brevity | Remind the model to keep explanations brief and omit library boilerplate unless explicitly requested. |
| Web search and grounding | Use [Agentic Tooling](/docs/agentic-tooling) for built-in web search and grounding capabilities.|
## Tooling: llama-prompt-ops
[**llama-prompt-ops**](https://github.com/meta-llama/llama-prompt-ops) auto-rewrites prompts created for GPT / Claude into Llama-optimized phrasing, adjusting spacing, quotes, and special tokens.
Why use it?
- **Drop-in CLI:** feed a JSONL file of prompts and expected responses; get a better prompt with improved success rates.
- **Regression mode:** runs your golden set and reports win/loss vs baseline.
Install once (`pip install llama-prompt-ops`) and run during CI to keep prompts tuned as models evolve.
---
## Prompt Basics
URL: https://console.groq.com/docs/prompting
# Prompt Basics
Prompting is the methodology through which we communicate instructions, parameters, and expectations to large language models. Consider a prompt as a detailed specification document provided to the model: the more precise and comprehensive the specifications, the higher the quality of the output. This guide establishes the fundamental principles for crafting effective prompts for open-source instruction-tuned models, including Llama, Deepseek, and Gemma.
## Why Prompts Matter
Large language models require clear direction to produce optimal results. Without precise instructions, they may produce inconsistent outputs. Well-structured prompts provide several benefits:
- **Reduce development time** by minimizing iterations needed for acceptable results.
- **Enhance output consistency** to ensure responses meet validation requirements without modification.
- **Optimize resource usage** by maintaining efficient context window utilization.
## Prompt Building Blocks
Most high-quality prompts contain five elements: **role, instructions, context, input, expected output**.
| Element | What it does |
| --- | --- |
| **Role** | Sets persona or expertise ("You are a data analyst…") |
| **Instructions** | Bullet-proof list of required actions |
| **Context** | Background knowledge or reference material |
| **Input** | The data or question to transform |
| **Expected Output** | Schema or miniature example to lock formatting |
### Real-world use case
Here's a real-world example demonstrating how these prompt building blocks work together to extract structured data from an email. Each element plays a crucial role in ensuring accurate, consistent output:
1. **System** - fixes the model's role so it doesn't add greetings or extra formatting.
2. **Instructions** - lists the exact keys; pairing this with [JSON mode](/docs/structured-outputs#json-object-mode) or [tool use](/docs/tool-use) further guarantees parseable output.
3. **Context** - gives domain hints ("Deliver to", postcode format) that raise extraction accuracy without extra examples.
4. **Input** - the raw e-mail; keep original line breaks so the model can latch onto visual cues.
5. **Example Output** - a miniature few-shot sample that locks the reply shape to one JSON object.
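A sketch of how those five blocks might be assembled into a single request (the email text, keys, and model name are illustrative assumptions):
```python
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumption
    messages=[
        {
            # Role: fix the persona so the model adds no greetings or extra prose.
            "role": "system",
            "content": "You are a data-entry assistant. Return only a single JSON object, no extra prose.",
        },
        {
            # Instructions + context + input + expected output, in one message.
            "role": "user",
            "content": """Extract the delivery details from the email below.
Context: 'Deliver to' lines contain the address; UK postcodes look like 'SW1A 1AA'.
Return JSON with exactly these keys: "name", "address", "postcode".

Email:
Hi team,
Please send the replacement keyboard.
Deliver to: Jane Doe, 10 Downing Street, London SW1A 2AA
Thanks!

Example output:
{"name": "Ada Lovelace", "address": "12 Example Road, London", "postcode": "EC1A 1BB"}""",
        },
    ],
    temperature=0,  # deterministic keys/values
)
print(completion.choices[0].message.content)
```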
## Role Channels
Most chat-style APIs expose **three channels**:
| Channel | Typical Use |
| --- | --- |
| `system` | High-level persona & non-negotiable rules ("You are a helpful financial assistant."). |
| `user` | The actual request or data, such as a user's message in a chat. |
| `assistant` | The model's response. In multi-turn conversations, the assistant role can be used to track the conversation history. |
The following example demonstrates how to implement a customer service chatbot using role channels. Role channels provide a structured way for the model to maintain context and generate contextually appropriate responses throughout the conversation.
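A minimal sketch of such a chatbot loop (the model name and persona are illustrative assumptions):
```python
from groq import Groq

client = Groq()

# The system channel sets the persona; user/assistant turns track the history.
history = [
    {"role": "system", "content": "You are a helpful customer service agent for Acme Co. Be brief and friendly."},
]

while True:
    user_input = input("You: ")
    if not user_input:
        break
    history.append({"role": "user", "content": user_input})
    completion = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # assumption
        messages=history,
    )
    reply = completion.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    print("Bot:", reply)
```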
## Prompt Priming
Prompt priming is the practice of giving the model an **initial block of instructions or context** that influences every downstream token the model generates. Think of it as "setting the temperature of the conversation room" before anyone walks in. This usually lives in the **system** message; in single-shot prompts it's the first paragraph you write. Unlike one- or few-shot demos, priming does not need examples; the power comes from describing roles ("You are a medical billing expert"), constraints ("never reveal PII"), or seed knowledge ("assume the user's database is Postgres 16").
### Why it Works
Large language models generate text by conditioning on **all previous tokens**, weighting earlier tokens more heavily than later ones. By positioning high-leverage tokens (role, style, rules) first, priming biases the probability distribution over next tokens toward answers that respect that frame.
### Example (Primed Chat)
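A sketch of a primed conversation, reusing the priming elements described above (the model name and user question are illustrative assumptions):
```python
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # assumption
    messages=[
        {
            # Priming block: role, constraints, and seed knowledge stated up front.
            "role": "system",
            "content": "You are a medical billing expert. Never reveal PII. Assume the user's database is Postgres 16.",
        },
        {"role": "user", "content": "How should I store claim adjustment reason codes?"},
    ],
)
print(completion.choices[0].message.content)
```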
## Core Principles
1. **Lead with the must-do.** Put critical instructions first; the model weighs early tokens more heavily.
2. **Show, don't tell.** A one-line schema or table example beats a paragraph of prose.
3. **State limits explicitly.** Use "Return **only** JSON" or "less than 75 words" to eliminate chatter.
4. **Use plain verbs.** "Summarize in one bullet per metric" is clearer than "analyze."
5. **Chunk long inputs.** Delimit data with triple backticks or `<<< … >>>` so the model sees clear boundaries.
## Context Budgeting
While many models can handle up to **128K** tokens (or more), using a longer system prompt still costs latency and money. While you might be able to fit a lot of information in the model's context window, it could increase latency and reduce the model's accuracy. As a best practice, only include what is needed for the model to generate the desired response in the context.
## Quick Prompting Wins
Try these **10-second tweaks** before adding examples or complex logic:
| Quick Fix | Outcome |
| --- | --- |
| Add a one-line persona (*"You are a veteran copy editor."*) | Sharper, domain-aware tone |
| Show a mini output sample (one-row table / tiny JSON) | Increased formatting accuracy |
| Use numbered steps in instructions | Reduces rambling in answers |
| Add "no extra prose" at the end | Stops model from adding greetings or apologies |
## Common Mistakes to Avoid
Review these recommended practices and solutions to avoid common prompting issues.
| Common Mistake | Result | Solution |
| --- | --- | --- |
| **Hidden ask** buried mid-paragraph | Model ignores it | Move all instructions to top bullet list |
| **Over-stuffed context** | Truncated or slow responses | Summarize, remove old examples |
| **Ambiguous verbs** (*"analyze"*) | Vague output | Be explicit (*"Summarize in one bullet per metric"*) |
| **Partial JSON keys** in sample | Model hallucinates extra keys | Show the **full** schema, even if brief |
## Parameter Tuning
Optimize model outputs by configuring key parameters like temperature and top-p. These settings control the balance between deterministic and creative responses, with recommended values based on your specific use case.
| Parameter | What it does | Safe ranges | Typical use |
| --- | --- | --- | --- |
| **Temperature** | Global randomness (higher = more creative) | 0 - 1.0 | 0 - 0.3 facts, 0.7 - 0.9 creative |
| **Top-p** | Keeps only the top p cumulative probability mass - use this or temperature, not both | 0.5 - 1.0 | 0.9 facts, 1.0 creative |
| **Top-k** | Limits to the k highest-probability tokens | 20 - 100 | Rarely needed; try k = 40 for deterministic extraction |
### Quick presets
The following are recommended values to set temperature or top-p to (but not both) for various use cases:
| Scenario | Temp | Top-p | Comments |
| --- | --- | --- | --- |
| Factual Q&A | 0.2 | 0.9 | Keeps dates & numbers stable |
| Data extraction (JSON) | 0.0 | 0.9 | Deterministic keys/values |
| Creative copywriting | 0.8 | 1.0 | Vivid language, fresh ideas |
| Brainstorming list | 0.7 | 0.95 | Variety without nonsense |
| Long-form code | 0.3 | 0.85 | Fewer hallucinated APIs |
## Controlling Length & Cost
The following are recommended settings for controlling token usage and costs with length limits, stop sequences, and deterministic outputs.
| Setting | Purpose | Tip |
| --- | --- | --- |
| `max_completion_tokens` | Hard cap on completion size | Set 10-20% above ideal answer length |
| Stop sequences | Early stop when model hits token(s) | Use `"###"` or another delimiter |
| System length hints | "less than 75 words" or "return only table rows" | Model respects explicit numbers |
| `seed` | Controls randomness deterministically | Use same seed for consistent outputs across runs |
## Guardrails & Safety
Good prompts set the rules; dedicated guardrail models enforce them. [Meta's **Llama Guard 4**](/docs/content-moderation) is designed to sit in front of, or behind, your main model, classifying prompts or outputs for safety violations (hate, self-harm, private data). Integrating a moderation step can cut violation rates without changing your core prompt structure.
## Next Steps
Ready to level up? Explore dedicated [**prompt patterns**](/docs/prompting/patterns) like zero-shot, one-shot, few-shot, chain-of-thought, and more to match the pattern to your task complexity. From there, iterate and refine to improve your prompts.
---
## Google Cloud Private Service Connect
URL: https://console.groq.com/docs/security/gcp-private-service-connect
## Google Cloud Private Service Connect
Private Service Connect (PSC) enables you to access Groq's API services through private network connections, eliminating exposure to the public internet. This guide explains how to set up Private Service Connect for secure access to Groq services.
### Overview
Groq exposes its API endpoints in Google Cloud Platform as PSC _published services_. By configuring PSC endpoints, you can:
- Access Groq services through private IP addresses within your VPC
- Eliminate public internet exposure
- Maintain strict network security controls
- Minimize latency
- Reduce data transfer costs
```ascii
Your App (your VPC)
   |
   v
PSC endpoint - private IP 10.0.0.x (your VPC)
   |
   v
Private Service Connect (Google Cloud)
   |
   v
Internal load balancer (Groq network)
   |
   v
Groq API Service (Groq network)

DNS resolution in your VPC: api.groq.com -> 10.0.0.x
```
### Prerequisites
- A Google Cloud project with [Private Service Connect enabled](https://cloud.google.com/vpc/docs/configure-private-service-connect-consumer)
- VPC network where you want to create the PSC endpoint
- Appropriate IAM permissions to create PSC endpoints and DNS zones
- Enterprise plan with Groq
- You have provided Groq with your GCP Project ID
- Groq has accepted your GCP Project ID for Private Service Connect
### Setup
The steps below use the `us` region as an example. Make sure you configure your system according to the region(s) you want to use.
#### 1. Connect an endpoint
1. Navigate to **Network services** > **Private Service Connect** in your Google Cloud Console
2. Go to the **Endpoints** section and click **Connect endpoint**
* Under **Target**, select _Published service_
* For **Target service**, enter a [published service](#published-services) target name.
* For **Endpoint name**, enter a descriptive name (e.g., `groq-api-psc`)
* Select your desired **Network** and **Subnetwork**
* For **IP address**, create and select an internal IP from your subnet
* Enable **Global access** if you need to connect from multiple regions
3. Click **Add endpoint** and verify the status shows as _Accepted_
#### 2. Configure Private DNS
1. Go to **Network services** > **Cloud DNS** in your Google Cloud Console
2. Create the first zone for groq.com:
* Click **Create zone**
* Set **Zone type** to _Private_
* Enter a descriptive **Zone name** (e.g., `groq-api-private`)
* For **DNS name**, enter `groq.com.`
* Create an `A` record:
* **DNS name**: `api`
* **Resource record type**: `A`
* Enter your PSC endpoint IP address
* Link the private zone to your VPC network
3. Create the second zone for groqcloud.com:
* Click **Create zone**
* Set **Zone type** to _Private_
* Enter a descriptive **Zone name** (e.g., `groqcloud-api-private`)
* For **DNS name**, enter `groqcloud.com.`
* Create an `A` record:
* **DNS name**: `api.us`
* **Resource record type**: `A`
* Enter your PSC endpoint IP address
* Link the private zone to your VPC network
#### 3. Validate the Connection
To verify your setup:
1. SSH into a VM in your VPC network
2. Test DNS resolution for both endpoints:
```bash
dig +short api.groq.com
dig +short api.us.groqcloud.com
```
Both should return your PSC endpoint IP address
3. Test API connectivity (using either endpoint):
```bash
curl -i https://api.groq.com
# or
curl -i https://api.us.groqcloud.com
```
Should return a successful response through your private connection
### Published Services
| Service | PSC Target Name | Private DNS Names |
|---------|----------------|-------------------|
| API | projects/groq-pe/regions/me-central2/serviceAttachments/groqcloud | api.groq.com, api.me-central-1.groqcloud.com |
| API | projects/groq-pe/regions/us-central1/serviceAttachments/groqcloud | api.groq.com, api.us.groqcloud.com |
### Troubleshooting
If you encounter connectivity issues:
1. Verify DNS resolution is working correctly for both domains
2. Check that your security groups and firewall rules allow traffic to the PSC endpoint
3. Ensure your service account has the necessary permissions
4. Verify the PSC endpoint status is _Accepted_
5. Confirm the model you are requesting is operating in the target region
### Alerting
To monitor and alert on an unexpected change in connectivity status for the PSC endpoint, use a [Google Cloud log-based alerting policy](https://cloud.google.com/logging/docs/alerting/log-based-alerts).
Below is an example alert policy that notifies the given channel when a connection's status becomes _Closed_. A closed connection requires manual intervention to reconnect the endpoint.
```hcl
resource "google_monitoring_alert_policy" "groq_psc" {
  display_name = "Groq - Private Service Connect"
  combiner     = "OR"

  conditions {
    display_name = "Connection Closed"
    condition_matched_log {
      filter = <<-EOF
        resource.type="gce_forwarding_rule"
        protoPayload.methodName="LogPscConnectionStatusUpdate"
        protoPayload.metadata.pscConnectionStatus="CLOSED"
      EOF
    }
  }

  notification_channels = [google_monitoring_notification_channel.my_alert_channel.id]
  severity              = "CRITICAL"

  alert_strategy {
    notification_prompts = ["OPENED"]
    notification_rate_limit {
      period = "600s"
    }
  }

  documentation {
    mime_type = "text/markdown"
    subject   = "Groq forwarding rule was unexpectedly closed"
    content   = <<-EOF
      Forwarding rule $${resource.label.forwarding_rule_id} was unexpectedly closed. Please contact Groq Support (support@groq.com) for remediation steps.
      - **Project**: $${project}
      - **Alert Policy**: $${policy.display_name}
      - **Condition**: $${condition.display_name}
    EOF
    links {
      display_name = "Dashboard"
      url          = "https://console.cloud.google.com/net-services/psc/list/consumers?project=${var.project_id}"
    }
  }
}
```
### Further Reading
- [Google Cloud Private Service Connect Documentation](https://cloud.google.com/vpc/docs/private-service-connect)
---
## Integrations: Button Group (tsx)
URL: https://console.groq.com/docs/integrations/button-group
## Button Group
The button group component is used to display a collection of buttons in a grid layout. It accepts an array of button objects, each with properties for title, description, href, iconSrc, iconDarkSrc, and color.
### Button Group Properties
* **buttons**: An array of objects, each representing a button with the following properties:
+ **title**: The title of the button.
+ **description**: A brief description of the button.
+ **href**: The link URL for the button.
+ **iconSrc**: The URL of the button's icon.
+ **iconDarkSrc**: The URL of the button's dark mode icon (optional).
+ **color**: The color of the button (optional).
### Usage
To use the button group component, simply pass an array of button objects to it.
---
## Integrations: Integration Buttons (ts)
URL: https://console.groq.com/docs/integrations/integration-buttons
import type { IntegrationButton } from "./button-group";
type IntegrationGroup =
| "ai-agent-frameworks"
| "browser-automation"
| "llm-app-development"
| "observability"
| "llm-code-execution"
| "ui-and-ux"
| "tool-management"
| "real-time-voice";
export const integrationButtons: Record<IntegrationGroup, IntegrationButton[]> = {
"ai-agent-frameworks": [
{
title: "Agno",
description:
"Agno is a lightweight library for building Agents with memory, knowledge, tools and reasoning.",
href: "/docs/agno",
iconSrc: "/integrations/agno_black.svg",
iconDarkSrc: "/integrations/agno_white.svg",
color: "gray",
},
{
title: "AutoGen",
description:
"AutoGen is a framework for building conversational AI systems that can operate autonomously or collaborate with humans and other agents.",
href: "/docs/autogen",
iconSrc: "/integrations/autogen.svg",
color: "gray",
},
{
title: "CrewAI",
description:
"CrewAI is a framework for orchestrating role-playing AI agents that work together to accomplish complex tasks.",
href: "/docs/crewai",
iconSrc: "/integrations/crewai.png",
color: "gray",
},
{
title: "xRx",
description:
"xRx is a reactive AI agent framework for building reliable and observable LLM agents with real-time feedback.",
href: "/docs/xrx",
iconSrc: "/integrations/xrx.png",
color: "gray",
},
],
"browser-automation": [
{
title: "Anchor Browser",
description:
"Anchor Browser is a browser automation platform that allows you to automate workflows for web applications that lack APIs or have limited API coverage.",
href: "/docs/anchorbrowser",
iconSrc: "/integrations/anchorbrowser.png",
color: "gray",
},
],
"llm-app-development": [
{
title: "LangChain",
description:
"LangChain is a framework for developing applications powered by language models through composability.",
href: "/docs/langchain",
iconSrc: "/integrations/langchain_black.png",
iconDarkSrc: "/integrations/langchain_white.png",
color: "gray",
},
{
title: "LlamaIndex",
description:
"LlamaIndex is a data framework for building LLM applications with context augmentation over external data.",
href: "/docs/llama-index",
iconSrc: "/integrations/llamaindex_black.png",
iconDarkSrc: "/integrations/llamaindex_white.png",
color: "gray",
},
{
title: "LiteLLM",
description:
"LiteLLM is a library that standardizes LLM API calls and provides robust tracking, fallbacks, and observability for LLM applications.",
href: "/docs/litellm",
iconSrc: "/integrations/litellm.png",
color: "gray",
},
{
title: "Vercel AI SDK",
description:
"Vercel AI SDK is a typescript library for building AI-powered applications in modern frontend frameworks.",
href: "/docs/ai-sdk",
iconSrc: "/vercel-integration.png",
color: "gray",
},
],
observability: [
{
title: "Arize",
description:
"Arize is an observability platform for monitoring, troubleshooting, and explaining LLM applications.",
href: "/docs/arize",
iconSrc: "/integrations/arize_phoenix.png",
color: "gray",
},
{
title: "MLflow",
description:
"MLflow is an open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking and model deployment.",
href: "/docs/mlflow",
iconSrc: "/integrations/mlflow-white.svg",
iconDarkSrc: "/integrations/mlflow-black.svg",
color: "gray",
},
],
"llm-code-execution": [
{
title: "E2B",
description:
"E2B provides secure sandboxed environments for LLMs to execute code and use tools in a controlled manner.",
href: "/docs/e2b",
iconSrc: "/integrations/e2b_black.png",
iconDarkSrc: "/integrations/e2b_white.png",
color: "gray",
},
],
"ui-and-ux": [
{
title: "FlutterFlow",
description:
"FlutterFlow is a visual development platform for building high-quality, custom, cross-platform apps with AI capabilities.",
href: "/docs/flutterflow",
iconSrc: "/integrations/flutterflow_black.png",
iconDarkSrc: "/integrations/flutterflow_white.png",
color: "gray",
},
{
title: "Gradio",
description:
"Gradio is a Python library for quickly creating customizable UI components for machine learning models and LLM applications.",
href: "/docs/gradio",
iconSrc: "/integrations/gradio.svg",
color: "gray",
},
],
"tool-management": [
{
title: "Composio",
description:
"Composio is a platform for managing and integrating tools with LLMs and AI agents for seamless interaction with external applications.",
href: "/docs/composio",
iconSrc: "/integrations/composio_black.png",
iconDarkSrc: "/integrations/composio_white.png",
color: "gray",
},
{
title: "JigsawStack",
description:
"JigsawStack is a powerful AI SDK that integrates into any backend, automating tasks using LLMs with features like Mixture-of-Agents approach.",
href: "/docs/jigsawstack",
iconSrc: "/integrations/jigsaw.svg",
color: "gray",
},
{
title: "Toolhouse",
description:
"Toolhouse is a tool management platform that helps developers organize, secure, and scale tool usage across AI agents.",
href: "/docs/toolhouse",
iconSrc: "/integrations/toolhouse.svg",
color: "gray",
},
],
"real-time-voice": [
{
title: "LiveKit",
description:
"LiveKit provides text-to-speech and real-time communication features that complement Groq's speech recognition for end-to-end AI voice applications.",
href: "/docs/livekit",
iconSrc: "/integrations/livekit_white.svg",
color: "gray",
},
],
};
---
## What are integrations?
URL: https://console.groq.com/docs/integrations
# What are integrations?
Integrations are a way to connect your application to external services and enhance your Groq-powered applications with additional capabilities.
Browse the categories below to find integrations that suit your needs.
## AI Agent Frameworks
Create autonomous AI agents that can perform complex tasks, reason, and collaborate effectively using Groq's fast inference capabilities.
## Browser Automation
Automate browser interactions, perform complex tasks, and instantly turn any browser-based task into an API endpoint with models via Groq.
## LLM App Development
Build powerful LLM applications with these frameworks and libraries that provide essential tools for working with Groq models.
## Observability and Monitoring
Track, analyze, and optimize your LLM applications with these integrations that provide insights into model performance and behavior.
## LLM Code Execution and Sandboxing
Enable secure code execution in controlled environments for your AI applications with these integrations.
## UI and UX
Create beautiful and responsive user interfaces for your Groq-powered applications with these UI frameworks and tools.
## Tool Management
Manage and orchestrate tools for your AI agents, enabling them to interact with external services and perform complex tasks.
## Real-time Voice
Build voice-enabled applications that leverage Groq's fast inference for natural and responsive conversations.
---
## Quickstart: Performing Chat Completion (py)
URL: https://console.groq.com/docs/quickstart/scripts/performing-chat-completion.py
```python
import os
from groq import Groq
client = Groq(
api_key=os.environ.get("GROQ_API_KEY"),
)
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "Explain the importance of fast language models",
}
],
model="llama-3.3-70b-versatile",
)
print(chat_completion.choices[0].message.content)
```
---
## Quickstart: Performing Chat Completion (js)
URL: https://console.groq.com/docs/quickstart/scripts/performing-chat-completion
import Groq from "groq-sdk";
const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
export async function main() {
const chatCompletion = await getGroqChatCompletion();
// Print the completion returned by the LLM.
console.log(chatCompletion.choices[0]?.message?.content || "");
}
export async function getGroqChatCompletion() {
return groq.chat.completions.create({
messages: [
{
role: "user",
content: "Explain the importance of fast language models",
},
],
model: "openai/gpt-oss-20b",
});
}
---
## Quickstart: Quickstart Ai Sdk (js)
URL: https://console.groq.com/docs/quickstart/scripts/quickstart-ai-sdk
import Groq from "groq-sdk";
const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
export async function main() {
const chatCompletion = await getGroqChatCompletion();
// Print the completion returned by the LLM.
console.log(chatCompletion.choices[0]?.message?.content || "");
}
export async function getGroqChatCompletion() {
return groq.chat.completions.create({
messages: [
{
role: "user",
content: "Explain the importance of fast language models",
},
],
model: "openai/gpt-oss-20b",
});
}
---
## Quickstart: Performing Chat Completion (json)
URL: https://console.groq.com/docs/quickstart/scripts/performing-chat-completion.json
{
"messages": [
{
"role": "user",
"content": "Explain the importance of fast language models"
}
],
"model": "llama-3.3-70b-versatile"
}
---
## Quickstart
URL: https://console.groq.com/docs/quickstart
# Quickstart
Get up and running with the Groq API in a few minutes.
## Create an API Key
Please visit the [API Keys page](/keys) to create an API key.
## Set up your API Key (recommended)
Configure your API key as an environment variable. This approach streamlines your API usage by eliminating the need to include your API key in each request. Moreover, it enhances security by minimizing the risk of inadvertently including your API key in your codebase.
### In your terminal of choice:
```shell
export GROQ_API_KEY=<your-api-key-here>
```
## Requesting your first chat completion
### Execute this curl command in the terminal of your choice:
```shell
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {
        "role": "user",
        "content": "Explain the importance of fast language models"
      }
    ]
  }'
```
### Install the Groq JavaScript library:
```shell
npm install groq-sdk
```
### Performing a Chat Completion:
```js
import Groq from "groq-sdk";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });

async function main() {
  const chatCompletion = await groq.chat.completions.create({
    messages: [
      {
        role: "user",
        content: "Explain the importance of fast language models",
      },
    ],
    model: "llama-3.3-70b-versatile",
  });
  // Print the completion returned by the LLM.
  console.log(chatCompletion.choices[0]?.message?.content || "");
}

main();
```
### Install the Groq Python library:
```shell
pip install groq
```
### Performing a Chat Completion:
```python
import os

from groq import Groq

client = Groq(
    api_key=os.environ.get("GROQ_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Explain the importance of fast language models",
        }
    ],
    model="llama-3.3-70b-versatile",
)
print(chat_completion.choices[0].message.content)
```
### Pass the following as the request body:
```json
{
  "messages": [
    {
      "role": "user",
      "content": "Explain the importance of fast language models"
    }
  ],
  "model": "llama-3.3-70b-versatile"
}
```
## Using third-party libraries and SDKs
### Using AI SDK:
[AI SDK](https://ai-sdk.dev/) is a JavaScript-based open-source library that simplifies building large language model (LLM) applications. Documentation for how to use Groq on the AI SDK [can be found here](https://console.groq.com/docs/ai-sdk/).
First, install the `ai` package and the Groq provider `@ai-sdk/groq`:
```shell
pnpm add ai @ai-sdk/groq
```
Then, you can use the Groq provider to generate text. By default, the provider will look for `GROQ_API_KEY` as the API key.
```js
// Minimal sketch: generate text with the Groq provider (the model ID is an example)
import { generateText } from "ai";
import { groq } from "@ai-sdk/groq";

const { text } = await generateText({
  model: groq("llama-3.3-70b-versatile"),
  prompt: "Explain the importance of fast language models",
});
console.log(text);
```
### Using LiteLLM:
[LiteLLM](https://www.litellm.ai/) is both a Python-based open-source library, and a proxy/gateway server that simplifies building large language model (LLM) applications. Documentation for LiteLLM [can be found here](https://docs.litellm.ai/).
First, install the `litellm` package:
```shell
pip install litellm
```
Then, set up your API key:
```shell
export GROQ_API_KEY="your-groq-api-key"
```
Now you can easily use any model from Groq. Just set `model=groq/<model-name>` when sending litellm requests.
```python
# Minimal sketch using litellm's completion() with the groq/ prefix
from litellm import completion

response = completion(
    model="groq/llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Explain the importance of fast language models"}],
)
print(response.choices[0].message.content)
```
### Using LangChain:
[LangChain](https://www.langchain.com/) is a framework for developing reliable agents and applications powered by large language models (LLMs). Documentation for LangChain [can be found here for Python](https://python.langchain.com/docs/introduction/), and [here for JavaScript](https://js.langchain.com/docs/introduction/).
When using Python, first install the `langchain-groq` package:
```shell
pip install langchain-groq
```
Then, set up your API key:
```shell
export GROQ_API_KEY="your-groq-api-key"
```
Now you can build chains and agents that can perform multi-step tasks. This chain combines a prompt that tells the model what information to extract, a parser that ensures the output follows a specific JSON format, and llama-3.3-70b-versatile to do the actual text processing.
```python
# Minimal sketch of a prompt -> model -> JSON parser chain (the prompt text is an example)
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile")
prompt = ChatPromptTemplate.from_template(
    "Extract the name and age mentioned in this text as JSON with keys 'name' and 'age': {text}"
)
chain = prompt | llm | JsonOutputParser()
print(chain.invoke({"text": "My friend Ana is 28 years old."}))
```
Now that you have successfully received a chat completion, you can try out the other endpoints in the API.
### Next Steps
- Check out the [Playground](/playground) to try out the Groq API in your browser
- Join our GroqCloud [developer community](https://community.groq.com/)
- Add a how-to on your project to the [Groq API Cookbook](https://github.com/groq/groq-api-cookbook)
---
## Speech To Text: Translation (js)
URL: https://console.groq.com/docs/speech-to-text/scripts/translation
import fs from "fs";
import Groq from "groq-sdk";
// Initialize the Groq client
const groq = new Groq();
async function main() {
// Create a translation job
const translation = await groq.audio.translations.create({
file: fs.createReadStream("sample_audio.m4a"), // Required path to audio file - replace with your audio file!
model: "whisper-large-v3", // Required model to use for translation
prompt: "Specify context or spelling", // Optional
language: "en", // Optional ('en' only)
response_format: "json", // Optional
temperature: 0.0, // Optional
});
// Log the transcribed text
console.log(translation.text);
}
main();
---
## Speech To Text: Transcription (js)
URL: https://console.groq.com/docs/speech-to-text/scripts/transcription
import fs from "fs";
import Groq from "groq-sdk";
// Initialize the Groq client
const groq = new Groq();
async function main() {
// Create a transcription job
const transcription = await groq.audio.transcriptions.create({
file: fs.createReadStream("YOUR_AUDIO.wav"), // Required path to audio file - replace with your audio file!
model: "whisper-large-v3-turbo", // Required model to use for transcription
prompt: "Specify context or spelling", // Optional
response_format: "verbose_json", // Optional
timestamp_granularities: ["word", "segment"], // Optional (must set response_format to "json" to use and can specify "word", "segment" (default), or both)
language: "en", // Optional
temperature: 0.0, // Optional
});
// To print only the transcription text, you'd use console.log(transcription.text); (here we're printing the entire transcription object to access timestamps)
console.log(JSON.stringify(transcription, null, 2));
}
main();
---
## Initialize the Groq client
URL: https://console.groq.com/docs/speech-to-text/scripts/transcription.py
```python
import os
import json
from groq import Groq
# Initialize the Groq client
client = Groq()
# Specify the path to the audio file
filename = os.path.dirname(__file__) + "/YOUR_AUDIO.wav" # Replace with your audio file!
# Open the audio file
with open(filename, "rb") as file:
# Create a transcription of the audio file
transcription = client.audio.transcriptions.create(
file=file, # Required audio file
model="whisper-large-v3-turbo", # Required model to use for transcription
prompt="Specify context or spelling", # Optional
response_format="verbose_json", # Optional
timestamp_granularities = ["word", "segment"], # Optional (must set response_format to "json" to use and can specify "word", "segment" (default), or both)
language="en", # Optional
temperature=0.0 # Optional
)
# To print only the transcription text, you'd use print(transcription.text) (here we're printing the entire transcription object to access timestamps)
print(json.dumps(transcription, indent=2, default=str))
```
---
## Initialize the Groq client
URL: https://console.groq.com/docs/speech-to-text/scripts/translation.py
```python
import os
from groq import Groq
# Initialize the Groq client
client = Groq()
# Specify the path to the audio file
filename = os.path.dirname(__file__) + "/sample_audio.m4a" # Replace with your audio file!
# Open the audio file
with open(filename, "rb") as file:
# Create a translation of the audio file
translation = client.audio.translations.create(
file=(filename, file.read()), # Required audio file
model="whisper-large-v3", # Required model to use for translation
prompt="Specify context or spelling", # Optional
language="en", # Optional ('en' only)
response_format="json", # Optional
temperature=0.0 # Optional
)
# Print the translation text
print(translation.text)
```
---
## Speech to Text
URL: https://console.groq.com/docs/speech-to-text
# Speech to Text
The Groq API is designed to provide the fastest speech-to-text solution available, offering OpenAI-compatible endpoints that
enable near-instant transcriptions and translations. With the Groq API, you can integrate high-quality audio
processing into your applications at speeds that rival human interaction.
## API Endpoints
We support two endpoints:
| Endpoint | Usage | API Endpoint |
|----------------|--------------------------------|-------------------------------------------------------------|
| Transcriptions | Convert audio to text | `https://api.groq.com/openai/v1/audio/transcriptions` |
| Translations | Translate audio to English text| `https://api.groq.com/openai/v1/audio/translations` |
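Both endpoints accept multipart form data. As a minimal illustration, here is a translation request with curl (the audio filename is a placeholder, and `whisper-large-v3` is used because it supports translation):
```shell
curl https://api.groq.com/openai/v1/audio/translations \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=whisper-large-v3" \
  -F "file=@sample_audio.m4a"
```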
## Supported Models
| Model ID | Model | Supported Language(s) | Description |
|-----------------------------|----------------------|-------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
| `whisper-large-v3-turbo` | [Whisper Large V3 Turbo](/docs/model/whisper-large-v3-turbo) | Multilingual | A fine-tuned version of a pruned Whisper Large V3 designed for fast, multilingual transcription tasks. |
| `whisper-large-v3` | [Whisper Large V3](/docs/model/whisper-large-v3) | Multilingual | Provides state-of-the-art performance with high accuracy for multilingual transcription and translation tasks. |
## Which Whisper Model Should You Use?
Having more choices is great, but let's try to avoid decision paralysis by breaking down the tradeoffs between models to find the one most suitable for
your applications:
- If your application is error-sensitive and requires multilingual support, use `whisper-large-v3`.
- If your application requires multilingual support and you need the best price for performance, use `whisper-large-v3-turbo`.
The following table breaks down the metrics for each model.
| Model | Cost Per Hour | Language Support | Transcription Support | Translation Support | Real-time Speed Factor | Word Error Rate |
|--------|--------|--------|--------|--------|--------|--------|
| `whisper-large-v3` | $0.111 | Multilingual | Yes | Yes | 189 | 10.3% |
| `whisper-large-v3-turbo` | $0.04 | Multilingual | Yes | No | 216 | 12% |
## Working with Audio Files
### Audio File Limitations
* Max File Size: 25 MB (free tier), 100 MB (dev tier)
* Max Attachment File Size: 25 MB. If you need to process larger files, use the `url` parameter to specify a URL to the file instead (see the curl sketch after this list).
* Minimum File Length: 0.01 seconds
* Minimum Billed Length: 10 seconds. If you submit a request shorter than this, you will still be billed for 10 seconds.
* Supported File Types: Either a URL or a direct file upload for `flac`, `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, `webm`
* Single Audio Track: Only the first track will be transcribed for files with multiple audio tracks. (e.g. dubbed video)
* Supported Response Formats: `json`, `verbose_json`, `text`
* Supported Timestamp Granularities: `segment`, `word`
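To use the `url` parameter mentioned above, pass a URL to a hosted audio file in place of a file upload. A minimal sketch with curl (the URL is a placeholder):
```shell
curl https://api.groq.com/openai/v1/audio/transcriptions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -F "model=whisper-large-v3-turbo" \
  -F "url=https://example.com/audio/sample_audio.m4a"
```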
### Audio Preprocessing
Our speech-to-text models will downsample audio to 16 kHz mono before transcribing, which is optimal for speech recognition. This preprocessing can be performed client-side if your original file is extremely
large and you want to make it smaller without a loss in quality (without chunking, Groq API speech-to-text endpoints accept up to 25 MB for the free tier and 100 MB for the [dev tier](/settings/billing)). For lower latency, convert your files to `wav` format. When reducing file size, we recommend FLAC for lossless compression.
The following `ffmpeg` command can be used to reduce file size:
```shell
ffmpeg \
-i <your_file> \
-ar 16000 \
-ac 1 \
-map 0:a \
-c:a flac \