Groq Batch API

Process large-scale workloads asynchronously with our Batch API.

What is Batch Processing?

Batch processing lets you run thousands of API requests at scale by submitting your workload as a batch to Groq and letting us process it with a 24-hour turnaround.

Synchronous API calls suit real-time applications that depend on our fast inference speed. Asynchronous batch processing suits use cases where the volume of data matters more than immediate responses, such as processing large datasets, generating content in bulk, and running evaluations. Compared to our synchronous endpoints, the Batch API offers:

  • Higher rate limits: Substantially increased limits compared to on-demand APIs
  • 24-hour (or less) turnaround: Each batch completes within 24 hours (or often more quickly)

Model Availability

The Batch API can currently be used to execute queries for text inputs with llama-3.3-70b-versatile and llama-3.1-8b-instant.

Getting Started

1. Prepare Your Batch File

A batch is a list of API requests. Every batch job starts with a JSON Lines (JSONL) file that contains the requests you want processed, and each line in the file represents a single API call.

The Groq Batch API currently supports chat completion requests through /v1/chat/completions.

The structure for each line must include:

  • custom_id: Your unique identifier for tracking the batch request
  • method: The HTTP method (currently POST only)
  • url: The API endpoint to call (/v1/chat/completions)
  • body: The parameters of your request matching our synchronous API format. See our API Reference here.

The following is an example of a JSONL batch file:

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-8b-instant", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-8b-instant", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+3?"}]}}
{"custom_id": "request-3", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-8b-instant", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "count up to 1000000. starting with 1, 2, 3. print all the numbers, do not stop until you get to 1000000."}]}}
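
Before uploading, it can help to check that every line carries the four required fields. The following is a minimal pre-flight sketch, not an official validator: `validate_batch_lines` and its messages are hypothetical, and the checks that custom_id is a string and is not repeated are assumptions based on its description as a unique identifier.

```python
import json

REQUIRED_FIELDS = ("custom_id", "method", "url", "body")

def validate_batch_lines(lines):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    seen_ids = set()
    for number, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # blank lines are skipped in this sketch
        try:
            request = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(request, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in request]
        if missing:
            problems.append((number, "missing fields: " + ", ".join(missing)))
            continue
        if request["method"] != "POST":
            problems.append((number, "method must be POST"))
        if request["url"] != "/v1/chat/completions":
            problems.append((number, "url must be /v1/chat/completions"))
        # Assumption: custom_id is a string and must not repeat within a file.
        custom_id = request["custom_id"]
        if not isinstance(custom_id, str):
            problems.append((number, "custom_id must be a string"))
        elif custom_id in seen_ids:
            problems.append((number, "duplicate custom_id"))
        else:
            seen_ids.add(custom_id)
    return problems

sample = [
    '{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-8b-instant", "messages": [{"role": "user", "content": "What is 2+2?"}]}}',
    '{"custom_id": "request-1", "method": "GET", "url": "/v1/chat/completions", "body": {}}',
]
print(validate_batch_lines(sample))
```

In this sample the second line is flagged twice: once for its GET method and once for reusing request-1.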

Converting Sync Calls to Batch Format

If you're familiar with making synchronous API calls, converting them to batch format is straightforward. Here's how a regular API call transforms into a batch request:

# Your typical synchronous API call:
response = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[
        {"role": "user", "content": "What is quantum computing?"}
    ]
)

# The same call in batch format (must be on a single line as JSONL):
{"custom_id": "quantum-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-8b-instant", "messages": [{"role": "user", "content": "What is quantum computing?"}]}}
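
The same conversion can be done programmatically for many prompts. The sketch below wraps the keyword arguments of a synchronous call into one batch line each; `to_batch_line` and the `prompt-N` ID scheme are hypothetical names chosen for illustration.

```python
import json

def to_batch_line(custom_id, **create_kwargs):
    """Wrap the keyword arguments of a synchronous chat completion call in one JSONL batch line."""
    request = {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": create_kwargs,
    }
    # json.dumps escapes newlines inside strings, so each request stays on a single line.
    return json.dumps(request)

prompts = ["What is quantum computing?", "What is 2+2?"]
lines = [
    to_batch_line(
        f"prompt-{i}",
        model="llama-3.1-8b-instant",
        messages=[{"role": "user", "content": prompt}],
    )
    for i, prompt in enumerate(prompts, start=1)
]
print("\n".join(lines))
```

Writing the joined lines to a .jsonl file gives a batch file in the format shown in Step 1.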

2. Upload Your Batch File

Upload your .jsonl batch file using the Files API endpoint before kicking off your batch job:

Note: The Files API currently only supports .jsonl files up to 200MB in size.

import requests # pip install requests first!

def upload_file_to_groq(api_key, file_path):
    url = "https://api.groq.com/openai/v1/files"

    headers = {
        "Authorization": f"Bearer {api_key}"
    }

    data = {
        "purpose": "batch"
    }

    # Open the file in a context manager so the handle is closed after the upload
    with open(file_path, "rb") as f:
        files = {
            "file": ("batch_file.jsonl", f)
        }

        # Make the POST request
        response = requests.post(url, headers=headers, files=files, data=data)

    return response.json()

# Usage example
api_key = "YOUR_GROQ_API_KEY"  # Replace with your actual API key
file_path = "batch_file.jsonl"  # Path to your JSONL file

try:
    result = upload_file_to_groq(api_key, file_path)
    print(result)
except Exception as e:
    print(f"Error: {e}")

You will receive a JSON response that contains the ID (id) for your file object that you will then use to create your batch job:

{
    "id":"file_01jh6x76wtemjr74t1fh0faj5t",
    "object":"file",
    "bytes":966,
    "created_at":1736472501,
    "filename":"input_file.jsonl",
    "purpose":"batch"
}
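
If a workload would exceed the 200MB file limit, one option is to split the requests across several batch files. The sketch below groups JSONL lines into chunks by encoded size; `split_jsonl_lines` is a hypothetical helper, and it assumes "200MB" means 200 * 1024 * 1024 bytes, which the note above does not specify.

```python
def split_jsonl_lines(lines, max_bytes=200 * 1024 * 1024):
    """Group JSONL lines (without trailing newlines) into chunks that fit within max_bytes."""
    chunks, current, current_size = [], [], 0
    for line in lines:
        # +1 accounts for the newline that terminates each line in the written file.
        size = len(line.encode("utf-8")) + 1
        if size > max_bytes:
            raise ValueError("a single request exceeds the size limit")
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        chunks.append(current)
    return chunks

# Tiny demonstration with a 20-byte limit: each line costs 10 bytes including its newline.
demo = ["a" * 9, "b" * 9, "c" * 9]
print([len(chunk) for chunk in split_jsonl_lines(demo, max_bytes=20)])  # [2, 1]
```

Each chunk can then be written to its own .jsonl file and uploaded separately.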

3. Create Your Batch Job

Once you've uploaded your .jsonl file, you can use the file object ID (in this case, file_01jh6x76wtemjr74t1fh0faj5t as shown in Step 2) to create a batch:

Note: For now, the completion window for batch jobs can only be set to 24 hours (24h).

import requests # pip install requests first!

def create_batch(api_key, input_file_id):
    url = "https://api.groq.com/openai/v1/batches"

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    data = {
        "input_file_id": input_file_id,
        "endpoint": "/v1/chat/completions",
        "completion_window": "24h"
    }

    response = requests.post(url, headers=headers, json=data)
    return response.json()

# Usage example
api_key = "YOUR_GROQ_API_KEY"
file_id = "file_01jh6x76wtemjr74t1fh0faj5t" # replace with your `id` from file upload API response object

try:
    result = create_batch(api_key, file_id)
    print(result)
except Exception as e:
    print(f"Error: {e}")

This request will return a Batch object with metadata about your batch, including the batch id that you can use to check the status of your batch:

{
    "id":"batch_01jh6xa7reempvjyh6n3yst2zw",
    "object":"batch",
    "endpoint":"/v1/chat/completions",
    "errors":null,
    "input_file_id":"file_01jh6x76wtemjr74t1fh0faj5t",
    "completion_window":"24h",
    "status":"validating",
    "output_file_id":null,
    "error_file_id":null,
    "finalizing_at":null,
    "failed_at":null,
    "expired_at":null,
    "cancelled_at":null,
    "request_counts":{
        "total":0,
        "completed":0,
        "failed":0
    },
    "metadata":null,
    "created_at":1736472600,
    "expires_at":1736559000,
    "cancelling_at":null,
    "completed_at":null,
    "in_progress_at":null
}
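
The created_at and expires_at values in this example look like Unix timestamps in seconds (an assumption; the unit is not stated here), and they differ by exactly 86,400 seconds, which matches the 24h completion window. A small sketch for turning them into readable times, where `time_remaining` is a hypothetical helper:

```python
from datetime import datetime, timezone

batch = {"created_at": 1736472600, "expires_at": 1736559000}

def to_utc(timestamp):
    """Interpret a Unix timestamp in seconds as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)

def time_remaining(batch, now=None):
    """Return how long is left before the batch's expires_at time."""
    now = now or datetime.now(timezone.utc)
    return to_utc(batch["expires_at"]) - now

created = to_utc(batch["created_at"])
expires = to_utc(batch["expires_at"])
print(created.isoformat())
print(expires.isoformat())
print(expires - created)
```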

4. Check Batch Status

You can check the status of a batch any time your heart desires with the batch id (in this case, batch_01jh6xa7reempvjyh6n3yst2zw from the above Batch response object), which will also return a Batch object:

import requests # pip install requests first!

def get_batch_status(api_key, batch_id):
    url = f"https://api.groq.com/openai/v1/batches/{batch_id}"

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    response = requests.get(url, headers=headers)
    return response.json()

# Usage example
api_key = "YOUR_GROQ_API_KEY"
batch_id = "batch_01jh6xa7reempvjyh6n3yst2zw"

try:
    result = get_batch_status(api_key, batch_id)
    print(result)
except Exception as e:
    print(f"Error: {e}")

A batch job can be in any of the following statuses:

Status        Description
validating    batch file is being validated before the batch processing begins
failed        batch file has failed the validation process
in_progress   batch file was successfully validated and the batch is currently being run
finalizing    batch has completed and the results are being prepared
completed     batch has been completed and the results are ready
expired       batch was not able to be completed within the 24-hour time window
cancelling    batch is being cancelled (may take up to 10 minutes)
cancelled     batch was cancelled
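
Since a batch moves through these statuses over time, a simple polling loop is one way to wait for it to finish. The sketch below is a hedged example, not an official client: `wait_for_batch` is a hypothetical helper, the 60-second default interval is arbitrary, and treating completed, failed, expired, and cancelled as the terminal statuses is an interpretation of the list above.

```python
import time

# Assumption: these four statuses mean the batch will not change any further.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(fetch_batch, poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Call fetch_batch() repeatedly until the returned Batch dict has a terminal status.

    fetch_batch is any zero-argument callable that returns a Batch object as a dict.
    """
    polls = 0
    while True:
        batch = fetch_batch()
        if batch["status"] in TERMINAL_STATUSES:
            return batch
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"batch still {batch['status']} after {polls} polls")
        sleep(poll_seconds)

# Demonstration with canned responses and a no-op sleep instead of real API calls.
responses = iter([{"status": "validating"}, {"status": "in_progress"}, {"status": "completed"}])
final = wait_for_batch(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])  # completed
```

With the get_batch_status function above, the real call would look like `wait_for_batch(lambda: get_batch_status(api_key, batch_id))`.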

When your batch job is complete, the Batch object will include an output_file_id and/or an error_file_id that you can then use to retrieve your results (as shown below in Step 5). Here's an example:

{
    "id":"batch_01jh6xa7reempvjyh6n3yst2zw",
    "object":"batch",
    "endpoint":"/v1/chat/completions",
    "errors":[
        {
            "code":"invalid_method",
            "message":"Invalid value: 'GET'. Supported values are: 'POST'",
            "param":"method",
            "line":4
        }
    ],
    "input_file_id":"file_01jh6x76wtemjr74t1fh0faj5t",
    "completion_window":"24h",
    "status":"completed",
    "output_file_id":"file_01jh6xa97be52b7pg88czwrrwb",
    "error_file_id":"file_01jh6xa9cte52a5xjnmnt5y0je",
    "finalizing_at":null,
    "failed_at":null,
    "expired_at":null,
    "cancelled_at":null,
    "request_counts":{
        "total":3,
        "completed":2,
        "failed":1
    },
    "metadata":null,
    "created_at":1736472600,
    "expires_at":1736559000,
    "cancelling_at":null,
    "completed_at":1736472607,
    "in_progress_at":1736472601
}

5. Retrieve Batch Results

Now for the fun. Once the batch is complete, you can retrieve the results using the output_file_id from your Batch object (in this case, file_01jh6xa97be52b7pg88czwrrwb from the above Batch response object) and write them to a file on your machine (batch_output.jsonl in this case):

import requests # pip install requests first!

def download_file_content(api_key, output_file_id, output_file):
    url = f"https://api.groq.com/openai/v1/files/{output_file_id}/content"

    headers = {
        "Authorization": f"Bearer {api_key}"
    }

    response = requests.get(url, headers=headers)
    # Raise on HTTP errors so an error body is never saved as if it were results
    response.raise_for_status()

    # Write the content to a file
    with open(output_file, 'wb') as f:
        f.write(response.content)

    return f"File downloaded successfully to {output_file}"

# Usage example
api_key = "YOUR_GROQ_API_KEY"
output_file_id = "file_01jh6xa97be52b7pg88czwrrwb" # replace with your own completed batch job's `output_file_id`
output_file = "batch_output.jsonl" # replace with your own file of choice to download batch job contents to

try:
    result = download_file_content(api_key, output_file_id, output_file)
    print(result)
except Exception as e:
    print(f"Error: {e}")

The output .jsonl file will have one response line per successful request line of your batch file. Each line includes the original custom_id for mapping results, a unique batch request ID, and the response:

{"id": "batch_req_123", "custom_id": "my-request-1", "response": {"status_code": 200, "request_id": "req_abc", "body": {"id": "completion_xyz", "model": "llama-3.1-8b-instant", "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello!"}}], "usage": {"prompt_tokens": 20, "completion_tokens": 5, "total_tokens": 25}}}, "error": null}

Any failed requests in the batch will have their error information written to an error file that can be accessed via the batch's error_file_id.

Note: Results may not appear in the same order as your batch request submissions. Always use the custom_id field to match results with your original requests.
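
Matching by custom_id can be sketched as a small lookup table built from the output file. This example assumes each output line has the shape shown above; `index_results` is a hypothetical helper, and the sample lines are made-up data in that shape.

```python
import json

def index_results(output_lines):
    """Map each custom_id to the assistant message content of its response."""
    results = {}
    for raw in output_lines:
        if not raw.strip():
            continue  # ignore blank lines
        record = json.loads(raw)
        if record.get("response") is None:
            continue  # defensive: skip records that carry no response body
        body = record["response"]["body"]
        results[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return results

# Made-up sample output, deliberately out of submission order.
output_lines = [
    '{"id": "batch_req_2", "custom_id": "request-2", "response": {"status_code": 200, "body": {"choices": [{"index": 0, "message": {"role": "assistant", "content": "5"}}]}}, "error": null}',
    '{"id": "batch_req_1", "custom_id": "request-1", "response": {"status_code": 200, "body": {"choices": [{"index": 0, "message": {"role": "assistant", "content": "4"}}]}}, "error": null}',
]
results = index_results(output_lines)
print(results["request-1"])  # 4
```

To read a downloaded file instead, pass the open file object, since iterating over it yields one line at a time: `with open("batch_output.jsonl") as f: results = index_results(f)`.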

List Batches

You can view all your batch jobs by making a call to https://api.groq.com/openai/v1/batches:

curl https://api.groq.com/openai/v1/batches \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json"

Batch Expiration

Results from processed batches are retained in our system for 30 days.

Rate limits

The Batch API rate limits are separate from the existing per-model rate limits for synchronous requests. Using the Batch API will not consume tokens from your standard per-model limits, which means you can use batch processing to increase the number of tokens you process with us.

See your limits here.