groq/compound-mini
Groq's Compound Mini system integrates OpenAI's GPT-OSS 120B and Llama 3.3 70B models with external tools like web search and code execution. This allows applications to access real-time data and interact with external environments, providing more accurate and current responses than standalone LLMs. Instead of managing separate tools and APIs, Compound systems offer a unified interface that handles tool integration and orchestration, letting you focus on application logic rather than infrastructure complexity.
Rate limits for groq/compound-mini are determined by the rate limits of the individual models that comprise it.
Compound Mini is powered by Llama 3.3 70B and GPT-OSS 120B for intelligent reasoning and tool use. Unlike groq/compound, it can use only one tool per request, but it offers roughly 3x lower latency on average.
Automatically access up-to-date information from the web using the built-in web search tool.
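For example, here is a minimal sketch (assuming the groq Python SDK from the quickstart below) of a question about current information, which the system can answer by invoking its built-in web search tool:

from groq import Groq

client = Groq()

# A question that depends on up-to-date information; compound-mini
# can call web search behind the scenes to ground its answer.
completion = client.chat.completions.create(
    model="groq/compound-mini",
    messages=[
        {"role": "user", "content": "What are the latest announcements in AI inference hardware this week?"}
    ]
)

print(completion.choices[0].message.content)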
Execute Python code automatically using the code execution tool powered by E2B.
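As another sketch (again assuming the groq Python SDK), a request whose answer calls for computation; the system can generate and run Python via E2B rather than estimating the result:

from groq import Groq

client = Groq()

# A computational question; compound-mini can execute Python code
# behind the scenes to produce a precise numeric answer.
completion = client.chat.completions.create(
    model="groq/compound-mini",
    messages=[
        {"role": "user", "content": "What is the square root of 12345 to 6 decimal places?"}
    ]
)

print(completion.choices[0].message.content)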
Experience the capabilities of groq/compound-mini on Groq:
pip install groq
from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="groq/compound-mini",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)

print(completion.choices[0].message.content)
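To check which built-in tool (if any) was used for a request, the response message may also expose the executed tool calls. The executed_tools attribute below is an assumption based on Groq's compound documentation and may differ by SDK version:

# Hedged: executed_tools is assumed here and may vary across SDK versions.
executed = getattr(completion.choices[0].message, "executed_tools", None)
if executed:
    print(executed)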