Llama 3.3 70B

llama-3.3-70b-versatile
TOKEN SPEED: ~280 TPS
INPUT: Text
OUTPUT: Text
CAPABILITIES: Tool Use, JSON Mode

Llama-3.3-70B-Versatile is Meta's advanced multilingual large language model, optimized for a wide range of natural language processing tasks. With 70 billion parameters, it delivers strong results across standard benchmarks while remaining efficient enough for production use.


PRICING

Input: $0.59 per 1M tokens (~1.7M tokens per $1)
Output: $0.79 per 1M tokens (~1.3M tokens per $1)
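
For example, a request with 100,000 input tokens and 10,000 output tokens costs roughly 0.1 × $0.59 + 0.01 × $0.79 ≈ $0.067.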

LIMITS

CONTEXT WINDOW: 131,072 tokens

MAX OUTPUT TOKENS: 32,768

Key Technical Specifications

Model Architecture

Built upon Meta's Llama 3.3 architecture, this model uses an optimized transformer design with 70 billion parameters. It incorporates Grouped-Query Attention (GQA), which shares key/value heads across groups of query heads to shrink the KV cache and improve inference scalability and efficiency. The model has been fine-tuned using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align outputs with human preferences for helpfulness and safety.
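
To make the GQA idea concrete, here is a minimal, illustrative NumPy sketch. It is a toy, not Meta's implementation: Llama 3-family 70B models use 64 query heads sharing 8 key/value heads at much larger dimensions, and production attention also applies rotary position embeddings and causal masking, both omitted here.

Python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(x, Wq, Wk, Wv, n_q_heads, n_kv_heads):
    """Toy GQA: groups of query heads share a smaller set of KV heads."""
    T, D = x.shape
    head_dim = D // n_q_heads
    group = n_q_heads // n_kv_heads            # query heads per KV head
    q = (x @ Wq).reshape(T, n_q_heads, head_dim)
    k = (x @ Wk).reshape(T, n_kv_heads, head_dim)
    v = (x @ Wv).reshape(T, n_kv_heads, head_dim)
    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)            # (T, n_q_heads, head_dim)
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    out = np.einsum("hqk,khd->qhd", weights, v)
    return out.reshape(T, D)

# Toy sizes: 8 query heads sharing 2 KV heads.
rng = np.random.default_rng(0)
D, n_q, n_kv = 64, 8, 2
head_dim = D // n_q
x = rng.normal(size=(5, D))
Wq = rng.normal(size=(D, n_q * head_dim)) * 0.1
Wk = rng.normal(size=(D, n_kv * head_dim)) * 0.1  # KV projections are 4x smaller
Wv = rng.normal(size=(D, n_kv * head_dim)) * 0.1
print(grouped_query_attention(x, Wq, Wk, Wv, n_q, n_kv).shape)  # (5, 64)

Because K and V are computed for only n_kv_heads heads, the KV cache shrinks by the group factor, which is where the inference-scalability benefit comes from.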

Performance Metrics

Llama-3.3-70B-Versatile performs strongly across standard benchmarks:
  • MMLU (Massive Multitask Language Understanding): 86.0% accuracy
  • HumanEval (code generation): 88.4% pass@1
  • MATH (mathematical problem solving): 77.0% sympy intersection score
  • MGSM (Multilingual Grade School Math): 91.1% exact match

Use Cases

Advanced Language Understanding
Leverage the model's strong multilingual capabilities for complex language understanding tasks across domains.
Code Generation and Problem Solving
Apply the model's strong performance in code generation, mathematical problem solving, and analytical tasks.

Best Practices

  • Clearly specify task instructions and provide sufficient context in your prompts for precise responses.
  • Give each tool and function a clear definition covering its intended use case, required parameters, expected output, and any constraints (see the sketch below).
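
To make the second point concrete, here is a minimal tool-use sketch using the Groq SDK's OpenAI-compatible function-calling format. The get_weather tool, its schema, and the example prompt are hypothetical placeholders, not part of the API:

Python
from groq import Groq

client = Groq()

# Hypothetical tool definition in the OpenAI-compatible function schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

# If the model chose to call the tool, the structured call appears here;
# your code would execute it and return the result in a follow-up message.
print(completion.choices[0].message.tool_calls)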

Get Started with Llama-3.3-70B-Versatile

Experience llama-3.3-70b-versatile on Groq:

shell
pip install groq

Python
from groq import Groq

# The client reads your API key from the GROQ_API_KEY environment variable.
client = Groq()

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)
print(completion.choices[0].message.content)
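
The capabilities listed above also include JSON Mode. Assuming Groq's OpenAI-compatible response_format parameter, a minimal sketch looks like this (OpenAI-compatible JSON modes generally expect the prompt itself to mention JSON):

Python
from groq import Groq

client = Groq()

# JSON Mode constrains the completion to a single valid JSON object.
completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {
            "role": "user",
            "content": "List three benefits of fast inference as a JSON object with a 'benefits' array."
        }
    ],
    response_format={"type": "json_object"},
)
print(completion.choices[0].message.content)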
