Llama 3.3 70B

llama-3.3-70b-versatile
TOKEN SPEED: ~280 TPS
INPUT: Text
OUTPUT: Text
CAPABILITIES: Tool Use, JSON Mode

Llama-3.3-70B-Versatile is Meta's advanced multilingual large language model, optimized for a wide range of natural language processing tasks. With 70 billion parameters, it delivers strong results across standard benchmarks while remaining efficient enough for production use.


PRICING

Input: $0.59 per 1M tokens (~1.7M tokens per $1)
Output: $0.79 per 1M tokens (~1.3M tokens per $1)
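
For example, a request with 100,000 input tokens and 10,000 output tokens costs roughly 0.1 × $0.59 + 0.01 × $0.79 ≈ $0.067.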

LIMITS

CONTEXT WINDOW: 131,072 tokens

MAX OUTPUT TOKENS: 32,768

Key Technical Specifications

Model Architecture

Built upon Meta's Llama 3.3 architecture, this model uses an optimized transformer design with 70 billion parameters. It incorporates Grouped-Query Attention (GQA), which shares key/value heads across groups of query heads to shrink the KV cache and improve inference scalability and efficiency. The model has been fine-tuned using supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align outputs with human preferences for helpfulness and safety.
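
To make the GQA idea concrete, here is a minimal, illustrative NumPy sketch. It is a toy, not Meta's implementation: Llama 3-family 70B models use 64 query heads sharing 8 key/value heads at much larger dimensions, and production attention also applies rotary position embeddings and causal masking, both omitted here.

Python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(x, Wq, Wk, Wv, n_q_heads, n_kv_heads):
    """Toy GQA: groups of query heads share a smaller set of KV heads."""
    T, D = x.shape
    head_dim = D // n_q_heads
    group = n_q_heads // n_kv_heads            # query heads per KV head
    q = (x @ Wq).reshape(T, n_q_heads, head_dim)
    k = (x @ Wk).reshape(T, n_kv_heads, head_dim)
    v = (x @ Wv).reshape(T, n_kv_heads, head_dim)
    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)            # (T, n_q_heads, head_dim)
    v = np.repeat(v, group, axis=1)
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    out = np.einsum("hqk,khd->qhd", weights, v)
    return out.reshape(T, D)

# Toy sizes: 8 query heads sharing 2 KV heads.
rng = np.random.default_rng(0)
D, n_q, n_kv = 64, 8, 2
head_dim = D // n_q
x = rng.normal(size=(5, D))
Wq = rng.normal(size=(D, n_q * head_dim)) * 0.1
Wk = rng.normal(size=(D, n_kv * head_dim)) * 0.1  # KV projections are 4x smaller
Wv = rng.normal(size=(D, n_kv * head_dim)) * 0.1
print(grouped_query_attention(x, Wq, Wk, Wv, n_q, n_kv).shape)  # (5, 64)

Because K and V are computed for only n_kv_heads heads, the KV cache shrinks by the group factor, which is where the inference-scalability benefit comes from.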

Performance Metrics

Llama-3.3-70B-Versatile performs strongly across standard benchmarks:
  • MMLU (Massive Multitask Language Understanding): 86.0% accuracy
  • HumanEval (code generation): 88.4% pass@1
  • MATH (mathematical problem solving): 77.0% sympy intersection score
  • MGSM (Multilingual Grade School Math): 91.1% exact match

Use Cases

Advanced Language Understanding
Leverage the model's strong multilingual capabilities for complex language understanding tasks across domains.
Code Generation and Problem Solving
Apply the model's strong performance in code generation, mathematical problem solving, and analytical tasks.

Best Practices

  • Clearly specify task instructions and provide sufficient context in your prompts for precise responses.
  • Give each tool and function a clear definition covering its intended use case, required parameters, expected output, and any constraints (see the sketch below).
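
To make the second point concrete, here is a minimal tool-use sketch using the Groq SDK's OpenAI-compatible function-calling format. The get_weather tool, its schema, and the example prompt are hypothetical placeholders, not part of the API:

Python
from groq import Groq

client = Groq()

# Hypothetical tool definition in the OpenAI-compatible function schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

# If the model chose to call the tool, the structured call appears here;
# your code would execute it and return the result in a follow-up message.
print(completion.choices[0].message.tool_calls)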

Get Started with Llama-3.3-70B-Versatile

Experience llama-3.3-70b-versatile on Groq:

shell
pip install groq

Python
from groq import Groq

# The client reads your API key from the GROQ_API_KEY environment variable.
client = Groq()

completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)
print(completion.choices[0].message.content)
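
The capabilities listed above also include JSON Mode. Assuming Groq's OpenAI-compatible response_format parameter, a minimal sketch looks like this (OpenAI-compatible JSON modes generally expect the prompt itself to mention JSON):

Python
from groq import Groq

client = Groq()

# JSON Mode constrains the completion to a single valid JSON object.
completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {
            "role": "user",
            "content": "List three benefits of fast inference as a JSON object with a 'benefits' array."
        }
    ],
    response_format={"type": "json_object"},
)
print(completion.choices[0].message.content)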
