LLaMA-3.2-3B-Preview

LLaMA-3.2-3B-Preview is one of the fastest models on Groq, offering a strong balance of speed and generation quality. With 3.1 billion parameters and a 128K-token context window, it delivers rapid responses with noticeably better accuracy than the 1B version. The model excels at content creation, summarization, and information retrieval, making it well suited to applications where output quality matters but a large model is unnecessary. Its efficient design makes it cost-effective for real-time workloads such as chatbots and other services that need reliable, good-quality responses.

Key Technical Specifications

Model Architecture

LLaMA-3.2-3B-Preview is an auto-regressive language model built on Meta's LLaMA-3.2 family. It uses an optimized transformer architecture, supports text and code generation, and offers enhanced capabilities compared to the 1B version.

Performance Metrics

The model demonstrates strong performance across key benchmarks, with notable improvements over the 1B version:
  • MMLU: 45.7% accuracy (5-shot)
  • ARC-Challenge: 41.3% accuracy (25-shot)
  • SQuAD: 61.8% accuracy (1-shot)

Technical Details

FEATURE                     VALUE
Context Window (Tokens)     128K
Max Output Tokens           8K
Max File Size               -
Token Generation Speed      ~2,800 tokens/second
Input Token Price           $0.06 per 1M tokens
Output Token Price          $0.06 per 1M tokens
Tool Use                    Supported
JSON Mode                   Supported
Image Support               Not Supported
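
With input and output both priced at $0.06 per million tokens, per-request cost is simple to estimate. Below is a minimal Python sketch of that arithmetic; only the prices come from the table above, and the token counts are hypothetical.

# Prices are from the table above; the token counts are made-up examples.
INPUT_PRICE_PER_M = 0.06   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.06  # USD per 1M output tokens

input_tokens = 10_000
output_tokens = 2_000

# Scale each side by its per-million price, then convert to dollars.
cost = (input_tokens * INPUT_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
print(f"Estimated request cost: ${cost:.6f}")  # $0.000720 for this example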

Use Cases

Enhanced Content Generation
Ideal for applications requiring higher quality outputs with reasonable speed.
  • More sophisticated chatbots and virtual assistants
  • Higher-quality content creation and summarization
  • More accurate information extraction and analysis
  • Enhanced reasoning for complex problem-solving
Balanced Performance Applications
Perfect for use cases where quality matters more than absolute speed.
  • Production-ready applications requiring better reasoning
  • More nuanced content moderation and analysis
  • Educational tools requiring deeper knowledge
  • Customer service applications needing more accurate responses

Best Practices

  • Enable JSON mode: For generating structured data or when you need outputs in a specific format (see the sketch after this list)
  • Use tool use: For tasks that require external tools or services to answer a request (an example follows the quickstart code at the end of this page)
  • Leverage the enhanced reasoning: Provide more complex prompts that take advantage of the model's improved capabilities
  • Balance batch size: When batch-processing, account for the model's slightly lower token throughput compared to the 1B version
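
Here is a minimal sketch of JSON mode with the Groq Python SDK, assuming the GROQ_API_KEY environment variable is set; the prompt wording and the requested keys are illustrative assumptions. Note that JSON mode generally requires the prompt itself to mention JSON.

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# response_format enables JSON mode, constraining output to valid JSON.
completion = client.chat.completions.create(
    model="llama-3.2-3b-preview",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "user",
            # The prompt must explicitly reference JSON for JSON mode to apply.
            "content": "Return a JSON object with keys 'use_cases' (an array of "
                       "three strings) describing good uses of a 3B model."
        }
    ]
)
print(completion.choices[0].message.content)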

Get Started with LLaMA-3.2-3B-Preview

Experience llama-3.2-3b-preview at Groq speed:

pip install groq
from groq import Groq

client = Groq()  # uses the GROQ_API_KEY environment variable

# Send a single chat completion request to the model.
completion = client.chat.completions.create(
    model="llama-3.2-3b-preview",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)
print(completion.choices[0].message.content)
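
The spec table above also lists tool use as supported. The sketch below shows one way to pass an OpenAI-style tool definition through the same SDK; the get_weather function, its schema, and the example city are hypothetical illustrations, not part of this page's original content.

from groq import Groq
import json

client = Groq()

# Hypothetical tool definition; the name and schema are illustrative only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

completion = client.chat.completions.create(
    model="llama-3.2-3b-preview",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools,
    tool_choice="auto"
)

# If the model chose to call the tool, the call arrives as structured data.
tool_calls = completion.choices[0].message.tool_calls
if tool_calls:
    call = tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))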