Llama 3 70B

Deprecated
llama3-70b-8192
TOKEN SPEED
~330 tps
INPUT
Text
OUTPUT
Text
CAPABILITIES
Tool Use, JSON Mode

Llama 3.0 70B on Groq offers a balance of performance and speed as a reliable foundation model that excels at dialogue and content generation for tasks that fit within smaller context windows. While newer models have since emerged, Llama 3.0 70B remains production-ready and cost-effective, delivering fast, consistent outputs via the Groq API.


PRICING

Input
$0.59 per 1M tokens (~1.7M tokens / $1)
Output
$0.79 per 1M tokens (~1.3M tokens / $1)
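At these rates, per-request cost can be estimated from the token counts the API returns in `completion.usage`. A minimal sketch (the helper name and hard-coded rates are illustrative, not part of the SDK):

```python
# Illustrative helper: estimates request cost from token counts at the
# published llama3-70b-8192 rates ($0.59 / $0.79 per 1M tokens).
INPUT_RATE = 0.59 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.79 / 1_000_000  # dollars per output token

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated dollar cost of one request."""
    return prompt_tokens * INPUT_RATE + completion_tokens * OUTPUT_RATE

# e.g. a request with 2,000 prompt tokens and 500 completion tokens:
print(f"${estimate_cost(2_000, 500):.6f}")  # → $0.001575
```

After a real call, pass `completion.usage.prompt_tokens` and `completion.usage.completion_tokens` to the same helper.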

LIMITS

CONTEXT WINDOW
8,192 tokens

MAX OUTPUT TOKENS
8,192
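Because the prompt and the completion share the same 8,192-token window, longer prompts leave less room for the response. A rough way to budget, using the common ~4-characters-per-token heuristic (an approximation only, not the model's actual tokenizer):

```python
CONTEXT_WINDOW = 8192  # llama3-70b-8192 context window

def rough_token_count(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def response_budget(prompt: str, reserve: int = 256) -> int:
    """Tokens left for the completion after the prompt, minus a safety margin."""
    return max(0, CONTEXT_WINDOW - rough_token_count(prompt) - reserve)

prompt = "Summarize our Q3 launch plan." * 100  # ~2,900 characters
print(response_budget(prompt))
```

For exact counts, tokenize with the model's actual tokenizer rather than this heuristic.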

Key Technical Specifications

Model Architecture

A 70-billion-parameter decoder-only transformer that uses grouped-query attention (GQA) for improved inference efficiency. It offers solid instruction-following capabilities and reduced hallucinations.

Performance Metrics

The model demonstrates solid performance across various benchmarks:
  • MMLU (5-shot): 79.5% accuracy, showing strong general knowledge
  • GSM-8K (8-shot, CoT): 93.0% accuracy in mathematical reasoning
  • HumanEval (0-shot): 81.7% pass rate in code generation
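The few-shot settings above (e.g. 5-shot MMLU) simply prepend worked examples to the prompt. A sketch of building such a message list for the chat API (the helper name and example data are illustrative):

```python
def build_few_shot_messages(examples, question, system=None):
    """Turn (question, answer) pairs into alternating user/assistant turns,
    followed by the real question as the final user message."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    for q, a in examples:
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    messages.append({"role": "user", "content": question})
    return messages

shots = [("2 + 2 = ?", "4"), ("10 / 5 = ?", "2")]
msgs = build_few_shot_messages(shots, "7 * 6 = ?")
# msgs can be passed as `messages=` to client.chat.completions.create(...)
```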

Use Cases

Dialogue Applications
Ideal for building reliable conversational experiences with consistent outputs:
  • Customer support and service chatbots
  • Interactive assistants and guides
  • Educational dialogue systems
  • Conversational interfaces for applications
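For these dialogue use cases, note that the chat API is stateless: your application keeps the conversation history and resends it each turn, trimming old turns to stay under the 8,192-token window. A sketch (the character budget and helper name are illustrative stand-ins for real token counting):

```python
MAX_HISTORY_CHARS = 24_000  # crude stand-in for a real token budget

def trim_history(messages):
    """Drop the oldest non-system turns until the history fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    while turns and sum(len(m["content"]) for m in system + turns) > MAX_HISTORY_CHARS:
        turns.pop(0)
    return system + turns

history = [{"role": "system", "content": "You are a concise support agent."}]
history.append({"role": "user", "content": "How do I reset my password?"})
# reply = client.chat.completions.create(model="llama3-70b-8192",
#                                        messages=trim_history(history))
# history.append({"role": "assistant",
#                 "content": reply.choices[0].message.content})
```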
Content Generation
Excels at creating high-quality content with a balance of creativity and accuracy:
  • Marketing and promotional content
  • Documentation and technical writing
  • Creative writing and storytelling
  • Content adaptation and summarization

Best Practices

  • Structure your prompts: Break complex tasks into clear steps for more reliable outputs
  • Enable JSON mode: For generating structured data and maintaining consistent output formats
  • Include examples: Add sample outputs or specific formats to guide complex generations
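The JSON-mode tip above maps to the `response_format` parameter on the chat completions endpoint; the prompt itself should also mention JSON and name the keys you expect. A sketch (the request payload keys follow Groq's OpenAI-compatible API; the sentiment schema is made up for illustration):

```python
import json

# Request payload for JSON mode on Groq's OpenAI-compatible chat endpoint.
request = {
    "model": "llama3-70b-8192",
    "response_format": {"type": "json_object"},  # switches on JSON mode
    "messages": [
        {
            "role": "user",
            "content": 'Return a JSON object with keys "sentiment" and '
                       '"confidence" for: "Groq inference is remarkably fast."',
        }
    ],
}

# Live call (requires GROQ_API_KEY):
#   completion = Groq().chat.completions.create(**request)
#   data = json.loads(completion.choices[0].message.content)

# In JSON mode the message content is a parseable JSON string, e.g.:
sample = '{"sentiment": "positive", "confidence": 0.97}'
data = json.loads(sample)
print(data["sentiment"])  # → positive
```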

Get Started with llama3-70b

Experience the versatile llama3-70b-8192 with Groq speed now:

shell
pip install groq
Python
from groq import Groq

# Reads the GROQ_API_KEY environment variable by default
client = Groq()

completion = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)
print(completion.choices[0].message.content)
