Llama 3 70B

Deprecated
llama3-70b-8192
TOKEN SPEED
~330 tps
Powered by Groq
INPUT
Text
OUTPUT
Text

Llama 3.0 70B on Groq balances performance and speed, serving as a reliable foundation model that excels at dialogue and content generation for tasks with smaller context windows. While newer models have since emerged, Llama 3.0 70B remains production-ready and cost-effective, with fast, consistent outputs via the Groq API.


PRICING

Input
$0.59
1.7M / $1
Output
$0.79
1.3M / $1
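
The per-token rates above translate directly into a simple cost estimate. The sketch below is a minimal back-of-the-envelope helper using only the listed prices ($0.59 and $0.79 per million tokens); the function name and the sample token counts are illustrative, not part of any SDK.

```python
# Back-of-the-envelope cost check for llama3-70b-8192 on Groq, using the
# listed rates of $0.59 / 1M input tokens and $0.79 / 1M output tokens.

INPUT_PRICE_PER_M = 0.59   # USD per 1M input tokens (from the pricing table)
OUTPUT_PRICE_PER_M = 0.79  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M \
         + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

# How many tokens one dollar buys, matching the "1.7M / $1" and
# "1.3M / $1" figures in the table:
input_tokens_per_dollar = 1e6 / INPUT_PRICE_PER_M    # ~1.69M
output_tokens_per_dollar = 1e6 / OUTPUT_PRICE_PER_M  # ~1.27M
```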

LIMITS

CONTEXT WINDOW
8,192

MAX OUTPUT TOKENS
8,192
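
The 8,192-token window is shared between the prompt and the completion, so long chat histories need trimming before each request. Below is a minimal sketch of one way to do that; the ~4 characters-per-token estimate and the reserved-output budget are assumptions for illustration (for exact counts, use the `usage` field returned by the API), and `trim_history` is a hypothetical helper, not part of the Groq SDK.

```python
# Rough sketch: keep a chat history inside llama3-70b-8192's 8,192-token
# context window. Uses a crude ~4 characters-per-token estimate (an
# assumption; real budgeting should use the API's returned `usage` counts).

CONTEXT_WINDOW = 8192
RESERVED_FOR_OUTPUT = 1024  # leave room for the completion (arbitrary choice)

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest non-system messages until the estimated prompt fits."""
    budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(estimate_tokens(m["content"]) for m in system + rest) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest
```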

QUANTIZATION

This model uses Groq's TruePoint Numerics, which reduces precision only in areas that don't affect accuracy, preserving quality while delivering a significant speedup over traditional approaches.

Key Technical Specifications

Model Architecture

A 70-billion-parameter decoder-only transformer that uses grouped-query attention (GQA) for more efficient inference. It offers solid instruction-following capabilities and reduced hallucinations relative to earlier Llama generations.

Performance Metrics

The model demonstrates solid performance across various benchmarks:
  • MMLU (5-shot): 79.5% accuracy, showing strong general knowledge
  • GSM-8K (8-shot, CoT): 93.0% accuracy in mathematical reasoning
  • HumanEval (0-shot): 81.7% pass rate in code generation

Use Cases

Dialogue Applications
Ideal for building reliable conversational experiences with consistent outputs:
  • Customer support and service chatbots
  • Interactive assistants and guides
  • Educational dialogue systems
  • Conversational interfaces for applications
Content Generation
Excels at creating high-quality content with a balance of creativity and accuracy:
  • Marketing and promotional content
  • Documentation and technical writing
  • Creative writing and storytelling
  • Content adaptation and summarization

Best Practices

  • Structure your prompts: Break complex tasks into clear steps for more reliable outputs
  • Enable JSON mode: For generating structured data and maintaining consistent output formats
  • Include examples: Add sample outputs or specific formats to guide complex generations
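
The "Enable JSON mode" tip above corresponds to the `response_format={"type": "json_object"}` parameter on Groq's chat completions endpoint. The sketch below shows the request shape and how a reply would be parsed; the `summary`/`sentiment` schema and the sample reply string are illustrative assumptions, not real model output.

```python
import json

# Sketch of a JSON-mode request for llama3-70b-8192. With
# response_format={"type": "json_object"}, the model is constrained to emit
# valid JSON; the prompt should still spell out the schema you want.
request = {
    "model": "llama3-70b-8192",
    "messages": [
        {
            "role": "system",
            "content": (
                "Reply in JSON with keys 'summary' (string) and 'sentiment' "
                "(one of 'positive', 'neutral', 'negative')."
            ),
        },
        {"role": "user", "content": "The new release fixed every bug I reported."},
    ],
    "response_format": {"type": "json_object"},
}

# Illustrative reply (not real model output), showing how to parse it:
sample_reply = '{"summary": "All reported bugs fixed in the new release.", "sentiment": "positive"}'
parsed = json.loads(sample_reply)
```

Pass these fields to `client.chat.completions.create(**request)` as in the example further down; combining JSON mode with an explicit schema in the system prompt keeps outputs consistent across calls.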

Get Started with llama3-70b

Experience the versatile llama3-70b-8192 at Groq speed now:

shell
pip install groq
Python
from groq import Groq

# Reads your API key from the GROQ_API_KEY environment variable by default
client = Groq()
completion = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)
print(completion.choices[0].message.content)
