DeepSeek-R1-Distill-Llama-70B

DeepSeek-R1-Distill-Llama-70B is a distilled version of DeepSeek's R1 model, fine-tuned from the Llama-3.3-70B-Instruct base model. This model leverages knowledge distillation to retain robust reasoning capabilities and deliver exceptional performance on mathematical and logical reasoning tasks with Groq's industry-leading speed.

Key Technical Specifications

Model Architecture

Built upon the Llama-3.3-70B-Instruct framework, the model comprises 70 billion parameters. The distillation process fine-tunes the base model on outputs from DeepSeek-R1, effectively transferring its reasoning patterns.

Performance Metrics

The model demonstrates strong performance across various benchmarks:
  • AIME 2024: Pass@1 score of 70.0
  • MATH-500: Pass@1 score of 94.5
  • CodeForces Rating: Achieved a rating of 1,633

Technical Details

Feature                     Value
Context Window (Tokens)     128K
Max Output Tokens           -
Max File Size               -
Token Generation Speed      275 tokens per second
Input Token Price           $0.75 per 1M tokens
Output Token Price          $0.99 per 1M tokens
Tool Use                    Supported
JSON Mode                   Supported
Image Support               Not Supported

Use Cases

Mathematical Problem-Solving
Effectively addresses complex mathematical queries, making it valuable for educational tools and research applications.
Coding Assistance
Supports code generation and debugging, beneficial for software development.
Logical Reasoning
Performs tasks requiring structured thinking and deduction, applicable in data analysis and strategic planning.

Best Practices

  • Prompt Engineering: Set the temperature parameter between 0.5 and 0.7 (ideally 0.6) to prevent repetitive or incoherent outputs.
  • System Prompt: Avoid adding a system prompt; include all instructions in the user prompt, as shown in the sketch below.
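
Applied together, the two recommendations look roughly like this (a sketch assuming the Groq Python client; the prompt text is illustrative):

from groq import Groq

client = Groq()

# No system message: all instructions go into the single user turn.
completion = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    temperature=0.6,  # recommended range is 0.5-0.7
    messages=[
        {
            "role": "user",
            "content": "You are a rigorous math tutor. Reason step by step, then give the final answer: what is the sum of the first 100 positive integers?"
        }
    ]
)

print(completion.choices[0].message.content)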

Get Started with DeepSeek-R1-Distill-Llama-70B

Experience the reasoning capabilities of deepseek-r1-distill-llama-70b at Groq speed:

pip install groq

from groq import Groq

# The client reads GROQ_API_KEY from the environment.
client = Groq()

completion = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)

print(completion.choices[0].message.content)