Reasoning

Reasoning models excel at complex problem-solving tasks that require step-by-step analysis, logical deduction, structured thinking, and solution validation. With Groq's inference speed, these models can deliver the instant reasoning capabilities critical for real-time applications.

Why Speed Matters for Reasoning

Reasoning models produce explicit reasoning chains as part of their token output and use them to reach a decision, which makes low-latency, fast inference essential. Complex problems often require long chains of reasoning tokens in which each step builds on previous results, so latency savings compound across the chain: fast inference can reduce minutes of reasoning to a response delivered in seconds.
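As a rough, back-of-the-envelope illustration of how throughput dominates end-to-end reasoning latency (all token counts and speeds below are hypothetical assumptions, not measured Groq figures):

# End-to-end latency is roughly total reasoning tokens / throughput.
# All numbers here are illustrative assumptions, not benchmarks.
reasoning_tokens = 12_000  # total tokens across a multi-step reasoning chain
slow_tps = 30              # assumed tokens/sec on a slow backend
fast_tps = 300             # assumed tokens/sec on a fast backend

print(f"slow backend: ~{reasoning_tokens / slow_tps:.0f}s")  # ~400s (minutes)
print(f"fast backend: ~{reasoning_tokens / fast_tps:.0f}s")  # ~40s (seconds)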

Supported Model

Model ID | Model
deepseek-r1-distill-llama-70b | DeepSeek R1 (Distill-Llama 70B)

Quick Start

from groq import Groq

# The client reads the GROQ_API_KEY environment variable by default
client = Groq()

completion = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    # Keep all instructions in the user message; avoid system prompts
    messages=[
        {
            "role": "user",
            "content": "How many r's are in the word strawberry?"
        }
    ],
    temperature=0.6,
    max_completion_tokens=1024,
    top_p=0.95,
    stream=True
)

# Print tokens as they arrive from the stream
for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")
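DeepSeek-R1 distill models typically emit their chain of thought inside <think>...</think> tags before the final answer. A minimal sketch for separating the two, assuming that tag convention holds for the raw output (re-create the stream above first, since a stream can only be consumed once):

# Accumulate the streamed chunks, then split reasoning from the answer.
# Assumes the model wraps its chain of thought in <think>...</think> tags.
full_response = "".join(
    chunk.choices[0].delta.content or "" for chunk in completion
)

if "</think>" in full_response:
    reasoning, answer = full_response.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "").strip()
    answer = answer.strip()
else:
    reasoning, answer = "", full_response.strip()

print("Final answer:", answer)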

Recommended Configuration Parameters

Parameter | Default | Range | Description
messages | - | - | Array of message objects. Important: avoid system prompts - include all instructions in the user message.
temperature | 0.6 | 0.0 - 2.0 | Controls randomness in responses. Lower values make responses more deterministic. Recommended range: 0.5-0.7 to prevent repetition or incoherent outputs.
max_completion_tokens | 1024 | - | Maximum length of the model's response. The default may be too low for complex reasoning - consider increasing it for detailed step-by-step solutions.
top_p | 0.95 | 0.0 - 1.0 | Controls diversity of token selection.
stream | false | boolean | Enables response streaming. Recommended for interactive reasoning tasks.
stop | null | string/array | Custom stop sequences.
seed | null | integer | Set for reproducible results. Important for benchmarking - run multiple tests with different seeds.
json_mode | - | boolean | Not currently supported - avoid using it, as it may break response formatting.
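For example, a request that applies these recommendations and pins a seed for reproducible benchmarking might look like the sketch below (the prompt and parameter values are illustrative assumptions):

from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[
        {
            "role": "user",
            "content": "Prove that the square root of 2 is irrational. Validate each step."
        }
    ],
    temperature=0.6,             # within the recommended 0.5-0.7 range
    max_completion_tokens=4096,  # raised above the default for a detailed proof
    top_p=0.95,
    seed=42,                     # fixed seed for reproducible benchmark runs
    stream=False
)

print(completion.choices[0].message.content)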

Optimizing Performance

Temperature and Token Management

The model performs best with temperature settings between 0.5 and 0.7: lower values (closer to 0.5) produce more consistent mathematical proofs, while higher values allow for more creative problem-solving approaches. Monitor and adjust your token usage based on the complexity of your reasoning tasks - while the default max_completion_tokens is 1024, complex proofs may require higher limits.
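One quick way to see this trade-off is to run the same prompt across the recommended temperature range and compare the outputs. A minimal sketch (the prompt and preview length are illustrative):

from groq import Groq

client = Groq()
prompt = "Prove that the sum of two odd integers is even. Show each step."

# Compare more deterministic (0.5) vs. more exploratory (0.7) behavior
for temp in (0.5, 0.6, 0.7):
    completion = client.chat.completions.create(
        model="deepseek-r1-distill-llama-70b",
        messages=[{"role": "user", "content": prompt}],
        temperature=temp,
        max_completion_tokens=2048,  # raised for a full step-by-step proof
    )
    print(f"--- temperature={temp} ---")
    print(completion.choices[0].message.content[:300])  # preview only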

Prompt Engineering

To ensure accurate, step-by-step reasoning while maintaining high performance:

  • DeepSeek-R1 works best when all instructions are included directly in the user message rather than in a system prompt.
  • Structure your prompts to request explicit validation steps and intermediate calculations.
  • Prefer zero-shot prompting and avoid few-shot examples, which can degrade this model's performance (see the sketch below).
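A minimal sketch of a zero-shot prompt that folds all instructions into the user message and asks for explicit validation (the template wording is an illustrative assumption, not a prescribed format):

from groq import Groq

client = Groq()

# Zero-shot: instructions live in the user message; no system prompt,
# no few-shot examples. The template wording here is illustrative.
user_prompt = (
    "Solve the following problem step by step. "
    "After each step, validate the intermediate result before continuing. "
    "End with a clearly labeled final answer.\n\n"
    "Problem: A train travels 120 km in 1.5 hours. What is its average speed?"
)

completion = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": user_prompt}],  # user message only
    temperature=0.6,
    max_completion_tokens=2048
)

print(completion.choices[0].message.content)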