Qwen/QwQ-32B is a breakthrough 32-billion-parameter reasoning model that delivers performance comparable to state-of-the-art (SOTA) models roughly 20x its size, such as DeepSeek-R1 (671B parameters), on complex reasoning and coding tasks. Deployed on Groq's hardware, it provides the world's fastest and most cost-efficient reasoning, producing full reasoning chains and results in seconds. Along with native tool use support, the 128K-token context window enables processing extensive information while maintaining comprehensive context.

Key Technical Specifications

Revolutionary Reasoning Capabilities

Developed through advanced reinforcement learning techniques, QwQ-32B excels at mathematical reasoning, coding, and complex problem-solving, with performance rivaling the likes of DeepSeek-R1 and o1-mini.

Performance Metrics

Despite its compact size, QwQ-32B delivers SOTA results across various benchmarks:
  • AIME24: 79.5 (compared to 63.6 for o1-mini)
  • BFCL: 66.4 (compared to 60.3 for DeepSeek-R1)
  • LiveBench: 73.1 (compared to 71.6 for DeepSeek-R1)

Technical Details

FEATURE                     VALUE
Context Window (Tokens)     128K
Max Output Tokens           -
Max File Size               -
Token Generation Speed      ~400 TPS
Input Token Price           $0.29 per million tokens
Output Token Price          $0.39 per million tokens
Tool Use                    Supported
JSON Mode                   Supported
Image Support               Not Supported
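
Since JSON Mode is supported, structured output can be requested through the OpenAI-compatible `response_format` parameter. The snippet below is a minimal sketch rather than an official example; the prompt wording and JSON keys are illustrative, and JSON mode generally expects the prompt itself to ask for JSON.

from groq import Groq

client = Groq()  # assumes GROQ_API_KEY is set in the environment

# Ask for JSON output explicitly in the prompt, then enable JSON mode.
completion = client.chat.completions.create(
    model="qwen-qwq-32b",
    messages=[
        {
            "role": "user",
            "content": "Return a JSON object with keys 'name' and 'complexity' describing the quicksort algorithm."
        }
    ],
    response_format={"type": "json_object"},
)
print(completion.choices[0].message.content)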

Use Cases

Advanced Problem Solving
Tackles complex mathematical problems and logical reasoning tasks with exceptional accuracy:
  • Multi-step reasoning chains with explanation
  • Complex decision-making scenarios
  • Research assistance and literature analysis
Software Development
Delivers high-quality code generation and technical assistance comparable to much larger models:
  • Algorithm implementation and optimization
  • Debugging with step-by-step reasoning
  • API development and integration guidance

Best Practices

  • Use `temperature=0.6` and `top_p=0.95` to avoid endless repetitions and hallucinations (see the sketches after this list).
  • Utilize the full context window - with 128K tokens available, provide comprehensive problem descriptions and relevant background information.
  • Set `reasoning_format` to `parsed` to handle the missing first `<think>` token in QwQ-32B output.
  • For multi-turn conversations, include only the final output from previous turns in history, not the thinking content.
  • Prompt the model to be concise when needed - the model tends to produce extensive reasoning chains.
  • Increase `max_completion_tokens` to give the model sufficient space to complete its reasoning without truncation.
  • The model may occasionally include Chinese characters in its reasoning chains (this is normal behavior); if the reasoning content is critical to your use case, prompt the model to avoid using Chinese characters in its output.
  • Take advantage of QwQ-32B's strong tool use and function calling capabilities for agentic applications.
  • If the model provides thinking without reaching a final answer, try prompting for conciseness or rerun your query.
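
To illustrate several of these practices together, here is a minimal sketch (not an official example) that applies the recommended sampling settings, `reasoning_format="parsed"`, and a raised `max_completion_tokens`, then carries only the final answer into the next turn. It assumes the Groq Python SDK exposes `reasoning_format` as a keyword argument; the prompt and token limit are illustrative.

from groq import Groq

client = Groq()  # assumes GROQ_API_KEY is set in the environment

messages = [
    {"role": "user", "content": "Prove that the sum of the first n odd numbers is n^2."}
]

completion = client.chat.completions.create(
    model="qwen-qwq-32b",
    messages=messages,
    temperature=0.6,              # recommended sampling settings
    top_p=0.95,
    max_completion_tokens=4096,   # illustrative value; leave room for long reasoning chains
    reasoning_format="parsed",    # keeps reasoning separate from the final answer
)

answer = completion.choices[0].message.content

# Multi-turn: carry only the final answer forward, not the thinking content.
messages.append({"role": "assistant", "content": answer})
messages.append({"role": "user", "content": "Now restate the result for n = 10."})

For agentic applications, tool use follows the OpenAI-compatible `tools` and `tool_choice` parameters. The `get_weather` tool below is hypothetical and exists only for illustration.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",   # hypothetical tool for illustration
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

completion = client.chat.completions.create(
    model="qwen-qwq-32b",
    messages=[{"role": "user", "content": "What's the weather in Tokyo right now?"}],
    tools=tools,
    tool_choice="auto",
)
print(completion.choices[0].message.tool_calls)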

Get Started with Qwen/QwQ-32B

Experience the world's fastest breakthrough reasoning capabilities with qwen-qwq-32b on Groq:

pip install groq

from groq import Groq

client = Groq()

completion = client.chat.completions.create(
    model="qwen-qwq-32b",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)
print(completion.choices[0].message.content)