Qwen3-32B

Preview
qwen/qwen3-32b
TOKEN SPEED
~400 TPS
Powered by Groq
INPUT
Text
OUTPUT
Text
Alibaba Cloud
Model card

Qwen 3 32B is part of the latest generation of large language models in the Qwen series, offering groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support. It uniquely supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model. The model excels in human preference alignment, creative writing, role-playing, and multi-turn dialogues, while supporting 100+ languages and dialects.


PRICING

Input
$0.29 / 1M tokens (~3.4M tokens per $1)

Output
$0.59 / 1M tokens (~1.7M tokens per $1)
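
As a quick back-of-the-envelope, per-request cost at these rates works out as below (a minimal sketch; the token counts are illustrative):

Python
# Listed rates, in $ per 1M tokens
INPUT_RATE, OUTPUT_RATE = 0.29, 0.59

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-token rates."""
    return input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE

# e.g., a 10K-token prompt with a 2K-token completion:
print(f"${request_cost(10_000, 2_000):.4f}")  # ≈ $0.0041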

LIMITS

CONTEXT WINDOW
131,072 tokens

MAX OUTPUT TOKENS
40,960
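
In practice, these limits bound each request: the prompt plus the requested completion must fit within the 131,072-token context window, and the completion itself cannot exceed 40,960 tokens. A minimal sketch of the budgeting, assuming the OpenAI-compatible max_completion_tokens parameter:

Python
CONTEXT_WINDOW = 131_072  # total tokens (prompt + completion)
MAX_OUTPUT = 40_960       # hard cap on completion tokens

def max_completion_budget(prompt_tokens: int) -> int:
    """Largest completion that can be requested for a given prompt size."""
    return max(0, min(MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens))

print(max_completion_budget(100_000))  # 31072 tokens left for the completion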

QUANTIZATION

This model uses Groq's TruePoint Numerics, which reduces precision only in areas that don't affect accuracy, preserving quality while delivering a significant speedup over traditional quantization approaches.

Key Technical Specifications

Model Architecture

Built on Qwen's architecture with 32 billion parameters, featuring a dual-mode system that supports both thinking mode for complex reasoning and non-thinking mode for efficient dialogue.

Performance Metrics

The model demonstrates exceptional performance across diverse benchmarks:
  • 93.8% score on ArenaHard
  • 81.4% pass rate on AIME 2024
  • 65.7% on LiveCodeBench
  • 30.3% on BFCL
  • 73.0% on MultiIF
  • 72.9% on AIME 2025
  • 71.6% on LiveBench

Use Cases

Complex Problem Solving
Excels at tasks requiring deep analysis and structured thinking in thinking mode.
  • Multi-step reasoning and analysis
  • Mathematical problem solving
  • Complex coding tasks
  • Strategic planning and decision support
Natural Dialogue and Content Creation
Delivers engaging and natural conversations in non-thinking mode.
  • Creative writing and storytelling
  • Role-playing and character development
  • Multi-turn dialogues
  • Multilingual content generation

Best Practices

  • Mode Selection: Use thinking mode (reasoning_effort="default") for complex reasoning with temperature=0.6, top_p=0.95, top_k=20, and min_p=0 (see the sketch after this list)
  • Non-thinking Mode: For general dialogue, use temperature=0.7, top_p=0.8, top_k=20, and min_p=0
  • Math Problems: Include "Please reason step by step, and put your final answer within \boxed{}" in the prompt
  • Multiple-Choice: Standardize responses by adding an instruction such as: "Please show your choice in the answer field with only the choice letter, e.g., \"answer\": \"C\"."
  • History Management: In multi-turn conversations, include only the final outputs, not the thinking content
  • Reasoning Format: Set reasoning_format to hidden to return only the final answer, or to parsed to include the reasoning in a separate field
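
The two modes translate into request parameters roughly as follows. This is a minimal sketch using the Groq Python SDK: reasoning_effort and reasoning_format come from the guidance above, the value "none" for disabling thinking is an assumption, and top_k/min_p are omitted because the OpenAI-compatible chat schema may not expose them.

Python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Thinking mode: recommended sampling settings for complex reasoning.
thinking = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[{
        "role": "user",
        "content": "Prove that the square root of 2 is irrational. "
                   "Please reason step by step, and put your final answer within \\boxed{}."
    }],
    temperature=0.6,
    top_p=0.95,
    reasoning_effort="default",  # enable thinking mode
    reasoning_format="hidden",   # return only the final answer
)

# Non-thinking mode: recommended settings for general-purpose dialogue.
chat = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[{"role": "user", "content": "Suggest a name for a coffee shop."}],
    temperature=0.7,
    top_p=0.8,
    reasoning_effort="none",  # assumption: "none" disables thinking mode
)

print(thinking.choices[0].message.content)
print(chat.choices[0].message.content)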

Get Started with Qwen 3 32B

Experience state-of-the-art language understanding and generation with Qwen 3 32B at Groq speed:

shell
pip install groq
Python
from groq import Groq

client = Groq()  # reads your GROQ_API_KEY from the environment

completion = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)

print(completion.choices[0].message.content)
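
To keep the model's reasoning but separate it from the answer, set reasoning_format to parsed. A hedged variation, assuming the parsed reasoning is surfaced on a separate reasoning attribute of the message:

Python
completion = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    reasoning_format="parsed",  # separate the reasoning from the final answer
)

msg = completion.choices[0].message
# Assumption: with reasoning_format="parsed", the thinking content is
# returned on `msg.reasoning` alongside the final `msg.content`.
print(getattr(msg, "reasoning", None))
print(msg.content)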
