Qwen3-32B

Preview
qwen/qwen3-32b
TOKEN SPEED
~400 TPS
INPUT
Text
OUTPUT
Text
CAPABILITIES
Tool Use, JSON Mode, Reasoning
Alibaba Cloud model card

Qwen 3 32B is part of the latest generation of large language models in the Qwen series, offering advances in reasoning, instruction-following, agent capabilities, and multilingual support. It supports seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model. The model excels in human preference alignment, creative writing, role-playing, and multi-turn dialogue, and supports 100+ languages and dialects.


PRICING

Input
$0.29 per 1M tokens (≈3.4M tokens per $1)
Output
$0.59 per 1M tokens (≈1.7M tokens per $1)
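
As a rough budgeting aid, these rates translate directly into per-request cost. The sketch below uses illustrative token counts; only the per-million rates come from the pricing above.

Python
# Illustrative per-request cost at the listed rates:
# $0.29 per 1M input tokens, $0.59 per 1M output tokens
input_tokens = 10_000
output_tokens = 2_000

cost_usd = input_tokens / 1e6 * 0.29 + output_tokens / 1e6 * 0.59
print(f"${cost_usd:.4f}")  # ≈ $0.0041 for this request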

LIMITS

CONTEXT WINDOW
131,072

MAX OUTPUT TOKENS
40,960

Key Technical Specifications

Model Architecture

Built on Qwen's architecture with 32 billion parameters, featuring a unique dual-mode system that supports both thinking mode for complex reasoning and non-thinking mode for efficient dialogue.

Performance Metrics

The model demonstrates exceptional performance across diverse benchmarks:
  • 93.8% score on ArenaHard
  • 81.4% pass rate on AIME 2024
  • 65.7% on LiveCodeBench
  • 30.3% on BFCL
  • 73.0% on MultiIF
  • 72.9% on AIME 2025
  • 71.6% on LiveBench

Use Cases

Complex Problem Solving
Excels at tasks requiring deep analysis and structured thinking in thinking mode.
  • Multi-step reasoning and analysis
  • Mathematical problem solving
  • Complex coding tasks
  • Strategic planning and decision support
Natural Dialogue and Content Creation
Delivers engaging and natural conversations in non-thinking mode.
  • Creative writing and storytelling
  • Role-playing and character development
  • Multi-turn dialogues
  • Multilingual content generation

Best Practices

  • Mode Selection: Use thinking mode (reasoning_effort="default") for complex reasoning with temperature=0.6, top_p=0.95, top_k=20, and min_p=0
  • Non-thinking Mode: For general dialogue, use temperature=0.7, top_p=0.8, top_k=20, and min_p=0
  • Math Problems: Include 'Please reason step by step, and put your final answer within \boxed{}' in the prompt
  • Multiple-Choice: Standardize answers by asking for a JSON answer field, e.g., add to the prompt: 'Please show your choice in the answer field with only the choice letter, e.g., "answer": "C".'
  • History Management: In multi-turn conversations, only include final outputs without thinking content
  • Reasoning Format: Set reasoning_format to hidden to return only the final answer, or to parsed to include the reasoning in a separate field (see the sketch below)
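
The settings above can be combined in a single request. Below is a minimal sketch assuming the reasoning_effort and reasoning_format parameters described in this list are accepted directly by chat.completions.create, and that parsed reasoning is exposed on the message's reasoning field; top_k and min_p are Qwen sampling recommendations and are omitted here.

Python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Thinking mode: complex reasoning with the sampling settings recommended above
completion = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[
        {
            "role": "user",
            "content": "Please reason step by step, and put your final answer within \\boxed{}. What is 27 * 43?"
        }
    ],
    temperature=0.6,
    top_p=0.95,
    reasoning_effort="default",  # thinking mode
    reasoning_format="parsed",   # return reasoning separately from the answer
)

message = completion.choices[0].message
print(message.reasoning)  # thinking content (assumed field name with reasoning_format="parsed")
print(message.content)    # final answer only

For general dialogue in non-thinking mode, the same call pattern applies with temperature=0.7 and top_p=0.8, as recommended above.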

Get Started with Qwen 3 32B

Experience state-of-the-art language understanding and generation with Qwen 3 32B at Groq speed:

shell
pip install groq
Python
from groq import Groq

# The client reads your API key from the GROQ_API_KEY environment variable
client = Groq()

completion = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)

# Print the model's final response
print(completion.choices[0].message.content)
