Qwen-2.5-32B is Alibaba's flagship model, delivering near-instant responses with GPT-4 level capabilities across a wide range of tasks. Built on 5.5 trillion tokens of diverse training data, it excels at everything from creative writing to complex reasoning. With reliable tool use, native JSON support, and a massive 128K context window, it handles sophisticated workflows while maintaining consistently fast response times.

Key Technical Specifications

Model Architecture

Built on the Qwen 2.5 architecture with 32 billion parameters, the model is trained on 5.5 trillion tokens of diverse data and optimized for versatile real-world applications, with fast responses, reliable tool use, and native JSON support.

Performance Metrics

The model demonstrates exceptional performance across diverse tasks:
  • 94.5% score on MATH-500 benchmark
  • 70.0% pass rate on AIME 2024
  • Robust performance on complex reasoning tasks

Technical Details

Feature                    Value
Context Window (Tokens)    128K
Max Output Tokens          -
Max File Size              -
Token Generation Speed     ~200 TPS
Input Token Price          $0.79/1M tokens
Output Token Price         $0.79/1M tokens
Tool Use                   Supported
JSON Mode                  Supported
Image Support              Not Supported
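
Tool use follows the OpenAI-compatible chat completions pattern used by the Groq SDK. The sketch below is illustrative: the `get_weather` function, its schema, and the stub return value are made-up examples, not part of the Groq API, and the live request runs only when a `GROQ_API_KEY` is configured.

```python
import json
import os

# Hypothetical tool for illustration; not part of the Groq API.
def get_weather(city: str) -> str:
    """Stub tool: returns canned weather data as a JSON string."""
    return json.dumps({"city": city, "temp_c": 21})

# JSON Schema description of the tool, passed to the model so it can
# decide when and how to call it.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

if os.environ.get("GROQ_API_KEY"):  # live call only when a key is configured
    from groq import Groq

    client = Groq()
    response = client.chat.completions.create(
        model="qwen-2.5-32b",
        messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
        tools=TOOLS,
        tool_choice="auto",
    )
    # Execute whichever tool calls the model requested.
    for call in response.choices[0].message.tool_calls or []:
        if call.function.name == "get_weather":
            args = json.loads(call.function.arguments)
            print(get_weather(**args))
```

In a full loop you would append each tool result back to `messages` as a `"tool"` role message and call the model again so it can compose a final answer.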

Use Cases

Complex Problem Solving
Excels at tasks requiring deep analysis and structured thinking.
  • Multi-step reasoning and analysis
  • Mathematical problem solving
  • Strategic planning and decision support
  • Research synthesis and summarization
Content Creation
Generates high-quality content across various formats and styles.
  • Long-form article writing
  • Creative writing and storytelling
  • Technical documentation
  • Marketing copy and content adaptation

Best Practices

  • Leverage the context window: Include comprehensive information for more accurate and contextual responses
  • Simplify complex queries: Break down multi-part questions into clear, small steps for more reliable reasoning
  • Enable JSON mode: For generating structured data or when you need outputs in a specific format
  • Include examples: Add sample outputs or format specifications to guide the model toward the desired output structure
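
Combining two of these practices, the sketch below enables JSON mode via `response_format={"type": "json_object"}` and spells out the expected keys in the system prompt. The keys `name` and `population` are illustrative, not a fixed schema, and the live request runs only when a `GROQ_API_KEY` is configured.

```python
import json
import os

# Build the request parameters for a JSON-mode completion. Stating the
# expected keys in the prompt guides the model toward that structure.
def build_json_request(prompt: str) -> dict:
    return {
        "model": "qwen-2.5-32b",
        "messages": [
            {"role": "system",
             "content": "Reply only with JSON containing 'name' and 'population'."},
            {"role": "user", "content": prompt},
        ],
        "response_format": {"type": "json_object"},
    }

if os.environ.get("GROQ_API_KEY"):  # live call only when a key is configured
    from groq import Groq

    completion = Groq().chat.completions.create(
        **build_json_request("Give facts about Tokyo.")
    )
    # JSON mode guarantees the content parses as valid JSON.
    data = json.loads(completion.choices[0].message.content)
    print(data)
```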

Get Started with Qwen-2.5-32B

Experience state-of-the-art language understanding and generation with Qwen-2.5-32B at Groq speed:

pip install groq
from groq import Groq

client = Groq()
completion = client.chat.completions.create(
    model="qwen-2.5-32b",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)
print(completion.choices[0].message.content)
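
To make the most of the fast generation speed, you can stream the response instead of waiting for the full completion. This sketch assumes the OpenAI-compatible `stream=True` parameter supported by the Groq SDK; the live request runs only when a `GROQ_API_KEY` is configured.

```python
import os

# Build the request parameters for a streamed completion; stream=True
# returns chunks as tokens are generated, so output starts immediately.
def stream_request(prompt: str) -> dict:
    return {
        "model": "qwen-2.5-32b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }

if os.environ.get("GROQ_API_KEY"):  # live call only when a key is configured
    from groq import Groq

    client = Groq()
    for chunk in client.chat.completions.create(**stream_request("Say hello.")):
        # Each chunk carries an incremental token delta.
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    print()
```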