meta-llama/llama-4-scout-17b-16e-instruct

Preview
Llama 4 Scout is Meta's natively multimodal model for text and image understanding. Built on a mixture-of-experts architecture with 17 billion activated parameters across 16 experts, it offers industry-leading performance on multimodal tasks such as natural assistant-style chat, image recognition, and coding. With a massive 10M token context window and support for 12 languages (Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese), the model delivers exceptional capabilities, especially when paired with Groq for fast inference.

Key Technical Specifications

Model Architecture

Llama 4 Scout is an auto-regressive language model built on a mixture-of-experts (MoE) architecture with 17B activated parameters (109B total) that incorporates early fusion for native multimodality. The model routes across 16 experts to efficiently handle both text and image inputs while maintaining high performance across chat, knowledge, and code generation tasks; its knowledge cutoff is August 2024.
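
The intuition behind MoE routing can be shown with a toy sketch. This is purely illustrative (hypothetical shapes, top-1 routing, NumPy), not Meta's implementation:

import numpy as np

# Toy mixture-of-experts (MoE) layer: a learned router activates one
# expert per token, so only a fraction of the total parameters runs
# on each forward pass. Shapes and top-1 routing are illustrative.
rng = np.random.default_rng(0)
d_model, n_experts = 64, 16

router_w = rng.normal(size=(d_model, n_experts))            # router weights
experts = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(n_experts)]

def moe_layer(x):
    """x: (tokens, d_model) -> (tokens, d_model); one expert per token."""
    logits = x @ router_w                                   # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                   # softmax gate
    chosen = logits.argmax(-1)                              # top-1 expert index
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        out[i] = probs[i, e] * (x[i] @ experts[e])          # only 1 of 16 runs
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)                              # (4, 64)

The same principle is why only about 17B of Llama 4 Scout's 109B total parameters are active for any given token.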

Performance Metrics

The Llama 4 Scout instruction-tuned model demonstrates exceptional performance across multiple benchmarks:
  • MMLU Pro: 74.3
  • ChartQA: 84.3
  • DocVQA: 89.4 (ANLS)

Technical Details

Feature                   Value
Context Window (Tokens)   10M (currently limited to 128K)
Max Output Tokens         N/A
Max File Size             N/A
Token Generation Speed    N/A
Input Token Price         $0.11 per million tokens
Output Token Price        $0.34 per million tokens
Tool Use                  Supported
JSON Mode                 Supported
Image Support             Supported (up to 5 input images recommended for highest accuracy)
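
Since the table lists JSON mode as supported, output can be pinned to a valid JSON object via the OpenAI-compatible response_format parameter. A minimal sketch, assuming GROQ_API_KEY is set in the environment; the extraction task and keys are illustrative (JSON mode expects the word "JSON" to appear in the prompt):

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# JSON mode: response_format constrains the reply to a valid JSON object.
# The prompt itself must mention JSON; the schema below is illustrative.
completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "user",
            "content": (
                "Extract the city and temperature from this sentence as JSON "
                'with keys "city" and "temp_c": '
                "It was 21 degrees Celsius in Lisbon this morning."
            ),
        }
    ],
    response_format={"type": "json_object"},
)
print(completion.choices[0].message.content)  # e.g. {"city": "Lisbon", "temp_c": 21}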

Use Cases

Multimodal Assistant Applications
Build conversational AI assistants that can reason about both text and images, enabling visual recognition, image reasoning, captioning, and question answering over visual content (a request sketch follows this section).
Code Generation and Technical Tasks
Create AI tools for code generation, debugging, and technical problem-solving with high-quality multilingual support.
Long-Context Applications
Leverage the large context window (10M tokens by design, currently 128K) for applications requiring extensive memory, document analysis, and long-running conversation history.
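
A minimal sketch of the multimodal assistant use case above, passing an image by URL in an OpenAI-style content array (the URL is a placeholder; keep to at most 5 images per request, per the tested limit):

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Pass an image next to a text question. The image URL below is a
# placeholder; base64 data URLs work the same way in this format.
completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(completion.choices[0].message.content)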

Best Practices

  • Use system prompts to improve steerability and reduce false refusals; the model is designed to be highly steerable with an appropriate system prompt (see the sketch after this list).
  • Consider implementing system-level protections like Llama Guard for input filtering and response validation.
  • For multimodal applications, the model has been tested for up to 5 input images; perform additional testing if exceeding this limit.
  • Deploy with appropriate safeguards when working in specialized domains or with critical content.
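
A minimal sketch of the first practice, steering behavior with a system message (the persona text is illustrative, not a recommended prompt):

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# A system message steers tone and scope; clear instructions here help
# reduce false refusals. The persona below is illustrative.
completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a concise technical assistant. Answer directly, "
                "state your assumptions, and only refuse clearly unsafe requests."
            ),
        },
        {"role": "user", "content": "Summarize what a context window is."},
    ],
)
print(completion.choices[0].message.content)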

Quick Start

Experience the capabilities of meta-llama/llama-4-scout-17b-16e-instruct on Groq:

pip install groq

from groq import Groq

# The client reads the GROQ_API_KEY environment variable by default.
client = Groq()

completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models",
        }
    ],
)

print(completion.choices[0].message.content)