meta-llama/llama-4-scout-17b-16e-instruct

Preview
Llama 4 Scout is Meta's natively multimodal model for text and image understanding. Built on a mixture-of-experts architecture with 17 billion activated parameters across 16 experts, it offers industry-leading performance on multimodal tasks such as natural assistant-style chat, image recognition, and coding. With a massive 10M token context window and support for 12 languages (Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese), the model delivers exceptional capabilities, especially when paired with Groq for fast inference.

Key Technical Specifications

Model Architecture

Llama 4 Scout is an auto-regressive language model built on a mixture-of-experts (MoE) architecture with 17B activated parameters (109B total) that incorporates early fusion for native multimodality. The model routes across 16 experts to efficiently handle both text and image inputs while maintaining high performance across chat, knowledge, and code generation tasks; its knowledge cutoff is August 2024.
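
The intuition behind MoE routing can be shown with a toy sketch. This is purely illustrative (hypothetical shapes, top-1 routing, NumPy), not Meta's implementation:

import numpy as np

# Toy mixture-of-experts (MoE) layer: a learned router activates one
# expert per token, so only a fraction of the total parameters runs
# on each forward pass. Shapes and top-1 routing are illustrative.
rng = np.random.default_rng(0)
d_model, n_experts = 64, 16

router_w = rng.normal(size=(d_model, n_experts))            # router weights
experts = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(n_experts)]

def moe_layer(x):
    """x: (tokens, d_model) -> (tokens, d_model); one expert per token."""
    logits = x @ router_w                                   # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                   # softmax gate
    chosen = logits.argmax(-1)                              # top-1 expert index
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        out[i] = probs[i, e] * (x[i] @ experts[e])          # only 1 of 16 runs
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)                              # (4, 64)

The same principle is why only about 17B of Llama 4 Scout's 109B total parameters are active for any given token.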

Performance Metrics

The Llama 4 Scout instruction-tuned model demonstrates exceptional performance across multiple benchmarks:
  • MMLU Pro: 74.3
  • ChartQA: 84.3
  • DocVQA: 89.4 (ANLS)

Technical Details

Feature                   Value
Context Window (Tokens)   10M (currently limited to 128K)
Max Output Tokens         N/A
Max File Size             N/A
Token Generation Speed    N/A
Input Token Price         $0.11 per million tokens
Output Token Price        $0.34 per million tokens
Tool Use                  Supported
JSON Mode                 Supported
Image Support             Supported (up to 5 input images recommended for highest accuracy)
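
Since the table lists JSON mode as supported, output can be pinned to a valid JSON object via the OpenAI-compatible response_format parameter. A minimal sketch, assuming GROQ_API_KEY is set in the environment; the extraction task and keys are illustrative (JSON mode expects the word "JSON" to appear in the prompt):

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# JSON mode: response_format constrains the reply to a valid JSON object.
# The prompt itself must mention JSON; the schema below is illustrative.
completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "user",
            "content": (
                "Extract the city and temperature from this sentence as JSON "
                'with keys "city" and "temp_c": '
                "It was 21 degrees Celsius in Lisbon this morning."
            ),
        }
    ],
    response_format={"type": "json_object"},
)
print(completion.choices[0].message.content)  # e.g. {"city": "Lisbon", "temp_c": 21}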

Use Cases

Multimodal Assistant Applications
Build conversational AI assistants that can reason about both text and images, enabling visual recognition, image reasoning, captioning, and question answering over visual content (a request sketch follows this section).
Code Generation and Technical Tasks
Create AI tools for code generation, debugging, and technical problem-solving with high-quality multilingual support.
Long-Context Applications
Leverage the large context window (10M tokens by design, currently 128K) for applications requiring extensive memory, document analysis, and long-running conversation history.
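
A minimal sketch of the multimodal assistant use case above, passing an image by URL in an OpenAI-style content array (the URL is a placeholder; keep to at most 5 images per request, per the tested limit):

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Pass an image next to a text question. The image URL below is a
# placeholder; base64 data URLs work the same way in this format.
completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(completion.choices[0].message.content)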

Best Practices

  • Use system prompts to improve steerability and reduce false refusals; the model is designed to be highly steerable with an appropriate system prompt (see the sketch after this list).
  • Consider implementing system-level protections like Llama Guard for input filtering and response validation.
  • For multimodal applications, the model has been tested for up to 5 input images; perform additional testing if exceeding this limit.
  • Deploy with appropriate safeguards when working in specialized domains or with critical content.
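
A minimal sketch of the first practice, steering behavior with a system message (the persona text is illustrative, not a recommended prompt):

from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# A system message steers tone and scope; clear instructions here help
# reduce false refusals. The persona below is illustrative.
completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a concise technical assistant. Answer directly, "
                "state your assumptions, and only refuse clearly unsafe requests."
            ),
        },
        {"role": "user", "content": "Summarize what a context window is."},
    ],
)
print(completion.choices[0].message.content)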

Quick Start

Experience the capabilities of meta-llama/llama-4-scout-17b-16e-instruct on Groq:

pip install groq

from groq import Groq

# The client reads the GROQ_API_KEY environment variable by default.
client = Groq()

completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models",
        }
    ],
)

print(completion.choices[0].message.content)