Llama 4 Scout 17B 16E

Preview
meta-llama/llama-4-scout-17b-16e-instruct
TOKEN SPEED
~750 tps
INPUT
Text, image
OUTPUT
Text
CAPABILITIES
Tool Use, JSON Mode

Llama 4 Scout is Meta's natively multimodal model for text and image understanding. Its mixture-of-experts architecture (16 experts, 17 billion activated parameters) delivers strong performance on multimodal tasks such as natural assistant-like chat, image recognition, and coding. With a 128K-token context window and support for 12 languages (Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese), the model delivers exceptional capabilities, especially when paired with Groq for fast inference.


PRICING

Input
$0.11
9.1M / $1
Output
$0.34
2.9M / $1
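The listed rates are per million tokens, so a request's cost is simple arithmetic. A minimal sketch (the `estimate_cost` helper is hypothetical, using the prices above):

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from the listed per-million-token rates."""
    INPUT_PER_M = 0.11   # $ per 1M input tokens
    OUTPUT_PER_M = 0.34  # $ per 1M output tokens
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# e.g. a full context window of input plus the max output
print(f"${estimate_cost(131_072, 8_192):.4f}")
```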

LIMITS

CONTEXT WINDOW
131,072

MAX OUTPUT TOKENS
8,192

MAX FILE SIZE
20 MB

MAX INPUT IMAGES
5

Key Technical Specifications

Model Architecture

Llama 4 Scout features an auto-regressive language model that uses a mixture-of-experts (MoE) architecture with 17B activated parameters (109B total) and incorporates early fusion for native multimodality. The model uses 16 experts to efficiently handle both text and image inputs while maintaining high performance across chat, knowledge, and code generation tasks, with a knowledge cutoff of August 2024.

Performance Metrics

The Llama 4 Scout instruction-tuned model demonstrates exceptional performance across multiple benchmarks:
  • MMLU Pro: 52.2
  • ChartQA: 88.8
  • DocVQA: 94.4 (ANLS)

Use Cases

Multimodal Assistant Applications
Build conversational AI assistants that can reason about both text and images, enabling visual recognition, image reasoning, captioning, and answering questions about visual content.
Code Generation and Technical Tasks
Create AI tools for code generation, debugging, and technical problem-solving with high-quality multilingual support.
Long-Context Applications
Leverage the 128K token context window for applications requiring extensive memory, document analysis, and maintaining conversation history.
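For the multimodal use case, images are passed alongside text using OpenAI-compatible content parts. A minimal sketch, assuming the `image_url` content-part format and a `GROQ_API_KEY` in the environment (the helper names here are illustrative, not part of the SDK):

```python
def build_image_message(prompt: str, image_urls: list[str]) -> dict:
    """Build a user message mixing one text part with image parts
    (OpenAI-compatible content format)."""
    if len(image_urls) > 5:
        raise ValueError("Llama 4 Scout accepts at most 5 images per request")
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    return {"role": "user", "content": content}

def ask_about_images(prompt: str, image_urls: list[str]) -> str:
    from groq import Groq  # requires GROQ_API_KEY in the environment
    client = Groq()
    completion = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[build_image_message(prompt, image_urls)],
    )
    return completion.choices[0].message.content
```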

Best Practices

  • Use system prompts to improve steerability and reduce false refusals. The model is designed to be highly steerable with appropriate system prompts.
  • Consider implementing system-level protections like Llama Guard for input filtering and response validation.
  • For multimodal applications, note that this model supports up to 5 image inputs per request.
  • Deploy with appropriate safeguards when working in specialized domains or with critical content.
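Two of these practices, system prompts for steerability and structured output via JSON Mode, combine naturally in one request. A minimal sketch, assuming the `response_format={"type": "json_object"}` parameter and a `GROQ_API_KEY` in the environment (the helper functions and prompt text are illustrative):

```python
import json

SYSTEM_PROMPT = (
    "You are a support-ticket classifier. Respond only with a JSON object "
    "containing the keys 'category' and 'urgency'."
)

def build_json_mode_request(user_text: str) -> dict:
    """Assemble kwargs for a JSON Mode chat completion with a steering
    system prompt."""
    return {
        "model": "meta-llama/llama-4-scout-17b-16e-instruct",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
        "response_format": {"type": "json_object"},
    }

def classify_ticket(user_text: str) -> dict:
    from groq import Groq  # requires GROQ_API_KEY in the environment
    client = Groq()
    completion = client.chat.completions.create(**build_json_mode_request(user_text))
    return json.loads(completion.choices[0].message.content)
```

With JSON Mode enabled, the response body is guaranteed to parse as JSON, but the system prompt is still what constrains the keys the model emits.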

Quick Start

Experience the capabilities of meta-llama/llama-4-scout-17b-16e-instruct on Groq:

shell
pip install groq
Python
from groq import Groq
client = Groq()
completion = client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {
            "role": "user",
            "content": "Explain why fast inference is critical for reasoning models"
        }
    ]
)
print(completion.choices[0].message.content)
