Introducing the Official Llama API

Straight from Meta. Accelerated by Groq.

The fastest way to run the world’s most trusted openly available models with no tradeoffs.
Served directly on the most efficient inference chip.
Start building today.
Not a wrapper. Not a copy. It's the real thing, served directly from Meta and accelerated by Groq's purpose-built inference hardware.
Llama 4 and more, available instantly with zero setup.
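To give a sense of what "zero setup" could look like, here is a minimal sketch of a chat-completions request. The base URL, model identifier, and environment-variable name below are illustrative assumptions, not official values; check the Llama API documentation for the real endpoint and model names.

```python
# Hypothetical sketch of a Llama API chat-completions call.
# BASE_URL, MODEL, and LLAMA_API_KEY are assumptions for illustration.
import json
import os

BASE_URL = "https://api.llama.com/v1"  # assumed endpoint
MODEL = "Llama-4-Maverick"             # assumed model identifier


def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request("Say hello in one sentence.")
print(json.dumps(payload, indent=2))

# To actually send the request, you would POST this payload to
# f"{BASE_URL}/chat/completions" with an Authorization header built
# from os.environ["LLAMA_API_KEY"] (name assumed here).
```

The payload follows the widely used chat-completions shape (a `model` string plus a list of role/content `messages`), which many inference providers accept; the actual schema may differ.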