Supported Models

GroqCloud currently supports the following models:


Production Models

Note: Production models are intended for use in your production environments. They meet or exceed our high standards for speed and quality.

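| MODEL ID | DEVELOPER | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE | MODEL CARD LINK |
|---|---|---|---|---|---|
| distil-whisper-large-v3-en | HuggingFace | - | - | 25 MB | Card |
| gemma2-9b-it | Google | 8,192 | - | - | Card |
| llama-3.3-70b-versatile | Meta | 128K | 32,768 | - | Card |
| llama-3.1-8b-instant | Meta | 128K | 8,192 | - | Card |
| llama-guard-3-8b | Meta | 8,192 | - | - | Card |
| llama3-70b-8192 | Meta | 8,192 | - | - | Card |
| llama3-8b-8192 | Meta | 8,192 | - | - | Card |
| mixtral-8x7b-32768 | Mistral | 32,768 | - | - | Card |
| whisper-large-v3 | OpenAI | - | - | 25 MB | Card |
| whisper-large-v3-turbo | OpenAI | - | - | 25 MB | Card |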

Preview Models

Note: Preview models are intended for evaluation purposes only and should not be used in production environments as they may be discontinued at short notice.

| MODEL ID | DEVELOPER | CONTEXT WINDOW (TOKENS) | MAX COMPLETION TOKENS | MAX FILE SIZE | MODEL CARD LINK |
|---|---|---|---|---|---|
| qwen-2.5-32b | Alibaba Cloud | 128K | 8,192 | - | Card |
| deepseek-r1-distill-qwen-32b | DeepSeek | 128K | 16,384 | - | Card |
| deepseek-r1-distill-llama-70b-specdec | DeepSeek | 128K | 16,384 | - | Card |
| deepseek-r1-distill-llama-70b | DeepSeek | 128K | - | - | Card |
| llama-3.3-70b-specdec | Meta | 8,192 | - | - | Card |
| llama-3.2-1b-preview | Meta | 128K | 8,192 | - | Card |
| llama-3.2-3b-preview | Meta | 128K | 8,192 | - | Card |
| llama-3.2-11b-vision-preview | Meta | 128K | 8,192 | - | Card |
| llama-3.2-90b-vision-preview | Meta | 128K | 8,192 | - | Card |
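
Because preview models may be discontinued at short notice, an application can guard against selecting one by accident. The following is a minimal sketch and not part of the Groq API: the names `PRODUCTION_MODELS`, `PREVIEW_MODELS`, `model_tier`, and `require_production` are hypothetical, and the ID sets are copied by hand from the two tables above, so they will go stale as the lists change.

```python
# Hypothetical helper: classify a model ID using the tables in this page.
# The sets below are a hand-copied snapshot, not fetched from the API.

PRODUCTION_MODELS = frozenset({
    "distil-whisper-large-v3-en",
    "gemma2-9b-it",
    "llama-3.3-70b-versatile",
    "llama-3.1-8b-instant",
    "llama-guard-3-8b",
    "llama3-70b-8192",
    "llama3-8b-8192",
    "mixtral-8x7b-32768",
    "whisper-large-v3",
    "whisper-large-v3-turbo",
})

PREVIEW_MODELS = frozenset({
    "qwen-2.5-32b",
    "deepseek-r1-distill-qwen-32b",
    "deepseek-r1-distill-llama-70b-specdec",
    "deepseek-r1-distill-llama-70b",
    "llama-3.3-70b-specdec",
    "llama-3.2-1b-preview",
    "llama-3.2-3b-preview",
    "llama-3.2-11b-vision-preview",
    "llama-3.2-90b-vision-preview",
})


def model_tier(model_id: str) -> str:
    """Return "production", "preview", or "unknown" for a model ID."""
    if model_id in PRODUCTION_MODELS:
        return "production"
    if model_id in PREVIEW_MODELS:
        return "preview"
    return "unknown"


def require_production(model_id: str) -> str:
    """Return the model ID unchanged, or raise if it is not a production model."""
    tier = model_tier(model_id)
    if tier != "production":
        raise ValueError(f"{model_id!r} is a {tier} model, not a production model")
    return model_id
```

Calling `require_production("qwen-2.5-32b")` raises `ValueError`, while `require_production("llama-3.1-8b-instant")` returns the ID unchanged. A hard-coded snapshot keeps the check offline and predictable; the trade-off is that it must be updated whenever the tables change.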

Deprecated models are models that are no longer supported or are scheduled to lose support. Each deprecated model is listed with a suggested alternative. See the deprecated models page for the full list.


Hosted models are directly accessible through the GroqCloud Models API endpoint using the model IDs mentioned above. You can use the https://api.groq.com/openai/v1/models endpoint to return a JSON list of all active models:

import os

import requests

# Read the API key from the environment rather than hard-coding it.
api_key = os.environ.get("GROQ_API_KEY")
url = "https://api.groq.com/openai/v1/models"

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}

# Request the list of active models.
response = requests.get(url, headers=headers)

# Print the parsed JSON body.
print(response.json())
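
Printing the whole JSON body is noisy when only the model IDs are needed. The sketch below extracts them from a parsed response. It assumes the payload follows the OpenAI-style list shape, an object with a `data` array whose entries each carry an `id` field; that shape is an assumption here, not something stated on this page. The helper name `extract_model_ids` and the sample payload are hypothetical.

```python
from typing import Any


def extract_model_ids(payload: dict[str, Any]) -> list[str]:
    """Return the sorted model IDs found in a models-list payload.

    Assumes an OpenAI-style shape: {"object": "list", "data": [{"id": ...}, ...]}.
    Entries without an "id" field are skipped rather than raising.
    """
    entries = payload.get("data", [])
    return sorted(
        entry["id"]
        for entry in entries
        if isinstance(entry, dict) and "id" in entry
    )


# Hand-written sample payload for illustration; real responses carry more fields.
sample_payload = {
    "object": "list",
    "data": [
        {"id": "llama-3.3-70b-versatile", "object": "model"},
        {"id": "gemma2-9b-it", "object": "model"},
    ],
}

print(extract_model_ids(sample_payload))
# prints ['gemma2-9b-it', 'llama-3.3-70b-versatile']
```

With a live response, the same helper would be called as `extract_model_ids(response.json())`.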