Groq provides inference services for pre-made Low-Rank Adaptation (LoRA) adapters. LoRA is a Parameter-efficient Fine-tuning (PEFT) technique that customizes model behavior without altering base model weights. Upload your existing LoRA adapters to run specialized inference while maintaining the performance and efficiency of Groq's infrastructure.
This service is not currently available for use with regional or sovereign endpoints.
Note: Groq offers LoRA inference services only. We do not provide LoRA fine-tuning; you must create your LoRA adapters externally using other providers or tools.
With LoRA inference on Groq, you can upload your existing adapters and run specialized, customized inference on supported base models.
LoRA is available exclusively to enterprise-tier customers. To get started with LoRA on GroqCloud, please reach out to our enterprise team.
Compared to using just the base model or a traditionally fine-tuned one, LoRA adapters offer several key advantages:
Lower Total Cost of Ownership
LoRA significantly reduces fine-tuning costs by training small, low-rank adapters instead of updating the full base model. This efficiency makes it cost-effective to customize models at scale.
Rapid Deployment with High Performance
Smaller, task-specific LoRA adapters can match or exceed the performance of fully fine-tuned models while delivering faster inference. This translates to quicker experimentation, iteration, and real-world impact.
Non-Invasive Model Adaptation
Since LoRA adapters don't require changes to the base model, you avoid the complexity and liability of managing and validating a fully retrained system. Adapters are modular, independently versioned, and easily replaceable as your data evolves—simplifying governance and compliance.
Full Control, Less Risk
Customers keep control of how and when updates happen—no retraining, no surprise behavior changes. Just lightweight, swappable adapters that fit into existing systems with minimal disruption. And with self-service APIs, updating adapters is quick, intuitive, and doesn't require heavy engineering lift.
Groq supports LoRAs through two deployment options:
- Pay-per-token usage with no dedicated hardware requirements. Ideal for customers with a small number of LoRA adapters across different tasks such as customer support, document summarization, and translation.
- Deployment on dedicated Groq hardware instances purchased by the customer, providing optimized performance for multiple LoRA adapters and consistent inference speeds. Best suited for high-traffic scenarios or customers serving personalized adapters to many end users.
LoRA support is currently available for the following models:
| Model ID | Model | Base Model |
|---|---|---|
| `llama-3.1-8b-instant` | Llama 3.1 8B | `meta-llama/Llama-3.1-8B-Instruct` |
Please reach out to our enterprise support team for additional model support.
Please reach out to our enterprise support team for pricing.
To begin using LoRA on GroqCloud, reach out to our enterprise team for access and prepare your LoRA adapters with an external fine-tuning provider or tool.
Important: Your LoRA adapters must be trained against the exact base model versions that Groq supports in order to work properly.
Once you have access to LoRA, you can upload and deploy your adapters using Groq's Fine-Tuning API. This process involves two API calls: one to upload your LoRA adapter files and another to register them as a fine-tuned model. When you upload your LoRA adapters, Groq will store and process your files to provide this service. LoRA adapters are your Customer Data and will only be available for your organization's use.
Note: Cold start times are proportional to the LoRA rank. Higher ranks (32, 64) will take longer to load initially but have no impact on inference performance once loaded.
Create a ZIP file containing exactly these two files:

- `adapter_model.safetensors` - a safetensors file containing your LoRA weights in float16 format
- `adapter_config.json` - a JSON configuration file with the following required fields:
  - `"lora_alpha"`: (integer or float) the LoRA alpha parameter
  - `"r"`: (integer) the rank of your LoRA adapter (must be 8, 16, 32, or 64)
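For reference, a minimal `adapter_config.json` might look like the sketch below. The values are illustrative only; use the alpha and rank your adapter was actually trained with, and keep any additional fields your training framework (for example Hugging Face PEFT) wrote into the file.

```json
{
  "lora_alpha": 16,
  "r": 8
}
```

Both files can then be packaged into the archive, for example with `zip my-adapter.zip adapter_model.safetensors adapter_config.json`.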
Upload your ZIP file to the `/files` endpoint with `purpose="fine_tuning"`:
```bash
curl --location 'https://api.groq.com/openai/v1/files' \
  --header "Authorization: Bearer ${TOKEN}" \
  --form "file=@<file-name>.zip" \
  --form 'purpose="fine_tuning"'
```
This returns a file ID that you'll use in the next step:
```json
{
  "id": "file_01jxnqc8hqebx343rnkyxw47e",
  "object": "file",
  "bytes": 155220077,
  "created_at": 1749854594,
  "filename": "<file-name>.zip",
  "purpose": "fine_tuning"
}
```
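If you are scripting the workflow, the file ID can be captured directly from the upload response. A minimal sketch, assuming `jq` is installed, `TOKEN` holds your API key, and `my-adapter.zip` is a placeholder filename:

```bash
# Upload the adapter ZIP and keep only the returned file ID
FILE_ID=$(curl --silent --location 'https://api.groq.com/openai/v1/files' \
  --header "Authorization: Bearer ${TOKEN}" \
  --form "file=@my-adapter.zip" \
  --form 'purpose="fine_tuning"' | jq -r '.id')

echo "Uploaded file: ${FILE_ID}"
```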
Use the file ID to register your LoRA adapter as a fine-tuned model:
```bash
curl --location 'https://api.groq.com/v1/fine_tunings' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${TOKEN}" \
  --data '{
    "input_file_id": "<file-id>",
    "name": "my-lora-adapter",
    "type": "lora",
    "base_model": "llama-3.1-8b-instant"
  }'
```
This returns your unique model ID:
```json
{
  "id": "ft_01jxx7abvdf6pafdthfbfmb9gy",
  "object": "fine_tuning",
  "data": {
    "name": "my-lora-adapter",
    "base_model": "llama-3.1-8b-instant",
    "type": "lora",
    "fine_tuned_model": "ft:llama-3.1-8b-instant:org_01hqed9y3fexcrngzqm9qh6ya9/my-lora-adapter-ef36419a0010"
  }
}
```
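As with the upload step, the model ID can be pulled out of this response for use in later requests. A minimal sketch, assuming the registration response was saved to a file named `response.json` (a hypothetical name):

```bash
# Extract the fine-tuned model ID from the saved registration response
MODEL=$(jq -r '.data.fine_tuned_model' response.json)
echo "Fine-tuned model: ${MODEL}"
```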
Use the returned `fine_tuned_model` ID in your inference requests just like any other model:
```bash
curl --location 'https://api.groq.com/openai/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer ${TOKEN}" \
  --data '{
    "model": "ft:llama-3.1-8b-instant:org_01hqed9y3fexcrngzqm9qh6ya9/my-lora-adapter-ef36419a0010",
    "messages": [
      {
        "role": "user",
        "content": "Your prompt here"
      }
    ]
  }'
```
No. Groq provides LoRA inference services only. Customers must create their LoRA adapters externally using fine-tuning providers or tools (e.g., Hugging Face PEFT, Unsloth, or custom solutions) and then upload their pre-made adapters to Groq for inference. You must fine-tune the exact base model versions that Groq supports.
Not at this time. LoRA support is currently exclusive to enterprise-tier customers. Stay tuned for updates.
Stay tuned for further updates on recommended fine-tuning providers.
Contact our enterprise team to discuss your LoRA requirements and get started.
Your uploaded LoRA adapter files are stored and accessible solely to your organization for as long as you use the LoRA service. This service is not currently available for use with regional or sovereign endpoints.