Orpheus Text to Speech

Generate expressive, natural-sounding speech with vocal direction controls for dynamic audio output.

Overview

Orpheus text-to-speech models by Canopy Labs provide fast, high-quality audio generation with unique expressive capabilities. Both models offer multiple voices and low-latency inference, with the English model supporting vocal direction controls for expressive performances.

Supported Models

Groq hosts two specialized Orpheus models for different language needs:

Model ID	Description	Language	Vocal Directions
`canopylabs/orpheus-v1-english`	Expressive English TTS with direction support	English	✅ Supported
`canopylabs/orpheus-arabic-saudi`	Authentic Saudi dialect synthesis	Arabic (Saudi)	❌ Not Supported

Pricing

Model ID	Price
`canopylabs/orpheus-v1-english`	$22 / 1 million characters
`canopylabs/orpheus-arabic-saudi`	$40 / 1 million characters

API Endpoint

Endpoint	Usage	API Endpoint
Speech	Convert text to audio	`https://api.groq.com/openai/v1/audio/speech`

Quick Start

The speech endpoint accepts these parameters:

Parameter	Type	Required	Description
`model`	string	Yes	Model ID: `canopylabs/orpheus-v1-english` or `canopylabs/orpheus-arabic-saudi`
`input`	string	Yes	Text to convert to speech (max 200 characters). Use `[directions]` for vocal control.
`voice`	string	Yes	Voice persona ID to use (see Available Voices)
`response_format`	string	Optional	Audio format. Defaults to `"wav"`. The only supported format is `"wav"`.

Basic Usage

English Model

# Install the Groq SDK:
# pip install groq

# English Model Example:
import os
from groq import Groq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

speech_file_path = "orpheus-english.wav" 
model = "canopylabs/orpheus-v1-english"
voice = "troy"
text = "Welcome to Orpheus text-to-speech. [cheerful] This is an example of high-quality English audio generation with vocal directions support."
response_format = "wav"

response = client.audio.speech.create(
    model=model,
    voice=voice,
    input=text,
    response_format=response_format
)

response.write_to_file(speech_file_path)

// Install the Groq SDK:
// npm install --save groq-sdk

// English Model Example:
import fs from "fs";
import Groq from 'groq-sdk';

const groq = new Groq({
  apiKey: process.env.GROQ_API_KEY
});

const speechFilePath = "orpheus-english.wav";
const model = "canopylabs/orpheus-v1-english";
const voice = "hannah";
const text = "Welcome to Orpheus text-to-speech. [cheerful] This is an example of high-quality English audio generation with vocal directions support.";
const responseFormat = "wav";

async function main() {
  const response = await groq.audio.speech.create({
    model: model,
    voice: voice,
    input: text,
    response_format: responseFormat
  });
  
  const buffer = Buffer.from(await response.arrayBuffer());
  await fs.promises.writeFile(speechFilePath, buffer);
  
  console.log(`Orpheus English speech generated: ${speechFilePath}`);
}

main().catch((error) => {
  console.error('Error generating speech:', error);
});

curl https://api.groq.com/openai/v1/audio/speech \
  -X POST \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "canopylabs/orpheus-v1-english",
    "input": "Welcome to Orpheus text-to-speech. [cheerful] This is an example of high-quality English audio generation with vocal directions support.",
    "voice": "austin",
    "response_format": "wav"
  }' \
  --output orpheus-english.wav

Arabic Saudi Dialect Model

import os
from groq import Groq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

speech_file_path = "orpheus-arabic.wav" 
model = "canopylabs/orpheus-arabic-saudi"
voice = "fahad"
text = "مرحبا بكم في نموذج أورفيوس للتحويل من النص إلى الكلام. هذا مثال على جودة الصوت العربية السعودية الطبيعية."
response_format = "wav"

response = client.audio.speech.create(
    model=model,
    voice=voice,
    input=text,
    response_format=response_format
)

response.write_to_file(speech_file_path)

import fs from "fs";
import Groq from 'groq-sdk';

const groq = new Groq({
  apiKey: process.env.GROQ_API_KEY
});

const speechFilePath = "orpheus-arabic.wav";
const model = "canopylabs/orpheus-arabic-saudi";
const voice = "lulwa";
const text = "مرحبا بكم في نموذج أورفيوس للتحويل من النص إلى الكلام. هذا مثال على جودة الصوت العربية السعودية الطبيعية.";
const responseFormat = "wav";

async function main() {
  const response = await groq.audio.speech.create({
    model: model,
    voice: voice,
    input: text,
    response_format: responseFormat
  });
  
  const buffer = Buffer.from(await response.arrayBuffer());
  await fs.promises.writeFile(speechFilePath, buffer);
  
  console.log(`Orpheus Arabic speech generated: ${speechFilePath}`);
}

main().catch((error) => {
  console.error('Error generating Arabic speech:', error);
});

curl https://api.groq.com/openai/v1/audio/speech \
  -X POST \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "canopylabs/orpheus-arabic-saudi",
    "input": "مرحبا بكم في نموذج أورفيوس للتحويل من النص إلى الكلام. هذا مثال على جودة الصوت العربية السعودية الطبيعية.",
    "voice": "noura",
    "response_format": "wav"
  }' \
  --output orpheus-arabic.wav

Vocal Directions

Orpheus V1 English supports vocal directions using bracketed text like [cheerful] or [whisper] to control how the model speaks. This powerful feature enables everything from subtle conversational nuances to highly expressive character performances.

How Directions Work

More directions = more expressive, acted performance
Fewer/no directions = natural, casual conversational cadence
Use 1-2 word directions (typically adjectives or adverbs)

Common use cases:

Customer support: Use no directions for natural, friendly conversations
Game characters: Add expressive directions for dynamic, performative speech
Professional narration: Use [professionally] or [authoritatively] for business content
Storytelling: Combine multiple directions to create engaging narrative performances

Direction Examples

Conversational tones:

[cheerful], [friendly], [casual], [warm]

Professional styles:

[professionally], [authoritatively], [formally], [confidently]

Expressive performance:

[whisper], [excited], [dramatic], [deadpan], [sarcastic]

Vocal qualities:

[gravelly whisper], [rapid babbling], [singsong], [breathy]

Note: There isn't an official or exhaustive list of directions; the model recognizes many natural descriptors and ignores vague or unfamiliar ones.

Using Vocal Directions

Natural Conversation (No Directions)

For customer support, AI assistants, or natural dialogue, omit directions entirely. The model defaults to conversational, human-like cadence.

Example (Troy): "I see you ordered the Bose QuietComfort Ultra earbuds, order number 7829-XK-441, tracking ID H3J7L9C2F5V8, and yeah it looks like it's been stuck in transit since, uhh, Thursday the 8th."
Example (Autumn): "Okay so I'm looking at your account here and it shows you've got the Dell XPS 15 9530, is that right? Let me just pull up the warranty info real quick... yep that all looks good!"

Tip: Pure numbers like 203 are normalized to "two hundred and three." Use hyphens (2-0-3) for letter-by-letter reading.

Available Voices

English Voices

The English model includes six professionally-trained voice personas. Each voice has different strengths for expressive direction performance.

Voice Name	Voice ID	Gender
Autumn	`autumn`	Female
Diana	`diana`	Female
Hannah	`hannah`	Female
Austin	`austin`	Male
Daniel	`daniel`	Male
Troy	`troy`	Male

Note: Some voices perform better with expressive directions than others. Experiment to find the voice that works best for your use case.

Autumn

0:000:00

Arabic Saudi Dialect Voices

The Arabic model offers four distinct Saudi dialect voices with authentic pronunciation and regional nuances:

Voice Name	Voice ID	Gender
Fahad	`fahad`	Male
Sultan	`sultan`	Male
Lulwa	`lulwa`	Female
Noura	`noura`	Female

Fahad

0:000:00

Use Cases

Customer Support & AI Assistants

Use no directions for natural, conversational interactions that feel human and approachable.

"I'm looking at your account here and everything seems to be in order. Let me just check that shipping status for you real quick."

Best for: Customer service bots, virtual assistants, FAQ systems

Best Practices

Punctuation control: Experiment with removing punctuation to give the model more freedom in choosing intonation patterns, especially for expressive performances.

Voice selection: Test different voices for your use case; some handle expressive directions better than others, particularly for complex emotional ranges.

Arabic considerations: Use proper Arabic script with diacritical marks. Test pronunciation with sample content before production deployment.

Limitations

Input length: The input text length is limited to 200 characters.

Batch processing: The batch processing API is not supported at this time for Orpheus models.

Getting Started

Core Features

Tools & Integrations

Compound (Agentic AI)

Guides

Service Tiers

Advanced

Production Readiness

Account and Console

Developer Resources

Legal