Groq

Orpheus Text to Speech

Generate expressive, natural-sounding speech with vocal direction controls for dynamic audio output.

Overview

Orpheus text-to-speech models by Canopy Labs provide fast, high-quality audio generation with unique expressive capabilities. Both models offer multiple voices and low-latency inference, with the English model supporting vocal direction controls for expressive performances.

Supported Models

Groq hosts two specialized Orpheus models for different language needs:

| Model ID | Description | Language | Vocal Directions |
| --- | --- | --- | --- |
| canopylabs/orpheus-v1-english | Expressive English TTS with direction support | English | ✅ Supported |
| canopylabs/orpheus-arabic-saudi | Authentic Saudi dialect synthesis | Arabic (Saudi) | ❌ Not Supported |

Pricing

| Model ID | Price |
| --- | --- |
| canopylabs/orpheus-v1-english | $22 / 1 million characters |
| canopylabs/orpheus-arabic-saudi | $40 / 1 million characters |
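
For budgeting, the per-character prices translate into a simple estimate. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper, and it assumes every character of the input (including spaces and bracketed directions) is billed, which this page does not state.

```python
# Prices in USD per 1 million input characters, copied from the pricing table.
PRICE_PER_MILLION_CHARS = {
    "canopylabs/orpheus-v1-english": 22.00,
    "canopylabs/orpheus-arabic-saudi": 40.00,
}


def estimate_cost_usd(model: str, text: str) -> float:
    """Estimate the cost of synthesizing `text` with `model` (hypothetical helper).

    Assumption: all characters count toward billing, including whitespace
    and bracketed vocal directions.
    """
    return len(text) * PRICE_PER_MILLION_CHARS[model] / 1_000_000
```

Under that assumption, a full 200-character request to the English model costs about $0.0044.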

API Endpoint

| Endpoint | Usage | API Endpoint |
| --- | --- | --- |
| Speech | Convert text to audio | https://api.groq.com/openai/v1/audio/speech |

Quick Start

The speech endpoint accepts these parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID: canopylabs/orpheus-v1-english or canopylabs/orpheus-arabic-saudi |
| input | string | Yes | Text to convert to speech (max 200 characters). Use [directions] for vocal control. |
| voice | string | Yes | Voice persona ID to use (see Available Voices) |
| response_format | string | Optional | Audio format. Defaults to "wav". The only supported format is "wav". |
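
The constraints in this table can be checked locally before a request is sent. The following is a minimal sketch, not part of the Groq SDK: `build_speech_params` and its constants are hypothetical names, and it assumes the 200-character limit counts bracketed directions as ordinary characters.

```python
MAX_INPUT_CHARS = 200  # input limit stated in the parameter table
SUPPORTED_MODELS = {
    "canopylabs/orpheus-v1-english",
    "canopylabs/orpheus-arabic-saudi",
}


def build_speech_params(model: str, voice: str, text: str,
                        response_format: str = "wav") -> dict:
    """Validate the documented constraints and return request parameters.

    Hypothetical helper. Assumption: bracketed directions count toward
    the 200-character limit like any other text.
    """
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    if not text:
        raise ValueError("input text must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"input is {len(text)} characters; the limit is {MAX_INPUT_CHARS}"
        )
    if response_format != "wav":
        raise ValueError('the only supported response_format is "wav"')
    return {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }
```

The returned dictionary can be unpacked into the SDK call, for example `client.audio.speech.create(**params)`.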

Basic Usage

English Model

# Install the Groq SDK:
# pip install groq

# English Model Example:
import os
from groq import Groq

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))

speech_file_path = "orpheus-english.wav"
model = "canopylabs/orpheus-v1-english"
voice = "troy"
text = "Welcome to Orpheus text-to-speech. [cheerful] This is an example of high-quality English audio generation with vocal directions support."
response_format = "wav"

response = client.audio.speech.create(
    model=model,
    voice=voice,
    input=text,
    response_format=response_format
)

response.write_to_file(speech_file_path)
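
The SDK call above can also be expressed as a plain HTTP request against the endpoint listed under API Endpoint. This is an untested sketch using only the Python standard library; the function names are hypothetical, and it assumes bearer-token authentication with a JSON request body, as is usual for OpenAI-compatible paths.

```python
import json
import urllib.request

SPEECH_URL = "https://api.groq.com/openai/v1/audio/speech"


def build_speech_request(api_key: str, model: str, voice: str,
                         text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the speech endpoint."""
    payload = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": "wav",
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: standard bearer-token auth with a JSON body.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(path, "wb") as f:
        f.write(audio)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.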

Vocal Directions

Orpheus V1 English supports vocal directions: bracketed text such as [cheerful] or [whisper] that controls how the model speaks. Directions cover a range from subtle conversational nuance to highly expressive character performance.

How Directions Work

  • More directions = more expressive, acted performance
  • Fewer/no directions = natural, casual conversational cadence
  • Use 1-2 word directions (typically adjectives or adverbs)

Common use cases:

  • Customer support: Use no directions for natural, friendly conversations
  • Game characters: Add expressive directions for dynamic, performative speech
  • Professional narration: Use [professionally] or [authoritatively] for business content
  • Storytelling: Combine multiple directions to create engaging narrative performances

Direction Examples

Conversational tones:

  • [cheerful], [friendly], [casual], [warm]

Professional styles:

  • [professionally], [authoritatively], [formally], [confidently]

Expressive performance:

  • [whisper], [excited], [dramatic], [deadpan], [sarcastic]

Vocal qualities:

  • [gravelly whisper], [rapid babbling], [singsong], [breathy]

Note: There isn't an official or exhaustive list of directions; the model recognizes many natural descriptors and ignores vague or unfamiliar ones.
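
When input text is assembled programmatically, directions can be attached with a small helper. The sketch below uses hypothetical names (`direct`, `strip_directions`); it enforces the 1-2 word guideline and, as an assumption based on the Basic Usage example, places the direction immediately before the text it applies to.

```python
import re


def direct(direction: str, text: str) -> str:
    """Prefix `text` with a bracketed vocal direction (hypothetical helper).

    Follows the guideline that directions are 1-2 words long.
    """
    words = direction.split()
    if not 1 <= len(words) <= 2:
        raise ValueError("directions should be one or two words")
    return f"[{' '.join(words)}] {text}"


def strip_directions(text: str) -> str:
    """Remove bracketed directions, e.g. before logging or display."""
    return re.sub(r"\[[^\[\]]*\]\s*", "", text).strip()
```

For example, `direct("gravelly whisper", "Stay close.")` produces `[gravelly whisper] Stay close.`, and `strip_directions` recovers the plain sentence.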

Using Vocal Directions

Natural Conversation (No Directions)

For customer support, AI assistants, or natural dialogue, omit directions entirely. The model defaults to a conversational, human-like cadence.

  • Example (Troy): "I see you ordered the Bose QuietComfort Ultra earbuds, order number 7829-XK-441, tracking ID H3J7L9C2F5V8, and yeah it looks like it's been stuck in transit since, uhh, Thursday the 8th."
  • Example (Autumn): "Okay so I'm looking at your account here and it shows you've got the Dell XPS 15 9530, is that right? Let me just pull up the warranty info real quick... yep that all looks good!"

Tip: Pure numbers like 203 are normalized to "two hundred and three." Use hyphens (2-0-3) for digit-by-digit reading.
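
The hyphenation can be applied automatically to identifiers such as room or order numbers. This is a minimal sketch with a hypothetical helper name; note that it rewrites every run of two or more digits, so it should only be applied to text where no number is meant to be read as a quantity.

```python
import re


def spell_out_digits(text: str) -> str:
    """Rewrite each run of 2+ digits as hyphen-separated digits.

    Hypothetical helper: "203" becomes "2-0-3" so that the digits
    are read one at a time.
    """
    return re.sub(r"\d{2,}", lambda m: "-".join(m.group()), text)
```

With this helper, "Room 203" becomes "Room 2-0-3".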

Available Voices

English Voices

The English model includes six professionally trained voice personas. Each voice has different strengths for expressive direction performance.

| Voice Name | Voice ID | Gender |
| --- | --- | --- |
| Autumn | autumn | Female |
| Diana | diana | Female |
| Hannah | hannah | Female |
| Austin | austin | Male |
| Daniel | daniel | Male |
| Troy | troy | Male |

Note: Some voices perform better with expressive directions than others. Experiment to find the voice that works best for your use case.

Arabic Saudi Dialect Voices

The Arabic model offers four distinct Saudi dialect voices with authentic pronunciation and regional nuances:

| Voice Name | Voice ID | Gender |
| --- | --- | --- |
| Fahad | fahad | Male |
| Sultan | sultan | Male |
| Lulwa | lulwa | Female |
| Noura | noura | Female |
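
The two voice tables can be captured as a lookup so that a mismatched model and voice is caught before a request is made. This is a sketch with hypothetical names; it assumes each voice ID is valid only with the model it is listed under, which the tables imply but do not state outright.

```python
# Voice IDs per model, copied from the voice tables.
VOICES_BY_MODEL = {
    "canopylabs/orpheus-v1-english": (
        "autumn", "diana", "hannah", "austin", "daniel", "troy",
    ),
    "canopylabs/orpheus-arabic-saudi": (
        "fahad", "sultan", "lulwa", "noura",
    ),
}


def check_voice(model: str, voice: str) -> str:
    """Return `voice` if it is listed for `model`, else raise ValueError.

    Assumption: a voice ID works only with the model it is listed under.
    """
    voices = VOICES_BY_MODEL.get(model)
    if voices is None:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in voices:
        raise ValueError(
            f"voice {voice!r} is not listed for {model}; "
            f"choose one of: {', '.join(voices)}"
        )
    return voice
```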

Use Cases

Customer Support & AI Assistants

Use no directions for natural, conversational interactions that feel human and approachable.

  • "I'm looking at your account here and everything seems to be in order. Let me just check that shipping status for you real quick."

Best for: Customer service bots, virtual assistants, FAQ systems

Best Practices

Punctuation control: Experiment with removing punctuation to give the model more freedom in choosing intonation patterns, especially for expressive performances.

Voice selection: Test different voices for your use case; some handle expressive directions better than others, particularly for complex emotional ranges.

Arabic considerations: Use proper Arabic script with diacritical marks. Test pronunciation with sample content before production deployment.

Limitations

Input length: The input text length is limited to 200 characters.
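
Longer passages therefore have to be split into several requests. One possible approach, sketched below with a hypothetical `split_for_tts` helper, packs whole sentences greedily into chunks of at most 200 characters and falls back to word boundaries for a single sentence that exceeds the limit.

```python
import re

MAX_INPUT_CHARS = 200  # documented input limit


def split_for_tts(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Split `text` into chunks of at most `limit` characters.

    Hypothetical helper: sentences are kept whole where possible; a
    sentence longer than `limit` is wrapped on word boundaries.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Wrap an over-long sentence on the last space within the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            cut = sentence.rfind(" ", 0, limit + 1)
            if cut <= 0:
                cut = limit  # no space found: hard cut
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request. This helper does not repeat a vocal direction in later chunks, and the word-boundary fallback may split a two-word direction, so text that relies on directions may need manual review after splitting.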

Batch processing: The batch processing API is not supported at this time for Orpheus models.
