Documentation
Speech to Text
Groq's Whisper API is capable of transcription and translation. Utilize our OpenAI-compatible endpoints to integrate high-quality audio processing directly into your applications.
API Endpoints
- Transcriptions: Convert audio to text.
https://api.groq.com/openai/v1/audio/transcriptions
- Translations: Translate audio to English text.
https://api.groq.com/openai/v1/audio/translations
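As a sketch of what a raw call to the transcriptions endpoint looks like without the SDK, the helper below assembles the URL, auth header, and form fields. The Bearer-token auth and the `model` form field mirror the SDK examples later on this page; the helper name itself is ours, not part of the API.

```python
TRANSCRIPTIONS_URL = "https://api.groq.com/openai/v1/audio/transcriptions"

def build_request(api_key: str, model: str = "whisper-large-v3") -> dict:
    """Assemble the endpoint URL, auth header, and form fields for a
    transcription request (the audio file is sent as a multipart upload)."""
    return {
        "url": TRANSCRIPTIONS_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"model": model},
    }
```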
Supported Models
- Model ID: whisper-large-v3
This model provides state-of-the-art performance for both transcription and translation tasks.
Audio file limitations
- File uploads are limited to 25 MB
- The following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm
- If a file contains multiple audio tracks, for example a video with dubs, only the first track will be transcribed
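The limits above can be checked client-side before uploading. A minimal sketch: the 25 MB cap and the extension list come from this page, while the helper name is ours.

```python
import os

MAX_BYTES = 25 * 1024 * 1024  # 25 MB upload limit
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}

def is_uploadable(path: str) -> bool:
    """Return True if the file is a supported type and within the size limit."""
    ext = os.path.splitext(path)[1].lower()
    return ext in SUPPORTED_EXTENSIONS and os.path.getsize(path) <= MAX_BYTES
```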
Whisper will downsample audio to 16,000 Hz mono before transcribing. This preprocessing can be performed client-side to reduce file size and allow longer files to be uploaded to Groq. The following ffmpeg command can be used to reduce file size:
ffmpeg \
  -i <your file> \
  -ar 16000 \
  -ac 1 \
  -map 0:a:0 \
  <output file name>
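The same preprocessing can be scripted. The sketch below builds and runs the ffmpeg invocation from Python; it assumes ffmpeg is on your PATH, and `0:a:0` is a standard ffmpeg stream specifier for the first audio track.

```python
import subprocess

def downsample_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg invocation that matches the command above."""
    return [
        "ffmpeg",
        "-i", src,
        "-ar", "16000",   # resample to 16,000 Hz
        "-ac", "1",       # mix down to mono
        "-map", "0:a:0",  # keep only the first audio stream
        dst,
    ]

def downsample(src: str, dst: str) -> None:
    """Run ffmpeg, raising CalledProcessError if conversion fails."""
    subprocess.run(downsample_command(src, dst), check=True)
```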
Transcription Usage
Transcribe spoken words in audio or video files.
Optional Parameters:
- prompt: Provide context or specify how to spell unfamiliar words
- response_format: Define the output response format
  - Default is "json"
  - Set to "verbose_json" to receive timestamps for audio segments
  - Set to "text" to return a text response
  - The formats vtt and srt are not supported
- temperature: Specify a value between 0 and 1 to control the transcription output
- language: Specify the language for transcription (optional; Whisper will auto-detect if not specified)
  - Use ISO 639-1 language codes (e.g., "en" for English, "fr" for French)
  - Specifying a language may improve transcription accuracy and speed
- timestamp_granularities[] is not supported
Code Overview
pip install groq

import os
from groq import Groq

client = Groq()
filename = os.path.dirname(__file__) + "/sample_audio.m4a"

with open(filename, "rb") as file:
    transcription = client.audio.transcriptions.create(
        file=(filename, file.read()),
        model="whisper-large-v3",
        prompt="Specify context or spelling",  # Optional
        response_format="json",  # Optional
        language="en",  # Optional
        temperature=0.0  # Optional
    )
    print(transcription.text)
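With response_format="verbose_json", the response includes timed audio segments. The sketch below renders such segments as timestamped lines; the segment fields used (start, end, text) follow Whisper's verbose output shape, but verify the exact attribute names against a real response.

```python
def format_segments(segments: list[dict]) -> str:
    """Render verbose_json-style segments as "[start -> end] text" lines."""
    lines = []
    for seg in segments:
        start, end = seg["start"], seg["end"]
        lines.append(f"[{start:7.2f} -> {end:7.2f}] {seg['text'].strip()}")
    return "\n".join(lines)
```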
Translation Usage
Translate spoken words in audio or video files to English.
Optional Parameters:
- prompt: Provide context or specify how to spell unfamiliar words
- response_format: Define the output response format
  - Default is "json"
  - Set to "verbose_json" to receive timestamps for audio segments
  - Set to "text" to return a text response
  - The formats vtt and srt are not supported
- temperature: Specify a value between 0 and 1 to control the translation output
Code Overview
pip install groq

import os
from groq import Groq

client = Groq()
filename = os.path.dirname(__file__) + "/sample_audio.m4a"

with open(filename, "rb") as file:
    translation = client.audio.translations.create(
        file=(filename, file.read()),
        model="whisper-large-v3",
        prompt="Specify context or spelling",  # Optional
        response_format="json",  # Optional
        temperature=0.0  # Optional
    )
    print(translation.text)