Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here.
- ElevenLabs API Key - Visit ElevenLabs and create an account. Navigate to your profile settings to generate an API key.
- Python 3.10 or higher - Required for running the integration code.
Configure ElevenLabs Integration
Install required dependencies
Install the necessary Python packages for both Cerebras Inference and ElevenLabs. The openai package provides the client for Cerebras Inference (OpenAI-compatible), and elevenlabs is the official ElevenLabs SDK for voice synthesis.
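A minimal install command; python-dotenv is an assumption here, added for loading the .env file used in a later step:

```bash
# openai: OpenAI-compatible client used to call Cerebras Inference
# elevenlabs: official ElevenLabs SDK for voice synthesis
# python-dotenv: optional, loads API keys from a .env file
pip install openai elevenlabs python-dotenv
```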
Audio playback requirement: To play audio files, you may need to install FFmpeg:
- macOS: brew install ffmpeg
- Windows: Download from ffmpeg.org or use choco install ffmpeg
- Linux: sudo apt install ffmpeg (Ubuntu/Debian) or sudo yum install ffmpeg (CentOS/RHEL)
Configure environment variables
Create a .env file in your project directory to securely store your API keys. Alternatively, you can set these as environment variables in your shell:
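A minimal sketch of both approaches; the variable names CEREBRAS_API_KEY and ELEVENLABS_API_KEY are assumptions, so match them to whatever your code reads:

```bash
# .env — keep this file out of version control
CEREBRAS_API_KEY=your-cerebras-api-key
ELEVENLABS_API_KEY=your-elevenlabs-api-key
```

```bash
# Or export the keys in your shell (bash/zsh)
export CEREBRAS_API_KEY="your-cerebras-api-key"
export ELEVENLABS_API_KEY="your-elevenlabs-api-key"
```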
Initialize the Cerebras client
Set up the Cerebras client using the OpenAI-compatible interface. The integration header helps us track and optimize this integration:
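A sketch of the client setup, assuming the environment variables above and the Cerebras OpenAI-compatible endpoint at https://api.cerebras.ai/v1; the integration header name below is illustrative rather than official:

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # pull CEREBRAS_API_KEY / ELEVENLABS_API_KEY from .env

# Cerebras Inference is OpenAI-compatible, so the standard OpenAI client works
cerebras_client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
    # Illustrative integration header; substitute the header from the official docs
    default_headers={"X-Integration": "elevenlabs"},
)
```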
Create a basic text-to-speech pipeline
Now let’s create a complete pipeline that generates text with Cerebras and converts it to speech with ElevenLabs. This example demonstrates the power of combining Cerebras’s fast inference with ElevenLabs’s natural voice synthesis:
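A minimal sketch of such a pipeline, reusing the cerebras_client defined above; the model name, voice ID (Rachel, listed under Voice Selection below), and ElevenLabs parameters are reasonable defaults rather than required values:

```python
import os

from elevenlabs import play
from elevenlabs.client import ElevenLabs

eleven_client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

def speak(prompt: str) -> None:
    # 1. Generate the response text with Cerebras (OpenAI-compatible chat API)
    completion = cerebras_client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": prompt}],
        max_completion_tokens=200,  # short responses keep the audio turnaround fast
    )
    text = completion.choices[0].message.content

    # 2. Convert the text to speech with ElevenLabs
    audio = eleven_client.text_to_speech.convert(
        voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
        text=text,
        model_id="eleven_multilingual_v2",
        output_format="mp3_44100_128",
    )

    # 3. Play the audio locally (requires FFmpeg, see the note above)
    play(audio)

speak("Give me a one-sentence fun fact about the ocean.")
```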
Build a conversational voice agent
For a more advanced use case, here’s how to build a multi-turn conversational agent that maintains context across multiple interactions and provides natural, spoken responses using Cerebras’s fast inference and ElevenLabs’s voice synthesis.
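One possible sketch, reusing the clients from the previous steps; the system prompt, model, and voice are illustrative choices:

```python
def run_voice_agent() -> None:
    # The system prompt keeps spoken replies short and conversational
    messages = [
        {"role": "system", "content": "You are a helpful voice assistant. Keep answers brief and conversational."}
    ]

    while True:
        user_input = input("You (type 'quit' to exit): ").strip()
        if user_input.lower() == "quit":
            break

        # Append every turn so the model sees the full conversation history
        messages.append({"role": "user", "content": user_input})

        completion = cerebras_client.chat.completions.create(
            model="llama-3.3-70b",
            messages=messages,
            max_completion_tokens=150,
        )
        reply = completion.choices[0].message.content
        messages.append({"role": "assistant", "content": reply})

        print(f"Assistant: {reply}")

        # Speak the reply with ElevenLabs
        audio = eleven_client.text_to_speech.convert(
            voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
            text=reply,
            model_id="eleven_multilingual_v2",
        )
        play(audio)

run_voice_agent()
```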
Voice Selection
ElevenLabs offers a variety of pre-made voices. Here are some popular options (see the snippet after this list for selecting one by ID):
- Rachel (21m00Tcm4TlvDq8ikWAM) - Calm, professional female voice
- Adam (pNInz6obpgDQGcFmaJgB) - Deep, authoritative male voice
- Bella (EXAVITQu4vr4xnSDxMaL) - Soft, friendly female voice
- Antoni (ErXwobaYiN019PkySvjV) - Well-rounded male voice
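To select one of these voices, pass its ID as the voice_id argument, as in the earlier sketches:

```python
# Example: use Adam instead of Rachel for a deeper voice
audio = eleven_client.text_to_speech.convert(
    voice_id="pNInz6obpgDQGcFmaJgB",  # Adam
    text="Hello from the Cerebras and ElevenLabs integration.",
    model_id="eleven_multilingual_v2",
)
```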
Use Cases
The Cerebras + ElevenLabs integration is perfect for:
- Voice Assistants - Build responsive AI assistants with natural conversation flow
- Content Creation - Generate and narrate articles, stories, or educational content
- Customer Service - Create automated voice support systems with human-like responses
- Accessibility Tools - Convert text content to speech for visually impaired users
- Interactive Experiences - Build voice-enabled games, tours, or educational apps
- Podcast Generation - Automatically create podcast episodes from text content
FAQ
Audio playback not working
If you’re having trouble playing audio:
- Ensure you have audio output devices properly configured
- Try saving the audio to a file instead of playing directly (see the sketch after this list)
- Install additional audio libraries if needed: pip install sounddevice soundfile
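A minimal sketch of saving to disk, reusing eleven_client from the setup above; the SDK's convert call yields audio bytes that can be written directly to a file:

```python
# Synthesize speech and write it to an MP3 file instead of playing it
audio = eleven_client.text_to_speech.convert(
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    text="Saving this response to a file.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

with open("response.mp3", "wb") as f:
    for chunk in audio:  # convert() returns an iterator of audio byte chunks
        f.write(chunk)
```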
High latency in responses
To reduce latency:
- Use streaming for both text generation and audio synthesis (see Step 6 and the sketch after this list)
- Keep responses concise by setting lower max_completion_tokens values
- Use faster Cerebras models like llama3.1-8b for simpler tasks
- Consider caching common responses
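A brief sketch of streaming the text generation side (the values below are illustrative); the ElevenLabs side can be streamed similarly:

```python
# Stream tokens as they are generated instead of waiting for the full completion,
# which reduces the time before the first words can be spoken
stream = cerebras_client.chat.completions.create(
    model="llama3.1-8b",       # faster model for simple tasks
    messages=[{"role": "user", "content": "Give a short, friendly greeting."}],
    max_completion_tokens=80,  # shorter responses lower total latency
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```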
Available Models
Cerebras offers several models optimized for voice AI applications:
| Model | Parameters | Best For |
|---|---|---|
| llama-3.3-70b | 70B | Best for complex reasoning, long-form content, and tasks requiring deep understanding |
| qwen-3-32b | 32B | Balanced performance for general-purpose applications |
| llama3.1-8b | 8B | Fastest option for simple tasks and high-throughput scenarios |
| gpt-oss-120b | 120B | High-capacity model for demanding tasks |
| zai-glm-4.7 | 357B | Largest model, with strong reasoning capabilities |
Use the model parameter in your Cerebras API calls to switch between models, as in the brief example below.
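For example, with the model names from the table above:

```python
# Switching models is a one-line change to the model parameter
completion = cerebras_client.chat.completions.create(
    model="qwen-3-32b",  # swap in "llama3.1-8b" or "llama-3.3-70b" as needed
    messages=[{"role": "user", "content": "Summarize this update in one spoken sentence."}],
)
```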
Next Steps
- Explore the ElevenLabs API documentation for advanced features like voice cloning and dubbing
- Try different Cerebras models like qwen-3-32b for specialized tasks
- Experiment with streaming responses for even lower latency
- Learn about structured outputs to format responses for voice synthesis
- Check out the ElevenLabs Voice Library for more voice options
- Migrate to GLM4.7: Ready to upgrade? Follow our migration guide to start using our latest model

