Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here.
- Cartesia API Key - Visit Cartesia and create an account. Navigate to your profile settings to generate an API key.
- Python 3.10 or higher - Required for running the integration code.
Configure Cartesia Integration
Install required dependencies
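The exact command depends on your environment, but with pip the three packages described in this step install like this (python-dotenv is an optional extra, only needed if you load the .env file from Python):

```shell
pip install openai cartesia pyaudio

# Optional: load variables from a .env file in Python code
pip install python-dotenv
```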
Install the necessary Python packages for both Cerebras Inference and Cartesia. The openai package provides the client for Cerebras Inference (OpenAI-compatible), cartesia is the official Cartesia SDK for voice synthesis, and pyaudio enables real-time audio playback.
macOS users: If you encounter errors installing pyaudio, first install PortAudio with: brew install portaudio
Configure environment variables
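As a sketch, the .env file might look like this (the variable names are assumptions; use whatever names your code reads):

```shell
# .env -- keep this file out of version control
CEREBRAS_API_KEY=your-cerebras-api-key
CARTESIA_API_KEY=your-cartesia-api-key
```

The same values can be exported in your shell instead, e.g. `export CEREBRAS_API_KEY=your-cerebras-api-key`.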
Create a .env file in your project directory to securely store your API keys. Alternatively, you can set these as environment variables in your shell.
Initialize the Cerebras client
Set up the Cerebras client using the OpenAI-compatible interface. The integration header helps us track and optimize this integration:
Create a basic text-to-speech pipeline
Now let’s create a complete pipeline that generates text with Cerebras and converts it to speech with Cartesia. This example demonstrates the power of combining Cerebras’s fast inference with Cartesia’s ultra-low latency voice synthesis:
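A sketch of that pipeline is below. The Cerebras half uses the standard OpenAI-compatible chat call; the Cartesia half (method name, voice specifier, output_format fields, and the voice ID placeholder) is an assumption, so confirm the exact call shape against the Cartesia SDK docs before relying on it:

```python
import os


def build_prompt(topic: str) -> list[dict]:
    """Chat messages asking Cerebras for a short, speakable answer."""
    return [
        {"role": "system", "content": "Reply in a couple of short, natural-sounding sentences."},
        {"role": "user", "content": topic},
    ]


def main() -> None:
    # SDK imports are deferred so the helper above works without them installed.
    from cartesia import Cartesia
    from openai import OpenAI

    llm = OpenAI(base_url="https://api.cerebras.ai/v1",
                 api_key=os.environ["CEREBRAS_API_KEY"])
    text = llm.chat.completions.create(
        model="llama-3.3-70b",
        messages=build_prompt("Why does low latency matter for voice agents?"),
    ).choices[0].message.content

    tts = Cartesia(api_key=os.environ["CARTESIA_API_KEY"])
    # Assumed call shape -- verify method and fields in the Cartesia SDK docs.
    audio = tts.tts.bytes(
        model_id="sonic-2",
        transcript=text,
        voice={"mode": "id", "id": "<your-voice-id>"},
        output_format={"container": "wav", "encoding": "pcm_f32le", "sample_rate": 44100},
    )
    with open("reply.wav", "wb") as f:
        for chunk in audio:
            f.write(chunk)


if __name__ == "__main__":
    main()
```

Writing to a WAV file keeps the sketch simple; swap the file write for pyaudio playback to hear the reply directly.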
Build a conversational voice agent
For a more advanced use case, here’s how to build a multi-turn conversational agent. The agent maintains conversation context across turns and responds with natural speech, combining Cerebras’s fast inference with Cartesia’s voice synthesis.
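One way to sketch such an agent is to keep the chat history in a class and inject the text-generation and speech functions, so the conversation logic stays independent of either SDK (the wiring in main, including the model name, follows the earlier steps; the speak callback is where Cartesia synthesis plus pyaudio playback would go):

```python
import os
from typing import Callable


class VoiceAgent:
    """Multi-turn agent: keeps the chat history and speaks each reply."""

    def __init__(
        self,
        complete: Callable[[list], str],   # text generation, e.g. a Cerebras chat call
        speak: Callable[[str], None],      # TTS playback, e.g. Cartesia + pyaudio
        system_prompt: str = "You are a helpful voice assistant. Keep replies short and conversational.",
    ):
        self.complete = complete
        self.speak = speak
        self.messages = [{"role": "system", "content": system_prompt}]

    def turn(self, user_text: str) -> str:
        """One conversation turn: record input, generate, speak, remember."""
        self.messages.append({"role": "user", "content": user_text})
        reply = self.complete(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        self.speak(reply)
        return reply


def main() -> None:
    # Deferred import so the class above is usable without the SDK installed.
    from openai import OpenAI

    llm = OpenAI(base_url="https://api.cerebras.ai/v1",
                 api_key=os.environ["CEREBRAS_API_KEY"])

    def complete(messages: list) -> str:
        resp = llm.chat.completions.create(model="llama-3.3-70b", messages=messages)
        return resp.choices[0].message.content

    # Replace print with Cartesia synthesis + pyaudio playback in a real agent.
    agent = VoiceAgent(complete=complete, speak=print)
    while True:
        agent.turn(input("You: "))


if __name__ == "__main__":
    main()
```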
Stream responses for lower latency
For even faster response times, you can stream the Cerebras output and generate speech in real time. This approach minimizes latency for interactive voice applications by starting audio playback as soon as the first chunks of the response are available.
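A sketch of the streaming pattern: regroup the token stream into sentences so each finished sentence can be handed to TTS while the model is still generating the rest. The helper is plain Python; the Cerebras call uses the standard OpenAI streaming interface, and the print is a stand-in for the Cartesia synthesis step:

```python
from typing import Iterable, Iterator


def sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Regroup streamed text fragments into sentences so TTS can start early."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            cuts = [i for i in (buf.find(p) for p in ".!?") if i != -1]
            if not cuts:
                break
            cut = min(cuts)
            yield buf[: cut + 1].strip()
            buf = buf[cut + 1:]
    if buf.strip():
        yield buf.strip()


def main() -> None:
    import os

    # Deferred import: the helper above works without the SDK installed.
    from openai import OpenAI

    llm = OpenAI(base_url="https://api.cerebras.ai/v1",
                 api_key=os.environ["CEREBRAS_API_KEY"])
    stream = llm.chat.completions.create(
        model="llama-3.3-70b",
        messages=[{"role": "user", "content": "Explain why latency matters for voice agents."}],
        stream=True,
    )
    tokens = (chunk.choices[0].delta.content or "" for chunk in stream)
    for sentence in sentences(tokens):
        # Hand each completed sentence to Cartesia TTS here, so playback
        # begins while the model is still generating the rest of the answer.
        print(sentence)


if __name__ == "__main__":
    main()
```

Chunking on sentence boundaries is a trade-off: smaller chunks start playback sooner, but very short fragments can make the synthesized prosody sound choppy.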
Available Models
Cerebras offers several models optimized for voice AI applications:

| Model | Parameters | Best For |
|---|---|---|
| llama-3.3-70b | 70B | Best for complex reasoning, long-form content, and tasks requiring deep understanding |
| qwen-3-32b | 32B | Balanced performance for general-purpose applications |
| llama3.1-8b | 8B | Fastest option for simple tasks and high-throughput scenarios |
| gpt-oss-120b | 120B | Large open-weight model for demanding tasks |
| zai-glm-4.7 | 357B | Advanced model with strong reasoning capabilities |
Use the model parameter in your Cerebras API calls to switch between models.
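Since only the model string changes, a voice agent can route between models per turn. A small illustration (the routing condition is just an example; the model names come from the table above):

```python
# Model names from the table above; the routing rule is illustrative only.
FAST_MODEL = "llama3.1-8b"
REASONING_MODEL = "llama-3.3-70b"


def pick_model(needs_reasoning: bool) -> str:
    """Send simple turns to the fastest model, hard ones to the larger model."""
    return REASONING_MODEL if needs_reasoning else FAST_MODEL


# Usage with the client from the earlier setup step:
# client.chat.completions.create(model=pick_model(False), messages=[...])
```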
Next Steps
Explore Voice Options
Browse Cartesia’s library of natural-sounding voices
Advanced Examples
See production voice agent implementations
API Documentation
Learn more about Cartesia’s API capabilities
Cerebras Models
Explore available Cerebras models for your use case
Additional Resources
- Cartesia Python SDK - Official Python client library
- Line SDK Documentation - Build production voice agents
- Cerebras Tool Use - Add function calling to your voice agents
- Migrate to GLM4.7 - Ready to upgrade? Follow our migration guide to start using our latest model

