Prerequisites
To use Cerebras with Agno, you need to:
Install the required packages:
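A typical install command (`cerebras-cloud-sdk` is the official Cerebras Python SDK; the `agno` package name is assumed from the Agno docs):

```shell
pip install -U agno cerebras-cloud-sdk
```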
Set your API key:
The Cerebras SDK expects your API key to be available as an environment variable:
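For example, in a POSIX shell:

```shell
export CEREBRAS_API_KEY="your-api-key-here"
```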
Basic Usage
Here’s how to use a Cerebras model with Agno:
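A minimal sketch, assuming Agno exposes the model class as `agno.models.cerebras.Cerebras` (check the Agno docs for the exact import path):

```python
from agno.agent import Agent
from agno.models.cerebras import Cerebras

# Create an agent backed by a Cerebras-hosted model.
# The API key is read from the CEREBRAS_API_KEY environment variable.
agent = Agent(
    model=Cerebras(id="llama-3.3-70b"),
    markdown=True,
)

# Streams the response to the terminal
agent.print_response("Explain wafer-scale computing in two sentences.")
```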
Supported Models
Cerebras currently supports a variety of production models (see the docs for the latest list).

Configuration Options
The `Cerebras` class accepts the following parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| `id` | `str` | Model identifier (e.g., `"llama-3.3-70b"`) | Required |
| `name` | `str` | Display name for the model | `"Cerebras"` |
| `provider` | `str` | Provider name | `"Cerebras"` |
| `api_key` | `Optional[str]` | API key (falls back to `CEREBRAS_API_KEY` env var) | `None` |
| `temperature` | `float` | Sampling temperature | `0.7` |
| `top_p` | `float` | Top-p sampling value | `1.0` |
| `request_params` | `Optional[Dict[str, Any]]` | Additional request parameters | `None` |
Example with Custom Parameters
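A sketch using the parameters from the table above (the import path and the `max_tokens` pass-through key are assumptions; consult the Agno and Cerebras API docs):

```python
from agno.agent import Agent
from agno.models.cerebras import Cerebras

# Hypothetical configuration; parameter names follow the table above
model = Cerebras(
    id="llama-3.3-70b",
    temperature=0.3,  # lower temperature for more deterministic output
    top_p=0.9,
    request_params={"max_tokens": 1024},  # extra parameters forwarded to the API
)

agent = Agent(model=model)
agent.print_response("Summarize the benefits of wafer-scale inference.")
```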
Resources
- Cerebras Inference Documentation
- Cerebras API Reference
- Agno Documentation
- Agno GitHub Repository
- Migrate to GLM4.6
FAQ
What makes Cerebras different for Agno agents?
Cerebras provides ultra-fast inference speeds (up to ~3000 tokens/second) powered by Wafer-Scale Engines and CS-3 systems. This makes your Agno agents significantly more responsive compared to traditional GPU-based inference, enabling real-time conversational experiences and rapid agent workflows.
Can I use streaming with Cerebras models in Agno?
Yes! Cerebras models support streaming responses. You can enable streaming by setting `stream=True` when creating your agent, or use the `agent.print_response()` method, which automatically streams output to the terminal.
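A sketch of both approaches (the exact shape of the streamed chunks is an assumption; see the Agno docs):

```python
from agno.agent import Agent
from agno.models.cerebras import Cerebras

agent = Agent(model=Cerebras(id="llama-3.3-70b"))

# Option 1: iterate over streamed chunks programmatically
for chunk in agent.run("Tell me a short story.", stream=True):
    print(chunk.content, end="", flush=True)

# Option 2: stream directly to the terminal
agent.print_response("Tell me a short story.", stream=True)
```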
Which Cerebras model should I choose for my agent?
- llama-3.3-70b: Best for complex reasoning, detailed responses, and general-purpose tasks
- qwen-3-32b: Excellent for multilingual tasks, coding, and technical content
- llama3.1-8b: Fast and efficient for simpler tasks and quick responses
- gpt-oss-120b: Large open-source model for diverse applications with highest speed
- zai-glm-4.6: Advanced 357B parameter model with strong reasoning capabilities
Troubleshooting
Import errors with Cerebras SDK
If you encounter import errors:
- Make sure the Cerebras SDK is installed: `pip install cerebras-cloud-sdk --upgrade`
- Verify you’re using Python 3.10 or higher: `python --version`
- Try creating a fresh virtual environment and reinstalling
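For example (the `agno` package name is an assumption):

```shell
# Create and activate a fresh virtual environment, then reinstall
python -m venv .venv
source .venv/bin/activate
pip install -U agno cerebras-cloud-sdk
```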
API key not found
If you see API key errors:
- Verify your API key is set as an environment variable: `echo $CEREBRAS_API_KEY`
- Ensure the variable is exported in your current shell session
- Alternatively, pass the API key directly to the model:
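For example (assuming the `agno.models.cerebras` import path):

```python
from agno.models.cerebras import Cerebras

# Pass the key explicitly instead of relying on the CEREBRAS_API_KEY env var
model = Cerebras(id="llama-3.3-70b", api_key="your-api-key-here")
```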
Model not found errors
If you see model not found errors:
- Verify the model ID matches one of the supported models: `llama-3.3-70b`, `qwen-3-32b`, `llama3.1-8b`, or `gpt-oss-120b`
- Check that your API key has access to the requested model
- Review the available models documentation for the latest model list

