Cerebras Inference provides high-speed, low-latency AI model inference powered by Cerebras Wafer-Scale Engines and CS-3 systems. Agno is an open-source framework for building AI agents and agentic systems, designed for performance and simplicity; it integrates directly with the Cerebras Python SDK, letting you use state-of-the-art models through a simple interface.

Prerequisites

To use Cerebras with Agno, you need to:
  1. Install the required packages:
    pip install cerebras-cloud-sdk agno
    
  2. Set your API key: The Cerebras SDK expects your API key to be available as an environment variable:
    export CEREBRAS_API_KEY=your_api_key_here
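To confirm the key is actually visible to your Python process before creating an agent, a quick check like the following works (the helper name has_cerebras_key is ours for illustration, not part of either SDK):

```python
import os

def has_cerebras_key() -> bool:
    """Return True if CEREBRAS_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("CEREBRAS_API_KEY"))

if __name__ == "__main__":
    if has_cerebras_key():
        print("CEREBRAS_API_KEY is set")
    else:
        print("CEREBRAS_API_KEY is missing -- export it before creating an agent")
```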
    

Basic Usage

Here’s how to use a Cerebras model with Agno:
from agno.agent import Agent
from agno.models.cerebras import Cerebras

agent = Agent(
    model=Cerebras(id="llama-3.3-70b"),
    markdown=True,
)

# Print the response in the terminal
agent.print_response("write a two sentence horror story")

Supported Models

Cerebras currently supports a variety of production models (see docs for the latest list).

Configuration Options

The Cerebras class accepts the following parameters:
Parameter       Type                      Description                                        Default
id              str                       Model identifier (e.g., "llama-3.3-70b")          Required
name            str                       Display name for the model                         "Cerebras"
provider        str                       Provider name                                      "Cerebras"
api_key         Optional[str]             API key (falls back to CEREBRAS_API_KEY env var)   None
temperature     float                     Sampling temperature                               0.7
top_p           float                     Top-p sampling value                               1.0
request_params  Optional[Dict[str, Any]]  Additional request parameters                      None

Example with Custom Parameters

from agno.agent import Agent
from agno.models.cerebras import Cerebras

agent = Agent(
    model=Cerebras(
        id="llama-3.3-70b",
        temperature=0.2,  # lower than the 0.7 default for more focused output
        top_p=0.9,        # tighten nucleus sampling
    ),
    markdown=True
)

agent.print_response("Explain quantum computing in simple terms")
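The request_params dict is forwarded to the underlying chat-completion request. As a rough sketch of the usual layering (the merge order shown is an assumption about typical behavior, not Agno's documented internals), per-request keys override the model's own settings:

```python
# Model-level settings, as configured on the Cerebras model object.
base_request = {"model": "llama-3.3-70b", "temperature": 0.7, "top_p": 1.0}

# Extra fields supplied via request_params; the key names here are illustrative.
request_params = {"max_tokens": 512, "temperature": 0.2}

# The later dict wins on key collisions, so request_params overrides defaults.
merged = {**base_request, **request_params}
print(merged["temperature"])  # 0.2 -- taken from request_params
```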

FAQ

How fast is Cerebras inference?

Cerebras provides ultra-fast inference speeds (up to ~3,000 tokens/second) powered by Wafer-Scale Engines and CS-3 systems. This makes your Agno agents significantly more responsive than traditional GPU-based inference, enabling real-time conversational experiences and rapid agent workflows.

Do Cerebras models support streaming responses?

Yes. You can enable streaming by setting stream=True when creating your agent, or use the agent.print_response() method, which automatically streams output to the terminal.

Which model should I choose?

  • llama-3.3-70b: Best for complex reasoning, detailed responses, and general-purpose tasks
  • qwen-3-32b: Excellent for multilingual tasks, coding, and technical content
  • llama3.1-8b: Fast and efficient for simpler tasks and quick responses
  • gpt-oss-120b: Large open-source model for diverse applications with highest speed
  • zai-glm-4.6: Advanced 357B parameter model with strong reasoning capabilities

Troubleshooting

If you encounter import errors:
  1. Make sure the Cerebras SDK is installed: pip install cerebras-cloud-sdk --upgrade
  2. Verify you’re using Python 3.10 or higher: python --version
  3. Try creating a fresh virtual environment and reinstalling:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install cerebras-cloud-sdk
    
If you see API key errors:
  1. Verify your API key is set as an environment variable: echo $CEREBRAS_API_KEY
  2. Ensure the variable is exported in your current shell session
  3. Alternatively, pass the API key directly to the model:
    from agno.models.cerebras import Cerebras
    
    model = Cerebras(id="llama-3.3-70b", api_key="your-api-key")
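The fallback behavior described in the configuration table can be sketched as follows; resolve_api_key is a hypothetical helper illustrating the precedence (explicit argument first, then the environment variable), not an Agno API:

```python
import os

def resolve_api_key(explicit=None):
    """Return the explicit key if given, else fall back to CEREBRAS_API_KEY."""
    return explicit or os.environ.get("CEREBRAS_API_KEY")

os.environ["CEREBRAS_API_KEY"] = "env-key"
print(resolve_api_key())          # env-key
print(resolve_api_key("direct"))  # direct -- explicit argument wins
```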
    
If you see model not found errors:
  1. Verify the model ID matches one of the supported models (for example llama-3.3-70b, qwen-3-32b, llama3.1-8b, gpt-oss-120b, or zai-glm-4.6)
  2. Check that your API key has access to the requested model
  3. Review the available models documentation for the latest model list