Cerebras Inference provides high-speed, low-latency AI model inference powered by Cerebras Wafer-Scale Engines and CS-3 systems. Agno is an open-source framework for building AI agents and agentic systems, designed for performance and simplicity; it integrates directly with the Cerebras Python SDK, letting you use state-of-the-art models through a simple interface.

Prerequisites

To use Cerebras with Agno, you need to:
  1. Install the required packages:
    pip install cerebras-cloud-sdk agno
    
  2. Set your API key: The Cerebras SDK expects your API key to be available as an environment variable:
    export CEREBRAS_API_KEY=your_api_key_here
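To confirm the key is actually visible to your Python process before creating an agent, a quick check like the following works (the helper name has_cerebras_key is ours for illustration, not part of either SDK):

```python
import os

def has_cerebras_key() -> bool:
    """Return True if CEREBRAS_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("CEREBRAS_API_KEY"))

if __name__ == "__main__":
    if has_cerebras_key():
        print("CEREBRAS_API_KEY is set")
    else:
        print("CEREBRAS_API_KEY is missing -- export it before creating an agent")
```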
    

Basic Usage

Here’s how to use a Cerebras model with Agno:
from agno.agent import Agent
from agno.models.cerebras import Cerebras

agent = Agent(
    model=Cerebras(id="llama-3.3-70b"),
    markdown=True,
)

# Print the response in the terminal
agent.print_response("write a two sentence horror story")

Supported Models

Cerebras currently supports a variety of production models (see docs for the latest list).

Configuration Options

The Cerebras class accepts the following parameters:
Parameter       Type                      Description                                        Default
id              str                       Model identifier (e.g., "llama-3.3-70b")          Required
name            str                       Display name for the model                         "Cerebras"
provider        str                       Provider name                                      "Cerebras"
api_key         Optional[str]             API key (falls back to CEREBRAS_API_KEY env var)   None
temperature     float                     Sampling temperature                               0.7
top_p           float                     Top-p sampling value                               1.0
request_params  Optional[Dict[str, Any]]  Additional request parameters                      None

Example with Custom Parameters

from agno.agent import Agent
from agno.models.cerebras import Cerebras

agent = Agent(
    model=Cerebras(
        id="llama-3.3-70b",
        temperature=0.2,  # lower than the 0.7 default for more focused output
        top_p=0.9,        # tighten nucleus sampling
    ),
    markdown=True
)

agent.print_response("Explain quantum computing in simple terms")
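The request_params dict is forwarded to the underlying chat-completion request. As a rough sketch of the usual layering (the merge order shown is an assumption about typical behavior, not Agno's documented internals), per-request keys override the model's own settings:

```python
# Model-level settings, as configured on the Cerebras model object.
base_request = {"model": "llama-3.3-70b", "temperature": 0.7, "top_p": 1.0}

# Extra fields supplied via request_params; the key names here are illustrative.
request_params = {"max_tokens": 512, "temperature": 0.2}

# The later dict wins on key collisions, so request_params overrides defaults.
merged = {**base_request, **request_params}
print(merged["temperature"])  # 0.2 -- taken from request_params
```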

FAQ

How fast is Cerebras inference?

Cerebras provides ultra-fast inference speeds (up to ~3,000 tokens/second) powered by Wafer-Scale Engines and CS-3 systems. This makes your Agno agents significantly more responsive than traditional GPU-based inference, enabling real-time conversational experiences and rapid agent workflows.

Do Cerebras models support streaming responses?

Yes. You can enable streaming by setting stream=True when creating your agent, or use the agent.print_response() method, which automatically streams output to the terminal.

Which model should I choose?

  • llama-3.3-70b: Best for complex reasoning, detailed responses, and general-purpose tasks
  • qwen-3-32b: Excellent for multilingual tasks, coding, and technical content
  • llama3.1-8b: Fast and efficient for simpler tasks and quick responses
  • gpt-oss-120b: Large open-source model for diverse applications with highest speed
  • zai-glm-4.6: Advanced 357B parameter model with strong reasoning capabilities

Troubleshooting

If you encounter import errors:
  1. Make sure the Cerebras SDK is installed: pip install cerebras-cloud-sdk --upgrade
  2. Verify you’re using Python 3.10 or higher: python --version
  3. Try creating a fresh virtual environment and reinstalling:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install cerebras-cloud-sdk
    
If you see API key errors:
  1. Verify your API key is set as an environment variable: echo $CEREBRAS_API_KEY
  2. Ensure the variable is exported in your current shell session
  3. Alternatively, pass the API key directly to the model:
    from agno.models.cerebras import Cerebras
    
    model = Cerebras(id="llama-3.3-70b", api_key="your-api-key")
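The fallback behavior described in the configuration table can be sketched as follows; resolve_api_key is a hypothetical helper illustrating the precedence (explicit argument first, then the environment variable), not an Agno API:

```python
import os

def resolve_api_key(explicit=None):
    """Return the explicit key if given, else fall back to CEREBRAS_API_KEY."""
    return explicit or os.environ.get("CEREBRAS_API_KEY")

os.environ["CEREBRAS_API_KEY"] = "env-key"
print(resolve_api_key())          # env-key
print(resolve_api_key("direct"))  # direct -- explicit argument wins
```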
    
If you see model not found errors:
  1. Verify the model ID matches one of the supported models (for example llama-3.3-70b, qwen-3-32b, llama3.1-8b, gpt-oss-120b, or zai-glm-4.6)
  2. Check that your API key has access to the requested model
  3. Review the available models documentation for the latest model list