
What is AG2?

AG2 (formerly AutoGen) is an open-source framework for building and orchestrating multi-agent AI workflows. It enables developers to create sophisticated AI systems where multiple agents collaborate, reason, and solve complex problems together. Learn more at AG2 documentation. By integrating Cerebras Inference with AG2, you combine AG2’s powerful agent orchestration with Cerebras’s ultra-fast inference speeds, making it ideal for real-time multi-agent applications.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key at cloud.cerebras.ai.
  • Python 3.10 or higher - AG2 requires Python 3.10+. Check your version with python --version.

Configure AG2

Step 1: Install AG2 with Cerebras support

AG2 provides a dedicated extra for Cerebras integration. Install it using pip:
pip install 'ag2[cerebras]'
This installs AG2 along with the dependencies needed to communicate with Cerebras’s API. If you’re upgrading an existing installation, use pip install -U 'ag2[cerebras]' (the quotes prevent your shell from interpreting the brackets).
Step 2: Set up your API key

Add your Cerebras API key to your environment variables:
export CEREBRAS_API_KEY="your-cerebras-api-key-here"
For Windows (Command Prompt):
set CEREBRAS_API_KEY=your-cerebras-api-key-here
For Windows (PowerShell):
$env:CEREBRAS_API_KEY = "your-cerebras-api-key-here"
Alternatively, create a .env file in your project directory:
CEREBRAS_API_KEY=your-cerebras-api-key-here
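A .env file must be loaded into the process environment before AG2 reads it; the python-dotenv package is the usual way to do this. As an illustrative stdlib-only sketch of what that loading does (load_env_file is a hypothetical helper, not part of AG2):

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader: set KEY=VALUE lines as environment variables.

    Illustrative only - python-dotenv's load_dotenv() is more robust.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't overwrite variables already set in the shell
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

With python-dotenv installed, calling load_dotenv() from the dotenv package achieves the same effect with proper quoting and escaping support.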
Step 3: Create your configuration file

AG2 uses a configuration list to define which models and APIs to use. Create a file named OAI_CONFIG_LIST.json in your project directory:
[
  {
    "model": "llama-3.3-70b",
    "api_key": "your-cerebras-api-key-here",
    "api_type": "cerebras"
  },
  {
    "model": "llama3.1-8b",
    "api_key": "your-cerebras-api-key-here",
    "api_type": "cerebras"
  }
]
The api_type: "cerebras" parameter tells AG2 to use the Cerebras client, which handles API communication and tracks token usage for cost monitoring.
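To make the file’s role concrete, here is a hand-rolled loader for it (load_config_list is a hypothetical helper shown for illustration; AG2 ships its own loader, autogen.config_list_from_json, which also supports filtering):

```python
import json

def load_config_list(path="OAI_CONFIG_LIST.json", model=None):
    """Read the config list from disk; optionally keep only entries
    for a single model name."""
    with open(path) as f:
        configs = json.load(f)
    if model is not None:
        configs = [c for c in configs if c.get("model") == model]
    return configs
```

In practice, prefer AG2’s built-in autogen.config_list_from_json, which reads the same file and can filter entries via its filter_dict argument.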

Build Your First Multi-Agent System

Let’s create a simple two-agent system where a user proxy agent and an assistant agent collaborate to solve a coding problem.
Step 1: Import required modules

Start by importing AG2’s agent classes and configuration utilities:
import os
from autogen import AssistantAgent, UserProxyAgent
Step 2: Configure your LLM

Set up the configuration to use Cerebras models with your API key:
import os

# Configure using environment variable
config_list = [
    {
        "model": "llama-3.3-70b",
        "api_key": os.getenv("CEREBRAS_API_KEY"),
        "api_type": "cerebras"
    }
]

# Configure LLM settings
llm_config = {
    "config_list": config_list,
    "temperature": 0.7,
}
You can add multiple models to the config_list for automatic fallback if one is unavailable.
Step 3: Create your agents

Create two agents: an assistant that writes code and a user proxy that executes it:
import os
from autogen import AssistantAgent, UserProxyAgent

# Configure using environment variable
config_list = [
    {
        "model": "llama-3.3-70b",
        "api_key": os.getenv("CEREBRAS_API_KEY"),
        "api_type": "cerebras"
    }
]
llm_config = {"config_list": config_list, "temperature": 0.7}

# Create an assistant agent
assistant = AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message="You are a helpful AI assistant that writes Python code to solve problems."
)

# Create a user proxy agent
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: (x.get("content") or "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
    llm_config=llm_config,
)
The AssistantAgent generates responses using Cerebras models, while the UserProxyAgent can execute code automatically.
Setting use_docker: False executes code directly on your machine. For production use, consider enabling Docker for better security isolation.
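To opt into that isolation, point code_execution_config at Docker instead. A configuration sketch (it assumes Docker is installed and the daemon is running on your machine):

```python
# Run generated code inside a Docker container instead of on the host.
# Assumes Docker is installed and the daemon is running.
code_execution_config = {
    "work_dir": "coding",
    "use_docker": True,  # AG2 also accepts an image name string here
}
```

Pass this dict to UserProxyAgent in place of the use_docker: False version shown above; the first run may be slower while the container image is pulled.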
Step 4: Start the conversation

Initiate a conversation between the agents with a coding task:
import os
from autogen import AssistantAgent, UserProxyAgent

# Configure using environment variable
config_list = [
    {
        "model": "llama-3.3-70b",
        "api_key": os.getenv("CEREBRAS_API_KEY"),
        "api_type": "cerebras"
    }
]
llm_config = {"config_list": config_list, "temperature": 0.7}

# Create agents
assistant = AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message="You are a helpful AI assistant that writes Python code to solve problems."
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: (x.get("content") or "").rstrip().endswith("TERMINATE"),
    code_execution_config={"work_dir": "coding", "use_docker": False},
    llm_config=llm_config,
)
# Start a conversation to plan a Paris trip
user_proxy.initiate_chat(
    assistant,
    message="Write a Python function called get_paris_attractions() that returns a list of the top 5 must-visit attractions in Paris. Call the function and print each attraction. Then say TERMINATE."
)
The assistant will write a function that returns Paris attractions, execute it, and display the results.
Step 5: Complete example

Here’s the full working example:
import os
from autogen import AssistantAgent, UserProxyAgent

# Configure using environment variable
config_list = [
    {
        "model": "llama-3.3-70b",
        "api_key": os.getenv("CEREBRAS_API_KEY"),
        "api_type": "cerebras"
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0.7,
}

# Create assistant agent
assistant = AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message="You are a helpful AI assistant that writes Python code to solve problems."
)

# Create user proxy agent
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: (x.get("content") or "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
    llm_config=llm_config,
)

# Start the conversation with a simple math task
user_proxy.initiate_chat(
    assistant,
    message="Write a Python function to calculate 32 * 32 and print the result. Then say TERMINATE."
)
This example asks the assistant to write code that computes 32 × 32. The user proxy executes the code and displays the result (1024). The conversation ends when the assistant says “TERMINATE”.

Available Models

Cerebras offers several high-performance models optimized for different use cases:
Model           Parameters   Best For
llama-3.3-70b   70B          Complex reasoning, long-form content, and tasks requiring deep understanding
qwen-3-32b      32B          Balanced performance for general-purpose applications
llama3.1-8b     8B           Fastest option for simple tasks and high-throughput scenarios
gpt-oss-120b    120B         Large model for demanding tasks
zai-glm-4.6     357B         Advanced model with strong reasoning capabilities
For multi-agent systems with complex interactions, we recommend starting with llama-3.3-70b.
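Model choice can also differ per agent: for instance, a planning agent might use the 70B model while a high-volume agent uses the 8B one. A sketch of that pattern (cerebras_llm_config and the agent roles are illustrative, not AG2 API):

```python
import os

def cerebras_llm_config(model):
    """Build an AG2 llm_config dict for a given Cerebras model name."""
    return {
        "config_list": [{
            "model": model,
            "api_key": os.getenv("CEREBRAS_API_KEY"),
            "api_type": "cerebras",
        }]
    }

planner_config = cerebras_llm_config("llama-3.3-70b")  # deep reasoning
router_config = cerebras_llm_config("llama3.1-8b")     # fast, cheap replies
```

Pass each dict as the llm_config of the corresponding agent; everything else about agent construction stays the same.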

Advanced Configuration

Configure Model Parameters

Cerebras supports several parameters to fine-tune model behavior. Add these to your configuration:
[
  {
    "model": "llama-3.3-70b",
    "api_key": "your-cerebras-api-key-here",
    "api_type": "cerebras",
    "max_tokens": 10000,
    "seed": 1234,
    "stream": true,
    "temperature": 0.7,
    "top_p": 0.9
  }
]
Parameter descriptions:
  • max_tokens: Maximum number of tokens to generate (integer ≥ 0)
  • seed: Random seed for reproducible outputs (integer)
  • stream: Enable streaming responses (true/false)
  • temperature: Controls randomness, 0 to 1.5 (lower = more focused)
  • top_p: Nucleus sampling threshold, 0 to 1 (alternative to temperature)
Set either temperature or top_p, but not both, as they control similar aspects of generation.
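The same parameters can be set from Python rather than JSON. In this sketch only temperature is set, following the either/or guidance (note the Python booleans in place of JSON true):

```python
import os

# Cerebras config entry with generation parameters set from Python.
# Only temperature is set; leave top_p unset rather than using both.
cerebras_config = {
    "model": "llama-3.3-70b",
    "api_key": os.getenv("CEREBRAS_API_KEY"),
    "api_type": "cerebras",
    "max_tokens": 10000,
    "seed": 1234,        # fixed seed for reproducible outputs
    "stream": True,      # Python True, not JSON true
    "temperature": 0.7,
}
```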

Use Environment Variables

Reference environment variables in your configuration file for better security:
[
  {
    "model": "llama-3.3-70b",
    "api_key": "${CEREBRAS_API_KEY}",
    "api_type": "cerebras"
  }
]
AG2 automatically substitutes the environment variable value.
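If your AG2 version does not perform this substitution itself, the standard library’s os.path.expandvars does exactly this kind of ${VAR} expansion (resolve_env_refs is a hypothetical helper name):

```python
import os

def resolve_env_refs(configs):
    """Expand ${VAR} references in each entry's api_key field
    using the current process environment."""
    for entry in configs:
        entry["api_key"] = os.path.expandvars(entry.get("api_key", ""))
    return configs
```

Unset variables are left as-is by expandvars, so a missing CEREBRAS_API_KEY surfaces as a literal ${CEREBRAS_API_KEY} string rather than failing silently.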

Configure Multiple Model Fallback

Set up multiple models for automatic fallback if one is unavailable:
[
  {
    "model": "llama-3.3-70b",
    "api_key": "${CEREBRAS_API_KEY}",
    "api_type": "cerebras"
  },
  {
    "model": "llama3.1-8b",
    "api_key": "${CEREBRAS_API_KEY}",
    "api_type": "cerebras"
  }
]
AG2 tries models in order until one succeeds.

Track Token Usage and Costs

The Cerebras client automatically tracks token usage. After a conversation finishes, call AG2’s gather_usage_summary([assistant, user_proxy]) to retrieve detailed token usage and cost information for your agents.
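For example, after the chat above completes (report_usage is an illustrative helper; the exact keys in the summary dict may vary between AG2 versions, so this assumes a mapping of model names to stats dicts plus a total_cost entry):

```python
# After a conversation completes, aggregate usage across agents:
#   from autogen import gather_usage_summary
#   summary = gather_usage_summary([assistant, user_proxy])
#   for line in report_usage(summary):
#       print(line)

def report_usage(summary):
    """Format per-model token counts and total cost from a usage-summary dict.

    Assumes the summary maps model names to stats dicts and includes
    a 'total_cost' float entry.
    """
    lines = []
    for model, stats in summary.items():
        if model == "total_cost":
            lines.append(f"Total cost: ${stats:.4f}")
        else:
            lines.append(f"{model}: {stats['total_tokens']} tokens")
    return lines
```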

Frequently Asked Questions

If you see an error about missing API keys:
  • Verify your environment variable is set: echo $CEREBRAS_API_KEY
  • Check that your .env file is in the correct directory
  • Restart your terminal after setting environment variables
  • Try hardcoding the key temporarily to isolate the issue
If a model isn’t responding:
  • Verify the model name matches exactly (case-sensitive)
  • Check Cerebras model availability
  • Try a different model from your configuration list
  • Ensure your API key has access to the requested model
If the user proxy agent can’t execute code:
  • Check that the work_dir exists or can be created
  • Verify required Python packages are installed (e.g., matplotlib)
  • Review the code_execution_config settings
  • Consider enabling Docker: "use_docker": True
If agents are responding slowly:
  • Try a smaller, faster model like llama3.1-8b
  • Reduce max_tokens in your configuration
  • Enable streaming with "stream": true
  • Check your network connection to Cerebras’s API
Can I use other Cerebras models? Yes! AG2 supports all Cerebras models. Simply update the model field in your configuration to use gpt-oss-120b, qwen-3-32b, or any other available model.

Next Steps

Now that you have AG2 configured with Cerebras, explore these advanced capabilities:
  • Build Complex Workflows - Create multi-agent systems with specialized roles (researcher, coder, reviewer)
  • Add Human-in-the-Loop - Set human_input_mode="ALWAYS" to review agent actions before execution
  • Explore Group Chat - Use AG2’s GroupChat feature to orchestrate conversations between multiple agents
  • Try Different Models - Experiment with different Cerebras models to find the best performance/cost balance
  • Migrate to GLM4.6 - Follow the GLM4.6 migration guide