
What is AG2?

AG2 (formerly AutoGen) is an open-source framework for building and orchestrating multi-agent AI workflows. It enables developers to create sophisticated AI systems where multiple agents collaborate, reason, and solve complex problems together. Learn more at AG2 documentation. By integrating Cerebras Inference with AG2, you combine AG2’s powerful agent orchestration with Cerebras’s ultra-fast inference speeds, making it ideal for real-time multi-agent applications.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key at cloud.cerebras.ai.
  • Python 3.10 or higher - AG2 requires Python 3.10+. Check your version with python --version.

Configure AG2

Step 1: Install AG2 with Cerebras support

AG2 provides a dedicated extra for Cerebras integration. Install it using pip:
pip install 'ag2[cerebras]'
This installs AG2 along with the dependencies needed to communicate with Cerebras’s API. If you’re upgrading an existing installation, use pip install -U 'ag2[cerebras]' (the quotes prevent your shell from interpreting the brackets).
Step 2: Set up your API key

Add your Cerebras API key to your environment variables:
export CEREBRAS_API_KEY="your-cerebras-api-key-here"
For Windows (Command Prompt):
set CEREBRAS_API_KEY=your-cerebras-api-key-here
For Windows (PowerShell):
$env:CEREBRAS_API_KEY = "your-cerebras-api-key-here"
Alternatively, create a .env file in your project directory:
CEREBRAS_API_KEY=your-cerebras-api-key-here
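A .env file must be loaded into the process environment before AG2 reads it; the python-dotenv package is the usual way to do this. As an illustrative stdlib-only sketch of what that loading does (load_env_file is a hypothetical helper, not part of AG2):

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader: set KEY=VALUE lines as environment variables.

    Illustrative only - python-dotenv's load_dotenv() is more robust.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't overwrite variables already set in the shell
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

With python-dotenv installed, calling load_dotenv() from the dotenv package achieves the same effect with proper quoting and escaping support.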
Step 3: Create your configuration file

AG2 uses a configuration list to define which models and APIs to use. Create a file named OAI_CONFIG_LIST.json in your project directory:
[
  {
    "model": "llama-3.3-70b",
    "api_key": "your-cerebras-api-key-here",
    "api_type": "cerebras"
  },
  {
    "model": "llama3.1-8b",
    "api_key": "your-cerebras-api-key-here",
    "api_type": "cerebras"
  }
]
The api_type: "cerebras" parameter tells AG2 to use the Cerebras client, which handles API communication and tracks token usage for cost monitoring.
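To make the file’s role concrete, here is a hand-rolled loader for it (load_config_list is a hypothetical helper shown for illustration; AG2 ships its own loader, autogen.config_list_from_json, which also supports filtering):

```python
import json

def load_config_list(path="OAI_CONFIG_LIST.json", model=None):
    """Read the config list from disk; optionally keep only entries
    for a single model name."""
    with open(path) as f:
        configs = json.load(f)
    if model is not None:
        configs = [c for c in configs if c.get("model") == model]
    return configs
```

In practice, prefer AG2’s built-in autogen.config_list_from_json, which reads the same file and can filter entries via its filter_dict argument.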

Build Your First Multi-Agent System

Let’s create a simple two-agent system where a user proxy agent and an assistant agent collaborate to solve a coding problem.
Step 1: Import required modules

Start by importing AG2’s agent classes and configuration utilities:
import os
from autogen import AssistantAgent, UserProxyAgent
Step 2: Configure your LLM

Set up the configuration to use Cerebras models with your API key:
import os

# Configure using environment variable
config_list = [
    {
        "model": "llama-3.3-70b",
        "api_key": os.getenv("CEREBRAS_API_KEY"),
        "api_type": "cerebras"
    }
]

# Configure LLM settings
llm_config = {
    "config_list": config_list,
    "temperature": 0.7,
}
You can add multiple models to the config_list for automatic fallback if one is unavailable.
Step 3: Create your agents

Create two agents: an assistant that writes code and a user proxy that executes it:
import os
from autogen import AssistantAgent, UserProxyAgent

# Configure using environment variable
config_list = [
    {
        "model": "llama-3.3-70b",
        "api_key": os.getenv("CEREBRAS_API_KEY"),
        "api_type": "cerebras"
    }
]
llm_config = {"config_list": config_list, "temperature": 0.7}

# Create an assistant agent
assistant = AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message="You are a helpful AI assistant that writes Python code to solve problems."
)

# Create a user proxy agent
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: (x.get("content") or "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
    llm_config=llm_config,
)
The AssistantAgent generates responses using Cerebras models, while the UserProxyAgent can execute code automatically.
Setting use_docker: False executes code directly on your machine. For production use, consider enabling Docker for better security isolation.
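To opt into that isolation, point code_execution_config at Docker instead. A configuration sketch (it assumes Docker is installed and the daemon is running on your machine):

```python
# Run generated code inside a Docker container instead of on the host.
# Assumes Docker is installed and the daemon is running.
code_execution_config = {
    "work_dir": "coding",
    "use_docker": True,  # AG2 also accepts an image name string here
}
```

Pass this dict to UserProxyAgent in place of the use_docker: False version shown above; the first run may be slower while the container image is pulled.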
Step 4: Start the conversation

Initiate a conversation between the agents with a coding task:
import os
from autogen import AssistantAgent, UserProxyAgent

# Configure using environment variable
config_list = [
    {
        "model": "llama-3.3-70b",
        "api_key": os.getenv("CEREBRAS_API_KEY"),
        "api_type": "cerebras"
    }
]
llm_config = {"config_list": config_list, "temperature": 0.7}

# Create agents
assistant = AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message="You are a helpful AI assistant that writes Python code to solve problems."
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: (x.get("content") or "").rstrip().endswith("TERMINATE"),
    code_execution_config={"work_dir": "coding", "use_docker": False},
    llm_config=llm_config,
)
# Start a conversation to plan a Paris trip
user_proxy.initiate_chat(
    assistant,
    message="Write a Python function called get_paris_attractions() that returns a list of the top 5 must-visit attractions in Paris. Call the function and print each attraction. Then say TERMINATE."
)
The assistant will write a function that returns Paris attractions, execute it, and display the results.
Step 5: Complete example

Here’s the full working example:
import os
from autogen import AssistantAgent, UserProxyAgent

# Configure using environment variable
config_list = [
    {
        "model": "llama-3.3-70b",
        "api_key": os.getenv("CEREBRAS_API_KEY"),
        "api_type": "cerebras"
    }
]

llm_config = {
    "config_list": config_list,
    "temperature": 0.7,
}

# Create assistant agent
assistant = AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message="You are a helpful AI assistant that writes Python code to solve problems."
)

# Create user proxy agent
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: (x.get("content") or "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
    llm_config=llm_config,
)

# Start the conversation with a simple math task
user_proxy.initiate_chat(
    assistant,
    message="Write a Python function to calculate 32 * 32 and print the result. Then say TERMINATE."
)
This example asks the assistant to write code that computes 32 × 32. The user proxy executes the code and displays the result (1024). The conversation ends when the assistant says “TERMINATE”.

Available Models

Cerebras offers several high-performance models optimized for different use cases:
Model           Parameters   Best For
llama-3.3-70b   70B          Complex reasoning, long-form content, and tasks requiring deep understanding
qwen-3-32b      32B          Balanced performance for general-purpose applications
llama3.1-8b     8B           Fastest option for simple tasks and high-throughput scenarios
gpt-oss-120b    120B         Large model for demanding tasks
zai-glm-4.6     357B         Advanced model with strong reasoning capabilities
For multi-agent systems with complex interactions, we recommend starting with llama-3.3-70b.
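Model choice can also differ per agent: for instance, a planning agent might use the 70B model while a high-volume agent uses the 8B one. A sketch of that pattern (cerebras_llm_config and the agent roles are illustrative, not AG2 API):

```python
import os

def cerebras_llm_config(model):
    """Build an AG2 llm_config dict for a given Cerebras model name."""
    return {
        "config_list": [{
            "model": model,
            "api_key": os.getenv("CEREBRAS_API_KEY"),
            "api_type": "cerebras",
        }]
    }

planner_config = cerebras_llm_config("llama-3.3-70b")  # deep reasoning
router_config = cerebras_llm_config("llama3.1-8b")     # fast, cheap replies
```

Pass each dict as the llm_config of the corresponding agent; everything else about agent construction stays the same.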

Advanced Configuration

Configure Model Parameters

Cerebras supports several parameters to fine-tune model behavior. Add these to your configuration:
[
  {
    "model": "llama-3.3-70b",
    "api_key": "your-cerebras-api-key-here",
    "api_type": "cerebras",
    "max_tokens": 10000,
    "seed": 1234,
    "stream": true,
    "temperature": 0.7,
    "top_p": 0.9
  }
]
Parameter descriptions:
  • max_tokens: Maximum number of tokens to generate (integer ≥ 0)
  • seed: Random seed for reproducible outputs (integer)
  • stream: Enable streaming responses (true/false)
  • temperature: Controls randomness, 0 to 1.5 (lower = more focused)
  • top_p: Nucleus sampling threshold, 0 to 1 (alternative to temperature)
Set either temperature or top_p, but not both, as they control similar aspects of generation.
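The same parameters can be set from Python rather than JSON. In this sketch only temperature is set, following the either/or guidance (note the Python booleans in place of JSON true):

```python
import os

# Cerebras config entry with generation parameters set from Python.
# Only temperature is set; leave top_p unset rather than using both.
cerebras_config = {
    "model": "llama-3.3-70b",
    "api_key": os.getenv("CEREBRAS_API_KEY"),
    "api_type": "cerebras",
    "max_tokens": 10000,
    "seed": 1234,        # fixed seed for reproducible outputs
    "stream": True,      # Python True, not JSON true
    "temperature": 0.7,
}
```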

Use Environment Variables

Reference environment variables in your configuration file for better security:
[
  {
    "model": "llama-3.3-70b",
    "api_key": "${CEREBRAS_API_KEY}",
    "api_type": "cerebras"
  }
]
AG2 automatically substitutes the environment variable value.
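If your AG2 version does not perform this substitution itself, the standard library’s os.path.expandvars does exactly this kind of ${VAR} expansion (resolve_env_refs is a hypothetical helper name):

```python
import os

def resolve_env_refs(configs):
    """Expand ${VAR} references in each entry's api_key field
    using the current process environment."""
    for entry in configs:
        entry["api_key"] = os.path.expandvars(entry.get("api_key", ""))
    return configs
```

Unset variables are left as-is by expandvars, so a missing CEREBRAS_API_KEY surfaces as a literal ${CEREBRAS_API_KEY} string rather than failing silently.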

Configure Multiple Model Fallback

Set up multiple models for automatic fallback if one is unavailable:
[
  {
    "model": "llama-3.3-70b",
    "api_key": "${CEREBRAS_API_KEY}",
    "api_type": "cerebras"
  },
  {
    "model": "llama3.1-8b",
    "api_key": "${CEREBRAS_API_KEY}",
    "api_type": "cerebras"
  }
]
AG2 tries models in order until one succeeds.

Track Token Usage and Costs

The Cerebras client automatically tracks token usage. After a conversation finishes, call AG2’s gather_usage_summary([assistant, user_proxy]) to retrieve detailed token usage and cost information for your agents.
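For example, after the chat above completes (report_usage is an illustrative helper; the exact keys in the summary dict may vary between AG2 versions, so this assumes a mapping of model names to stats dicts plus a total_cost entry):

```python
# After a conversation completes, aggregate usage across agents:
#   from autogen import gather_usage_summary
#   summary = gather_usage_summary([assistant, user_proxy])
#   for line in report_usage(summary):
#       print(line)

def report_usage(summary):
    """Format per-model token counts and total cost from a usage-summary dict.

    Assumes the summary maps model names to stats dicts and includes
    a 'total_cost' float entry.
    """
    lines = []
    for model, stats in summary.items():
        if model == "total_cost":
            lines.append(f"Total cost: ${stats:.4f}")
        else:
            lines.append(f"{model}: {stats['total_tokens']} tokens")
    return lines
```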

Frequently Asked Questions

If you see an error about missing API keys:
  • Verify your environment variable is set: echo $CEREBRAS_API_KEY
  • Check that your .env file is in the correct directory
  • Restart your terminal after setting environment variables
  • Try hardcoding the key temporarily to isolate the issue
If a model isn’t responding:
  • Verify the model name matches exactly (case-sensitive)
  • Check Cerebras model availability
  • Try a different model from your configuration list
  • Ensure your API key has access to the requested model
If the user proxy agent can’t execute code:
  • Check that the work_dir exists or can be created
  • Verify required Python packages are installed (e.g., matplotlib)
  • Review the code_execution_config settings
  • Consider enabling Docker: "use_docker": True
If agents are responding slowly:
  • Try a smaller, faster model like llama3.1-8b
  • Reduce max_tokens in your configuration
  • Enable streaming with "stream": true
  • Check your network connection to Cerebras’s API
Can I use other Cerebras models? Yes! AG2 supports all Cerebras models. Simply update the model field in your configuration to use gpt-oss-120b, qwen-3-32b, or any other available model.

Next Steps

Now that you have AG2 configured with Cerebras, explore these advanced capabilities:
  • Build Complex Workflows - Create multi-agent systems with specialized roles (researcher, coder, reviewer)
  • Add Human-in-the-Loop - Set human_input_mode="ALWAYS" to review agent actions before execution
  • Explore Group Chat - Use AG2’s GroupChat feature to orchestrate conversations between multiple agents
  • Try Different Models - Experiment with different Cerebras models to find the best performance/cost balance
  • Migrate to GLM4.6 - Follow the GLM4.6 migration guide