Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here
- Weights & Biases Account - Visit Weights & Biases and create an account or log in
- Python 3.7 or higher
What is Weave?
Weave is W&B’s lightweight toolkit for tracking and evaluating LLM applications. It automatically captures traces of your LLM calls, including inputs, outputs, token usage, and latency. This makes it easy to debug issues, monitor performance, and iterate on your prompts and models.
Key features when using Weave with Cerebras:
- Automatic tracing of all Cerebras API calls
- Version control for your prompts and code
- Performance monitoring with detailed metrics
- Evaluation framework for testing model outputs
- Beautiful UI for exploring traces and debugging
Configure Weave
1. Install required dependencies
Install the Weave SDK and Cerebras Cloud SDK to get started:
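For example, with pip (assuming a standard Python environment; `python-dotenv` is optional but convenient for loading the `.env` file created in the next step):

```shell
pip install weave cerebras_cloud_sdk python-dotenv
```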
2. Configure environment variables
Create a .env file in your project directory with your API keys. You can find your W&B API key in your W&B settings.
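A minimal .env might look like this (CEREBRAS_API_KEY and WANDB_API_KEY are the conventional variable names read by the Cerebras SDK and W&B; the values shown are placeholders):

```
CEREBRAS_API_KEY=your-cerebras-api-key
WANDB_API_KEY=your-wandb-api-key
```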
3. Initialize Weave and create your client
Weave needs to be initialized at the start of your script. This creates a project in W&B where all your traces will be logged.
The weave.init() call automatically starts tracking all LLM calls made through the Cerebras SDK. You don’t need to add any additional decorators or wrappers for basic tracing.
4. Make your first traced request
Now you can use the Cerebras SDK as usual. Weave will automatically capture all the details of your API calls, including the model used, messages sent, tokens consumed, and response time.
After running this code, visit your Weave dashboard to see the trace, including token usage, latency, and the full conversation.
Advanced Usage
Wrapping Functions with @weave.op
For more granular tracking, you can wrap your functions with the @weave.op decorator. This creates versioned operations that track inputs, outputs, and the code itself. This is especially useful when you want to track custom logic around your LLM calls.
The @weave.op decorator provides:
- Automatic versioning - Code changes create new versions
- Input/output tracking - All parameters and returns are logged
- Call hierarchy - See how operations call each other
- Performance metrics - Track execution time for each operation
Creating Weave Models
Weave Models are a powerful way to encapsulate your LLM logic with configurable parameters. They make it easy to experiment with different model configurations and track which settings produce the best results.
- Configuration tracking - All model parameters are versioned
- Easy experimentation - Compare different configurations side-by-side
- Reproducibility - Exact model settings are saved with each prediction
- Evaluation ready - Models can be easily evaluated with Weave’s evaluation framework
Next Steps
- Explore the Weave documentation for advanced features
- Try different Cerebras models to compare performance
- Set up evaluations to systematically test your prompts
- Learn about Weave’s tracing capabilities for complex applications
- Join the W&B Community to share your projects
- Migrate to GLM4.6: Ready to upgrade? Follow our migration guide to start using our latest model
FAQ
Why aren't my traces appearing in the Weave dashboard?
If you don’t see traces in your Weave dashboard:
- Verify that weave.init() is called before any Cerebras API calls
- Check that your W&B API key is correctly set in your environment
- Ensure you’re logged into the correct W&B account in your browser
- Try running wandb login in your terminal to re-authenticate
- Wait a few seconds after your script completes for traces to sync
What's the performance overhead of using Weave?
Weave is designed to be lightweight with minimal overhead. The tracing happens asynchronously, so it doesn’t significantly impact your API call latency. Most users see less than 10ms of additional overhead per traced call.
Can I use Weave with streaming responses from Cerebras?
Yes! Weave automatically handles streaming responses from the Cerebras SDK. The complete streamed response will be captured in the trace once the stream completes.
How do I disable Weave tracing temporarily?
You can disable tracing by simply not calling weave.init() at the start of your script. Alternatively, you can use environment variables to conditionally enable Weave.
For additional support, visit the Weave GitHub repository or reach out in the W&B Community forums.

