Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here
- Weights & Biases Account - Visit Weights & Biases and create an account or log in
- Python 3.7 or higher
What is Weave?
Weave is W&B’s lightweight toolkit for tracking and evaluating LLM applications. It automatically captures traces of your LLM calls, including inputs, outputs, token usage, and latency. This makes it easy to debug issues, monitor performance, and iterate on your prompts and models.
Key features when using Weave with Cerebras:
- Automatic tracing of all Cerebras API calls
- Version control for your prompts and code
- Performance monitoring with detailed metrics
- Evaluation framework for testing model outputs
- Beautiful UI for exploring traces and debugging
Configure Weave
1. Install required dependencies
Install the Weave SDK and Cerebras Cloud SDK to get started:
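For example, with pip (assuming a standard Python environment; `python-dotenv` is optional but convenient for loading the `.env` file created in the next step):

```shell
pip install weave cerebras_cloud_sdk python-dotenv
```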
2. Configure environment variables
Create a .env file in your project directory with your API keys. You can find your W&B API key in your W&B settings.
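A minimal .env might look like this (CEREBRAS_API_KEY and WANDB_API_KEY are the conventional variable names read by the Cerebras SDK and W&B; the values shown are placeholders):

```
CEREBRAS_API_KEY=your-cerebras-api-key
WANDB_API_KEY=your-wandb-api-key
```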
3. Initialize Weave and create your client
Weave needs to be initialized at the start of your script. This creates a project in W&B where all your traces will be logged.
The weave.init() call automatically starts tracking all LLM calls made through the Cerebras SDK. You don’t need to add any additional decorators or wrappers for basic tracing.
4. Make your first traced request
Now you can use the Cerebras SDK as usual. Weave will automatically capture all the details of your API calls, including the model used, messages sent, tokens consumed, and response time.
After running this code, visit your Weave dashboard to see the trace, including token usage, latency, and the full conversation.
Advanced Usage
Wrapping Functions with @weave.op
For more granular tracking, you can wrap your functions with the @weave.op decorator. This creates versioned operations that track inputs, outputs, and the code itself. This is especially useful when you want to track custom logic around your LLM calls.
The @weave.op decorator provides:
- Automatic versioning - Code changes create new versions
- Input/output tracking - All parameters and returns are logged
- Call hierarchy - See how operations call each other
- Performance metrics - Track execution time for each operation
Creating Weave Models
Weave Models are a powerful way to encapsulate your LLM logic with configurable parameters. They make it easy to experiment with different model configurations and track which settings produce the best results.
- Configuration tracking - All model parameters are versioned
- Easy experimentation - Compare different configurations side-by-side
- Reproducibility - Exact model settings are saved with each prediction
- Evaluation ready - Models can be easily evaluated with Weave’s evaluation framework
Next Steps
- Explore the Weave documentation for advanced features
- Try different Cerebras models to compare performance
- Set up evaluations to systematically test your prompts
- Learn about Weave’s tracing capabilities for complex applications
- Join the W&B Community to share your projects
- Migrate to GLM4.6: Ready to upgrade? Follow our migration guide to start using our latest model
FAQ
Why aren't my traces appearing in the Weave dashboard?
If you don’t see traces in your Weave dashboard:
- Verify that weave.init() is called before any Cerebras API calls
- Check that your W&B API key is correctly set in your environment
- Ensure you’re logged into the correct W&B account in your browser
- Try running wandb login in your terminal to re-authenticate
- Wait a few seconds after your script completes for traces to sync
What's the performance overhead of using Weave?
Weave is designed to be lightweight with minimal overhead. The tracing happens asynchronously, so it doesn’t significantly impact your API call latency. Most users see less than 10ms of additional overhead per traced call.
Can I use Weave with streaming responses from Cerebras?
Yes! Weave automatically handles streaming responses from the Cerebras SDK. The complete streamed response will be captured in the trace once the stream completes.
How do I disable Weave tracing temporarily?
You can disable tracing by simply not calling weave.init() at the start of your script. Alternatively, you can use environment variables to conditionally enable Weave.
For additional support, visit the Weave GitHub repository or reach out in the W&B Community forums.

