Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here.
- Langfuse Account - Visit Langfuse and create a free account. Langfuse offers both cloud-hosted and self-hosted options.
- Langfuse API Keys - After creating your account, navigate to your project settings to generate your public and secret API keys.
- Python 3.7 or higher
Configure Langfuse
1
Install required dependencies
Install the Langfuse SDK and OpenAI client library. The Langfuse SDK provides automatic tracing capabilities that work seamlessly with OpenAI-compatible APIs like Cerebras.
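Assuming you install with pip, this step might look like the following (python-dotenv is an optional extra assumed here for loading the .env file created in the next step):

```shell
pip install langfuse openai python-dotenv
```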
2
Configure environment variables
Create a .env file in your project directory to securely store your API keys. This keeps your credentials out of your code and makes it easy to manage different environments. If you’re self-hosting Langfuse, replace LANGFUSE_HOST with your instance URL.
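A minimal .env file for this guide might look like this sketch (placeholder values shown, not real keys; the host depends on your Langfuse region or self-hosted instance):

```
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
CEREBRAS_API_KEY=your-cerebras-api-key
```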
3
Initialize the Langfuse-wrapped client
Langfuse provides a convenient wrapper around the OpenAI client that automatically traces all your API calls. This means every request, response, token count, and latency metric is logged without any additional code.
4
Make your first traced request
Now you can make API calls as usual, and Langfuse will automatically capture all the details. This example shows a simple chat completion that will appear in your Langfuse dashboard. After running this code, visit your Langfuse dashboard to see the traced request with full details, including input, output, tokens used, and latency.
Advanced Features
Adding Custom Metadata and Tags
You can enrich your traces with custom metadata to make debugging and analysis easier. This helps you filter and analyze traces by environment, user tier, feature area, or any custom dimension.
Organizing Calls into Traces
For complex applications with multiple LLM calls, you can organize them into hierarchical traces using Langfuse’s @observe() decorator. This creates a parent-child relationship that makes it easy to understand the flow of your application.
In your Langfuse dashboard, you’ll see the create_complete_story function and all its child LLM calls organized together.
Streaming Responses
Langfuse fully supports streaming responses from Cerebras models. The complete streamed response is logged to Langfuse once the stream completes, so you can benefit from Cerebras’s fast token generation while maintaining full observability.
Collecting User Feedback
Langfuse makes it easy to collect and analyze user feedback on model outputs. You can view traces in your Langfuse dashboard and add scores and feedback directly through the UI, or use the Langfuse SDK to programmatically score traces by their trace ID or observation ID.
Monitoring and Analytics
Once you’ve integrated Langfuse, you can access powerful analytics in your dashboard to understand your application’s performance and usage patterns:
- Request Volume: Track how many requests you’re making over time
- Latency Metrics: Monitor p50, p95, and p99 latencies for your Cerebras calls
- Token Usage: Understand your token consumption patterns across different models
- Cost Tracking: Monitor spending across different models and use cases
- Error Rates: Identify and debug failed requests quickly
- User Analytics: See which users or features generate the most requests
- Model Comparison: Compare performance across different Cerebras models
Next Steps
- Explore the Langfuse documentation for advanced features like datasets, experiments, and prompt management
- Try different Cerebras models to find the best fit for your use case
- Set up custom evaluations to automatically assess output quality
- Learn about prompt management to version and test your prompts
- Integrate Langfuse datasets for systematic testing and evaluation
- Join the Langfuse Discord community to connect with other developers
- Follow our GLM4.6 migration guide to start using our latest model
FAQ
Do traces appear in real-time?
Yes, traces typically appear in your Langfuse dashboard within seconds of making an API call. The Langfuse SDK sends data asynchronously to minimize impact on your application’s performance.
Does Langfuse add latency to my Cerebras API calls?
Langfuse tracing adds minimal overhead, typically less than 10ms per request. The SDK sends trace data asynchronously, so it doesn’t block your API calls. You’ll still benefit from Cerebras’s industry-leading inference speed.
Can I use Langfuse with other Cerebras models?
Yes! Langfuse works with all Cerebras models including
llama-3.3-70b, qwen-3-32b, llama3.1-8b, gpt-oss-120b, and zai-glm-4.6. Simply change the model parameter in your API calls.
How do I filter traces by custom metadata?
In your Langfuse dashboard, use the filter panel to search by metadata fields, tags, user IDs, or any other custom attributes you’ve added to your traces. You can also create saved views for frequently used filters.
Can I self-host Langfuse?
Yes, Langfuse is open-source and can be self-hosted. Visit the Langfuse self-hosting documentation for instructions. Simply update your LANGFUSE_HOST environment variable to point to your instance.
What if my traces aren't appearing in the dashboard?
First, verify that your LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are correct. Ensure you’re importing from langfuse.openai (not just openai), and check that your LANGFUSE_HOST is set correctly. If issues persist, check your application logs for error messages.
For additional support, visit the Langfuse Discord community or check the GitHub repository.

