Langfuse is an open-source LLM engineering platform that helps you debug, analyze, and iterate on your LLM applications. By integrating Langfuse with Cerebras Inference, you can automatically trace all your API calls, monitor performance, collect user feedback, and evaluate model outputs—all while benefiting from Cerebras’s industry-leading inference speed.

Prerequisites

Before you begin, ensure you have:
  • Cerebras API Key - Get a free API key here.
  • Langfuse Account - Visit Langfuse and create a free account. Langfuse offers both cloud-hosted and self-hosted options.
  • Langfuse API Keys - After creating your account, navigate to your project settings to generate your public and secret API keys.
  • Python 3.7 or higher

Configure Langfuse

1. Install required dependencies

Install the Langfuse SDK and OpenAI client library. The Langfuse SDK provides automatic tracing capabilities that work seamlessly with OpenAI-compatible APIs like Cerebras.
pip install langfuse openai
2. Configure environment variables

Create a .env file in your project directory to securely store your API keys. This keeps your credentials out of your code and makes it easy to manage different environments.
CEREBRAS_API_KEY=your-cerebras-api-key-here
LANGFUSE_PUBLIC_KEY=your-langfuse-public-key-here
LANGFUSE_SECRET_KEY=your-langfuse-secret-key-here
LANGFUSE_HOST=https://cloud.langfuse.com  # Use your self-hosted URL if applicable
If you’re self-hosting Langfuse, replace LANGFUSE_HOST with your instance URL.
3. Initialize the Langfuse-wrapped client

Langfuse provides a convenient wrapper around the OpenAI client that automatically traces all your API calls. This means every request, response, token count, and latency metric is logged without any additional code.
import os
from langfuse.openai import OpenAI

# Initialize the Langfuse-wrapped OpenAI client
client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1"
)
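Missing credentials usually surface later as a confusing 401 on the first API call. A small, purely optional sanity check (the variable names are the ones from the .env file above) can fail fast instead:

```python
import os

REQUIRED_VARS = ("CEREBRAS_API_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")

def check_env(required=REQUIRED_VARS):
    """Raise a clear error if any required credential is missing,
    rather than failing with an opaque auth error on the first request."""
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Call `check_env()` once at startup, before constructing the client.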
4. Make your first traced request

Now you can make API calls as usual, and Langfuse will automatically capture all the details. This example shows a simple chat completion that will appear in your Langfuse dashboard.
import os
from langfuse.openai import OpenAI

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    max_completion_tokens=500,
)

print(response.choices[0].message.content)
After running this code, visit your Langfuse dashboard to see the traced request with full details including input, output, tokens used, and latency.
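Because the SDK ships trace data asynchronously in the background, a very short-lived script can exit before its buffer is sent. One hedge is to flush explicitly before exiting. The helper below is a sketch that assumes the Langfuse Python SDK v3, which exposes `langfuse.get_client()`; check your installed SDK version's docs for the exact API.

```python
def flush_langfuse_traces():
    """Flush any buffered Langfuse events before the process exits.

    Assumes Langfuse Python SDK v3 (`pip install langfuse`); the import is
    lazy so this module still loads where langfuse is not installed.
    """
    from langfuse import get_client
    get_client().flush()

# Call at the end of short scripts, or register it with atexit:
# flush_langfuse_traces()
```

Long-running services generally do not need this, since the background worker keeps draining the buffer.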

Advanced Features

Adding Custom Metadata and Tags

You can enrich your traces with custom metadata to make debugging and analysis easier. This helps you filter and analyze traces by environment, user tier, feature area, or any custom dimension.
import os
from langfuse.openai import OpenAI

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1"
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
    # Add Langfuse-specific parameters
    name="geography-question",
    metadata={
        "environment": "production",
        "user_tier": "premium",
        "feature": "qa-system"
    })

print(response.choices[0].message.content)

Organizing Calls into Traces

For complex applications with multiple LLM calls, you can organize them into hierarchical traces using Langfuse’s @observe() decorator. This creates a parent-child relationship that makes it easy to understand the flow of your application.
import os
from langfuse.openai import OpenAI
from langfuse import observe

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1"
)

@observe()
def generate_story_outline(topic: str):
    """Generate a story outline based on a topic."""
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": "You are a creative writing assistant."},
            {"role": "user", "content": f"Create a brief story outline about {topic}."},
        ],
        max_completion_tokens=300,
    )
    return response.choices[0].message.content

@observe()
def expand_story_section(outline: str, section: str):
    """Expand a specific section of the story."""
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": "You are a creative writing assistant."},
            {"role": "user", "content": f"Given this outline:\n{outline}\n\nExpand the {section} section with more detail."},
        ],
        max_completion_tokens=500,
    )
    return response.choices[0].message.content

@observe()
def create_complete_story(topic: str):
    """Create a complete story with outline and expanded sections."""
    # This will create a parent trace with child spans
    outline = generate_story_outline(topic)
    introduction = expand_story_section(outline, "introduction")
    conclusion = expand_story_section(outline, "conclusion")
    
    return {
        "outline": outline,
        "introduction": introduction,
        "conclusion": conclusion
    }

# Run the multi-step story generation
story = create_complete_story("a robot learning to paint")
print(story)
This creates a hierarchical trace in Langfuse where you can see the parent create_complete_story function and all its child LLM calls organized together.

Streaming Responses

Langfuse fully supports streaming responses from Cerebras models. The complete streamed response will be logged to Langfuse once the stream completes, allowing you to benefit from Cerebras’s fast token generation while maintaining full observability.
import os
from langfuse.openai import OpenAI

client = OpenAI(
    api_key=os.getenv("CEREBRAS_API_KEY"),
    base_url="https://api.cerebras.ai/v1"
)

stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Write a haiku about artificial intelligence."},
    ],
    stream=True,
    max_completion_tokens=100,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Collecting User Feedback

Langfuse makes it easy to collect and analyze user feedback on model outputs. You can view traces in your Langfuse dashboard and add scores and feedback directly through the UI, or use the Langfuse SDK to programmatically score traces by their trace ID or observation ID.
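A programmatic score might look like the sketch below. The score name "user-feedback" and the 0/1 value scale are illustrative choices, and the call assumes the Langfuse Python SDK v3 client's `create_score` method; check your SDK version's reference for the exact signature.

```python
def record_feedback(trace_id, value, comment=None):
    """Attach a user-feedback score to an existing trace by its ID.

    Assumes Langfuse Python SDK v3 (`pip install langfuse`); the score name
    and 0/1 value scale here are illustrative, not prescribed by Langfuse.
    """
    from langfuse import get_client  # lazy import: only needed when scoring
    get_client().create_score(
        trace_id=trace_id,
        name="user-feedback",
        value=value,        # e.g. 1 = thumbs up, 0 = thumbs down
        comment=comment,
    )
```

The trace ID is available on the trace in the dashboard, or programmatically at request time.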

Monitoring and Analytics

Once you’ve integrated Langfuse, you can access powerful analytics in your dashboard to understand your application’s performance and usage patterns:
  • Request Volume: Track how many requests you’re making over time
  • Latency Metrics: Monitor p50, p95, and p99 latencies for your Cerebras calls
  • Token Usage: Understand your token consumption patterns across different models
  • Cost Tracking: Monitor spending across different models and use cases
  • Error Rates: Identify and debug failed requests quickly
  • User Analytics: See which users or features generate the most requests
  • Model Comparison: Compare performance across different Cerebras models
Visit your Langfuse dashboard to explore these features and gain insights into your LLM application.

FAQ

Do traces appear in real time?
Yes, traces typically appear in your Langfuse dashboard within seconds of making an API call. The Langfuse SDK sends data asynchronously to minimize impact on your application’s performance.

Does Langfuse tracing slow down my requests?
Langfuse tracing adds minimal overhead, typically less than 10ms per request. The SDK sends trace data asynchronously, so it doesn’t block your API calls. You’ll still benefit from Cerebras’s industry-leading inference speed.

Does Langfuse work with all Cerebras models?
Yes! Langfuse works with all Cerebras models including llama-3.3-70b, qwen-3-32b, llama3.1-8b, gpt-oss-120b, and zai-glm-4.6. Simply change the model parameter in your API calls.

How do I filter and search my traces?
In your Langfuse dashboard, use the filter panel to search by metadata fields, tags, user IDs, or any other custom attributes you’ve added to your traces. You can also create saved views for frequently used filters.

Can I self-host Langfuse?
Yes, Langfuse is open-source and can be self-hosted. Visit the Langfuse self-hosting documentation for instructions. Simply update your LANGFUSE_HOST environment variable to point to your instance.

Why aren’t my traces showing up?
First, verify your LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are correct. Ensure you’re importing from langfuse.openai (not just openai). Check that your LANGFUSE_HOST is set correctly. If issues persist, check your application logs for error messages or visit the Langfuse Discord for support.
For additional support, visit the Langfuse Discord community or check the GitHub repository.