This guide covers the ChatCerebras v3.0 node, which includes a model dropdown selector and automatic integration tracking. If you’re using an older version, consider updating Flowise to get these enhanced features.
Prerequisites
Before you begin, ensure you have:
- Cerebras API Key - Get a free API key here.
- Flowise Installation - Install Flowise locally or use Flowise Cloud.
- Node.js 18 or higher - Required for running Flowise locally.
Install Flowise
Install Flowise via NPM
The easiest way to get started with Flowise is to install it globally using NPM. Alternatively, you can use Docker.
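Assuming the published `flowise` npm package and the `flowiseai/flowise` image on Docker Hub, the two options look roughly like:

```shell
# Option 1: install globally with NPM, then start the server
npm install -g flowise
npx flowise start

# Option 2: run the published Docker image instead
docker run -d --name flowise -p 3000:3000 flowiseai/flowise
```

Either way, the Flowise UI is served on http://localhost:3000 by default.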
Configure Cerebras in Flowise
Create a new Chatflow
In the Flowise UI, create a new chatflow to house your Cerebras-powered application:
- Click on “Chatflows” in the left sidebar
- Click the “+Add New” button
- Give your chatflow a descriptive name like “Cerebras Chat Assistant”
Add the ChatCerebras node
Flowise has a dedicated ChatCerebras node for seamless integration:
- In the canvas, click the ”+” button or drag from the left panel
- Search for “ChatCerebras” in the Chat Models category
- Drag the ChatCerebras node onto the canvas
Configure the ChatCerebras node
Click on the ChatCerebras node to open its configuration panel and configure the following settings.

Required Settings:
- Connect Credential: Click to add your Cerebras API Key
  - If this is your first time, click “Create New”
  - Enter your API key from cloud.cerebras.ai (starts with csk-)
  - Give it a name like “Cerebras API”
  - Click “Add”
- Model Name: Select from the dropdown:
  - llama-3.3-70b - Best for complex reasoning and long-form content
  - qwen-3-32b - Balanced performance for general-purpose tasks
  - llama3.1-8b - Fastest model, ideal for simple tasks (default)
  - gpt-oss-120b - Largest model for demanding tasks
  - zai-glm-4.7 - Advanced reasoning and complex problem-solving
Optional Settings:
- Temperature: Control randomness (0.0 to 1.0, default 0.9)
- Max Tokens: Maximum response length
- Top P: Nucleus sampling parameter
- Streaming: Enable for real-time token generation (default: true)
The ChatCerebras node automatically:
- Configures the correct API endpoint (https://api.cerebras.ai/v1)
- Adds the integration tracking header for better support
- No manual configuration needed!
Connect additional nodes
Build out your chatflow by adding other nodes to create a complete application:
- Add a Prompt Template - Click ”+” and search for “Prompt Template” to customize your system prompts
- Add Memory (optional) - Search for “Buffer Memory” or “Conversation Buffer Memory” to maintain conversation context
- Connect the nodes - Draw connections between nodes by clicking and dragging from output ports to input ports
Test your chatflow
Once your nodes are connected, test your Cerebras-powered chatflow:
- Click the “Save” button in the top right
- Click the “Chat” icon to open the test interface
- Send a test message like “Hello! What can you help me with?”
- You should receive a response from your Cerebras-powered chatflow
The node sends all requests to https://api.cerebras.ai/v1.

Using Cerebras with Flowise API
Flowise automatically generates REST APIs for your chatflows, allowing you to integrate Cerebras-powered AI into any application.

Get your Chatflow API endpoint
In the Flowise UI:
- Open your chatflow
- Click the “API” button in the top right
- Copy the API endpoint URL (e.g., http://localhost:3000/api/v1/prediction/your-chatflow-id)
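With the endpoint copied, any HTTP client can call it. A minimal sketch with curl, assuming the default local port (the chatflow ID is a placeholder; `question` is the field Flowise's prediction API expects):

```shell
# POST a question to the generated prediction endpoint
curl -X POST http://localhost:3000/api/v1/prediction/your-chatflow-id \
  -H "Content-Type: application/json" \
  -d '{"question": "Hello! What can you help me with?"}'
```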
Direct Integration with OpenAI SDK
For advanced users who want to use Cerebras directly in custom Flowise nodes or external applications, you can use the OpenAI SDK with Cerebras configuration.

Advanced Configuration
Using Environment Variables
For production deployments, store your Cerebras API key as an environment variable and reference it in your credential as ${CEREBRAS_API_KEY}.
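One common pattern is to export the key in your shell (or a `.env` file) before starting Flowise; the key value below is a placeholder:

```shell
# Placeholder key for illustration -- substitute your own csk-... key
export CEREBRAS_API_KEY="csk-your-key-here"
npx flowise start
```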
Streaming Responses
To enable streaming responses for real-time output:
- In the ChatCerebras node, enable “Streaming”
- Your API responses will now stream tokens as they’re generated
- This is particularly useful for long-form content generation and provides a better user experience
Using Multiple Cerebras Models
You can create different chatflows for different use cases:
- Fast responses: Use llama3.1-8b for quick, simple queries
- Complex reasoning: Use llama-3.3-70b for complex reasoning and long-form content
- General purpose: Use qwen-3-32b for balanced performance
- Long context: Use gpt-oss-120b for processing large documents
- Advanced reasoning: Use zai-glm-4.7 for demanding tasks
Next Steps
- Explore the Flowise documentation to learn about advanced features
- Try different Cerebras models to find the best fit for your use case
- Join the Flowise Discord community for support and inspiration
- Check out Flowise templates for pre-built chatflow examples
- Deploy your chatflow to production using Flowise Cloud
- GLM4.7 migration guide
FAQ
Error: 'Invalid API key' or 401 Unauthorized
Verify that your credential contains a valid API key from cloud.cerebras.ai (it should start with csk-), then re-save the credential and try again.
Error: 'Model not found' or invalid model name
Ensure you’re using the correct model name format:
- Use llama-3.3-70b
- Use qwen-3-32b
- Use llama3.1-8b
- Use gpt-oss-120b
Responses are slow or timing out
If you’re experiencing slow responses:
- Check your internet connection
- Verify the Base URL is set to https://api.cerebras.ai/v1 (not http://)
- Try reducing the max_tokens parameter
- Consider using a faster model like llama3.1-8b for simpler tasks
- Check the Cerebras status page for any service issues
Does the integration tracking header get added automatically?
Yes! As of ChatCerebras v3.0, the X-Cerebras-3rd-Party-Integration: flowise header is automatically included in all requests. You don’t need to manually configure anything.

This header helps Cerebras:
- Track integration usage and performance
- Provide better support for Flowise users
- Identify and resolve integration-specific issues faster
Can I use Cerebras with Flowise Cloud?
Yes! The same configuration works with Flowise Cloud:
- Sign up at flowiseai.com
- Create a new chatflow
- Configure the ChatCerebras node as described above
- Your chatflow will use Cerebras Inference in the cloud
How do I switch between different Cerebras models?
Switching models is easy with the dropdown selector:
- Click on the ChatCerebras node in your chatflow
- Click the “Model Name” dropdown
- Select your desired model from the list (each has a description to help you choose)
- Save the chatflow
- Test with the new model
What additional latency can I expect when using Cerebras through Flowise?
Flowise adds minimal overhead since it primarily orchestrates the workflow. The actual inference is performed directly by Cerebras, so you’ll experience the same ultra-low latency that Cerebras is known for. Any additional latency is typically negligible (< 50ms) and comes from Flowise’s workflow orchestration.

