Cerebras

Use Cerebras high-speed inference with Kern agents.

Cerebras Inference provides high-speed, low-latency AI model inference powered by Cerebras Wafer-Scale Engines and CS-3 systems. Kern integrates directly with the Cerebras Python SDK, allowing you to use state-of-the-art Llama models with a simple interface.

Prerequisites

To use Cerebras with Kern, you need to:

Install the required packages:
```
uv pip install cerebras-cloud-sdk
```
Set your API key: The Cerebras SDK expects your API key to be available as an environment variable:
```
export CEREBRAS_API_KEY=your_api_key_here
```
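
A missing key usually surfaces only when the first request fails, so it can help to check for it up front. The sketch below reads the variable and raises a clear error if it is absent. `require_api_key` is a hypothetical helper written for this illustration, not part of Kern or the Cerebras SDK.

```python
import os


def require_api_key(env=None, var="CEREBRAS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of Kern or the Cerebras SDK.
    """
    # Default to the real process environment; a plain dict can be passed in tests.
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before creating the agent.")
    return key
```

Calling `require_api_key()` once at startup, before constructing the agent, keeps the failure close to its cause.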

Basic Usage

Here's how to use a Cerebras model with Kern:

```
from kern.agent import Agent
from kern.models.cerebras import Cerebras

agent = Agent(
    model=Cerebras(id="llama-4-scout-17b-16e-instruct"),
    markdown=True,
)

# Print the response in the terminal
agent.print_response("write a two sentence horror story")
```

Supported Models

Cerebras currently supports the following models (see the Cerebras documentation for the latest list):

Model Name	Model ID	Parameters	Knowledge Cutoff
Llama 4 Scout	llama-4-scout-17b-16e-instruct	109 billion	August 2024
Llama 3.1 8B	llama3.1-8b	8 billion	March 2023
Llama 3.3 70B	llama-3.3-70b	70 billion	December 2023
DeepSeek R1 Distill Llama 70B*	deepseek-r1-distill-llama-70b	70 billion	December 2023

* DeepSeek R1 Distill Llama 70B is available in private preview.

Parameters

Parameter	Type	Default	Description
`id`	`str`	`"llama-4-scout-17b-16e-instruct"`	The id of the Cerebras model to use
`name`	`str`	`"Cerebras"`	The name of the model
`provider`	`str`	`"Cerebras"`	The provider of the model
`parallel_tool_calls`	`Optional[bool]`	`None`	Whether to run tool calls in parallel (automatically set to False for llama-4-scout)
`max_completion_tokens`	`Optional[int]`	`None`	Maximum number of completion tokens to generate
`repetition_penalty`	`Optional[float]`	`None`	Penalty for repeating tokens (higher values reduce repetition)
`temperature`	`Optional[float]`	`None`	Controls randomness in the model's output (0.0 to 2.0)
`top_p`	`Optional[float]`	`None`	Controls diversity via nucleus sampling (0.0 to 1.0)
`top_k`	`Optional[int]`	`None`	Controls diversity via top-k sampling
`strict_output`	`bool`	`True`	Controls schema adherence for structured outputs
`extra_headers`	`Optional[Any]`	`None`	Additional headers to include in requests
`extra_query`	`Optional[Any]`	`None`	Additional query parameters to include in requests
`extra_body`	`Optional[Any]`	`None`	Additional body parameters to include in requests
`request_params`	`Optional[Dict[str, Any]]`	`None`	Additional parameters to include in the request
`api_key`	`Optional[str]`	`None`	The API key for authenticating with Cerebras (defaults to CEREBRAS_API_KEY env var)
`base_url`	`Optional[Union[str, httpx.URL]]`	`None`	The base URL for the Cerebras API
`timeout`	`Optional[float]`	`None`	Request timeout in seconds
`max_retries`	`Optional[int]`	`None`	Maximum number of retries for failed requests
`default_headers`	`Optional[Any]`	`None`	Default headers to include in all requests
`default_query`	`Optional[Any]`	`None`	Default query parameters to include in all requests
`http_client`	`Optional[httpx.Client]`	`None`	HTTP client instance for making requests
`client_params`	`Optional[Dict[str, Any]]`	`None`	Additional parameters for client configuration
`client`	`Optional[CerebrasClient]`	`None`	A pre-configured instance of the Cerebras client
`async_client`	`Optional[AsyncCerebrasClient]`	`None`	A pre-configured instance of the async Cerebras client

`Cerebras` is a subclass of the `Model` class and accepts the same parameters.
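
As an illustration of how the sampling parameters above might be assembled, the sketch below checks values against the ranges listed in the table and drops anything left unset. `sampling_kwargs` is a hypothetical helper, and the requirement that `max_completion_tokens` be positive is an assumption rather than something stated in the table.

```python
from typing import Any, Dict, Optional


def sampling_kwargs(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_completion_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate sampling settings and drop unset ones (hypothetical helper)."""
    # Ranges for temperature and top_p follow the parameter table.
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    # Assumption: a token limit below 1 is never meaningful.
    if max_completion_tokens is not None and max_completion_tokens < 1:
        raise ValueError("max_completion_tokens must be positive")
    candidates = {
        "temperature": temperature,
        "top_p": top_p,
        "max_completion_tokens": max_completion_tokens,
    }
    # Leave out anything still None so the default of None in the table applies.
    return {name: value for name, value in candidates.items() if value is not None}


kwargs = sampling_kwargs(temperature=0.2, max_completion_tokens=256)
print(kwargs)
```

With Kern installed, the resulting dictionary could be unpacked into the constructor, for example `Cerebras(id="llama-3.3-70b", **kwargs)`.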

Structured Outputs

The Cerebras model supports structured outputs using JSON schema:

```
from kern.agent import Agent
from kern.models.cerebras import Cerebras
from pydantic import BaseModel
from typing import List

class MovieScript(BaseModel):
    setting: str
    characters: List[str]
    plot: str

agent = Agent(
    model=Cerebras(id="llama-4-scout-17b-16e-instruct"),
    response_format=MovieScript,
)
```
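
To make the schema side of this concrete, the sketch below shows the JSON schema Pydantic derives from `MovieScript` and how a JSON reply can be validated against it. It assumes Pydantic v2 (`model_json_schema`, `model_validate_json`) and runs without Kern or a Cerebras API key; `raw_reply` is a hand-written stand-in, not real model output, and how Kern performs validation internally is not shown here.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class MovieScript(BaseModel):
    setting: str
    characters: List[str]
    plot: str


# The JSON schema that describes the expected shape of the output.
schema = MovieScript.model_json_schema()
print(sorted(schema["properties"]))

# A hand-written stand-in for a model reply; not real Cerebras output.
raw_reply = (
    '{"setting": "A lighthouse", '
    '"characters": ["Ana", "Tom"], '
    '"plot": "The lamp goes out."}'
)
script = MovieScript.model_validate_json(raw_reply)
print(script.characters)

# A reply missing required fields fails validation instead of passing silently.
try:
    MovieScript.model_validate_json('{"setting": "A lighthouse"}')
except ValidationError as exc:
    print(f"rejected with {exc.error_count()} validation errors")
```

Validating the reply into a typed object means downstream code can rely on `script.characters` being a list of strings rather than re-checking the raw JSON.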

Resources

SDK Examples

View more examples here.