Cerebras
Use Cerebras high-speed inference with Kern agents.
Cerebras Inference provides high-speed, low-latency AI model inference powered by Cerebras Wafer-Scale Engines and CS-3 systems. Kern integrates directly with the Cerebras Python SDK, so you can use Llama-family models hosted on Cerebras through the same Agent interface as any other Kern model.
Prerequisites
To use Cerebras with Kern, you need to:
- Install the required packages:
  ```bash
  uv pip install cerebras-cloud-sdk
  ```
- Set your API key. The Cerebras SDK expects your API key to be available as an environment variable:
  ```bash
  export CEREBRAS_API_KEY=your_api_key_here
  ```
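If you want a missing key to produce a clear error before any agent is created, you can check for it up front. The sketch below is a minimal illustration: `resolve_api_key` is a hypothetical helper (not part of Kern or the Cerebras SDK) that prefers an explicitly passed key, mirroring the `api_key` parameter listed under Parameters, and otherwise falls back to the `CEREBRAS_API_KEY` environment variable.

```python
import os
from typing import Optional

ENV_VAR = "CEREBRAS_API_KEY"


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the API key to use, preferring an explicitly passed value.

    Hypothetical helper: it mirrors the documented fallback order
    (explicit api_key first, then the CEREBRAS_API_KEY environment
    variable) but is not Kern's actual implementation.
    """
    if explicit_key:
        return explicit_key
    key = os.environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(
            f"{ENV_VAR} is not set; export it or pass api_key explicitly."
        )
    return key


if __name__ == "__main__":
    try:
        # Show only the last four characters so the key is not leaked in logs.
        print("Using key ending in", resolve_api_key()[-4:])
    except RuntimeError as err:
        print(err)
```

You could then pass the result on explicitly, for example `Cerebras(id="llama-4-scout-17b-16e-instruct", api_key=resolve_api_key())`.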
Basic Usage
Here's how to use a Cerebras model with Kern:
```python
from kern.agent import Agent
from kern.models.cerebras import Cerebras

agent = Agent(
    model=Cerebras(id="llama-4-scout-17b-16e-instruct"),
    markdown=True,
)

# Print the response in the terminal
agent.print_response("write a two sentence horror story")
```
Supported Models
Cerebras currently supports the following models (see the Cerebras documentation for the latest list):
| Model Name | Model ID | Parameters | Knowledge Cutoff |
|---|---|---|---|
| Llama 4 Scout | llama-4-scout-17b-16e-instruct | 109 billion (17 billion active) | August 2024 |
| Llama 3.1 8B | llama3.1-8b | 8 billion | March 2023 |
| Llama 3.3 70B | llama-3.3-70b | 70 billion | December 2023 |
| DeepSeek R1 Distill Llama 70B* | deepseek-r1-distill-llama-70b | 70 billion | December 2023 |
* DeepSeek R1 Distill Llama 70B is available in private preview.
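Model IDs are easy to mistype (note the mix of `llama3.1-8b` and `llama-3.3-70b` in the table above). As a minimal stdlib sketch, assuming you keep a local copy of the IDs from that table, the hypothetical `check_model_id` helper below rejects unknown IDs and suggests close matches using `difflib.get_close_matches`. Because the supported list changes over time, treat the hard-coded set as a snapshot rather than an authoritative list.

```python
import difflib

# Snapshot of the model IDs listed in the table above; Cerebras may add
# or retire models, so refresh this set from the official documentation.
KNOWN_MODEL_IDS = {
    "llama-4-scout-17b-16e-instruct",
    "llama3.1-8b",
    "llama-3.3-70b",
    "deepseek-r1-distill-llama-70b",  # private preview only
}


def check_model_id(model_id: str) -> str:
    """Return model_id unchanged if known, else raise with suggestions.

    Hypothetical helper for illustration; Kern itself is not assumed
    to perform this check.
    """
    if model_id in KNOWN_MODEL_IDS:
        return model_id
    # Suggest up to two of the most similar known IDs (default cutoff 0.6).
    suggestions = difflib.get_close_matches(model_id, sorted(KNOWN_MODEL_IDS), n=2)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unknown Cerebras model ID {model_id!r}.{hint}")
```

Calling `check_model_id("llama-3.1-8b")` raises a `ValueError` whose message suggests `llama3.1-8b`.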
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| id | str | "llama-4-scout-17b-16e-instruct" | The ID of the Cerebras model to use |
| name | str | "Cerebras" | The name of the model |
| provider | str | "Cerebras" | The provider of the model |
| parallel_tool_calls | Optional[bool] | None | Whether to run tool calls in parallel (automatically set to False for llama-4-scout) |
| max_completion_tokens | Optional[int] | None | Maximum number of completion tokens to generate |
| repetition_penalty | Optional[float] | None | Penalty for repeating tokens (higher values reduce repetition) |
| temperature | Optional[float] | None | Controls randomness in the model's output (0.0 to 2.0) |
| top_p | Optional[float] | None | Controls diversity via nucleus sampling (0.0 to 1.0) |
| top_k | Optional[int] | None | Controls diversity via top-k sampling |
| strict_output | bool | True | Whether to enforce strict adherence to the JSON schema for structured outputs |
| extra_headers | Optional[Any] | None | Additional headers to include in requests |
| extra_query | Optional[Any] | None | Additional query parameters to include in requests |
| extra_body | Optional[Any] | None | Additional body parameters to include in requests |
| request_params | Optional[Dict[str, Any]] | None | Additional parameters to include in the request |
| api_key | Optional[str] | None | The API key for authenticating with Cerebras (defaults to the CEREBRAS_API_KEY environment variable) |
| base_url | Optional[Union[str, httpx.URL]] | None | The base URL for the Cerebras API |
| timeout | Optional[float] | None | Request timeout in seconds |
| max_retries | Optional[int] | None | Maximum number of retries for failed requests |
| default_headers | Optional[Any] | None | Default headers to include in all requests |
| default_query | Optional[Any] | None | Default query parameters to include in all requests |
| http_client | Optional[httpx.Client] | None | HTTP client instance for making requests |
| client_params | Optional[Dict[str, Any]] | None | Additional parameters for client configuration |
| client | Optional[CerebrasClient] | None | A pre-configured instance of the Cerebras client |
| async_client | Optional[AsyncCerebrasClient] | None | A pre-configured instance of the async Cerebras client |
`Cerebras` is a subclass of the `Model` class and accepts the same parameters.
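Most of these parameters default to `None`, which suggests that unset values are left out of the request so the API's own defaults apply. The sketch below illustrates that pattern and is not Kern's internal code: `build_request_params` is hypothetical. It drops `None` values, checks the `temperature` and `top_p` ranges listed in the table, forces `parallel_tool_calls` to `False` for `llama-4-scout` as the table describes, and merges `request_params` last. Both the override precedence and the merge order are assumptions.

```python
from typing import Any, Dict, Optional


def build_request_params(
    model_id: str,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    max_completion_tokens: Optional[int] = None,
    repetition_penalty: Optional[float] = None,
    parallel_tool_calls: Optional[bool] = None,
    request_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch of turning optional settings into request kwargs."""
    # Validate the ranges documented in the Parameters table.
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")

    candidates = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "max_completion_tokens": max_completion_tokens,
        "repetition_penalty": repetition_penalty,
        "parallel_tool_calls": parallel_tool_calls,
    }
    # Leave unset (None) values out so the API's own defaults apply.
    params = {k: v for k, v in candidates.items() if v is not None}

    # Documented behaviour: parallel tool calls are disabled for llama-4-scout.
    # Assumption: this overrides any value the caller passed.
    if model_id.startswith("llama-4-scout"):
        params["parallel_tool_calls"] = False

    # Assumption: request_params is merged last and wins on key conflicts.
    if request_params:
        params.update(request_params)
    return params
```

Whatever the real internals, the practical takeaway from the table is the same: set only the parameters you need, for example `Cerebras(id="llama3.1-8b", temperature=0.2, max_completion_tokens=512)`.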
Structured Outputs
The Cerebras model supports structured outputs based on a JSON schema. Pass a Pydantic model as response_format:
```python
from kern.agent import Agent
from kern.models.cerebras import Cerebras
from pydantic import BaseModel
from typing import List

class MovieScript(BaseModel):
    setting: str
    characters: List[str]
    plot: str

agent = Agent(
    model=Cerebras(id="llama-4-scout-17b-16e-instruct"),
    response_format=MovieScript,
)

# Generate a response structured according to the MovieScript schema
agent.print_response("Write a movie script set in New York")
```
Resources
SDK Examples
- View more examples here.