Cerebras

Use Cerebras high-speed inference with Kern agents.

Cerebras Inference provides high-speed, low-latency AI model inference powered by Cerebras Wafer-Scale Engines and CS-3 systems. Kern integrates directly with the Cerebras Python SDK, allowing you to use state-of-the-art Llama models with a simple interface.

Prerequisites

To use Cerebras with Kern, you need to:

  1. Install the required packages:

    uv pip install cerebras-cloud-sdk
  2. Set your API key: The Cerebras SDK expects your API key to be available as an environment variable:

    export CEREBRAS_API_KEY=your_api_key_here
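
A missing key usually only shows up when the first request fails, so it can help to check for it at startup. The following is a minimal sketch using only the standard library; `get_cerebras_api_key` is a hypothetical helper for illustration and is not part of Kern or the Cerebras SDK.

```python
import os


def get_cerebras_api_key(env=os.environ):
    """Return the Cerebras API key from the environment, or raise a clear error.

    Hypothetical helper: neither Kern nor the Cerebras SDK ships this function.
    """
    key = env.get("CEREBRAS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CEREBRAS_API_KEY is not set; export it before creating a Cerebras model."
        )
    return key
```

Calling `get_cerebras_api_key()` before building an agent turns a late authentication failure into an immediate, readable error.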

Basic Usage

Here's how to use a Cerebras model with Kern:

from kern.agent import Agent
from kern.models.cerebras import Cerebras

agent = Agent(
    model=Cerebras(id="llama-4-scout-17b-16e-instruct"),
    markdown=True,
)

# Print the response in the terminal
agent.print_response("write a two sentence horror story")

Supported Models

Cerebras currently supports the following models (see the Cerebras documentation for the latest list):

| Model Name | Model ID | Parameters | Knowledge Cutoff |
|---|---|---|---|
| Llama 4 Scout | llama-4-scout-17b-16e-instruct | 109 billion | August 2024 |
| Llama 3.1 8B | llama3.1-8b | 8 billion | March 2023 |
| Llama 3.3 70B | llama-3.3-70b | 70 billion | December 2023 |
| DeepSeek R1 Distill Llama 70B* | deepseek-r1-distill-llama-70b | 70 billion | December 2023 |

* DeepSeek R1 Distill Llama 70B is available in private preview.
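
Model IDs are plain strings, so a typo only surfaces once a request is sent. As a rough sketch, the IDs listed above can be kept in a lookup and checked before a model is constructed. `SUPPORTED_MODELS` and `check_model_id` are hypothetical names used for illustration; the IDs are copied from this page and may lag behind the live list.

```python
# Model IDs copied from the table on this page; the live list may differ.
SUPPORTED_MODELS = {
    "llama-4-scout-17b-16e-instruct": "Llama 4 Scout",
    "llama3.1-8b": "Llama 3.1 8B",
    "llama-3.3-70b": "Llama 3.3 70B",
    "deepseek-r1-distill-llama-70b": "DeepSeek R1 Distill Llama 70B",  # private preview
}


def check_model_id(model_id):
    """Return the model ID unchanged if it is listed above, else raise ValueError.

    Hypothetical helper for illustration; not part of Kern.
    """
    if model_id not in SUPPORTED_MODELS:
        known = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(f"Unknown Cerebras model ID {model_id!r}; known IDs: {known}")
    return model_id
```

The checked value can then be passed on as the `id` argument, for example `Cerebras(id=check_model_id("llama-3.3-70b"))`.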

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| id | str | "llama-4-scout-17b-16e-instruct" | The id of the Cerebras model to use |
| name | str | "Cerebras" | The name of the model |
| provider | str | "Cerebras" | The provider of the model |
| parallel_tool_calls | Optional[bool] | None | Whether to run tool calls in parallel (automatically set to False for llama-4-scout) |
| max_completion_tokens | Optional[int] | None | Maximum number of completion tokens to generate |
| repetition_penalty | Optional[float] | None | Penalty for repeating tokens (higher values reduce repetition) |
| temperature | Optional[float] | None | Controls randomness in the model's output (0.0 to 2.0) |
| top_p | Optional[float] | None | Controls diversity via nucleus sampling (0.0 to 1.0) |
| top_k | Optional[int] | None | Controls diversity via top-k sampling |
| strict_output | bool | True | Controls schema adherence for structured outputs |
| extra_headers | Optional[Any] | None | Additional headers to include in requests |
| extra_query | Optional[Any] | None | Additional query parameters to include in requests |
| extra_body | Optional[Any] | None | Additional body parameters to include in requests |
| request_params | Optional[Dict[str, Any]] | None | Additional parameters to include in the request |
| api_key | Optional[str] | None | The API key for authenticating with Cerebras (defaults to CEREBRAS_API_KEY env var) |
| base_url | Optional[Union[str, httpx.URL]] | None | The base URL for the Cerebras API |
| timeout | Optional[float] | None | Request timeout in seconds |
| max_retries | Optional[int] | None | Maximum number of retries for failed requests |
| default_headers | Optional[Any] | None | Default headers to include in all requests |
| default_query | Optional[Any] | None | Default query parameters to include in all requests |
| http_client | Optional[httpx.Client] | None | HTTP client instance for making requests |
| client_params | Optional[Dict[str, Any]] | None | Additional parameters for client configuration |
| client | Optional[CerebrasClient] | None | A pre-configured instance of the Cerebras client |
| async_client | Optional[AsyncCerebrasClient] | None | A pre-configured instance of the async Cerebras client |

Cerebras is a subclass of the Model class, so it also accepts the parameters that Model defines.
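
The sampling parameters have documented ranges (`temperature` from 0.0 to 2.0, `top_p` from 0.0 to 1.0), and unset values default to None. The sketch below collects them into keyword arguments and rejects out-of-range values early. `sampling_params` is a hypothetical helper, and the lower bound on `top_k` is an assumption, since this page does not state a range for it.

```python
def sampling_params(temperature=None, top_p=None, top_k=None):
    """Collect sampling options, dropping unset values and checking documented ranges.

    Hypothetical helper for illustration; not part of Kern.
    """
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")
    # Assumption: top_k should be at least 1; the page gives no range for it.
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be a positive integer")
    candidates = {"temperature": temperature, "top_p": top_p, "top_k": top_k}
    return {name: value for name, value in candidates.items() if value is not None}
```

Because the keys match the parameter names listed above, the result can be unpacked into the constructor, for example `Cerebras(id="llama-3.3-70b", **sampling_params(temperature=0.2, top_p=0.9))`.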

Structured Outputs

The Cerebras model supports structured outputs using JSON schema:

from kern.agent import Agent
from kern.models.cerebras import Cerebras
from pydantic import BaseModel
from typing import List

class MovieScript(BaseModel):
    setting: str
    characters: List[str]
    plot: str

agent = Agent(
    model=Cerebras(id="llama-4-scout-17b-16e-instruct"),
    response_format=MovieScript,
)
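
To see what "structured outputs using JSON schema" means for a model like `MovieScript`, it can help to look at the schema Pydantic derives from it and at how a JSON reply is validated against it. The sketch below is independent of Kern and assumes Pydantic v2 (`model_json_schema`, `model_validate_json`). How Kern builds and sends the schema internally is not covered on this page, and the reply string is made up for illustration.

```python
import json
from typing import List

from pydantic import BaseModel


class MovieScript(BaseModel):
    setting: str
    characters: List[str]
    plot: str


# JSON schema derived from the model (Pydantic v2 API).
schema = MovieScript.model_json_schema()
print(json.dumps(schema, indent=2))

# Parse and validate a JSON reply; this reply is invented for illustration.
raw_reply = '{"setting": "Tokyo", "characters": ["Aiko", "Ren"], "plot": "A heist goes wrong."}'
script = MovieScript.model_validate_json(raw_reply)
print(script.characters)
```

A reply that is missing a field or uses the wrong type makes `model_validate_json` raise `pydantic.ValidationError`, which is the kind of mismatch that schema-constrained output is meant to prevent.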

Resources

SDK Examples

  • View more examples here.