Models

A Model in Kern wraps any LLM provider into a uniform interface.

A Model in Kern wraps any LLM provider into a uniform interface. Pass it to an Agent and switch providers without changing your application code.

Kern is heavily optimized for small language models (1-7B parameters) that run locally on your own hardware for zero cost, full privacy, and low latency.

Why small models?

Models under 7 billion parameters (like Llama 3.2 3B or Phi-4 Mini) run efficiently on consumer laptops and cost nothing to operate. They are highly capable at structured extraction, classification, summarization, and basic tool use.

Kern's template-based structured output, JSON repair, and prompt tuning are specifically designed to maximize the reliability of these resource-constrained models.

The Model Concept

Every model in Kern is a Python class that wraps a specific LLM provider's API. Regardless of the provider, the model object exposes the same methods, making your agent code fully portable:

1from kern.models.openai import OpenAIChat
2from kern import Agent
3
4# Define the model — e.g. pointing to a local model server
5model = OpenAIChat(
6    id="local-model",
7    base_url="http://localhost:8080/v1",
8    temperature=0.3,
9)
10
11agent = Agent(
12    model=model,
13    description="You are a helpful assistant.",
14)
15
16result = agent.run("Explain transformers in one sentence")
17print(result.content)

Supported Providers

Local Models (Recommended)

Run models on your own hardware for full privacy and zero API costs. Any OpenAI-compatible endpoint works out of the box:

laptop

Ollama

Install Ollama, pull a model (ollama pull llama3.2:3b), and import Ollama or connect via OpenAIChat.

server

llama.cpp / vLLM / LM Studio

Connect to any local OpenAI-compatible endpoint by setting base_url.

1# Ollama example
2from kern.models.ollama import Ollama
3model = Ollama(id="llama3.2:3b")
4
5# OpenAI-Compatible local server (llama.cpp)
6from kern.models.openai import OpenAIChat
7model = OpenAIChat(
8    id="local-model",
9    base_url="http://localhost:8080/v1",
10)

Cloud Providers

Kern also supports cloud-based model providers when you need models larger than 7B:

OpenAI (e.g. gpt-4o-mini — best-in-class small cloud model)
Anthropic (e.g. claude-3-5-sonnet)
Google (e.g. gemini-2.0-flash)
Groq / Together AI / Fireworks AI (High throughput cloud endpoints for open-source models)

Recommended Models

Model	ID	Parameters	Best For
Llama 3.2 3B	`llama3.2:3b`	3B	General purpose local tasks
Phi-4 Mini	`phi4-mini`	3.8B	Reasoning-heavy tasks, local coding
Llama 3.2 1B	`llama3.2:1b`	1B	Ultra-fast local classification / extraction
GPT-4o Mini	`gpt-4o-mini`	~8B	Cloud primary, tool use, and structured outputs

Model Shorthand (String Syntax)

For quick prototyping, you can pass a model identifier string directly to the agent instead of importing the model class:

1from kern import Agent
2
3# Kern automatically infers the model provider from the string format
4agent = Agent(
5    model="gpt-4o-mini",
6    description="You are a helpful assistant."
7)
8
9# Explicit provider shorthand
10agent_ollama = Agent(model="ollama:llama3.2:3b")
11agent_together = Agent(model="together:meta-llama/Llama-3.2-3B-Instruct-Turbo")

Fallback Models (Resilience)

If your primary model fails due to rate limits or outages, Kern can automatically failover to secondary models:

1from kern import Agent
2from kern.models.openai import OpenAIChat
3from kern.models.ollama import Ollama
4
5agent = Agent(
6    model=OpenAIChat(id="gpt-4o-mini"),
7    fallback_models=[
8        Ollama(id="llama3.2:3b"),  # Fallback to local model if OpenAI fails
9    ],
10    description="You are a resilient assistant.",
11)