Models
A Model in Kern wraps any LLM provider into a uniform interface.
A Model in Kern wraps any LLM provider into a uniform interface. Pass it to an Agent and switch providers without changing your application code.
Kern is heavily optimized for small language models (1-7B parameters) that run locally on your own hardware for zero cost, full privacy, and low latency.
Why small models?
Models under 7 billion parameters (like Llama 3.2 3B or Phi-4 Mini) run efficiently on consumer laptops and cost nothing to operate. They are highly capable at structured extraction, classification, summarization, and basic tool use.
Kern's template-based structured output, JSON repair, and prompt tuning are specifically designed to maximize the reliability of these resource-constrained models.
The Model Concept
Every model in Kern is a Python class that wraps a specific LLM provider's API. Regardless of the provider, the model object exposes the same methods, making your agent code fully portable:
1from kern.models.openai import OpenAIChat2from kern import Agent34# Define the model — e.g. pointing to a local model server5model = OpenAIChat(6 id="local-model",7 base_url="http://localhost:8080/v1",8 temperature=0.3,9)1011agent = Agent(12 model=model,13 description="You are a helpful assistant.",14)1516result = agent.run("Explain transformers in one sentence")17print(result.content)Supported Providers
Local Models (Recommended)
Run models on your own hardware for full privacy and zero API costs. Any OpenAI-compatible endpoint works out of the box:
Ollama
Install Ollama, pull a model (ollama pull llama3.2:3b), and import Ollama or connect via OpenAIChat.
llama.cpp / vLLM / LM Studio
Connect to any local OpenAI-compatible endpoint by setting base_url.
1# Ollama example2from kern.models.ollama import Ollama3model = Ollama(id="llama3.2:3b")45# OpenAI-Compatible local server (llama.cpp)6from kern.models.openai import OpenAIChat7model = OpenAIChat(8 id="local-model",9 base_url="http://localhost:8080/v1",10)Cloud Providers
Kern also supports cloud-based model providers when you need models larger than 7B:
- OpenAI (e.g.
gpt-4o-mini— best-in-class small cloud model) - Anthropic (e.g.
claude-3-5-sonnet) - Google (e.g.
gemini-2.0-flash) - Groq / Together AI / Fireworks AI (High throughput cloud endpoints for open-source models)
Recommended Models
| Model | ID | Parameters | Best For |
|---|---|---|---|
| Llama 3.2 3B | llama3.2:3b | 3B | General purpose local tasks |
| Phi-4 Mini | phi4-mini | 3.8B | Reasoning-heavy tasks, local coding |
| Llama 3.2 1B | llama3.2:1b | 1B | Ultra-fast local classification / extraction |
| GPT-4o Mini | gpt-4o-mini | ~8B | Cloud primary, tool use, and structured outputs |
Model Shorthand (String Syntax)
For quick prototyping, you can pass a model identifier string directly to the agent instead of importing the model class:
1from kern import Agent23# Kern automatically infers the model provider from the string format4agent = Agent(5 model="gpt-4o-mini",6 description="You are a helpful assistant."7)89# Explicit provider shorthand10agent_ollama = Agent(model="ollama:llama3.2:3b")11agent_together = Agent(model="together:meta-llama/Llama-3.2-3B-Instruct-Turbo")Fallback Models (Resilience)
If your primary model fails due to rate limits or outages, Kern can automatically failover to secondary models:
1from kern import Agent2from kern.models.openai import OpenAIChat3from kern.models.ollama import Ollama45agent = Agent(6 model=OpenAIChat(id="gpt-4o-mini"),7 fallback_models=[8 Ollama(id="llama3.2:3b"), # Fallback to local model if OpenAI fails9 ],10 description="You are a resilient assistant.",11)