Ollama
Run local models with Ollama in Kern agents.
Run large language models with Ollama, either locally or through Ollama Cloud.
Ollama is a fantastic tool for running models both locally and in the cloud.
Local Usage: Run models on your own hardware using the Ollama client.
Cloud Usage: Access cloud-hosted models via Ollama Cloud with an API key.
Ollama supports multiple open-source models. See the library here.
Experiment with different models to find the best fit for your use case. Here are some general recommendations:
gpt-oss:120b-cloudis an excellent general-purpose cloud model for most tasks.llama3.3models are good for most basic use-cases.qwenmodels perform specifically well with tool use.deepseek-r1models have strong reasoning capabilities.phi4models are powerful, while being really small in size.
Authentication (Ollama Cloud Only)
To use Ollama Cloud, set your OLLAMA_API_KEY environment variable. You can get an API key from Ollama Cloud.
1export OLLAMA_API_KEY=***1setx OLLAMA_API_KEY ***When using Ollama Cloud, the host is automatically set to https://ollama.com. For local usage, no API key is required.
Set up a model
Local Usage
Install ollama and run a model:
1ollama run llama3.1This starts an interactive session with the model.
To download the model for use in an Kern agent:
1ollama pull llama3.1Cloud Usage
For Ollama Cloud, no local Ollama server installation is required. Install the Ollama library, set up your API key as described in the Authentication section above, and access cloud-hosted models directly.
Examples
Local Usage
Once the model is available locally, use the Ollama model class to access it:
1from kern.agent import Agent2from kern.models.ollama import Ollama34agent = Agent(5 model=Ollama(id="llama3.1"),6 markdown=True7)89# Print the response in the terminal10agent.print_response("Share a 2 sentence horror story.")Cloud Usage
https://ollama.com. You can omit the host parameter.1from kern.agent import Agent2from kern.models.ollama import Ollama34agent = Agent(5 model=Ollama(id="gpt-oss:120b-cloud"),6 markdown=True7)89# Print the response in the terminal10agent.print_response("Share a 2 sentence horror story.")Params
| Parameter | Type | Default | Description |
|---|---|---|---|
id | str | "llama3.2" | The name of the Ollama model to use |
name | str | "Ollama" | The name of the model |
provider | str | "Ollama" | The provider of the model |
host | str | "http://localhost:11434" | The host URL for the Ollama server |
timeout | Optional[int] | None | Request timeout in seconds |
format | Optional[str] | None | The format to return the response in (e.g., "json") |
options | Optional[Dict[str, Any]] | None | Additional model options (temperature, top_p, etc.) |
keep_alive | Optional[Union[float, str]] | None | How long to keep the model loaded (e.g., "5m", 3600 seconds) |
template | Optional[str] | None | The prompt template to use |
system | Optional[str] | None | System message to use |
raw | Optional[bool] | None | Whether to return raw response without formatting |
stream | bool | True | Whether to stream the response |
Ollama is a subclass of the Model class and has access to the same params.
Responses API
Ollama v0.13.3+ supports the OpenAI Responses API via the /v1/responses endpoint. Use OllamaResponses for this interface:
1from kern.agent import Agent2from kern.models.ollama import OllamaResponses34agent = Agent(5 model=OllamaResponses(id="gpt-oss:20b"),6 markdown=True,7)89agent.print_response("Share a 2 sentence horror story")The Responses API is stateless. Each request is independent with no previous_response_id chaining.
See OllamaResponses reference for full parameters.