Ollama

Run local models with Ollama in Kern agents.

Run large language models with Ollama, either locally or through Ollama Cloud.

Ollama is a fantastic tool for running models both locally and in the cloud.

Local Usage: Run models on your own hardware using the Ollama client.

Cloud Usage: Access cloud-hosted models via Ollama Cloud with an API key.

Ollama supports multiple open-source models. See the library here.

Experiment with different models to find the best fit for your use case. Here are some general recommendations:

  • gpt-oss:120b-cloud is an excellent general-purpose cloud model for most tasks.
  • llama3.3 models are good for most basic use-cases.
  • qwen models perform specifically well with tool use.
  • deepseek-r1 models have strong reasoning capabilities.
  • phi4 models are powerful, while being really small in size.

Authentication (Ollama Cloud Only)

To use Ollama Cloud, set your OLLAMA_API_KEY environment variable. You can get an API key from Ollama Cloud.

1export OLLAMA_API_KEY=***
1setx OLLAMA_API_KEY ***

When using Ollama Cloud, the host is automatically set to https://ollama.com. For local usage, no API key is required.

Set up a model

Local Usage

Install ollama and run a model:

1ollama run llama3.1

This starts an interactive session with the model.

To download the model for use in an Kern agent:

1ollama pull llama3.1

Cloud Usage

For Ollama Cloud, no local Ollama server installation is required. Install the Ollama library, set up your API key as described in the Authentication section above, and access cloud-hosted models directly.

Examples

Local Usage

Once the model is available locally, use the Ollama model class to access it:

1from kern.agent import Agent
2from kern.models.ollama import Ollama
3
4agent = Agent(
5 model=Ollama(id="llama3.1"),
6 markdown=True
7)
8
9# Print the response in the terminal
10agent.print_response("Share a 2 sentence horror story.")

Cloud Usage

NoteWhen using Ollama Cloud with an API key, the host is automatically set to https://ollama.com. You can omit the host parameter.
1from kern.agent import Agent
2from kern.models.ollama import Ollama
3
4agent = Agent(
5 model=Ollama(id="gpt-oss:120b-cloud"),
6 markdown=True
7)
8
9# Print the response in the terminal
10agent.print_response("Share a 2 sentence horror story.")
Note View more examples here.

Params

ParameterTypeDefaultDescription
idstr"llama3.2"The name of the Ollama model to use
namestr"Ollama"The name of the model
providerstr"Ollama"The provider of the model
hoststr"http://localhost:11434"The host URL for the Ollama server
timeoutOptional[int]NoneRequest timeout in seconds
formatOptional[str]NoneThe format to return the response in (e.g., "json")
optionsOptional[Dict[str, Any]]NoneAdditional model options (temperature, top_p, etc.)
keep_aliveOptional[Union[float, str]]NoneHow long to keep the model loaded (e.g., "5m", 3600 seconds)
templateOptional[str]NoneThe prompt template to use
systemOptional[str]NoneSystem message to use
rawOptional[bool]NoneWhether to return raw response without formatting
streamboolTrueWhether to stream the response

Ollama is a subclass of the Model class and has access to the same params.

Responses API

Ollama v0.13.3+ supports the OpenAI Responses API via the /v1/responses endpoint. Use OllamaResponses for this interface:

1from kern.agent import Agent
2from kern.models.ollama import OllamaResponses
3
4agent = Agent(
5 model=OllamaResponses(id="gpt-oss:20b"),
6 markdown=True,
7)
8
9agent.print_response("Share a 2 sentence horror story")

The Responses API is stateless. Each request is independent with no previous_response_id chaining.

See OllamaResponses reference for full parameters.