Multimodal Agent

Code

from pathlib import Path

from kern.agent import Agent
from kern.media import Image
from kern.models.ollama import Ollama

# Local Ollama model; pull it first with `ollama pull gemma3`
agent = Agent(
    model=Ollama(id="gemma3"),
    markdown=True,
)

# Resolve sample.jpg relative to this script, not the current working directory
image_path = Path(__file__).parent.joinpath("sample.jpg")
agent.print_response(
    "Write a 3 sentence fiction story about the image",
    images=[Image(filepath=image_path)],
)

Usage

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install Ollama

Follow the Ollama installation guide, then pull the model:

ollama pull gemma3

Install dependencies

uv pip install -U ollama kern-ai

Add sample image

Place an image named sample.jpg in the same directory as your script, or update image_path to point to a different image.
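If the image is missing, the failure may only surface once the agent tries to read it. A minimal sketch of an up-front check follows; `resolve_image` is a hypothetical helper written for this example, not part of kern, and it uses only the standard library.

```python
from pathlib import Path


def resolve_image(path: Path) -> Path:
    """Return the absolute path to an existing image file.

    Hypothetical helper (not part of kern): raises FileNotFoundError with a
    clear message instead of letting the error surface later in the run.
    """
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"Image not found: {resolved}")
    return resolved
```

In the script, this could wrap the existing path, for example `image_path = resolve_image(Path(__file__).parent / "sample.jpg")`, before the path is passed to `Image(filepath=image_path)`.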

Run Agent

python cookbook/11_models/ollama/image_agent_bytes.py