Multimodal Agent
Code
    from pathlib import Path

    from kern.agent import Agent
    from kern.media import Image
    from kern.models.ollama import Ollama

    agent = Agent(
        model=Ollama(id="gemma3"),
        markdown=True,
    )

    image_path = Path(__file__).parent.joinpath("sample.jpg")
    agent.print_response(
        "Write a 3 sentence fiction story about the image",
        images=[Image(filepath=image_path)],
    )
Usage
Set up your virtual environment
macOS/Linux:
    uv venv --python 3.12
    source .venv/bin/activate
Windows:
    uv venv --python 3.12
    .venv\Scripts\activate
Install Ollama
Follow the installation guide and run:
    ollama pull gemma3
Install dependencies
    uv pip install -U ollama kern-ai
Add sample image
Place a sample image named sample.jpg in the same directory as your script, or update the image_path to point to your desired image.
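If the image is missing, the failure may only show up once the agent tries to read it. A minimal sketch of an early check, using only the standard library, could look like the following. The helper name resolve_image is hypothetical and not part of kern; it simply mirrors how image_path is built in the script.

```python
from pathlib import Path


def resolve_image(script_dir: Path, name: str = "sample.jpg") -> Path:
    """Return the image path next to the script, or raise a clear error."""
    # resolve_image is a hypothetical helper, not a kern API.
    image_path = script_dir / name
    if not image_path.is_file():
        raise FileNotFoundError(
            f"Expected an image at {image_path}; add one or change image_path."
        )
    return image_path


if __name__ == "__main__":
    # Path.cwd() stands in for Path(__file__).parent when run interactively.
    try:
        print(resolve_image(Path.cwd()))
    except FileNotFoundError as err:
        print(err)
```

In the script itself, the returned path could be passed to Image(filepath=...) in place of image_path.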
Run Agent
    python cookbook/11_models/ollama/image_agent_bytes.py
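If the run fails because the model cannot be found, it may help to confirm that gemma3 is visible to the local Ollama server. The sketch below assumes Ollama is listening on its default address, http://localhost:11434, and queries its /api/tags endpoint, which lists locally pulled models. The function names has_model and fetch_tags are hypothetical helpers written for this example.

```python
import json
import urllib.request


def has_model(tags_payload: dict, model_id: str) -> bool:
    """Check whether model_id appears in an Ollama /api/tags response."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Ollama lists models with a tag suffix, e.g. "gemma3:latest",
    # so match either the full name or the part before the colon.
    return any(n == model_id or n.split(":", 1)[0] == model_id for n in names)


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the locally available models from the Ollama server."""
    # Assumes the default local address; adjust base_url if yours differs.
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        print(has_model(fetch_tags(), "gemma3"))
    except (OSError, ValueError) as err:
        print(f"Could not query Ollama: {err}")
```

Keeping the matching logic in has_model, separate from the network call, means it can be exercised without a running server.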