Agent with Image Input

Code

from pathlib import Path

from kern.agent import Agent
from kern.media import Image
from kern.models.meta import LlamaOpenAI
from kern.tools.hackernews import HackerNewsTools
from kern.utils.media import download_image

agent = Agent(
    model=LlamaOpenAI(id="Llama-4-Maverick-17B-128E-Instruct-FP8"),
    tools=[HackerNewsTools()],
    markdown=True,
)

image_path = Path(__file__).parent.joinpath("sample.jpg")

download_image(
    url="https://upload.wikimedia.org/wikipedia/commons/0/0c/GoldenGateBridge-001.jpg",
    output_path=str(image_path),
)

# Read the image file content as bytes
image_bytes = image_path.read_bytes()

agent.print_response(
    "Tell me about this image and give me the latest news about it.",
    images=[
        Image(content=image_bytes),
    ],
    stream=True,
)
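The example passes the raw file bytes straight to the model. If the download fails or returns an error page, those bytes may not be an image at all. The following is a minimal sketch of a check you could run before building the `Image`, using only the standard library. The helper names `sniff_image_format` and `load_image_bytes` are hypothetical and not part of kern, and the signature table covers only three common formats.

```python
from pathlib import Path
from typing import Optional

# Leading bytes ("magic numbers") of a few common image formats.
# This table is illustrative and deliberately incomplete.
_SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return a format name if the bytes start with a known signature, else None."""
    for name, magic in _SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def load_image_bytes(path: Path) -> bytes:
    """Read a file and raise ValueError if it does not look like a known image."""
    data = path.read_bytes()
    if sniff_image_format(data) is None:
        raise ValueError(f"{path} does not look like a JPEG, PNG, or GIF file")
    return data
```

With this in place, `image_bytes = load_image_bytes(image_path)` could stand in for the plain `read_bytes()` call, so a bad download surfaces as a clear error before any request is sent.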

Usage

Set up your virtual environment

On macOS or Linux:

uv venv --python 3.12
source .venv/bin/activate

On Windows:

uv venv --python 3.12
.venv\Scripts\activate

Set your LLAMA API key

export LLAMA_API_KEY=YOUR_API_KEY
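If the variable is missing, the failure typically shows up only when the first request is made. A small pre-flight check, sketched here under the assumption that the key is read from the `LLAMA_API_KEY` environment variable named above, can fail earlier with a clearer message. The helper `require_env` is hypothetical and not part of kern.

```python
import os


def require_env(name: str = "LLAMA_API_KEY") -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent")
    return value
```

Calling `require_env()` at the top of the script, before the `Agent` is constructed, would be one place to put it.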

Install dependencies

uv pip install llama-api-client kern-ai

Run Agent

python cookbook/11_models/meta/async_image_input.py