Image Agent

Code

from pathlib import Path

from kern.agent import Agent
from kern.media import Image
from kern.models.ibm import WatsonX
from kern.tools.hackernews import HackerNewsTools

# Agent backed by a vision-capable WatsonX model, with Hacker News tools for the news lookup
agent = Agent(
    model=WatsonX(id="meta-llama/llama-3-2-11b-vision-instruct"),
    tools=[HackerNewsTools()],
    markdown=True,
)

# Expects sample.jpg in the same directory as this script
image_path = Path(__file__).parent.joinpath("sample.jpg")

# Read the image file content as bytes
with open(image_path, "rb") as img_file:
    image_bytes = img_file.read()

# Send the prompt together with the raw image bytes and stream the reply
agent.print_response(
    "Tell me about this image and give me the latest news about it.",
    images=[
        Image(content=image_bytes),
    ],
    stream=True,
)

Usage

Set up your virtual environment

Mac / Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Set your API key

export IBM_WATSONX_API_KEY=xxx
export IBM_WATSONX_PROJECT_ID=xxx
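If either variable is missing, the failure only shows up once the agent tries to reach WatsonX. The following is a minimal pre-flight check, assuming you want to fail fast before any request is made. The helper missing_watsonx_vars is hypothetical and is not part of kern. It only checks the two variable names shown above.

```python
from typing import Mapping

# Variable names taken from the "Set your API key" step.
REQUIRED_VARS = ("IBM_WATSONX_API_KEY", "IBM_WATSONX_PROJECT_ID")


def missing_watsonx_vars(env: Mapping[str, str]) -> list[str]:
    """Return the required WatsonX variables that are unset or empty.

    Hypothetical helper for illustration; kern itself does not provide it.
    """
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

To use it, call missing_watsonx_vars(os.environ) at the top of the script and exit with a message if the returned list is not empty. Taking a mapping instead of reading os.environ directly keeps the check easy to test.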

Install dependencies

uv pip install -U ibm-watsonx-ai kern-ai

Add sample image

Place a sample image named "sample.jpg" in the same directory as the script.
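The script builds the path with Path(__file__).parent, so the image must sit next to the script file and not merely in your current working directory. Below is a small sketch that resolves that path and reports a clear error when the file is missing or empty. The function name resolve_sample_image is hypothetical.

```python
from pathlib import Path


def resolve_sample_image(script_dir: Path, name: str = "sample.jpg") -> Path:
    """Return the path to the sample image next to the script, or fail clearly.

    Hypothetical helper for illustration only.
    """
    path = script_dir / name
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected {name} in {script_dir}; place a sample image there first."
        )
    # An empty file would be sent to the model as zero bytes.
    if path.stat().st_size == 0:
        raise ValueError(f"{path} is empty.")
    return path
```

In the example script, this call would replace the joinpath line: image_path = resolve_sample_image(Path(__file__).parent).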

Run Agent

python cookbook/11_models/ibm/watsonx/image_agent_bytes.py

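This example shows how to use IBM WatsonX with vision input. The script reads sample.jpg as raw bytes, wraps them in an Image object, and sends them to the model together with the prompt. Because the agent also has HackerNewsTools, it can describe the image and then look up recent Hacker News stories related to what it sees.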
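The script passes the file's bytes to Image(content=...) unchanged, so a mislabeled or corrupt file only fails once it reaches the model. The sketch below checks the file's leading "magic" bytes before sending it. The helper names sniff_image_format and read_image_bytes are hypothetical, and the list of accepted formats is an assumption for illustration, not a statement of what WatsonX supports.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes that identify common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess the image format from its first bytes; return None if unrecognized."""
    for signature, fmt in _SIGNATURES.items():
        if data.startswith(signature):
            return fmt
    # WebP files start with "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def read_image_bytes(path: Path) -> bytes:
    """Read a file as raw bytes, rejecting content that does not look like an image."""
    data = path.read_bytes()
    if sniff_image_format(data) is None:
        raise ValueError(f"{path} does not look like a JPEG, PNG, GIF, or WebP file.")
    return data
```

In the example script, image_bytes = read_image_bytes(image_path) would replace the with open(...) block. The rest of the script stays the same.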

Note: This example uses a vision-capable model (meta-llama/llama-3-2-11b-vision-instruct) and requires a sample image file.