Deep Research Multimodal Input (Interactions)

Deep Research accepts images and documents (PDFs) as input, then conducts web-based research grounded in that content. Pass them as Kern Image / File objects with a URL. GCS and Gemini URIs pass through; regular HTTP URLs are downloaded and base64-encoded automatically.
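The URL handling described above can be pictured with a minimal sketch. This is not Kern's actual implementation: `build_media_part`, `fake_fetch`, the returned dict shape, and the pass-through prefixes are all hypothetical names and assumptions chosen for illustration. The downloader is injected so the example runs without network access.

```python
import base64
from typing import Callable, Dict

# Assumed prefixes for URIs that are forwarded unchanged (hypothetical,
# not taken from Kern's source code).
PASS_THROUGH_PREFIXES = (
    "gs://",
    "https://generativelanguage.googleapis.com/",
)


def build_media_part(
    url: str,
    mime_type: str,
    fetch: Callable[[str], bytes],
) -> Dict[str, str]:
    """Return a dict describing how the media at `url` would be sent."""
    if url.startswith(PASS_THROUGH_PREFIXES):
        # GCS and Gemini URIs are referenced directly; nothing is downloaded.
        return {"kind": "uri", "uri": url, "mime_type": mime_type}
    # Regular HTTP(S) URL: download the bytes and embed them as base64 text.
    raw = fetch(url)
    encoded = base64.b64encode(raw).decode("ascii")
    return {"kind": "inline", "data": encoded, "mime_type": mime_type}


def fake_fetch(url: str) -> bytes:
    """Stand-in downloader so the sketch runs offline."""
    return b"%PDF-1.7 example bytes"


if __name__ == "__main__":
    print(build_media_part("gs://my-bucket/report.pdf", "application/pdf", fake_fetch))
    print(build_media_part("https://arxiv.org/pdf/1706.03762", "application/pdf", fake_fetch))
```

In real use the `fetch` argument would be an HTTP client call; keeping it as a parameter makes the routing logic easy to test in isolation.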

Code

from kern.agent import Agent
from kern.media import File, Image
from kern.models.google import GeminiInteractions

agent = Agent(
    model=GeminiInteractions(
        agent="deep-research-preview-04-2026",
        thinking_summaries="auto",
    ),
    markdown=True,
)

if __name__ == "__main__":
    agent.print_response(
        "Analyze the interspecies dynamics in this image and research the "
        "symbiotic relationships shown.",
        images=[
            Image(
                url="https://storage.googleapis.com/generativeai-downloads/images/generated_elephants_giraffes_zebras_sunset.jpg"
            )
        ],
    )

    agent.print_response(
        "What is this document about, and how does it relate to current "
        "research trends?",
        files=[File(url="https://arxiv.org/pdf/1706.03762")],
    )

Usage

Set up your virtual environment

Mac/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Set your API key

export GOOGLE_API_KEY=xxx
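To catch a missing key before a long research run starts, a small check along these lines may help. It is only a sketch: `require_api_key` is a hypothetical helper, not part of Kern or the Google SDK, and it assumes the key is read from the `GOOGLE_API_KEY` environment variable as set above.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str], name: str = "GOOGLE_API_KEY") -> str:
    """Return the key stored under `name`, or raise if it is missing or blank."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return value


if __name__ == "__main__":
    # Fails fast with a clear message instead of erroring mid-request.
    require_api_key(os.environ)
    print("GOOGLE_API_KEY is set")
```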

Install dependencies

uv pip install -U "google-genai>=2.0" kern-ai

Run Agent

python cookbook/90_models/google/gemini_interactions/deep_research_multimodal.py