Deep Research Multimodal Input (Interactions)

Deep Research accepts images and documents (PDFs) as input, then conducts web-based research grounded in that content. Pass each one as a Kern Image or File object with a URL. GCS and Gemini URIs are passed through unchanged; regular HTTP URLs are downloaded and base64-encoded automatically.
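The URL handling described above can be pictured with a minimal sketch. This is not Kern's actual implementation: the helper names `is_pass_through` and `to_inline_payload` are hypothetical, and the sketch assumes that GCS URIs use the `gs://` scheme and that Gemini file URIs are served from `generativelanguage.googleapis.com`.

```python
import base64
from urllib.parse import urlparse

# Assumption: Gemini Files API URIs live on this host; adjust if yours differ.
GEMINI_FILES_HOST = "generativelanguage.googleapis.com"


def is_pass_through(url: str) -> bool:
    """Return True for URIs that would be handed to the API unchanged."""
    parsed = urlparse(url)
    if parsed.scheme == "gs":  # Google Cloud Storage URI
        return True
    return parsed.scheme == "https" and parsed.netloc == GEMINI_FILES_HOST


def to_inline_payload(data: bytes, mime_type: str) -> dict:
    """Base64-encode already-downloaded bytes into a JSON-friendly payload."""
    return {"mime_type": mime_type, "data": base64.b64encode(data).decode("ascii")}
```

Under these assumptions, `is_pass_through("gs://my-bucket/report.pdf")` is true, while a regular URL such as `https://arxiv.org/pdf/1706.03762` would take the download-and-encode path.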

Code

from kern.agent import Agent
from kern.media import File, Image
from kern.models.google import GeminiInteractions

agent = Agent(
    model=GeminiInteractions(
        agent="deep-research-preview-04-2026",
        thinking_summaries="auto",
    ),
    markdown=True,
)

if __name__ == "__main__":
    agent.print_response(
        "Analyze the interspecies dynamics in this image and research the "
        "symbiotic relationships shown.",
        images=[
            Image(
                url="https://storage.googleapis.com/generativeai-downloads/images/generated_elephants_giraffes_zebras_sunset.jpg"
            )
        ],
    )

    agent.print_response(
        "What is this document about, and how does it relate to current "
        "research trends?",
        files=[File(url="https://arxiv.org/pdf/1706.03762")],
    )

Usage

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Set your API key

export GOOGLE_API_KEY=xxx
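A script can fail fast when the key is missing instead of surfacing an authentication error mid-run. The following is a minimal sketch; `require_api_key` is a hypothetical helper, not part of Kern or google-genai, and it assumes the key is read from the `GOOGLE_API_KEY` environment variable set above.

```python
import os


def require_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or raise if it is unset or blank."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return key
```

Calling `require_api_key()` at the top of the `if __name__ == "__main__":` block would be one place to use it.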

Install dependencies

uv pip install -U "google-genai>=2.0" kern-ai

Run Agent

python cookbook/90_models/google/gemini_interactions/deep_research_multimodal.py