Deep Research Multimodal Input (Interactions)
Deep Research accepts images and documents (PDFs) as input, then conducts web-based research grounded in that content. Pass them as Kern Image or File objects constructed with a URL, using the images and files arguments. Google Cloud Storage (GCS) and Gemini URIs are passed through unchanged; regular HTTP URLs are downloaded and base64-encoded automatically.
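The URL handling described above can be pictured with a small standalone sketch. This is not Kern's implementation. The helper names (`is_pass_through`, `to_inline_payload`, `prepare`), the payload shapes, and the rule that only `gs://` URIs and the Gemini Files API host count as pass-through are assumptions made for illustration.

```python
import base64
from urllib.parse import urlparse

# Assumption: Gemini file URIs are served from this host.
GEMINI_FILES_HOST = "generativelanguage.googleapis.com"


def is_pass_through(url: str) -> bool:
    """Return True for URIs the model could read directly (hypothetical rule)."""
    parsed = urlparse(url)
    if parsed.scheme == "gs":  # Google Cloud Storage URI
        return True
    return parsed.scheme == "https" and parsed.hostname == GEMINI_FILES_HOST


def to_inline_payload(data: bytes, mime_type: str) -> dict:
    """Base64-encode raw bytes so they can be embedded in a request body."""
    return {"mime_type": mime_type, "data": base64.b64encode(data).decode("ascii")}


def prepare(url: str, fetch) -> dict:
    """Pass supported URIs through; download and inline everything else.

    `fetch` is a caller-supplied function returning (bytes, mime_type),
    which keeps this sketch free of network access.
    """
    if is_pass_through(url):
        return {"uri": url}
    data, mime_type = fetch(url)
    return to_inline_payload(data, mime_type)
```

Injecting `fetch` rather than calling an HTTP client directly keeps the branching logic easy to test in isolation.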
Code
```python
from kern.agent import Agent
from kern.media import File, Image
from kern.models.google import GeminiInteractions

agent = Agent(
    model=GeminiInteractions(
        agent="deep-research-preview-04-2026",
        thinking_summaries="auto",
    ),
    markdown=True,
)

if __name__ == "__main__":
    agent.print_response(
        "Analyze the interspecies dynamics in this image and research the "
        "symbiotic relationships shown.",
        images=[
            Image(
                url="https://storage.googleapis.com/generativeai-downloads/images/generated_elephants_giraffes_zebras_sunset.jpg"
            )
        ],
    )

    agent.print_response(
        "What is this document about, and how does it relate to current "
        "research trends?",
        files=[File(url="https://arxiv.org/pdf/1706.03762")],
    )
```

Usage
Set up your virtual environment
macOS / Linux:

```shell
uv venv --python 3.12
source .venv/bin/activate
```

Windows:

```shell
uv venv --python 3.12
.venv\Scripts\activate
```

Set your API key
```shell
export GOOGLE_API_KEY=xxx
```

Install dependencies
```shell
uv pip install -U "google-genai>=2.0" kern-ai
```

Run Agent
```shell
python cookbook/90_models/google/gemini_interactions/deep_research_multimodal.py
```
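If the run fails immediately, a quick preflight check of the setup steps can help narrow down the cause. The sketch below is an optional helper, not part of the cookbook. The function names are hypothetical, and it assumes the installed version string starts with a plain numeric major version.

```python
import os
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def missing_prerequisites(env, genai_version):
    """List setup problems, given an environment mapping and a version string."""
    problems = []
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    if genai_version is None:
        problems.append("google-genai is not installed")
    # Assumption: the major version is a plain integer, as in "2.0.1".
    elif int(genai_version.split(".")[0]) < 2:
        problems.append(f"google-genai {genai_version} is older than 2.0")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ, installed_version("google-genai")):
        print(problem)
```

An empty result means the API key and the `google-genai>=2.0` requirement from the steps above both appear to be satisfied.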