Gemini

Use Google Gemini models with Kern agents.

Gemini is a family of multimodal AI models by Google that can understand and generate text, images, audio, video, and code. See their model options here.

Gemini stands out with native multimodal understanding across images, video, and audio, built-in Google Search for real-time information, File Search for RAG over your documents, native image generation and editing, text-to-speech synthesis, and advanced reasoning with thinking models.

Model Recommendations

Model	Best For	Key Strengths
`gemini-2.0-flash`	Most use-cases	Balanced speed and intelligence
`gemini-2.0-flash-lite`	High-volume tasks	Most cost-effective
`gemini-2.5-pro`	Complex tasks	Advanced reasoning, largest context
`gemini-3-pro-preview`	Latest features	Thought signatures support

Google has rate limits on their APIs. See the docs for more information.
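
Because requests can be rejected once a rate limit is reached, it can help to wrap calls in a retry with exponential backoff. The sketch below is a generic helper and not part of Kern or the Google SDK: `retry_with_backoff` and its parameters are hypothetical names, and catching a bare `Exception` is a simplification that you would likely narrow to the rate-limit error your client actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponentially growing delays on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # Give up after the final attempt and surface the original error.
            if attempt == max_attempts - 1:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With an agent in scope, a call could then look like `retry_with_backoff(lambda: agent.run("Hello"))`.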

Installation

uv pip install google-genai kern-ai

Authentication

There are two ways to use the Gemini class: via Google AI Studio (using GOOGLE_API_KEY) or via Vertex AI (using Google Cloud credentials).

Google AI Studio

Set the GOOGLE_API_KEY environment variable. You can get one from Google AI Studio.

On macOS or Linux:

export GOOGLE_API_KEY=***

On Windows:

setx GOOGLE_API_KEY ***
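
A missing key usually only surfaces on the first request, so a small preflight check can give a clearer error. This is a minimal sketch using only the standard library; `require_google_api_key` is a hypothetical helper name, not a Kern function.

```python
import os
from typing import Mapping, Optional


def require_google_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GOOGLE_API_KEY from the environment or raise a clear error."""
    # Accept an explicit mapping so the check is easy to test.
    source = os.environ if env is None else env
    key = source.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set. Export it before creating the agent."
        )
    return key
```

The returned value can also be passed explicitly through the `api_key` parameter listed in the Params table, for example `Gemini(id="gemini-2.0-flash", api_key=require_google_api_key())`.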

Vertex AI

To use Vertex AI in Google Cloud:

Refer to the Vertex AI documentation to set up a project and development environment.
Install the gcloud CLI and authenticate (refer to the quickstart for more details):

gcloud auth application-default login

Enable the Vertex AI API, then export the following environment variables (alternatively, pass project_id and location to the Gemini model):

export GOOGLE_GENAI_USE_VERTEXAI="true"
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"

Or configure directly in your agent:

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(
        id="gemini-2.0-flash",
        vertexai=True,
        project_id="your-project-id",
        location="us-central1",
    ),
)
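
If you prefer explicit configuration but still want to keep the project and region out of your source code, the two approaches can be combined. The sketch below reads the environment variables shown earlier and returns keyword arguments matching the `vertexai`, `project_id`, and `location` parameters. `vertex_kwargs_from_env` is a hypothetical helper, and the `us-central1` fallback is an assumption taken from the examples on this page.

```python
import os
from typing import Any, Dict, Mapping, Optional


def vertex_kwargs_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Build Vertex AI keyword arguments for Gemini from environment variables."""
    source = os.environ if env is None else env
    project = source.get("GOOGLE_CLOUD_PROJECT")
    if not project:
        raise RuntimeError("GOOGLE_CLOUD_PROJECT is not set")
    return {
        "vertexai": True,
        "project_id": project,
        # Assumption: fall back to the region used in this page's examples.
        "location": source.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
    }
```

The result can be unpacked into the model, for example `Gemini(id="gemini-2.0-flash", **vertex_kwargs_from_env())`.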

Example

Use Gemini with your Agent:

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001"),
    markdown=True,
)

# Print the response in the terminal
agent.print_response("Share a 2 sentence horror story.")

Note: View more examples here.

Capabilities

Multimodal Input: Images, video, audio, PDFs
Image Generation: Generate and edit images
Grounding and Search: Real-time web grounding
File Search: Native RAG over documents
Speech Generation: Audio output responses
Thinking Models: Advanced reasoning

Multimodal Input

Gemini natively understands images, video, audio, and documents. See Google's vision documentation for supported formats and limits.

from kern.agent import Agent
from kern.media import Image
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash"),
    markdown=True,
)

agent.print_response(
    "Tell me about this image.",
    images=[Image(url="https://upload.wikimedia.org/wikipedia/commons/b/bf/Krakow_-_Kosciol_Mariacki.jpg")],
)

See the following examples:

Image input
Video input
Audio input
PDF input
GCS file input (direct GCS access, up to 2GB)
External URL input (up to 100MB)
S3 pre-signed URL

Image Generation

Generate and edit images using Gemini's native image generation. See Google's image generation documentation for more details.

from io import BytesIO
from kern.agent import Agent, RunOutput
from kern.models.google import Gemini
from PIL import Image

agent = Agent(
    model=Gemini(
        id="gemini-2.5-flash-image",
        response_modalities=["Text", "Image"],
    )
)

run_response = agent.run("Make me an image of a cat in a tree.")

if run_response and isinstance(run_response, RunOutput) and run_response.images:
    # Number the files so multiple images do not overwrite each other
    for index, image_response in enumerate(run_response.images):
        image_bytes = image_response.content
        if image_bytes:
            image = Image.open(BytesIO(image_bytes))
            image.save(f"generated_image_{index}.png")

Grounding and Search

Gemini models support grounding and search capabilities that enable real-time web access. See more details in Google's documentation.

Enable web search by setting search=True:

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-exp", search=True),
    markdown=True,
)

agent.print_response("What are the latest developments in AI?")

For legacy models, use grounding=True instead:

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(
        id="gemini-2.0-flash",
        grounding=True,
        grounding_dynamic_threshold=0.7,  # Optional: set threshold
    ),
)

Read more about search and grounding here.
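
The Params table describes `grounding` as the legacy option and `search` as the one to use for 2.0+ models. A small helper can pick the flag from the model id. This is only a sketch: `web_search_kwargs` is a hypothetical name, and reading the major version from the `gemini-<major>` prefix is an assumption about the id format rather than something Kern guarantees.

```python
import re
from typing import Dict


def web_search_kwargs(model_id: str) -> Dict[str, bool]:
    """Choose `search` or `grounding` based on the model's major version."""
    match = re.match(r"gemini-(\d+)", model_id)
    if match is None:
        raise ValueError(f"Unrecognized Gemini model id: {model_id!r}")
    major = int(match.group(1))
    # Assumption: ids starting with gemini-2 or later use `search`.
    if major >= 2:
        return {"search": True}
    return {"grounding": True}
```

The returned dictionary can be unpacked into the model, for example `Gemini(id=model_id, **web_search_kwargs(model_id))`.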

Vertex AI Search

Search over your private knowledge base using Vertex AI. See Vertex AI Search documentation for setup details.

from kern.agent import Agent
from kern.models.google import Gemini

datastore_id = "projects/your-project-id/locations/global/collections/default_collection/dataStores/your-datastore-id"

agent = Agent(
    model=Gemini(
        id="gemini-2.5-flash",
        vertexai=True,
        vertexai_search=True,
        vertexai_search_datastore=datastore_id,
    ),
    markdown=True,
)

agent.print_response("What are our company's policies regarding remote work?")
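
The datastore path is long and easy to mistype, so it may be convenient to assemble it from its parts. The sketch below follows the path layout used in the example above. `datastore_path` is a hypothetical helper, and the `global` and `default_collection` defaults are simply the values from that example.

```python
def datastore_path(
    project_id: str,
    datastore_id: str,
    location: str = "global",
    collection: str = "default_collection",
) -> str:
    """Assemble a Vertex AI Search datastore resource path from its parts."""
    parts = {
        "project_id": project_id,
        "location": location,
        "collection": collection,
        "datastore_id": datastore_id,
    }
    for label, value in parts.items():
        # Each part is a single path segment, so it must be non-empty
        # and must not contain a slash.
        if not value or "/" in value:
            raise ValueError(f"Invalid {label}: {value!r}")
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/{collection}/dataStores/{datastore_id}"
    )
```

For example, `datastore_path("your-project-id", "your-datastore-id")` produces the same string as the literal used above.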

URL Context

Extract and analyze content from URLs. See Google's URL context documentation for more details.

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.5-flash", url_context=True),
    markdown=True,
)

url1 = "https://www.foodnetwork.com/recipes/ina-garten/perfect-roast-chicken-recipe-1940592"
url2 = "https://www.allrecipes.com/recipe/83557/juicy-roasted-chicken/"

agent.print_response(
    f"Compare the ingredients and cooking times from the recipes at {url1} and {url2}"
)

File Search

Gemini's File Search enables RAG over your documents with automatic chunking and retrieval. See Google's File Search documentation for more details.

from pathlib import Path
from kern.agent import Agent
from kern.models.google import Gemini

model = Gemini(id="gemini-2.5-flash")
agent = Agent(model=model, markdown=True)

# Create a File Search store and upload documents
store = model.create_file_search_store(display_name="My Docs")
operation = model.upload_to_file_search_store(
    file_path=Path("documents/sample.txt"),
    store_name=store.name,
    display_name="Sample Document",
)
model.wait_for_operation(operation)

# Configure model to use File Search
model.file_search_store_names = [store.name]

# Query the documents
run = agent.run("What are the key points in the document?")
print(run.content)

# Cleanup
model.delete_file_search_store(store.name)
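
In the example above, the cleanup step is skipped if the upload or the query raises an exception, which leaves the store behind. One way to guard against that is a context manager that always deletes the store. This is a sketch, not a Kern feature: `temporary_file_search_store` is a hypothetical helper that relies only on the `create_file_search_store` and `delete_file_search_store` methods shown above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_file_search_store(model: Any, display_name: str) -> Iterator[Any]:
    """Create a File Search store and delete it when the block exits."""
    store = model.create_file_search_store(display_name=display_name)
    try:
        yield store
    finally:
        # Runs on normal exit and when the block raises.
        model.delete_file_search_store(store.name)
```

Inside `with temporary_file_search_store(model, "My Docs") as store:` you would upload documents, set `model.file_search_store_names = [store.name]`, and run your queries as before.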

Speech Generation

Generate audio responses from the model. See Google's speech generation documentation for available voices and options.

from kern.agent import Agent
from kern.models.google import Gemini
from kern.utils.audio import write_wav_audio_to_file

agent = Agent(
    model=Gemini(
        id="gemini-2.5-flash-preview-tts",
        response_modalities=["AUDIO"],
        speech_config={
            "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}
        },
    )
)

run_output = agent.run("Say cheerfully: Have a wonderful day!")

if run_output.response_audio is not None:
    audio_data = run_output.response_audio.content
    write_wav_audio_to_file("tmp/cheerful_greeting.wav", audio_data)
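
If you want more control over how the audio is written, for instance to create the output directory first, the standard library's `wave` module can do the job. The sketch below is not a description of how `write_wav_audio_to_file` works. `save_pcm_as_wav` is a hypothetical helper, and it assumes the audio bytes are raw 16-bit mono PCM at 24 kHz, which you should verify against Google's speech generation documentation.

```python
import wave
from pathlib import Path
from typing import Union


def save_pcm_as_wav(
    path: Union[str, Path],
    pcm_bytes: bytes,
    sample_rate: int = 24000,  # Assumption: 24 kHz output
    channels: int = 1,         # Assumption: mono
    sample_width: int = 2,     # Assumption: 16-bit samples (2 bytes)
) -> Path:
    """Wrap raw PCM bytes in a WAV container, creating parent folders."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(target), "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return target
```

If the assumed format does not match the actual audio, playback will sound too fast, too slow, or distorted, so adjust the parameters accordingly.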

Context Caching

Cache large contexts to reduce costs and latency. See Google's context caching documentation for more details.

from kern.agent import Agent
from kern.models.google import Gemini
from google import genai

client = genai.Client()

# Upload file and create cache
txt_file = client.files.upload(file="large_document.txt")
cache = client.caches.create(
    model="gemini-2.0-flash-001",
    config={
        "system_instruction": "You are an expert at analyzing transcripts.",
        "contents": [txt_file],
        "ttl": "300s",
    },
)

# Use the cached content - no need to resend the file
agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001", cached_content=cache.name),
)
run_output = agent.run("Find a lighthearted moment from this transcript")
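
The `ttl` value in the example is a number of seconds followed by `s`. If your cache lifetimes are defined as durations elsewhere in your code, a small converter avoids hand-written strings. `ttl_string` is a hypothetical helper, and rounding down to whole seconds is a simplification chosen for this sketch.

```python
from datetime import timedelta


def ttl_string(duration: timedelta) -> str:
    """Format a duration as a seconds string such as "300s"."""
    # Fractions of a second are dropped.
    seconds = int(duration.total_seconds())
    if seconds <= 0:
        raise ValueError("TTL must be at least one second")
    return f"{seconds}s"
```

For example, `ttl_string(timedelta(minutes=5))` yields the `"300s"` value used above.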

Thinking Models

Gemini 2.5+ models support extended thinking for complex reasoning tasks. See Google's thinking documentation for more details.

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.5-pro", thinking_budget=1280, include_thoughts=True),
    markdown=True,
)

agent.print_response("Solve this logic puzzle...")

You can also use thinking_level for simpler control:

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-3-pro-preview", thinking_level="low"),  # "low" or "high"
    markdown=True,
)
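
Since `thinking_budget` and `thinking_level` are two ways of controlling the same behavior, it can be useful to validate the choice before building the model. The sketch below treats the two as mutually exclusive, which is an assumption you should check against Google's thinking documentation. `thinking_kwargs` is a hypothetical helper, and the allowed levels are the two values mentioned above.

```python
from typing import Any, Dict, Optional

ALLOWED_LEVELS = ("low", "high")


def thinking_kwargs(
    budget: Optional[int] = None,
    level: Optional[str] = None,
    include_thoughts: bool = False,
) -> Dict[str, Any]:
    """Build thinking-related keyword arguments for Gemini."""
    # Assumption: only one of the two controls should be set at a time.
    if budget is not None and level is not None:
        raise ValueError("Set either a budget or a level, not both")
    kwargs: Dict[str, Any] = {}
    if budget is not None:
        kwargs["thinking_budget"] = budget
    if level is not None:
        if level not in ALLOWED_LEVELS:
            raise ValueError(f"level must be one of {ALLOWED_LEVELS}")
        kwargs["thinking_level"] = level
    if include_thoughts:
        kwargs["include_thoughts"] = True
    return kwargs
```

The result can be unpacked into the model, for example `Gemini(id="gemini-2.5-pro", **thinking_kwargs(budget=1280, include_thoughts=True))`.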

Structured Outputs

Gemini supports native structured outputs using Pydantic models:

from kern.agent import Agent
from kern.models.google import Gemini
from pydantic import BaseModel

class MovieScript(BaseModel):
    name: str
    genre: str
    storyline: str

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001"),
    output_schema=MovieScript,
)

Read more about structured outputs here.

Tool Use

Gemini supports function calling to interact with external tools and APIs:

from kern.agent import Agent
from kern.models.google import Gemini
from kern.tools.hackernews import HackerNewsTools

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001"),
    tools=[HackerNewsTools()],
    markdown=True,
)

agent.print_response("What's happening in France?")

Params

Parameter	Type	Default	Description
`id`	`str`	`"gemini-2.0-flash-001"`	The id of the Gemini model to use
`name`	`str`	`"Gemini"`	The name of the model
`provider`	`str`	`"Google"`	The provider of the model
`api_key`	`Optional[str]`	`None`	Google API key (defaults to `GOOGLE_API_KEY` env var)
`vertexai`	`bool`	`False`	Use Vertex AI instead of AI Studio
`project_id`	`Optional[str]`	`None`	Google Cloud project ID for Vertex AI
`location`	`Optional[str]`	`None`	Google Cloud region for Vertex AI
`temperature`	`Optional[float]`	`None`	Controls randomness in the model's output
`top_p`	`Optional[float]`	`None`	Controls diversity via nucleus sampling
`top_k`	`Optional[int]`	`None`	Controls diversity via top-k sampling
`max_output_tokens`	`Optional[int]`	`None`	Maximum number of tokens to generate
`stop_sequences`	`Optional[list[str]]`	`None`	Sequences where the model should stop generating
`seed`	`Optional[int]`	`None`	Random seed for reproducibility
`logprobs`	`Optional[bool]`	`None`	Whether to return log probabilities of output tokens
`presence_penalty`	`Optional[float]`	`None`	Penalizes new tokens based on whether they appear in the text so far
`frequency_penalty`	`Optional[float]`	`None`	Penalizes new tokens based on their frequency in the text so far
`search`	`bool`	`False`	Enable Google Search grounding
`grounding`	`bool`	`False`	Enable legacy grounding (use `search` for 2.0+)
`grounding_dynamic_threshold`	`Optional[float]`	`None`	Dynamic threshold for grounding
`url_context`	`bool`	`False`	Enable URL context extraction
`vertexai_search`	`bool`	`False`	Enable Vertex AI Search
`vertexai_search_datastore`	`Optional[str]`	`None`	Vertex AI Search datastore path
`file_search_store_names`	`Optional[list[str]]`	`None`	File Search store names for RAG
`file_search_metadata_filter`	`Optional[str]`	`None`	Metadata filter for File Search
`response_modalities`	`Optional[list[str]]`	`None`	Output types: `"TEXT"`, `"IMAGE"`, `"AUDIO"`
`speech_config`	`Optional[dict]`	`None`	TTS voice configuration
`thinking_budget`	`Optional[int]`	`None`	Token budget for reasoning (Gemini 2.5+)
`include_thoughts`	`Optional[bool]`	`None`	Include thought summaries in response
`thinking_level`	`Optional[str]`	`None`	Thinking intensity: `"low"` or `"high"`
`cached_content`	`Optional[Any]`	`None`	Reference to cached context
`safety_settings`	`Optional[list]`	`None`	Content safety configuration
`function_declarations`	`Optional[List[Any]]`	`None`	List of function declarations for the model
`generation_config`	`Optional[Any]`	`None`	Custom generation configuration
`generative_model_kwargs`	`Optional[Dict[str, Any]]`	`None`	Additional keyword arguments for the generative model
`request_params`	`Optional[Dict[str, Any]]`	`None`	Additional parameters for the request
`timeout`	`Optional[float]`	`None`	Request timeout in seconds
`client_params`	`Optional[Dict[str, Any]]`	`None`	Additional parameters for client configuration

Gemini is a subclass of the Model class and has access to the same params.