Gemini

Use Google Gemini models with Kern agents.

Gemini is a family of multimodal AI models by Google that can understand and generate text, images, audio, video, and code. See their model options here.

Gemini stands out with native multimodal understanding across images, video, and audio, built-in Google Search for real-time information, File Search for RAG over your documents, native image generation and editing, text-to-speech synthesis, and advanced reasoning with thinking models.

Model Recommendations

| Model | Best For | Key Strengths |
| --- | --- | --- |
| gemini-2.0-flash | Most use-cases | Balanced speed and intelligence |
| gemini-2.0-flash-lite | High-volume tasks | Most cost-effective |
| gemini-2.5-pro | Complex tasks | Advanced reasoning, largest context |
| gemini-3-pro-preview | Latest features | Thought signatures support |
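
If the model choice should live in one place in your code, a small lookup is one option. This is only an illustrative sketch: `RECOMMENDED_MODELS`, `pick_model`, and the workload labels are hypothetical names and not part of Kern; the model ids are the ones recommended above.

```python
# Hypothetical lookup: workload labels are made up, model ids come from the
# recommendations above.
RECOMMENDED_MODELS = {
    "general": "gemini-2.0-flash",
    "high_volume": "gemini-2.0-flash-lite",
    "complex": "gemini-2.5-pro",
    "latest": "gemini-3-pro-preview",
}


def pick_model(workload: str = "general") -> str:
    """Return the recommended model id for a workload label."""
    try:
        return RECOMMENDED_MODELS[workload]
    except KeyError:
        valid = ", ".join(sorted(RECOMMENDED_MODELS))
        raise ValueError(
            f"Unknown workload {workload!r}; expected one of: {valid}"
        ) from None
```

The returned string can then be passed as the id argument, for example Gemini(id=pick_model("complex")).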

Google has rate limits on their APIs. See the docs for more information.
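
A common way to cope with rate limits is to retry with exponential backoff. The sketch below is generic and uses only the standard library; `call_with_backoff` is a hypothetical helper, not a Kern or Google API, and the exception type to retry on is left to the caller because this page does not say which exception is raised when a limit is hit.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff and jitter on retry_on errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Nominal delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter scales the delay to 50-100% so concurrent callers spread out.
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

In practice `fn` might be something like lambda: agent.run(prompt), with `retry_on` set to whatever exception your client raises for a rate-limit response.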

Installation

uv pip install google-genai kern-ai

Authentication

There are two ways to use the Gemini class: via Google AI Studio (using GOOGLE_API_KEY) or via Vertex AI (using Google Cloud credentials).

Google AI Studio

Set the GOOGLE_API_KEY environment variable. You can get one from Google AI Studio.

macOS/Linux:

export GOOGLE_API_KEY=***

Windows:

setx GOOGLE_API_KEY ***

Vertex AI

To use Vertex AI in Google Cloud:

  1. Refer to the Vertex AI documentation to set up a project and development environment.

  2. Install the gcloud CLI and authenticate (refer to the quickstart for more details):

     gcloud auth application-default login

  3. Enable Vertex AI API and set the project ID environment variable (alternatively, you can set project_id in the Agent config):

Export the following variables:

export GOOGLE_GENAI_USE_VERTEXAI="true"
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"

Or configure directly in your agent:

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(
        id="gemini-2.0-flash",
        vertexai=True,
        project_id="your-project-id",
        location="us-central1",
    ),
)

Read more about Vertex AI setup here.
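
Before constructing an agent, it can help to check which of the two authentication paths the environment is set up for. The following is a minimal sketch under stated assumptions: `detect_gemini_auth` is a hypothetical pre-flight helper that Kern does not ship, it looks only at the environment variables named above (so it will not see project_id or location passed directly to Gemini), and treating the "true" flag case-insensitively is an assumption.

```python
import os
from typing import Mapping, Optional


def detect_gemini_auth(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "vertexai" or "ai_studio" based on the environment variables.

    Hypothetical check: it mirrors the variables documented above and does
    not contact Google or validate the credentials themselves.
    """
    env = os.environ if env is None else env

    # Assumption: the flag is compared case-insensitively against "true".
    if env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower() == "true":
        missing = [
            name
            for name in ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")
            if not env.get(name)
        ]
        if missing:
            raise RuntimeError(
                f"Vertex AI selected but not set: {', '.join(missing)}"
            )
        return "vertexai"

    if env.get("GOOGLE_API_KEY"):
        return "ai_studio"

    raise RuntimeError(
        "No Gemini credentials found: set GOOGLE_API_KEY or the Vertex AI variables."
    )
```

Calling detect_gemini_auth() with no arguments inspects the current process environment; passing a dict makes the check easy to exercise in tests.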

Example

Use Gemini with your Agent:

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001"),
    markdown=True,
)

# Print the response in the terminal
agent.print_response("Share a 2 sentence horror story.")

Note: View more examples here.

Capabilities

Multimodal Input

Gemini natively understands images, video, audio, and documents. See Google's vision documentation for supported formats and limits.

from kern.agent import Agent
from kern.media import Image
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash"),
    markdown=True,
)

agent.print_response(
    "Tell me about this image.",
    images=[Image(url="https://upload.wikimedia.org/wikipedia/commons/b/bf/Krakow_-_Kosciol_Mariacki.jpg")],
)

Image Generation

Generate and edit images using Gemini's native image generation. See Google's image generation documentation for more details.

from io import BytesIO
from kern.agent import Agent, RunOutput
from kern.models.google import Gemini
from PIL import Image

agent = Agent(
    model=Gemini(
        id="gemini-2.5-flash-image",
        response_modalities=["Text", "Image"],
    )
)

run_response = agent.run("Make me an image of a cat in a tree.")

if run_response and isinstance(run_response, RunOutput) and run_response.images:
    for image_response in run_response.images:
        image_bytes = image_response.content
        if image_bytes:
            image = Image.open(BytesIO(image_bytes))
            image.save("generated_image.png")

Read more about image generation here.

Grounding and Search

Gemini models support grounding and search capabilities that enable real-time web access. See more details in Google's documentation.

Enable web search by setting search=True:

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-exp", search=True),
    markdown=True,
)

agent.print_response("What are the latest developments in AI?")

For legacy models, use grounding=True instead:

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(
        id="gemini-2.0-flash",
        grounding=True,
        grounding_dynamic_threshold=0.7,  # Optional: set threshold
    ),
)

Read more about search and grounding here.

Vertex AI Search

Search over your private knowledge base using Vertex AI. See Vertex AI Search documentation for setup details.

from kern.agent import Agent
from kern.models.google import Gemini

datastore_id = "projects/your-project-id/locations/global/collections/default_collection/dataStores/your-datastore-id"

agent = Agent(
    model=Gemini(
        id="gemini-2.5-flash",
        vertexai=True,
        vertexai_search=True,
        vertexai_search_datastore=datastore_id,
    ),
    markdown=True,
)

agent.print_response("What are our company's policies regarding remote work?")
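
The datastore path in the example is a long string that is easy to mistype. As a small sketch, it could be assembled from its parts; `build_datastore_path` is a hypothetical helper, and the defaults for location and collection are simply the values that appear in the example path above, not documented defaults.

```python
def build_datastore_path(
    project_id: str,
    datastore_id: str,
    location: str = "global",
    collection: str = "default_collection",
) -> str:
    """Assemble a datastore path in the same shape as the example above."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/{collection}/dataStores/{datastore_id}"
    )
```

The result can be passed as vertexai_search_datastore in place of the hand-written string.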

URL Context

Extract and analyze content from URLs. See Google's URL context documentation for more details.

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.5-flash", url_context=True),
    markdown=True,
)

url1 = "https://www.foodnetwork.com/recipes/ina-garten/perfect-roast-chicken-recipe-1940592"
url2 = "https://www.allrecipes.com/recipe/83557/juicy-roasted-chicken/"

agent.print_response(
    f"Compare the ingredients and cooking times from the recipes at {url1} and {url2}"
)

Read more about URL context here.

File Search

Gemini's File Search enables RAG over your documents with automatic chunking and retrieval. See Google's File Search documentation for more details.

from pathlib import Path
from kern.agent import Agent
from kern.models.google import Gemini

model = Gemini(id="gemini-2.5-flash")
agent = Agent(model=model, markdown=True)

# Create a File Search store and upload documents
store = model.create_file_search_store(display_name="My Docs")
operation = model.upload_to_file_search_store(
    file_path=Path("documents/sample.txt"),
    store_name=store.name,
    display_name="Sample Document",
)
model.wait_for_operation(operation)

# Configure model to use File Search
model.file_search_store_names = [store.name]

# Query the documents
run = agent.run("What are the key points in the document?")
print(run.content)

# Cleanup
model.delete_file_search_store(store.name)
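
In the example above, the cleanup call is skipped if the upload or the query raises. One possible way to guarantee cleanup is a context manager. This is a sketch, not a Kern feature: `temporary_file_search_store` is a hypothetical wrapper that calls only create_file_search_store and delete_file_search_store, with the same arguments the example uses.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def temporary_file_search_store(model: Any, display_name: str) -> Iterator[Any]:
    """Create a File Search store and delete it on exit, even if the body raises.

    Hypothetical wrapper around the two model methods used in the example above.
    """
    store = model.create_file_search_store(display_name=display_name)
    try:
        yield store
    finally:
        # Runs on normal exit and on exceptions, so the store is not left behind.
        model.delete_file_search_store(store.name)
```

Inside the with block you would upload documents, set model.file_search_store_names, and run queries exactly as in the example.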

Speech Generation

Generate audio responses from the model. See Google's speech generation documentation for available voices and options.

from kern.agent import Agent
from kern.models.google import Gemini
from kern.utils.audio import write_wav_audio_to_file

agent = Agent(
    model=Gemini(
        id="gemini-2.5-flash-preview-tts",
        response_modalities=["AUDIO"],
        speech_config={
            "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}
        },
    )
)

run_output = agent.run("Say cheerfully: Have a wonderful day!")

if run_output.response_audio is not None:
    audio_data = run_output.response_audio.content
    write_wav_audio_to_file("tmp/cheerful_greeting.wav", audio_data)
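
For readers curious what a helper like write_wav_audio_to_file has to do, here is a standard-library sketch. It is not Kern's implementation: `write_pcm_as_wav` is a hypothetical name, and the defaults (mono, 16-bit samples, 24 kHz) are an assumption about the raw PCM the model returns, so check Google's speech generation documentation before relying on them. The sketch also creates the parent directory, since this page does not say whether the Kern helper does.

```python
import wave
from pathlib import Path


def write_pcm_as_wav(
    path: str,
    pcm: bytes,
    channels: int = 1,         # assumption: mono output
    sample_rate: int = 24000,  # assumption: 24 kHz
    sample_width: int = 2,     # assumption: 2 bytes per sample (16-bit)
) -> None:
    """Wrap raw PCM bytes in a WAV container, creating the parent directory."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(target), "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
```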

Context Caching

Cache large contexts to reduce costs and latency. See Google's context caching documentation for more details.

from kern.agent import Agent
from kern.models.google import Gemini
from google import genai

client = genai.Client()

# Upload file and create cache
txt_file = client.files.upload(file="large_document.txt")
cache = client.caches.create(
    model="gemini-2.0-flash-001",
    config={
        "system_instruction": "You are an expert at analyzing transcripts.",
        "contents": [txt_file],
        "ttl": "300s",
    },
)

# Use the cached content - no need to resend the file
agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001", cached_content=cache.name),
)
run_output = agent.run("Find a lighthearted moment from this transcript")
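
The cache lifetime in the example is given as a string of seconds ("300s"). If durations elsewhere in your code are timedelta values, a small conversion keeps the two in sync. `ttl_string` is a hypothetical helper, and it assumes whole-second strings like the one above are what you want to send.

```python
from datetime import timedelta


def ttl_string(duration: timedelta) -> str:
    """Format a timedelta as a whole-second TTL string such as "300s"."""
    seconds = int(duration.total_seconds())
    if seconds <= 0:
        raise ValueError("TTL must be at least one second")
    return f"{seconds}s"
```

For example, ttl_string(timedelta(minutes=5)) produces the same "300s" value used in the cache config.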

Thinking Models

Gemini 2.5+ models support extended thinking for complex reasoning tasks. See Google's thinking documentation for more details.

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.5-pro", thinking_budget=1280, include_thoughts=True),
    markdown=True,
)

agent.print_response("Solve this logic puzzle...")

You can also use thinking_level for simpler control:

from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-3-pro-preview", thinking_level="low"),  # "low" or "high"
    markdown=True,
)

Read more about thinking models here.

Structured Outputs

Gemini supports native structured outputs using Pydantic models:

from kern.agent import Agent
from kern.models.google import Gemini
from pydantic import BaseModel

class MovieScript(BaseModel):
    name: str
    genre: str
    storyline: str

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001"),
    output_schema=MovieScript,
)

Read more about structured outputs here.

Tool Use

Gemini supports function calling to interact with external tools and APIs:

from kern.agent import Agent
from kern.models.google import Gemini
from kern.tools.hackernews import HackerNewsTools

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001"),
    tools=[HackerNewsTools()],
    markdown=True,
)

agent.print_response("What's happening in France?")

Read more about tool use here.

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| id | str | "gemini-2.0-flash-001" | The id of the Gemini model to use |
| name | str | "Gemini" | The name of the model |
| provider | str | "Google" | The provider of the model |
| api_key | Optional[str] | None | Google API key (defaults to GOOGLE_API_KEY env var) |
| vertexai | bool | False | Use Vertex AI instead of AI Studio |
| project_id | Optional[str] | None | Google Cloud project ID for Vertex AI |
| location | Optional[str] | None | Google Cloud region for Vertex AI |
| temperature | Optional[float] | None | Controls randomness in the model's output |
| top_p | Optional[float] | None | Controls diversity via nucleus sampling |
| top_k | Optional[int] | None | Controls diversity via top-k sampling |
| max_output_tokens | Optional[int] | None | Maximum number of tokens to generate |
| stop_sequences | Optional[list[str]] | None | Sequences where the model should stop generating |
| seed | Optional[int] | None | Random seed for reproducibility |
| logprobs | Optional[bool] | None | Whether to return log probabilities of output tokens |
| presence_penalty | Optional[float] | None | Penalizes new tokens based on whether they appear in the text so far |
| frequency_penalty | Optional[float] | None | Penalizes new tokens based on their frequency in the text so far |
| search | bool | False | Enable Google Search grounding |
| grounding | bool | False | Enable legacy grounding (use search for 2.0+) |
| grounding_dynamic_threshold | Optional[float] | None | Dynamic threshold for grounding |
| url_context | bool | False | Enable URL context extraction |
| vertexai_search | bool | False | Enable Vertex AI Search |
| vertexai_search_datastore | Optional[str] | None | Vertex AI Search datastore path |
| file_search_store_names | Optional[list[str]] | None | File Search store names for RAG |
| file_search_metadata_filter | Optional[str] | None | Metadata filter for File Search |
| response_modalities | Optional[list[str]] | None | Output types: "TEXT", "IMAGE", "AUDIO" |
| speech_config | Optional[dict] | None | TTS voice configuration |
| thinking_budget | Optional[int] | None | Token budget for reasoning (Gemini 2.5+) |
| include_thoughts | Optional[bool] | None | Include thought summaries in response |
| thinking_level | Optional[str] | None | Thinking intensity: "low" or "high" |
| cached_content | Optional[Any] | None | Reference to cached context |
| safety_settings | Optional[list] | None | Content safety configuration |
| function_declarations | Optional[List[Any]] | None | List of function declarations for the model |
| generation_config | Optional[Any] | None | Custom generation configuration |
| generative_model_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for the generative model |
| request_params | Optional[Dict[str, Any]] | None | Additional parameters for the request |
| timeout | Optional[float] | None | Request timeout in seconds |
| client_params | Optional[Dict[str, Any]] | None | Additional parameters for client configuration |

Gemini is a subclass of the Model class and has access to the same params.