Gemini
Use Google Gemini models with Kern agents.
Gemini is a family of multimodal AI models by Google that can understand and generate text, images, audio, video, and code. See their model options here.
Gemini stands out with native multimodal understanding across images, video, and audio, built-in Google Search for real-time information, File Search for RAG over your documents, native image generation and editing, text-to-speech synthesis, and advanced reasoning with thinking models.
Model Recommendations
| Model | Best For | Key Strengths |
|---|---|---|
| gemini-2.0-flash | Most use-cases | Balanced speed and intelligence |
| gemini-2.0-flash-lite | High-volume tasks | Most cost-effective |
| gemini-2.5-pro | Complex tasks | Advanced reasoning, largest context |
| gemini-3-pro-preview | Latest features | Thought signatures support |
Google enforces rate limits on its APIs. See the docs for more information.
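Because requests can be rejected when a rate limit is hit, it may help to wrap calls in a retry with exponential backoff. The following is a minimal, stdlib-only sketch rather than anything Kern provides: `call_with_backoff` and its `is_rate_limit` predicate are hypothetical names, and how a rate-limit error surfaces (exception type, message) is left to the caller because it is not documented on this page.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    is_rate_limit: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff while is_rate_limit(exc) is true."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise errors that are not rate limits, and the final failure.
            if not is_rate_limit(exc) or attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt (base, 2x, 4x, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

A call such as `call_with_backoff(lambda: agent.run("..."), is_rate_limit=my_predicate)` would then retry only the failures that `my_predicate` recognizes.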
Installation
```bash
uv pip install google-genai kern-ai
```

Authentication
There are two ways to use the Gemini class: via Google AI Studio (using GOOGLE_API_KEY) or via Vertex AI (using Google Cloud credentials).
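As a rough illustration of the two modes, the sketch below inspects environment variables to report which one appears to be configured. `detect_auth_mode` is a hypothetical helper, not part of Kern. The variable names are the ones used on this page, but the precedence shown (Vertex AI first when both are set) is an assumption.

```python
import os
from typing import Mapping


def detect_auth_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return "vertexai", "ai_studio", or "unconfigured" based on environment variables."""
    # Vertex AI is considered configured only if the flag and a project ID are both set.
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower() == "true"
    if use_vertex and env.get("GOOGLE_CLOUD_PROJECT"):
        return "vertexai"
    # Otherwise fall back to the Google AI Studio API key.
    if env.get("GOOGLE_API_KEY"):
        return "ai_studio"
    return "unconfigured"
```

Running a check like this at startup can give a clearer error message than a failed request.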
Google AI Studio
Set the GOOGLE_API_KEY environment variable. You can get one from Google AI Studio.
macOS/Linux:

```bash
export GOOGLE_API_KEY=***
```

Windows:

```bash
setx GOOGLE_API_KEY ***
```

Vertex AI
To use Vertex AI in Google Cloud:
- Refer to the Vertex AI documentation to set up a project and development environment.
- Install the `gcloud` CLI and authenticate (refer to the quickstart for more details):

  ```bash
  gcloud auth application-default login
  ```

- Enable the Vertex AI API and set the project ID environment variable (alternatively, you can set `project_id` on the `Gemini` model in your `Agent` config). Export the following variables:

  ```bash
  export GOOGLE_GENAI_USE_VERTEXAI="true"
  export GOOGLE_CLOUD_PROJECT="your-project-id"
  export GOOGLE_CLOUD_LOCATION="us-central1"
  ```

Or configure directly in your agent:
```python
from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(
        id="gemini-2.0-flash",
        vertexai=True,
        project_id="your-project-id",
        location="us-central1",
    ),
)
```

Read more about Vertex AI setup here.
Example
Use Gemini with your Agent:
```python
from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001"),
    markdown=True,
)

# Print the response in the terminal
agent.print_response("Share a 2 sentence horror story.")
```

Capabilities
- Multimodal Input: Images, video, audio, PDFs
- Image Generation: Generate and edit images
- Grounding and Search: Real-time web grounding
- File Search: Native RAG over documents
- Speech Generation: Audio output responses
- Thinking Models: Advanced reasoning
Multimodal Input
Gemini natively understands images, video, audio, and documents. See Google's vision documentation for supported formats and limits.
```python
from kern.agent import Agent
from kern.media import Image
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash"),
    markdown=True,
)

agent.print_response(
    "Tell me about this image.",
    images=[Image(url="https://upload.wikimedia.org/wikipedia/commons/b/bf/Krakow_-_Kosciol_Mariacki.jpg")],
)
```

See the following examples:
- Image input
- Video input
- Audio input
- PDF input
- GCS file input (direct GCS access, up to 2GB)
- External URL input (up to 100MB)
- S3 pre-signed URL
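The size limits in the list above can be checked before a file is handed to the model. This is a small illustrative sketch: `fits_limit` and the `MAX_BYTES` table are hypothetical names, and interpreting "2GB" and "100MB" as binary units is an assumption that should be checked against Google's documentation.

```python
# Size limits quoted in the list above; binary units (GiB/MiB) are an assumption.
MAX_BYTES = {
    "gcs": 2 * 1024**3,             # GCS file input: up to 2GB
    "external_url": 100 * 1024**2,  # External URL input: up to 100MB
}


def fits_limit(source: str, size_bytes: int) -> bool:
    """Return True if a file of size_bytes is within the limit for the given source."""
    if source not in MAX_BYTES:
        raise ValueError(f"unknown source: {source!r}")
    return 0 <= size_bytes <= MAX_BYTES[source]
```

For instance, a 150 MiB file would fail the `"external_url"` check but pass the `"gcs"` one, which suggests routing larger files through GCS.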
Image Generation
Generate and edit images using Gemini's native image generation. See Google's image generation documentation for more details.
```python
from io import BytesIO
from kern.agent import Agent, RunOutput
from kern.models.google import Gemini
from PIL import Image

agent = Agent(
    model=Gemini(
        id="gemini-2.5-flash-image",
        response_modalities=["Text", "Image"],
    )
)

run_response = agent.run("Make me an image of a cat in a tree.")

if run_response and isinstance(run_response, RunOutput) and run_response.images:
    for image_response in run_response.images:
        image_bytes = image_response.content
        if image_bytes:
            image = Image.open(BytesIO(image_bytes))
            image.save("generated_image.png")
```

Read more about image generation here.
Grounding and Search
Gemini models support grounding and search capabilities that enable real-time web access. See more details in Google's documentation.
Enable web search by setting search=True:
```python
from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-exp", search=True),
    markdown=True,
)

agent.print_response("What are the latest developments in AI?")
```

For legacy models, use grounding=True instead:
```python
from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(
        id="gemini-2.0-flash",
        grounding=True,
        grounding_dynamic_threshold=0.7,  # Optional: set threshold
    ),
)
```

Read more about search and grounding here.
Vertex AI Search
Search over your private knowledge base using Vertex AI. See Vertex AI Search documentation for setup details.
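The datastore is identified by a full resource path, as the example in this section shows. The helper below is a hypothetical convenience for assembling that path; the pattern is inferred from the example string on this page, and the `global` location and `default_collection` defaults are assumptions taken from it.

```python
def datastore_path(
    project_id: str,
    datastore_id: str,
    location: str = "global",
    collection: str = "default_collection",
) -> str:
    """Build a Vertex AI Search datastore path in the shape used on this page."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/{collection}/dataStores/{datastore_id}"
    )
```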
```python
from kern.agent import Agent
from kern.models.google import Gemini

datastore_id = "projects/your-project-id/locations/global/collections/default_collection/dataStores/your-datastore-id"

agent = Agent(
    model=Gemini(
        id="gemini-2.5-flash",
        vertexai=True,
        vertexai_search=True,
        vertexai_search_datastore=datastore_id,
    ),
    markdown=True,
)

agent.print_response("What are our company's policies regarding remote work?")
```

URL Context
Extract and analyze content from URLs. See Google's URL context documentation for more details.
```python
from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.5-flash", url_context=True),
    markdown=True,
)

url1 = "https://www.foodnetwork.com/recipes/ina-garten/perfect-roast-chicken-recipe-1940592"
url2 = "https://www.allrecipes.com/recipe/83557/juicy-roasted-chicken/"

agent.print_response(
    f"Compare the ingredients and cooking times from the recipes at {url1} and {url2}"
)
```

Read more about URL context here.
File Search
Gemini's File Search enables RAG over your documents with automatic chunking and retrieval. See Google's File Search documentation for more details.
```python
from pathlib import Path
from kern.agent import Agent
from kern.models.google import Gemini

model = Gemini(id="gemini-2.5-flash")
agent = Agent(model=model, markdown=True)

# Create a File Search store and upload documents
store = model.create_file_search_store(display_name="My Docs")
operation = model.upload_to_file_search_store(
    file_path=Path("documents/sample.txt"),
    store_name=store.name,
    display_name="Sample Document",
)
model.wait_for_operation(operation)

# Configure model to use File Search
model.file_search_store_names = [store.name]

# Query the documents
run = agent.run("What are the key points in the document?")
print(run.content)

# Cleanup
model.delete_file_search_store(store.name)
```

Speech Generation
Generate audio responses from the model. See Google's speech generation documentation for available voices and options.
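If the returned audio is raw PCM, it needs a WAV header before most players can open it. The stdlib sketch below shows one way such a file could be written. It is not the implementation of any Kern helper, `write_pcm_as_wav` is a hypothetical name, and the 24 kHz, 16-bit, mono defaults are assumptions about the audio format that should be verified against Google's speech generation documentation.

```python
import wave


def write_pcm_as_wav(
    path: str,
    pcm: bytes,
    channels: int = 1,
    rate: int = 24000,
    sample_width: int = 2,
) -> None:
    """Wrap raw PCM bytes in a WAV container and write them to path."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)      # 1 = mono (assumed)
        wf.setsampwidth(sample_width)  # 2 bytes = 16-bit samples (assumed)
        wf.setframerate(rate)          # samples per second (assumed)
        wf.writeframes(pcm)
```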
```python
from kern.agent import Agent
from kern.models.google import Gemini
from kern.utils.audio import write_wav_audio_to_file

agent = Agent(
    model=Gemini(
        id="gemini-2.5-flash-preview-tts",
        response_modalities=["AUDIO"],
        speech_config={
            "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}
        },
    )
)

run_output = agent.run("Say cheerfully: Have a wonderful day!")

if run_output.response_audio is not None:
    audio_data = run_output.response_audio.content
    write_wav_audio_to_file("tmp/cheerful_greeting.wav", audio_data)
```

Context Caching
Cache large contexts to reduce costs and latency. See Google's context caching documentation for more details.
```python
from kern.agent import Agent
from kern.models.google import Gemini
from google import genai

client = genai.Client()

# Upload file and create cache
txt_file = client.files.upload(file="large_document.txt")
cache = client.caches.create(
    model="gemini-2.0-flash-001",
    config={
        "system_instruction": "You are an expert at analyzing transcripts.",
        "contents": [txt_file],
        "ttl": "300s",
    },
)

# Use the cached content - no need to resend the file
agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001", cached_content=cache.name),
)
run_output = agent.run("Find a lighthearted moment from this transcript")
```

Thinking Models
Gemini 2.5+ models support extended thinking for complex reasoning tasks. See Google's thinking documentation for more details.
```python
from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-2.5-pro", thinking_budget=1280, include_thoughts=True),
    markdown=True,
)

agent.print_response("Solve this logic puzzle...")
```

You can also use thinking_level for simpler control:
```python
from kern.agent import Agent
from kern.models.google import Gemini

agent = Agent(
    model=Gemini(id="gemini-3-pro-preview", thinking_level="low"),  # "low" or "high"
    markdown=True,
)
```

Read more about thinking models here.
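The two controls can be selected from one place. The sketch below is a hypothetical helper, not part of Kern, that returns keyword arguments for `Gemini(...)`. It follows the pairing used in the examples on this page (`thinking_level` with `gemini-3-pro-preview`, `thinking_budget` otherwise), which is an assumption rather than a documented rule. The `"high"` budget of 8192 is an illustrative value only.

```python
from typing import Dict, Union

# Illustrative budgets: 1280 comes from the example on this page, 8192 is made up.
BUDGETS = {"low": 1280, "high": 8192}


def thinking_kwargs(model_id: str, effort: str) -> Dict[str, Union[str, int]]:
    """Return thinking-related keyword arguments for the given model and effort."""
    if effort not in BUDGETS:
        raise ValueError('effort must be "low" or "high"')
    # Assumption: Gemini 3 models take thinking_level, earlier ones a token budget.
    if model_id.startswith("gemini-3"):
        return {"thinking_level": effort}
    return {"thinking_budget": BUDGETS[effort]}
```

The result could then be unpacked as `Gemini(id=model_id, **thinking_kwargs(model_id, "low"))`.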
Structured Outputs
Gemini supports native structured outputs using Pydantic models:
```python
from kern.agent import Agent
from kern.models.google import Gemini
from pydantic import BaseModel

class MovieScript(BaseModel):
    name: str
    genre: str
    storyline: str

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001"),
    output_schema=MovieScript,
)
```

Read more about structured outputs here.
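Conceptually, a structured output is JSON that conforms to the schema and is then validated into a typed object. The stdlib-only sketch below illustrates that idea with a dataclass standing in for the Pydantic model. It is not how Kern or Pydantic performs validation, and `parse_movie_script` is a hypothetical helper.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class MovieScript:
    name: str
    genre: str
    storyline: str


def parse_movie_script(raw: str) -> MovieScript:
    """Parse a JSON string and check that it has exactly the string fields we expect."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = [f.name for f in fields(MovieScript)]
    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for name in expected:
        if not isinstance(data[name], str):
            raise ValueError(f"field {name!r} must be a string")
    # Extra keys are ignored; only the declared fields are kept.
    return MovieScript(**{name: data[name] for name in expected})
```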
Tool Use
Gemini supports function calling to interact with external tools and APIs:
```python
from kern.agent import Agent
from kern.models.google import Gemini
from kern.tools.hackernews import HackerNewsTools

agent = Agent(
    model=Gemini(id="gemini-2.0-flash-001"),
    tools=[HackerNewsTools()],
    markdown=True,
)

agent.print_response("What's happening in France?")
```

Read more about tool use here.
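In function calling, the model returns the name of a tool and its arguments, and the host program runs the matching function and sends the result back. The sketch below shows only that dispatch step in plain Python. It is a conceptual illustration, not Kern's implementation. The `get_top_stories` tool, the `TOOLS` registry, and the shape of the `call` dictionary are all hypothetical.

```python
import json
from typing import Any, Callable, Dict


def get_top_stories(limit: int = 3) -> str:
    """Hypothetical tool: returns canned story titles instead of calling an API."""
    stories = ["Story A", "Story B", "Story C", "Story D"]
    return json.dumps(stories[:limit])


# Registry mapping tool names (as the model would refer to them) to functions.
TOOLS: Dict[str, Callable[..., str]] = {"get_top_stories": get_top_stories}


def dispatch(call: Dict[str, Any]) -> str:
    """Run the tool named in call["name"] with the keyword arguments in call["args"]."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("args", {}))
```

Returning tool results as JSON strings keeps them easy to pass back to the model as text.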
Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| id | str | "gemini-2.0-flash-001" | The id of the Gemini model to use |
| name | str | "Gemini" | The name of the model |
| provider | str | "Google" | The provider of the model |
| api_key | Optional[str] | None | Google API key (defaults to GOOGLE_API_KEY env var) |
| vertexai | bool | False | Use Vertex AI instead of AI Studio |
| project_id | Optional[str] | None | Google Cloud project ID for Vertex AI |
| location | Optional[str] | None | Google Cloud region for Vertex AI |
| temperature | Optional[float] | None | Controls randomness in the model's output |
| top_p | Optional[float] | None | Controls diversity via nucleus sampling |
| top_k | Optional[int] | None | Controls diversity via top-k sampling |
| max_output_tokens | Optional[int] | None | Maximum number of tokens to generate |
| stop_sequences | Optional[list[str]] | None | Sequences where the model should stop generating |
| seed | Optional[int] | None | Random seed for reproducibility |
| logprobs | Optional[bool] | None | Whether to return log probabilities of output tokens |
| presence_penalty | Optional[float] | None | Penalizes new tokens based on whether they appear in the text so far |
| frequency_penalty | Optional[float] | None | Penalizes new tokens based on their frequency in the text so far |
| search | bool | False | Enable Google Search grounding |
| grounding | bool | False | Enable legacy grounding (use search for 2.0+) |
| grounding_dynamic_threshold | Optional[float] | None | Dynamic threshold for grounding |
| url_context | bool | False | Enable URL context extraction |
| vertexai_search | bool | False | Enable Vertex AI Search |
| vertexai_search_datastore | Optional[str] | None | Vertex AI Search datastore path |
| file_search_store_names | Optional[list[str]] | None | File Search store names for RAG |
| file_search_metadata_filter | Optional[str] | None | Metadata filter for File Search |
| response_modalities | Optional[list[str]] | None | Output types: "TEXT", "IMAGE", "AUDIO" |
| speech_config | Optional[dict] | None | TTS voice configuration |
| thinking_budget | Optional[int] | None | Token budget for reasoning (Gemini 2.5+) |
| include_thoughts | Optional[bool] | None | Include thought summaries in response |
| thinking_level | Optional[str] | None | Thinking intensity: "low" or "high" |
| cached_content | Optional[Any] | None | Reference to cached context |
| safety_settings | Optional[list] | None | Content safety configuration |
| function_declarations | Optional[List[Any]] | None | List of function declarations for the model |
| generation_config | Optional[Any] | None | Custom generation configuration |
| generative_model_kwargs | Optional[Dict[str, Any]] | None | Additional keyword arguments for the generative model |
| request_params | Optional[Dict[str, Any]] | None | Additional parameters for the request |
| timeout | Optional[float] | None | Request timeout in seconds |
| client_params | Optional[Dict[str, Any]] | None | Additional parameters for client configuration |
Gemini is a subclass of the Model class and has access to the same params.