Audio Input (Bytes Content)

Code

import requests
from kern.agent import Agent
from kern.media import Audio
from kern.models.google import Gemini

# Create an agent backed by a Gemini model; responses are rendered as markdown
agent = Agent(
    model=Gemini(id="gemini-2.0-flash-exp"),
    markdown=True,
)

url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"

# Download the audio file from the URL as bytes
response = requests.get(url)
# Stop here on an HTTP error rather than passing an error page to the model
response.raise_for_status()
audio_content = response.content

# Pass the raw bytes to the agent through the Audio wrapper
agent.print_response(
    "Tell me about this audio",
    audio=[Audio(content=audio_content)],
)
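
Before handing downloaded bytes to the agent, it can help to confirm that they look like a WAV file. The following is a minimal sketch and not part of the original example: `looks_like_wav` and `make_silent_wav` are hypothetical helpers that rely only on the Python standard library. The check inspects the RIFF/WAVE container header and does not validate the audio data itself.

```python
import io
import wave


def looks_like_wav(data: bytes) -> bool:
    """Return True if the bytes start with a RIFF/WAVE container header."""
    # A WAV file begins with b"RIFF", a 4-byte size field, then b"WAVE".
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def make_silent_wav(num_frames: int = 800, sample_rate: int = 8000) -> bytes:
    """Build a small mono 16-bit WAV in memory, used here only for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample, i.e. 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = make_silent_wav()
    print(looks_like_wav(sample))        # generated WAV bytes
    print(looks_like_wav(b"not audio"))  # arbitrary non-audio bytes
```

In the example above, one could call `looks_like_wav(audio_content)` after the download and skip the `agent.print_response` call when it returns False.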

Usage

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Set your API key

export GOOGLE_API_KEY=xxx
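
A script can also fail early with a clear message when the key is missing, instead of surfacing an authentication error later. This is a minimal sketch under stated assumptions: `require_api_key` is a hypothetical helper, not part of kern or the Google SDK, and it only checks that the variable is set and non-empty.

```python
import os
from typing import Mapping, Optional


def require_api_key(
    name: str = "GOOGLE_API_KEY",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the named environment variable, or raise if it is missing or blank."""
    # Default to the real process environment; a mapping can be passed for testing.
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return value
```

Calling `require_api_key()` at the top of the script, before the agent is created, would surface a missing key immediately.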

Install dependencies

uv pip install -U google-genai requests kern-ai

Run Agent

python cookbook/11_models/google/gemini/audio_input_bytes_content.py