OpenAI

OpenAITools lets an Agent use OpenAI models for audio transcription, image generation, and text-to-speech.

Prerequisites

Before using OpenAITools, ensure you have the openai library installed and your OpenAI API key configured.

Install dependencies:
```
uv pip install -U openai
```
Set your API key: obtain an API key from OpenAI and set it as the `OPENAI_API_KEY` environment variable. On macOS or Linux:
```
export OPENAI_API_KEY=xxx
```
On Windows:
```
setx OPENAI_API_KEY xxx
```
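
Before constructing an agent, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch using only the standard library; the helper name `require_openai_api_key` is hypothetical and is not part of kern or the `openai` package.

```python
import os


def require_openai_api_key(env=None):
    """Return the OpenAI API key from the environment or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    Illustrative helper only, not part of kern.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Set it in your shell before running the agent."
        )
    return key
```

On Windows, `setx` only affects terminals opened after the command runs, so open a new terminal before running a check like this.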

Initialization

Import OpenAITools and add it to your Agent's tool list.

```
from kern.agent import Agent
from kern.tools.openai import OpenAITools

agent = Agent(
    name="OpenAI Agent",
    tools=[OpenAITools()],
    markdown=True,
)
```

Usage Examples

1. Transcribing Audio

This example demonstrates an agent that transcribes an audio file.

```
from pathlib import Path
from kern.agent import Agent
from kern.tools.openai import OpenAITools
from kern.utils.media import download_file

audio_url = "https://kern-public.s3.amazonaws.com/demo_data/sample_conversation.wav"

# Download the sample audio to a local file, creating tmp/ if needed
local_audio_path = Path("tmp/sample_conversation.wav")
local_audio_path.parent.mkdir(parents=True, exist_ok=True)
download_file(audio_url, local_audio_path)

agent = Agent(
    name="OpenAI Transcription Agent",
    tools=[OpenAITools(transcription_model="whisper-1")],
    markdown=True,
)

agent.print_response(f"Transcribe the audio file located at '{local_audio_path}'")
```

2. Generating Images

This example demonstrates an agent that generates an image based on a text prompt.

```
from kern.agent import Agent
from kern.tools.openai import OpenAITools
from kern.utils.media import save_base64_data

agent = Agent(
    name="OpenAI Image Generation Agent",
    tools=[OpenAITools(image_model="dall-e-3")],
    markdown=True,
)

response = agent.run("Generate a photorealistic image of a cozy coffee shop interior")

# Save the first generated image, if any were returned
if response.images:
    save_base64_data(response.images[0].content, "tmp/coffee_shop.png")
```

3. Generating Speech

This example demonstrates an agent that generates speech from text.

```
from kern.agent import Agent
from kern.tools.openai import OpenAITools
from kern.utils.media import save_base64_data

agent = Agent(
    name="OpenAI Speech Agent",
    tools=[OpenAITools(
        text_to_speech_model="tts-1",
        text_to_speech_voice="alloy",
        text_to_speech_format="mp3",
    )],
    markdown=True,
)

response = agent.run("Generate audio for the text: 'Hello, this is a synthesized voice example.'")

# Save the first generated audio clip, if any were returned
if response and response.audio:
    save_base64_data(response.audio[0].base64_audio, "tmp/hello.mp3")
```
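
The image and speech examples both hand base64-encoded content to `save_base64_data`. As a rough illustration of what saving such data involves, here is a standard-library sketch. `write_base64_file` is a hypothetical stand-in, and kern's actual `save_base64_data` may behave differently, for instance in how it handles errors or missing directories.

```python
import base64
from pathlib import Path


def write_base64_file(base64_data, output_path):
    """Decode a base64 string (or bytes) and write the raw bytes to output_path.

    Hypothetical helper for illustration; not kern's implementation.
    """
    path = Path(output_path)
    # Create the target directory (for example "tmp/") if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(base64_data, validate=True)
    path.write_bytes(raw)
    return path
```

Decoding before writing matters because the encoded text is roughly a third larger than the binary file and is not itself a valid PNG or MP3.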


Customization

You can customize the underlying OpenAI models used for transcription, image generation, and TTS:

```
OpenAITools(
    transcription_model="whisper-1",
    image_model="dall-e-3",
    text_to_speech_model="tts-1-hd",
    text_to_speech_voice="nova",
    text_to_speech_format="wav",
)
```

Toolkit Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | `None` | OpenAI API key. Uses the `OPENAI_API_KEY` env var if not provided |
| `enable_transcription` | `bool` | `True` | Enable audio transcription functionality |
| `enable_image_generation` | `bool` | `True` | Enable image generation functionality |
| `enable_speech_generation` | `bool` | `True` | Enable speech generation functionality |
| `all` | `bool` | `False` | Enable all tools when set to True |
| `transcription_model` | `str` | `whisper-1` | Model to use for audio transcription |
| `text_to_speech_voice` | `str` | `alloy` | Voice to use for text-to-speech (alloy, echo, fable, onyx, nova, shimmer) |
| `text_to_speech_model` | `str` | `tts-1` | Model to use for text-to-speech (tts-1, tts-1-hd) |
| `text_to_speech_format` | `str` | `mp3` | Audio format for TTS output (mp3, opus, aac, flac, wav, pcm) |
| `image_model` | `str` | `dall-e-3` | Model to use for image generation |
| `image_quality` | `str` | `None` | Quality setting for image generation |
| `image_size` | `str` | `None` | Size setting for image generation |
| `image_style` | `str` | `None` | Style setting for image generation (vivid, natural) |
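
Because the text-to-speech parameters accept only the values listed in the table, a typo can be caught before the agent makes a request. The sketch below is one possible pre-flight check; `check_tts_options` and the three constant sets are hypothetical names, and the allowed values are copied from the table rather than from the toolkit's source, so they may lag behind what OpenAI supports.

```python
# Allowed values copied from the parameter table (an assumption, not read from kern).
TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
TTS_MODELS = {"tts-1", "tts-1-hd"}
TTS_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def check_tts_options(voice="alloy", model="tts-1", audio_format="mp3"):
    """Return keyword arguments for OpenAITools, or raise ValueError on an unlisted value."""
    for label, value, allowed in (
        ("text_to_speech_voice", voice, TTS_VOICES),
        ("text_to_speech_model", model, TTS_MODELS),
        ("text_to_speech_format", audio_format, TTS_FORMATS),
    ):
        if value not in allowed:
            raise ValueError(f"{label}={value!r} is not one of {sorted(allowed)}")
    return {
        "text_to_speech_voice": voice,
        "text_to_speech_model": model,
        "text_to_speech_format": audio_format,
    }
```

The returned dictionary can then be unpacked into the toolkit, for example `OpenAITools(**check_tts_options(voice="nova", model="tts-1-hd", audio_format="wav"))`.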

Toolkit Functions

The OpenAITools toolkit provides the following functions:

| Function | Description |
| --- | --- |
| `transcribe_audio` | Transcribes audio from a local file path or a public URL |
| `generate_image` | Generates images based on a text prompt |
| `generate_speech` | Synthesizes speech from text |
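
Since `transcribe_audio` accepts either a local file path or a public URL, calling code sometimes needs to tell the two apart, for example to decide whether to download a file first. How the toolkit itself distinguishes them is not documented here; the following is only a sketch of one common approach, and `classify_audio_source` is a hypothetical helper.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_audio_source(source):
    """Return ("url", source) for http(s) URLs, otherwise ("path", Path(source))."""
    parsed = urlparse(str(source))
    # Require both an http(s) scheme and a host, so Windows drive letters
    # such as "C:\\audio\\a.wav" are not mistaken for URLs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("url", str(source))
    return ("path", Path(source))
```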