OpenAI

OpenAITools lets an Agent use OpenAI models for audio transcription, image generation, and text-to-speech.

Prerequisites

Before using OpenAITools, ensure you have the openai library installed and your OpenAI API key configured.

  1. Install dependencies:

    uv pip install -U openai

  2. Set your API key: Obtain your API key from OpenAI and set it as an environment variable.

    macOS/Linux:

    export OPENAI_API_KEY=xxx

    Windows:

    setx OPENAI_API_KEY xxx
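
A missing key tends to surface only when the first tool call fails, so it can help to check the environment before building the agent. The following is a minimal standard-library sketch; `require_openai_api_key` is a hypothetical helper name used for illustration and is not part of kern or the openai package.

```python
import os


def require_openai_api_key(env=None):
    """Return the OpenAI API key from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of kern or openai.
    """
    # Allow a mapping to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Set it in your shell before creating the agent."
        )
    return key
```

Calling `require_openai_api_key()` at the top of a script fails fast with a readable message instead of an authentication error in the middle of a run.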

Initialization

Import OpenAITools and add it to your Agent's tool list.

from kern.agent import Agent
from kern.tools.openai import OpenAITools

agent = Agent(
    name="OpenAI Agent",
    tools=[OpenAITools()],
    markdown=True,
)

Usage Examples

1. Transcribing Audio

This example demonstrates an agent that transcribes an audio file.

from pathlib import Path
from kern.agent import Agent
from kern.tools.openai import OpenAITools
from kern.utils.media import download_file

# Sample recording hosted in a public bucket
audio_url = "https://kern-public.s3.amazonaws.com/demo_data/sample_conversation.wav"

# Download the recording to a local path the agent can read
local_audio_path = Path("tmp/sample_conversation.wav")
download_file(audio_url, local_audio_path)

agent = Agent(
    name="OpenAI Transcription Agent",
    tools=[OpenAITools(transcription_model="whisper-1")],
    markdown=True,
)

# Ask the agent to transcribe the downloaded file
agent.print_response(f"Transcribe the audio file located at '{local_audio_path}'")
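
Because the prompt only carries a file path, a bad path or an unsupported file type may not be noticed until the transcription call fails. The sketch below checks a local file first. The suffix list and the 25 MB limit are assumptions based on OpenAI's published transcription limits and should be verified against the current OpenAI documentation; `check_audio_file` is a hypothetical helper, not a kern function.

```python
from pathlib import Path

# Assumed values taken from OpenAI's published transcription limits;
# verify them against the current OpenAI documentation before relying on them.
SUPPORTED_AUDIO_SUFFIXES = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}
MAX_AUDIO_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Validate a local audio file and return it as a Path (hypothetical helper)."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    if path.suffix.lower() not in SUPPORTED_AUDIO_SUFFIXES:
        raise ValueError(f"Unsupported audio type '{path.suffix}' for {path}")
    if path.stat().st_size > MAX_AUDIO_BYTES:
        raise ValueError(f"Audio file exceeds {MAX_AUDIO_BYTES} bytes: {path}")
    return path
```

In the example above, `check_audio_file(local_audio_path)` could run right after the download and before `agent.print_response(...)`.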

2. Generating Images

This example demonstrates an agent that generates an image based on a text prompt.

from kern.agent import Agent
from kern.tools.openai import OpenAITools
from kern.utils.media import save_base64_data

agent = Agent(
    name="OpenAI Image Generation Agent",
    tools=[OpenAITools(image_model="dall-e-3")],
    markdown=True,
)

# Run the agent and keep the response so the generated image can be saved
response = agent.run("Generate a photorealistic image of a cozy coffee shop interior")

# Save the first generated image, if the run produced one
if response.images:
    save_base64_data(response.images[0].content, "tmp/coffee_shop.png")
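
The example relies on `save_base64_data` from kern, whose implementation is not shown on this page. As a rough mental model only, a helper of this kind decodes a base64 payload and writes the resulting bytes to disk. The standard-library stand-in below, `write_base64_file`, is a hypothetical name; the real kern helper may behave differently (for example in how it handles missing directories).

```python
import base64
from pathlib import Path


def write_base64_file(base64_data, output_path):
    """Decode a base64 string (or bytes) and write the raw bytes to output_path.

    Hypothetical stand-in for illustration; not kern's save_base64_data.
    """
    path = Path(output_path)
    # Create the target directory (e.g. "tmp/") if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(base64_data, validate=True)
    path.write_bytes(raw)
    return path
```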

3. Generating Speech

This example demonstrates an agent that generates speech from text.

from kern.agent import Agent
from kern.tools.openai import OpenAITools
from kern.utils.media import save_base64_data

agent = Agent(
    name="OpenAI Speech Agent",
    tools=[
        OpenAITools(
            text_to_speech_model="tts-1",
            text_to_speech_voice="alloy",
            text_to_speech_format="mp3",
        )
    ],
    markdown=True,
)

response = agent.run("Generate audio for the text: 'Hello, this is a synthesized voice example.'")

# Save the first audio clip, if the run produced one
if response and response.audio:
    save_base64_data(response.audio[0].base64_audio, "tmp/hello.mp3")

Customization

You can customize the underlying OpenAI models used for transcription, image generation, and TTS:

OpenAITools(
    transcription_model="whisper-1",
    image_model="dall-e-3",
    text_to_speech_model="tts-1-hd",
    text_to_speech_voice="nova",
    text_to_speech_format="wav",
)
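
When `text_to_speech_format` changes, the file extension used when saving the audio should change with it; the speech example saves `tmp/hello.mp3` because it requests `mp3`, while the configuration above requests `wav`. One way to keep the two in sync is to derive the path from the format. `speech_output_path` is a hypothetical helper written for this page, not a kern function.

```python
from pathlib import Path


def speech_output_path(stem, audio_format, directory="tmp"):
    """Build an output path whose extension matches the requested audio format."""
    # Accept "wav" as well as ".wav" and normalise the case.
    suffix = "." + audio_format.lower().lstrip(".")
    return Path(directory) / f"{stem}{suffix}"
```

With the configuration above, `speech_output_path("hello", "wav")` yields `tmp/hello.wav`, which can be passed to `save_base64_data` in place of a hard-coded file name.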

Toolkit Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | None | OpenAI API key. Uses OPENAI_API_KEY env var if not provided |
| enable_transcription | bool | True | Enable audio transcription functionality |
| enable_image_generation | bool | True | Enable image generation functionality |
| enable_speech_generation | bool | True | Enable speech generation functionality |
| all | bool | False | Enable all tools when set to True |
| transcription_model | str | whisper-1 | Model to use for audio transcription |
| text_to_speech_voice | str | alloy | Voice to use for text-to-speech (alloy, echo, fable, onyx, nova, shimmer) |
| text_to_speech_model | str | tts-1 | Model to use for text-to-speech (tts-1, tts-1-hd) |
| text_to_speech_format | str | mp3 | Audio format for TTS output (mp3, opus, aac, flac, wav, pcm) |
| image_model | str | dall-e-3 | Model to use for image generation |
| image_quality | str | None | Quality setting for image generation |
| image_size | str | None | Size setting for image generation |
| image_style | str | None | Style setting for image generation (vivid, natural) |
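
A typo in a voice or format name may only show up as an API error at call time. The sketch below validates the text-to-speech options against the values listed in the table before the toolkit is constructed. The allowed sets are copied from this page and may lag behind what OpenAI or the toolkit actually accepts; `validate_speech_options` is a hypothetical helper, not part of kern.

```python
# Allowed values copied from the parameter table on this page.
# Assumption: the toolkit or OpenAI may accept additional values.
TTS_MODELS = {"tts-1", "tts-1-hd"}
TTS_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
TTS_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def validate_speech_options(model="tts-1", voice="alloy", audio_format="mp3"):
    """Check TTS options and return them as keyword arguments (hypothetical helper)."""
    errors = []
    if model not in TTS_MODELS:
        errors.append(f"unknown model '{model}'")
    if voice not in TTS_VOICES:
        errors.append(f"unknown voice '{voice}'")
    if audio_format not in TTS_FORMATS:
        errors.append(f"unknown format '{audio_format}'")
    if errors:
        raise ValueError("; ".join(errors))
    # Keys match the parameter names in the table.
    return {
        "text_to_speech_model": model,
        "text_to_speech_voice": voice,
        "text_to_speech_format": audio_format,
    }
```

The returned dictionary uses the parameter names from the table, so it can be unpacked directly: `OpenAITools(**validate_speech_options(voice="nova", audio_format="wav"))`.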

Toolkit Functions

The OpenAITools toolkit provides the following functions:

| Function | Description |
| --- | --- |
| transcribe_audio | Transcribes audio from a local file path or a public URL |
| generate_image | Generates images based on a text prompt |
| generate_speech | Synthesizes speech from text |
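
The `enable_*` flags and `all` from the parameter table determine which of these functions the agent can call. The sketch below shows one plausible reading of that relationship: a function is exposed when its own flag is set or when `all` is True. This is an assumption inferred from the two tables, not kern's implementation; `enabled_functions` and its `enable_all` argument (standing in for `all`, to avoid shadowing the Python builtin) are hypothetical.

```python
def enabled_functions(
    enable_transcription=True,
    enable_image_generation=True,
    enable_speech_generation=True,
    enable_all=False,
):
    """Return the function names exposed for a given set of flags.

    Assumed semantics for illustration only; not kern's implementation.
    """
    # Each entry pairs a flag with the function name it controls.
    flags = [
        (enable_transcription, "transcribe_audio"),
        (enable_image_generation, "generate_image"),
        (enable_speech_generation, "generate_speech"),
    ]
    return [name for flag, name in flags if flag or enable_all]
```

Under this reading, an agent that should only transcribe would disable the other two flags and leave `all` at its default of False.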

Developer Resources