OpenAI
OpenAITools lets an Agent use OpenAI models for audio transcription, image generation, and text-to-speech.
Prerequisites
Before using OpenAITools, ensure you have the openai library installed and your OpenAI API key configured.
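- Install dependencies:

  ```shell
  uv pip install -U openai
  ```

- Set your API key: obtain an API key from OpenAI and set it as an environment variable.

  On macOS/Linux:

  ```shell
  export OPENAI_API_KEY=xxx
  ```

  On Windows (`setx` only affects terminals opened afterwards, not the current one):

  ```shell
  setx OPENAI_API_KEY xxx
  ```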
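The `api_key` toolkit parameter falls back to the `OPENAI_API_KEY` environment variable when it is not provided. The following is a minimal sketch, assuming you want to confirm that the key is visible to Python before creating an agent. `resolve_api_key` is a hypothetical helper that mirrors the documented fallback. It is not part of kern.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise read OPENAI_API_KEY.

    Hypothetical helper mirroring the documented fallback; not kern's code.
    """
    key = api_key or os.environ.get("OPENAI_API_KEY")
    if not key:
        # Fail early with a readable message instead of a later auth error
        raise RuntimeError("No OpenAI API key found: pass api_key or set OPENAI_API_KEY")
    return key
```

Calling `resolve_api_key()` in the same shell session you plan to run the agent from is a quick way to catch a missing or not-yet-exported key.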
Initialization
Import OpenAITools and add it to your Agent's tool list.
```python
from kern.agent import Agent
from kern.tools.openai import OpenAITools

agent = Agent(
    name="OpenAI Agent",
    tools=[OpenAITools()],
    markdown=True,
)
```

Usage Examples
1. Transcribing Audio
This example demonstrates an agent that transcribes an audio file.
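The `transcribe_audio` function accepts either a local file path or a public URL. The sketch below shows how you might tell the two apart before prompting the agent, and how to fail early on a missing local file. Both `classify_audio_source` and `check_local_audio` are hypothetical helpers, not kern APIs. The full example after them downloads a sample recording and passes its local path.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_audio_source(source: str) -> str:
    """Return "url" for http(s) URLs and "path" for everything else.

    Hypothetical helper; Windows paths such as C:\\clip.wav count as paths.
    """
    scheme = urlparse(source).scheme.lower()
    return "url" if scheme in ("http", "https") else "path"


def check_local_audio(source: str) -> Path:
    """Raise FileNotFoundError if a local path does not point at a file."""
    path = Path(source)
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    return path
```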
```python
from pathlib import Path

from kern.agent import Agent
from kern.tools.openai import OpenAITools
from kern.utils.media import download_file

audio_url = "https://kern-public.s3.amazonaws.com/demo_data/sample_conversation.wav"

# Download the sample recording to a local file
local_audio_path = Path("tmp/sample_conversation.wav")
local_audio_path.parent.mkdir(parents=True, exist_ok=True)
download_file(audio_url, local_audio_path)

agent = Agent(
    name="OpenAI Transcription Agent",
    tools=[OpenAITools(transcription_model="whisper-1")],
    markdown=True,
)

# Ask the agent to transcribe the downloaded file
agent.print_response(f"Transcribe the audio file located at '{local_audio_path}'")
```

2. Generating Images
This example demonstrates an agent that generates an image based on a text prompt.
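In the example that follows, the first image's `content` is passed to `kern.utils.media.save_base64_data`, which suggests it holds base64-encoded data. The sketch below shows what such a save step involves, assuming base64 input. `save_base64_file` is a hypothetical stand-in, and the real kern helper may behave differently.

```python
import base64
from pathlib import Path
from typing import Union


def save_base64_file(data: Union[str, bytes], path: Union[str, Path]) -> Path:
    """Decode base64 data and write the raw bytes to path.

    Hypothetical stand-in for save_base64_data; assumes the input is base64.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create tmp/ if needed
    out.write_bytes(base64.b64decode(data))
    return out
```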
```python
from pathlib import Path

from kern.agent import Agent
from kern.tools.openai import OpenAITools
from kern.utils.media import save_base64_data

agent = Agent(
    name="OpenAI Image Generation Agent",
    tools=[OpenAITools(image_model="dall-e-3")],
    markdown=True,
)

response = agent.run("Generate a photorealistic image of a cozy coffee shop interior")

# Save the first generated image, if the run returned any
if response.images:
    Path("tmp").mkdir(exist_ok=True)
    save_base64_data(response.images[0].content, "tmp/coffee_shop.png")
```

3. Generating Speech
This example demonstrates an agent that generates speech from text.
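The speech example sets `text_to_speech_model`, `text_to_speech_voice`, and `text_to_speech_format`, and the Toolkit Params table lists the values documented for each. The sketch below is a hypothetical pre-flight check against those lists, so a typo such as `voice="Alloy"` fails before any API call. OpenAI may support values beyond the ones documented here, so treat the lists as an assumption to keep in sync.

```python
from typing import Dict, Tuple

# Values as listed in the Toolkit Params table.
ALLOWED: Dict[str, Tuple[str, ...]] = {
    "text_to_speech_model": ("tts-1", "tts-1-hd"),
    "text_to_speech_voice": ("alloy", "echo", "fable", "onyx", "nova", "shimmer"),
    "text_to_speech_format": ("mp3", "opus", "aac", "flac", "wav", "pcm"),
}


def validate_tts_settings(**settings: str) -> Dict[str, str]:
    """Check TTS keyword arguments and return them unchanged if valid.

    Hypothetical helper; not part of kern.
    """
    for name, value in settings.items():
        allowed = ALLOWED.get(name)
        if allowed is None:
            raise KeyError(f"Unknown TTS setting: {name}")
        if value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return settings
```

Because it returns the settings dict, it can wrap the toolkit arguments directly, e.g. `OpenAITools(**validate_tts_settings(text_to_speech_voice="alloy"))`.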
```python
from pathlib import Path

from kern.agent import Agent
from kern.tools.openai import OpenAITools
from kern.utils.media import save_base64_data

agent = Agent(
    name="OpenAI Speech Agent",
    tools=[
        OpenAITools(
            text_to_speech_model="tts-1",
            text_to_speech_voice="alloy",
            text_to_speech_format="mp3",
        )
    ],
    markdown=True,
)

response = agent.run("Generate audio for the text: 'Hello, this is a synthesized voice example.'")

# Save the first audio clip, if the run produced one
if response and response.audio:
    Path("tmp").mkdir(exist_ok=True)
    save_base64_data(response.audio[0].base64_audio, "tmp/hello.mp3")
```

Customization
You can customize the underlying OpenAI models used for transcription, image generation, and TTS:
```python
from kern.tools.openai import OpenAITools

OpenAITools(
    transcription_model="whisper-1",
    image_model="dall-e-3",
    text_to_speech_model="tts-1-hd",
    text_to_speech_voice="nova",
    text_to_speech_format="wav",
)
```

Toolkit Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | None | OpenAI API key. Uses OPENAI_API_KEY env var if not provided |
| enable_transcription | bool | True | Enable audio transcription functionality |
| enable_image_generation | bool | True | Enable image generation functionality |
| enable_speech_generation | bool | True | Enable speech generation functionality |
| all | bool | False | Enable all tools when set to True |
| transcription_model | str | whisper-1 | Model to use for audio transcription |
| text_to_speech_voice | str | alloy | Voice to use for text-to-speech (alloy, echo, fable, onyx, nova, shimmer) |
| text_to_speech_model | str | tts-1 | Model to use for text-to-speech (tts-1, tts-1-hd) |
| text_to_speech_format | str | mp3 | Audio format for TTS output (mp3, opus, aac, flac, wav, pcm) |
| image_model | str | dall-e-3 | Model to use for image generation |
| image_quality | str | None | Quality setting for image generation |
| image_size | str | None | Size setting for image generation |
| image_style | str | None | Style setting for image generation (vivid, natural) |
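`image_quality`, `image_size`, and `image_style` all default to `None`. A reasonable reading is that `None` leaves the setting to the API's default. The sketch below assumes that reading and collects only the settings you set explicitly, checking `image_style` against the two documented values. `build_image_options` is hypothetical. Its key names are chosen to mirror the OpenAI Images API arguments (`quality`, `size`, `style`), not taken from kern internals.

```python
from typing import Any, Dict, Optional

IMAGE_STYLES = ("vivid", "natural")  # values listed for image_style


def build_image_options(
    model: str = "dall-e-3",
    quality: Optional[str] = None,
    size: Optional[str] = None,
    style: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect image settings, omitting any that were left as None.

    Hypothetical helper: assumes None means "use the API's default".
    """
    if style is not None and style not in IMAGE_STYLES:
        raise ValueError(f"image_style must be one of {IMAGE_STYLES}, got {style!r}")
    options: Dict[str, Any] = {"model": model}
    for key, value in (("quality", quality), ("size", size), ("style", style)):
        if value is not None:  # only forward explicitly chosen settings
            options[key] = value
    return options
```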
Toolkit Functions
The OpenAITools toolkit provides the following functions:
| Function | Description |
|---|---|
| transcribe_audio | Transcribes audio from a local file path or a public URL |
| generate_image | Generates images based on a text prompt |
| generate_speech | Synthesizes speech from text |
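The `enable_*` flags and `all` control which of these functions an agent can call. The sketch below models that mapping, assuming that `all=True` turns every function on regardless of the individual flags, which is how "Enable all tools when set to True" reads. `enabled_functions` is hypothetical, and `enable_all` stands in for `all` to avoid shadowing Python's builtin.

```python
from typing import List


def enabled_functions(
    enable_transcription: bool = True,
    enable_image_generation: bool = True,
    enable_speech_generation: bool = True,
    enable_all: bool = False,
) -> List[str]:
    """Return the function names a given flag combination would expose.

    Hypothetical model of the documented flags; not kern's registration code.
    """
    flags = {
        "transcribe_audio": enable_transcription,
        "generate_image": enable_image_generation,
        "generate_speech": enable_speech_generation,
    }
    return [name for name, on in flags.items() if enable_all or on]
```

Under this reading, `OpenAITools(enable_image_generation=False, enable_speech_generation=False)` would leave the agent with only `transcribe_audio`, which keeps a transcription-only agent from generating images or audio.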
Developer Resources
- View Tools
- View OpenAI Transcription Guide
- View OpenAI Image Generation Guide
- View OpenAI Text-to-Speech Guide