Translation Agent

This example demonstrates how to create an intelligent translation agent that goes beyond simple text translation. The agent:

Translates text from one language to another
Analyzes emotional content in the translated text
Selects appropriate voices based on language and emotion
Creates localized voices using Cartesia's voice localization tools
Generates audio output with emotion-appropriate voice characteristics

The agent uses a step-by-step approach to ensure high-quality translation and voice generation, making it ideal for creating localized content that maintains the emotional tone of the original text.
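
The voice-selection step can be pictured as a simple filter over the voice list. The sketch below is only an illustration of that idea, not the agent's actual logic (the model performs this choice itself from the tool output). The record fields `id`, `language`, and `description`, the sample voices, and the helper name `select_voice` are all assumptions made for the example; the real `list_voices` result may be shaped differently.

```python
from typing import Optional

# Hypothetical voice records; the real 'list_voices' output may use other fields.
VOICES = [
    {"id": "voice-en-1", "language": "en", "description": "Calm narrator"},
    {"id": "voice-fr-1", "language": "fr", "description": "Neutral announcer"},
    {"id": "voice-fr-2", "language": "fr", "description": "Cheerful, happy friend"},
]


def select_voice(voices: list[dict], language: str, emotion: str) -> Optional[str]:
    """Return the id of a voice in `language`, preferring one whose
    description mentions `emotion`. Returns None if no voice matches the language."""
    candidates = [v for v in voices if v.get("language") == language]
    if not candidates:
        return None
    for voice in candidates:
        if emotion.lower() in voice.get("description", "").lower():
            return voice["id"]
    # No description mentions the emotion: fall back to the first language match.
    return candidates[0]["id"]


print(select_voice(VOICES, "fr", "happy"))
```

A keyword match on the description is a deliberately crude stand-in for "best reflects the analyzed emotion"; the point is only that the language filter is a hard constraint and the emotion match is a preference.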

Code

from textwrap import dedent

from kern.agent import Agent
from kern.models.openai import OpenAIResponses
from kern.tools.cartesia import CartesiaTools
from kern.utils.media import save_audio

# Step-by-step instructions the model follows for every request.
agent_instructions = dedent(
    """Follow these steps SEQUENTIALLY to translate text and generate a localized voice note:
    1. Identify the text to translate and the target language from the user request.
    2. Translate the text accurately to the target language. Keep this translated text for the final audio generation step.
    3. Analyze the emotion conveyed by the *translated* text (e.g., neutral, happy, sad, angry, etc.).
    4. Determine the standard 2-letter language code for the target language (e.g., 'fr' for French, 'es' for Spanish).
    5. Call the 'list_voices' tool to get a list of available Cartesia voices. Wait for the result.
    6. Examine the list of voices from the 'list_voices' result. Select the 'id' of an *existing* voice that:
       a) Matches the target language code (from step 4).
       b) Best reflects the analyzed emotion (from step 3).
       This 'id' is the 'base_voice_id'.
    7. Call the 'localize_voice' tool to create a new voice. Provide the following arguments:
       - 'voice_id': The 'base_voice_id' selected in step 6.
       - 'name': A suitable name for the new voice (e.g., "French Happy Female").
       - 'description': A description reflecting the language and emotion.
       - 'language': The target language code (from step 4).
       - 'original_speaker_gender': User specified gender or the selected base voice gender.
       Wait for the result of this tool call.
    8. Check the result of the 'localize_voice' tool call from step 7:
       a) If the call was successful and returned the details of the newly created voice, extract the 'id' of this **new** voice. This is the 'final_voice_id'.
    9. Call the 'text_to_speech' tool to generate the audio. Provide:
        - 'transcript': The translated text from step 2.
        - 'voice_id': The 'final_voice_id' determined in step 8.
    """
)

agent = Agent(
    name="Emotion-Aware Translator Agent",
    description="Translates text, analyzes emotion, selects a suitable voice, creates a localized voice, and generates a voice note (audio file) using Cartesia TTS tools.",
    instructions=agent_instructions,
    model=OpenAIResponses(id="gpt-5.2"),
    # voice_localize_enabled exposes the 'localize_voice' tool used in step 7.
    tools=[CartesiaTools(voice_localize_enabled=True)],
)

agent.print_response(
    "Convert this phrase 'hello! how are you? Tell me more about the weather in Paris?' to French and create a voice note"
)
response = agent.get_last_run_output()

# Save the first generated audio artifact, if the run produced one.
print("\nChecking for Audio Artifacts on Agent...")
if response and response.audio:
    save_audio(
        base64_data=response.audio[0].base64_audio, output_path="tmp/greeting.mp3"
    )
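
The script writes to `tmp/greeting.mp3`, and this page does not say whether `save_audio` creates a missing `tmp` directory. If that turns out to be a problem, the same save step can be sketched with the standard library alone. The helper name `write_audio` is hypothetical and is not part of the library; it assumes the audio arrives as a base64-encoded string, as `base64_audio` suggests.

```python
import base64
from pathlib import Path


def write_audio(base64_data: str, output_path: str) -> Path:
    """Decode base64 audio and write it to output_path.

    Hypothetical stand-in for the save step; creates parent folders if needed.
    """
    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(base64.b64decode(base64_data))
    return path
```

It could be called with the same arguments as in the script, for example `write_audio(response.audio[0].base64_audio, "tmp/greeting.mp3")`.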

Usage

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Set your API key

export OPENAI_API_KEY=xxx
export CARTESIA_API_KEY=xxx
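
Before running the agent, it can help to confirm that both variables are actually visible to Python. The check below is a minimal sketch using only the standard library; the helper name `missing_keys` is made up for this example and is not part of any library used here.

```python
import os
from collections.abc import Mapping

# The two variables set in the step above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "CARTESIA_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```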

Install dependencies

uv pip install -U kern-ai openai cartesia

Run Agent

python cookbook/01_showcase/01_agents/translation_agent.py
