Translation Agent
This example demonstrates how to create an intelligent translation agent that goes beyond simple text translation. The agent:
- Translates text from one language to another
- Analyzes emotional content in the translated text
- Selects appropriate voices based on language and emotion
- Creates localized voices using Cartesia's voice localization tools
- Generates audio output with emotion-appropriate voice characteristics
The agent follows its instructions strictly in order, so that translation, emotion analysis, voice selection, voice localization, and audio generation each build on the previous result. This makes it suited to producing localized audio that keeps the emotional tone of the original text.
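In the agent itself, voice selection is performed by the model from the 'list_voices' tool result. To make that step concrete, here is a minimal sketch of the same idea in plain Python. The voice records, the `EMOTION_HINTS` table, and the `pick_base_voice` helper are all hypothetical and exist only for illustration; the real tool result may use different fields.

```python
from typing import Optional

# Hypothetical voice records; the real 'list_voices' result may look different.
VOICES = [
    {"id": "v1", "language": "fr", "description": "Warm, cheerful female voice"},
    {"id": "v2", "language": "fr", "description": "Calm, neutral male narrator"},
    {"id": "v3", "language": "es", "description": "Energetic, happy male voice"},
]

# Hypothetical keywords that hint at each emotion in a voice description.
EMOTION_HINTS = {
    "happy": ("cheerful", "happy", "energetic", "warm"),
    "sad": ("soft", "gentle", "somber"),
    "neutral": ("neutral", "calm"),
}


def pick_base_voice(voices, language: str, emotion: str) -> Optional[str]:
    """Return the id of the best-matching voice, or None if no voice speaks the language."""
    # Keep only voices in the target language.
    candidates = [v for v in voices if v["language"] == language]
    if not candidates:
        return None

    hints = EMOTION_HINTS.get(emotion, ())

    def score(voice) -> int:
        # Count how many emotion keywords appear in the description.
        text = voice["description"].lower()
        return sum(hint in text for hint in hints)

    # Ties (including an unknown emotion) fall back to the first candidate.
    return max(candidates, key=score)["id"]


if __name__ == "__main__":
    print(pick_base_voice(VOICES, "fr", "happy"))
```

Filtering by language first and scoring by emotion second mirrors the order of the agent's selection criteria: a voice in the wrong language is never chosen, however well its description matches the emotion.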
Code
from textwrap import dedent

from kern.agent import Agent
from kern.models.openai import OpenAIResponses
from kern.tools.cartesia import CartesiaTools
from kern.utils.media import save_audio

# Step-by-step instructions the model follows for every request.
agent_instructions = dedent(
    """Follow these steps SEQUENTIALLY to translate text and generate a localized voice note:
    1. Identify the text to translate and the target language from the user request.
    2. Translate the text accurately to the target language. Keep this translated text for the final audio generation step.
    3. Analyze the emotion conveyed by the *translated* text (e.g., neutral, happy, sad, angry, etc.).
    4. Determine the standard 2-letter language code for the target language (e.g., 'fr' for French, 'es' for Spanish).
    5. Call the 'list_voices' tool to get a list of available Cartesia voices. Wait for the result.
    6. Examine the list of voices from the 'list_voices' result. Select the 'id' of an *existing* voice (this is the 'base_voice_id') that:
       a) Matches the target language code (from step 4).
       b) Best reflects the analyzed emotion (from step 3).
    7. Call the 'localize_voice' tool to create a new voice. Provide the following arguments:
       - 'voice_id': The 'base_voice_id' selected in step 6.
       - 'name': A suitable name for the new voice (e.g., "French Happy Female").
       - 'description': A description reflecting the language and emotion.
       - 'language': The target language code (from step 4).
       - 'original_speaker_gender': User specified gender or the selected base voice gender.
       Wait for the result of this tool call.
    8. Check the result of the 'localize_voice' tool call from step 7:
       a) If the call was successful and returned the details of the newly created voice, extract the 'id' of this **new** voice. This is the 'final_voice_id'.
    9. Call the 'text_to_speech' tool to generate the audio. Provide:
       - 'transcript': The translated text from step 2.
       - 'voice_id': The 'final_voice_id' determined in step 8.
    """
)

agent = Agent(
    name="Emotion-Aware Translator Agent",
    description="Translates text, analyzes emotion, selects a suitable voice, creates a localized voice, and generates a voice note (audio file) using Cartesia TTS tools.",
    instructions=agent_instructions,
    model=OpenAIResponses(id="gpt-5.2"),
    tools=[CartesiaTools(voice_localize_enabled=True)],
)

agent.print_response(
    "Convert this phrase 'hello! how are you? Tell me more about the weather in Paris?' to French and create a voice note"
)
response = agent.get_last_run_output()

# Save the first audio artifact, if the run produced one.
print("\nChecking for Audio Artifacts on Agent...")
if response.audio:
    save_audio(
        base64_data=response.audio[0].base64_audio, output_path="tmp/greeting.mp3"
    )

Usage
Set up your virtual environment
macOS/Linux:
uv venv --python 3.12
source .venv/bin/activate

Windows:
uv venv --python 3.12
.venv\Scripts\activate

Set your API key
export OPENAI_API_KEY=xxx
export CARTESIA_API_KEY=xxx

Install dependencies
uv pip install -U kern-ai openai cartesia

Run Agent
python cookbook/01_showcase/01_agents/translation_agent.py
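If the run fails with an authentication error, it can help to confirm that both keys from the "Set your API key" step are visible to Python. The following is a minimal sketch using only the standard library; `missing_keys` is a hypothetical helper and not part of kern.

```python
import os

# The two environment variables named in the "Set your API key" step.
REQUIRED_KEYS = ("OPENAI_API_KEY", "CARTESIA_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Run it in the same shell session in which the keys were exported, since variables set with `export` do not carry over to new terminals.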