Cartesia

Tools for interacting with Cartesia Voice AI services, including text-to-speech and voice localization.

CartesiaTools enable an Agent to perform text-to-speech, list available voices, and localize voices using Cartesia.

Prerequisites

The examples below require the cartesia library and a Cartesia API key.

    uv pip install cartesia

    export CARTESIA_API_KEY="your_api_key_here"
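
Before running the examples, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, not part of the toolkit: `require_api_key` is a hypothetical helper name, and it only checks that the variable is present and non-empty, not that Cartesia accepts the key.

```python
import os


def require_api_key(env=None, name="CARTESIA_API_KEY"):
    """Return the API key from the environment, or raise if it is missing or empty.

    Hypothetical helper for illustration; it does not contact Cartesia.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the examples.")
    return key
```

Calling `require_api_key()` at the top of a script fails fast with a readable message instead of surfacing an authentication error later in the agent run.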

Example

    from kern.agent import Agent
    from kern.tools.cartesia import CartesiaTools
    from kern.utils.audio import write_audio_to_file

    # Initialize Agent with Cartesia tools
    agent = Agent(
        name="Cartesia TTS Agent",
        description="An agent that uses Cartesia for text-to-speech.",
        tools=[CartesiaTools()],
    )

    response = agent.run(
        """Generate a simple greeting using Text-to-Speech:

        Say "Welcome to Cartesia, the advanced speech synthesis platform. This speech is generated by an agent."
        """
    )

    # Save the generated audio
    if response.audio:
        write_audio_to_file(audio=response.audio[0].content, filename="tmp/greeting.mp3")
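
The example writes to `tmp/greeting.mp3`, and this page does not say whether `write_audio_to_file` creates a missing `tmp` directory. The sketch below is a standalone alternative, not the library's implementation: `save_audio` is a hypothetical helper, and it assumes the audio payload is either raw bytes or a base64-encoded string.

```python
import base64
from pathlib import Path


def save_audio(payload, filename):
    """Write audio to `filename`, creating parent directories first.

    Assumption: `payload` is raw bytes or a base64-encoded string.
    """
    # Decode base64 text; pass bytes-like payloads through unchanged.
    data = base64.b64decode(payload) if isinstance(payload, str) else bytes(payload)
    path = Path(filename)
    path.parent.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    path.write_bytes(data)
    return path
```

With this helper, the last line of the example could become `save_audio(response.audio[0].content, "tmp/greeting.mp3")`, under the same assumption about the payload type.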

Advanced Example: Translation and Voice Localization

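This example demonstrates how to translate text, analyze emotion, localize a new voice, and generate a voice note using CartesiaTools. Voice localization is disabled by default, so the toolkit is created with enable_localize_voice=True.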

    from textwrap import dedent
    from kern.agent import Agent
    from kern.models.openai import OpenAIResponses
    from kern.tools.cartesia import CartesiaTools
    from kern.utils.audio import write_audio_to_file

    agent_instructions = dedent(
        """Follow these steps SEQUENTIALLY to translate text and generate a localized voice note:
        1. Identify the text to translate and the target language from the user request.
        2. Translate the text accurately to the target language.
        3. Analyze the emotion conveyed by the translated text.
        4. Call `list_voices` to retrieve available voices.
        5. Select a base voice matching the language and emotion.
        6. Call `localize_voice` to create a new localized voice.
        7. Call `text_to_speech` to generate the final audio.
        """
    )

    agent = Agent(
        name="Emotion-Aware Translator Agent",
        description="Translates text, analyzes emotion, selects a suitable voice, creates a localized voice, and generates a voice note (audio file) using Cartesia TTS tools.",
        instructions=agent_instructions,
        model=OpenAIResponses(id="gpt-5.2"),
        tools=[CartesiaTools(enable_localize_voice=True)],
    )

    agent.print_response(
        "Translate 'Hello! How are you? Tell me more about the weather in Paris?' to French and create a voice note."
    )
    response = agent.run_response

    if response.audio:
        write_audio_to_file(
            response.audio[0].base64_audio,
            filename="french_weather.mp3",
        )
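
In the example, steps 4 and 5 are carried out by the model, but the selection logic can be pictured in plain Python. The sketch below is illustrative only: the record shape (`id`, `name`, `language`) is an assumption and not the documented return format of `list_voices`, `pick_base_voice` is a hypothetical helper, and the sample records are made up. Emotion matching is left out because it depends on the model's judgment.

```python
# Default voice ID as listed under Toolkit Params.
DEFAULT_VOICE_ID = "78ab82d5-25be-4f7d-82b3-7ad64e5b85b2"


def pick_base_voice(voices, language, default_voice_id=DEFAULT_VOICE_ID):
    """Return the id of the first voice whose language matches, else the default.

    Assumption: each voice is a dict with "id" and "language" keys.
    """
    wanted = language.lower()
    for voice in voices:
        if str(voice.get("language", "")).lower() == wanted:
            return voice["id"]
    return default_voice_id


# Made-up records for illustration; real voice data will differ.
sample_voices = [
    {"id": "voice-en-1", "name": "Example English", "language": "en"},
    {"id": "voice-fr-1", "name": "Example French", "language": "fr"},
]

print(pick_base_voice(sample_voices, "fr"))  # voice-fr-1
print(pick_base_voice(sample_voices, "de"))  # falls back to the default voice ID
```

Falling back to the default voice keeps the pipeline moving when no voice matches the target language, which is the case where localizing a voice is most useful.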

Toolkit Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | `None` | The Cartesia API key for authentication. If not provided, uses the `CARTESIA_API_KEY` env variable. |
| `model_id` | `str` | `sonic-2` | The model ID to use for text-to-speech. |
| `default_voice_id` | `str` | `78ab82d5-25be-4f7d-82b3-7ad64e5b85b2` | The default voice ID to use for text-to-speech and localization. |
| `enable_text_to_speech` | `bool` | `True` | Enable text-to-speech functionality. |
| `enable_list_voices` | `bool` | `True` | Enable listing available voices functionality. |
| `enable_localize_voice` | `bool` | `False` | Enable voice localization functionality. |
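
The three `enable_*` flags determine which functions the agent can call. The sketch below mirrors the defaults from the table in a standalone dataclass to make that visible. `CartesiaToolsConfig` is a hypothetical stand-in, not the real `CartesiaTools` class, and the mapping from each flag to a function name is an assumption based on the matching names.

```python
from dataclasses import dataclass


@dataclass
class CartesiaToolsConfig:
    """Hypothetical mirror of the enable_* parameters; not the real toolkit class."""

    enable_text_to_speech: bool = True
    enable_list_voices: bool = True
    enable_localize_voice: bool = False

    def enabled_functions(self):
        """Return the function names the flags would expose (assumed mapping)."""
        flags = {
            "list_voices": self.enable_list_voices,
            "text_to_speech": self.enable_text_to_speech,
            "localize_voice": self.enable_localize_voice,
        }
        return [name for name, enabled in flags.items() if enabled]


print(CartesiaToolsConfig().enabled_functions())
# ['list_voices', 'text_to_speech']
print(CartesiaToolsConfig(enable_localize_voice=True).enabled_functions())
# ['list_voices', 'text_to_speech', 'localize_voice']
```

With the defaults, `localize_voice` is absent, which is why the advanced example passes `enable_localize_voice=True`.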

Toolkit Functions

| Function | Description |
| --- | --- |
| `list_voices` | List available voices from Cartesia. |
| `text_to_speech` | Converts text to speech. |
| `localize_voice` | Create a new localized voice. |

Developer Resources