MLX Transcribe

MLX Transcribe is a tool for transcribing audio files using MLX Whisper.

Prerequisites

Install ffmpeg
- macOS: brew install ffmpeg
- Ubuntu: sudo apt-get install ffmpeg
- Windows: Download from https://ffmpeg.org/download.html
Install the mlx-whisper library
```
uv pip install mlx-whisper
```
Prepare audio files
- Create a 'storage/audio' directory
- Place your audio files in this directory
- Supported formats: mp3, mp4, wav, etc.
Download sample audio (optional)
- Visit a sample audio source such as audio-samples and save an audio file to the storage/audio directory.
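You can check the prerequisite steps above from Python before running the agent. This is a minimal sketch, not part of MLX Transcribe: `check_prerequisites` and `prepare_audio_dir` are hypothetical helper names. The check only confirms that an `ffmpeg` executable is on `PATH` and that the `mlx_whisper` package is importable. It does not validate versions.

```python
import importlib.util
import shutil
from pathlib import Path


def check_prerequisites() -> dict[str, bool]:
    """Report whether ffmpeg and mlx-whisper appear to be installed (hypothetical helper)."""
    return {
        # shutil.which returns None when no ffmpeg executable is found on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when the package cannot be located for import
        "mlx_whisper": importlib.util.find_spec("mlx_whisper") is not None,
    }


def prepare_audio_dir(root: Path) -> Path:
    """Create <root>/storage/audio if it is missing and return it (hypothetical helper)."""
    audio_dir = root / "storage" / "audio"
    audio_dir.mkdir(parents=True, exist_ok=True)
    return audio_dir


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
    print("Audio directory:", prepare_audio_dir(Path.cwd()))
```

Running the script from your project root prints the status of each prerequisite and creates `storage/audio` there if it does not exist yet.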

Example

The following agent will use MLX Transcribe to transcribe audio files.

```python
from pathlib import Path
from kern.agent import Agent
from kern.models.openai import OpenAIResponses
from kern.tools.mlx_transcribe import MLXTranscribeTools

# Get audio files from storage/audio directory
agno_root_dir = Path(__file__).parent.parent.parent.resolve()
audio_storage_dir = agno_root_dir.joinpath("storage/audio")
if not audio_storage_dir.exists():
    audio_storage_dir.mkdir(exist_ok=True, parents=True)

agent = Agent(
    name="Transcription Agent",
    model=OpenAIResponses(id="gpt-5.2"),
    tools=[MLXTranscribeTools(base_dir=audio_storage_dir)],
    instructions=[
        "To transcribe an audio file, use the `transcribe` tool with the name of the audio file as the argument.",
        "You can find all available audio files using the `read_files` tool.",
    ],
    markdown=True,
)

agent.print_response("Summarize the reid hoffman ted talk, split into sections", stream=True)
```
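If the agent does not produce a transcript, calling the mlx-whisper library directly can rule out problems with the model download or the ffmpeg setup. This is a minimal sketch. It assumes that `mlx_whisper.transcribe` accepts a file path plus a `path_or_hf_repo` keyword and returns a dict with a `"text"` key, as current mlx-whisper releases do. `transcribe_file` is a hypothetical helper name.

```python
from pathlib import Path


def transcribe_file(
    audio_path: Path,
    repo: str = "mlx-community/whisper-large-v3-turbo",
) -> str:
    """Transcribe one file with mlx-whisper, bypassing the agent (hypothetical helper)."""
    if not audio_path.is_file():
        raise FileNotFoundError(f"No audio file at {audio_path}")
    # Imported lazily so this module still loads where mlx-whisper is unavailable;
    # mlx-whisper depends on MLX, which targets Apple silicon.
    import mlx_whisper

    result = mlx_whisper.transcribe(str(audio_path), path_or_hf_repo=repo)
    return result["text"]
```

For example, `print(transcribe_file(Path("storage/audio/sample.mp3")))` prints the transcript, where `sample.mp3` is a placeholder for whichever file you saved. The first call downloads the model from Hugging Face, so expect it to be slower than later calls.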

Toolkit Params

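| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `base_dir` | `Path` | `Path.cwd()` | Base directory containing the audio files |
| `enable_read_files_in_base_dir` | `bool` | `True` | Whether to register the `read_files` function |
| `path_or_hf_repo` | `str` | `"mlx-community/whisper-large-v3-turbo"` | Local path or Hugging Face repo of the Whisper model |
| `verbose` | `bool` | `None` | Enable verbose output during transcription |
| `temperature` | `float` or `Tuple[float, ...]` | `None` | Sampling temperature; a tuple is tried in order when decoding fails `compression_ratio_threshold` or `logprob_threshold` |
| `compression_ratio_threshold` | `float` | `None` | Treat decoding as failed if the gzip compression ratio is above this value |
| `logprob_threshold` | `float` | `None` | Treat decoding as failed if the average log probability is below this value |
| `no_speech_threshold` | `float` | `None` | Treat a segment as silent if its no-speech probability is above this value and its average log probability is below `logprob_threshold` |
| `condition_on_previous_text` | `bool` | `None` | Whether to feed the previous output to the model as a prompt for the next window |
| `initial_prompt` | `str` | `None` | Text provided as a prompt for the first window |
| `word_timestamps` | `bool` | `None` | Enable word-level timestamps |
| `prepend_punctuations` | `str` | `None` | With `word_timestamps`, punctuation symbols merged into the following word |
| `append_punctuations` | `str` | `None` | With `word_timestamps`, punctuation symbols merged into the preceding word |
| `clip_timestamps` | `str` or `List[float]` | `None` | Start and end times (in seconds) of the clips to process, e.g. `"start,end,start,end"` |
| `hallucination_silence_threshold` | `float` | `None` | With `word_timestamps`, skip silent periods longer than this many seconds when a possible hallucination is detected |
| `decode_options` | `dict` | `None` | Additional decoding options |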
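Most parameters default to `None`. One plausible reading, stated here as an assumption rather than documented behavior, is that only the values you set are forwarded to mlx-whisper, so everything else falls back to mlx-whisper's own defaults, with `decode_options` expanded as extra keyword arguments. The hypothetical `build_transcribe_kwargs` helper below sketches that mapping.

```python
from typing import Any, Optional


def build_transcribe_kwargs(
    path_or_hf_repo: str = "mlx-community/whisper-large-v3-turbo",
    decode_options: Optional[dict[str, Any]] = None,
    **options: Any,
) -> dict[str, Any]:
    """Collect keyword arguments for a transcription call (hypothetical helper).

    Options left as None are dropped so the library default applies;
    decode_options entries are merged in as additional keyword arguments.
    """
    kwargs: dict[str, Any] = {"path_or_hf_repo": path_or_hf_repo}
    # Filter on "is not None" so falsy but meaningful values (False, 0.0) survive
    kwargs.update({name: value for name, value in options.items() if value is not None})
    if decode_options:
        kwargs.update(decode_options)
    return kwargs


if __name__ == "__main__":
    print(build_transcribe_kwargs(temperature=0.0, word_timestamps=True, verbose=None,
                                  decode_options={"language": "en"}))
```

The filter deliberately tests `is not None` rather than truthiness, so `condition_on_previous_text=False` or `temperature=0.0` is kept. With the real toolkit you pass the same names as constructor arguments, for example `MLXTranscribeTools(base_dir=audio_storage_dir, word_timestamps=True)`.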

Toolkit Functions

| Function | Description |
| --- | --- |
| `transcribe` | Transcribes an audio file using MLX Whisper |
| `read_files` | Lists all audio files in the base directory |
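To make the two functions concrete, here is a minimal, hypothetical toolkit with the same shape. It is not the MLX Transcribe source. The class name, the JSON return format of `read_files`, and the injectable `transcriber` callable are assumptions chosen so that the sketch runs without mlx-whisper. It also illustrates one design choice worth copying: file names are resolved against `base_dir`, and names that escape that directory are rejected.

```python
import json
from pathlib import Path
from typing import Callable


class SketchTranscribeTools:
    """Hypothetical stand-in mirroring the transcribe/read_files pair."""

    def __init__(self, base_dir: Path, transcriber: Callable[[str], str]) -> None:
        self.base_dir = base_dir.resolve()
        # transcriber maps an audio file path to its transcript text
        self.transcriber = transcriber

    def read_files(self) -> str:
        """Return a JSON list of the file names directly inside base_dir."""
        names = sorted(p.name for p in self.base_dir.iterdir() if p.is_file())
        return json.dumps(names)

    def transcribe(self, file_name: str) -> str:
        """Transcribe a file from base_dir, refusing paths that resolve outside it."""
        path = (self.base_dir / file_name).resolve()
        if not path.is_relative_to(self.base_dir):
            raise ValueError(f"{file_name!r} is outside {self.base_dir}")
        if not path.is_file():
            raise FileNotFoundError(f"{file_name!r} not found in {self.base_dir}")
        return self.transcriber(str(path))
```

With mlx-whisper installed, `transcriber` could be `lambda p: mlx_whisper.transcribe(p)["text"]`. Injecting it keeps the file-handling logic testable on machines without MLX.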
