MLX Transcribe
MLX Transcribe is a tool for transcribing audio files using MLX Whisper.
Prerequisites
- Install ffmpeg
  - macOS: `brew install ffmpeg`
  - Ubuntu: `sudo apt-get install ffmpeg`
  - Windows: Download from https://ffmpeg.org/download.html
- Install the mlx-whisper library: `uv pip install mlx-whisper`
- Prepare audio files
  - Create a `storage/audio` directory
  - Place your audio files in this directory
  - Supported formats: mp3, mp4, wav, etc.
- Download sample audio (optional)
  - Visit audio-samples (as an example) and save an audio file to the `storage/audio` directory.
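Before running the example, it can help to confirm that the prerequisites are in place. The following is a minimal sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper written for illustration and is not part of MLX Transcribe.

```python
import shutil
from pathlib import Path


def check_prerequisites(audio_dir: Path) -> dict:
    """Report whether ffmpeg is on PATH and make sure the audio directory exists.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    # shutil.which returns the executable's full path, or None if it is not found.
    ffmpeg_path = shutil.which("ffmpeg")
    # Create the directory (and any missing parents); no error if it already exists.
    audio_dir.mkdir(parents=True, exist_ok=True)
    return {"ffmpeg_found": ffmpeg_path is not None, "audio_dir": audio_dir}
```

Calling `check_prerequisites(Path("storage/audio"))` from the project root creates the directory if needed and reports whether `ffmpeg` was found on the PATH.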
Example
The following agent will use MLX Transcribe to transcribe audio files.
    from pathlib import Path

    from kern.agent import Agent
    from kern.models.openai import OpenAIResponses
    from kern.tools.mlx_transcribe import MLXTranscribeTools

    # Get audio files from storage/audio directory
    agno_root_dir = Path(__file__).parent.parent.parent.resolve()
    audio_storage_dir = agno_root_dir.joinpath("storage/audio")
    if not audio_storage_dir.exists():
        audio_storage_dir.mkdir(exist_ok=True, parents=True)

    # The toolkit resolves audio file names relative to base_dir
    agent = Agent(
        name="Transcription Agent",
        model=OpenAIResponses(id="gpt-5.2"),
        tools=[MLXTranscribeTools(base_dir=audio_storage_dir)],
        instructions=[
            "To transcribe an audio file, use the `transcribe` tool with the name of the audio file as the argument.",
            "You can find all available audio files using the `read_files` tool.",
        ],
        markdown=True,
    )

    agent.print_response("Summarize the reid hoffman ted talk, split into sections", stream=True)

Toolkit Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| base_dir | Path | Path.cwd() | Base directory for audio files |
| enable_read_files_in_base_dir | bool | True | Whether to register the read_files function |
| path_or_hf_repo | str | "mlx-community/whisper-large-v3-turbo" | Path or HuggingFace repo for the model |
| verbose | bool | None | Enable verbose output |
| temperature | float or Tuple[float, ...] | None | Temperature for sampling |
| compression_ratio_threshold | float | None | Compression ratio threshold |
| logprob_threshold | float | None | Log probability threshold |
| no_speech_threshold | float | None | No speech threshold |
| condition_on_previous_text | bool | None | Whether to condition on previous text |
| initial_prompt | str | None | Initial prompt for transcription |
| word_timestamps | bool | None | Enable word-level timestamps |
| prepend_punctuations | str | None | Punctuations to prepend |
| append_punctuations | str | None | Punctuations to append |
| clip_timestamps | str or List[float] | None | Clip timestamps |
| hallucination_silence_threshold | float | None | Hallucination silence threshold |
| decode_options | dict | None | Additional decoding options |
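
Most of the parameters above default to None. One plausible reading, which is an assumption and not something this page confirms, is that options left at None are simply not passed on, so that MLX Whisper's own defaults apply. The sketch below illustrates that pattern; `build_transcribe_kwargs` is a hypothetical helper and not part of the toolkit.

```python
from typing import Any, Dict, Optional


def build_transcribe_kwargs(
    path_or_hf_repo: str = "mlx-community/whisper-large-v3-turbo",
    **options: Optional[Any],
) -> Dict[str, Any]:
    """Collect transcription options, skipping any that were left at None.

    Hypothetical helper: the real toolkit may handle None differently.
    """
    kwargs: Dict[str, Any] = {"path_or_hf_repo": path_or_hf_repo}
    for name, value in options.items():
        # Only forward options the caller actually set.
        if value is not None:
            kwargs[name] = value
    return kwargs
```

For example, `build_transcribe_kwargs(word_timestamps=True, temperature=None)` keeps the model repo and `word_timestamps` but drops `temperature`.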
Toolkit Functions
| Function | Description |
|---|---|
| transcribe | Transcribes an audio file using MLX Whisper |
| read_files | Lists all audio files in the base directory |
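
To make the role of `read_files` concrete, here is a rough standard-library sketch of listing audio files in a base directory. It is illustrative only: `list_audio_files` and `AUDIO_SUFFIXES` are hypothetical names, and the extension filter is an assumption based on the formats named under Prerequisites; the toolkit's actual function may behave differently.

```python
from pathlib import Path
from typing import List

# Assumed from the "Supported formats" note above; not an exhaustive list.
AUDIO_SUFFIXES = {".mp3", ".mp4", ".wav"}


def list_audio_files(base_dir: Path) -> List[str]:
    """Return the sorted names of audio files directly inside base_dir."""
    return sorted(
        path.name
        for path in base_dir.iterdir()
        # Skip subdirectories and compare extensions case-insensitively.
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    )
```

An agent would then pass one of the returned names to `transcribe`, which matches the instructions given to the agent in the example.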
Developer Resources
- View Tools