MLX Transcribe

MLX Transcribe is a tool for transcribing audio files using MLX Whisper.

Prerequisites

  1. Install ffmpeg

  2. Install mlx-whisper library

    uv pip install mlx-whisper
  3. Prepare audio files

    • Create a 'storage/audio' directory
    • Place your audio files in this directory
    • Supported formats: mp3, mp4, wav, etc.
  4. Download sample audio (optional)

    • Visit a sample-audio source (audio-samples, for example) and save an audio file to the storage/audio directory.
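
The prerequisites above can be checked before running the agent. The following is a minimal sketch, not part of the toolkit: the helper name check_prerequisites and the extension set are assumptions made for illustration, and it only uses the Python standard library to look for ffmpeg on the PATH, for an importable mlx_whisper module, and for audio files in the given directory.

```python
import importlib.util
import shutil
from pathlib import Path

# Assumption: only the formats named in the list above are treated as audio here.
AUDIO_EXTENSIONS = {".mp3", ".mp4", ".wav"}


def check_prerequisites(audio_dir: Path) -> dict:
    """Report whether ffmpeg and mlx-whisper are available and which audio files exist.

    Hypothetical helper for illustration; it is not provided by the toolkit.
    """
    # Create the audio directory if it is missing.
    audio_dir.mkdir(parents=True, exist_ok=True)
    audio_files = sorted(
        p.name
        for p in audio_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    return {
        # True if an ffmpeg executable is found on the PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # True if the mlx_whisper module can be located without importing it.
        "mlx_whisper": importlib.util.find_spec("mlx_whisper") is not None,
        "audio_files": audio_files,
    }


if __name__ == "__main__":
    print(check_prerequisites(Path("storage/audio")))
```

If either flag is False, or the file list is empty, revisit the corresponding step before running the example.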

Example

The following agent will use MLX Transcribe to transcribe audio files.

from pathlib import Path
from kern.agent import Agent
from kern.models.openai import OpenAIResponses
from kern.tools.mlx_transcribe import MLXTranscribeTools

# Get audio files from storage/audio directory
agno_root_dir = Path(__file__).parent.parent.parent.resolve()
audio_storage_dir = agno_root_dir.joinpath("storage/audio")
if not audio_storage_dir.exists():
    audio_storage_dir.mkdir(exist_ok=True, parents=True)

agent = Agent(
    name="Transcription Agent",
    model=OpenAIResponses(id="gpt-5.2"),
    tools=[MLXTranscribeTools(base_dir=audio_storage_dir)],
    instructions=[
        "To transcribe an audio file, use the `transcribe` tool with the name of the audio file as the argument.",
        "You can find all available audio files using the `read_files` tool.",
    ],
    markdown=True,
)

agent.print_response("Summarize the reid hoffman ted talk, split into sections", stream=True)

Toolkit Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| base_dir | Path | Path.cwd() | Base directory for audio files |
| enable_read_files_in_base_dir | bool | True | Whether to register the read_files function |
| path_or_hf_repo | str | "mlx-community/whisper-large-v3-turbo" | Path or HuggingFace repo for the model |
| verbose | bool | None | Enable verbose output |
| temperature | float or Tuple[float, ...] | None | Temperature for sampling |
| compression_ratio_threshold | float | None | Compression ratio threshold |
| logprob_threshold | float | None | Log probability threshold |
| no_speech_threshold | float | None | No speech threshold |
| condition_on_previous_text | bool | None | Whether to condition on previous text |
| initial_prompt | str | None | Initial prompt for transcription |
| word_timestamps | bool | None | Enable word-level timestamps |
| prepend_punctuations | str | None | Punctuations to prepend |
| append_punctuations | str | None | Punctuations to append |
| clip_timestamps | str or List[float] | None | Clip timestamps |
| hallucination_silence_threshold | float | None | Hallucination silence threshold |
| decode_options | dict | None | Additional decoding options |
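
As an illustration of how these parameters might be combined, the sketch below collects a few of them in a dictionary and passes them to the toolkit constructor. The parameter names come from the table above; the chosen values, the toolkit_options name, and the build_toolkit helper are assumptions for illustration only. The toolkit import is deferred into the function so the options can be inspected on a machine where kern and mlx-whisper are not installed.

```python
from pathlib import Path

# Illustrative values only; every key is a parameter listed in the table above.
toolkit_options = {
    "base_dir": Path("storage/audio"),
    "path_or_hf_repo": "mlx-community/whisper-large-v3-turbo",
    "temperature": 0.0,
    "word_timestamps": True,
    "initial_prompt": "A TED talk with a single speaker.",
}


def build_toolkit(options: dict):
    """Hypothetical helper: construct the toolkit from a dictionary of options."""
    # Imported here so this module loads even without kern installed.
    from kern.tools.mlx_transcribe import MLXTranscribeTools

    return MLXTranscribeTools(**options)
```

The result of build_toolkit(toolkit_options) could then be passed in the agent's tools list in place of the default MLXTranscribeTools(base_dir=audio_storage_dir) used in the example.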

Toolkit Functions

| Function | Description |
| --- | --- |
| transcribe | Transcribes an audio file using MLX Whisper |
| read_files | Lists all audio files in the base directory |

Developer Resources