Transcribe Audio to Text — Offline, 99 Languages

Drop an audio or video file into MiniMax Converter and get a transcript without sending a single byte to a cloud service. The transcription engine is Whisper, running locally on your CPU or GPU; no API key, no monthly fee, no rate limit, no upload size cap.

Why offline transcription matters

Most online transcription services upload your audio to their servers. That is fine for a YouTube video, but not for a confidential meeting, a medical recording, an unreleased interview, or anything covered by an NDA or GDPR. With offline Whisper the file stays on your machine end to end. Accuracy matches the cloud version of the same model, because it is the same model, just running locally.

How to transcribe

Two ways to get to a transcript:

Drop a video file into the app and click Transcribe on the video conversion screen. Pick "Add timestamps" to get a `.srt` subtitle file next to the source video.
Drop an audio file (mp3 / m4a / wav / flac / opus / …) and the app extracts lyrics as plain text, with verse breaks where pauses are longer than two seconds.

On first run MiniMax downloads the matching whisper.cpp build for your hardware (Core ML on Apple Silicon, CUDA on NVIDIA, Vulkan on AMD/Intel, CPU elsewhere). The 142 MB base model is cached locally and reused for every subsequent transcription.
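
For the curious, the two output styles described above can be sketched in a few lines of Python. This is an illustration only, not the app's actual code: the `Segment` type, the function names, and the `pause_threshold` parameter are hypothetical, and it assumes the transcription engine yields segments with start and end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment: times in seconds plus the spoken text."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in .srt files."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


def to_plain_text(segments: list[Segment], pause_threshold: float = 2.0) -> str:
    """Join segment text, inserting a blank line where the silence between
    two segments is longer than pause_threshold seconds."""
    lines = []
    previous_end = None
    for seg in segments:
        if previous_end is not None and seg.start - previous_end > pause_threshold:
            lines.append("")  # blank line marks a verse break
        lines.append(seg.text.strip())
        previous_end = seg.end
    return "\n".join(lines)
```

With segments ending at 2.0 s and starting again at 4.5 s, `to_plain_text` puts a blank line between them, since the 2.5-second gap exceeds the two-second threshold.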

Hardware-aware acceleration

The app auto-detects what you have and uses it. Apple Silicon Macs use Core ML — fast and battery-efficient. NVIDIA GPUs use CUDA — usually 5–10× faster than CPU. AMD / Intel discrete GPUs use Vulkan. Plain CPU still works on any machine; a 10-minute file transcribes in a couple of minutes.
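
The priority order above can be pictured as a small decision function. The sketch below is a guess at the general shape, not the app's real detection logic: `pick_backend` and `detect_backend` are hypothetical names, and checking for the `nvidia-smi` and `vulkaninfo` tools on the PATH is only a rough stand-in for proper GPU probing.

```python
import platform
import shutil


def pick_backend(system: str, machine: str, has_nvidia: bool, has_vulkan: bool) -> str:
    """Choose an acceleration backend in the order the text describes."""
    if system == "Darwin" and machine == "arm64":
        return "coreml"   # Apple Silicon Mac
    if has_nvidia:
        return "cuda"     # NVIDIA GPU
    if has_vulkan:
        return "vulkan"   # AMD / Intel GPU
    return "cpu"          # works on any machine


def detect_backend() -> str:
    """Heuristic detection (assumption): treat the presence of vendor
    command-line tools on the PATH as evidence of a usable GPU."""
    return pick_backend(
        platform.system(),
        platform.machine(),
        has_nvidia=shutil.which("nvidia-smi") is not None,
        has_vulkan=shutil.which("vulkaninfo") is not None,
    )


if __name__ == "__main__":
    print(detect_backend())
```

Keeping the decision in a pure function separate from the probing makes the fallback order easy to check without access to each kind of hardware.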

Questions and answers

Which languages does it support?

All 99 languages Whisper supports, auto-detected from the audio. Language detection itself happens locally — no separate API call.

Is the transcription as accurate as the OpenAI cloud one?

Yes, for the same model size. It is the same Whisper model, just running on your machine instead of OpenAI's servers.

Does it work on Windows?

Yes. Windows uses the same whisper.cpp build mechanism; CUDA is auto-detected on NVIDIA cards, Vulkan on AMD/Intel, CPU fallback otherwise.

Can I transcribe a long recording — say a two-hour meeting?

Yes. There is no file size or duration cap; the only practical limit is how long you are willing to wait. On a modern Apple Silicon Mac, two hours of audio transcribes in around 10–15 minutes.
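
To turn that figure into a rough estimate for your own recordings, a back-of-the-envelope calculation is enough. The helper names below are made up for illustration, and the speed factor is derived only from the two-hour example above; real speeds vary with hardware and audio.

```python
def realtime_factor(audio_minutes: float, wall_minutes: float) -> float:
    """How many minutes of audio are processed per minute of waiting."""
    return audio_minutes / wall_minutes


def estimate_wait(audio_minutes: float, factor: float) -> float:
    """Estimated waiting time in minutes at a given real-time factor."""
    return audio_minutes / factor


# Two hours (120 minutes) of audio in 10 to 15 minutes:
slow = realtime_factor(120, 15)  # 8.0x real time
fast = realtime_factor(120, 10)  # 12.0x real time

# A 30-minute interview at those speeds:
print(estimate_wait(30, slow), estimate_wait(30, fast))  # 3.75 2.5
```

In other words, the quoted range corresponds to roughly 8 to 12 times real time on that class of hardware.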

Get MiniMax Converter

Cross-platform desktop app. Linux free for non-commercial use; Windows & macOS one-time €20 license. No subscription, no telemetry, no account.