The original FP16 (16-bit float) model is ~1.5 GB. After GGML quantization, ggml-medium.bin shrinks to ~500–700 MB. This is the "medium" sweet spot: small enough to run on a Raspberry Pi 4 or an old laptop, but accurate enough for professional-grade transcription.
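The size figures above can be sanity-checked with a little arithmetic. The sketch below is a rough estimate, not a measurement: it assumes the medium model has about 769 million parameters (the figure OpenAI publishes for Whisper medium) and that every weight is stored at the listed bit width. Real files come out somewhat larger, because some tensors stay at higher precision and the file carries metadata. The bits-per-weight values include the per-block scale that the GGML block formats store alongside the quantized weights.

```python
# Rough file-size estimate for Whisper "medium" at different precisions.
# Assumption: ~769 million parameters, all stored at the same bit width.
PARAMS_MEDIUM = 769_000_000

# Effective bits per weight, including the per-block scale factor
# (each block holds 32 weights plus a 16-bit scale).
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": 8.5,
    "q5_0": 5.5,
    "q4_0": 4.5,
}


def estimate_size_mb(params, bits_per_weight):
    """Return the approximate size in megabytes (10^6 bytes)."""
    return params * bits_per_weight / 8 / 1e6


def size_table(params=PARAMS_MEDIUM):
    """Map each format name to its estimated size, rounded to whole MB."""
    return {
        name: round(estimate_size_mb(params, bits))
        for name, bits in BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    for name, mb in size_table().items():
        print(f"{name:>5}: ~{mb} MB")
```

Under these assumptions the FP16 estimate lands near 1.5 GB and the 5-bit estimate falls inside the 500–700 MB range quoted above.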
Using the ggml-medium.bin model is surprisingly straightforward, thanks to the robust tooling available on the ggml-org/whisper.cpp GitHub repository.

1. Obtaining the File
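One way to fetch the file is sketched below. The download location is an assumption: it follows the Hugging Face path used by the download script that ships with whisper.cpp at the time of writing, and it may change. The helper names (`model_url`, `download_model`) are hypothetical and exist only for this illustration.

```python
import urllib.request
from pathlib import Path

# Assumption: the Hugging Face location used by whisper.cpp's own
# download script. Verify it against the repository before relying on it.
BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"


def model_filename(name="medium"):
    """Build the conventional GGML file name, e.g. ggml-medium.bin."""
    return f"ggml-{name}.bin"


def model_url(name="medium", base_url=BASE_URL):
    """Build the full download URL for a given model name."""
    return f"{base_url}/{model_filename(name)}"


def download_model(name="medium", dest_dir="models"):
    """Download the model unless it is already present; return its path."""
    dest = Path(dest_dir) / model_filename(name)
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(model_url(name), str(dest))
    return dest


if __name__ == "__main__":
    # The medium model is a large download (roughly 1.5 GB in FP16).
    print(download_model("medium"))
```

Skipping the download when the file already exists keeps repeated runs cheap, which matters for a file of this size.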
The GGML format is optimized for "inference" (running the model), allowing it to transcribe audio in near real-time on modern laptops.

Common Use Cases
The most common way to utilize this file is through whisper.cpp, the C++ port of Whisper.
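As a minimal sketch of what that looks like in practice, the snippet below wraps the whisper.cpp command-line tool from Python. It assumes the tool has already been built and that the executable is named `whisper-cli` (older builds call it `main`). The paths are placeholders, and the helper names are hypothetical. The flags used here (`-m` model, `-f` audio file, `-l` language, `-t` threads) are the ones documented in the whisper.cpp README. The input is assumed to be a 16 kHz WAV file.

```python
import subprocess
from pathlib import Path


def build_command(binary, model, audio, language="en", threads=4):
    """Assemble the whisper.cpp CLI invocation as an argument list."""
    return [
        str(binary),
        "-m", str(model),     # path to the GGML model file
        "-f", str(audio),     # input audio (assumed: 16 kHz WAV)
        "-l", language,       # spoken language code
        "-t", str(threads),   # number of CPU threads
    ]


def transcribe(binary, model, audio, language="en", threads=4):
    """Run the CLI and return whatever it writes to standard output."""
    for path in (model, audio):
        if not Path(path).exists():
            raise FileNotFoundError(path)
    cmd = build_command(binary, model, audio, language, threads)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Placeholder paths; adjust them to your own build and files.
    print(transcribe("./build/bin/whisper-cli",
                     "models/ggml-medium.bin",
                     "samples/jfk.wav"))
```

Passing the command as a list rather than a single string avoids shell quoting problems when file paths contain spaces.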
Accuracy: High; it is often considered the "sweet spot" for professional-grade transcription, offering a significant jump in quality over the "base" and "small" models while being faster than the "large" model.

Variants:
ggml-medium.bin: Multilingual support (99 languages).
If your transcriptions are running slowly, use these configuration adjustments: