Category: File Formats
Definition
GGML (GPT-Generated Model Language) is the predecessor to GGUF, designed for efficient CPU inference of large language models through aggressive quantization.
How It Works
GGML quantizes model weights to 4-bit or 8-bit integers, dramatically reducing memory usage. It uses optimized matrix multiplication routines for CPU architectures.
The format focuses on inference speed rather than training, stripping unnecessary information to minimize file size.
Why It Matters
GGML pioneered running billion-parameter models on consumer CPUs, making AI accessible without specialized hardware. It inspired a wave of local AI applications.
Though superseded by GGUF, many tools still support GGML for backward compatibility.
← Back to File Formats | All Terms