Product quantization (PQ) is a powerful vector quantization method that addresses the challenge of efficiently storing and searching high-dimensional data. Its core mechanism involves splitting a high-dimensional vector into several lower-dimensional subvectors. Each subvector space then has its own codebook, learned through clustering algorithms like k-means, allowing each subvector to be represented by a compact code (an index into its respective codebook). The original vector is thus represented by a concatenation of these subvector codes. This technique significantly reduces memory footprint and accelerates similarity search operations, as distances can be computed more efficiently using precomputed lookup tables. PQ is particularly vital in approximate nearest neighbor (ANN) search, powering large-scale image retrieval, recommendation systems, and semantic search in vector databases. More recently, it has found application in compressing the KV cache of large language models (LLMs) to enable their deployment on edge devices, transforming memory-bound attention calculations into compute-bound operations.
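The pipeline above — split the vector, learn one codebook per subspace, store only centroid indices, and compare via precomputed lookup tables — can be sketched as follows. This is a minimal NumPy illustration, not a production implementation; the sizes (8-dimensional vectors, 4 subvectors, 16 centroids) and the toy k-means routine are chosen for readability and are assumptions, not values from any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, iters=20):
    # Toy k-means: random init, then alternate assignment and centroid update.
    cents = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                cents[j] = members.mean(axis=0)
    return cents

D, M, K = 8, 4, 16        # vector dim, number of subvectors, codebook size (illustrative)
sub = D // M              # each subvector is 2-dimensional
train = rng.normal(size=(1000, D))

# Learn one codebook per subspace, each clustered independently.
codebooks = [kmeans(train[:, m * sub:(m + 1) * sub], K) for m in range(M)]

def encode(x):
    # Replace each subvector by the index of its nearest centroid.
    # The full vector becomes M small integer codes (here 4 bits each would suffice).
    return np.array(
        [((codebooks[m] - x[m * sub:(m + 1) * sub]) ** 2).sum(axis=1).argmin()
         for m in range(M)],
        dtype=np.uint8,
    )

def adc_distance(query, codes):
    # Asymmetric distance computation: precompute query-to-centroid squared
    # distances per subspace, then sum M table lookups per encoded vector.
    tables = [((codebooks[m] - query[m * sub:(m + 1) * sub]) ** 2).sum(axis=1)
              for m in range(M)]
    return sum(tables[m][codes[m]] for m in range(M))

x = rng.normal(size=D)
codes = encode(x)         # 8 floats (32 bytes) compressed to 4 one-byte codes
q = rng.normal(size=D)
approx = adc_distance(q, codes)   # approximates the exact squared distance
exact = ((q - x) ** 2).sum()
```

The memory saving comes from storing `codes` instead of `x`, and the search speedup from `adc_distance`: the per-subspace tables are computed once per query, after which each database vector costs only M table lookups and additions rather than a full D-dimensional distance computation.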
Product quantization is a data compression technique that splits large data vectors into smaller parts, quantizing each part independently. This makes it highly efficient for storing and searching vast amounts of data, and it's now being used to compress the internal memory (KV cache) of large AI models, allowing them to run on smaller, less powerful devices.
PQ, Optimized Product Quantization (OPQ), Composite Quantization, Locality-Sensitive Hashing (LSH) (related concept), Inverted File System with Product Quantization (IVFPQ)