SpinQuant

Summary

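Outliers in weights and activations are hard to quantize.
Rotating with a random orthogonal matrix or a random Hadamard matrix reduces outliers, but the accuracy of the quantized model varies non-negligibly from one random rotation to another.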
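
To make the two points above concrete, here is a minimal NumPy sketch, not the paper's experiment. It assumes a toy 256-dimensional activation vector with one artificial outlier and a simple symmetric 4-bit round-to-nearest quantizer; the helper names `quantize_int4`, `random_orthogonal`, and `quant_error` are made up for illustration.

```python
import numpy as np

def quantize_int4(x):
    """Symmetric per-tensor 4-bit round-to-nearest (quantize, then dequantize)."""
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

def random_orthogonal(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # fix column signs so the draw is uniform

def quant_error(x):
    """Mean squared error introduced by quantizing x."""
    return float(np.mean((x - quantize_int4(x)) ** 2))

rng = np.random.default_rng(0)
n = 256
x = rng.standard_normal(n)
x[3] = 20.0  # assumption: a single artificial outlier channel

# Without rotation the outlier sets the scale and the other channels lose precision.
err_plain = quant_error(x)

# Rotations are norm-preserving, so the error measured in the rotated basis
# equals the error after rotating back. Each seed gives a different rotation.
errs_rotated = [
    quant_error(x @ random_orthogonal(n, np.random.default_rng(seed)))
    for seed in range(1, 6)
]

print("no rotation:     ", err_plain)
print("random rotations:", errs_rotated)
```

In this toy setup each rotated error should come out well below the unrotated one, while the five rotated errors differ from each other. That spread is the variance that motivates learning the rotation.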

Is it possible to optimize the rotation so that the quantized model is as accurate as possible?

Mergeable rotations $R_{1}, R_{2}$ (residual stream and attention block) are learnable. They can be folded into the adjacent weight matrices, so they add no inference cost.
- Use Cayley SGD on the Stiefel manifold to learn $R_{1}, R_{2}$.
- Loss: model loss, e.g. cross entropy.
- It looks like the whole model shares the same $R_{1}, R_{2}$?
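
The learning step in the list above can be sketched as follows. This is a simplified, momentum-free Cayley update in NumPy, not the paper's training code. The model loss is replaced by a toy quadratic loss that pulls $R$ toward a hypothetical target rotation `T`; `cayley_sgd_step` and `toy_loss_and_grad` are illustrative names. The point is only that the update moves downhill while keeping $R$ orthogonal.

```python
import numpy as np

def cayley_sgd_step(R, G, lr):
    """One plain Cayley step: R_new = (I + lr/2 * Y)^{-1} (I - lr/2 * Y) R.

    G is the Euclidean gradient dL/dR. Y is skew-symmetric, so the Cayley
    transform of Y is orthogonal and R_new stays orthogonal.
    """
    n = R.shape[0]
    Y = G @ R.T - R @ G.T
    I = np.eye(n)
    return np.linalg.solve(I + 0.5 * lr * Y, (I - 0.5 * lr * Y) @ R)

def toy_loss_and_grad(R, T):
    """Toy stand-in for the model loss: 0.5 * ||R - T||_F^2 and its gradient."""
    diff = R - T
    return 0.5 * float(np.sum(diff ** 2)), diff

rng = np.random.default_rng(0)
n = 8

# Hypothetical target rotation, built as the Cayley transform of a random
# skew-symmetric matrix so that it is orthogonal with determinant +1.
S = rng.standard_normal((n, n))
S = 0.5 * (S - S.T)
T = np.linalg.solve(np.eye(n) + S, np.eye(n) - S)

R = np.eye(n)  # start from the identity rotation
losses = []
for _ in range(200):
    loss, G = toy_loss_and_grad(R, T)
    losses.append(loss)
    R = cayley_sgd_step(R, G, lr=0.5)

print("first loss:", losses[0], "last loss:", losses[-1])
print("orthogonality error:", np.abs(R.T @ R - np.eye(n)).max())
```

In a real setup `G` would come from backpropagating the model loss through the quantized network, and a practical optimizer would typically add refinements such as momentum, which this sketch omits.
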
Online rotations $R_{3}, R_{4}$ are not learned; they use a random Hadamard transform.
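
A random Hadamard transform is cheap enough to apply online because it needs only additions, subtractions, and sign flips. The NumPy sketch below is one common construction (random signs followed by a normalized fast Walsh-Hadamard transform), assumed here for illustration; the function name `hadamard_transform` and the toy input are not from the paper.

```python
import numpy as np

def hadamard_transform(x):
    """Orthonormal fast Walsh-Hadamard transform along the last axis.

    The last dimension must be a power of two. Cost is O(n log n) per vector.
    """
    x = np.asarray(x, dtype=np.float64)
    orig_shape = x.shape
    n = orig_shape[-1]
    assert n > 0 and n & (n - 1) == 0, "last dimension must be a power of two"
    h = 1
    while h < n:
        # Pair up blocks of length h and apply the butterfly (a + b, a - b).
        x = x.reshape(-1, n // (2 * h), 2, h)
        a = x[:, :, 0, :] + x[:, :, 1, :]
        b = x[:, :, 0, :] - x[:, :, 1, :]
        x = np.stack([a, b], axis=2)
        h *= 2
    return x.reshape(orig_shape) / np.sqrt(n)

rng = np.random.default_rng(0)
n = 64
signs = rng.choice([-1.0, 1.0], size=n)  # random diagonal of +/-1

x = np.zeros(n)
x[5] = 8.0  # toy activation with a single outlier

y = hadamard_transform(x * signs)       # random Hadamard rotation
x_back = signs * hadamard_transform(y)  # the normalized transform is its own inverse

print("max |x|:", np.abs(x).max(), "max |y|:", np.abs(y).max())
```

The single outlier of magnitude 8 is spread evenly over all 64 coordinates, each with magnitude 1, and the norm is preserved, which is why such a rotation makes activations easier to quantize.
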
GPTQ can be combined with SpinQuant in two steps:
1. Use SpinQuant to optimize the rotations on a network where only activation quantization is enabled.
2. Once the rotation matrices are fixed, use GPTQ to quantize the weights of the rotated network.