ORPTQ: An Improved Large Model Quantization Method Based on Optimal Quantization Range
DOI: https://doi.org/10.5755/j01.itc.54.2.40573

Keywords: Large Model Quantization, Optimal Quantization Range, Transformer, GPTQ, ORPTQ

Abstract
Quantization reduces model storage by representing a model in low-bit form. It can improve the applicability of transformer-based large models and make it possible to deploy them on resource-limited systems such as PCs and mobile devices. The best current weight-only quantization method uses second-order information to fine-tune the weights step by step during quantization, compensating for the quantization errors already incurred. At each step, it adjusts the remaining weight elements through algebraic transformations, minimizing the functional loss caused by quantization. However, the performance of this method deteriorates rapidly when the weight adjustment deviates too far from the starting point, especially in low-bit quantization (e.g., 4 bits or fewer). To satisfy the mathematical prerequisite of this method during quantization, this paper introduces two parameters, α and β, that adjust the quantization range on top of the second-order method, and presents three approaches to finding their optimal values. The experimental results show that the proposed method significantly outperforms the original second-order method in low-bit quantization. The code of this paper is available at github.com/t-scen/ORPTQ.
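To make the abstract's description concrete, the following minimal PyTorch sketch illustrates the general idea: a round-to-nearest quantizer whose clipping range is scaled by two parameters α and β, embedded in a GPTQ-style column-by-column loop that propagates quantization error through the inverse Hessian. The function names, the parameterization of the range (α scaling the maximum, β the minimum), and the grid search are illustrative assumptions, not the authors' implementation; see github.com/t-scen/ORPTQ for the actual code.

```python
import torch

def quantize_range(w, bits, alpha=1.0, beta=1.0):
    # Round-to-nearest quantization of one weight column with an
    # adjustable clipping range. Here alpha scales the maximum and
    # beta the minimum of the range -- a hypothetical parameterization;
    # the paper's exact formulation may differ.
    qmax = 2 ** bits - 1
    wmax, wmin = alpha * w.max(), beta * w.min()
    scale = torch.clamp(wmax - wmin, min=1e-8) / qmax
    zero = torch.round(-wmin / scale)
    q = torch.clamp(torch.round(w / scale) + zero, 0, qmax)
    return scale * (q - zero)  # dequantized weights

def second_order_quantize(W, Hinv, bits, alpha, beta):
    # Column-by-column quantization with second-order error
    # compensation in the spirit of GPTQ: after quantizing column j,
    # its error is propagated to the not-yet-quantized columns via
    # the (upper-triangular Cholesky factor of the) inverse Hessian.
    W = W.clone()
    Q = torch.zeros_like(W)
    for j in range(W.shape[1]):
        w = W[:, j]
        q = quantize_range(w, bits, alpha, beta)
        Q[:, j] = q
        err = (w - q) / Hinv[j, j]
        W[:, j + 1:] -= err.unsqueeze(1) * Hinv[j, j + 1:].unsqueeze(0)
    return Q

def search_alpha_beta(w, bits, grid=(0.80, 0.85, 0.90, 0.95, 1.00)):
    # Brute-force search for (alpha, beta) minimizing the squared
    # reconstruction error of a single column -- one plausible way to
    # seek an optimal range, not necessarily one of the paper's
    # three approaches.
    best, best_err = (1.0, 1.0), float("inf")
    for a in grid:
        for b in grid:
            err = torch.sum((w - quantize_range(w, bits, a, b)) ** 2).item()
            if err < best_err:
                best_err, best = err, (a, b)
    return best
```

For instance, calling `search_alpha_beta(W[:, 0], bits=4)` before the compensation loop would pick a range for the first column; whether the search is applied per layer, per column, or per group is a design choice this sketch leaves open.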
License
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.