Rate-Accuracy Optimization of Deep Convolutional Neural Network Models
Filini, Alessandro; Leonardi, Riccardo
2017-01-01
Abstract
Recently, deep learning has enjoyed a great deal of success on computer vision problems due to its capability to model highly complex tasks, such as image classification, object detection and face recognition, among many others. Although these neural networks are nowadays very powerful, they contain a huge number of parameters (i.e. the model) that need to be learned and require considerable storage space and bandwidth during transmission. This paper addresses the problems of storage and transmission of large deep learning models by proposing a compression solution that is independent of the model being trained as well as of the data used for training. An efficient compression framework is proposed for the parameters of a neural network, more precisely the weights that interconnect the different neurons, which consume a significant amount of resources (memory, storage and bandwidth). Several quantization strategies are considered, as well as statistical models for the different layers of a neural network, which are exploited by an arithmetic coding engine. Experimental results show that up to 92% bitrate savings can be obtained with minimal impact on image classification accuracy.
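The pipeline sketched in the abstract (quantize the weights, then entropy-code the quantized symbols) can be illustrated with a minimal example. This is not the paper's exact scheme: it assumes the simplest uniform quantizer and uses the empirical symbol entropy as a lower bound on the bits per weight an arbitrary arithmetic coder could achieve; all function names and parameters below are illustrative.

```python
import numpy as np

def quantize_weights(w, num_bits=4):
    """Uniformly quantize a weight tensor to 2**num_bits levels.

    Illustrative only: the paper evaluates several quantization
    strategies; this shows the simplest (uniform) one.
    """
    levels = 2 ** num_bits
    w_min, w_max = float(w.min()), float(w.max())
    step = (w_max - w_min) / (levels - 1)
    indices = np.round((w - w_min) / step).astype(np.int64)
    dequantized = w_min + indices * step  # reconstruction used at inference
    return indices, dequantized

def empirical_entropy_bits(indices):
    """Per-symbol entropy in bits: a lower bound on what an
    arithmetic coder driven by this symbol statistic can spend."""
    _, counts = np.unique(indices, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Toy stand-in for one layer's weights (real layers are roughly
# zero-centered and bell-shaped, which is what makes coding pay off).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10_000).astype(np.float32)

idx, w_hat = quantize_weights(w, num_bits=4)
bits_per_weight = empirical_entropy_bits(idx)
savings = 1.0 - bits_per_weight / 32.0  # vs. raw 32-bit float storage
```

Because the quantized symbols are far from uniformly distributed, the entropy stays well below the nominal 4 bits, so the estimated savings over 32-bit floats land in the same high-80s/90s range the abstract reports; the accuracy impact of the quantization error is what the paper's rate-accuracy optimization trades off.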
File: ISM2017_final.pdf (authorized users only)
Description: FAL_ISM-2017_pre-print
Type: Pre-print
License: Creative Commons
Size: 246.77 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.