
Rate-Accuracy Optimization of Deep Convolutional Neural Network Models

Filini, Alessandro (Member of the Collaboration Group)
Leonardi, Riccardo (Methodology)
2017-01-01

Abstract

Recently, deep learning has enjoyed a great deal of success in computer vision due to its capability to model highly complex tasks, such as image classification, object detection and face recognition, among many others. Although these neural networks are nowadays very powerful, the huge number of parameters (i.e. the model) that must be learned requires considerable storage space and bandwidth during transmission. This paper addresses the storage and transmission of large deep learning models by proposing a compression solution that is independent of both the model being trained and the data used for training. An efficient compression framework is proposed for the parameters of a neural network, more precisely the weights that interconnect the different neurons, which consume a significant amount of resources (memory, storage and bandwidth). Several quantization strategies are considered, as well as statistical models for the different layers of a neural network, which are exploited by an arithmetic coding engine. Experimental results show that up to 92% bitrate savings can be obtained with minimal impact on image classification accuracy.
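The pipeline the abstract describes (quantize the weights, then entropy-code the quantization indices) can be illustrated with a minimal sketch. This is not the paper's method: the uniform quantizer, the synthetic Gaussian weights, and the function names below are all illustrative assumptions, and the Shannon entropy is used only as the lower bound that an arithmetic coder approaches.

```python
import math
import random

def quantize_uniform(weights, num_bits):
    """Uniformly quantize weights to 2**num_bits levels over their range.

    Returns the integer indices (what an entropy coder would encode)
    and the dequantized values (what the decoder would reconstruct).
    """
    lo, hi = min(weights), max(weights)
    levels = (1 << num_bits) - 1
    step = (hi - lo) / levels if levels else 1.0
    indices = [round((w - lo) / step) for w in weights]
    dequant = [lo + i * step for i in indices]
    return indices, dequant

def empirical_entropy(symbols):
    """Shannon entropy in bits/symbol: the rate an ideal arithmetic coder
    driven by the empirical symbol statistics would approach."""
    counts = {}
    for s in symbols:
        counts[s] = counts.get(s, 0) + 1
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Synthetic stand-in for one layer's weights (assumption, not real data).
random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]

indices, _ = quantize_uniform(weights, 4)
rate = empirical_entropy(indices)   # bits per weight after quantization
savings = 1.0 - rate / 32.0         # relative to 32-bit float storage
print(f"{rate:.2f} bits/weight, {savings:.0%} savings vs. float32")
```

Because the quantized indices of a Gaussian-like weight distribution are far from uniformly distributed, their entropy falls well below the nominal bit width, which is the headroom a statistical model plus arithmetic coding exploits.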
2017
978-1-5386-2937-6
978-1-5386-2936-9
978-1-5386-2938-3
Files in this item:
ISM2017_final.pdf (authorized users only)
Description: FAL_ISM-2017_pre-print
Type: Pre-print document
License: Creative Commons
Size: 246.77 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/503979
Warning: the displayed data have not been validated by the university.

Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science: 2