JPEG AI Compressed Domain Face Detection: a Multi-scale Bridging Perspective

Alkhateeb, Ayman; Gnutti, Alessandro; Guerrini, Fabrizio; Leonardi, Riccardo
2025-01-01

Abstract

Learning-based image coding offers improved compression efficiency and a further advantage: computer vision tasks can run directly in the compressed domain. The latent representation produced by deep learning methods inherently contains the visual features of the image, so they can be used without the computationally expensive synthesis process at the decoder. This paper is an invited extension of a previous solution for JPEG AI compressed domain face detection, which adapts a RetinaFace-based detector to operate directly on the latent tensor. In addition to the earlier single-scale bridging solution, this work introduces a novel multi-scale bridging architecture that enables more effective multi-scale face detection in the compressed domain. The results show a significant performance gain: accuracy improves by up to 20% for the detection of tiny faces on the WIDER FACE dataset compared to single-scale bridging, further narrowing the gap with detection on uncompressed or JPEG AI decoded images. Furthermore, because the computationally expensive decoding step is bypassed and the bridges consist of lower-complexity networks, the overall processing cost is significantly reduced. Single-scale and multi-scale bridging have about 10% and 32%, respectively, of the complexity of pixel domain face detection applied to decoded images. The proposed architecture is expected to extend to other multi-scale-sensitive vision tasks, as JPEG AI is not specifically designed for any single downstream application.
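To make the complexity figures concrete, the short sketch below converts the two relative costs quoted in the abstract into savings and reduction factors. It is only an arithmetic illustration: the abstract gives approximate values ("about 10% and 32%") and does not state which complexity measure is used, and the names RELATIVE_COMPLEXITY, reduction_factor and savings are hypothetical helpers introduced here, not part of the paper.

```python
# Relative complexity figures quoted in the abstract, expressed as a fraction
# of the cost of pixel domain face detection on decoded images (baseline = 1.0).
# The figures are approximate; all names below are hypothetical.
RELATIVE_COMPLEXITY = {
    "single-scale bridging": 0.10,
    "multi-scale bridging": 0.32,
}


def reduction_factor(relative_cost: float) -> float:
    """Return how many times cheaper a method is than the baseline."""
    if not 0.0 < relative_cost <= 1.0:
        raise ValueError("relative_cost must be in (0, 1]")
    return 1.0 / relative_cost


def savings(relative_cost: float) -> float:
    """Return the fraction of the baseline cost that is avoided."""
    return 1.0 - relative_cost


for name, cost in RELATIVE_COMPLEXITY.items():
    print(f"{name}: {savings(cost):.0%} saved, "
          f"about {reduction_factor(cost):.1f}x cheaper than the baseline")
```

Read this way, the multi-scale bridge gives up part of the single-scale saving (roughly a 3x rather than a 10x reduction) in exchange for the accuracy gain on tiny faces reported above.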
Files in this record:
File: JPEG_AI_Compressed_Domain_Face_Detection_a_Multi-scale_Bridging_Perspective-compressed.pdf
Access: open access
Type: Full Text
License: Publisher's copyright
Size: 457.43 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/633161