JPEG AI Compressed Domain Face Detection: a Multi-scale Bridging Perspective

Alkhateeb, Ayman; Gnutti, Alessandro; Guerrini, Fabrizio; Leonardi, Riccardo; Ascenso, João; Pereira, Fernando

doi:10.1109/tmm.2025.3609179

Learning-based image coding is showing improved compression efficiency, while also offering a novel advantage in enabling computer vision tasks directly within the compressed domain. The latent representation created by deep learning methods inherently contains all visual features, without a computationally expensive synthesis process at the decoder. This paper is an invited extension of a previous solution for JPEG AI compressed domain face detection that adapts a RetinaFace-based detector to operate directly on the latent tensor. In addition to a former single-scale bridging solution, this work provides a novel multi-scale bridging architecture to enable a more effective multi-scale compressed domain face detection. The results show a significant performance gain, improving accuracy up to 20% for detection of tiny faces on the WIDER FACE dataset compared to single-scale bridging, and further narrowing the gap when compared to detection on uncompressed or JPEG AI decoded images. Furthermore, since the computationally expensive decoding step is bypassed and since the bridges consist of lower-complexity networks, the overall processing cost is significantly reduced. Single and multi-scale bridging, respectively, have about 10% and 32% the complexity of applying pixel domain face detection on decoded images. The proposed architecture is expected to be extended to other multiscale sensitive vision tasks, as JPEG AI is not specifically designed for any single downstream application.