Effective encoding and indexing of audiovisual documents are two key aspects for enhancing the multimedia user experience. In this paper we propose the embedding of low-level content descriptors into a scalable video-coding bitstream by jointly optimizing encoding and indexing performance. This approach provides a new type of bitstream where part of the information is used for both content encoding and content description, allowing the so called "Midstream Content Access". To support this concept, a novel technique based on the appropriate combination of Vector Quantization and Scalable Video Coding has been developed and evaluated. More specifically, the key-pictures of each video Group Of Pictures (GOP) are encoded at a first draft level by using a suitable visual-codebook, while the residual errors are encoded using a conventional approach. The same visual-codebook is also used to encode all the key-pictures of a video shot, where boundaries are dynamically estimated. In this way, the visual-codebook is freely available as an efficient visual descriptor of the considered video shot. Moreover, since a new visual-codebook is introduced every time a new shot is detected, also an implicit temporal segmentation is provided.
Embedded indexing in scalable video coding
ADAMI, Nicola
Methodology
;BOSCHETTI, AlbertoSoftware
;LEONARDI, Riccardo
Conceptualization
;MIGLIORATI, Pierangelo
Membro del Collaboration Group
2010-01-01
Abstract
Effective encoding and indexing of audiovisual documents are two key aspects for enhancing the multimedia user experience. In this paper we propose the embedding of low-level content descriptors into a scalable video-coding bitstream by jointly optimizing encoding and indexing performance. This approach provides a new type of bitstream where part of the information is used for both content encoding and content description, allowing the so called "Midstream Content Access". To support this concept, a novel technique based on the appropriate combination of Vector Quantization and Scalable Video Coding has been developed and evaluated. More specifically, the key-pictures of each video Group Of Pictures (GOP) are encoded at a first draft level by using a suitable visual-codebook, while the residual errors are encoded using a conventional approach. The same visual-codebook is also used to encode all the key-pictures of a video shot, where boundaries are dynamically estimated. In this way, the visual-codebook is freely available as an efficient visual descriptor of the considered video shot. Moreover, since a new visual-codebook is introduced every time a new shot is detected, also an implicit temporal segmentation is provided.File | Dimensione | Formato | |
---|---|---|---|
ablm-embedded-index-2010.pdf
solo utenti autorizzati
Descrizione: ABLM_MTAP-2009_full-text
Tipologia:
Full Text
Licenza:
NON PUBBLICO - Accesso privato/ristretto
Dimensione
557.59 kB
Formato
Adobe PDF
|
557.59 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
ABLM-MTAP-2009+pre-print.pdf
accesso aperto
Descrizione: ABLM_MTAP-2010_pre-print
Tipologia:
Documento in Pre-print
Licenza:
Creative commons
Dimensione
413.12 kB
Formato
Adobe PDF
|
413.12 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.