Effective encoding and indexing of audiovisual documents are two key aspects for enhancing the multimedia user experience. In this paper we propose the embedding of low-level content descriptors into a scalable video-coding bitstream by jointly optimizing encoding and indexing performance. This approach provides a new type of bitstream where part of the information is used for both content encoding and content description, allowing the so called "Midstream Content Access". To support this concept, a novel technique based on the appropriate combination of Vector Quantization and Scalable Video Coding has been developed and evaluated. More specifically, the key-pictures of each video Group Of Pictures (GOP) are encoded at a first draft level by using a suitable visual-codebook, while the residual errors are encoded using a conventional approach. The same visual-codebook is also used to encode all the key-pictures of a video shot, where boundaries are dynamically estimated. In this way, the visual-codebook is freely available as an efficient visual descriptor of the considered video shot. Moreover, since a new visual-codebook is introduced every time a new shot is detected, also an implicit temporal segmentation is provided.

Embedded indexing in scalable video coding

ADAMI, Nicola
Methodology
;
BOSCHETTI, Alberto
Software
;
LEONARDI, Riccardo
Conceptualization
;
MIGLIORATI, Pierangelo
Membro del Collaboration Group
2010-01-01

Abstract

Effective encoding and indexing of audiovisual documents are two key aspects for enhancing the multimedia user experience. In this paper we propose the embedding of low-level content descriptors into a scalable video-coding bitstream by jointly optimizing encoding and indexing performance. This approach provides a new type of bitstream where part of the information is used for both content encoding and content description, allowing the so called "Midstream Content Access". To support this concept, a novel technique based on the appropriate combination of Vector Quantization and Scalable Video Coding has been developed and evaluated. More specifically, the key-pictures of each video Group Of Pictures (GOP) are encoded at a first draft level by using a suitable visual-codebook, while the residual errors are encoded using a conventional approach. The same visual-codebook is also used to encode all the key-pictures of a video shot, where boundaries are dynamically estimated. In this way, the visual-codebook is freely available as an efficient visual descriptor of the considered video shot. Moreover, since a new visual-codebook is introduced every time a new shot is detected, also an implicit temporal segmentation is provided.
File in questo prodotto:
File Dimensione Formato  
ablm-embedded-index-2010.pdf

solo utenti autorizzati

Descrizione: ABLM_MTAP-2009_full-text
Tipologia: Full Text
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 557.59 kB
Formato Adobe PDF
557.59 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
ABLM-MTAP-2009+pre-print.pdf

accesso aperto

Descrizione: ABLM_MTAP-2010_pre-print
Tipologia: Documento in Pre-print
Licenza: Creative commons
Dimensione 413.12 kB
Formato Adobe PDF
413.12 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11379/18198
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 2
social impact