Embedded Indexing in Scalable Video Coding

ADAMI, Nicola; BOSCHETTI, Alberto; LEONARDI, Riccardo; MIGLIORATI, Pierangelo
2009-01-01

Abstract

Effective encoding and indexing of audiovisual documents are two key aspects of enhancing the multimedia user experience. In this paper we propose embedding low-level content descriptors into a scalable video coding bit-stream by jointly optimizing encoding and indexing performance. This approach provides a new type of bit-stream in which part of the information is used for both content encoding and content description, allowing the so-called "Midstream Content Access". To support this concept, a novel technique based on an appropriate combination of Vector Quantization and Scalable Video Coding has been developed and evaluated. More specifically, the key pictures of each video GOP are encoded at a first draft level using an optimal visual codebook, while the residual errors are encoded using a conventional approach. The same visual codebook is used to encode all the key pictures of a video shot, whose boundaries are dynamically estimated. In this way, the visual codebook is freely available as an efficient visual descriptor of the considered video shot. Moreover, since a new visual codebook is introduced every time a new shot is detected, an implicit temporal segmentation is also provided.
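The two-layer idea in the abstract (a codebook-based first draft of each key picture plus a residual) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes grayscale pictures whose dimensions are multiples of the block size, uses plain Lloyd (k-means) iterations as a stand-in for the "optimal" codebook design, and leaves the residual uncoded where the paper would apply a conventional coder. All function names and parameters are hypothetical.

```python
import numpy as np

def to_blocks(img, b):
    """Split an (H, W) image into non-overlapping b x b blocks, one row per block."""
    h, w = img.shape
    return img.reshape(h // b, b, w // b, b).swapaxes(1, 2).reshape(-1, b * b)

def from_blocks(blocks, shape, b):
    """Inverse of to_blocks: reassemble the (H, W) image from its block rows."""
    h, w = shape
    return blocks.reshape(h // b, w // b, b, b).swapaxes(1, 2).reshape(h, w)

def nearest(vectors, codebook):
    """Index of the closest codeword (squared Euclidean distance) per vector."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def train_codebook(vectors, k, iters=20, seed=0):
    """Lloyd iterations; assumed stand-in for the paper's codebook design."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        idx = nearest(vectors, codebook)
        for j in range(k):
            members = vectors[idx == j]
            if len(members):  # keep the old codeword if its cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def encode_key_picture(img, codebook, b):
    """First-draft layer (codeword indices) plus the residual left to refine."""
    idx = nearest(to_blocks(img, b), codebook)
    draft = from_blocks(codebook[idx], img.shape, b)
    return idx, img - draft

def decode_key_picture(idx, residual, codebook, shape, b):
    """Rebuild the picture from the draft layer and the (here lossless) residual."""
    return from_blocks(codebook[idx], shape, b) + residual

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    shot = [rng.random((32, 32)) for _ in range(3)]  # synthetic key pictures
    b, k = 4, 16
    # One codebook per shot, trained on the blocks of all its key pictures.
    codebook = train_codebook(np.vstack([to_blocks(p, b) for p in shot]), k)
    for p in shot:
        idx, res = encode_key_picture(p, codebook, b)
        rec = decode_key_picture(idx, res, codebook, p.shape, b)
        print("draft MSE:", float((res ** 2).mean()),
              "max reconstruction error:", float(np.abs(rec - p).max()))
```

In this sketch the codebook is the only shot-level data shared by all key pictures, which is what lets it double as a descriptor of the shot; how shot boundaries are estimated and how descriptors are compared are not covered here.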
Year: 2009
ISBN: 9780769536620
Files in this record:

ABLM_CBMI-2009_pre-print.pdf
Access: open access
Description: ABLM_CBMI-2009_pre-print
Type: Pre-print document
License: Creative Commons
Size: 1.36 MB
Format: Adobe PDF

ABLM_CBMI-2009_full-text.pdf
Access: authorized users only
Description: ABLM_CBMI-2009_full-text
Type: Full text
License: Not public (private/restricted access)
Size: 1.24 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/30002