Embedded Indexing in Scalable Video Coding
Adami, Nicola; Boschetti, Alberto; Leonardi, Riccardo; Migliorati, Pierangelo
2009-01-01
Abstract
Effective encoding and indexing of audiovisual documents are two key aspects of enhancing the multimedia user experience. In this paper we propose embedding low-level content descriptors into a scalable video coding bit-stream by jointly optimizing encoding and indexing performance. This approach yields a new type of bit-stream in which part of the information is used for both content encoding and content description, enabling the so-called "Midstream Content Access". To support this concept, a novel technique based on an appropriate combination of Vector Quantization and Scalable Video Coding has been developed and evaluated. More specifically, the key-pictures of each video GOP are encoded at a first draft level using an optimal visual-codebook, while the residual errors are encoded using a conventional approach. The same visual-codebook is also used to encode all the key-pictures of a video shot, whose boundaries are estimated dynamically. In this way, the visual-codebook is freely available as an efficient visual descriptor of the corresponding video shot. Moreover, since a new visual-codebook is introduced every time a new shot is detected, an implicit temporal segmentation is also provided.
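The following is a minimal sketch of the general idea, not the authors' implementation. It shows key-pictures vector-quantized with a per-shot codebook, residuals left for a conventional coder, and a new codebook started at each detected shot change. It rests on these assumptions:

- Key-pictures are grayscale pictures given as nested lists.
- Blocks are non-overlapping b x b squares.
- The codebook is trained with plain k-means (Lloyd iterations) on the first key-picture of each shot.
- A shot change is declared when the mean quantization distortion under the current codebook exceeds a fixed threshold.

All function names and parameters (`encode_key_pictures`, `b`, `k`, `threshold`) are hypothetical. The paper's "optimal" codebook design and its shot-boundary estimation may differ.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Picture = List[List[int]]


def picture_to_blocks(pic: Picture, b: int) -> List[Vector]:
    """Split a grayscale picture into non-overlapping b x b blocks (row-major), each flattened."""
    h, w = len(pic), len(pic[0])
    blocks = []
    for y in range(0, h - h % b, b):
        for x in range(0, w - w % b, b):
            blocks.append(tuple(float(pic[y + i][x + j]) for i in range(b) for j in range(b)))
    return blocks


def sq_dist(a: Vector, c: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, c))


def nearest(v: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the closest codeword (ties resolved to the lowest index)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))


def train_codebook(vectors: List[Vector], k: int, iters: int = 10) -> List[Vector]:
    """Plain k-means codebook; initialized with the first k distinct training vectors."""
    codebook: List[Vector] = []
    for v in vectors:
        if v not in codebook:
            codebook.append(v)
        if len(codebook) == k:
            break
    while len(codebook) < k:  # fewer distinct vectors than k: pad with duplicates
        codebook.append(codebook[0])
    for _ in range(iters):
        sums = [[0.0] * len(vectors[0]) for _ in range(k)]
        counts = [0] * k
        for v in vectors:
            idx = nearest(v, codebook)
            counts[idx] += 1
            for d, val in enumerate(v):
                sums[idx][d] += val
        # Empty cells keep their previous codeword.
        codebook = [tuple(s / counts[i] for s in sums[i]) if counts[i] else codebook[i]
                    for i in range(k)]
    return codebook


def vq_encode(vectors: List[Vector], codebook: List[Vector]):
    """Coarse 'draft' layer = codeword indices; residuals would go to a conventional coder."""
    indices = [nearest(v, codebook) for v in vectors]
    residuals = [tuple(p - q for p, q in zip(v, codebook[i])) for v, i in zip(vectors, indices)]
    return indices, residuals


def mean_distortion(vectors: List[Vector], codebook: List[Vector]) -> float:
    return sum(sq_dist(v, codebook[nearest(v, codebook)]) for v in vectors) / len(vectors)


def encode_key_pictures(key_pictures: List[Picture], b: int = 2, k: int = 4,
                        threshold: float = 100.0):
    """Return one (shot_id, indices, residuals) record per key-picture, plus one codebook per shot."""
    codebooks: List[List[Vector]] = []
    records = []
    for pic in key_pictures:
        vecs = picture_to_blocks(pic, b)
        # Start a new shot (and a new codebook) when the current codebook fits poorly.
        if not codebooks or mean_distortion(vecs, codebooks[-1]) > threshold:
            codebooks.append(train_codebook(vecs, k))
        indices, residuals = vq_encode(vecs, codebooks[-1])
        records.append((len(codebooks) - 1, indices, residuals))
    return records, codebooks
```

For example, with two dark key-pictures followed by two bright ones, the sketch should assign shot ids 0, 0, 1, 1 and produce two codebooks:

```python
dark1 = [[10, 10, 20, 20]] * 4
dark2 = [[12, 12, 18, 18]] * 4
bright1 = [[200, 200, 220, 220]] * 4
bright2 = [[203, 203, 217, 217]] * 4
records, codebooks = encode_key_pictures([dark1, dark2, bright1, bright2], b=2, k=2)
print([r[0] for r in records])  # [0, 0, 1, 1]
```

The per-shot `codebooks` list is what would double as a compact visual descriptor of each shot. The shot ids give the implicit temporal segmentation. A fixed distortion threshold is the simplest possible boundary test and is only a stand-in for the dynamic estimation described in the abstract.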
File | Description | Type | License | Size | Format
---|---|---|---|---|---
ABLM_CBMI-2009_pre-print.pdf (open access) | ABLM_CBMI-2009_pre-print | Pre-print | Creative Commons | 1.36 MB | Adobe PDF
ABLM_CBMI-2009_full-text.pdf (authorized users only) | ABLM_CBMI-2009_full-text | Full text | Not public (private/restricted access) | 1.24 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.