Many post-production video documents such as movies, sitcoms and cartoons present well structured story-lines organized in separated audio-visual scenes. Accurate grouping of shots into these logical video segments could lead to semantic indexing of scenes and events for interactive multimedia retrieval. In this paper we introduce a novel shot based analysis approach which aims to cluster together shots with similar audio-visual content. We demonstrate how the use of codebooks of audio and visual codewords (generated by a vector quantization process) results to be an effective method to represent clusters containing shots with similar long-term consistency of chromatic compositions and audio. The output clusters obtained by a simple single-link clustering algorithm, allow the further application of the well-known scene transition graph framework for scene change detection and shot-pattern investigation. In the end the merging of audio and visual results leads to a hierarchical description of the whole video document, useful for multimedia retrieval and summarization purposes.
Audio-Visual VQ Shot Clustering for Video Programs
BENINI, Sergio;LEONARDI, Riccardo
2005-01-01
Abstract
Many post-production video documents such as movies, sitcoms and cartoons present well structured story-lines organized in separated audio-visual scenes. Accurate grouping of shots into these logical video segments could lead to semantic indexing of scenes and events for interactive multimedia retrieval. In this paper we introduce a novel shot based analysis approach which aims to cluster together shots with similar audio-visual content. We demonstrate how the use of codebooks of audio and visual codewords (generated by a vector quantization process) results to be an effective method to represent clusters containing shots with similar long-term consistency of chromatic compositions and audio. The output clusters obtained by a simple single-link clustering algorithm, allow the further application of the well-known scene transition graph framework for scene change detection and shot-pattern investigation. In the end the merging of audio and visual results leads to a hierarchical description of the whole video document, useful for multimedia retrieval and summarization purposes.File | Dimensione | Formato | |
---|---|---|---|
BXL_AVIVDiLib-2005_post-print.pdf
accesso aperto
Descrizione: BXL_AVIVDiLib-2005_post-print
Tipologia:
Documento in Post-print
Licenza:
Creative commons
Dimensione
447.59 kB
Formato
Adobe PDF
|
447.59 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.