Audio-Visual Processing for Scene Change Detection

Saraceno, C.; Leonardi, Riccardo

doi:10.1007/3-540-63508-4_114

Organizing video databases according to the semantic content of their data is a key issue in multimedia technologies, since it would allow tasks such as indexing and retrieval to work more efficiently. In an attempt to extract semantic information, efforts have been devoted to segmenting video into shots and, for each shot, extracting information such as a representative video frame. Because a video sequence is constructed from a 2-D projection of a 3-D scene, processing video information alone has shown its limitations, especially in solving problems such as object identification or object tracking. Furthermore, not all information is contained in the video signal, and more can be achieved by analyzing the audio signal as well. Information can be obtained from the audio signal either to confirm the results obtained by a video processing unit or to acquire information that cannot be extracted from video (such as the presence of music). This paper presents a technique that combines video and audio information for classification and indexing purposes.