Semantic Indexing of Sport Program Sequences by Audio-Visual Analysis
Leonardi, Riccardo; Migliorati, Pierangelo; Prandini, Maria
2003-01-01
Abstract
Semantic indexing of sports videos is a subject of great interest to researchers working on multimedia content characterization. Sports programs appeal to large audiences, and their efficient distribution over various networks should contribute to the widespread adoption of multimedia services. In this paper, we propose a semantic indexing algorithm for soccer programs that uses both audio and visual information for content characterization. The video signal is processed first by extracting low-level visual descriptors from the MPEG compressed bit-stream. The temporal evolution of these descriptors during a semantic event is assumed to be governed by a controlled Markov chain. This makes it possible to determine, based on the maximum likelihood criterion, a list of video segments where a semantic event of interest is likely to be found. The audio information is then used to refine the results of the video classification procedure by ranking the candidate video segments so that the segments associated with the event of interest appear in the very first positions of the ordered list. The proposed method is applied to goal detection. Experimental results show the effectiveness of the proposed cross-modal approach.
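To illustrate the maximum-likelihood selection step described in the abstract, the following minimal sketch scores candidate segments against two Markov chain models and ranks the goal-like ones first. It is a simplification for illustration only: the descriptors are assumed to be quantized into discrete states, the controlled Markov chain is reduced to a plain Markov chain without the control input, and all model parameters, segment data, and names (`goal_trans`, `seg_A`, etc.) are hypothetical rather than taken from the authors' implementation.

```python
import numpy as np

def log_likelihood(sequence, init_prob, trans_prob):
    """Log-likelihood of a discrete state sequence under a Markov chain."""
    ll = np.log(init_prob[sequence[0]])
    for prev, curr in zip(sequence[:-1], sequence[1:]):
        ll += np.log(trans_prob[prev, curr])
    return ll

# Toy models over 3 quantized descriptor states (e.g. low/medium/high motion).
# These numbers are illustrative placeholders, not trained parameters.
goal_init = np.array([0.2, 0.3, 0.5])
goal_trans = np.array([[0.1, 0.3, 0.6],
                       [0.2, 0.3, 0.5],
                       [0.1, 0.2, 0.7]])
other_init = np.array([0.5, 0.3, 0.2])
other_trans = np.array([[0.6, 0.3, 0.1],
                        [0.4, 0.4, 0.2],
                        [0.3, 0.4, 0.3]])

# Candidate segments, each a sequence of quantized descriptor states.
segments = {
    "seg_A": [0, 1, 2, 2, 2],
    "seg_B": [0, 0, 1, 0, 0],
}

# Keep segments that are more likely under the "goal" model than under the
# alternative, then rank them by the log-likelihood ratio (most goal-like first).
scores = {
    name: log_likelihood(seq, goal_init, goal_trans)
          - log_likelihood(seq, other_init, other_trans)
    for name, seq in segments.items()
}
ranked = sorted((n for n, s in scores.items() if s > 0),
                key=lambda n: scores[n], reverse=True)
print(ranked)  # e.g. ['seg_A']
```

In the paper, this ranking is subsequently refined using the audio track; the sketch above covers only the video-side likelihood test.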
| File | Description | Type | Access | License | Size | Format |
|---|---|---|---|---|---|---|
| Abs-icip03.pdf | | Abstract | open access | PUBBLICO - Creative Commons 3.6 | 25.66 kB | Adobe PDF |
| LMP_ICIP-2003_post-print.pdf | LMP_ICIP-2003_post-print | Post-print document | open access | PUBBLICO - Creative Commons 3.6 | 191.59 kB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.