Semantic Indexing of Sport Program Sequences by Audio-Visual Analysis

LEONARDI, Riccardo; MIGLIORATI, Pierangelo; PRANDINI, Maria
2003-01-01

Abstract

Semantic indexing of sports videos is a subject of great interest to researchers working on multimedia content characterization. Sports programs appeal to large audiences, and their efficient distribution over various networks should contribute to the widespread usage of multimedia services. In this paper, we propose a semantic indexing algorithm for soccer programs that uses both audio and visual information for content characterization. The video signal is processed first, by extracting low-level visual descriptors from the MPEG compressed bit-stream. The temporal evolution of these descriptors during a semantic event is assumed to be governed by a controlled Markov chain. This makes it possible to determine, based on the maximum likelihood criterion, a list of the video segments where a semantic event of interest is likely to be found. The audio information is then used to refine the results of the video classification procedure: the candidate video segments in the list are ranked so that the segments associated with the event of interest appear in the very first positions of the ordered list. The proposed method is applied to goal detection. Experimental results show the effectiveness of the proposed cross-modal approach.
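
The two-stage idea in the abstract (likelihood-based selection of candidate video segments, then audio-based ranking) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the descriptors, the model parameters, the decision rule, or the audio feature, so everything concrete here is an assumption made for illustration: the two-symbol descriptor alphabet, the binary control input, the toy transition probabilities, the comparison against a background model, and the field names `states`, `controls`, and `audio_score`.

```python
import math

def log_likelihood(states, controls, initial, transitions):
    """Log-likelihood of a symbol sequence under a controlled Markov chain.

    states:      discrete descriptor symbols x_0 .. x_T
    controls:    control inputs u_0 .. u_{T-1}, one per transition
    initial:     initial[x] = P(x_0 = x)
    transitions: transitions[u][x][y] = P(x_{t+1} = y | x_t = x, u_t = u)
    All probabilities used here must be strictly positive (log of 0 fails).
    """
    if len(controls) != len(states) - 1:
        raise ValueError("need exactly one control input per transition")
    ll = math.log(initial[states[0]])
    for x, y, u in zip(states, states[1:], controls):
        ll += math.log(transitions[u][x][y])
    return ll

def select_candidates(segments, event_model, background_model):
    """Keep the segments that the event model explains better (assumed rule)."""
    kept = []
    for seg in segments:
        ll_event = log_likelihood(seg["states"], seg["controls"], *event_model)
        ll_back = log_likelihood(seg["states"], seg["controls"], *background_model)
        if ll_event > ll_back:
            kept.append(seg)
    return kept

def rank_by_audio(candidates):
    """Order candidates by a hypothetical per-segment audio score, highest first."""
    return sorted(candidates, key=lambda seg: seg["audio_score"], reverse=True)

# Toy models (assumed values). Symbols: 0 = low motion, 1 = high motion.
# Control input: 0 = no shot cut, 1 = shot cut before the next symbol.
uniform = [[0.5, 0.5], [0.5, 0.5]]
event_model = ([0.5, 0.5], {0: [[0.3, 0.7], [0.2, 0.8]], 1: uniform})
background_model = ([0.5, 0.5], {0: [[0.8, 0.2], [0.7, 0.3]], 1: uniform})

segments = [
    {"name": "A", "states": [0, 1, 1, 1], "controls": [0, 0, 0], "audio_score": 0.90},
    {"name": "B", "states": [0, 0, 0, 0], "controls": [0, 0, 0], "audio_score": 0.95},
    {"name": "C", "states": [1, 1, 0, 1], "controls": [0, 1, 0], "audio_score": 0.40},
]

ranked = rank_by_audio(select_candidates(segments, event_model, background_model))
print([seg["name"] for seg in ranked])
```

In this toy setup the audio score never admits a segment on its own: it only reorders the candidates that the visual models have already selected, which mirrors the refinement role the abstract assigns to the audio information.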
2003
0780377516
Files in this record:

Abs-icip03.pdf
open access
Type: Abstract
License: PUBLIC - Creative Commons 3.6
Size: 25.66 kB
Format: Adobe PDF

LMP_ICIP-2003_post-print.pdf
open access
Description: LMP_ICIP-2003_post-print
Type: Post-print document
License: PUBLIC - Creative Commons 3.6
Size: 191.59 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/10242