Method and system for semantically segmenting an audio sequence

BENINI, Sergio
2005-01-01

Abstract

An audio segmentation method and system is described which automatically segments an audio sequence into audio scenes of similar semantic content. The method and system initially split the audio sequence into segments of arbitrary length (step 101). Next, each segment is subjected to short-term spectral analysis (step 102) to generate feature vectors characterising the audio. A vector quantisation (VQ) technique is used to generate a signature codebook from the feature vectors of the audio segments (step 103). An Earth Mover's Distance (EMD) measure is then used to calculate distances between consecutive audio segments (step 104). By statistically analysing the EMD measures to identify peaks, changes in the dominant audio content, which indicate audio scene changes, can be detected (step 105). In this way, it is possible to automate the time-consuming and laborious process of organising and indexing increasingly large audio databases so that they can be easily browsed and searched using natural query structures.
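The five steps can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the features, codebook size, or peak statistics, so everything here is an assumption. In particular, a single scalar feature per frame (the spectral centroid) stands in for the feature vectors, which lets the EMD be computed in closed form as the area between two cumulative distributions; multi-dimensional signatures would instead need a transportation-problem solver. The function names, the frame and segment lengths, and the mean-plus-one-standard-deviation threshold are all hypothetical.

```python
import cmath
import math

def spectral_centroids(segment, frame_len=64):
    """Step 102 (simplified): one scalar per frame, the spectral centroid in DFT bins."""
    feats = []
    for start in range(0, len(segment) - frame_len + 1, frame_len):
        frame = segment[start:start + frame_len]
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame)))
                for k in range(frame_len // 2)]
        total = sum(mags)
        feats.append(sum(k * m for k, m in enumerate(mags)) / total if total else 0.0)
    return feats

def signature(feats, k=2, iters=10):
    """Step 103 (simplified): 1-D k-means; returns [(centroid, weight), ...], weights sum to 1."""
    lo, hi = min(feats), max(feats)
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in feats:
            nearest = min(range(k), key=lambda i: abs(f - centres[i]))
            groups[nearest].append(f)
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return [(c, len(g) / len(feats)) for c, g in zip(centres, groups) if g]

def emd_1d(sig_a, sig_b):
    """Step 104: 1-D EMD for equal-mass signatures, i.e. the area between their CDFs."""
    points = sorted([(v, w) for v, w in sig_a] + [(v, -w) for v, w in sig_b])
    dist, cdf_diff, prev = 0.0, 0.0, points[0][0]
    for value, weight in points:
        dist += abs(cdf_diff) * (value - prev)
        cdf_diff += weight
        prev = value
    return dist

def scene_changes(distances, alpha=1.0):
    """Step 105 (assumed rule): local maxima above mean + alpha * standard deviation."""
    mean = sum(distances) / len(distances)
    std = math.sqrt(sum((d - mean) ** 2 for d in distances) / len(distances))
    threshold = mean + alpha * std
    peaks = []
    for i, d in enumerate(distances):
        left = distances[i - 1] if i > 0 else float("-inf")
        right = distances[i + 1] if i + 1 < len(distances) else float("-inf")
        if d > threshold and d >= left and d >= right:
            peaks.append(i)
    return peaks

def segment_audio(samples, seg_len=512):
    """Steps 101-105 chained: returns the EMD sequence and the detected boundaries."""
    segments = [samples[i:i + seg_len]
                for i in range(0, len(samples) - seg_len + 1, seg_len)]
    sigs = [signature(spectral_centroids(s)) for s in segments]
    dists = [emd_1d(a, b) for a, b in zip(sigs, sigs[1:])]
    return dists, scene_changes(dists)

def tone(bin_index, n_samples, frame_len=64):
    """Synthetic test signal: a sinusoid centred on one DFT bin."""
    return [math.sin(2 * math.pi * bin_index * n / frame_len) for n in range(n_samples)]

# Four segments of a low tone followed by four segments of a high tone.
dists, changes = segment_audio(tone(4, 2048) + tone(20, 2048))
print(changes)  # [3]: the boundary between segments 3 and 4
```

On this synthetic input the distance between neighbouring segments is close to zero everywhere except at the tone change, where it is roughly 16 bins, so the single peak marks the scene boundary. Real audio would call for richer features and a larger codebook than the two centroids used here.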
Files in this item:
File: Patent WO2005093712A1 - Method and system for semantically segmenting an audio sequence - Google Patents.pdf
Access: authorised users only
Licence: Creative Commons
Size: 620.06 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/481752