Method and system for semantically segmenting an audio sequence
BENINI, Sergio
2005-01-01
Abstract
An audio segmentation method and system that automatically segments an audio sequence into audio scenes of similar semantic content is described. The method and system first split the audio sequence into segments of arbitrary length (step 101). Next, each segment is subjected to short-term spectral analysis (step 102) to generate feature vectors characterising the audio. A vector quantisation (VQ) technique is used to generate a signature codebook from the feature vectors of the audio segments (step 103). An Earth Mover's Distance (EMD) measure is then used to calculate distances between consecutive audio segments (step 104). By statistically analysing the respective EMD measures to identify peaks therein, changes in the dominant audio content, which are indicative of audio scene changes, can be detected (step 105). In this way, it is possible to automate the time-consuming and laborious process of organising and indexing increasingly large audio databases so that they can be easily browsed and searched using natural query structures.
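The five steps named in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the patented implementation: each "feature" is a single scalar in a known range rather than a spectral feature vector, a normalised histogram stands in for the VQ signature codebook, the EMD is computed with the closed form that holds only for one-dimensional histograms on shared bins, and the peak rule (a local maximum above mean + k times the standard deviation) is an assumed statistic, since the abstract does not specify one. All function and parameter names are hypothetical.

```python
from statistics import mean, pstdev


def split_segments(samples, seg_len):
    """Step 101: cut the sequence into fixed-length segments (trailing remainder dropped)."""
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]


def signature(segment, n_bins, lo, hi):
    """Toy stand-in for steps 102-103: normalised histogram of scalar features in [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in segment:
        idx = int((x - lo) / width)
        counts[max(0, min(idx, n_bins - 1))] += 1  # clamp to a valid bin
    return [c / len(segment) for c in counts]


def emd_1d(p, q):
    """Step 104, 1-D special case: EMD between equal-mass histograms on the same bins.

    With unit distance between adjacent bins, the cost is the sum of the
    absolute running differences of the two histograms.
    """
    carried, cost = 0.0, 0.0
    for a, b in zip(p, q):
        carried += a - b
        cost += abs(carried)
    return cost


def find_peaks(distances, k=1.0):
    """Step 105 (assumed rule): local maxima above mean + k * population std."""
    if not distances:
        return []
    threshold = mean(distances) + k * pstdev(distances)
    peaks = []
    for i, d in enumerate(distances):
        left = distances[i - 1] if i > 0 else float("-inf")
        right = distances[i + 1] if i < len(distances) - 1 else float("-inf")
        if d > threshold and d >= left and d >= right:
            peaks.append(i)
    return peaks


def segment_scenes(samples, seg_len, n_bins=4, lo=0.0, hi=1.0, k=1.0):
    """Return the indices of segments that start a new scene."""
    sigs = [signature(s, n_bins, lo, hi) for s in split_segments(samples, seg_len)]
    dists = [emd_1d(a, b) for a, b in zip(sigs, sigs[1:])]
    # A peak at index i lies between segment i and segment i + 1.
    return [i + 1 for i in find_peaks(dists, k)]
```

For example, `segment_scenes([0.1] * 40 + [0.9] * 40, seg_len=10)` returns `[4]`: the only non-zero distance is the one between the fourth and fifth segments, so the fifth segment (index 4) is reported as the start of a new scene. A real system would replace `signature` and `emd_1d` with multi-dimensional codebook signatures and a general transportation-problem solver.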
File: Patent WO2005093712A1 - Method and system for semantically segmenting an audio sequence - Google Patents.pdf
Access: authorised users only
Licence: Creative Commons
Size: 620.06 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.