Method and system for semantically segmenting an audio sequence