
Spectrogram-Based Detection of Auto-Tuned Vocals in Music Recordings

Gohari, Mahyar (Investigation); Benini, Sergio (Member of the Collaboration Group); Adami, Nicola (Supervision)
Publication date: 2024-01-01

Abstract

In the domain of music production and audio processing, automatic pitch correction of the singing voice, widely known as Auto-Tune, has significantly transformed the landscape of vocal performance. While Auto-Tune technology offers musicians the ability to correct their vocal pitch with a desired level of precision, its use has also sparked debate over its impact on authenticity and artistic integrity. As a result, detecting and analyzing Auto-Tuned vocals in music recordings has become valuable for music scholars, producers, and listeners. However, to the best of our knowledge, no prior effort has been made in this direction. This study introduces a data-driven approach leveraging triplet networks for the detection of Auto-Tuned songs, backed by the creation of a dataset composed of original and Auto-Tuned audio clips. The experimental results demonstrate the superiority of the proposed method in terms of both accuracy and robustness when compared to two baseline models: RawNet2, an end-to-end model proposed for anti-spoofing and widely used for other audio forensic tasks, and a Graph Attention Transformer-based approach specifically designed for singing voice deepfake detection.
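The abstract names a triplet-network approach operating on spectrograms but gives no implementation details. The following is a minimal, hypothetical PyTorch sketch of how such a setup could look: a small CNN embeds mel-spectrogram excerpts, and a triplet margin loss shapes the embedding space so that clips of the same class (original or Auto-Tuned) cluster together while the other class is pushed away. All layer sizes, dimensions, and hyperparameters below are assumptions, not the authors' method.

```python
# Illustrative sketch of triplet-network training on spectrograms
# (assumed architecture; not the paper's implementation).
import torch
import torch.nn as nn

class SpectrogramEmbedder(nn.Module):
    """Small CNN that maps a mel-spectrogram excerpt to an L2-normalized embedding."""
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, time) mel-spectrogram excerpts
        z = self.features(x).flatten(1)
        return nn.functional.normalize(self.fc(z), dim=1)

model = SpectrogramEmbedder()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Dummy batch: anchor and positive share a class (e.g. Auto-Tuned),
# the negative comes from the other class (e.g. original vocals).
anchor = torch.randn(8, 1, 80, 128)
positive = torch.randn(8, 1, 80, 128)
negative = torch.randn(8, 1, 80, 128)

loss = triplet_loss(model(anchor), model(positive), model(negative))
loss.backward()
```

At inference time, a clip would be classified by comparing its embedding against reference embeddings (for example via nearest-neighbor distance to known original and Auto-Tuned examples); the exact decision rule used in the paper is not specified here.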
Year: 2024
ISBN: 979-8-3503-6442-2
File in this record:
Spectrogram-Based_Detection_of_Auto-Tuned_Vocals_in_Music_Recordings.pdf
Access: authorized users only
License: Publisher's copyright
Size: 1.47 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/620646
