<aside> 🎵 Example Song Perturbations (Section 5.1 in Paper)

</aside>

We assume that any generative music model adds some degree of variation to a training example during generation, since the aim of these models is not to replicate the training data exactly. This variation could take many forms, such as changes in pitch, speed, or melody. Therefore, in this section we evaluate the ability of our methodology to return target songs that have been modified by given perturbations. For varying amounts of each perturbation, we evaluate how frequently the target song (the unmodified clip) is returned as the most similar song, within the top 5 most similar songs, and within the top 10 most similar songs. The types of perturbations we evaluate are:

Pitch shift (in semitones; range: -12 to 12)
Time stretch (in % of song; range: 20% slower to 20% faster)
White noise overlaid on top of music (in dB; range: -30 to 30 dB in relation to original audio clip)
Mash-up of two clips from training data (range: 5/95% to 95/5%; e.g., 50/50%, 60/40%, etc.)
Mash-up of one clip from inside and one outside training data (range: 5/95% to 95/5%; e.g., 50/50%, 60/40%, etc.)
Mash-up of a prompt clip and the generated vamp (range: 5/95% to 95/5%; e.g., 50/50%, 60/40%, etc.)

Click any of the perturbations above to jump to its examples below.
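To make the noise and mash-up perturbations concrete, here is a minimal sketch in plain Python. It is an illustration, not the code used in the paper. It assumes that the dB value describes the noise RMS relative to the clip RMS, and that an a/b mash-up weights the two clips' amplitudes by a and b. The function names `overlay_white_noise` and `mash_up` are hypothetical.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def overlay_white_noise(clip, level_db, seed=0):
    """Add Gaussian white noise to a clip.

    Assumption: level_db is the noise RMS relative to the clip RMS,
    so -30 is barely audible noise and +30 buries the music.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clip]
    # Convert the dB offset to an amplitude ratio, then rescale the noise.
    target_rms = rms(clip) * 10 ** (level_db / 20)
    scale = target_rms / rms(noise)
    return [c + scale * n for c, n in zip(clip, noise)]


def mash_up(clip_a, clip_b, share_a):
    """Blend two clips sample by sample.

    Assumption: a 60/40 mash-up means amplitude weights 0.6 and 0.4.
    The result is truncated to the shorter clip.
    """
    n = min(len(clip_a), len(clip_b))
    return [share_a * clip_a[i] + (1 - share_a) * clip_b[i] for i in range(n)]


# Usage: a 440 Hz tone at 16 kHz, with noise 30 dB below the tone,
# then mixed 60/40 with a 660 Hz tone.
tone_a = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
tone_b = [math.sin(2 * math.pi * 660 * t / 16000) for t in range(1600)]
noisy = overlay_white_noise(tone_a, level_db=-30.0)
mixed = mash_up(tone_a, tone_b, share_a=0.6)
```

Sweeping `level_db` from -30 to 30 or `share_a` from 0.05 to 0.95 would cover the ranges listed above under these assumptions.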

We selected these perturbations because we envision them as common alterations to music that would not render it unrecognizable to a human listener. We do not seek to evaluate all types of adversarial noise, since we assume that users and creators work cooperatively with these generative models to create something novel rather than acting maliciously.
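The top-1, top-5, and top-10 evaluation described at the start of this section can be sketched as a hit-rate computation. This is a minimal illustration, not the paper's implementation. It assumes that each perturbed query yields a list of song IDs ranked from most to least similar, and the function name `top_k_hit_rates` is hypothetical.

```python
def top_k_hit_rates(rankings, targets, ks=(1, 5, 10)):
    """Fraction of queries whose target appears within the top k results.

    rankings[i]: song IDs retrieved for perturbed query i, most similar first.
    targets[i]:  ID of the unmodified clip that query i was derived from.
    """
    hits = {k: 0 for k in ks}
    for ranked, target in zip(rankings, targets):
        for k in ks:
            if target in ranked[:k]:
                hits[k] += 1
    n = len(targets)
    return {k: hits[k] / n for k in ks}


# Usage with three toy queries (IDs are made up for illustration).
rankings = [
    ["song_a", "song_b", "song_c"],  # target ranked first
    ["song_x", "song_y", "song_b"],  # target ranked third
    ["song_q", "song_r", "song_s"],  # target not retrieved
]
targets = ["song_a", "song_b", "song_z"]
rates = top_k_hit_rates(rankings, targets, ks=(1, 3))
```

Computing these rates once per perturbation amount gives one curve per value of k, which shows how quickly retrieval degrades as the perturbation grows.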

Pitch Shift

Pitch shifting is a common audio perturbation that raises or lowers the pitch of a clip without changing its length.
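As a rough guide to what a shift measured in semitones means, the sketch below converts a semitone offset into a frequency ratio. It assumes twelve-tone equal temperament, where each semitone multiplies frequency by 2^(1/12). The helper names are hypothetical and are not taken from the paper.

```python
def semitone_ratio(n_steps):
    """Frequency ratio for a shift of n_steps semitones (equal temperament)."""
    return 2.0 ** (n_steps / 12.0)


def shifted_frequency(freq_hz, n_steps):
    """Frequency of a tone after shifting it by n_steps semitones."""
    return freq_hz * semitone_ratio(n_steps)


# Usage: the extremes of the -12 to 12 range are one octave down and up.
octave_down = shifted_frequency(440.0, -12)
octave_up = shifted_frequency(440.0, 12)
minor_third_up = shifted_frequency(440.0, 3)
```

This only computes the target frequencies. Applying the shift to audio while keeping its duration requires a dedicated routine, for example `librosa.effects.pitch_shift`; the tool the authors used is not stated here.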