Subjective Listening Evaluation Examples

Below are a couple of example songs, with their corresponding bins, that we evaluated in our subjective listening experiment for both CLAP and CLMR embeddings.


To explore beyond these VampNet generations, click on one of these hyperlinks:

Exploring musical roots: an audio walkthrough.

Our approach is to automatically measure similarity between the model's output and its training data at the time of generation. Presumably, output that is highly similar to a training audio clip was influenced by that clip. Of course, similarity is in the ear of the listener, and it is easy to create similarity measures that do not align with human opinions. Therefore, we conducted an experiment with human listeners to demonstrate the alignment of our quantitative technique with human listening.

We performed ABX pairwise trials of songs to determine how human judgement aligned with the objective quantitative measure of similarity (cosine similarity). We include below some examples that listeners were shown in pairwise format (they were shown the target song “X” and two randomly selected songs from the four bins, as shown in the screenshot below). More information is available in the paper.
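For readers unfamiliar with the measure, here is a minimal sketch of cosine similarity between two embedding vectors. The function name and the toy 3-dimensional vectors are illustrative assumptions; real CLAP or CLMR embeddings have many more dimensions, and this is not the code used in our experiments.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for audio embeddings (assumed values).
generated = [1.0, 2.0, 0.0]
training_clip = [2.0, 4.0, 0.0]   # points in the same direction as `generated`
unrelated_clip = [0.0, 0.0, 3.0]  # orthogonal to `generated`

print(cosine_similarity(generated, training_clip))
print(cosine_similarity(generated, unrelated_clip))
```

Because the measure depends only on the angle between the vectors, scaling an embedding does not change its score.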

Example subjective listening question:

The anchor clip “X” is the 3-second clip shown at the top, and two songs randomly selected from the other bins serve as the first and second comparison clips. Over the course of their evaluation, each listener was asked 10 times to choose which of the two clips sounded more similar to the top clip “X”.
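The structure of one such question can be sketched as follows. This is a hypothetical illustration, not the survey code: the names `make_abx_trial` and `agrees_with_metric`, the file names, and the assumptions that the two comparison clips come from two distinct bins and that a higher bin index means higher cosine similarity to “X” are all ours.

```python
import random

def make_abx_trial(anchor, bins, rng):
    """Pick two comparison clips for `anchor` from two distinct bins.

    bins: dict mapping a bin index (assumed: higher = more similar to the
    anchor) to a list of clip ids. The order of the two clips is random.
    """
    bin_a, bin_b = rng.sample(sorted(bins), 2)
    return {
        "x": anchor,
        "a": (bin_a, rng.choice(bins[bin_a])),
        "b": (bin_b, rng.choice(bins[bin_b])),
    }

def agrees_with_metric(trial, choice):
    """True if the listener's choice ("a" or "b") came from the higher bin."""
    other = "b" if choice == "a" else "a"
    return trial[choice][0] > trial[other][0]

# Hypothetical clip ids for the four bins of one anchor song.
example_bins = {
    0: ["low_1.wav", "low_2.wav"],
    1: ["midlow_1.wav"],
    2: ["midhigh_1.wav"],
    3: ["high_1.wav", "high_2.wav"],
}

rng = random.Random(0)  # seeded only to make the sketch reproducible
trials = [make_abx_trial("x.wav", example_bins, rng) for _ in range(10)]
```

Counting how often `agrees_with_metric` is true across listeners gives one simple way to summarize alignment between human choices and the similarity score.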

Clips are selected from different bins, as determined by the distribution of similarity scores in the training dataset of 5,000,000+ clips we used. We chose a random subset of 1,000 clips to obtain this distribution, then randomly chose from that subset 15 songs and, for each, its correspondingly similar songs within the four bins. Below are some example songs and bins we used in the listening evaluation.
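As a rough sketch of how scores can be split into four bins, the snippet below assumes the bin edges are the quartiles of the score distribution. That choice is an assumption made for illustration (see the paper for how the bins were actually defined), and the scores here are synthetic, not our data.

```python
import bisect
import random
import statistics

def bin_edges(scores, n_bins=4):
    """Cut points splitting `scores` into `n_bins` roughly equal-sized bins.

    Using quantiles as edges is an assumption of this sketch.
    """
    return statistics.quantiles(scores, n=n_bins)

def assign_bin(score, edges):
    """Bin index for `score`: 0 is least similar, len(edges) is most similar."""
    return bisect.bisect_right(edges, score)

# Synthetic stand-in for the similarity scores of a 1,000-clip random subset.
rng = random.Random(0)
subset_scores = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

edges = bin_edges(subset_scores)
clips_by_bin = {b: [] for b in range(len(edges) + 1)}
for clip_index, score in enumerate(subset_scores):
    clips_by_bin[assign_bin(score, edges)].append(clip_index)
```

With quartile edges, each bin holds about a quarter of the subset, so every bin has clips available to sample for a listening question.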