Exploring musical roots: an audio walkthrough.

We aim to identify likely musical influences on the output of VampNet, a generative model for music audio. To do this, we use CLMR and CLAP to embed the model’s output in a representation suited to similarity measurement. In this embedding space, we compute the cosine similarity between the model’s output and each song in the dataset, and return the K nearest neighbors.
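The retrieval step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for this site: it assumes embeddings have already been extracted (for example with CLMR or CLAP) and stored as rows of an array. The function name `top_k_neighbors`, the toy random embeddings, and the embedding size of 512 are all hypothetical stand-ins.

```python
import numpy as np

def top_k_neighbors(query, dataset, k=5):
    """Return (indices, similarities) of the k dataset rows closest to query.

    Hypothetical helper: `query` is one embedding vector and `dataset`
    is a 2-D array with one song embedding per row.
    """
    query = np.asarray(query, dtype=np.float64)
    dataset = np.asarray(dataset, dtype=np.float64)
    # Scale everything to unit length so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    d = dataset / np.linalg.norm(dataset, axis=1, keepdims=True)
    sims = d @ q
    k = min(k, len(sims))
    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Toy stand-ins for real embeddings (assumption: 100 songs, 512 dimensions).
rng = np.random.default_rng(0)
song_embeddings = rng.normal(size=(100, 512))
# Pretend the generated vamp is a slightly noisy copy of song 42.
vamp_embedding = song_embeddings[42] + 0.1 * rng.normal(size=512)

indices, scores = top_k_neighbors(vamp_embedding, song_embeddings, k=3)
print(indices, scores)
```

For a large dataset, a full sort is more work than needed; `np.argpartition` or an approximate nearest-neighbor index would be a reasonable substitute, at the cost of a little extra code.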

This site contains examples of this approach. All examples are randomly chosen, not cherry-picked. On this main page, we walk through a single illustrative example using CLMR embeddings.

NOTE: To explore beyond the main example that follows, click on one of these hyperlinks.

Example Song Perturbations

Subjective Listening Evaluation Examples

More VampNet Generations

An Illustrative Example: Bossa Nova Party

VampNet generates audio (a.k.a. a ‘vamp’) in response to a prompt consisting of example audio. We randomly selected a track from our training data, Bossa Nova Party (a jazz song from 2018), and prompted VampNet with it.

Prompt Track: Bossa Nova Party

1_Bossa Nova Party.mp3

🎵 Song: Bossa Nova Party

♯♭♮ Genre: Bossa Nova Jazz

💿 Album: Coffee & Jazz

🗓️ Year: 2018

Track Generated by VampNet:

2_vamp_Bossa Nova Party.wav

Cosine similarity of the generated vamp and its prompt: 0.748

For context, the average cosine similarity between this prompt song and a random subset of 1000 songs from the training data is 0.169, and the average cosine similarity between the VampNet-generated vamp and the same subset is 0.165. So, relative to randomly selected training data, the generation and the prompt are fairly similar, though their similarity is lower than that of the most similar training songs, as we’ll see below.
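The baseline comparison above can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the reported numbers: the helper `mean_cosine`, the synthetic `library` of embeddings, and the 64-dimensional vectors are hypothetical, so the values it prints will not match 0.169, 0.165, or 0.748. The point is only the procedure: draw one random subset, then score both the prompt and the vamp against that same subset.

```python
import numpy as np

def mean_cosine(query, subset):
    """Average cosine similarity between one vector and each row of subset."""
    query = np.asarray(query, dtype=np.float64)
    subset = np.atleast_2d(np.asarray(subset, dtype=np.float64))
    q = query / np.linalg.norm(query)
    s = subset / np.linalg.norm(subset, axis=1, keepdims=True)
    return float(np.mean(s @ q))

rng = np.random.default_rng(0)

# Synthetic stand-in for training-set embeddings (assumption: 5000 songs).
library = rng.normal(size=(5000, 64))

# Draw the random subset once, without replacement, and reuse it for both queries.
subset = library[rng.choice(len(library), size=1000, replace=False)]

# Synthetic prompt and a vamp that is a noisy variant of it.
prompt = rng.normal(size=64)
vamp = prompt + 0.5 * rng.normal(size=64)

baseline_prompt = mean_cosine(prompt, subset)      # prompt vs. random songs
baseline_vamp = mean_cosine(vamp, subset)          # vamp vs. the same songs
pair_similarity = mean_cosine(vamp, prompt[None, :])  # vamp vs. its prompt

print(baseline_prompt, baseline_vamp, pair_similarity)
```

Reusing one subset for both queries keeps the two baselines directly comparable, since any difference between them then comes from the query and not from the sample.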