We seek to identify likely musical influences on the output of VampNet, a generative model for music audio. To do so, we embed the generated audio with CLMR and CLAP, which give representations suited to similarity measurement. In each embedding space, we measure the cosine similarity between the model’s output and the songs in the dataset, and return the K nearest neighbors.
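The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for this site: it assumes the embeddings are already available as NumPy arrays, and the function name `top_k_neighbors` and the random toy vectors are hypothetical stand-ins for real CLMR or CLAP embeddings.

```python
import numpy as np

def top_k_neighbors(query, dataset, k=5):
    """Return indices and cosine similarities of the k dataset rows closest to query."""
    # Normalize to unit length so that a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    d = dataset / np.linalg.norm(dataset, axis=1, keepdims=True)
    sims = d @ q
    # Sort by decreasing similarity and keep the k largest.
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Toy stand-ins for real embeddings (assumption: random vectors, not model output).
rng = np.random.default_rng(0)
dataset_embeddings = rng.normal(size=(100, 16))
# A "generated" embedding that is a slightly perturbed copy of dataset row 42.
generated_embedding = dataset_embeddings[42] + 0.01 * rng.normal(size=16)

indices, similarities = top_k_neighbors(generated_embedding, dataset_embeddings, k=3)
```

In this toy setup the nearest neighbor is the row the query was derived from, which is the behavior one would hope for when a generation closely follows a song in the dataset.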
This site contains examples of this approach. All examples are randomly chosen, not cherry-picked. On this main page, we walk through a single illustrative example using CLMR embeddings.
NOTE: To explore beyond the main example that follows, click on one of these hyperlinks.
Subjective Listening Evaluation Examples
VampNet generates audio (a.k.a. a ‘vamp’) in response to a prompt consisting of example audio. We randomly selected a track from our training data: Bossa Nova Party, a jazz song from 2018, and prompted VampNet with this song.
🎵 Song: Bossa Nova Party
♯♭♮ Genre: Bossa Nova Jazz
💿 Album: Coffee & Jazz
🗓️ Year: 2018
For context, the average cosine similarity between the prompt song and a random subset of 1,000 songs in the training data is 0.169, and the average cosine similarity between the VampNet-generated audio and the same subset is 0.165. Measured against randomly selected training data, the prompt and the generation therefore behave very similarly. Both averages are lower than the similarities of the most-similar training songs, as we’ll see below.
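A baseline of this kind could be computed roughly as follows. This is a hedged sketch under the same assumptions as before: embeddings are NumPy arrays, the helper name `mean_cosine_similarity` is hypothetical, and the random vectors are placeholders, so the printed values will not match the figures quoted above.

```python
import numpy as np

def mean_cosine_similarity(query, references):
    """Average cosine similarity between one query vector and each row of references."""
    sims = references @ query / (
        np.linalg.norm(references, axis=1) * np.linalg.norm(query)
    )
    return float(sims.mean())

# Placeholder embeddings (assumption: random vectors, not real CLMR output).
rng = np.random.default_rng(0)
training_embeddings = rng.normal(size=(5000, 16))
prompt_embedding = rng.normal(size=16)
generated_embedding = rng.normal(size=16)

# Draw one random subset of 1,000 distinct songs and reuse it for both queries,
# so that the two averages are directly comparable.
subset_idx = rng.choice(len(training_embeddings), size=1000, replace=False)
subset = training_embeddings[subset_idx]

prompt_baseline = mean_cosine_similarity(prompt_embedding, subset)
generated_baseline = mean_cosine_similarity(generated_embedding, subset)
print(f"prompt: {prompt_baseline:.3f}, generation: {generated_baseline:.3f}")
```

Reusing the same subset for both queries matters: with different random subsets, a small gap between the two averages could come from sampling rather than from the audio itself.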