We seek to identify likely musical influences on the output of a generative model for music audio (VampNet). To do this, we use CLMR to embed both the model’s audio output and the songs in its training data into a representation space suited to similarity measurement. In this embedding space, we measure the cosine similarity between the model’s output and each song in the dataset, and return the K nearest neighbors.
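As a rough illustration of the retrieval step described above, here is a minimal sketch of cosine-similarity nearest-neighbor search. It assumes the embeddings are already available as NumPy arrays; the function name `top_k_neighbors`, the array shapes, and the random toy data are hypothetical stand-ins, not the code or the CLMR embeddings used for this site.

```python
import numpy as np

def top_k_neighbors(query, library, k=5):
    """Return indices and cosine similarities of the k rows of `library` closest to `query`."""
    # Scale everything to unit length so a dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    lib = library / np.linalg.norm(library, axis=1, keepdims=True)
    sims = lib @ q
    # argsort is ascending, so reverse it and keep the first k entries.
    order = np.argsort(sims)[::-1][:k]
    return order, sims[order]

# Toy data standing in for real embeddings (assumption: 100 songs, 16 dimensions).
rng = np.random.default_rng(0)
library = rng.normal(size=(100, 16))
# A query that is a slightly perturbed copy of song 42.
query = library[42] + 0.01 * rng.normal(size=16)

idx, sims = top_k_neighbors(query, library, k=5)
print(idx, sims)
```

In this toy setup the perturbed copy of song 42 should retrieve song 42 as its nearest neighbor, which is a quick sanity check that the similarity ranking behaves as expected.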
This site contains examples of this approach. All examples are randomly chosen, not cherry-picked. On this main page, we walk through a single illustrative example.
NOTE: To explore beyond the main example that follows, click on one of these hyperlinks.
More VampNet Generations
More Songs from the Training Data
Comparing Embeddings: VampNet vs CLMR
VampNet generates audio (a.k.a. a ‘vamp’) in response to a prompt consisting of example audio. We randomly selected a track from our training data, Bossa Nova Party (a jazz song from 2018), and used it to prompt VampNet.
1_Bossa Nova Party.mp3
Bossa Nova Party
Bossa Nova Jazz
Coffee & Jazz
2_vamp_Bossa Nova Party.wav
For some context, the average cosine similarity between this prompt song and a random subset of 1000 songs in the training data is 0.169, and the average cosine similarity between the VampNet-generated vamp and the same subset is 0.165. So, relative to randomly selected training data, the generation and the prompt behave similarly; both averages are lower than the similarities of the most-similar training tracks, as we’ll see below.
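The baseline above can be sketched as follows. This is only an illustration under stated assumptions: the helper `mean_cosine_similarity`, the embedding dimension, and the random toy arrays are hypothetical, so the numbers it prints will not match the 0.169 and 0.165 reported here. The one detail taken from the text is that the prompt and the vamp are compared against the same random subset of 1000 songs.

```python
import numpy as np

def mean_cosine_similarity(query, references):
    """Average cosine similarity between one vector and each row of `references`."""
    q = query / np.linalg.norm(query)
    refs = references / np.linalg.norm(references, axis=1, keepdims=True)
    return float(np.mean(refs @ q))

# Toy stand-ins for real embeddings (assumption: 5000 training songs, 16 dimensions).
rng = np.random.default_rng(0)
training = rng.normal(size=(5000, 16))
prompt = rng.normal(size=16)
vamp = rng.normal(size=16)

# Draw the random subset once, without replacement, and reuse it for both
# queries so the two averages are directly comparable.
subset_rows = rng.choice(len(training), size=1000, replace=False)
subset = training[subset_rows]

prompt_baseline = mean_cosine_similarity(prompt, subset)
vamp_baseline = mean_cosine_similarity(vamp, subset)
print(prompt_baseline, vamp_baseline)
```

Sampling the subset once and reusing it keeps any difference between the two averages attributable to the queries rather than to the random draw.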
Now we’ll show you the 5 most similar tracks in our training data for both (1) the prompt song and (2) the generated vamp. Notice that the two lists are distinctly different.