Jazz Music Improv with LSTM

Background:

Leveraging a Long Short-Term Memory (LSTM) network to generate novel jazz solos.

The dataset is a corpus of jazz music preprocessed into sequences drawn from a vocabulary of 78 values; each value can be thought of as a musical note.
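For intuition, each value is represented as a one-hot vector over the 78-entry vocabulary; a minimal sketch (the index 42 below is arbitrary):

```python
import numpy as np

n_values = 78                 # size of the musical-value vocabulary
note = np.zeros(n_values)     # one-hot encoding of a single value
note[42] = 1.0                # hypothetical: this note corresponds to index 42
```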

Building and training the model:

The model was designed to learn musical patterns, so the LSTM was given a 64-dimensional hidden state. Input and output layers were then created for the network.
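As a minimal sketch (assuming the Keras 2.x API I used for this project), the shared layer objects might look like this; the names reshapor, LSTM_cell, and densor match those used in the steps below:

```python
from keras.layers import LSTM, Dense, Reshape

n_a = 64        # dimensionality of the LSTM hidden state
n_values = 78   # size of the musical-value vocabulary

# Shared layer objects, created once so their weights are reused at every time step
reshapor = Reshape((1, n_values))               # reshapes one value into (1, n_values)
LSTM_cell = LSTM(n_a, return_state=True)        # one reusable LSTM layer
densor = Dense(n_values, activation='softmax')  # dense + softmax over the 78 values
```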

Next, I created a djmodel() function (sketched after this list), which:

  1. Created an empty output list to collect the LSTM cell's output at every time step.
  2. Looped over the time steps with a for-loop, which included:

    a. Creating a custom Lambda layer that selects the current time step from the input.

    b. Reshaping the x variable using the reshapor layer.

    c. Performing one step of the LSTM_cell.

    d. Applying densor to the hidden-state output of the LSTM_cell. (densor propagates the LSTM output activation through a dense layer followed by a softmax.)
    

    e. Appending the predicted value to the list of outputs.

  3. Created the model instance.
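Putting those steps together, here is a sketch of what djmodel() can look like, reusing the shared layers defined above (this mirrors the list rather than reproducing my exact code):

```python
from keras.models import Model
from keras.layers import Input, Lambda

def djmodel(Tx, n_a, n_values):
    # Inputs: the one-hot sequence X and the initial hidden/cell states
    X = Input(shape=(Tx, n_values))
    a0 = Input(shape=(n_a,))
    c0 = Input(shape=(n_a,))
    a, c = a0, c0

    outputs = []                                      # step 1: empty list of outputs
    for t in range(Tx):                               # step 2: loop over time steps
        x = Lambda(lambda z, t=t: z[:, t, :])(X)      # 2a: Lambda layer selects step t
        x = reshapor(x)                               # 2b: reshape to (1, n_values)
        a, _, c = LSTM_cell(x, initial_state=[a, c])  # 2c: one step of the LSTM_cell
        out = densor(a)                               # 2d: dense + softmax on hidden state
        outputs.append(out)                           # 2e: append the prediction

    return Model(inputs=[X, a0, c0], outputs=outputs)  # step 3: model instance
```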

I defined and compiled the model with the Adam optimizer and categorical cross-entropy loss, then trained it for 100 epochs.
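A sketch of the compile-and-fit step; X and Y here are the one-hot training tensors prepared from the corpus, and Tx = 30, m = 60, and the Adam hyperparameters are illustrative assumptions rather than the project's exact values:

```python
import numpy as np
from keras.optimizers import Adam

model = djmodel(Tx=30, n_a=64, n_values=78)   # Tx = 30 is an assumption

opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.01)  # illustrative hyperparameters
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

m = 60                                        # hypothetical number of training examples
a0 = np.zeros((m, n_a))                       # zero-initialized hidden state
c0 = np.zeros((m, n_a))                       # zero-initialized cell state
model.fit([X, a0, c0], list(Y), epochs=100)   # Y: one target array per time step
```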


Predicting and sampling:

Using the trained model, I synthesized new music with a function that samples at each step: it takes the activation and cell state from the previous LSTM step, forward propagates one step, and produces a new activation and cell state. This new activation is then passed through densor to generate that step's output.

Next, I defined my inference model, which was hard-coded to generate 50 values, then created the zero-valued vectors used to initialize x and the LSTM state variables.
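A sketch of the inference model and its zero-valued initializers, reusing the trained LSTM_cell and densor; the one-hot feedback through a Lambda layer is one way to wire each sampled output back in as the next input:

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, Lambda, RepeatVector
import keras.backend as K

def music_inference_model(LSTM_cell, densor, Ty=50):
    x0 = Input(shape=(1, n_values))
    a0 = Input(shape=(n_a,))
    c0 = Input(shape=(n_a,))
    a, c, x = a0, c0, x0

    outputs = []
    for _ in range(Ty):
        a, _, c = LSTM_cell(x, initial_state=[a, c])  # forward propagate one step
        out = densor(a)                               # new activation -> dense + softmax
        outputs.append(out)
        # Feed the most likely value back in as the next input, re-encoded as one-hot
        x = Lambda(lambda z: K.one_hot(K.argmax(z), n_values))(out)
        x = RepeatVector(1)(x)                        # restore the (1, n_values) shape

    return Model(inputs=[x0, a0, c0], outputs=outputs)

inference_model = music_inference_model(LSTM_cell, densor, Ty=50)
x_initializer = np.zeros((1, 1, n_values))   # zero-valued input vector
a_initializer = np.zeros((1, n_a))           # zero-valued hidden state
c_initializer = np.zeros((1, n_a))           # zero-valued cell state
```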

Then I implemented the predict_and_sample function, which takes these initializers as its arguments (a sketch follows this list). To predict the output corresponding to this input, I:

  1. Used the inference model to predict an output for my set of inputs; the output variable 'pred' is a list of length 50 (one element per generated value) where each element is a numpy array.
  2. Converted 'pred' into a numpy array of indices, each computed by taking the argmax of the corresponding element of the 'pred' list.
  3. Converted the indices into their one-hot vector representations.
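A sketch of predict_and_sample following those three steps (to_categorical handles the one-hot conversion):

```python
import numpy as np
from keras.utils import to_categorical

def predict_and_sample(inference_model, x_init, a_init, c_init):
    # Step 1: predict; 'pred' is a list with one (1, n_values) array per generated step
    pred = inference_model.predict([x_init, a_init, c_init])
    # Step 2: take the argmax of each element to get an array of value indices
    indices = np.argmax(np.array(pred), axis=-1)
    # Step 3: convert the indices into one-hot vectors
    results = to_categorical(indices, num_classes=n_values)
    return results, indices

results, indices = predict_and_sample(inference_model, x_initializer,
                                      a_initializer, c_initializer)
```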


Finally, I generated the music with the generate_music function, saving its result to the variable out_stream.
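Assuming out_stream is a music21 stream, as in the course's helper code, writing it out as a MIDI file looks roughly like this (the output path is hypothetical):

```python
from music21 import midi

# Convert the music21 stream to a MIDI file object and write it to disk
mf = midi.translate.streamToMidiFile(out_stream)
mf.open('output/my_music.midi', 'wb')
mf.write()
mf.close()
```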


The generated music is a MIDI file, so I used an online file converter to convert it to MP3. This file is named my-music.mp3 and can be found in the output folder.

Conclusion:

This code is best run on Google Colab; I used TensorFlow version 1.x.x and Keras version 2.x.x because of some dependency issues. This was a project from one of the courses I took on my journey into Data Science and Machine Learning. The folder that houses all the requirements can be found on my GitHub. I can be contacted through LinkedIn or Twitter. Thank you for reading.