
Commit 4c9da56

Working model and tensorflow
0 parents  commit 4c9da56

19 files changed: +3007 -0 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
.idea/
venv/
bazel-0.19.2-installer-linux-x86_64.sh

README.md

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
# Music Genre Classification with Deep Learning

## Abstract

In this project we adapt the model from [Choi et al.](https://github.com/keunwoochoi/music-auto_tagging-keras) to train a custom music genre classification system with our own genres and data. The model takes as input the spectrogram of a music frame and analyzes the image using a Convolutional Neural Network (CNN) followed by a Recurrent Neural Network (RNN). The output of the system is a vector of predicted genres for the song.

We fine-tuned their model on a small dataset (30 songs per genre) and tested it on the GTZAN dataset, achieving a final accuracy of 80%.
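For orientation, below is a minimal sketch of a CRNN of this kind in Keras 1.x with the Theano backend ('th' dimension ordering). The filter counts, pooling sizes, and GRU widths are illustrative assumptions, not the authors' exact configuration; only the (1, 96, 1366) input shape is taken from this repository's audio_processor.py.

```
# Sketch only: an assumed CRNN configuration in the spirit of Choi et al.
from keras.models import Sequential
from keras.layers import (Convolution2D, MaxPooling2D, BatchNormalization,
                          Activation, Permute, Reshape, GRU, Dense)

model = Sequential()
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(1, 96, 1366)))
model.add(BatchNormalization(axis=1))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))       # -> (64, 48, 683)

# Three more conv blocks collapse the mel axis and shorten the time axis.
for n_filters, pool in [(128, (3, 3)), (128, (4, 4)), (128, (4, 4))]:
    model.add(Convolution2D(n_filters, 3, 3, border_mode='same'))
    model.add(BatchNormalization(axis=1))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=pool))     # -> (128, 1, 14) after the last block

model.add(Permute((3, 1, 2)))                   # -> (14, 128, 1): time steps first
model.add(Reshape((14, 128)))                   # 14-step sequence of 128-d vectors
model.add(GRU(32, return_sequences=True))       # the RNN part reads the sequence
model.add(GRU(32))
model.add(Dense(10, activation='sigmoid'))      # one score per genre
model.compile(optimizer='adam', loss='binary_crossentropy')
```
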
## Slides and Report

- [Slides](https://github.com/jsalbert/music-genre-classification/blob/master/Slides.pdf)
- [Report](https://github.com/jsalbert/music-genre-classification/blob/master/Music_genre_recognition.pdf)

## Code

In this repository we provide scripts to fine-tune the pre-trained model and to run a quick music genre prediction using our own weights.

Currently the supported genres are the [GTZAN dataset](http://marsyasweb.appspot.com/download/data_sets/) tags (a short score-to-tag sketch follows this list):
- Blues
- Classical
- Country
- Disco
- HipHop
- Jazz
- Metal
- Pop
- Reggae
- Rock
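
A hypothetical sketch of pairing one output score vector of the network with these tags (the label order here is an assumption for illustration):

```
# Hypothetical: map one output vector of the network to the GTZAN tags above.
GENRES = ['blues', 'classical', 'country', 'disco', 'hiphop',
          'jazz', 'metal', 'pop', 'reggae', 'rock']

def top_genres(scores, k=3):
    """Return the k highest-scoring (genre, score) pairs."""
    return sorted(zip(GENRES, scores), key=lambda p: p[1], reverse=True)[:k]

print(top_genres([0.02, 0.01, 0.05, 0.1, 0.6, 0.03, 0.02, 0.4, 0.3, 0.05]))
# [('hiphop', 0.6), ('pop', 0.4), ('reggae', 0.3)]
```
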
### Prerequisites

We used Keras running on top of Theano to perform the experiments. This was done before Keras 2.0, so we are not sure whether it will work with the new version. It should work on both CPU and GPU. (Keras selects its backend through `~/.keras/keras.json` or the `KERAS_BACKEND` environment variable.)

- Have [pip](https://pip.pypa.io/en/stable/installing/) installed
- Suggested: install [virtualenv](https://virtualenv.pypa.io/en/stable/)

The necessary Python packages are specified in *requirements.txt*. To set up the environment, run:
```
# Create environment
virtualenv env_song
# Activate environment
source env_song/bin/activate
# Install dependencies
pip install -r requirements.txt
```

### Example Code

Fill the *music* folder with songs and add their paths (e.g. `music/example.mp3`) to *list_example.txt*. Then run:

```
python quick_test.py
```

## Results
### Sea of Dreams - Oberhofer

[![Sea of Dreams - Oberhofer](https://github.com/jsalbert/Music-Genre-Classification-with-Deep-Learning/blob/master/figs/sea.png?raw=true)](https://www.youtube.com/watch?v=mIDWsTwstgs)
![fig_sea](https://github.com/jsalbert/Music-Genre-Classification-with-Deep-Learning/blob/master/figs/seaofdreams.png?raw=true)
![Results](https://github.com/jsalbert/Music-Genre-Classification-with-Deep-Learning/blob/master/figs/output.png?raw=true)

### Sky Full of Stars - Coldplay

[![Sky Full of Stars - Coldplay](https://github.com/jsalbert/Music-Genre-Classification-with-Deep-Learning/blob/master/figs/sky.png?raw=true)](https://www.youtube.com/watch?v=zp7NtW_hKJI)
![fig_sky](https://github.com/jsalbert/Music-Genre-Classification-with-Deep-Learning/blob/master/figs/skyfullofstars.png?raw=true)

Slides.pdf

1.57 MB (binary file not shown)

audio_processor.py

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
import librosa
import numpy as np


def compute_melgram(audio_path):
    '''Compute a mel-spectrogram and return it with shape (1, 1, 96, 1366),
    where 96 == #mel-bins and 1366 == #time frames.

    parameters
    ----------
    audio_path: path to the audio file.
        Any format supported by audioread will work.
        More info: http://librosa.github.io/librosa/generated/librosa.core.load.html#librosa.core.load
    '''

    # mel-spectrogram parameters
    SR = 12000
    N_FFT = 512
    N_MELS = 96
    HOP_LEN = 256
    DURA = 29.12  # 29.12 s * 12000 Hz / 256 hop + 1 ~= 1366 frames

    src, sr = librosa.load(audio_path, sr=SR)  # whole signal
    n_sample = src.shape[0]
    n_sample_fit = int(DURA * SR)

    if n_sample < n_sample_fit:  # too short: zero-pad at the end
        src = np.hstack((src, np.zeros((n_sample_fit - n_sample,))))
    elif n_sample > n_sample_fit:  # too long: keep the central DURA seconds
        src = src[(n_sample - n_sample_fit) // 2:(n_sample + n_sample_fit) // 2]

    # note: logamplitude/ref_power is the pre-0.6 librosa API
    logam = librosa.logamplitude
    melgram = librosa.feature.melspectrogram
    ret = logam(melgram(y=src, sr=SR, hop_length=HOP_LEN,
                        n_fft=N_FFT, n_mels=N_MELS)**2,
                ref_power=1.0)
    ret = ret[np.newaxis, np.newaxis, :]
    return ret


def compute_melgram_multiframe(audio_path, all_song=True):
    '''Compute mel-spectrograms over multiple frames of the song and return
    them with shape (N, 1, 96, 1366), where 96 == #mel-bins,
    1366 == #time frames and N == #frames.

    parameters
    ----------
    audio_path: path to the audio file.
        Any format supported by audioread will work.
        More info: http://librosa.github.io/librosa/generated/librosa.core.load.html#librosa.core.load
    '''

    # mel-spectrogram parameters
    SR = 12000
    N_FFT = 512
    N_MELS = 96
    HOP_LEN = 256
    DURA = 29.12  # to make it 1366 frames
    DURA_TRASH = 0 if all_song else 20  # seconds to discard at each end

    src, sr = librosa.load(audio_path, sr=SR)  # whole signal
    n_sample = src.shape[0]
    n_sample_fit = int(DURA * SR)
    n_sample_trash = int(DURA_TRASH * SR)

    # remove the trash at the beginning and at the end
    src = src[n_sample_trash:(n_sample - n_sample_trash)]
    n_sample = n_sample - 2 * n_sample_trash

    ret = np.zeros((0, 1, 96, 1366), dtype=np.float32)

    if n_sample < n_sample_fit:  # too short: zero-pad and return a single frame
        src = np.hstack((src, np.zeros((n_sample_fit - n_sample,))))
        logam = librosa.logamplitude
        melgram = librosa.feature.melspectrogram
        ret = logam(melgram(y=src, sr=SR, hop_length=HOP_LEN,
                            n_fft=N_FFT, n_mels=N_MELS)**2,
                    ref_power=1.0)
        ret = ret[np.newaxis, np.newaxis, :]

    elif n_sample > n_sample_fit:  # too long: one melgram per DURA-second chunk
        N = n_sample // n_sample_fit
        src_total = src

        for i in range(N):
            src = src_total[i * n_sample_fit:(i + 1) * n_sample_fit]

            logam = librosa.logamplitude
            melgram = librosa.feature.melspectrogram
            retI = logam(melgram(y=src, sr=SR, hop_length=HOP_LEN,
                                 n_fft=N_FFT, n_mels=N_MELS)**2,
                         ref_power=1.0)
            retI = retI[np.newaxis, np.newaxis, :]
            ret = np.concatenate((ret, retI), axis=0)

    return ret
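
A minimal usage sketch for the two helpers above (it assumes this file is importable as `audio_processor` and that `music/example.mp3` from *list_example.txt* exists):

```
# Usage sketch: build network-ready 4-D arrays from one song.
from audio_processor import compute_melgram, compute_melgram_multiframe

mel = compute_melgram('music/example.mp3')
print(mel.shape)   # (1, 1, 96, 1366): frame, channel, mel bins, time steps

mels = compute_melgram_multiframe('music/example.mp3', all_song=True)
print(mels.shape)  # (N, 1, 96, 1366): one row per 29.12-second chunk
```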

list_example.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
music/example.mp3
