
Commit d4906d3

Merge branch 'master' of github.com:source-separation/tutorial
2 parents c294a0f + 9ac83ef

8 files changed: +52 -26 lines changed

book/approaches/deep/architectures.md

Lines changed: 1 addition & 1 deletion

@@ -98,7 +98,7 @@ Image used courtesy of Fabian-Robert Stöter (<a href="https://github.com/sigsep
 Open-Unmix is a more recent neural network architecture that boasts impressive
 performance. Open-Unmix has one fully connected layer with batch norm and a `tanh`
 activation, followed a set of three BLSTM layers in the center, and then two
-more fully connected layers with batch norma and `ReLU` activations. The pytorch
+more fully connected layers with batch norm and `ReLU` activations. The pytorch
 implementation has a dropout applied to the first two BLSTM layers with a
 zeroing probability of 40%.
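The layer stack this hunk describes can be sketched in PyTorch. This is a minimal illustration of the description above, not the official Open-Unmix implementation (the real model adds input/output scaling and a skip connection); `n_bins` and `hidden` are illustrative sizes:

```python
import torch
import torch.nn as nn

class OpenUnmixSketch(nn.Module):
    """Sketch of the Open-Unmix-style layer stack described above."""

    def __init__(self, n_bins=2049, hidden=512):
        super().__init__()
        # One fully connected layer with batch norm and a tanh activation.
        self.fc1 = nn.Linear(n_bins, hidden, bias=False)
        self.bn1 = nn.BatchNorm1d(hidden)
        # Three BLSTM layers in the center; nn.LSTM's dropout is applied to
        # the outputs of the first two layers (zeroing probability 40%).
        self.blstm = nn.LSTM(hidden, hidden // 2, num_layers=3,
                             bidirectional=True, batch_first=True, dropout=0.4)
        # Two more fully connected layers with batch norm and ReLU activations.
        self.fc2 = nn.Linear(hidden, hidden, bias=False)
        self.bn2 = nn.BatchNorm1d(hidden)
        self.fc3 = nn.Linear(hidden, n_bins, bias=False)
        self.bn3 = nn.BatchNorm1d(n_bins)

    def forward(self, x):
        # x: (batch, frames, n_bins) magnitude spectrogram.
        b, t, f = x.shape
        h = torch.tanh(self.bn1(self.fc1(x.reshape(b * t, f)))).reshape(b, t, -1)
        h, _ = self.blstm(h)
        h = torch.relu(self.bn2(self.fc2(h.reshape(b * t, -1))))
        h = torch.relu(self.bn3(self.fc3(h)))
        return h.reshape(b, t, f)
```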

book/basics/evaluation.ipynb

Lines changed: 3 additions & 10 deletions

@@ -133,14 +133,14 @@
 "inflated.\n",
 "\n",
 "Scale-Invariant Source-to-Distortion Ratio (SI-SDR) aims to remedy this\n",
-"by removing SDR's dependency on the amplidute scaling of the signal.\n",
+"by removing SDR's dependency on the amplitude scaling of the signal.\n",
 "{cite}`le2019sdr` It also comes with accompanying SI-SAR, and SI-SIR,\n",
 "which corresponds to SAR and SIR described above, respectively.\n",
 "Although these measures are not sensitive to amplitude scaling, it\n",
 "is a quicker computation because it does not require windowing\n",
-"the estimatd and ground truth signals like SDR.\n",
+"the estimated and ground truth signals like SDR.\n",
 "\n",
-"In {numref}`sdr_vs_sisdr`, the discrepency between SDR and SI-SDR\n",
+"In {numref}`sdr_vs_sisdr`, the discrepancy between SDR and SI-SDR\n",
 "scores is shown. The top spectrogram shows the ground truth signal.\n",
 "Above it are its scores for SDR, SNR, and SI-SDR. As expected the\n",
 "ground truth signal gets high values for SDR, SNR, and SI-SDR\n",
@@ -234,13 +234,6 @@
 "and takes a few days to get the results. Calculating SDR values on\n",
 "the other hand is virtually free and takes a few hours at most."
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
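The SI-SDR measure discussed in this hunk can be sketched in a few lines of NumPy: project the estimate onto the reference to find the optimal scale, then compute an SDR that ignores that scaling. A minimal sketch (the function name and the mean-removal step are our own; see {cite}`le2019sdr` for the full definition):

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-Invariant SDR in dB; inputs are 1-D time-domain signals."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference toward the estimate.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2))
```

Because `alpha` absorbs any gain applied to the estimate, `si_sdr(c * est, ref)` returns the same value for any nonzero `c`, which is exactly the amplitude-invariance property described above.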

book/data/introduction.md

Lines changed: 1 addition & 1 deletion

@@ -81,7 +81,7 @@ that is representative of the type of data you plan to apply your model to once
 ## Data for source separation is hard to obtain

 Due to copyright, it is hard to obtain and share music recordings for machine learning purposes. It is even harder to obtain
-multi-track recordings that include the isolated stems, as these are rarely made available by artists. Fortuntaely, the research
+multi-track recordings that include the isolated stems, as these are rarely made available by artists. Fortunately, the research
 community has nonetheless been able to create and share multi-track datasets, as we shall see late. The size of these datasets
 is typically very small compared to other machine learning datasets. Luckily for us, we have tools to generate multiple,
 different mixtures from the same set of stems, helping us to maximize what our model can learn from a given set of stems.

book/data/musdb18.ipynb

Lines changed: 3 additions & 5 deletions

@@ -13,7 +13,7 @@
 "metadata": {},
 "source": [
 "## Overview\n",
-"The information in this sub-section is based on the [MUSB18 dataset page](https://sigsep.github.io/datasets/musdb.html). Here we have edited down the content to focus on the details relevant to this tutorial while keeping it concise. For more details about the datataset please consult the dataset page.\n",
+"The information in this sub-section is based on the [MUSB18 dataset page](https://sigsep.github.io/datasets/musdb.html). {cite}`musdb18,musdb18-hq` Here we have edited down the content to focus on the details relevant to this tutorial while keeping it concise. For more details about the datataset please consult the dataset page.\n",
 "\n",
 "MUSDB18 is a dataset of 150 full length music tracks (~10h total duration) of varying genres. For each track it provides:\n",
 "* The mixture \n",
@@ -433,9 +433,7 @@
 {
 "cell_type": "code",
 "execution_count": 11,
-"metadata": {
-"scrolled": false
-},
+"metadata": {},
 "outputs": [
 {
 "data": {
@@ -516,7 +514,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.9"
+"version": "3.8.5"
 }
 },
 "nbformat": 4,

book/first_steps/nussl_intro.ipynb

Lines changed: 4 additions & 4 deletions

@@ -18,12 +18,12 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In this section, we will explore many source separation approaches through `nussl`, which is an open source python project featuring implementations of many methods.\n",
+"In this section, we will explore many source separation approaches through `nussl`, which is an open source python project featuring implementations of many methods. {cite}`nussl`\n",
 "\n",
 "\n",
 "## Why nussl?\n",
 "\n",
-"As we saw in the {ref}`opensrcmap` section, there are _a lot_ of open source projects for source separation. We certainly don't want do disuade you from using those projects, because they contain a ton of amazing work. But why aren't we teaching each of those repositories? Why are we only teaching `nussl` in this tutorial?\n",
+"As we saw in the {ref}`opensrcmap` section, there are _a lot_ of open source projects for source separation. We certainly don't want do dissuade you from using those projects, because they contain a ton of amazing work. But why aren't we teaching each of those repositories? Why are we only teaching `nussl` in this tutorial?\n",
 "\n",
 "* **nussl contains over a dozen source separation algorithms**:\n",
 " * nussl has ready-to-go implementations of classic and modern source separation algorithms. Learning nussl will give you access to all of them. In contrast, most of the open source projects for source separation only contain _one_ type of algorithm.\n",
@@ -437,7 +437,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"If we hadn’t set `overwrite=True` then `to_mono()` would just return a new audio signal that is an exact copy of signal1 except it is mono. You will see this pattern come up again. In certain places, `AudioSignal`’s default behavior is to overwrite its internal data, and in other places the default is to not overwrite data. See the reference pages for more info. Let’s try:"
+"If we hadn’t set `overwrite=True` then `to_mono()` would just return a new audio signal that is an exact copy of `signal1` except it is mono. You will see this pattern come up again. In certain places, `AudioSignal`’s default behavior is to overwrite its internal data, and in other places the default is to not overwrite data. See the reference pages for more info. Let’s try:"
 ]
 },
 {
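The `overwrite` pattern that cell describes can be illustrated with a toy class. This is a sketch of the design pattern only, not nussl's actual implementation (`Signal` and its channel layout are invented for the example):

```python
import copy

class Signal:
    """Toy stand-in for an audio signal holding a list of channels."""

    def __init__(self, data):
        self.data = data  # list of channels, each a list of samples

    def to_mono(self, overwrite=False):
        # Average samples across channels.
        mono = [sum(samples) / len(self.data) for samples in zip(*self.data)]
        if overwrite:
            # Mutate this object's internal data in place.
            self.data = [mono]
            return self
        # Default: leave this object untouched and return a mono copy.
        new = copy.deepcopy(self)
        new.data = [mono]
        return new
```

The design choice is the same one described above: the caller opts into mutation with `overwrite=True`; otherwise the original object is preserved.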
@@ -537,7 +537,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"No exceptions this time! Great! signal3 is now a new AudioSignal object. We can similarly subtract two signals.\n",
+"No exceptions this time! Great! `signal3` is now a new AudioSignal object. We can similarly subtract two signals.\n",
 "\n",
 "Let's write this to a file:"
 ]

book/first_steps/repetition.ipynb

Lines changed: 3 additions & 3 deletions

@@ -31,7 +31,7 @@
 "\n",
 "### REPET Overview\n",
 "\n",
-"The first algorithm we will explore here is called the REpeating Patern Extraction Technique or REPET {cite}`rafii2012repeating`. REPET works like this:\n",
+"The first algorithm we will explore here is called the REpeating Pattern Extraction Technique or REPET {cite}`rafii2012repeating`. REPET works like this:\n",
 "\n",
 " 1. Find a repeating period, $t_r$ seconds (_e.g._, the number of seconds which a chord progression might start over).\n",
 " 2. Segment the spectrogram into $N$ segments, each with $t_r$ seconds in length.\n",
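The segment-and-median core of REPET can be sketched with NumPy. This is a toy version under simplifying assumptions: the repeating period is given in frames rather than estimated, and the median-model and soft-mask details are condensed from {cite}`rafii2012repeating`; names and the small mask floor are our own:

```python
import numpy as np

def repet_background_mask(V, period):
    """V: magnitude spectrogram (n_bins, n_frames); period: repeating period in frames."""
    n_bins, n_frames = V.shape
    n_seg = n_frames // period
    V = V[:, :n_seg * period]               # trim to a whole number of segments
    segments = V.reshape(n_bins, n_seg, period)
    W = np.median(segments, axis=1)         # element-wise median = repeating model
    W = np.minimum(np.tile(W, n_seg), V)    # the model can't exceed the mixture
    return W / (V + 1e-8)                   # soft mask for the repeating background
```

On a perfectly repeating mixture the model matches the mixture and the mask is (almost exactly) 1 everywhere.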
@@ -662,7 +662,7 @@
 "\n",
 "\n",
 "**Ask yourself:**\n",
-"How do these numbers fit with how you percieved the output quality of our REPET model? Do you feel that the REPET model did a good job separating the singer from everything else in the mixture?"
+"How do these numbers fit with how you perceived the output quality of our REPET model? Do you feel that the REPET model did a good job separating the singer from everything else in the mixture?"
 ]
 },
 {
@@ -716,7 +716,7 @@
 "\n",
 "Now let's look at a few other algorithms that leverage repetition in a musical recording and compare results to REPET.\n",
 "\n",
-"REPET-SIM {cite}`rafii2012music` is a variant of REPET that doesn't rely on a fixed repeating period. In fact, it doesn't rely on repetition as explicitly as REPET does. REPET-SIM calculates a similarity matrix between each pair of spectral frames in an STFT, selects the $k$ nearest nieghbors for each frame, and makes a mask by median filtering the bins for each of the selected neighbors. \n",
+"REPET-SIM {cite}`rafii2012music` is a variant of REPET that doesn't rely on a fixed repeating period. In fact, it doesn't rely on repetition as explicitly as REPET does. REPET-SIM calculates a similarity matrix between each pair of spectral frames in an STFT, selects the $k$ nearest neighbors for each frame, and makes a mask by median filtering the bins for each of the selected neighbors. \n",
 "\n",
 "We can run REPET-SIM the same way we can run REPET:"
 ]
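The REPET-SIM procedure just described (similarity matrix, $k$ nearest neighbors per frame, median filter over neighbors) can be sketched in NumPy. A toy version: cosine similarity and the exact mask construction are simplifications relative to {cite}`rafii2012music`:

```python
import numpy as np

def repet_sim_mask(V, k=5):
    """V: magnitude spectrogram (n_bins, n_frames); k: neighbors per frame."""
    # Similarity matrix between every pair of spectral frames (cosine).
    Vn = V / (np.linalg.norm(V, axis=0, keepdims=True) + 1e-8)
    S = Vn.T @ Vn                            # (n_frames, n_frames)
    # k most similar frames for each frame (includes the frame itself).
    neighbors = np.argsort(-S, axis=1)[:, :k]
    # Median filter each frame's bins across its selected neighbors.
    W = np.median(V[:, neighbors], axis=2)   # (n_bins, n_frames)
    W = np.minimum(W, V)
    return W / (V + 1e-8)
```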

book/references.bib

Lines changed: 35 additions & 0 deletions

@@ -730,5 +730,40 @@ @article{spleeter2020
 note = {Deezer Research}
 }

+@misc{musdb18,
+  author = {Rafii, Zafar and
+            Liutkus, Antoine and
+            St{\"o}ter, Fabian-Robert and
+            Mimilakis, Stylianos Ioannis and
+            Bittner, Rachel},
+  title = {The {MUSDB18} corpus for music separation},
+  month = dec,
+  year = 2017,
+  doi = {10.5281/zenodo.1117372},
+  url = {https://doi.org/10.5281/zenodo.1117372}
+}
+
+@misc{musdb18-hq,
+  author = {Rafii, Zafar and
+            Liutkus, Antoine and
+            St{\"o}ter, Fabian-Robert and
+            Mimilakis, Stylianos Ioannis and
+            Bittner, Rachel},
+  title = {{MUSDB18-HQ} - an uncompressed version of {MUSDB18}},
+  month = aug,
+  year = 2019,
+  doi = {10.5281/zenodo.3338373},
+  url = {https://doi.org/10.5281/zenodo.3338373}
+}
+
+@inproceedings{nussl,
+  author = {Manilow, Ethan and Seetharaman, Prem and Pardo, Bryan},
+  title = {The Northwestern University Source Separation Library},
+  booktitle = {Proceedings of the 19th International Society for Music Information Retrieval Conference ({ISMIR} 2018), Paris, France, September 23-27},
+  year = 2018
+}
book/training/building_blocks.ipynb

Lines changed: 2 additions & 2 deletions

@@ -46,7 +46,7 @@
 "scene, one obvious thing to do was to create a deep network that would predict\n",
 "the masks directly.\n",
 "\n",
-"```{figure} ../../images/deep_approaches/mask_inf.png\n",
+"```{figure} ../images/deep_approaches/mask_inf.png\n",
 "---\n",
 "alt: Diagram of the Mask Inference architecture.\n",
 "name: mask_inf\n",
@@ -1155,7 +1155,7 @@
 "every time-frequency point to a D-dimensional unit-normalized embedding, and then use K-means\n",
 "clustering to extract the actual sources. \n",
 "\n",
-"```{figure} ../../images/deep_approaches/deep_clustering.png\n",
+"```{figure} ../images/deep_approaches/deep_clustering.png\n",
 "---\n",
 "alt: Diagram of the Deep Clustering architecture.\n",
 "name: deep_clustering\n",
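The deep clustering recipe in that hunk (map each time-frequency point to a unit-normalized embedding, then K-means) can be sketched with NumPy and a tiny hand-rolled K-means. Everything here is illustrative: a real system would obtain `emb` from the trained network, and would typically use a library K-means:

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Minimal K-means: X is (n_points, D); returns a label per point."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

def masks_from_embeddings(emb, n_src):
    """emb: (n_bins, n_frames, D) embeddings; returns one binary mask per source."""
    n_bins, n_frames, d = emb.shape
    labels = kmeans(emb.reshape(-1, d), n_src)
    return [(labels == j).reshape(n_bins, n_frames) for j in range(n_src)]
```

Each binary mask selects the time-frequency points whose embeddings fell into one cluster, i.e. one estimated source.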
