
Commit d4906d3

Merge branch 'master' of github.com:source-separation/tutorial
2 parents c294a0f + 9ac83ef

8 files changed: +52 -26 lines changed

book/approaches/deep/architectures.md

Lines changed: 1 addition & 1 deletion

@@ -98,7 +98,7 @@ Image used courtesy of Fabian-Robert Stöter (<a href="https://github.com/sigsep
 Open-Unmix is a more recent neural network architecture that boasts impressive
 performance. Open-Unmix has one fully connected layer with batch norm and a `tanh`
 activation, followed a set of three BLSTM layers in the center, and then two
-more fully connected layers with batch norma and `ReLU` activations. The pytorch
+more fully connected layers with batch norm and `ReLU` activations. The pytorch
 implementation has a dropout applied to the first two BLSTM layers with a
 zeroing probability of 40%.
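The layer stack this hunk describes can be sketched in PyTorch. This is a minimal illustration of the description above, not the official Open-Unmix implementation (the real model adds input/output scaling and a skip connection); `n_bins` and `hidden` are illustrative sizes:

```python
import torch
import torch.nn as nn

class OpenUnmixSketch(nn.Module):
    """Sketch of the Open-Unmix-style layer stack described above."""

    def __init__(self, n_bins=2049, hidden=512):
        super().__init__()
        # One fully connected layer with batch norm and a tanh activation.
        self.fc1 = nn.Linear(n_bins, hidden, bias=False)
        self.bn1 = nn.BatchNorm1d(hidden)
        # Three BLSTM layers in the center; nn.LSTM's dropout is applied to
        # the outputs of the first two layers (zeroing probability 40%).
        self.blstm = nn.LSTM(hidden, hidden // 2, num_layers=3,
                             bidirectional=True, batch_first=True, dropout=0.4)
        # Two more fully connected layers with batch norm and ReLU activations.
        self.fc2 = nn.Linear(hidden, hidden, bias=False)
        self.bn2 = nn.BatchNorm1d(hidden)
        self.fc3 = nn.Linear(hidden, n_bins, bias=False)
        self.bn3 = nn.BatchNorm1d(n_bins)

    def forward(self, x):
        # x: (batch, frames, n_bins) magnitude spectrogram.
        b, t, f = x.shape
        h = torch.tanh(self.bn1(self.fc1(x.reshape(b * t, f)))).reshape(b, t, -1)
        h, _ = self.blstm(h)
        h = torch.relu(self.bn2(self.fc2(h.reshape(b * t, -1))))
        h = torch.relu(self.bn3(self.fc3(h)))
        return h.reshape(b, t, f)
```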

book/basics/evaluation.ipynb

Lines changed: 3 additions & 10 deletions

@@ -133,14 +133,14 @@
 "inflated.\n",
 "\n",
 "Scale-Invariant Source-to-Distortion Ratio (SI-SDR) aims to remedy this\n",
-"by removing SDR's dependency on the amplidute scaling of the signal.\n",
+"by removing SDR's dependency on the amplitude scaling of the signal.\n",
 "{cite}`le2019sdr` It also comes with accompanying SI-SAR, and SI-SIR,\n",
 "which corresponds to SAR and SIR described above, respectively.\n",
 "Although these measures are not sensitive to amplitude scaling, it\n",
 "is a quicker computation because it does not require windowing\n",
-"the estimatd and ground truth signals like SDR.\n",
+"the estimated and ground truth signals like SDR.\n",
 "\n",
-"In {numref}`sdr_vs_sisdr`, the discrepency between SDR and SI-SDR\n",
+"In {numref}`sdr_vs_sisdr`, the discrepancy between SDR and SI-SDR\n",
 "scores is shown. The top spectrogram shows the ground truth signal.\n",
 "Above it are its scores for SDR, SNR, and SI-SDR. As expected the\n",
 "ground truth signal gets high values for SDR, SNR, and SI-SDR\n",
@@ -234,13 +234,6 @@
 "and takes a few days to get the results. Calculating SDR values on\n",
 "the other hand is virtually free and takes a few hours at most."
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
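The SI-SDR measure discussed in this hunk can be sketched in a few lines of NumPy: project the estimate onto the reference to find the optimal scale, then compute an SDR that ignores that scaling. A minimal sketch (the function name and the mean-removal step are our own; see {cite}`le2019sdr` for the full definition):

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-Invariant SDR in dB; inputs are 1-D time-domain signals."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference toward the estimate.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2))
```

Because `alpha` absorbs any gain applied to the estimate, `si_sdr(c * est, ref)` returns the same value for any nonzero `c`, which is exactly the amplitude-invariance property described above.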

book/data/introduction.md

Lines changed: 1 addition & 1 deletion

@@ -81,7 +81,7 @@ that is representative of the type of data you plan to apply your model to once
 ## Data for source separation is hard to obtain

 Due to copyright, it is hard to obtain and share music recordings for machine learning purposes. It is even harder to obtain
-multi-track recordings that include the isolated stems, as these are rarely made available by artists. Fortuntaely, the research
+multi-track recordings that include the isolated stems, as these are rarely made available by artists. Fortunately, the research
 community has nonetheless been able to create and share multi-track datasets, as we shall see late. The size of these datasets
 is typically very small compared to other machine learning datasets. Luckily for us, we have tools to generate multiple,
 different mixtures from the same set of stems, helping us to maximize what our model can learn from a given set of stems.

book/data/musdb18.ipynb

Lines changed: 3 additions & 5 deletions

@@ -13,7 +13,7 @@
 "metadata": {},
 "source": [
 "## Overview\n",
-"The information in this sub-section is based on the [MUSB18 dataset page](https://sigsep.github.io/datasets/musdb.html). Here we have edited down the content to focus on the details relevant to this tutorial while keeping it concise. For more details about the datataset please consult the dataset page.\n",
+"The information in this sub-section is based on the [MUSB18 dataset page](https://sigsep.github.io/datasets/musdb.html). {cite}`musdb18,musdb18-hq` Here we have edited down the content to focus on the details relevant to this tutorial while keeping it concise. For more details about the datataset please consult the dataset page.\n",
 "\n",
 "MUSDB18 is a dataset of 150 full length music tracks (~10h total duration) of varying genres. For each track it provides:\n",
 "* The mixture \n",
@@ -433,9 +433,7 @@
 {
 "cell_type": "code",
 "execution_count": 11,
-"metadata": {
-"scrolled": false
-},
+"metadata": {},
 "outputs": [
 {
 "data": {
@@ -516,7 +514,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.9"
+"version": "3.8.5"
 }
 },
 "nbformat": 4,

book/first_steps/nussl_intro.ipynb

Lines changed: 4 additions & 4 deletions

@@ -18,12 +18,12 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In this section, we will explore many source separation approaches through `nussl`, which is an open source python project featuring implementations of many methods.\n",
+"In this section, we will explore many source separation approaches through `nussl`, which is an open source python project featuring implementations of many methods. {cite}`nussl`\n",
 "\n",
 "\n",
 "## Why nussl?\n",
 "\n",
-"As we saw in the {ref}`opensrcmap` section, there are _a lot_ of open source projects for source separation. We certainly don't want do disuade you from using those projects, because they contain a ton of amazing work. But why aren't we teaching each of those repositories? Why are we only teaching `nussl` in this tutorial?\n",
+"As we saw in the {ref}`opensrcmap` section, there are _a lot_ of open source projects for source separation. We certainly don't want do dissuade you from using those projects, because they contain a ton of amazing work. But why aren't we teaching each of those repositories? Why are we only teaching `nussl` in this tutorial?\n",
 "\n",
 "* **nussl contains over a dozen source separation algorithms**:\n",
 " * nussl has ready-to-go implementations of classic and modern source separation algorithms. Learning nussl will give you access to all of them. In contrast, most of the open source projects for source separation only contain _one_ type of algorithm.\n",
@@ -437,7 +437,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"If we hadn’t set `overwrite=True` then `to_mono()` would just return a new audio signal that is an exact copy of signal1 except it is mono. You will see this pattern come up again. In certain places, `AudioSignal`’s default behavior is to overwrite its internal data, and in other places the default is to not overwrite data. See the reference pages for more info. Let’s try:"
+"If we hadn’t set `overwrite=True` then `to_mono()` would just return a new audio signal that is an exact copy of `signal1` except it is mono. You will see this pattern come up again. In certain places, `AudioSignal`’s default behavior is to overwrite its internal data, and in other places the default is to not overwrite data. See the reference pages for more info. Let’s try:"
 ]
 },
 {
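The `overwrite` pattern that cell describes can be illustrated with a toy class. This is a sketch of the design pattern only, not nussl's actual implementation (`Signal` and its channel layout are invented for the example):

```python
import copy

class Signal:
    """Toy stand-in for an audio signal holding a list of channels."""

    def __init__(self, data):
        self.data = data  # list of channels, each a list of samples

    def to_mono(self, overwrite=False):
        # Average samples across channels.
        mono = [sum(samples) / len(self.data) for samples in zip(*self.data)]
        if overwrite:
            # Mutate this object's internal data in place.
            self.data = [mono]
            return self
        # Default: leave this object untouched and return a mono copy.
        new = copy.deepcopy(self)
        new.data = [mono]
        return new
```

The design choice is the same one described above: the caller opts into mutation with `overwrite=True`; otherwise the original object is preserved.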
@@ -537,7 +537,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"No exceptions this time! Great! signal3 is now a new AudioSignal object. We can similarly subtract two signals.\n",
+"No exceptions this time! Great! `signal3` is now a new AudioSignal object. We can similarly subtract two signals.\n",
 "\n",
 "Let's write this to a file:"
 ]

book/first_steps/repetition.ipynb

Lines changed: 3 additions & 3 deletions

@@ -31,7 +31,7 @@
 "\n",
 "### REPET Overview\n",
 "\n",
-"The first algorithm we will explore here is called the REpeating Patern Extraction Technique or REPET {cite}`rafii2012repeating`. REPET works like this:\n",
+"The first algorithm we will explore here is called the REpeating Pattern Extraction Technique or REPET {cite}`rafii2012repeating`. REPET works like this:\n",
 "\n",
 " 1. Find a repeating period, $t_r$ seconds (_e.g._, the number of seconds which a chord progression might start over).\n",
 " 2. Segment the spectrogram into $N$ segments, each with $t_r$ seconds in length.\n",
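The segment-and-median core of REPET can be sketched with NumPy. This is a toy version under simplifying assumptions: the repeating period is given in frames rather than estimated, and the median-model and soft-mask details are condensed from {cite}`rafii2012repeating`; names and the small mask floor are our own:

```python
import numpy as np

def repet_background_mask(V, period):
    """V: magnitude spectrogram (n_bins, n_frames); period: repeating period in frames."""
    n_bins, n_frames = V.shape
    n_seg = n_frames // period
    V = V[:, :n_seg * period]               # trim to a whole number of segments
    segments = V.reshape(n_bins, n_seg, period)
    W = np.median(segments, axis=1)         # element-wise median = repeating model
    W = np.minimum(np.tile(W, n_seg), V)    # the model can't exceed the mixture
    return W / (V + 1e-8)                   # soft mask for the repeating background
```

On a perfectly repeating mixture the model matches the mixture and the mask is (almost exactly) 1 everywhere.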
@@ -662,7 +662,7 @@
 "\n",
 "\n",
 "**Ask yourself:**\n",
-"How do these numbers fit with how you percieved the output quality of our REPET model? Do you feel that the REPET model did a good job separating the singer from everything else in the mixture?"
+"How do these numbers fit with how you perceived the output quality of our REPET model? Do you feel that the REPET model did a good job separating the singer from everything else in the mixture?"
 ]
 },
 {
@@ -716,7 +716,7 @@
 "\n",
 "Now let's look at a few other algorithms that leverage repetition in a musical recording and compare results to REPET.\n",
 "\n",
-"REPET-SIM {cite}`rafii2012music` is a variant of REPET that doesn't rely on a fixed repeating period. In fact, it doesn't rely on repetition as explicitly as REPET does. REPET-SIM calculates a similarity matrix between each pair of spectral frames in an STFT, selects the $k$ nearest nieghbors for each frame, and makes a mask by median filtering the bins for each of the selected neighbors. \n",
+"REPET-SIM {cite}`rafii2012music` is a variant of REPET that doesn't rely on a fixed repeating period. In fact, it doesn't rely on repetition as explicitly as REPET does. REPET-SIM calculates a similarity matrix between each pair of spectral frames in an STFT, selects the $k$ nearest neighbors for each frame, and makes a mask by median filtering the bins for each of the selected neighbors. \n",
 "\n",
 "We can run REPET-SIM the same way we can run REPET:"
 ]
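The REPET-SIM procedure just described (similarity matrix, $k$ nearest neighbors per frame, median filter over neighbors) can be sketched in NumPy. A toy version: cosine similarity and the exact mask construction are simplifications relative to {cite}`rafii2012music`:

```python
import numpy as np

def repet_sim_mask(V, k=5):
    """V: magnitude spectrogram (n_bins, n_frames); k: neighbors per frame."""
    # Similarity matrix between every pair of spectral frames (cosine).
    Vn = V / (np.linalg.norm(V, axis=0, keepdims=True) + 1e-8)
    S = Vn.T @ Vn                            # (n_frames, n_frames)
    # k most similar frames for each frame (includes the frame itself).
    neighbors = np.argsort(-S, axis=1)[:, :k]
    # Median filter each frame's bins across its selected neighbors.
    W = np.median(V[:, neighbors], axis=2)   # (n_bins, n_frames)
    W = np.minimum(W, V)
    return W / (V + 1e-8)
```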

book/references.bib

Lines changed: 35 additions & 0 deletions

@@ -730,5 +730,40 @@ @article{spleeter2020
 note = {Deezer Research}
 }

+@misc{musdb18,
+  author = {Rafii, Zafar and
+            Liutkus, Antoine and
+            St{\"o}ter, Fabian-Robert and
+            Mimilakis, Stylianos Ioannis and
+            Bittner, Rachel},
+  title = {The {MUSDB18} corpus for music separation},
+  month = dec,
+  year = 2017,
+  doi = {10.5281/zenodo.1117372},
+  url = {https://doi.org/10.5281/zenodo.1117372}
+}
+
+@misc{musdb18-hq,
+  author = {Rafii, Zafar and
+            Liutkus, Antoine and
+            St{\"o}ter, Fabian-Robert and
+            Mimilakis, Stylianos Ioannis and
+            Bittner, Rachel},
+  title = {{MUSDB18-HQ} - an uncompressed version of {MUSDB18}},
+  month = aug,
+  year = 2019,
+  doi = {10.5281/zenodo.3338373},
+  url = {https://doi.org/10.5281/zenodo.3338373}
+}
+
+@inproceedings{nussl,
+  author = {Manilow, Ethan and Seetharaman, Prem and Pardo, Bryan},
+  title = {The Northwestern University Source Separation Library},
+  booktitle = {Proceedings of the 19th International Society for Music Information Retrieval Conference ({ISMIR} 2018), Paris, France, September 23-27},
+  year = 2018
+}
book/training/building_blocks.ipynb

Lines changed: 2 additions & 2 deletions

@@ -46,7 +46,7 @@
 "scene, one obvious thing to do was to create a deep network that would predict\n",
 "the masks directly.\n",
 "\n",
-"```{figure} ../../images/deep_approaches/mask_inf.png\n",
+"```{figure} ../images/deep_approaches/mask_inf.png\n",
 "---\n",
 "alt: Diagram of the Mask Inference architecture.\n",
 "name: mask_inf\n",
@@ -1155,7 +1155,7 @@
 "every time-frequency point to a D-dimensional unit-normalized embedding, and then use K-means\n",
 "clustering to extract the actual sources. \n",
 "\n",
-"```{figure} ../../images/deep_approaches/deep_clustering.png\n",
+"```{figure} ../images/deep_approaches/deep_clustering.png\n",
 "---\n",
 "alt: Diagram of the Deep Clustering architecture.\n",
 "name: deep_clustering\n",
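The deep clustering recipe in that hunk (map each time-frequency point to a unit-normalized embedding, then K-means) can be sketched with NumPy and a tiny hand-rolled K-means. Everything here is illustrative: a real system would obtain `emb` from the trained network, and would typically use a library K-means:

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Minimal K-means: X is (n_points, D); returns a label per point."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

def masks_from_embeddings(emb, n_src):
    """emb: (n_bins, n_frames, D) embeddings; returns one binary mask per source."""
    n_bins, n_frames, d = emb.shape
    labels = kmeans(emb.reshape(-1, d), n_src)
    return [(labels == j).reshape(n_bins, n_frames) for j in range(n_src)]
```

Each binary mask selects the time-frequency points whose embeddings fell into one cluster, i.e. one estimated source.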
