|
212 | 212 | "source": [
|
213 | 213 | "#### Convexity of KL-divergence\n",
|
214 | 214 | "Note that the KL divergence is convex in the space of pairs of probability distributions $(p,q)$, ie:\n",
|
215 |
| - "\\begin{align*}\n", |
216 |
| - " KL[\\lambda p_1 + (1-\\lambda) p_2 \\| \\lambda q_1 + (1-\\lambda) q_2] \\leq \\lambda KL[p_1\\|q_1] + (1-\\lambda) KL[p_2\\|q_2 \\tag{(2.1)}\n", |
| 215 | + "\\begin{align*}\\tag{2.1}\n", |
| 216 | + " KL[\\lambda p_1 + (1-\\lambda) p_2 \\| \\lambda q_1 + (1-\\lambda) q_2] \\leq \\lambda KL[p_1\\|q_1] + (1-\\lambda) KL[p_2\\|q_2]\n", |
217 | 217 | "\\end{align*}\n",
|
218 | 218 | "\n",
|
219 | 219 | "We recall that the KL-divergence between two distributions $(p,q)$ reads\n",
|
|
248 | 248 | " &\\geq a log\\left(\\frac{a}{b}\\right) \\\\\n",
|
249 | 249 | "\\end{align*}\n",
|
250 | 250 | "\n",
|
251 |
| - "Now, using (2.4)\n", |
252 |
| - "\n" |
| 251 | + "Notice that the last line of (2.4) is equivalent to equation (2.3), which we have just proved. We can now return to our original problem, i.e. proving convexity of the KL-divergence. Let's restart from (2.2):\n", |
| 252 | + "\n", |
| 253 | + "\\begin{align*}\n", |
| 254 | + " &KL[\\lambda p_1 + (1-\\lambda) p_2 \\| \\lambda q_1 + (1-\\lambda) q_2] \\\\\n", |
| 255 | + "&= \sum_{i=0}^{N-1} (\lambda p_1(x_i) + (1-\lambda) p_2(x_i))log\left(\frac{\lambda p_1(x_i) + (1-\lambda) p_2(x_i)}{\lambda q_1(x_i) + (1-\lambda) q_2(x_i)}\right) \\\n", |
| 256 | + "&\leq \sum_{i=0}^{N-1} \left[\lambda p_1(x_i)log\left(\frac{\lambda p_1(x_i)}{\lambda q_1(x_i)}\right) + (1-\lambda)p_2(x_i)log\left(\frac{(1-\lambda) p_2(x_i)}{(1-\lambda) q_2(x_i)}\right)\right] \qquad \text{thanks to (2.3)} \\\n", |
| 257 | + "&= \lambda \left[\sum_{i=0}^{N-1} p_1(x_i)log\left(\frac{p_1(x_i)}{q_1(x_i)}\right) \right] +(1-\lambda) \left[\sum_{i=0}^{N-1}p_2(x_i)log\left(\frac{ p_2(x_i)}{q_2(x_i)}\right)\right] \\\n", |
| 258 | + "&= \\lambda KL[p_1 \\| q_1] + (1-\\lambda)KL[p_2 \\| q_2]\n", |
| 259 | + "\\end{align*}\n", |
| 260 | + "\n", |
| 261 | + "The KL-divergence is indeed convex in the space of pairs of distributions with matching support. Note that the matching-support constraint is important; we will come back to it in the section dedicated to the KL-divergence." |
253 | 262 | ]
|
254 | 263 | },
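The convexity inequality (2.1) proved above can be sanity-checked numerically. The sketch below (illustrative code, not part of the notebook; all names are ours) draws random discrete distributions with full support, matching the assumption the proof relies on, and checks the inequality:

```python
# Numerical sanity check of inequality (2.1):
# KL[λp1+(1-λ)p2 || λq1+(1-λ)q2] <= λ KL[p1||q1] + (1-λ) KL[p2||q2]
import numpy as np

rng = np.random.default_rng(0)

def kl(p, q):
    """Discrete KL-divergence KL[p||q] = sum_i p_i log(p_i / q_i)."""
    return float(np.sum(p * np.log(p / q)))

def random_dist(n):
    # Strictly positive weights keep full support, as the proof assumes.
    w = rng.random(n) + 1e-3
    return w / w.sum()

n, lam = 5, 0.3
p1, p2, q1, q2 = (random_dist(n) for _ in range(4))

lhs = kl(lam * p1 + (1 - lam) * p2, lam * q1 + (1 - lam) * q2)
rhs = lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
assert lhs <= rhs + 1e-12  # the mixture's KL never exceeds the mixed KLs
```

Rerunning with different seeds and dimensions exercises the inequality across many random pairs; it holds in every case, as the proof guarantees.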
|
255 | 264 | {
|
|
259 | 268 | "#### Using convexity of KL-divergence to prove entropy concavity\n",
|
260 | 269 | "\n",
|
261 | 270 | "Let's take a special case of equation (2.1), where $(q_1,q_2)=(u,u)$ is a pair of uniform discrete distributions:\n",
|
262 |
| - "\\begin{align*} \\tag{(2.2)}\n", |
| 271 | + "\\begin{align*} \\tag{2.5}\n", |
263 | 272 | " KL[\\lambda p_1 + (1-\\lambda) p_2 \\| \\lambda u + (1-\\lambda) u] &\\leq \\lambda KL[p_1\\|u] + (1-\\lambda) KL[p_2\\|u] \\\\\n",
|
264 | 273 | " KL[\\lambda p_1 + (1-\\lambda) p_2 \\| u] &\\leq \\lambda KL[p_1\\|u] + (1-\\lambda) KL[p_2\\|u]\n",
|
265 | 274 | "\\end{align*}\n",
|
266 | 275 | "\n",
|
267 |
| - "Lets now replace $KL[p\\|u]$ from equation (2.2), ie $KL[p\\|u]=log(N)-H[p]$ with the expression obtained in (1.2):\n", |
| 276 | + "Let's now replace $KL[p\|u]$ in equation (2.5) with the expression obtained in (1.2), i.e. $KL[p\|u]=log(N)-H[p]$:\n", |
268 | 277 | "\\begin{align*}\n",
|
269 | 278 | " KL[\\lambda p_1 + (1-\\lambda) p_2 \\| u] &\\leq \\lambda KL[p_1\\|u] + (1-\\lambda) KL[p_2\\|u] \\\\\n",
|
270 | 279 | " log(N)-H[\\lambda p_1 + (1-\\lambda) p_2] &\\leq \\lambda (log(N)-H[p_1]) + (1-\\lambda)(log(N)-H[p_2]) \\\\\n",
|
|
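The concavity of entropy that this derivation establishes, $H[\lambda p_1 + (1-\lambda)p_2] \geq \lambda H[p_1] + (1-\lambda)H[p_2]$, can also be checked numerically. A minimal sketch (illustrative code, not from the notebook):

```python
# Numerical check of entropy concavity:
# H[λp1+(1-λ)p2] >= λ H[p1] + (1-λ) H[p2]
import numpy as np

rng = np.random.default_rng(1)

def entropy(p):
    """Shannon entropy H[p] = -sum_i p_i log(p_i) (natural log)."""
    return float(-np.sum(p * np.log(p)))

def random_dist(n):
    w = rng.random(n) + 1e-3  # full support
    return w / w.sum()

n, lam = 8, 0.6
p1, p2 = random_dist(n), random_dist(n)
mix = lam * p1 + (1 - lam) * p2
assert entropy(mix) >= lam * entropy(p1) + (1 - lam) * entropy(p2)
```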