|
564 | 564 | "I would like to stop and mention that, as we now interpret $P$ as a joint probability matrix, we can define its entropy, the entropies of its marginal probabilities, and the KL-divergence between two different transportation matrices. These take the form of\n",
|
565 | 565 | "\n",
|
566 | 566 | "\\begin{align*}\n",
|
567 |
| - " \\text{Entropy} &= H(P) &= -\\sum_{i,j} p_{i,j} log(p_{i,j}) & \\\\\n", |
568 |
| - " \\text{Marginal source distribution entropy r} &= H(r) &= -\\sum_{i} \\left( \\sum_{j} p_{i,j} \\right) log\\left( \\sum_{j} p_{i,j} \\right) &= −\\sum_{i} r_i log(r_i)\\\\\n", |
569 |
| - " \\text{Marginal destination distribution entropy c} &= H(c) &= -\\sum_{j} \\left( \\sum_{i} p_{i,j} \\right) log\\left( \\sum_{i} p_{i,j} \\right) &= −\\sum_{i} c_i log(c_i)\\\\\n", |
570 |
| - " \\text{KL-divergence between P and Q transportation} &= KL(P\\|Q) &= \\sum_{i,j} p_{i,j} log\\left(\\frac{p_{i,j}}{{q}_{i,j}}\\right) &\n", |
| 567 | + "    \\text{Entropy} &= H(P) &= -\\sum_{i,j} p_{i,j} \\log(p_{i,j}) & \\tag{1.1} \\\\\n", |
| 568 | + "    \\text{Marginal source distribution entropy } r &= H(r) &= -\\sum_{i} \\left( \\sum_{j} p_{i,j} \\right) \\log\\left( \\sum_{j} p_{i,j} \\right) &= -\\sum_{i} r_i \\log(r_i) \\tag{1.2}\\\\\n", |
| 569 | + "    \\text{Marginal destination distribution entropy } c &= H(c) &= -\\sum_{j} \\left( \\sum_{i} p_{i,j} \\right) \\log\\left( \\sum_{i} p_{i,j} \\right) &= -\\sum_{j} c_j \\log(c_j) \\tag{1.3}\\\\\n", |
| 570 | + "    \\text{KL-divergence between transportation matrices } P, Q &= KL(P\\|Q) &= \\sum_{i,j} p_{i,j} \\log\\left(\\frac{p_{i,j}}{q_{i,j}}\\right) & \\tag{1.4}\n", |
| 571 | + "\\end{align*}" |
| 572 | + ] |
| 573 | + }, |
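As a quick sanity check of definitions (1.1)-(1.4), here is a minimal NumPy sketch (not part of the original notebook); the helper names `entropy` and `kl_divergence` and the `eps` guard against `log(0)` are illustrative assumptions, and `P` is assumed to be a valid joint probability matrix (nonnegative entries summing to 1):

```python
import numpy as np

def entropy(p, eps=1e-12):
    # Shannon entropy -sum p * log(p); eps avoids log(0) for zero entries.
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p + eps))

def kl_divergence(P, Q, eps=1e-12):
    # KL(P || Q) = sum_{i,j} p_ij * log(p_ij / q_ij), as in eq. (1.4).
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    return np.sum(P * np.log((P + eps) / (Q + eps)))

# Toy joint (transportation) matrix with row marginal r and column marginal c.
P = np.array([[0.2, 0.1],
              [0.1, 0.6]])
r = P.sum(axis=1)  # source marginal, eq. (1.2)
c = P.sum(axis=0)  # destination marginal, eq. (1.3)

print(entropy(P))                        # H(P), eq. (1.1)
print(entropy(r), entropy(c))            # H(r), H(c), eqs. (1.2)-(1.3)
print(kl_divergence(P, np.outer(r, c)))  # KL(P || r c^T), eq. (1.4) with Q = r c^T
```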
| 574 | + { |
| 575 | + "cell_type": "markdown", |
| 576 | + "metadata": {}, |
| 577 | + "source": [ |
| 578 | + "We can easily get the following inequality for the entropy of a joint distribution (recall that $\\Sigma_d$ is the probability simplex in dimension $d$, and $U(r,c)$ is the set of transport matrices with marginals $(r,c)$):\n", |
| 579 | + "\\begin{align*} \\tag{1.5}\n", |
| 580 | + "    \\forall r,c \\in \\Sigma_d, \\forall P \\in U(r,c), \\quad H(P) \\leq H(r) + H(c)\n", |
| 581 | + "\\end{align*}\n", |
| 582 | + "This follows from the log-sum inequality (proved in the notebook InformationTheoryOptimization):\n", |
| 583 | + "\\begin{align*}\\tag{1.6}\n", |
| 584 | + "    \\sum_{i=0}^{N-1} a_i \\log\\left(\\frac{a_i}{b_i}\\right) &\\geq \\left(\\sum_{i=0}^{N-1} a_i\\right) \\log\\left(\\frac{\\sum_{i=0}^{N-1}a_i}{\\sum_{i=0}^{N-1}b_i}\\right)\n", |
571 | 585 | "\\end{align*}\n",
|
| 586 | + "where $a_i, b_i \\in \\mathbb{R}^{+}$, $a=\\sum_{i=0}^{N-1} a_i$, and $b=\\sum_{i=0}^{N-1} b_i$ (equal to $1$ in the application below).\n", |
572 | 587 | "\n",
|
573 |
| - "We remember the following inequality for joint distributions entropy (we recall that $\\Sigma_d$ is a probability simplex in dimension d, and $U(r,c)$ is the set of transport matrices on this set) :\n", |
| 588 | + "That is, applying the log-sum inequality (1.6) for each fixed $i$, with $a_j = p_{i,j}$ and $b_j = c_j$ (so $\\sum_j a_j = r_i$ and $\\sum_j b_j = 1$), then negating and summing over $i$:\n", |
574 | 589 | "\\begin{align*}\n",
|
575 |
| - " \\forall r,c \\in \\Sigma_d, \\forall P \\in U(r,c), h(P) \\leq h(r) + h(c)\n", |
| 590 | + "    \\sum_{j} p_{i,j} \\log\\left(\\frac{p_{i,j}}{c_j}\\right) &\\geq \\left( \\sum_{j} p_{i,j} \\right) \\log\\left( \\sum_{j} p_{i,j} \\right) = r_i \\log(r_i) \\\\\n", |
| 591 | + "    \\Rightarrow \\quad H(r) = -\\sum_{i} r_i \\log(r_i) &\\geq -\\sum_{i,j} p_{i,j} \\log(p_{i,j}) + \\sum_{i,j} p_{i,j} \\log(c_j) = H(P) - H(c)\n", |
576 | 592 | "\\end{align*}\n",
|
577 | 593 | "\n",
|
578 |
| - "By the concavity of entropy, we can introduce the convex set\n", |
| 594 | + "which rearranges to inequality (1.5): $H(P) \\leq H(r) + H(c)$.\n" |
| 595 | + ] |
| 596 | + }, |
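A quick numerical check of (1.5) and (1.6) may also help; this is a small sketch (not from the notebook) using a random joint matrix and random positive vectors, with a tiny tolerance for floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p, eps=1e-12):
    return -np.sum(p * np.log(p + eps))

# Inequality (1.5): H(P) <= H(r) + H(c) for a joint matrix P with marginals r, c.
P = rng.random((4, 4))
P /= P.sum()                         # normalize so P is a valid joint distribution
r, c = P.sum(axis=1), P.sum(axis=0)
assert entropy(P) <= entropy(r) + entropy(c) + 1e-9

# Log-sum inequality (1.6) on arbitrary positive vectors a, b.
a, b = rng.random(5), rng.random(5)
lhs = np.sum(a * np.log(a / b))
rhs = a.sum() * np.log(a.sum() / b.sum())
assert lhs >= rhs - 1e-9
```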
| 597 | + { |
| 598 | + "cell_type": "markdown", |
| 599 | + "metadata": {}, |
| 600 | + "source": [ |
| 601 | + "By the concavity of the entropy (proved in the notebook InformationTheoryOptimization), we can introduce the convex set\n", |
579 | 602 | "\\begin{align*}\n",
|
580 | 603 | "    U_{\\alpha}(r,c) := \\{ P \\in U(r,c) \\,|\\, KL(P\\|rc^T) \\leq \\alpha \\} = \\{ P \\in U(r,c) \\,|\\, H(P) \\geq H(r)+H(c)-\\alpha \\} \\subset U(r,c)\n",
|
581 | 604 | "\\end{align*}\n",
|
|
588 | 611 | "This quantity is also the mutual information $I(X\\|Y)$ of two random variables $(X, Y)$, should they follow the joint probability $P$. Hence, the set of tables $P$ whose Kullback-Leibler divergence to $rc^T$ is constrained to lie below a certain threshold can be interpreted as the set of joint probabilities $P$ in $U(r, c)$ which have sufficient entropy with respect to $H(r)$ and $H(c)$, or small enough mutual information. For reasons that will become clear in Section 4, we call the quantity below the Sinkhorn distance of $r$ and $c$:"
|
589 | 612 | ]
|
590 | 613 | },
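To make the identity $KL(P\|rc^T) = H(r) + H(c) - H(P)$ and the membership test for $U_{\alpha}(r,c)$ concrete, here is a small sketch (not from the notebook); the helper name `in_U_alpha` is hypothetical, and `eps` is an assumption to guard against `log(0)`:

```python
import numpy as np

def entropy(p, eps=1e-12):
    return -np.sum(p * np.log(p + eps))

def kl_divergence(P, Q, eps=1e-12):
    return np.sum(P * np.log((P + eps) / (Q + eps)))

def in_U_alpha(P, alpha):
    # Hypothetical helper: is KL(P || r c^T) <= alpha, i.e. P in U_alpha(r, c)?
    r, c = P.sum(axis=1), P.sum(axis=0)
    return kl_divergence(P, np.outer(r, c)) <= alpha

P = np.array([[0.2, 0.1],
              [0.1, 0.6]])
r, c = P.sum(axis=1), P.sum(axis=0)

mi = kl_divergence(P, np.outer(r, c))            # mutual information I(X;Y) under P
print(mi, entropy(r) + entropy(c) - entropy(P))  # the two agree up to eps
print(in_U_alpha(P, alpha=0.1))
```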
|
591 |
| - { |
592 |
| - "cell_type": "markdown", |
593 |
| - "metadata": {}, |
594 |
| - "source": [] |
595 |
| - }, |
596 | 614 | {
|
597 | 615 | "cell_type": "markdown",
|
598 | 616 | "metadata": {},
|
|