
Commit 068287f

doc: add introduction on soft gradient boosting (#74)
1 parent 5c6933c commit 068287f

File tree

2 files changed: +21 -8 lines changed

docs/introduction.rst

Lines changed: 21 additions & 8 deletions
@@ -28,8 +28,8 @@ Voting and bagging are popularly used ensemble methods. Basically, voting and ba

Compared to voting, bagging further uses sampling with replacement on each batch of data. Notice that sub-sampling is not typically used when training neural networks, because neural networks typically achieve better performance with more training data.

-Gradient Boosting
------------------
+Gradient Boosting [1]_
+----------------------

Gradient boosting trains all base estimators in a sequential fashion, as the learning target of a base estimator :math:`h^m` is associated with the outputs from base estimators fitted before, i.e., :math:`\{h^1, \cdots, h^{m-1}\}`.
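
To make the sequential dependency concrete, here is a minimal sketch of boosting with neural base estimators. It is illustrative only: the tiny MLP, the shrinkage value, and the plain squared-error residual target are assumptions, not torchensemble's implementation.

.. code-block:: python

   # Each base estimator h^m is fit to the residual left by h^1, ..., h^{m-1},
   # so training is inherently sequential.
   import torch
   import torch.nn as nn

   def make_estimator():
       # A tiny MLP standing in for an arbitrary neural base estimator.
       return nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

   X, y = torch.randn(256, 10), torch.randn(256, 1)
   M, shrinkage = 5, 0.1
   estimators, accumulated = [], torch.zeros_like(y)

   for m in range(M):
       h_m = make_estimator()
       optimizer = torch.optim.Adam(h_m.parameters(), lr=1e-3)
       residual = y - accumulated              # target depends on earlier estimators
       for _ in range(100):
           optimizer.zero_grad()
           nn.functional.mse_loss(h_m(X), residual).backward()
           optimizer.step()
       estimators.append(h_m)
       with torch.no_grad():
           accumulated += shrinkage * h_m(X)   # only now can h^{m+1} be trained
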
@@ -49,7 +49,7 @@ The figure below presents the data flow of gradient boosting during the training
   :align: center
   :width: 500

-Snapshot Ensemble [1]_
+Snapshot Ensemble [2]_
----------------------

Unlike all methods above, where :math:`M` independent base estimators will be trained, snapshot ensemble generates the ensemble by forcing a single base estimator to converge to different local minima :math:`M` times. At each minimum, the parameters of this estimator are saved (i.e., a snapshot), serving as a base estimator in the ensemble. The output of snapshot ensemble is also the average over the predictions from all snapshots.
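
As an illustration of this idea (not the package's internals; the cosine cycle length, learning rate, and snapshot count are assumptions), a single network can be trained with a cyclic learning rate, saving a copy of its parameters at the end of every cycle and averaging the saved snapshots at prediction time:

.. code-block:: python

   # Snapshot ensemble: one network, M snapshots taken at the end of M
   # learning-rate cycles; the learning rate is annealed per batch iteration.
   import copy
   import math

   import torch
   import torch.nn as nn
   from torch.utils.data import DataLoader, TensorDataset

   net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
   X, y = torch.randn(256, 10), torch.randn(256, 1)
   loader = DataLoader(TensorDataset(X, y), batch_size=32)

   M, epochs_per_cycle, lr_max = 5, 10, 0.1
   optimizer = torch.optim.SGD(net.parameters(), lr=lr_max)
   iters_per_cycle = epochs_per_cycle * len(loader)
   snapshots, step = [], 0

   for cycle in range(M):
       for epoch in range(epochs_per_cycle):
           for xb, yb in loader:
               # Cosine annealing, updated per batch (not per epoch).
               t = (step % iters_per_cycle) / iters_per_cycle
               for group in optimizer.param_groups:
                   group["lr"] = 0.5 * lr_max * (1 + math.cos(math.pi * t))
               optimizer.zero_grad()
               nn.functional.mse_loss(net(xb), yb).backward()
               optimizer.step()
               step += 1
       snapshots.append(copy.deepcopy(net))    # parameters at this local minimum

   # Evaluation: average the predictions of all snapshots.
   with torch.no_grad():
       prediction = torch.stack([s(X) for s in snapshots]).mean(dim=0)
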
@@ -61,7 +61,7 @@ To obtain snapshots with good performance, snapshot ensemble uses **cyclic annea

Notice that the iteration above refers to the loop over all batches within each epoch, not the loop over all training epochs.

-Adversarial Training [2]_
+Adversarial Training [3]_
-------------------------

Adversarial samples can be used to improve the performance of base estimators, as validated by the authors in [2]. The implemented ``AdversarialTrainingClassifier`` and ``AdversarialTrainingRegressor`` contain :math:`M` independent base estimators, and each of them is fitted independently, as in Voting and Bagging.
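
For illustration, here is one way an adversarial sample can enter the training loop of a single base estimator, using an FGSM-style perturbation; the epsilon value and the combined clean/adversarial loss are assumptions, not necessarily the package's exact procedure:

.. code-block:: python

   # One base estimator trained on clean samples plus FGSM-style adversarial
   # samples; per the documentation, the ensemble classes repeat this for M
   # independent estimators and average their predictions at evaluation time.
   import torch
   import torch.nn as nn

   net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
   optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
   criterion = nn.CrossEntropyLoss()
   epsilon = 0.05  # assumed perturbation size

   X = torch.randn(256, 10)
   y = torch.randint(0, 2, (256,))

   for _ in range(50):
       # Craft adversarial samples from the gradient of the loss w.r.t. the input.
       X_adv = X.clone().requires_grad_(True)
       grad, = torch.autograd.grad(criterion(net(X_adv), y), X_adv)
       X_adv = (X + epsilon * grad.sign()).detach()

       # Fit on both the clean batch and its adversarial counterpart.
       optimizer.zero_grad()
       loss = criterion(net(X), y) + criterion(net(X_adv), y)
       loss.backward()
       optimizer.step()
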
@@ -70,13 +70,26 @@ During the training stage of each base estimator :math:`h^m`, an adversarial sam

Same as Voting and Bagging, the output of ``AdversarialTrainingClassifier`` or ``AdversarialTrainingRegressor`` during the evaluating stage is the average over predictions from all base estimators.

-Fast Geometric Ensemble [3]_
+Fast Geometric Ensemble [4]_
----------------------------

Motivated by geometric insights on the loss surface of deep neural networks, Fast Geometric Ensembling (FGE) is an efficient ensemble that uses a customized learning rate scheduler to generate base estimators, similar to snapshot ensemble.

+Soft Gradient Boosting [5]_
+---------------------------
+
+The sequential training stage of gradient boosting makes it prohibitively expensive to use when large neural networks are chosen as the base estimator. The recently proposed soft gradient boosting machine mitigates this problem by concatenating all base estimators in the ensemble, and by using local and global training objectives inspired by gradient boosting. As a result, it is able to train all base estimators simultaneously, while achieving boosting performance similar to that of gradient boosting.
+
+The figure below shows the model architecture of soft gradient boosting.
+
+.. image:: ./_images/soft_gradient_boosting.png
+   :align: center
+   :width: 400
+
**References**

-.. [1] Huang Gao, Sharon Yixuan Li, Geoff Pleisset, et al., "Snapshot ensembles: Train 1, get m for free." ICLR, 2017.
-.. [2] Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell., "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles." NIPS 2017.
-.. [3] Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin et al., "Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs." NeurIPS, 2018.
+.. [1] Jerome H. Friedman, "Greedy Function Approximation: A Gradient Boosting Machine." The Annals of Statistics, 2001.
+.. [2] Gao Huang, Yixuan Li, Geoff Pleiss, et al., "Snapshot Ensembles: Train 1, Get M for Free." ICLR, 2017.
+.. [3] Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell, "Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles." NIPS, 2017.
+.. [4] Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, et al., "Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs." NeurIPS, 2018.
+.. [5] Ji Feng, Yi-Xuan Xu, Yuan Jiang, Zhi-Hua Zhou, "Soft Gradient Boosting Machine." arXiv, 2020.
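
To ground the newly added section, here is a minimal sketch of how all base estimators can be trained simultaneously with boosting-style local objectives plus a global objective on the summed output. It is illustrative only; the exact losses used by soft GBM and by torchensemble may differ.

.. code-block:: python

   # Soft gradient boosting sketch: every estimator gets a "local" residual
   # target (as in gradient boosting), a "global" loss fits the summed output,
   # and a single backward pass updates all estimators at once.
   import torch
   import torch.nn as nn

   M = 5
   estimators = nn.ModuleList(
       nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
       for _ in range(M)
   )
   optimizer = torch.optim.Adam(estimators.parameters(), lr=1e-3)
   X, y = torch.randn(256, 10), torch.randn(256, 1)

   for _ in range(100):
       optimizer.zero_grad()
       outputs = [h(X) for h in estimators]       # one pass through every estimator

       local_loss, accumulated = 0.0, torch.zeros_like(y)
       for out in outputs:
           residual = (y - accumulated).detach()  # residual left by earlier estimators
           local_loss = local_loss + nn.functional.mse_loss(out, residual)
           accumulated = accumulated + out

       global_loss = nn.functional.mse_loss(accumulated, y)
       (local_loss + global_loss).backward()      # all estimators updated together
       optimizer.step()
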
