HippoTrainer is a PyTorch-compatible library for gradient-based hyperparameter optimization, implementing cutting-edge algorithms that leverage automatic differentiation to efficiently tune hyperparameters.
- Technical Meeting 1 - Presentation
- Technical Meeting 2 - Jupyter Notebook
- Technical Meeting 3 - Jupyter Notebook
- Documentation
- Tests
- Blog Post
- Algorithm Zoo: T1-T2, Neumann, HOAG, DrMAD
- PyTorch Native: Direct integration with `torch.nn.Module`
- Memory Efficient: Checkpointing & implicit differentiation
- Scalable: From laptop to cluster with PyTorch backend
- T1-T2 (Paper): One-step unrolled optimization
- Neumann (Paper): Leveraging Neumann series approximation for implicit differentiation
- HOAG (Paper): Implicit differentiation via conjugate gradient
- DrMAD (Paper): Memory-efficient piecewise-linear backpropagation
Our Python library `hippotrainer` can be installed in several ways; choose the one that best suits your needs. We recommend installing from source (`pip install git+https://github.com/intsystems/hippotrainer`) if you want the latest version of the package.
Install from PyPI:

```bash
pip install hippotrainer
```

Install the latest version from source:

```bash
pip install git+https://github.com/intsystems/hippotrainer
```

Or clone the repository and install it in editable mode:

```bash
git clone https://github.com/intsystems/hippotrainer
cd hippotrainer
pip install -e .
```
You can use our library to tune almost all hyperparameters (see below) in your own code.
The `HyperOptimizer` interface is very similar to `Optimizer` from PyTorch. It supports the key functionalities:
- `step` to perform an optimization step over parameters (or hyperparameters, see below)
- `zero_grad` to zero out the parameter gradients (same as `optimizer.zero_grad()`)
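As a minimal sketch of how this interface might be set up (the class name `Neumann` and the constructor arguments `hyperparams`, `optimizer`, and `inner_steps` are assumptions for illustration, not the exact signature; consult the documentation for the actual API):

```python
import torch
from hippotrainer import Neumann  # hypothetical import path; check the docs

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# A continuous loss hyperparameter (an L2-regularization weight) we want to tune
l2_coef = torch.tensor(1e-3, requires_grad=True)

# Hypothetical constructor: the hyper-optimizer wraps the usual optimizer
# together with the hyperparameters it should tune
hyper_optimizer = Neumann(
    hyperparams=[l2_coef],
    optimizer=optimizer,
    inner_steps=5,
)
```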
We provide demo experiments with each implemented method in this notebook. They work as follows (see the sketch after this list):
- Get next batch from train dataloader
- Forward and backward on calculated loss
- `hyper_optimizer.step(loss)` performs a model parameter step and, if enough inner steps have been accumulated, a hyperparameter step (computes hypergradients, performs the optimization step, and zeroes the hypergradients)
- `hyper_optimizer.zero_grad()` zeroes the model parameter gradients (same as `optimizer.zero_grad()`)
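Put together, one epoch of such a loop might look like the following sketch, reusing the objects from the setup above and assuming an ordinary `train_dataloader` from your data pipeline:

```python
# One epoch of training with a hyper-optimizer (illustrative sketch)
loss_fn = torch.nn.MSELoss()

for inputs, targets in train_dataloader:
    # Forward and backward on the calculated loss (here with L2 regularization
    # weighted by the tunable hyperparameter)
    loss = loss_fn(model(inputs), targets) + l2_coef * sum(
        p.pow(2).sum() for p in model.parameters()
    )
    loss.backward()

    # Model parameter step, plus a hyperparameter step once enough
    # inner steps have been accumulated
    hyper_optimizer.step(loss)

    # Zero the model parameter gradients (same as optimizer.zero_grad())
    hyper_optimizer.zero_grad()
```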
Gradient-based hyperparameter optimization interleaves hyper-optimization steps with the optimization of the model parameters. Thus, we combine the `Optimizer` method `step` with `inner_steps`, defined by each method. For example, `T1T2` does NOT use any inner steps, so optimization over parameters and hyperparameters alternates step by step. The `Neumann` method, by contrast, performs several inner optimization steps over the model parameters before it takes the hyperstep. See more details here.
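Conceptually, the interplay between parameter steps and the hyperstep can be pictured with this simplified counter logic (illustrative only, not the library's actual implementation):

```python
# Illustrative sketch of when a hyper-optimizer could take a hyperparameter
# step, based on an `inner_steps` setting.
class SketchHyperOptimizer:
    def __init__(self, inner_steps: int):
        self.inner_steps = inner_steps  # e.g. 0 for T1-T2, > 0 for Neumann
        self._step_count = 0

    def step(self, loss):
        # 1) always take a model parameter step here (omitted in this sketch)
        self._step_count += 1
        # 2) take a hyperparameter step once enough inner steps were accumulated
        if self._step_count >= self.inner_steps:
            self._hyper_step(loss)
            self._step_count = 0

    def _hyper_step(self, loss):
        # compute hypergradients, update hyperparameters, zero hypergradients
        pass
```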
The `HyperOptimizer` logic is well-suited for almost all CONTINUOUS hyperparameter types (continuity is required for gradient-based methods):
- Model hyperparameters (e.g., gate coefficients)
- Loss hyperparameters (e.g., L1/L2-regularization)
However, learning rate tuning is currently either unsupported or supported but not sufficiently tested. We plan to improve this functionality in future releases, stay tuned!
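For instance, a model hyperparameter such as a gate coefficient can be kept as a differentiable tensor outside of `model.parameters()`, so the regular optimizer leaves it to the hyper-optimizer (the module below is illustrative, not part of the library):

```python
import torch

class GatedBlock(torch.nn.Module):
    """Illustrative block with a continuous model hyperparameter (a gate coefficient)."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = torch.nn.Linear(dim, dim)
        # Model hyperparameter: a plain differentiable tensor, deliberately NOT an
        # nn.Parameter, so model.parameters() (and the regular optimizer) skip it
        self.gate = torch.tensor(0.5, requires_grad=True)

    def forward(self, x):
        # Convex combination of the transformed and original input, gated by the hyperparameter
        return self.gate * self.linear(x) + (1 - self.gate) * x
```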
- Daniil Dorin (Basic code writing, Final demo, Algorithms)
- Igor Ignashin (Project wrapping, Documentation writing, Algorithms)
- Nikita Kiselev (Project planning, Blog post, Algorithms)
- Andrey Veprikov (Tests writing, Documentation writing, Algorithms)
- We welcome contributions!
HippoTrainer is MIT licensed. See LICENSE for details.