Grams: Gradient Descent with Adaptive Momentum Scaling (ICLR 2025 SCOPE Workshop)


Authors: Yang Cao, Xiaoyu Li, Zhao Song

This repository contains the official PyTorch implementation of the Grams optimizer.

We introduce Gradient Descent with Adaptive Momentum Scaling (Grams), a novel optimization algorithm that decouples the direction and magnitude of parameter updates in deep learning. Unlike traditional optimizers that directly integrate momentum into updates, Grams separates the update direction, derived from current gradients, from momentum, which is used solely for adaptive magnitude scaling. This approach enables Grams to achieve improved loss descent compared to state-of-the-art cautious and momentum-based optimizers.
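For intuition, here is a minimal sketch of the decoupling described above: the direction of each parameter update is taken from the sign of the current gradient, while an Adam-style momentum term supplies only the magnitude. The function name, tensor arguments, and hyperparameter defaults below are illustrative assumptions, not the reference implementation; use the grams package for actual training.

import torch

def grams_style_update(param, grad, exp_avg, exp_avg_sq, lr=1e-3,
                       betas=(0.9, 0.999), eps=1e-8, step=1):
    # Conceptual sketch only: direction from the current gradient,
    # magnitude from Adam-style momentum statistics.
    beta1, beta2 = betas

    # Adam-style first and second moment estimates (momentum statistics).
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    # Bias correction, as in Adam.
    bias_c1 = 1 - beta1 ** step
    bias_c2 = 1 - beta2 ** step

    # Magnitude of the Adam-style update (momentum used only for scaling).
    adam_step = (exp_avg / bias_c1) / ((exp_avg_sq / bias_c2).sqrt() + eps)
    magnitude = adam_step.abs()

    # Direction taken from the current gradient, not from momentum.
    direction = torch.sign(grad)

    # Apply the decoupled update.
    param.add_(direction * magnitude, alpha=-lr)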


Install

Use the following command to install our PyTorch implementation of Grams:

pip install grams-pytorch

How to use Grams

Switching from Adam/AdamW to Grams is simple and requires only two lines of code:

Before:

import torch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)

Switching to Grams:

from grams import Grams
optimizer = Grams(model.parameters(), lr=1e-3, weight_decay=0.0)

Just import Grams and swap the optimizer—everything else remains the same!
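As a quick end-to-end sketch, the training loop is unchanged after the swap. The toy model, data, and loss below are purely illustrative; the Grams constructor arguments mirror the snippet above.

import torch
from grams import Grams

# Toy model and data, purely for illustration.
model = torch.nn.Linear(10, 1)
inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

optimizer = Grams(model.parameters(), lr=1e-3, weight_decay=0.0)
loss_fn = torch.nn.MSELoss()

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()  # standard optimizer step; no other changes needed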

Citation

Please cite our work!

@inproceedings{cao2025grams,
  title={Grams: Gradient Descent with Adaptive Momentum Scaling},
  author={Yang Cao and Xiaoyu Li and Zhao Song},
  booktitle={ICLR 2025 First Workshop on Scalable Optimization for Efficient and Adaptive Foundation Models},
  year={2025},
  url={https://openreview.net/forum?id=GmKQnpQdsc}
}