Commit 6cfe51e

xieenze authored and lawrence-cj committed
initial code update;
Signed-off-by: junsongc <cjs1020440147@icloud.com>
1 parent 80976a9 commit 6cfe51e

109 files changed: +14813 -26 lines changed

.gitignore

100644 → 100755
+20 -13
@@ -1,3 +1,19 @@
+# Sana related files
+.idea/
+*.png
+*.json
+tmp*
+output*
+output/
+outputs/
+wandb/
+.vscode/
+private/
+ldm_ae*
+data/*
+*.pth
+.gradio/
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
@@ -106,8 +122,10 @@ ipython_config.py
 #pdm.lock
 # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
 # in version control.
-# https://pdm.fming.dev/#use-with-ide
+# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
 .pdm.toml
+.pdm-python
+.pdm-build/
 
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
 __pypackages__/
@@ -157,15 +175,4 @@ cython_debug/
 # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 # and can be added to the global gitignore or merged into this file. For a more nuclear
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
-.idea/
-
-*png
-*json
-tmp*
-output/
-wandb/
-.vscode/
-private/
-ldm_ae*
-data/*
-*pth
+#.idea/
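
Note on the pattern change in this file: replacing `*png` with `*.png` (and `*pth` with `*.pth`) narrows what gets ignored, because the old patterns match any name that merely ends in those letters. The sketch below illustrates this with Python's `fnmatch`, which only approximates gitignore matching (it does not model gitignore's slash or negation rules). The file names are hypothetical and chosen purely for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, chosen only to show the difference between patterns.
names = ["sample.png", "notes_png", "model.pth", "depth"]


def matched(pattern, candidates):
    """Return the candidates that match a shell-style glob pattern."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


old_png = matched("*png", names)   # old pattern: anything ending in "png"
new_png = matched("*.png", names)  # new pattern: requires the ".png" suffix
old_pth = matched("*pth", names)   # old pattern: anything ending in "pth"
new_pth = matched("*.pth", names)  # new pattern: requires the ".pth" suffix

print("old *png :", old_png)
print("new *.png:", new_png)
print("old *pth :", old_pth)
print("new *.pth:", new_pth)
```

Under this approximation, the old patterns also catch names such as `notes_png` and `depth`, while the new ones are limited to files with the actual extension.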

LICENSE

+117
@@ -0,0 +1,117 @@
+Copyright (c) 2019, NVIDIA Corporation. All rights reserved.
+
+
+Nvidia Source Code License-NC
+
+=======================================================================
+
+1. Definitions
+
+“Licensor” means any person or entity that distributes its Work.
+
+“Work” means (a) the original work of authorship made available under
+this license, which may include software, documentation, or other
+files, and (b) any additions to or derivative works thereof
+that are made available under this license.
+
+“NVIDIA Processors” means any central processing unit (CPU),
+graphics processing unit (GPU), field-programmable gate array (FPGA),
+application-specific integrated circuit (ASIC) or any combination
+thereof designed, made, sold, or provided by NVIDIA or its affiliates.
+
+The terms “reproduce,” “reproduction,” “derivative works,” and
+“distribution” have the meaning as provided under U.S. copyright law;
+provided, however, that for the purposes of this license, derivative
+works shall not include works that remain separable from, or merely
+link (or bind by name) to the interfaces of, the Work.
+
+Works are “made available” under this license by including in or with
+the Work either (a) a copyright notice referencing the applicability
+of this license to the Work, or (b) a copy of this license.
+
+"Safe Model" means ShieldGemma-2B, which is a series of safety
+content moderation models designed to moderate four categories of
+harmful content: sexually explicit material, dangerous content,
+hate speech, and harassment, and which you separately obtain
+from Google at https://huggingface.co/google/shieldgemma-2b.
+
+
+2. License Grant
+
+2.1 Copyright Grant. Subject to the terms and conditions of this
+license, each Licensor grants to you a perpetual, worldwide,
+non-exclusive, royalty-free, copyright license to use, reproduce,
+prepare derivative works of, publicly display, publicly perform,
+sublicense and distribute its Work and any resulting derivative
+works in any form.
+
+3. Limitations
+
+3.1 Redistribution. You may reproduce or distribute the Work only if
+(a) you do so under this license, (b) you include a complete copy of
+this license with your distribution, and (c) you retain without
+modification any copyright, patent, trademark, or attribution notices
+that are present in the Work.
+
+3.2 Derivative Works. You may specify that additional or different
+terms apply to the use, reproduction, and distribution of your
+derivative works of the Work (“Your Terms”) only if (a) Your Terms
+provide that the use limitation in Section 3.3 applies to your
+derivative works, and (b) you identify the specific derivative works
+that are subject to Your Terms. Notwithstanding Your Terms, this
+license (including the redistribution requirements in Section 3.1)
+will continue to apply to the Work itself.
+
+3.3 Use Limitation. The Work and any derivative works thereof only may
+be used or intended for use non-commercially and with NVIDIA Processors,
+in accordance with Section 3.4, below. Notwithstanding the foregoing,
+NVIDIA Corporation and its affiliates may use the Work and any
+derivative works commercially. As used herein, “non-commercially”
+means for research or evaluation purposes only.
+
+3.4 You shall filter your input content to the Work and any derivative
+works thereof through the Safe Model to ensure that no content described
+as Not Safe For Work (NSFW) is processed or generated. You shall not use
+the Work to process or generate NSFW content. You are solely responsible
+for any damages and liabilities arising from your failure to adequately
+filter content in accordance with this section. As used herein,
+“Not Safe For Work” or “NSFW” means content, videos or website pages
+that contain potentially disturbing subject matter, including but not
+limited to content that is sexually explicit, dangerous, hate,
+or harassment.
+
+3.5 Patent Claims. If you bring or threaten to bring a patent claim
+against any Licensor (including any claim, cross-claim or counterclaim
+in a lawsuit) to enforce any patents that you allege are infringed by
+any Work, then your rights under this license from such Licensor
+(including the grant in Section 2.1) will terminate immediately.
+
+3.6 Trademarks. This license does not grant any rights to use any
+Licensor’s or its affiliates’ names, logos, or trademarks, except as
+necessary to reproduce the notices described in this license.
+
+3.7 Termination. If you violate any term of this license, then your
+rights under this license (including the grant in Section 2.1) will
+terminate immediately.
+
+4. Disclaimer of Warranty.
+
+THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR
+NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES
+UNDER THIS LICENSE.
+
+5. Limitation of Liability.
+
+EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL
+THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE
+SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT,
+INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF
+OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK
+(INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION,
+LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER
+DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+=======================================================================

README.md

+140 -13
@@ -9,6 +9,7 @@
 <a href="https://hanlab.mit.edu/projects/sana/"><img src="https://img.shields.io/static/v1?label=Page&message=MIT&color=darkred&logo=github-pages"></a> &ensp;
 <a href="https://arxiv.org/abs/2410.10629"><img src="https://img.shields.io/static/v1?label=Arxiv&message=Sana&color=red&logo=arxiv"></a> &ensp;
 <a href="https://nv-sana.mit.edu/"><img src="https://img.shields.io/static/v1?label=Demo&message=MIT&color=yellow"></a> &ensp;
+<a href="https://discord.gg/rde6eaE5Ta"><img src="https://img.shields.io/static/v1?label=Discuss&message=Discord&color=purple&logo=discord"></a> &ensp;
 </div>
 
 <p align="center" border-raduis="10px">
@@ -34,13 +35,22 @@ As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.
 
 ## 🔥🔥 News
 
-- Sana code is coming soon
-- (🔥 New) \[2024/10\] [Demo](https://nv-sana.mit.edu/) is released.
-- (🔥 New) \[2024/10\] [DC-AE Code](https://github.com/mit-han-lab/efficientvit/blob/master/applications/dc_ae/README.md) and [weights](https://huggingface.co/collections/mit-han-lab/dc-ae-670085b9400ad7197bb1009b) are released!
+- (🔥 New) \[2024/11\] Training, inference, and metrics code is released.
+- \[2024/10\] [Demo](https://nv-sana.mit.edu/) is released.
+- \[2024/10\] [DC-AE Code](https://github.com/mit-han-lab/efficientvit/blob/master/applications/dc_ae/README.md) and [weights](https://huggingface.co/collections/mit-han-lab/dc-ae-670085b9400ad7197bb1009b) are released!
 - \[2024/10\] [Paper](https://arxiv.org/abs/2410.10629) is on Arxiv!
 
 ## Performance
 
+| Methods (1024x1024) | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID 👆 | CLIP 👆 | GenEval 👆 | DPG 👆 |
+|------------------------------|------------------------|-------------|------------|-----------|-------------|--------------|-------------|-------------|
+| FLUX-dev | 0.04 | 23.0 | 12.0 | 1.0× | 10.15 | 27.47 | _0.67_ | _84.0_ |
+| **Sana-0.6B** | 1.7 | 0.9 | 0.6 | **39.5×** | <u>5.81</u> | 28.36 | 0.64 | 83.6 |
+| **Sana-1.6B** | 1.0 | 1.2 | 1.6 | **23.3×** | **5.76** | <u>28.67</u> | <u>0.66</u> | **84.8** |
+
+<details>
+<summary><h3>Click to show all</h3></summary>
+
 | Methods | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID 👆 | CLIP 👆 | GenEval 👆 | DPG 👆 |
 |------------------------------|------------------------|-------------|------------|-----------|-------------|--------------|-------------|-------------|
 | _**512 × 512 resolution**_ | | | | | | | | |
@@ -61,28 +71,149 @@ As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.
 | **Sana-0.6B** | 1.7 | 0.9 | 0.6 | **39.5×** | <u>5.81</u> | 28.36 | 0.64 | 83.6 |
 | **Sana-1.6B** | 1.0 | 1.2 | 1.6 | **23.3×** | **5.76** | <u>28.67</u> | <u>0.66</u> | **84.8** |
 
+</details>
+
 ## Contents
 
+- [Env](#-1-dependencies-and-installation)
+- [Demo](#-3-how-to-inference)
+- [Training](#-2-how-to-train)
+- [Testing](#-4-how-to-inference--test-metrics-fid-clip-score-geneval-dpg-bench-etc)
 - [TODO](#to-do-list)
 - [Citation](#bibtex)
 
-## 💪To-Do List
+# 🔧 1. Dependencies and Installation
+
+- Python >= 3.10.0 (we recommend [Anaconda](https://www.anaconda.com/download/#linux) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html))
+- [PyTorch >= 2.0.1+cu12.1](https://pytorch.org/)
+
+```bash
+git clone https://github.com/NVlabs/Sana.git
+cd Sana
+
+./environment_setup.sh sana
+# or install each component step by step, following environment_setup.sh
+```
+
+# 💻 2. How to Play with Sana (Inference)
+
+## 💰Hardware requirement
+
+- The 0.6B model requires 9GB of VRAM and the 1.6B model requires 12GB. A later quantized version will require less than 8GB for inference.
+- All tests were run on A100 GPUs; results may differ on other GPU models.
+
+## 🔛 Quick start with [Gradio](https://www.gradio.app/guides/quickstart)
+
+```bash
+# official online demo
+DEMO_PORT=15432 \
+python app/sana_app.py \
+    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
+    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth
+```
+
+```python
+import torch
+from app.sana_pipeline import SanaPipeline
+from torchvision.utils import save_image
+
+device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+generator = torch.Generator(device=device).manual_seed(42)
+
+sana = SanaPipeline("configs/sana_config/1024ms/Sana_1600M_img1024.yaml")
+sana.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth")
+prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
+
+image = sana(
+    prompt=prompt,
+    height=1024,
+    width=1024,
+    guidance_scale=5.0,
+    pag_guidance_scale=2.0,
+    num_inference_steps=18,
+    generator=generator,
+)
+save_image(image, 'output/sana.png', nrow=1, normalize=True, value_range=(-1, 1))
+```
+
+## 🔛 Run inference with TXT or JSON files
+
+```bash
+# Run samples in a txt file
+python scripts/inference.py \
+    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
+    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
+    --txt_file=asset/samples_mini.txt
+
+# Run samples in a json file
+python scripts/inference.py \
+    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
+    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
+    --json_file=asset/samples_mini.json
+```
+
+Each line of [`asset/samples_mini.txt`](asset/samples_mini.txt) contains one prompt.
+
+# 🔥 3. How to Train Sana
+
+## 💰Hardware requirement
+
+- 32GB of VRAM is required to train either the 0.6B or the 1.6B model.
+
+We provide a training example here; you can also select a config file from the [config files dir](configs/sana_config) that matches your data structure.
+
+To launch Sana training, first prepare your data in the following format:
+
+```bash
+asset/example_data
+├── AAA.txt
+├── AAA.png
+├── BCC.txt
+├── BCC.png
+├── ......
+├── CCC.txt
+└── CCC.png
+```
+
+Training can then be launched via:
+
+```bash
+# Example of training Sana 0.6B with 512x512 resolution
+bash train_scripts/train.sh \
+    configs/sana_config/512ms/Sana_600M_img512.yaml \
+    --data.data_dir="[asset/example_data]" \
+    --data.type=SanaImgDataset \
+    --model.multi_scale=false \
+    --train.train_batch_size=32
+
+# Example of training Sana 1.6B with 1024x1024 resolution
+bash train_scripts/train.sh \
+    configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
+    --data.data_dir="[asset/example_data]" \
+    --data.type=SanaImgDataset \
+    --model.multi_scale=false \
+    --train.train_batch_size=8
+```
+
+# 💻 4. Metric toolkit
+
+Refer to [Toolkit Manual](asset/docs/metrics_toolkit.md).
+
+# 💪To-Do List
 
 We will try our best to release
 
-- \[ \] Training code
-- \[ \] Inference code
+- \[x\] Training code
+- \[x\] Inference code
 - \[ \] Model zoo
 - \[ \] Diffusers
 - \[ \] ComfyUI
+- \[ \] Laptop development
 
 # 🤗Acknowledgements
 
 - Thanks to [PixArt-α](https://github.com/PixArt-alpha/PixArt-alpha), [PixArt-Σ](https://github.com/PixArt-alpha/PixArt-sigma) and [Efficient-ViT](https://github.com/mit-han-lab/efficientvit) for their wonderful work and codebase!
 
-[//]: # (- Thanks to [Diffusers]&#40;https://github.com/huggingface/diffusers&#41; for their wonderful technical support and awesome collaboration!)
-[//]: # (- Thanks to [Hugging Face]&#40;https://github.com/huggingface&#41; for sponsoring the nicely demo!)
-
 # 📖BibTeX
 
 ```
@@ -96,7 +227,3 @@ We will try our best to release
 url={https://arxiv.org/abs/2410.10629},
 }
 ```
-
-[//]: # (## Star History)
-
-[//]: # ([![Star History Chart]&#40;https://api.star-history.com/svg?repos=NVlabs/Sana&type=Date&#41;]&#40;https://star-history.com/#NVlabs/sana&Date&#41;)
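
Note on the training data layout described in the README changes: each image is expected to sit next to a caption file with the same stem (`AAA.png` with `AAA.txt`). A minimal sketch for checking that pairing before launching training follows. `find_unpaired` is a hypothetical helper, not part of the Sana codebase, and the restriction to `.png` images is an assumption taken from the example tree.

```python
import tempfile
from pathlib import Path


def find_unpaired(data_dir):
    """Return (image stems lacking a caption, caption stems lacking an image)."""
    root = Path(data_dir)
    images = {p.stem for p in root.glob("*.png")}    # assumed image extension
    captions = {p.stem for p in root.glob("*.txt")}  # one caption file per image
    return sorted(images - captions), sorted(captions - images)


# Demonstration on a throwaway directory with one complete pair and two orphans.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["AAA.png", "AAA.txt", "BCC.png", "CCC.txt"]:
        (Path(tmp) / name).touch()
    missing_captions, missing_images = find_unpaired(tmp)

print("images without captions:", missing_captions)
print("captions without images:", missing_images)
```

Running a check like this on the directory passed to `--data.data_dir` could surface mismatched files early, though how the actual dataset class handles unpaired files is not shown in this diff.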
