README.md
Lines changed: 22 additions & 13 deletions
@@ -1,5 +1,5 @@
1
1
# Octo
2
-
[](https://colab.research.google.com/drive/1z0vELj_lX9OWeoMG_WvXnQs43aPOEAhz?usp=sharing)
2
+
[](https://githubtocolab.com/octo-models/octo/blob/main/examples/01_inference_pretrained.ipynb)
We offer three finetuning modes depending on the parts of the model that are kept frozen: ```head_only```, ```head_mlp_only```, and ```full``` to finetune the full model.
@@ -114,9 +114,9 @@ Loading and running a trained Octo model is as easy as:
114
114
```python
115
115
from octo.model import OctoModel
116
116
117
-
model = OctoModel.load_pretrained("hf://rail-berkeley/octo-small")
117
+
model = OctoModel.load_pretrained("hf://rail-berkeley/octo-small-1.5")
118
118
task = model.create_tasks(texts=["pick up the spoon"])
#### What is the `pad_mask` in the observation dictionary?
144
-
The `pad_mask` indicates which observations should be attended to, which is important when using multiple timesteps of observation history. Octo was trained with a history window size of 2, meaning the model can predict an action using both the current observation and the previous observation. However, at the very beginning of the trajectory, there is no previous observation, so we need to set `pad_mask=False` at the corresponding index. If you use Octo with a window size of 1, pad_mask should always just be `[True]`, indicating that the one and only observation in the window should be attended to. Note that if you wrap your robot environment with the `HistoryWrapper` (see [gym_wrappers.py](octo/utils/gym_wrappers.py)), the `pad_mask` key will be added to the observation dictionary for you.
143
+
#### What is the `timestep_pad_mask` in the observation dictionary?
144
+
The `timestep_pad_mask` indicates which observations should be attended to, which is important when using multiple timesteps of observation history. Octo was trained with a history window size of 2, meaning the model can predict an action using both the current observation and the previous observation. However, at the very beginning of the trajectory, there is no previous observation, so we need to set `timestep_pad_mask=False` at the corresponding index. If you use Octo with a window size of 1, `timestep_pad_mask` should always just be `[True]`, indicating that the one and only observation in the window should be attended to. Note that if you wrap your robot environment with the `HistoryWrapper` (see [gym_wrappers.py](octo/utils/gym_wrappers.py)), the `timestep_pad_mask` key will be added to the observation dictionary for you.
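As a rough illustration of the paragraph above, the sketch below builds a `timestep_pad_mask` for a given window size. The helper name `build_timestep_pad_mask` is hypothetical and not part of Octo, and the sketch assumes the window is ordered oldest to newest, so padding sits at the front. Check `HistoryWrapper` for the behavior Octo actually uses.

```python
import numpy as np


def build_timestep_pad_mask(num_observed: int, window_size: int = 2) -> np.ndarray:
    """Return a boolean mask over the history window (hypothetical helper).

    Assumption: the window is ordered oldest -> newest, so slots for
    observations that do not exist yet are at the front and are masked out.
    """
    num_valid = min(num_observed, window_size)
    mask = np.zeros(window_size, dtype=bool)
    # Mark the last `num_valid` slots as real observations.
    mask[window_size - num_valid:] = True
    return mask


# First step of a trajectory with a window of 2: no previous observation yet.
first_step_mask = build_timestep_pad_mask(num_observed=1, window_size=2)
# Later steps: both slots hold real observations.
later_step_mask = build_timestep_pad_mask(num_observed=5, window_size=2)
# Window of 1: the single observation is always attended to.
single_mask = build_timestep_pad_mask(num_observed=1, window_size=1)
```

Under these assumptions, `first_step_mask` is `[False, True]`, `later_step_mask` is `[True, True]`, and `single_mask` is `[True]`.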
145
145
#### What is `pad_mask_dict` in the observation dictionary?
146
-
While `pad_mask` indicates which observations should be attended to on a timestep level, `pad_mask_dict` indicates which elements of the observation should be attended to within a single timestep. For example, for datasets without language labels, `pad_mask_dict["language_instruction"]` is set to `False`. For datasets without a wrist camera, `pad_mask_dict["image_wrist"]` is set to `False`. For convenience, if a key is missing from the observation dict, it is equivalent to setting `pad_mask_dict` to `False` for that key.
146
+
While `timestep_pad_mask` indicates which observations should be attended to on a timestep level, `pad_mask_dict` indicates which elements of the observation should be attended to within a single timestep. For example, for datasets without language labels, `pad_mask_dict["language_instruction"]` is set to `False`. For datasets without a wrist camera, `pad_mask_dict["image_wrist"]` is set to `False`. For convenience, if a key is missing from the observation dict, it is equivalent to setting `pad_mask_dict` to `False` for that key.
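To make the distinction concrete, here is a minimal sketch of how a `pad_mask_dict` could be written out explicitly. The helper `make_pad_mask_dict` is hypothetical, and the one-flag-per-timestep layout is an assumption made for illustration rather than a statement of Octo's exact format.

```python
import numpy as np


def make_pad_mask_dict(observation: dict, expected_keys: list, window_size: int) -> dict:
    """Build an explicit per-key mask (hypothetical helper, not Octo's API).

    Assumption: each entry holds one boolean per timestep in the window.
    A key that is absent from `observation` is masked out everywhere, which
    mirrors the "missing key is equivalent to False" convention described above.
    """
    return {
        key: np.full(window_size, key in observation, dtype=bool)
        for key in expected_keys
    }


# A dataset with a primary camera but no wrist camera, window size 2.
observation = {"image_primary": np.zeros((2, 256, 256, 3), dtype=np.uint8)}
pad_mask_dict = make_pad_mask_dict(
    observation,
    expected_keys=["image_primary", "image_wrist"],
    window_size=2,
)
```

Here `pad_mask_dict["image_primary"]` is all `True` and `pad_mask_dict["image_wrist"]` is all `False`, so the wrist camera would be ignored at every timestep.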
147
147
#### Does `model.sample_actions([...])` return the full trajectory to solve a task?
148
148
Octo was pretrained with an action chunking size of 4, meaning it predicts the next 4 actions at once. You can choose to execute all these actions before sampling new ones, or only execute the first action before sampling new ones (also known as receding horizon control). You can also do something more advanced like [temporal ensembling](octo/utils/gym_wrappers.py).
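The two execution strategies can be sketched as a single loop with an execution horizon. Everything here is illustrative: `sample_chunk` stands in for a call such as `model.sample_actions([...])` that returns 4 actions, `step_env` stands in for stepping your robot environment, and neither name comes from Octo.

```python
import numpy as np

CHUNK_SIZE = 4  # action chunking size stated above


def rollout(sample_chunk, step_env, obs, num_steps: int, exec_horizon: int) -> int:
    """Run `num_steps` environment steps and return how often the policy was queried.

    exec_horizon=CHUNK_SIZE executes every predicted action before resampling;
    exec_horizon=1 executes only the first one (receding horizon control).
    `sample_chunk` and `step_env` are hypothetical callables supplied by the caller.
    """
    if not 1 <= exec_horizon <= CHUNK_SIZE:
        raise ValueError("exec_horizon must be between 1 and CHUNK_SIZE")
    num_queries = 0
    executed = 0
    while executed < num_steps:
        chunk = sample_chunk(obs)  # expected shape: (CHUNK_SIZE, action_dim)
        num_queries += 1
        for action in chunk[:exec_horizon]:
            obs = step_env(action)
            executed += 1
            if executed == num_steps:
                break
    return num_queries


# Dummy stand-ins so the sketch runs without a model or a robot.
def dummy_policy(obs):
    return np.zeros((CHUNK_SIZE, 7))


def dummy_env(action):
    return None


full_chunk_queries = rollout(dummy_policy, dummy_env, None, num_steps=8, exec_horizon=4)
receding_queries = rollout(dummy_policy, dummy_env, None, num_steps=8, exec_horizon=1)
```

With 8 environment steps, executing full chunks queries the policy 2 times, while receding horizon control queries it 8 times. The trade-off is more compute per step in exchange for reacting to each new observation.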
149
149
150
+
## Updates for Version 1.5
151
+
- Improved cross-attention between visual and language tokens by repeating language tokens at every timestep in the context window.
152
+
- Augmented the language instructions in the data with rephrasings from GPT-3.5.
153
+
- Bug fixes:
154
+
- Turned off dropout in the diffusion head due to incompatibility with layer norm.
155
+
- Fixed an off-by-one error with the attention mask.
156
+
- Fixed an issue where different image augmentations did not get fresh random seeds.
157
+
150
158
## Citation
151
159
152
160
```
153
-
@misc{octo_2023,
161
+
@inproceedings{octo_2023,
154
162
title={Octo: An Open-Source Generalist Robot Policy},
155
-
author = {{Octo Model Team} and Dibya Ghosh and Homer Walke and Karl Pertsch and Kevin Black and Oier Mees and Sudeep Dasari and Joey Hejna and Charles Xu and Jianlan Luo and Tobias Kreiman and {You Liang} Tan and Dorsa Sadigh and Chelsea Finn and Sergey Levine},
163
+
author = {{Octo Model Team} and Dibya Ghosh and Homer Walke and Karl Pertsch and Kevin Black and Oier Mees and Sudeep Dasari and Joey Hejna and Charles Xu and Jianlan Luo and Tobias Kreiman and {You Liang} Tan and Pannag Sanketi and Quan Vuong and Ted Xiao and Dorsa Sadigh and Chelsea Finn and Sergey Levine},
164
+
booktitle = {Proceedings of Robotics: Science and Systems},