[2022/5/28] Code for DN-DETR is available here!
[2022/5/22] We release a notebook for visualization in inference_and_visualize.ipynb.
[2022/4/14] We release the .pptx file of our DETR-like models comparison figure for those who want to draw model architecture figures in papers.
[2022/4/12] We fix a bug in the file datasets/coco_eval.py. The parameter useCats of CocoEvaluator should be True by default.
[2022/4/9] Our code is available!
[2022/3/9] We build a repo Awesome Detection Transformer to present papers about transformers for detection and segmentation. Welcome to your attention!
[2022/3/8] Our new work DINO set a new record of 63.3 AP on the MS-COCO leaderboard.
[2022/3/8] Our new work DN-DETR has been accepted by CVPR 2022!
[2022/1/21] Our work has been accepted to ICLR 2022.
Abstract
We present in this paper a novel query formulation using dynamic anchor boxes for DETR (DEtection TRansformer) and offer a deeper understanding of the role of queries in DETR. This new formulation directly uses box coordinates as queries in Transformer decoders and dynamically updates them layer by layer. Using box coordinates not only helps leverage explicit positional priors to improve the query-to-feature similarity and eliminate the slow training convergence issue in DETR, but also allows us to modulate the positional attention map using the box width and height information. Such a design makes it clear that queries in DETR can be implemented as performing soft ROI pooling layer by layer in a cascade manner. As a result, it leads to the best performance on the MS-COCO benchmark among DETR-like detection models under the same setting, e.g., AP 45.7% using a ResNet50-DC5 backbone trained for 50 epochs. We also conducted extensive experiments to confirm our analysis and verify the effectiveness of our methods.
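The sketch below is a minimal conceptual illustration (not the repository's actual modules) of the idea described above: a 4-D anchor box (cx, cy, w, h) serves directly as the decoder query's positional part and is refined layer by layer. Names such as sine_embed and AnchorQueryDecoderLayer are invented for this sketch, and the width/height modulation of the attention map is omitted for brevity.

```python
# Conceptual sketch only: anchor boxes as decoder queries, updated layer by layer.
import torch
import torch.nn as nn

def sine_embed(coord, dim=128):
    # Sinusoidal encoding of a normalized scalar coordinate into `dim` features.
    freqs = 10000 ** (torch.arange(dim // 2, dtype=torch.float32) / (dim // 2))
    scaled = coord.unsqueeze(-1) / freqs
    return torch.cat([scaled.sin(), scaled.cos()], dim=-1)

class AnchorQueryDecoderLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.box_delta = nn.Linear(d_model, 4)  # predicts an update for (cx, cy, w, h)

    def forward(self, content, memory, boxes):
        # The positional query is derived directly from the anchor center (cx, cy),
        # giving each query an explicit spatial prior over the encoder features.
        pos_query = torch.cat([sine_embed(boxes[..., 0]), sine_embed(boxes[..., 1])], dim=-1)
        out, _ = self.cross_attn(content + pos_query, memory, memory)
        # Layer-by-layer update: refine the anchor in inverse-sigmoid space.
        boxes = (torch.logit(boxes, eps=1e-5) + self.box_delta(out)).sigmoid()
        return out, boxes

# Usage: 300 anchor queries attending to flattened encoder features.
layer = AnchorQueryDecoderLayer()
content = torch.zeros(2, 300, 256)   # decoder content embeddings
memory = torch.randn(2, 1024, 256)   # encoder output (B, HW, C)
boxes = torch.rand(2, 300, 4)        # normalized anchor boxes
content, boxes = layer(content, memory, boxes)
```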
Model
Model Zoo
We provide our models with R50 backbone, including both DAB-DETR and DAB-Deformable-DETR (See Appendix C of our paper for more details).
1: The models with the mark (3 pat) are trained with multiple pattern embeddings (refer to Anchor DETR or our paper for more details).
2: The term "fixxy" means we use random initialization of anchors and do not update their parameters during training (see Appendix H of our paper for more details; a short sketch of this setting follows these notes).
3: The DAB-Deformable-DETR (Deformable Encoder Only) is a multi-scale version of our DAB-DETR. See DN-DETR for more details.
4: The result here is better than the number in our paper, as we use different loss coefficients during training. Refer to our config file for more details.
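The following is a hypothetical illustration of the "fixxy" setting from note 2: anchor boxes are randomly initialized and excluded from gradient updates, so they stay fixed throughout training. The variable names are illustrative only, not the repository's API.

```python
# Illustrative sketch of fixed, randomly initialized anchors ("fixxy").
import torch
import torch.nn as nn

num_queries = 300
anchors = nn.Embedding(num_queries, 4)   # (cx, cy, w, h) anchors stored as an embedding table
nn.init.uniform_(anchors.weight)         # random initialization in [0, 1)
anchors.weight.requires_grad_(False)     # "fixxy": never updated during training
```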
Usage
Installation
We use the great DETR project as our codebase, hence no extra dependency is needed for our DAB-DETR. For DAB-Deformable-DETR, you need to compile the deformable attention operator manually (see the build commands after the clone step below).
We test our models under python=3.7.3, pytorch=1.9.0, cuda=11.1. Other versions might work as well.
Clone this repo
git clone https://github.com/IDEA-opensource/DAB-DETR.git
cd DAB-DETR
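If you plan to run DAB-Deformable-DETR, build the deformable attention operator after cloning. The path below follows the layout commonly used in Deformable-DETR-based repos and is an assumption; check the repository for the exact location of the ops directory.

cd models/dab_deformable_detr/ops
python setup.py build install
cd ../../..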
The final AP should be similar to ours (42.2 for DAB-DETR and 48.7 for DAB-Deformable-DETR). Our configs and logs (see the model zoo) can be used as references as well.
Notes:
The results are sensitive to the batch size. We use 16 (2 images per GPU x 8 GPUs) by default.
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection. Hao Zhang*, Feng Li*, Shilong Liu*, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum. arXiv 2022. [paper][code]
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising. Feng Li*, Hao Zhang*, Shilong Liu, Jian Guo, Lionel M. Ni, Lei Zhang. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022. [paper][code]
License
DAB-DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.
Copyright (c) IDEA. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Citation
@inproceedings{
liu2022dabdetr,
title={{DAB}-{DETR}: Dynamic Anchor Boxes are Better Queries for {DETR}},
author={Shilong Liu and Feng Li and Hao Zhang and Xiao Yang and Xianbiao Qi and Hang Su and Jun Zhu and Lei Zhang},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=oMI9PjOb9Jl}
}