Are you sure you want to delete this task? Once this task is deleted, it cannot be recovered.
hkc 7c3194a8a5 | 1 year ago | |
---|---|---|
.dev_scripts | 2 years ago | |
configs | 1 year ago | |
demo | 2 years ago | |
docker | 2 years ago | |
docs | 2 years ago | |
docs_zh-CN | 2 years ago | |
logs | 2 years ago | |
mmcv_custom | 2 years ago | |
mmdet | 1 year ago | |
requirements | 2 years ago | |
resources | 2 years ago | |
tests | 2 years ago | |
tools | 2 years ago | |
.gitignore | 2 years ago | |
.pre-commit-config.yaml | 2 years ago | |
.readthedocs.yml | 2 years ago | |
CITATION.cff | 2 years ago | |
LICENSE | 2 years ago | |
MANIFEST.in | 2 years ago | |
README.md | 2 years ago | |
model-index.yml | 2 years ago | |
pytest.ini | 2 years ago | |
requirements.txt | 2 years ago | |
setup.cfg | 2 years ago | |
setup.py | 2 years ago |
Results | Updates | Usage | Todo | Acknowledge
This branch contains the unofficial pytorch implementation of Exploring Plain Vision Transformer Backbones for Object Detection. Thanks for their wonderful work!
The models are trained on 4 A100 machines with 2 images per gpu, which makes a batch size of 64 during training.
Model | Pretrain | Machine | FrameWork | Box mAP | Mask mAP | config | log | weight |
---|---|---|---|---|---|---|---|---|
ViT-Base | IN1K+MAE | TPU | Mask RCNN | 51.1 | 45.5 | config | log | OneDrive |
ViT-Base | IN1K+MAE | GPU | Mask RCNN | 51.1 | 45.4 | config | log | OneDrive |
ViTAE-Base | IN1K+MAE | GPU | Mask RCNN | 51.6 | 45.8 | config | log | OneDrive |
ViTAE-Small | IN1K+Sup | GPU | Mask RCNN | 45.6 | 40.1 | config | log | OneDrive |
[2022-04-18] Explore using small 1K supervised trained models (20M parameters) for ViTDet (45.6 mAP). The results with multi-stage structure is 46.0 mAP for Swin-T and 47.8 mAP for ViTAEv2-S with Mask RCNN on COCO.
[2022-04-17] Release the pretrained weights and logs for ViT-B and ViTAE-B on MS COCO. The models are totally trained with PyTorch on GPU.
[2022-04-16] Release the initial unofficial implementation of ViTDet with ViT-Base model! It obtains 51.1 mAP and 45.5 mAP on detection and segmentation, respectively. The weights and logs will be uploaded soon.
Applications of ViTAE Transformer include: image classification | object detection | semantic segmentation | animal pose segmentation | remote sensing | matting
We use PyTorch 1.9.0 or NGC docker 21.06, and mmcv 1.3.9 for the experiments.
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout v1.3.9
MMCV_WITH_OPS=1 pip install -e .
cd ..
git clone https://github.com/ViTAE-Transformer/ViTDet.git
cd ViTDet
pip install -v -e .
After install the two repos, install timm and einops, i.e.,
pip install timm==0.4.9 einops
Download the pretrained models from MAE or ViTAE, and then conduct the experiments by
# for single machine
bash tools/dist_train.sh <Config PATH> <NUM GPUs> --cfg-options model.pretrained=<Pretrained PATH>
# for multiple machines
python -m torch.distributed.launch --nnodes <Num Machines> --node_rank <Rank of Machine> --nproc_per_node <GPUs Per Machine> --master_addr <Master Addr> --master_port <Master Port> tools/train.py <Config PATH> --cfg-options model.pretrained=<Pretrained PATH> --launcher pytorch
This repo current contains modifications including:
There are other things to do:
We acknowledge the excellent implementation from mmdetection, MAE, MViT, and BeiT.
@article{Li2022ExploringPV,
title={Exploring Plain Vision Transformer Backbones for Object Detection},
author={Yanghao Li and Hanzi Mao and Ross B. Girshick and Kaiming He},
journal={ArXiv},
year={2022},
volume={abs/2203.16527}
}
For ViTAE and ViTAEv2, please refer to:
@article{xu2021vitae,
title={Vitae: Vision transformer advanced by exploring intrinsic inductive bias},
author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
journal={Advances in Neural Information Processing Systems},
volume={34},
year={2021}
}
@article{zhang2022vitaev2,
title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
journal={arXiv preprint arXiv:2202.10108},
year={2022}
}
No Description
Python Markdown Shell other
Dear OpenI User
Thank you for your continuous support to the Openl Qizhi Community AI Collaboration Platform. In order to protect your usage rights and ensure network security, we updated the Openl Qizhi Community AI Collaboration Platform Usage Agreement in January 2024. The updated agreement specifies that users are prohibited from using intranet penetration tools. After you click "Agree and continue", you can continue to use our services. Thank you for your cooperation and understanding.
For more agreement content, please refer to the《Openl Qizhi Community AI Collaboration Platform Usage Agreement》