Segmenter: Transformer for Semantic Segmentation
by Robin Strudel*, Ricardo Garcia*, Ivan Laptev and Cordelia Schmid, ICCV 2021.
*Equal Contribution
🔥 Segmenter is now available on MMSegmentation.
Define OS environment variables pointing to your checkpoint and dataset directories, and put them in your .bashrc:
export DATASET=/path/to/dataset/dir
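A script that depends on this variable can check for it up front and fail with a clear message. This is a minimal sketch that assumes only that DATASET is the variable exported above; the helper name `dataset_root` is hypothetical and not part of the segm package.

```python
import os
from pathlib import Path


def dataset_root(env=os.environ):
    """Return the dataset directory named by $DATASET (hypothetical helper)."""
    value = env.get("DATASET")
    if not value:
        raise RuntimeError("DATASET is not set; export it in your .bashrc first")
    return Path(value).expanduser()


if __name__ == "__main__":
    # A plain dict stands in for the real environment in this example.
    print(dataset_root({"DATASET": "/path/to/dataset/dir"}))
```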
Install PyTorch 1.9, then run pip install . at the root of this repository.
To download ADE20K, use the following command:
python -m segm.scripts.prepare_ade20k $DATASET
We release models with a Vision Transformer backbone initialized from the improved ViT models.
Segmenter models with ViT backbone:
Name | mIoU (SS/MS) | # params | Resolution | FPS | Download | ||
---|---|---|---|---|---|---|---|
Seg-T-Mask/16 | 38.1 / 38.8 | 7M | 512x512 | 52.4 | model | config | log |
Seg-S-Mask/16 | 45.3 / 46.9 | 27M | 512x512 | 34.8 | model | config | log |
Seg-B-Mask/16 | 48.5 / 50.0 | 106M | 512x512 | 24.1 | model | config | log |
Seg-B/8 | 49.5 / 50.5 | 89M | 512x512 | 4.2 | model | config | log |
Seg-L-Mask/16 | 51.8 / 53.6 | 334M | 640x640 | - | model | config | log |
Segmenter models with DeiT backbone:
Name | mIoU (SS/MS) | # params | Resolution | FPS | Download | ||
---|---|---|---|---|---|---|---|
Seg-B†/16 | 47.1 / 48.1 | 87M | 512x512 | 27.3 | model | config | log |
Seg-B†-Mask/16 | 48.7 / 50.1 | 106M | 512x512 | 24.1 | model | config | log |
Segmenter models on Pascal Context:
Name | mIoU (SS/MS) | # params | Resolution | FPS | Download | ||
---|---|---|---|---|---|---|---|
Seg-L-Mask/16 | 58.1 / 59.0 | 334M | 480x480 | - | model | config | log |
Segmenter models on Cityscapes:
Name | mIoU (SS/MS) | # params | Resolution | FPS | Download | ||
---|---|---|---|---|---|---|---|
Seg-L-Mask/16 | 79.1 / 81.3 | 322M | 768x768 | - | model | config | log |
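The /16 and /8 suffixes in the model names denote the ViT patch size, which determines how many patch tokens the transformer processes at a given resolution. The following rough sketch works through that arithmetic; the helper name `num_patch_tokens` is illustrative and not part of the repository.

```python
def num_patch_tokens(height, width, patch_size):
    """Number of non-overlapping patch tokens for an image of the given size."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)


# Resolutions and patch sizes taken from the tables above.
for name, size, patch in [
    ("Seg-B-Mask/16", 512, 16),
    ("Seg-B/8", 512, 8),
    ("Seg-L-Mask/16", 640, 16),
]:
    print(name, num_patch_tokens(size, size, patch))
```

At 512x512, a patch size of 8 yields four times as many tokens as a patch size of 16 (4096 versus 1024), which is consistent with the much lower FPS reported for Seg-B/8.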
Download one checkpoint with its configuration into a common folder, for example seg_tiny_mask.
You can generate segmentation maps from your own data with:
python -m segm.inference --model-path seg_tiny_mask/checkpoint.pth -i images/ -o segmaps/
To evaluate on ADE20K, run the command:
# single-scale evaluation:
python -m segm.eval.miou seg_tiny_mask/checkpoint.pth ade20k --singlescale
# multi-scale evaluation:
python -m segm.eval.miou seg_tiny_mask/checkpoint.pth ade20k --multiscale
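For reference, mean IoU averages the per-class intersection-over-union between predicted and ground-truth label maps. The following is a minimal pure-Python sketch of the metric itself, not the evaluation code in segm.eval.miou; the function name, the flat-list inputs and the `ignore_index` convention are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    intersection = [0] * num_classes
    pred_area = [0] * num_classes
    target_area = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # pixels without a valid label do not count
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_area[c] + target_area[c] - intersection[c]
        if union > 0:  # skip classes absent from both maps
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened label maps; the last pixel is marked as ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(mean_iou(pred, target, num_classes=3))
```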
Train Seg-T-Mask/16 on ADE20K on a single GPU:
python -m segm.train --log-dir seg_tiny_mask --dataset ade20k \
--backbone vit_tiny_patch16_384 --decoder mask_transformer
To train Seg-B-Mask/16, simply set vit_base_patch16_384 as the backbone and launch the above command using a minimum of 4 V100 GPUs (~12 minutes per epoch) and up to 8 V100 GPUs (~7 minutes per epoch). The code uses SLURM environment variables.
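As an illustration of what relying on SLURM environment variables typically involves, the sketch below reads the standard variables SLURM_PROCID, SLURM_NTASKS and SLURM_LOCALID to derive a process rank, world size and local GPU index, falling back to a single-process setup when they are absent. This is an assumption about the general pattern, not a copy of the repository's distributed setup code, and `slurm_dist_info` is a hypothetical name.

```python
import os


def slurm_dist_info(env=os.environ):
    """Derive (rank, world_size, local_rank) from standard SLURM variables."""
    rank = int(env.get("SLURM_PROCID", 0))        # global index of this task
    world_size = int(env.get("SLURM_NTASKS", 1))  # total number of tasks
    local_rank = int(env.get("SLURM_LOCALID", 0))  # task index on this node
    return rank, world_size, local_rank


# Simulated environment for task 5 of an 8-task job.
print(slurm_dist_info({"SLURM_PROCID": "5", "SLURM_NTASKS": "8", "SLURM_LOCALID": "1"}))
```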
To plot the logs of your experiments, you can use:
python -m segm.utils.logs logs.yml
with logs.yml located in utils/ and containing the paths to your experiment logs:
root: /path/to/checkpoints/
logs:
  seg-t: seg_tiny_mask/log.txt
  seg-b: seg_base_mask/log.txt
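One way to read this file is that each entry under logs pairs a curve label with a log path relative to root. The sketch below illustrates that interpretation with the configuration written as the dict a YAML loader would return. It is an assumption about how the paths are combined, not the code in segm.utils.logs, and `resolve_log_paths` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Same structure as logs.yml, written as a plain Python dict.
config = {
    "root": "/path/to/checkpoints/",
    "logs": {
        "seg-t": "seg_tiny_mask/log.txt",
        "seg-b": "seg_base_mask/log.txt",
    },
}


def resolve_log_paths(config):
    """Map each experiment label to root joined with its relative log path."""
    root = PurePosixPath(config["root"])
    return {label: root / rel for label, rel in config["logs"].items()}


for label, path in resolve_log_paths(config).items():
    print(f"{label}: {path}")
```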
To visualize the attention maps for Seg-T-Mask/16 encoder layer 0 and patch (0, 21), you can use:
python -m segm.scripts.show_attn_map seg_tiny_mask/checkpoint.pth \
images/im0.jpg output_dir/ --layer-id 0 --x-patch 0 --y-patch 21 --enc
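To make the patch coordinates concrete, the sketch below maps a patch position to its index in the token sequence of a ViT-style encoder. It assumes x indexes columns, y indexes rows, patch tokens are laid out row by row, and one CLS token precedes them. The repository's script may use a different convention, and `patch_token_index` and the 512-pixel image width are illustrative choices.

```python
def patch_token_index(x_patch, y_patch, image_width, patch_size, num_extra_tokens=1):
    """Index of patch (x_patch, y_patch) in a row-major token sequence.

    Assumes x indexes columns, y indexes rows, and that num_extra_tokens
    special tokens (such as CLS) come before the patch tokens.
    """
    patches_per_row = image_width // patch_size
    if not 0 <= x_patch < patches_per_row:
        raise ValueError("x_patch is outside the patch grid")
    return num_extra_tokens + y_patch * patches_per_row + x_patch


# Patch (0, 21) in a 512-pixel-wide image with 16x16 patches (32 patches per row).
print(patch_token_index(0, 21, image_width=512, patch_size=16))
```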
Different options are provided to select the generated attention maps:
- --enc or --dec: select encoder or decoder attention maps, respectively.
- --patch or --cls: --patch generates attention maps for the patch with coordinates (x_patch, y_patch). --cls combined with --enc generates attention maps for the CLS token of the encoder. --cls combined with --dec generates maps for each class embedding of the decoder.
- --x-patch and --y-patch: coordinates of the patch to draw attention maps from. These flags are ignored when --cls is used.
- --layer-id: select the layer for which the attention maps are generated.

For example, to generate attention maps for the decoder class embeddings, you can use:
python -m segm.scripts.show_attn_map seg_tiny_mask/checkpoint.pth \
images/im0.jpg output_dir/ --layer-id 0 --dec --cls
Attention maps for patch (0, 21) in Seg-L-Mask/16 encoder layers 1, 4, 8, 12 and 16:
Attention maps for the class embeddings in Seg-L-Mask/16 decoder layer 0:
Zero-shot video segmentation on the DAVIS video dataset with the Seg-B-Mask/16 model trained on ADE20K.
@article{strudel2021,
title={Segmenter: Transformer for Semantic Segmentation},
author={Strudel, Robin and Garcia, Ricardo and Laptev, Ivan and Schmid, Cordelia},
journal={arXiv preprint arXiv:2105.05633},
year={2021}
}
The Vision Transformer code is based on the timm library, and the semantic segmentation training and evaluation pipeline uses mmsegmentation.