This file documents a large collection of baselines trained
with detectron2 in Sep-Oct, 2019.
All numbers were obtained on Big Basin
servers with 8 NVIDIA V100 GPUs & NVLink. The software in use were PyTorch 1.3, CUDA 9.2, cuDNN 7.4.2 or 7.6.3.
You can access these models from code using detectron2.model_zoo APIs.
In addition to these official baseline models, you can find more models in projects/.
tools/train_net.py
with this config filemetrics
file.tools/train_net.py --eval-only
, or inference_on_dataset(),metrics
for each model.All COCO models were trained on train2017
and evaluated on val2017
.
The default settings are not directly comparable with Detectron's standard settings.
For example, our default training data augmentation uses scale jittering in addition to horizontal flipping.
To make fair comparisons with Detectron's settings, see
Detectron1-Comparisons for accuracy comparison,
and benchmarks
for speed comparison.
For Faster/Mask R-CNN, we provide baselines based on 3 different backbone combinations:
Most models are trained with the 3x schedule (~37 COCO epochs).
Although 1x models are heavily under-trained, we provide some ResNet-50 models with the 1x (~12 COCO epochs)
training schedule for comparison when doing quick research iteration.
We provide backbone models pretrained on ImageNet-1k dataset.
These models have different format from those provided in Detectron: we do not fuse BatchNorm into an affine layer.
Pretrained models in Detectron's format can still be used. For example:
Torchvision's ResNet models can be used after converted by this script.
All models available for download through this document are licensed under the
Creative Commons Attribution-ShareAlike 3.0 license.
Name | lr sched |
train time (s/iter) |
inference time (s/im) |
train mem (GB) |
box AP |
model id | download |
---|---|---|---|---|---|---|---|
R50-C4 | 1x | 0.551 | 0.102 | 4.8 | 35.7 | 137257644 | model | metrics |
R50-DC5 | 1x | 0.380 | 0.068 | 5.0 | 37.3 | 137847829 | model | metrics |
R50-FPN | 1x | 0.210 | 0.038 | 3.0 | 37.9 | 137257794 | model | metrics |
R50-C4 | 3x | 0.543 | 0.104 | 4.8 | 38.4 | 137849393 | model | metrics |
R50-DC5 | 3x | 0.378 | 0.070 | 5.0 | 39.0 | 137849425 | model | metrics |
R50-FPN | 3x | 0.209 | 0.038 | 3.0 | 40.2 | 137849458 | model | metrics |
R101-C4 | 3x | 0.619 | 0.139 | 5.9 | 41.1 | 138204752 | model | metrics |
R101-DC5 | 3x | 0.452 | 0.086 | 6.1 | 40.6 | 138204841 | model | metrics |
R101-FPN | 3x | 0.286 | 0.051 | 4.1 | 42.0 | 137851257 | model | metrics |
X101-FPN | 3x | 0.638 | 0.098 | 6.7 | 43.0 | 139173657 | model | metrics |
Name | lr sched |
train time (s/iter) |
inference time (s/im) |
train mem (GB) |
box AP |
model id | download |
---|---|---|---|---|---|---|---|
R50 | 1x | 0.205 | 0.056 | 4.1 | 37.4 | 190397773 | model | metrics |
R50 | 3x | 0.205 | 0.056 | 4.1 | 38.7 | 190397829 | model | metrics |
R101 | 3x | 0.291 | 0.069 | 5.2 | 40.4 | 190397697 | model | metrics |
Name | lr sched |
train time (s/iter) |
inference time (s/im) |
train mem (GB) |
box AP |
prop. AR |
model id | download |
---|---|---|---|---|---|---|---|---|
RPN R50-C4 | 1x | 0.130 | 0.034 | 1.5 | 51.6 | 137258005 | model | metrics | |
RPN R50-FPN | 1x | 0.186 | 0.032 | 2.7 | 58.0 | 137258492 | model | metrics | |
Fast R-CNN R50-FPN | 1x | 0.140 | 0.029 | 2.6 | 37.8 | 137635226 | model | metrics |
Name | lr sched |
train time (s/iter) |
inference time (s/im) |
train mem (GB) |
box AP |
mask AP |
model id | download |
---|---|---|---|---|---|---|---|---|
R50-C4 | 1x | 0.584 | 0.110 | 5.2 | 36.8 | 32.2 | 137259246 | model | metrics |
R50-DC5 | 1x | 0.471 | 0.076 | 6.5 | 38.3 | 34.2 | 137260150 | model | metrics |
R50-FPN | 1x | 0.261 | 0.043 | 3.4 | 38.6 | 35.2 | 137260431 | model | metrics |
R50-C4 | 3x | 0.575 | 0.111 | 5.2 | 39.8 | 34.4 | 137849525 | model | metrics |
R50-DC5 | 3x | 0.470 | 0.076 | 6.5 | 40.0 | 35.9 | 137849551 | model | metrics |
R50-FPN | 3x | 0.261 | 0.043 | 3.4 | 41.0 | 37.2 | 137849600 | model | metrics |
R101-C4 | 3x | 0.652 | 0.145 | 6.3 | 42.6 | 36.7 | 138363239 | model | metrics |
R101-DC5 | 3x | 0.545 | 0.092 | 7.6 | 41.9 | 37.3 | 138363294 | model | metrics |
R101-FPN | 3x | 0.340 | 0.056 | 4.6 | 42.9 | 38.6 | 138205316 | model | metrics |
X101-FPN | 3x | 0.690 | 0.103 | 7.2 | 44.3 | 39.5 | 139653917 | model | metrics |
Name | lr sched |
train time (s/iter) |
inference time (s/im) |
train mem (GB) |
box AP |
kp. AP |
model id | download |
---|---|---|---|---|---|---|---|---|
R50-FPN | 1x | 0.315 | 0.072 | 5.0 | 53.6 | 64.0 | 137261548 | model | metrics |
R50-FPN | 3x | 0.316 | 0.066 | 5.0 | 55.4 | 65.5 | 137849621 | model | metrics |
R101-FPN | 3x | 0.390 | 0.076 | 6.1 | 56.4 | 66.1 | 138363331 | model | metrics |
X101-FPN | 3x | 0.738 | 0.121 | 8.7 | 57.3 | 66.0 | 139686956 | model | metrics |
Name | lr sched |
train time (s/iter) |
inference time (s/im) |
train mem (GB) |
box AP |
mask AP |
PQ | model id | download |
---|---|---|---|---|---|---|---|---|---|
R50-FPN | 1x | 0.304 | 0.053 | 4.8 | 37.6 | 34.7 | 39.4 | 139514544 | model | metrics |
R50-FPN | 3x | 0.302 | 0.053 | 4.8 | 40.0 | 36.5 | 41.5 | 139514569 | model | metrics |
R101-FPN | 3x | 0.392 | 0.066 | 6.0 | 42.4 | 38.5 | 43.0 | 139514519 | model | metrics |
Mask R-CNN baselines on the LVIS dataset, v0.5.
These baselines are described in Table 3(c) of the LVIS paper.
NOTE: the 1x schedule here has the same amount of iterations as the COCO 1x baselines.
They are roughly 24 epochs of LVISv0.5 data.
The final results of these configs have large variance across different runs.
Name | lr sched |
train time (s/iter) |
inference time (s/im) |
train mem (GB) |
box AP |
mask AP |
model id | download |
---|---|---|---|---|---|---|---|---|
R50-FPN | 1x | 0.292 | 0.107 | 7.1 | 23.6 | 24.4 | 144219072 | model | metrics |
R101-FPN | 1x | 0.371 | 0.114 | 7.8 | 25.6 | 25.9 | 144219035 | model | metrics |
X101-FPN | 1x | 0.712 | 0.151 | 10.2 | 26.7 | 27.1 | 144219108 | model | metrics |
Simple baselines for
Name | train time (s/iter) |
inference time (s/im) |
train mem (GB) |
box AP |
box AP50 |
mask AP |
model id | download |
---|---|---|---|---|---|---|---|---|
R50-FPN, Cityscapes | 0.240 | 0.078 | 4.4 | 36.5 | 142423278 | model | metrics | ||
R50-C4, VOC | 0.537 | 0.081 | 4.8 | 51.9 | 80.3 | 142202221 | model | metrics |
Ablations for Deformable Conv and Cascade R-CNN:
Name | lr sched |
train time (s/iter) |
inference time (s/im) |
train mem (GB) |
box AP |
mask AP |
model id | download |
---|---|---|---|---|---|---|---|---|
Baseline R50-FPN | 1x | 0.261 | 0.043 | 3.4 | 38.6 | 35.2 | 137260431 | model | metrics |
Deformable Conv | 1x | 0.342 | 0.048 | 3.5 | 41.5 | 37.5 | 138602867 | model | metrics |
Cascade R-CNN | 1x | 0.317 | 0.052 | 4.0 | 42.1 | 36.4 | 138602847 | model | metrics |
Baseline R50-FPN | 3x | 0.261 | 0.043 | 3.4 | 41.0 | 37.2 | 137849600 | model | metrics |
Deformable Conv | 3x | 0.349 | 0.047 | 3.5 | 42.7 | 38.5 | 144998336 | model | metrics |
Cascade R-CNN | 3x | 0.328 | 0.053 | 4.0 | 44.3 | 38.5 | 144998488 | model | metrics |
Ablations for normalization methods, and a few models trained from scratch following Rethinking ImageNet Pre-training.
(Note: The baseline uses 2fc
head while the others use 4conv1fc
head)
Name | lr sched |
train time (s/iter) |
inference time (s/im) |
train mem (GB) |
box AP |
mask AP |
model id | download |
---|---|---|---|---|---|---|---|---|
Baseline R50-FPN | 3x | 0.261 | 0.043 | 3.4 | 41.0 | 37.2 | 137849600 | model | metrics |
GN | 3x | 0.309 | 0.060 | 5.6 | 42.6 | 38.6 | 138602888 | model | metrics |
SyncBN | 3x | 0.345 | 0.053 | 5.5 | 41.9 | 37.8 | 169527823 | model | metrics |
GN (from scratch) | 3x | 0.338 | 0.061 | 7.2 | 39.9 | 36.6 | 138602908 | model | metrics |
GN (from scratch) | 9x | N/A | 0.061 | 7.2 | 43.7 | 39.6 | 183808979 | model | metrics |
SyncBN (from scratch) | 9x | N/A | 0.055 | 7.2 | 43.6 | 39.3 | 184226666 | model | metrics |
A few very large models trained for a long time, for demo purposes. They are trained using multiple machines:
Name | inference time (s/im) |
train mem (GB) |
box AP |
mask AP |
PQ | model id | download |
---|---|---|---|---|---|---|---|
Panoptic FPN R101 | 0.098 | 11.4 | 47.4 | 41.3 | 46.1 | 139797668 | model | metrics |
Mask R-CNN X152 | 0.234 | 15.1 | 50.2 | 44.0 | 18131413 | model | metrics | |
above + test-time aug. | 51.9 | 45.9 |
Dear OpenI User
Thank you for your continuous support to the Openl Qizhi Community AI Collaboration Platform. In order to protect your usage rights and ensure network security, we updated the Openl Qizhi Community AI Collaboration Platform Usage Agreement in January 2024. The updated agreement specifies that users are prohibited from using intranet penetration tools. After you click "Agree and continue", you can continue to use our services. Thank you for your cooperation and understanding.
For more agreement content, please refer to the《Openl Qizhi Community AI Collaboration Platform Usage Agreement》