PyTorch implementation of various Knowledge Distillation (KD) methods.
This repository is a simple reference that mainly focuses on basic knowledge distillation/transfer methods. Many tricks and variations, such as step-by-step training, iterative training, ensembles of teachers, ensembles of KD methods, data-free distillation, self-distillation, and online distillation, are therefore not considered. I hope it is useful for your project or research.
I will update this repo regularly with new KD methods. If there are basic methods I have missed, please contact me.
Name | Method | Paper Link | Code Link |
---|---|---|---|
Baseline | basic model with softmax loss | — | code |
Logits | mimic learning via regressing logits | paper | code |
ST | soft target | paper | code |
AT | attention transfer | paper | code |
Fitnet | hints for thin deep nets | paper | code |
NST | neural selective transfer | paper | code |
PKT | probabilistic knowledge transfer | paper | code |
FSP | flow of solution procedure | paper | code |
FT | factor transfer | paper | code |
RKD | relational knowledge distillation | paper | code |
AB | activation boundary | paper | code |
SP | similarity preservation | paper | code |
Sobolev | sobolev/jacobian matching | paper | code |
BSS | boundary supporting samples | paper | code |
CC | correlation congruence | paper | code |
LwM | learning without memorizing | paper | code |
IRG | instance relationship graph | paper | code |
VID | variational information distillation | paper | code |
OFD | overhaul of feature distillation | paper | code |
AFD | attention feature distillation | paper | code |
CRD | contrastive representation distillation | paper | code |
DML | deep mutual learning | paper | code |
Implementation notes:

- AT: I use the sum of absolute values with power p=2 as the attention (see the sketch after this list).
- Fitnet: The training procedure is a single stage, without a hint layer.
- NST: I employ a polynomial kernel with d=2 and c=0 (also sketched below).
- AB: Two-stage training; the first 50 epochs are used for initialization, and the second stage employs only CE without ST.
- BSS: The first 80% of epochs employ the CE+BSS loss, and the remaining 20% use only CE. In addition, there is a warm-up for the first 10 epochs.
- CC: For consistency, I only consider CC without instance congruence. A Gaussian RBF kernel is employed because the bilinear pool kernel is similar to PKT. I choose a P=2 order Taylor expansion of the Gaussian RBF kernel. No special sampling strategy is used.
- LwM: I employ it after rb2 (the middle conv layer) rather than rb3 (the last conv layer), because the base net is a ResNet ending in GAP followed by a classifier; if applied after rb3, the Grad-CAM would have the same value across H and W in each channel.
- IRG: I only use the one-to-one mode.
- VID: I set the hidden channel size to be the same as the output channel size and remove BN in μ.
- AFD: I find the original implementation of attention unstable, so I replace it with an SE block.
- DML: Only two nets are employed, with synchronous updates to avoid multiple forward passes (also sketched below). The networks are the same as in Table 6 of the paper.
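For concreteness, here is a minimal sketch of the AT attention noted above: the channel-wise sum of absolute values with power p=2, followed by the usual L2 normalization over the flattened spatial map. The function names are illustrative, and the normalization step is the standard AT recipe rather than a quote from `kd_losses`.

```python
import torch
import torch.nn.functional as F

def attention_map(fm, p=2):
    # Channel-wise sum of |activation|^p over a (N, C, H, W) feature map,
    # flattened and L2-normalized per sample, as in standard AT.
    am = fm.abs().pow(p).sum(dim=1).view(fm.size(0), -1)  # (N, H*W)
    return F.normalize(am, dim=1)

def at_loss(fm_s, fm_t, p=2):
    # Mean squared distance between student and teacher attention maps.
    return (attention_map(fm_s, p) - attention_map(fm_t, p)).pow(2).mean()
```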
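Likewise, a sketch of the NST loss with the polynomial kernel (d=2, c=0) mentioned above, written as the squared MMD between per-channel activation patterns. Treating each channel's flattened, L2-normalized spatial map as one sample is the standard NST formulation; the helper names are mine.

```python
import torch
import torch.nn.functional as F

def poly_kernel(a, b, d=2, c=0.0):
    # Pairwise polynomial kernel (x . y + c)^d between two sets of vectors,
    # a: (N, C1, M), b: (N, C2, M) -> (N, C1, C2).
    return (torch.bmm(a, b.transpose(1, 2)) + c).pow(d)

def nst_loss(fm_s, fm_t):
    # Squared MMD between the L2-normalized per-channel activation patterns
    # of student and teacher feature maps (both (N, C, H, W), equal H*W).
    fm_s = F.normalize(fm_s.view(fm_s.size(0), fm_s.size(1), -1), dim=2)
    fm_t = F.normalize(fm_t.view(fm_t.size(0), fm_t.size(1), -1), dim=2)
    return (poly_kernel(fm_s, fm_s).mean()
            + poly_kernel(fm_t, fm_t).mean()
            - 2 * poly_kernel(fm_s, fm_t).mean())
```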
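And a sketch of one synchronous DML step for the two-net case: both nets are forwarded once, and each receives CE on the labels plus a KL term toward the other's softmax output. Detaching the peer's output is my assumption, not necessarily what `train_dml.py` does.

```python
import torch.nn.functional as F

def dml_losses(logits_1, logits_2, targets):
    # Each net mimics the other's (detached) softmax in addition to the CE
    # term; one forward per net per step gives the synchronous update.
    kl_12 = F.kl_div(F.log_softmax(logits_1, dim=1),
                     F.softmax(logits_2, dim=1).detach(),
                     reduction='batchmean')
    kl_21 = F.kl_div(F.log_softmax(logits_2, dim=1),
                     F.softmax(logits_1, dim=1).detach(),
                     reduction='batchmean')
    loss_1 = F.cross_entropy(logits_1, targets) + kl_12
    loss_2 = F.cross_entropy(logits_2, targets) + kl_21
    return loss_1, loss_2
```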
Create a `./dataset` directory and download CIFAR10/CIFAR100 into it. Then use `example_train_script.sh` to train the various KD methods; you can simply specify the hyper-parameters listed in `train_xxx.py` or change them manually.

Note that `--lambda_kd` and the other hyper-parameters are not chosen carefully, so the following results do not reflect which method is better than the others.
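As an illustration of how that weight enters the objective, here is a minimal sketch of a soft-target (ST) style loss combined with CE via `--lambda_kd`; the temperature T and the function name are hypothetical, not taken from `train_kd.py`.

```python
import torch.nn.functional as F

def st_objective(logits_s, logits_t, targets, lambda_kd, T=4.0):
    # CE on hard labels plus the temperature-scaled soft-target KD term,
    # weighted by --lambda_kd. The T*T factor restores the gradient scale,
    # as in the original soft-target formulation.
    ce = F.cross_entropy(logits_s, targets)
    kd = F.kl_div(F.log_softmax(logits_s / T, dim=1),
                  F.softmax(logits_t / T, dim=1),
                  reduction='batchmean') * T * T
    return ce + lambda_kd * kd
```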
Teacher | Student | Name | CIFAR10 | CIFAR100 |
---|---|---|---|---|
- | resnet-20 | Baseline | 92.37% | 68.92% |
resnet-20 | resnet-20 | Logits | 93.30% | 70.36% |
resnet-20 | resnet-20 | ST | 93.12% | 70.27% |
resnet-20 | resnet-20 | AT | 92.89% | 69.70% |
resnet-20 | resnet-20 | Fitnet | 92.73% | 70.08% |
resnet-20 | resnet-20 | NST | 92.79% | 69.21% |
resnet-20 | resnet-20 | PKT | 92.50% | 69.25% |
resnet-20 | resnet-20 | FSP | 92.76% | 69.61% |
resnet-20 | resnet-20 | FT | 92.98% | 69.90% |
resnet-20 | resnet-20 | RKD | 92.72% | 69.48% |
resnet-20 | resnet-20 | AB | 93.04% | 69.96% |
resnet-20 | resnet-20 | SP | 92.88% | 69.85% |
resnet-20 | resnet-20 | Sobolev | 92.78% | 69.39% |
resnet-20 | resnet-20 | BSS | 92.58% | 69.96% |
resnet-20 | resnet-20 | CC | 93.01% | 69.27% |
resnet-20 | resnet-20 | LwM | 92.80% | 69.23% |
resnet-20 | resnet-20 | IRG | 92.77% | 70.37% |
resnet-20 | resnet-20 | VID | 92.61% | 69.39% |
resnet-20 | resnet-20 | OFD | 92.82% | 69.93% |
resnet-20 | resnet-20 | AFD | 92.56% | 69.63% |
resnet-20 | resnet-20 | CRD | 92.96% | 70.33% |
Teacher | Student | Name | CIFAR10 | CIFAR100 |
---|---|---|---|---|
- | resnet-20 | Baseline | 92.37% | 68.92% |
- | resnet-110 | Baseline | 93.86% | 73.15% |
resnet-110 | resnet-20 | Logits | 92.98% | 69.78% |
resnet-110 | resnet-20 | ST | 92.82% | 70.06% |
resnet-110 | resnet-20 | AT | 93.21% | 69.28% |
resnet-110 | resnet-20 | Fitnet | 93.04% | 69.81% |
resnet-110 | resnet-20 | NST | 92.83% | 69.31% |
resnet-110 | resnet-20 | PKT | 93.01% | 69.31% |
resnet-110 | resnet-20 | FSP | 92.78% | 69.78% |
resnet-110 | resnet-20 | FT | 93.01% | 69.49% |
resnet-110 | resnet-20 | RKD | 93.21% | 69.36% |
resnet-110 | resnet-20 | AB | 92.96% | 69.41% |
resnet-110 | resnet-20 | SP | 93.30% | 69.45% |
resnet-110 | resnet-20 | Sobolev | 92.60% | 69.23% |
resnet-110 | resnet-20 | BSS | 92.78% | 69.71% |
resnet-110 | resnet-20 | CC | 92.98% | 69.33% |
resnet-110 | resnet-20 | LwM | 92.52% | 69.11% |
resnet-110 | resnet-20 | IRG | 93.13% | 69.36% |
resnet-110 | resnet-20 | VID | 92.98% | 69.49% |
resnet-110 | resnet-20 | OFD | 93.13% | 69.81% |
resnet-110 | resnet-20 | AFD | 92.92% | 69.60% |
resnet-110 | resnet-20 | CRD | 92.92% | 70.80% |
Teacher | Student | Name | CIFAR10 | CIFAR100 |
---|---|---|---|---|
- | resnet-110 | Baseline | 93.86% | 73.15% |
resnet-110 | resnet-110 | Logits | 94.38% | 74.89% |
resnet-110 | resnet-110 | ST | 94.59% | 74.33% |
resnet-110 | resnet-110 | AT | 94.42% | 74.64% |
resnet-110 | resnet-110 | Fitnet | 94.43% | 73.63% |
resnet-110 | resnet-110 | NST | 94.43% | 73.55% |
resnet-110 | resnet-110 | PKT | 94.35% | 73.74% |
resnet-110 | resnet-110 | FSP | 94.39% | 73.59% |
resnet-110 | resnet-110 | FT | 94.30% | 74.72% |
resnet-110 | resnet-110 | RKD | 94.39% | 73.78% |
resnet-110 | resnet-110 | AB | 94.63% | 73.91% |
resnet-110 | resnet-110 | SP | 94.45% | 74.07% |
resnet-110 | resnet-110 | Sobolev | 94.26% | 73.14% |
resnet-110 | resnet-110 | BSS | 94.19% | 73.87% |
resnet-110 | resnet-110 | CC | 94.49% | 74.43% |
resnet-110 | resnet-110 | LwM | 94.19% | 73.28% |
resnet-110 | resnet-110 | IRG | 94.44% | 74.96% |
resnet-110 | resnet-110 | VID | 94.25% | 73.63% |
resnet-110 | resnet-110 | OFD | 94.38% | 74.11% |
resnet-110 | resnet-110 | AFD | 94.44% | 73.90% |
resnet-110 | resnet-110 | CRD | 94.30% | 75.44% |
Net1 | Net2 | Name | CIFAR10 | CIFAR100 |
---|---|---|---|---|
- | resnet-20 | Baseline | 92.37% | 68.92% |
- | resnet-110 | Baseline | 93.86% | 73.15% |
resnet-20 | resnet-20 | DML | 93.07%/93.37% | 70.39%/70.22% |
resnet-110 | resnet-20 | DML | 94.45%/92.92% | 74.53%/70.29% |
resnet-110 | resnet-110 | DML | 94.74%/94.79% | 74.72%/75.55% |

Each DML cell reports the accuracies as Net1/Net2.
This repo is partly based on the following repos; many thanks to their authors.
If you employ the listed KD methods in your research, please cite the corresponding papers.