ENGLISH | 简体中文
MgNet is a unified model that simultaneously recovers certain convolutional neural networks (CNNs) for image classification and multigrid (MG) methods for solving discretized partial differential equations (PDEs). Here is a diagram of its architecture.

Paper: He J, Xu J. MgNet: A unified framework of multigrid and convolutional neural network. Science China Mathematics, 2019, 62: 1331-1354.
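At the heart of MgNet is a multigrid-style smoothing iteration, roughly u ← u + B·σ(f − A·u), where A maps the feature u to the data space, B acts as a learned smoother, and σ is an activation such as ReLU. Below is a minimal NumPy sketch of this iteration; plain matrices stand in for the network's trainable convolutions, and all names and sizes are illustrative, not taken from the actual implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mgnet_smoothing(u, f, A, B, num_ite):
    """One MgNet level: iterate u <- u + B @ relu(f - A @ u).

    A and B are plain matrices standing in for the (trainable)
    convolutions of the actual network; num_ite plays the role
    of the per-level iteration count (cf. the num_ite flag).
    """
    for _ in range(num_ite):
        u = u + B @ relu(f - A @ u)
    return u

# toy example on vectors (hypothetical sizes)
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) * 0.1
B = rng.standard_normal((8, 8)) * 0.1
f = rng.standard_normal(8)
u = mgnet_smoothing(np.zeros(8), f, A, B, num_ite=2)
```

In the full network this smoothing is applied on several grids, with restriction between levels; the sketch shows only the within-level update.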
The training dataset and pretrained checkpoint files will be downloaded automatically at the first launch.
Dataset used: [cifar10]/[cifar100]/[mnist]
The dataset is placed in the `./data` directory; the directory structure is as follows:

```text
├── data
│   ├── cifar-10-batches-bin
│   ├── cifar-100-binary
│   ├── t10k-images.idx3-ubyte
│   ├── t10k-labels.idx1-ubyte
│   ├── train-images.idx3-ubyte
│   └── train-labels.idx1-ubyte
```
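The `cifar-10-batches-bin` and `cifar-100-binary` directories hold the datasets in the standard CIFAR binary layout, where each CIFAR-10 record is one label byte followed by a 3x32x32 image (3073 bytes). A minimal sketch of reading such a file (the file name in the comment is an assumption; the actual loading is done by `src/process.py`):

```python
import numpy as np

def read_cifar10_bin(path):
    """Read one CIFAR-10 binary batch file.

    Each record is 1 label byte + 3072 image bytes (3x32x32, CHW).
    """
    raw = np.fromfile(path, dtype=np.uint8).reshape(-1, 3073)
    labels = raw[:, 0].astype(np.int64)
    images = raw[:, 1:].reshape(-1, 3, 32, 32)
    return images, labels

# e.g. read_cifar10_bin("./data/cifar-10-batches-bin/data_batch_1.bin")
```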
If you need to download the dataset or checkpoint files manually,
please visit this link.
After installing MindSpore via the official website and obtaining the required dataset above, you can start training and evaluation as follows.

Default:

```shell
python train.py
```
Full command:

```shell
python train.py \
    --dataset cifar100 \
    --save_ckpt true \
    --load_ckpt false \
    --save_ckpt_path ./checkpoints \
    --load_ckpt_path ./checkpoints/model_300.ckpt \
    --load_data_path ./data/cifar-100-binary \
    --log_path ./logs \
    --print_interval 10 \
    --ckpt_interval 2000 \
    --num_ite 2 2 2 2 \
    --num_channel_u 256 \
    --num_channel_f 256 \
    --wise_b true \
    --batch_size 128 \
    --epochs 300 \
    --lr 1e-1 \
    --download_data mgnet \
    --force_download false \
    --amp_level O3 \
    --device_id 0 \
    --mode 0
```
The file structure is as follows:

```text
├── mgnet
│   ├── checkpoints                  # checkpoint files
│   ├── data                         # data files
│   │   ├── cifar-10-batches-bin     # CIFAR-10 dataset directory
│   │   ├── cifar-100-binary         # CIFAR-100 dataset directory
│   │   ├── t10k-images.idx3-ubyte   # MNIST test images
│   │   ├── t10k-labels.idx1-ubyte   # MNIST test labels
│   │   ├── train-images.idx3-ubyte  # MNIST training images
│   │   └── train-labels.idx1-ubyte  # MNIST training labels
│   ├── figures                      # figures directory
│   ├── logs                         # log files
│   ├── src                          # source code
│   │   ├── network.py               # network architecture
│   │   └── process.py               # data processing
│   ├── config.yaml                  # hyper-parameter configuration
│   ├── README.md                    # English model description
│   ├── README_CN.md                 # Chinese model description
│   ├── requirements.txt             # library requirements for this model
│   ├── train.py                     # Python training script
│   └── eval.py                      # Python evaluation script
```
Important parameters in train.py are as follows:

parameter | description | default value |
---|---|---|
dataset | dataset to load; one of cifar10, cifar100, or mnist | cifar100 |
save_ckpt | whether to save checkpoints | true |
load_ckpt | whether to load a checkpoint | false |
save_ckpt_path | checkpoint saving path | ./checkpoints |
load_ckpt_path | checkpoint loading path | ./checkpoints/model_300.ckpt |
load_data_path | path to load data from | ./data/cifar-100-binary |
log_path | log saving path | ./logs |
print_interval | time and loss print interval (in steps) | 10 |
ckpt_interval | checkpoint save interval | 2000 |
num_ite | number of iterations in each of the four levels (layers), e.g. 2 2 2 2 or 3 4 5 6 | 2 2 2 2 |
num_channel_u | number of channels of u | 256 |
num_channel_f | number of channels of f | 256 |
wise_b | whether to use a different B in each grid | true |
lr | learning rate | 1e-1 |
epochs | number of epochs | 300 |
batch_size | batch size | 128 |
download_data | necessary dataset and/or checkpoints to download | mgnet |
force_download | whether to force re-downloading the data | false |
amp_level | MindSpore auto mixed precision level | O0 |
device_id | device id to use | None |
mode | MindSpore Graph mode (0) or PyNative mode (1) | 0 |
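As a rough illustration of how a subset of these flags might be parsed, here is a hypothetical argparse sketch; the real train.py (which also reads config.yaml) may differ in details such as boolean handling.

```python
import argparse

def parse_args(argv=None):
    """Hypothetical parser for a few of the flags in the table above."""
    p = argparse.ArgumentParser(description="MgNet training (sketch)")
    p.add_argument("--dataset", default="cifar100",
                   choices=["cifar10", "cifar100", "mnist"])
    # four per-level iteration counts, e.g. --num_ite 2 2 2 2
    p.add_argument("--num_ite", type=int, nargs=4, default=[2, 2, 2, 2])
    p.add_argument("--num_channel_u", type=int, default=256)
    p.add_argument("--num_channel_f", type=int, default=256)
    # flags like "true"/"false" mapped to Python booleans
    p.add_argument("--wise_b", type=lambda s: s.lower() == "true",
                   default=True)
    p.add_argument("--lr", type=float, default=1e-1)
    p.add_argument("--epochs", type=int, default=300)
    p.add_argument("--batch_size", type=int, default=128)
    return p.parse_args(argv)

args = parse_args(["--dataset", "cifar10", "--num_ite", "3", "4", "5", "6"])
```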
Running on GPU/Ascend:

```shell
python train.py
```
The loss values during training are printed to the console and can also be inspected after training in the log file.

```text
# grep "loss:" log
total 10661220 parameters
start training ...
epoch: 0/300, step: 10/390, loss:5.592, interval: 66.312 ms, total: 7.027 s
epoch: 0/300, step: 20/390, loss:6.389, interval: 66.504 ms, total: 7.725 s
epoch: 0/300, step: 30/390, loss:4.639, interval: 83.561 ms, total: 8.464 s
epoch: 0/300, step: 40/390, loss:4.428, interval: 68.359 ms, total: 9.181 s
epoch: 0/300, step: 50/390, loss:4.408, interval: 70.381 ms, total: 9.897 s
...
```
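The 390 steps per epoch in the log are consistent with the dataset and batch size: CIFAR-100 has 50,000 training images, which at a batch size of 128 gives 390 full batches (assuming the remainder batch is dropped):

```python
train_images = 50_000   # CIFAR-100 training set size
batch_size = 128        # from the command line above
# integer division: the incomplete final batch is assumed dropped
steps_per_epoch = train_images // batch_size
print(steps_per_epoch)  # 390
```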
After training, you can still review the training process through the log file saved in `log_path`, the `./logs` directory by default. The model checkpoint is saved in `save_ckpt_path`, the `./checkpoints` directory by default.
Before running the command below, please check that the checkpoint loading path `load_ckpt_path` specified in `config.yaml` is correct for evaluation. You can download the checkpoint files as described here.

```shell
python eval.py
```

You can view the process and results through `log_path`, `./logs` by default. The result pictures are saved in `figures_path`, `./figures` by default.