ENGLISH | 简体中文
MgNet is a unified model that simultaneously recovers certain convolutional neural networks (CNNs) for image classification and multigrid (MG) methods for solving discretized partial differential equations (PDEs). Here is a diagram of its architecture.

Paper: He J, Xu J. MgNet: A unified framework of multigrid and convolutional neural network. Science China Mathematics, 2019, 62: 1331-1354.
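At the heart of MgNet is a multigrid-style smoothing iteration, roughly u ← u + B·σ(f − A·u), where A maps the feature u to the data space, B acts as a learned smoother, and σ is an activation such as ReLU. Below is a minimal NumPy sketch of this iteration; plain matrices stand in for the network's trainable convolutions, and all names and sizes are illustrative, not taken from the actual implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mgnet_smoothing(u, f, A, B, num_ite):
    """One MgNet level: iterate u <- u + B @ relu(f - A @ u).

    A and B are plain matrices standing in for the (trainable)
    convolutions of the actual network; num_ite plays the role
    of the per-level iteration count (cf. the num_ite flag).
    """
    for _ in range(num_ite):
        u = u + B @ relu(f - A @ u)
    return u

# toy example on vectors (hypothetical sizes)
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) * 0.1
B = rng.standard_normal((8, 8)) * 0.1
f = rng.standard_normal(8)
u = mgnet_smoothing(np.zeros(8), f, A, B, num_ite=2)
```

In the full network this smoothing is applied on several grids, with restriction between levels; the sketch shows only the within-level update.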
The training dataset and pretrained checkpoint files will be downloaded automatically at the first launch.
Dataset used: [cifar10]/[cifar100]/[mnist]
The dataset is placed in the `./data` directory; the directory structure is as follows:

```text
├── data
│   ├── cifar-10-batches-bin
│   ├── cifar-100-binary
│   ├── t10k-images.idx3-ubyte
│   ├── t10k-labels.idx1-ubyte
│   ├── train-images.idx3-ubyte
│   └── train-labels.idx1-ubyte
```
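The `cifar-10-batches-bin` and `cifar-100-binary` directories hold the datasets in the standard CIFAR binary layout, where each CIFAR-10 record is one label byte followed by a 3x32x32 image (3073 bytes). A minimal sketch of reading such a file (the file name in the comment is an assumption; the actual loading is done by `src/process.py`):

```python
import numpy as np

def read_cifar10_bin(path):
    """Read one CIFAR-10 binary batch file.

    Each record is 1 label byte + 3072 image bytes (3x32x32, CHW).
    """
    raw = np.fromfile(path, dtype=np.uint8).reshape(-1, 3073)
    labels = raw[:, 0].astype(np.int64)
    images = raw[:, 1:].reshape(-1, 3, 32, 32)
    return images, labels

# e.g. read_cifar10_bin("./data/cifar-10-batches-bin/data_batch_1.bin")
```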
If you need to download the dataset or checkpoint files manually,
please visit this link.
After installing MindSpore via the official website and obtaining the required dataset above, you can start training and evaluation as follows.

Default:

```shell
python train.py
```
Full command:

```shell
python train.py \
    --dataset cifar100 \
    --save_ckpt true \
    --load_ckpt false \
    --save_ckpt_path ./checkpoints \
    --load_ckpt_path ./checkpoints/model_300.ckpt \
    --load_data_path ./data/cifar-100-binary \
    --log_path ./logs \
    --print_interval 10 \
    --ckpt_interval 2000 \
    --num_ite 2 2 2 2 \
    --num_channel_u 256 \
    --num_channel_f 256 \
    --wise_b true \
    --batch_size 128 \
    --epochs 300 \
    --lr 1e-1 \
    --download_data mgnet \
    --force_download false \
    --amp_level O3 \
    --device_id 0 \
    --mode 0
```
The file structure is as follows:

```text
├── mgnet
│   ├── checkpoints                  # checkpoint files
│   ├── data                         # data files
│   │   ├── cifar-10-batches-bin     # CIFAR-10 dataset directory
│   │   ├── cifar-100-binary         # CIFAR-100 dataset directory
│   │   ├── t10k-images.idx3-ubyte   # MNIST test images
│   │   ├── t10k-labels.idx1-ubyte   # MNIST test labels
│   │   ├── train-images.idx3-ubyte  # MNIST training images
│   │   └── train-labels.idx1-ubyte  # MNIST training labels
│   ├── figures                      # figures directory
│   ├── logs                         # log files
│   ├── src                          # source code
│   │   ├── network.py               # network architecture
│   │   └── process.py               # data processing
│   ├── config.yaml                  # hyper-parameter configuration
│   ├── README.md                    # English model description
│   ├── README_CN.md                 # Chinese model description
│   ├── requirements.txt             # library requirements for this model
│   ├── train.py                     # Python training script
│   └── eval.py                      # Python evaluation script
```
Important parameters in train.py are as follows:

parameter | description | default value |
---|---|---|
dataset | dataset to load; one of cifar10, cifar100, or mnist | cifar100 |
save_ckpt | whether to save checkpoints | true |
load_ckpt | whether to load a checkpoint | false |
save_ckpt_path | checkpoint saving path | ./checkpoints |
load_ckpt_path | checkpoint loading path | ./checkpoints/model_300.ckpt |
load_data_path | path to load data from | ./data/cifar-100-binary |
log_path | log saving path | ./logs |
print_interval | time and loss print interval (in steps) | 10 |
ckpt_interval | checkpoint save interval | 2000 |
num_ite | number of iterations in each of the four levels (layers), e.g. 2 2 2 2 or 3 4 5 6 | 2 2 2 2 |
num_channel_u | number of channels of u | 256 |
num_channel_f | number of channels of f | 256 |
wise_b | whether to use a different B in each grid | true |
lr | learning rate | 1e-1 |
epochs | number of epochs | 300 |
batch_size | batch size | 128 |
download_data | necessary dataset and/or checkpoints to download | mgnet |
force_download | whether to force re-downloading the data | false |
amp_level | MindSpore auto mixed precision level | O0 |
device_id | device id to use | None |
mode | MindSpore Graph mode (0) or PyNative mode (1) | 0 |
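As a rough illustration of how a subset of these flags might be parsed, here is a hypothetical argparse sketch; the real train.py (which also reads config.yaml) may differ in details such as boolean handling.

```python
import argparse

def parse_args(argv=None):
    """Hypothetical parser for a few of the flags in the table above."""
    p = argparse.ArgumentParser(description="MgNet training (sketch)")
    p.add_argument("--dataset", default="cifar100",
                   choices=["cifar10", "cifar100", "mnist"])
    # four per-level iteration counts, e.g. --num_ite 2 2 2 2
    p.add_argument("--num_ite", type=int, nargs=4, default=[2, 2, 2, 2])
    p.add_argument("--num_channel_u", type=int, default=256)
    p.add_argument("--num_channel_f", type=int, default=256)
    # flags like "true"/"false" mapped to Python booleans
    p.add_argument("--wise_b", type=lambda s: s.lower() == "true",
                   default=True)
    p.add_argument("--lr", type=float, default=1e-1)
    p.add_argument("--epochs", type=int, default=300)
    p.add_argument("--batch_size", type=int, default=128)
    return p.parse_args(argv)

args = parse_args(["--dataset", "cifar10", "--num_ite", "3", "4", "5", "6"])
```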
Running on GPU/Ascend:

```shell
python train.py
```
The loss values during training are printed to the console and can also be inspected after training in the log file.

```text
# grep "loss:" log
total 10661220 parameters
start training ...
epoch: 0/300, step: 10/390, loss:5.592, interval: 66.312 ms, total: 7.027 s
epoch: 0/300, step: 20/390, loss:6.389, interval: 66.504 ms, total: 7.725 s
epoch: 0/300, step: 30/390, loss:4.639, interval: 83.561 ms, total: 8.464 s
epoch: 0/300, step: 40/390, loss:4.428, interval: 68.359 ms, total: 9.181 s
epoch: 0/300, step: 50/390, loss:4.408, interval: 70.381 ms, total: 9.897 s
...
```
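The 390 steps per epoch in the log are consistent with the dataset and batch size: CIFAR-100 has 50,000 training images, which at a batch size of 128 gives 390 full batches (assuming the remainder batch is dropped):

```python
train_images = 50_000   # CIFAR-100 training set size
batch_size = 128        # from the command line above
# integer division: the incomplete final batch is assumed dropped
steps_per_epoch = train_images // batch_size
print(steps_per_epoch)  # 390
```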
After training, you can still review the training process through the log file saved in `log_path`, the `./logs` directory by default. The model checkpoint is saved in `save_ckpt_path`, the `./checkpoints` directory by default.
Before running the command below, please check that the checkpoint loading path `load_ckpt_path` specified in `config.yaml` is correct for evaluation. You can download the checkpoint files as described here.

```shell
python eval.py
```

You can view the process and results through `log_path`, `./logs` by default. The result pictures are saved in `figures_path`, `./figures` by default.