Xception, proposed by Google, is an extreme version of Inception. With a modified depthwise separable convolution, it performs even better than Inception-v3. The paper was published in 2017.
Paper: François Chollet. Xception: Deep Learning with Depthwise Separable Convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.
The overall network architecture of Xception is shown below:
For the dataset used, refer to the paper.
The mixed precision training method accelerates the deep learning neural network training process by using both the single-precision and half-precision data formats, and maintains the network precision achieved by the single-precision training at the same time. Mixed precision training can accelerate the computation process, reduce memory usage, and enable a larger model or batch size to be trained on specific hardware.
For FP16 operators, if the input data type is FP32, the MindSpore backend automatically processes it at reduced precision. Users can find the reduced-precision operators by enabling INFO-level logging and searching for ‘reduce precision’.
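The `loss_scale: 1024` entry in the parameter lists exists because very small gradients can round to zero when they are stored in FP16. The following framework-independent sketch shows static loss scaling in general. It is not MindSpore's implementation. It emulates FP16 rounding with the standard-library `struct` half-precision format (`'e'`). The helper names `to_fp16` and `scaled_fp16_gradient` and the gradient value `1e-8` are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def scaled_fp16_gradient(grad: float, loss_scale: float) -> float:
    """Emulate static loss scaling for a single gradient value.

    The gradient is multiplied by loss_scale before being stored in FP16,
    then divided by loss_scale again in full precision before the update.
    """
    stored = to_fp16(grad * loss_scale)  # FP16 storage of the scaled gradient
    return stored / loss_scale           # unscale in full precision


grad = 1e-8  # hypothetical tiny gradient, below FP16's smallest subnormal (~6e-8)
print(to_fp16(grad))                     # 0.0: the gradient underflows without scaling
print(scaled_fp16_gradient(grad, 1024))  # about 1e-8: the value survives with loss_scale=1024
```

Multiplying by 1024 moves the value into FP16's representable range before storage. Dividing afterwards in full precision recovers it with only a small rounding error.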
.
└─Xception
  ├─README.md
  ├─ascend310_infer                   # application for Ascend 310 inference
  ├─scripts
  │ ├─run_standalone_train.sh         # launch standalone training on the Ascend platform (1p)
  │ ├─run_distribute_train.sh         # launch distributed training on the Ascend platform (8p)
  │ ├─run_train_gpu_fp32.sh           # launch standalone or distributed FP32 training on the GPU platform (1p or 8p)
  │ ├─run_train_gpu_fp16.sh           # launch standalone or distributed FP16 training on the GPU platform (1p or 8p)
  │ ├─run_eval.sh                     # launch evaluation on the Ascend platform
  │ ├─run_infer_310.sh                # shell script for Ascend 310 inference
  │ └─run_eval_gpu.sh                 # launch evaluation on the GPU platform
  ├─src
  │ ├─model_utils
  │ │ ├─config.py                     # parse the "*.yaml" parameter configuration file
  │ │ ├─device_adapter.py             # local or ModelArts training
  │ │ ├─local_adapter.py              # get related environment variables locally
  │ │ └─moxing_adapter.py             # get related environment variables and transfer data on ModelArts
  │ ├─dataset.py                      # data preprocessing
  │ ├─Xception.py                     # network definition
  │ ├─loss.py                         # customized CrossEntropy loss function
  │ └─lr_generator.py                 # learning rate generator
  ├─default_config.yaml               # parameter configuration
  ├─mindspore_hub_conf.py             # MindSpore Hub interface
  ├─train.py                          # train net
  ├─postprocess.py                    # post-processing for Ascend 310 inference
  ├─export.py                         # export net
  └─eval.py                           # eval net
Parameters for both training and evaluation can be set in default_config.yaml.
Major parameters in train.py and config.py for Ascend are:
'num_classes': 1000              # number of dataset classes
'batch_size': 128                # input batch size
'loss_scale': 1024               # loss scale
'momentum': 0.9                  # momentum
'weight_decay': 1e-4             # weight decay
'epoch_size': 250                # total number of epochs
'save_checkpoint': True          # save checkpoints
'save_checkpoint_epochs': 1      # interval (in epochs) between saved checkpoints
'keep_checkpoint_max': 5         # maximum number of checkpoints to keep
'save_checkpoint_path': "./"     # checkpoint save path
'warmup_epochs': 1               # number of warmup epochs
'lr_decay_mode': "linear"        # learning rate decay mode
'use_label_smooth': True         # use label smoothing
'finish_epoch': 0                # number of finished epochs
'label_smooth_factor': 0.1       # label smoothing factor
'lr_init': 0.00004               # initial learning rate
'lr_max': 0.4                    # upper bound of the learning rate
'lr_end': 0.00004                # lower bound of the learning rate
Major parameters in train.py and config.py for GPU are:
'num_classes': 1000                    # number of dataset classes
'batch_size': 64                       # input batch size
'loss_scale': 1024                     # loss scale
'momentum': 0.9                        # momentum
'weight_decay': 1e-4                   # weight decay
'epoch_size': 250                      # total number of epochs
'save_checkpoint': True                # save checkpoints
'save_checkpoint_epochs': 1            # interval (in epochs) between saved checkpoints
'keep_checkpoint_max': 5               # maximum number of checkpoints to keep
'save_checkpoint_path': "./gpu-ckpt"   # checkpoint save path
'warmup_epochs': 1                     # number of warmup epochs
'lr_decay_mode': "linear"              # learning rate decay mode
'use_label_smooth': True               # use label smoothing
'finish_epoch': 0                      # number of finished epochs
'label_smooth_factor': 0.1             # label smoothing factor
'lr_init': 0.00004                     # initial learning rate
'lr_max': 0.4                          # upper bound of the learning rate
'lr_end': 0.00004                      # lower bound of the learning rate
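Together, `warmup_epochs`, `lr_init`, `lr_max`, `lr_end`, and `lr_decay_mode` describe a per-step learning-rate schedule. The sketch below assumes a linear warmup from `lr_init` to `lr_max`, followed by a linear decay to `lr_end`. It is only an illustration, and `src/lr_generator.py` may compute the values differently. The function name `linear_warmup_decay` and the 1251 steps per epoch used in the example are assumptions.

```python
def linear_warmup_decay(lr_init, lr_max, lr_end, warmup_epochs, total_epochs, steps_per_epoch):
    """Return one learning rate per training step (illustrative sketch)."""
    total_steps = total_epochs * steps_per_epoch
    warmup_steps = warmup_epochs * steps_per_epoch
    lrs = []
    for i in range(total_steps):
        if i < warmup_steps:
            # Linear warmup: lr_init -> lr_max over warmup_steps.
            lr = lr_init + (lr_max - lr_init) * i / warmup_steps
        else:
            # Linear decay: lr_max -> lr_end over the remaining steps.
            progress = (i - warmup_steps) / (total_steps - warmup_steps)
            lr = lr_max - (lr_max - lr_end) * progress
        lrs.append(lr)
    return lrs


# Learning-rate values from the parameter lists; 1251 steps per epoch is an assumption.
lrs = linear_warmup_decay(0.00004, 0.4, 0.00004, warmup_epochs=1, total_epochs=250, steps_per_epoch=1251)
print(len(lrs), lrs[0], max(lrs), lrs[-1])
```

The schedule peaks at `lr_max` on the first step after warmup and approaches `lr_end` on the final step.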
You can start training using Python or shell scripts. The usage of the shell scripts is as follows:
# Ascend distributed training example (8p)
bash scripts/run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
# Ascend standalone training
bash scripts/run_standalone_train.sh DEVICE_ID DATA_PATH
# GPU fp32 distributed training example (8p)
bash scripts/run_train_gpu_fp32.sh DEVICE_NUM DATASET_PATH PRETRAINED_CKPT_PATH(optional)
# GPU fp32 standalone training example
bash scripts/run_train_gpu_fp32.sh 1 DATASET_PATH PRETRAINED_CKPT_PATH(optional)
# GPU fp16 distributed training example (8p)
bash scripts/run_train_gpu_fp16.sh DEVICE_NUM DATASET_PATH PRETRAINED_CKPT_PATH(optional)
# GPU fp16 standalone training example
bash scripts/run_train_gpu_fp16.sh 1 DATASET_PATH PRETRAINED_CKPT_PATH(optional)
# GPU evaluation example
bash scripts/run_eval_gpu.sh DEVICE_ID DATASET_PATH CHECKPOINT_PATH
# Ascend 310 inference example
bash run_infer_310.sh MINDIR_PATH DATA_PATH LABEL_FILE DEVICE_ID
Note: for RANK_TABLE_FILE, refer to Link; device_ip can be obtained as described in Link.
# training example
python:
Ascend:
python train.py --device_target Ascend --dataset_path /dataset/train
GPU:
python train.py --device_target GPU --dataset_path /dataset/train
shell:
Ascend:
# distributed training example (8p)
bash scripts/run_distribute_train.sh RANK_TABLE_FILE DATA_PATH
# standalone training
bash scripts/run_standalone_train.sh DEVICE_ID DATA_PATH
GPU:
# fp16 training example(8p)
bash scripts/run_train_gpu_fp16.sh DEVICE_NUM DATA_PATH
# fp32 training example(8p)
bash scripts/run_train_gpu_fp32.sh DEVICE_NUM DATA_PATH
Training results are stored in the example path. By default, checkpoints are stored in ./ckpt_0 for Ascend and ./gpu_ckpt for GPU, and the training log is redirected to log.txt for Ascend and log_gpu.txt for GPU, as follows.
Ascend:
epoch: 1 step: 1251, loss is 4.8427444
epoch time: 701242.350 ms, per step time: 560.545 ms
epoch: 2 step: 1251, loss is 4.0637593
epoch time: 598591.422 ms, per step time: 478.490 ms
GPU:
epoch: 1 step: 20018, loss is 5.479554
epoch time: 5664051.330 ms, per step time: 282.948 ms
epoch: 2 step: 20018, loss is 5.179064
epoch time: 5628609.779 ms, per step time: 281.177 ms
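The step counts in the training logs above (1251 and 20018) follow from three values: the dataset size, the per-device batch size, and the number of devices. As a rough check, the sketch below makes two assumptions that this README does not state. It assumes the ImageNet-1k training split (1,281,167 images), and it assumes the final incomplete batch is dropped. The function name `steps_per_epoch` is hypothetical.

```python
def steps_per_epoch(num_images: int, batch_size: int, device_num: int) -> int:
    """Number of full global batches per epoch (the last partial batch is dropped)."""
    return num_images // (batch_size * device_num)


IMAGENET_TRAIN_IMAGES = 1_281_167  # assumed size of the ImageNet-1k training split

print(steps_per_epoch(IMAGENET_TRAIN_IMAGES, batch_size=128, device_num=8))  # 1251
print(steps_per_epoch(IMAGENET_TRAIN_IMAGES, batch_size=64, device_num=1))   # 20018

# Epoch time is roughly steps x per-step time: 1251 * 478.490 ms ≈ 598,591 ms.
print(1251 * 478.490)
```

Batch size 128 on 8 devices reproduces the 1251 steps per epoch in the Ascend log. The 20018 steps in the GPU log would match batch size 64 on a single device, although the README does not state the device count for that run.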
To run on ModelArts, refer to the official ModelArts documentation, then start training as follows:
# (1) Upload the code folder to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/xception" on the website UI.
# (4) Set the startup file to "/{path}/xception/train.py" on the website UI.
# (5) Perform a or b.
#     a. Set parameters in /{path}/xception/default_config.yaml:
#         1. Set "enable_modelarts: True"
#         2. Set "is_distributed: True"
#         3. Set "modelarts_dataset_unzip_name: {folder_name}" if the data is uploaded as a zip package.
#         4. Set "folder_name_under_zip_file: {path}" (the dataset path under the unzipped folder, such as './ImageNet_Original/train')
#     b. Add parameters on the website UI:
#         1. Add "enable_modelarts=True"
#         2. Add "is_distributed=True"
#         3. Add "modelarts_dataset_unzip_name={folder_name}" if the data is uploaded as a zip package.
#         4. Add "folder_name_under_zip_file={path}" (the dataset path under the unzipped folder, such as './ImageNet_Original/train')
# (6) Upload the mindrecord dataset to the S3 bucket.
# (7) Check "data storage location" on the website UI and set the "Dataset path".
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the 8-card specification.
# (10) Create your job.
You can start evaluation using Python or shell scripts. The usage of the shell scripts is as follows:
bash scripts/run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
bash scripts/run_eval_gpu.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
# eval example
python:
Ascend: python eval.py --device_target Ascend --checkpoint_path PATH_CHECKPOINT --dataset_path DATA_DIR
GPU: python eval.py --device_target GPU --checkpoint_path PATH_CHECKPOINT --dataset_path DATA_DIR
shell:
Ascend: bash scripts/run_eval.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
GPU: bash scripts/run_eval_gpu.sh DEVICE_ID DATA_DIR PATH_CHECKPOINT
The checkpoint is produced during training.
Evaluation results are stored in the example path. You can find a result like the following in eval.log on Ascend and in eval_gpu.log on GPU.
Ascend:
result: {'Loss': 1.7797744848789312, 'Top_1_Acc': 0.7985777243589743, 'Top_5_Acc': 0.9485777243589744}
GPU:
result: {'Loss': 1.7846775874590903, 'Top_1_Acc': 0.798735595390525, 'Top_5_Acc': 0.9498439500640204}
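Top_1_Acc and Top_5_Acc are the fractions of samples whose true label is among the one or five highest-scoring classes. A minimal pure-Python sketch of that metric follows. It is not the MindSpore metric implementation used by eval.py, and the function name `top_k_accuracy` and the toy scores are hypothetical.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one per sample.
    labels: list of integer class indices, one per sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy example with 3 samples and 4 classes.
scores = [
    [0.1, 0.7, 0.1, 0.1],    # highest score: class 1
    [0.5, 0.3, 0.15, 0.05],  # highest score: class 0
    [0.2, 0.3, 0.4, 0.1],    # highest score: class 2
]
labels = [1, 1, 3]
print(top_k_accuracy(scores, labels, 1))
print(top_k_accuracy(scores, labels, 2))
```

In this toy example, top-1 accuracy is 1/3 and top-2 accuracy is 2/3. The second sample's true class is its second-highest score, so it counts only from k=2 onward.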
To run on ModelArts, refer to the official ModelArts documentation, then start evaluation as follows:
# (1) Upload the code folder 'xception' to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/xception" on the website UI.
# (4) Set the startup file to "/{path}/xception/eval.py" on the website UI.
# (5) Perform a or b.
#     a. Set parameters in /{path}/xception/default_config.yaml:
#         1. Set "enable_modelarts: True"
#         2. Set "checkpoint_path: ./{path}/*.ckpt" ('checkpoint_path' is the path of the weight file to be evaluated, relative to `eval.py`; the weight file must be included in the code directory.)
#         3. Set "modelarts_dataset_unzip_name: {folder_name}" if the data is uploaded as a zip package.
#         4. Set "folder_name_under_zip_file: {path}" (the dataset path under the unzipped folder, such as './ImageNet_Original/validation_preprocess')
#     b. Add parameters on the website UI:
#         1. Add "enable_modelarts=True"
#         2. Add "checkpoint_path=./{path}/*.ckpt" ('checkpoint_path' is the path of the weight file to be evaluated, relative to `eval.py`; the weight file must be included in the code directory.)
#         3. Add "modelarts_dataset_unzip_name={folder_name}" if the data is uploaded as a zip package.
#         4. Add "folder_name_under_zip_file={path}" (the dataset path under the unzipped folder, such as './ImageNet_Original/validation_preprocess')
# (6) Upload the dataset (not in mindrecord format) to the S3 bucket.
# (7) Check "data storage location" on the website UI and set the "Dataset path".
# (8) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (9) Under "resource pool selection", select the single-card specification.
# (10) Create your job.
Export locally:
python export.py --ckpt_file [CKPT_PATH] --device_target [DEVICE_TARGET] --file_format [EXPORT_FORMAT] --batch_size [BATCH_SIZE]
EXPORT_FORMAT should be one of ["AIR", "MINDIR"].
Export on ModelArts (to run on ModelArts, refer to the official ModelArts documentation, then start as follows):
# (1) Upload the code folder to the S3 bucket.
# (2) Click "create training task" on the website UI.
# (3) Set the code directory to "/{path}/xception" on the website UI.
# (4) Set the startup file to "/{path}/xception/export.py" on the website UI.
# (5) Perform a or b.
#     a. Set parameters in /{path}/xception/default_config.yaml:
#         1. Set "enable_modelarts: True"
#         2. Set "ckpt_file: ./{path}/*.ckpt" ('ckpt_file' is the path of the weight file to be exported, relative to `export.py`; the weight file must be included in the code directory.)
#         3. Set "file_name: xception"
#         4. Set "file_format: MINDIR"
#     b. Add parameters on the website UI:
#         1. Add "enable_modelarts=True"
#         2. Add "ckpt_file=./{path}/*.ckpt" ('ckpt_file' is the path of the weight file to be exported, relative to `export.py`; the weight file must be included in the code directory.)
#         3. Add "file_name=xception"
#         4. Add "file_format=MINDIR"
# (6) Check "data storage location" on the website UI and set the "Dataset path" (export does not use it, but the field is required).
# (7) Set the "Output file path" and "Job log path" to your paths on the website UI.
# (8) Under "resource pool selection", select the single-card specification.
# (9) Create your job.
# You will find xception.mindir under {Output file path}.
Before performing inference, the model must be exported. An AIR model can only be exported in an Ascend 910 environment, while a MINDIR model can be exported in any environment.
Currently, batch_size can only be set to 1.
# Ascend310 inference
bash run_infer_310.sh [MINDIR_PATH] [DATA_PATH] [LABEL_FILE] [DEVICE_ID]
Note: the ImageNet dataset is used for the Xception network. Each image's label is the 0-based index of its folder after the folder names are sorted.
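This labeling rule assigns each class folder its position in the sorted list of folder names. The minimal sketch below assumes one subdirectory per class under the data directory. The function name `folder_labels` is hypothetical and not part of this repository, and the folder names are only examples.

```python
import os
import tempfile


def folder_labels(data_dir):
    """Map each class folder under data_dir to its 0-based index in sorted order."""
    class_dirs = sorted(
        name for name in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, name))
    )
    return {name: index for index, name in enumerate(class_dirs)}


# Example: three class folders created out of order, plus a stray file.
with tempfile.TemporaryDirectory() as root:
    for name in ["n01484850", "n01440764", "n01443537"]:
        os.mkdir(os.path.join(root, name))
    open(os.path.join(root, "README.txt"), "w").close()  # not a class folder
    print(folder_labels(root))
    # {'n01440764': 0, 'n01443537': 1, 'n01484850': 2}
```

The labels depend only on sorted folder names, so the label file used for Ascend 310 inference must follow the same ordering.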
The inference result is stored in the script path. You can find a result like the following in acc.log.
Top_1_Acc: 0.79886%, Top_5_Acc: 0.94882%
Parameters | Ascend | GPU |
---|---|---|
Model Version | Xception | Xception |
Resource | HUAWEI CLOUD Modelarts | HUAWEI CLOUD Modelarts |
Uploaded Date | 12/10/2020 | 02/09/2021 |
MindSpore Version | 1.1.0 | 1.1.0 |
Dataset | 1200k images | 1200k images |
Batch_size | 128 | 64 |
Training Parameters | src/config.py | src/config.py |
Optimizer | Momentum | Momentum |
Loss Function | CrossEntropySmooth | CrossEntropySmooth |
Loss | 1.78 | 1.78 |
Accuracy (8p) | Top1[79.8%] Top5[94.8%] | Top1[79.8%] Top5[94.9%] |
Per step time (8p) | 479 ms/step | 282 ms/step |
Total time (8p) | 42h | 51h |
Params (M) | 180M | 180M |
Scripts | Xception script | Xception script |
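The training table lists CrossEntropySmooth as the loss function, and the configuration sets `label_smooth_factor: 0.1`. A common formulation gives the true class a target of `1 - factor` and spreads `factor` evenly over the other `num_classes - 1` classes. That formulation is an assumption here, not a description of `src/loss.py`. The pure-Python sketch below builds such a smoothed target and computes its cross entropy against softmax probabilities. The function names are hypothetical.

```python
import math


def smoothed_target(label, num_classes, factor):
    """One-hot target with label smoothing applied."""
    off_value = factor / (num_classes - 1)
    target = [off_value] * num_classes
    target[label] = 1.0 - factor
    return target


def smooth_cross_entropy(logits, label, factor):
    """Cross entropy between softmax(logits) and the smoothed target."""
    # Numerically stable log-softmax.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_sum for x in logits]
    target = smoothed_target(label, len(logits), factor)
    return -sum(t * lp for t, lp in zip(target, log_probs))


target = smoothed_target(label=2, num_classes=1000, factor=0.1)
print(target[2], target[0], sum(target))  # true class 0.9, others 0.1/999 each, total ~1.0
print(smooth_cross_entropy([0.0] * 1000, label=2, factor=0.1))  # uniform logits: ~log(1000)
```

Because the target never places all of its mass on one class, the loss stays above zero even for a perfect prediction. This discourages overconfident logits.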
Parameters | Ascend | GPU |
---|---|---|
Model Version | Xception | Xception |
Resource | HUAWEI CLOUD Modelarts | HUAWEI CLOUD Modelarts |
Uploaded Date | 12/10/2020 | 02/09/2021 |
MindSpore Version | 1.1.0 | 1.1.0 |
Dataset | 50k images | 50k images |
Batch_size | 128 | 64 |
Accuracy | Top1[79.8%] Top5[94.8%] | Top1[79.8%] Top5[94.9%] |
Total time | 3mins | 4.7mins |
In dataset.py, we set the seed inside the create_dataset function. We also use a random seed in train.py.
Please check the official homepage.