# Benchmark
## Table of Contents
- [Parameter Settings](#parameter-settings)
- [Quantization](#quantization)
- [Model Type & Max Length](#model-type--max-length)
- [Batch Size](#batch-size)
- [Use Flash Attn & Gradient Checkpointing](#use-flash-attn--gradient-checkpointing)
- [LoRA Rank & LoRA Target Modules](#lora-rank--lora-target-modules)
- [Gradient Accumulation Steps](#gradient-accumulation-steps)
- [Tuners](#tuners)
- [unsloth](#unsloth)
- [Export](#export)
- [AWQ](#awq)
- [AQLM](#aqlm)
## Parameter Settings
Experimental environment:
- A100
- CUDA 11.8
- python 3.10
- torch 2.1.1
- flash_attn 2.3.4
- xformers 0.0.23
- auto_gptq 0.5.1
- bitsandbytes 0.41.3.post2
The following command-line settings are shared by all experiments:
```shell
    --dataset_test_ratio 0 \
    --dataset cls-fudan-news-zh \
    --save_strategy no \
    --check_dataset_strategy warning \
    --preprocess_num_proc 4 \
```
Unless a parameter is varied explicitly in an experiment, the following default values are used:
```shell
    --max_length 2048 \
    --batch_size 1 \
    --gradient_checkpointing true \
    --use_flash_attn true \
    --lora_rank 8 \
    --lora_target_modules DEFAULT \
    --quantization_bit 0 \
    --gradient_accumulation_steps 16 \
```
Token statistics of the test dataset (computed with the qwen tokenizer): 3234.4±2547.5 tokens, min=91, max=19548.

The experimental scripts can be found in `scripts/benchmark/test_memory_time/`.
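
Putting the shared settings and defaults together, a single baseline run expands to roughly the following command (an illustrative sketch assembled from the flags above; the exact invocations are in the linked scripts):

```shell
swift sft \
    --model_type qwen-7b-chat \
    --sft_type lora \
    --dataset cls-fudan-news-zh \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --check_dataset_strategy warning \
    --preprocess_num_proc 4 \
    --max_length 2048 \
    --batch_size 1 \
    --gradient_checkpointing true \
    --use_flash_attn true \
    --lora_rank 8 \
    --lora_target_modules DEFAULT \
    --quantization_bit 0 \
    --gradient_accumulation_steps 16
```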
## Quantization
The test script is:
```shell
swift sft \
    --model_type {MODEL_TYPE} \
    --quantization_bit {QUANTIZATION_BIT} \
    --sft_type lora \
    ...
```
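
Here `--quantization_bit 0` produces the bf16 rows below, and `--quantization_bit 4` with the default bnb backend presumably produces the int4 (bnb) rows; the gptq rows are, we assume, run against pre-quantized gptq model variants, so their exact flags may differ:

```shell
swift sft \
    --model_type qwen-7b-chat \
    --quantization_bit 4 \
    --sft_type lora \
    ...
```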
| Model Type [LoRA] | Quantization | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ------------ | -------------------------- | ---------------- |
| qwen-7b-chat | bf16 | 4.31 | 27.74 |
| qwen-7b-chat | int4 (gptq) | 2.05 | 19.21 |
| qwen-7b-chat | int8 (gptq) | 1.97 | 22.20 |
| qwen-7b-chat | int4 (bnb) | 2.41 | 23.85 |
| qwen-14b-chat | bf16 | 2.60 | 40.14 |
| qwen-14b-chat | int4 (gptq) | 1.15 | 23.30 |
| qwen-14b-chat | int8 (gptq) | 1.08 | 29.13 |
| qwen-14b-chat | int4 (bnb) | 1.36 | 30.05 |
| qwen-72b-chat | bf16 | 0.59 (2*A100) | 73.71+78.54 |
| qwen-72b-chat | int4 (gptq) | 0.23 | 54.86 |
| qwen-72b-chat | int8 (gptq) | 0.21 | 78.44 |
| qwen-72b-chat | int4 (bnb) | 0.28 | 74.87 |
## Model Type & Max Length
### LoRA
The test script is:
```shell
swift sft \
    --model_type {MODEL_TYPE} \
    --max_length {MAX_LENGTH} \
    --sft_type lora \
    ...
```
| Model Type [LoRA] | Max Length | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ---------- | -------------------------- | ---------------- |
| qwen-1_8b-chat | 512 | 9.88 | 6.99 |
| qwen-1_8b-chat | 1024 | 9.90 | 10.71 |
| qwen-1_8b-chat | 2048 | 8.77 | 16.35 |
| qwen-1_8b-chat | 4096 | 5.92 | 23.80 |
| qwen-1_8b-chat | 8192 | 4.19 | 37.03 |
| qwen-7b-chat | 512 | 7.43 | 18.01 |
| qwen-7b-chat | 1024 | 6.51 | 21.73 |
| qwen-7b-chat | 2048 | 4.31 | 27.74 |
| qwen-7b-chat | 4096 | 2.05 | 35.31 |
| qwen-7b-chat | 8192 | 1.34 | 48.41 |
| qwen-14b-chat | 512 | 5.63 | 30.14 |
| qwen-14b-chat | 1024 | 4.36 | 34.43 |
| qwen-14b-chat | 2048 | 2.60 | 40.14 |
| qwen-14b-chat | 4096 | 1.17 | 47.95 |
| qwen-14b-chat | 8192 | 0.79 | 60.74 |
| qwen-72b-chat (2*A100) | 512 | 1.41 | 67.68+73.07 |
| qwen-72b-chat (2*A100) | 1024 | 1.02 | 70.25+77.11 |
| qwen-72b-chat (2*A100) | 2048 | 0.59 | 73.71+78.54 |
| qwen-72b-chat (2*A100) | 4096 | - | OOM |
| qwen-72b-chat (2*A100) | 8192 | - | OOM |
| chatglm3-6b | 512 | 6.72 | 13.94 |
| chatglm3-6b | 1024 | 6.16 | 12.99 |
| chatglm3-6b | 2048 | 4.20 | 17.20 |
| chatglm3-6b | 4096 | 1.92 | 29.80 |
| chatglm3-6b | 8192 | 1.24 | 66.82 |
| yi-6b-chat | 512 | 5.27 | 13.72 |
| yi-6b-chat | 1024 | 5.07 | 15.44 |
| yi-6b-chat | 2048 | 3.84 | 16.95 |
| yi-6b-chat | 4096 | 1.99 | 28.25 |
| yi-6b-chat | 8192 | 1.35 | 43.81 |
| yi-34b-chat | 512 | 2.32 | 66.72 |
| yi-34b-chat | 1024 | 1.76 | 69.10 |
| yi-34b-chat | 2048 | 1.05 | 71.34 |
| yi-34b-chat | 4096 | 0.47 | 78.72 |
| yi-34b-chat | 8192 | 0.31 (2*A100) | 47.01+65.03 |
| openbuddy-zephyr-7b-chat | 512 | 5.17 | 14.99 |
| openbuddy-zephyr-7b-chat | 1024 | 3.92 | 16.57 |
| openbuddy-zephyr-7b-chat | 2048 | 3.08 | 19.89 |
| openbuddy-zephyr-7b-chat | 4096 | 1.85 | 23.29 |
| openbuddy-zephyr-7b-chat | 8192 | 0.92 | 52.14 |
| baichuan2-7b-chat | 512 | 6.09 | 18.18 |
| baichuan2-7b-chat | 1024 | 5.36 | 17.45 |
| baichuan2-7b-chat | 2048 | 3.43 | 19.18 |
| baichuan2-7b-chat | 4096 | 1.69 | 34.22 |
| baichuan2-7b-chat | 8192 | 1.16 | 45.47 |
| baichuan2-13b-chat | 512 | 5.32 | 31.01 |
| baichuan2-13b-chat | 1024 | 3.91 | 31.58 |
| baichuan2-13b-chat | 2048 | 1.77 | 32.40 |
| baichuan2-13b-chat | 4096 | 0.65 | 49.63 |
| baichuan2-13b-chat | 8192 | 0.36 | 76.17 |
### Full
The test script is:
```shell
swift sft \
    --model_type {MODEL_TYPE} \
    --max_length {MAX_LENGTH} \
    --sft_type full \
    ...
```
| Model Type [FULL] | Max Length | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ---------- | -------------------------- | ---------------- |
| qwen-1_8b-chat | 512 | 10.77 | 18.16 |
| qwen-1_8b-chat | 1024 | 10.39 | 18.62 |
| qwen-1_8b-chat | 2048 | 8.73 | 35.11 |
| qwen-1_8b-chat | 4096 | 5.45 | 31.62 |
| qwen-1_8b-chat | 8192 | 3.81 | 38.93 |
| qwen-7b-chat | 512 | 5.96 | 73.37 |
| qwen-7b-chat | 1024 | 5.00 | 73.64 |
| qwen-7b-chat | 2048 | 3.30 | 74.26 |
| qwen-7b-chat | 4096 | 1.64 | 78.76 |
| qwen-7b-chat | 8192 | 1.11 (2*A100) | 61.34+73.00 |
| qwen-14b-chat (2*A100) | 512 | 3.66 | 60.42+72.31 |
| qwen-14b-chat (2*A100) | 1024 | 2.98 | 60.61+74.37 |
| qwen-14b-chat (2*A100) | 2048 | 1.93 | 60.70+78.22 |
| qwen-14b-chat (2*A100) | 4096 | 0.92 | 75.59+78.64 |
| qwen-14b-chat (2*A100) | 8192 | 0.62 | 76.59+77.68 |
## Batch Size
The test script is:
```shell
swift sft \
    --batch_size {BATCH_SIZE} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```
| Model Type [LoRA] | Batch Size | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ---------- | -------------------------- | ---------------- |
| qwen-7b-chat | 1 | 4.31 | 27.74 |
| qwen-7b-chat | 2 | 3.60 | 43.11 |
| qwen-7b-chat | 4 | 3.02 | 63.81 |
| qwen-7b-chat | 8 | 2.77 | 76.14 |
## Use Flash Attn & Gradient Checkpointing
The test script is:
```shell
swift sft \
    --use_flash_attn {USE_FLASH_ATTN} \
    --gradient_checkpointing {GRADIENT_CHECKPOINTING} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```
| Model Type [LoRA] | Use Flash Attn | Gradient Checkpointing | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | -------------- | ---------------------- | -------------------------- | ---------------- |
| qwen-7b-chat | ✔ | ✔ | 4.31 | 27.74 |
| qwen-7b-chat | ✔ | ✘ | 6.19 | 37.70 |
| qwen-7b-chat | ✘ | ✔ | 3.13 | 27.71 |
| qwen-7b-chat | ✘ | ✘ | 4.45 | 57.67 |
## LoRA Rank & LoRA Target Modules
The test script is:
```shell
swift sft \
    --lora_rank {LORA_RANK} \
    --lora_target_modules {LORA_TARGET_MODULES} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```
| Model Type [LoRA] | LoRA Rank | LoRA Target Modules | Training Speed (samples/s) | GPU Memory (GiB) | Trainable Params (M) |
| ----------------- | --------- | ------------------- | -------------------------- | ---------------- | -------------------- |
| qwen-7b-chat | 2 | DEFAULT (c_attn) | 4.27 | 27.72 | 1.05 |
| qwen-7b-chat | 8 | DEFAULT | 4.31 | 27.74 | 4.19 |
| qwen-7b-chat | 64 | DEFAULT | 4.19 | 27.85 | 33.55 |
| qwen-7b-chat | 8 | ALL (all linear) | 3.22 | 27.87 | 17.89 |
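
The trainable-parameter column follows directly from the LoRA construction: each adapted weight matrix gains `rank × (d_in + d_out)` parameters. As a sanity check, assuming qwen-7b-chat's published shapes (hidden size 4096, fused `c_attn` output 3 × 4096 = 12288, 32 layers), the rank-8 DEFAULT row works out to:

```
rank × (d_in + d_out) × n_layers = 8 × (4096 + 12288) × 32
                                 = 4,194,304 ≈ 4.19 M
```

which matches the table; the rank-2 and rank-64 rows scale linearly (1.05 M and 33.55 M).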
## Gradient Accumulation Steps
The test script is:
```shell
swift sft \
    --gradient_accumulation_steps {GRADIENT_ACCUMULATION_STEPS} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```
| Model Type [LoRA] | Gradient Accumulation Steps | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | --------------------------- | -------------------------- | ---------------- |
| qwen-7b-chat | 1 | 4.26 | 27.73 |
| qwen-7b-chat | 2 | 4.32 | 27.74 |
| qwen-7b-chat | 4 | 4.31 | 27.74 |
| qwen-7b-chat | 8 | 4.32 | 27.74 |
| qwen-7b-chat | 16 | 4.33 | 27.74 |
| qwen-7b-chat | 32 | 4.30 | 27.74 |
| qwen-7b-chat | 64 | 4.32 | 27.74 |
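
The effective batch size per optimizer step is `batch_size × gradient_accumulation_steps`, but only `batch_size` samples are resident on the GPU at a time, which is why both memory and speed stay essentially flat across this table. As a sketch using only the flags shown above, the following two runs take identical optimizer steps (effective batch size 16) but differ in memory footprint:

```shell
# Micro-batch 1, accumulate 16 steps: the default used throughout this document.
swift sft --model_type qwen-7b-chat --sft_type lora \
    --batch_size 1 --gradient_accumulation_steps 16 ...

# Micro-batch 4, accumulate 4 steps: same effective batch size,
# but more activation memory (compare the Batch Size table above).
swift sft --model_type qwen-7b-chat --sft_type lora \
    --batch_size 4 --gradient_accumulation_steps 4 ...
```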
## Tuners
| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params (M) | flash_attn | gradient_checkpointing | hypers | memory | train speed (samples/s) | infer speed (tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| -------- | ---------- | ------- | ------------------ | ----- | ------------ | -------------------- | ---------- | ---------------------- | ------ | ------ | ----------------------- | ---------------------- | ---------- | --------- | ------------------ | ---------------- | ------------------ |
| adalora | qwen-7b-chat | ms-agent | 2.0 | adalora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 26.8389(0.3464%) | True | True | lr=5e-05/epoch=2 | 32.55GiB | 0.92(87543 samples/95338.71 seconds) | 17.33(2345 tokens/135.29 seconds) | 0.57 | 1.07 | 0.391 | 0.665 | 0.569 |
| adapter | qwen-7b-chat | ms-agent | 2.0 | adapter |  | 33.6896(0.4344%) | True | True | lr=5e-05/epoch=2 | 32.19GiB | 1.48(87543 samples/59067.71 seconds) | 26.63(4019 tokens/150.90 seconds) | 0.55 | 1.03 | 0.438 | 0.662 | 0.565 |
| dora | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=True | 19.2512(0.2487%) | True | True | lr=5e-05/epoch=2 | 32.46GiB | 0.51(87543 samples/171110.54 seconds) | 4.29(2413 tokens/562.32 seconds) | 0.53 | 1.01 | 0.466 | 0.683 | 0.577 |
| full+galore128 | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=128/galore_per_parameter=false/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 47.02GiB | 1.10(87543 samples/79481.96 seconds) | 28.96(2400 tokens/82.88 seconds) | 0.55 | 1.00 | 0.358 | 0.688 | 0.577 |
| full+galore32 | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=32/galore_per_parameter=false/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 47.05GiB | 1.11(87543 samples/78989.74 seconds) | 29.17(2431 tokens/83.35 seconds) | 0.56 | 1.01 | 0.386 | 0.667 | 0.539 |
| full+galore64 | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=64/galore_per_parameter=false/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 46.91GiB | 1.11(87543 samples/79200.36 seconds) | 28.94(2448 tokens/84.60 seconds) | 0.56 | 1.01 | 0.397 | 0.674 | 0.544 |
| full+galore_emb | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=128/galore_per_parameter=false/galore_with_embedding=true | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 44.53GiB | 1.10(87543 samples/79775.02 seconds) | 29.45(2433 tokens/82.62 seconds) | 0.55 | 1.00 | 0.398 | 0.670 | 0.568 |
| full+galore_perparam | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=128/galore_per_parameter=true/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 47.02GiB | 1.25(87543 samples/69821.89 seconds) | 29.02(2478 tokens/85.39 seconds) | 0.54 | 1.00 | 0.372 | 0.669 | 0.524 |
| full+no_mix | qwen-7b-chat | ms-agent | 0.0 | full |  | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 72.56GiB | 1.27(29698 samples/23356.97 seconds) | 30.31(11738 tokens/387.29 seconds) | 0.57 | 0.44 | 0.174 | 0.652 | 0.553 |
| full | qwen-7b-chat | ms-agent | 2.0 | full |  | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 73.53GiB | 1.43(87543 samples/61022.97 seconds) | 29.51(3382 tokens/114.62 seconds) | 0.54 | 0.95 | 0.343 | 0.536 | 0.495 |
| llamapro | qwen-7b-chat | ms-agent | 2.0 | llamapro | num_blocks=4 | 809.5826(9.4900%) | True | True | lr=5e-05/epoch=2 | 38.11GiB | 1.53(87543 samples/57294.42 seconds) | 25.80(2374 tokens/92.02 seconds) | 0.53 | 1.00 | 0.434 | 0.645 | 0.357 |
| lora+ | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=16.0/use_rslora=False/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.95(87543 samples/91923.80 seconds) | 18.81(3329 tokens/176.94 seconds) | 0.53 | 0.98 | 0.432 | 0.647 | 0.344 |
| lora+neftune | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False/neftune_alpha=15.0 | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.96(87543 samples/91525.50 seconds) | 19.84(161792 tokens/8156.02 seconds) | 0.53 | 1.02 | 0.456 | 0.671 | 0.401 |
| lora+no_mix | qwen-7b-chat | ms-agent | 0.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 30.86GiB | 0.91(29698 samples/32570.15 seconds) | 19.89(36308 tokens/1825.26 seconds) | 0.53 | 0.53 | 0.470 | 0.666 | 0.574 |
| lora | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.95(87543 samples/91974.29 seconds) | 18.11(2415 tokens/133.32 seconds) | 0.53 | 1.01 | 0.462 | 0.676 | 0.304 |
| qwen-7b-chat-eval | qwen-7b-chat | None | 0.0 | None |  | None(None) |  |  |  | None |  | 30.81(13765 tokens/446.83 seconds) |  |  | 0.517 | 0.679 | 0.568 |
| rslora | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=True/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.94(87543 samples/92758.63 seconds) | 18.87(2762 tokens/146.34 seconds) | 0.53 | 0.99 | 0.451 | 0.679 | 0.339 |
| full+lisa_2 | qwen-7b-chat | ms-agent | 2.0 | full | lisa_activated_layers=2/lisa_step_interval=20 | - | True | True | lr=5e-05/epoch=2 | 31.11GiB | 2.66(76837 samples/28881.28 seconds) | 36.10(134469 tokens/3725.21 seconds) | 0.62 | 1.06 | 0.349 | 0.653 | 0.592 |
| full+lisa_4 | qwen-7b-chat | ms-agent | 2.0 | full | lisa_activated_layers=4/lisa_step_interval=20 | - | True | True | lr=5e-05/epoch=2 | 31.87GiB | 2.63(76837 samples/29215.15 seconds) | 36.75(135477 tokens/3686.17 seconds) | 0.63 | 1.06 | 0.377 | 0.656 | 0.607 |
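
As a guide to reading the table, the `lora` row (rank=8, target=ALL, alpha=32, lr=5e-05, 2 epochs, ms-bench mix ratio 2.0) would be reproduced with a command along the following lines. This is a hedged sketch: `--lora_alpha`, `--learning_rate`, `--num_train_epochs`, and `--train_dataset_mix_ratio` are flag names we assume here, and they may differ across swift versions.

```shell
swift sft \
    --model_type qwen-7b-chat \
    --dataset ms-agent \
    --train_dataset_mix_ratio 2.0 \
    --sft_type lora \
    --lora_rank 8 \
    --lora_target_modules ALL \
    --lora_alpha 32 \
    --learning_rate 5e-05 \
    --num_train_epochs 2 \
    ...
```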
## unsloth
| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params (M) | flash_attn | gradient_checkpointing | hypers | memory | train speed (samples/s) | infer speed (tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| -------- | ---------- | ------- | ------------------ | ----- | ------------ | -------------------- | ---------- | ---------------------- | ------ | ------ | ----------------------- | ---------------------- | ---------- | --------- | ------------------ | ---------------- | ------------------ |
| unsloth+lora+q4 | llama3-8b-instruct | ms-agent | 2.0 | lora |  | 4.7186(0.1038%) | True | True | lr=5e-05/epoch=2 | 21.69GiB | 1.76(76839 samples/43763.01 seconds) | 15.22(160885 tokens/10570.90 seconds) | 0.58 | 1.03 | 0.668 | 0.755 | 0.501 |
## Export
| exp_name | model_type | calibration dataset | quantization method | quantization bits | infer speed (tokens/s) | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| -------- | ---------- | ------------------- | ------------------- | ----------------- | ---------------------- | ------------------ | ---------------- | ------------------ |
| awq-ms-bench-mini | qwen-7b-chat | ms-bench-mini | awq | 4 | 27.25(16501 tokens/605.47 seconds) | 0.494 | 0.665 | 0.571 |
| awq-pileval | qwen-7b-chat | pileval | awq | 4 | 26.92(12994 tokens/482.72 seconds) | 0.497 | 0.675 | 0.577 |
| gptq-ms-bench-mini | qwen-7b-chat | ms-bench-mini | gptq | 4 | 31.16(15349 tokens/492.54 seconds) | 0.482 | 0.642 | 0.556 |
| gptq-pileval | qwen-7b-chat | pileval | gptq | 4 | 31.67(15185 tokens/479.54 seconds) | 0.478 | 0.654 | 0.559 |
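
Each export experiment quantizes qwen-7b-chat against a calibration dataset and then evaluates the quantized checkpoint. A minimal sketch of the export step, assuming `swift export` with `--quant_method`/`--quant_bits` and the calibration set passed via `--dataset` (flag names may vary between swift versions):

```shell
swift export \
    --model_type qwen-7b-chat \
    --quant_method awq \
    --quant_bits 4 \
    --dataset ms-bench-mini
```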
## AWQ
| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params (M) | flash_attn | gradient_checkpointing | hypers | memory | train speed (samples/s) | infer speed (tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| -------- | ---------- | ------- | ------------------ | ----- | ------------ | -------------------- | ---------- | ---------------------- | ------ | ------ | ----------------------- | ---------------------- | ---------- | --------- | ------------------ | ---------------- | ------------------ |
| qwen1half-7b-chat-awq | qwen1half-7b-chat-awq | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 19.9885(1.5802%) | True | True | lr=5e-05/epoch=2 | 24.26GiB | 0.45(87543 samples/194746.58 seconds) | 16.08(2469 tokens/153.58 seconds) | 0.55 | 1.19 | 0.505 | 0.737 | 0.656 |
## AQLM
| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params (M) | flash_attn | gradient_checkpointing | hypers | memory | train speed (samples/s) | infer speed (tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| -------- | ---------- | ------- | ------------------ | ----- | ------------ | -------------------- | ---------- | ---------------------- | ------ | ------ | ----------------------- | ---------------------- | ---------- | --------- | ------------------ | ---------------- | ------------------ |
| llama2-7b-aqlm-2bit-1x16 | llama2-7b-aqlm-2bit-1x16 | dureader-robust-zh | 0.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 19.9885(1.6510%) | True | True | lr=5e-05/epoch=2 | 4.04GiB | 0.17(14994 samples/86140.71 seconds) |  | 0.48 | 0.74 |  |  |  |