# Benchmark
## Table of Contents
- [Parameter Settings](#parameter-settings)
- [Quantization](#quantization)
- [Model Type & Max Length](#model-type--max-length)
- [Batch Size](#batch-size)
- [Use Flash Attn & Gradient Checkpointing](#use-flash-attn--gradient-checkpointing)
- [LoRA Rank & LoRA Target Modules](#lora-rank--lora-target-modules)
- [Gradient Accumulation Steps](#gradient-accumulation-steps)
- [Tuners](#tuners)
- [unsloth](#unsloth)
- [Export](#export)
- [AWQ](#awq)
- [AQLM](#aqlm)
## Parameter Settings
Experimental environment:
- A100
- CUDA 11.8
- python 3.10
- torch 2.1.1
- flash_attn 2.3.4
- xformers 0.0.23
- auto_gptq 0.5.1
- bitsandbytes 0.41.3.post2
The following command-line settings are shared by all experiments:
```shell
    --dataset_test_ratio 0 \
    --dataset cls-fudan-news-zh \
    --save_strategy no \
    --check_dataset_strategy warning \
    --preprocess_num_proc 4 \
```
Unless a parameter is varied explicitly in an experiment, the following default values are used:
```shell
    --max_length 2048 \
    --batch_size 1 \
    --gradient_checkpointing true \
    --use_flash_attn true \
    --lora_rank 8 \
    --lora_target_modules DEFAULT \
    --quantization_bit 0 \
    --gradient_accumulation_steps 16 \
```
Token statistics of the test dataset (computed with the qwen tokenizer): 3234.4±2547.5 tokens, min=91, max=19548.

The experimental scripts can be found in `scripts/benchmark/test_memory_time/`.
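
Putting the shared settings and defaults together, a single baseline run expands to roughly the following command (an illustrative sketch assembled from the flags above; the exact invocations are in the linked scripts):

```shell
swift sft \
    --model_type qwen-7b-chat \
    --sft_type lora \
    --dataset cls-fudan-news-zh \
    --dataset_test_ratio 0 \
    --save_strategy no \
    --check_dataset_strategy warning \
    --preprocess_num_proc 4 \
    --max_length 2048 \
    --batch_size 1 \
    --gradient_checkpointing true \
    --use_flash_attn true \
    --lora_rank 8 \
    --lora_target_modules DEFAULT \
    --quantization_bit 0 \
    --gradient_accumulation_steps 16
```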
## Quantization
The test script is:
```shell
swift sft \
    --model_type {MODEL_TYPE} \
    --quantization_bit {QUANTIZATION_BIT} \
    --sft_type lora \
    ...
```
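
Here `--quantization_bit 0` produces the bf16 rows below, and `--quantization_bit 4` with the default bnb backend presumably produces the int4 (bnb) rows; the gptq rows are, we assume, run against pre-quantized gptq model variants, so their exact flags may differ:

```shell
swift sft \
    --model_type qwen-7b-chat \
    --quantization_bit 4 \
    --sft_type lora \
    ...
```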
| Model Type [LoRA] | Quantization | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ------------ | -------------------------- | ---------------- |
| qwen-7b-chat | bf16 | 4.31 | 27.74 |
| qwen-7b-chat | int4 (gptq) | 2.05 | 19.21 |
| qwen-7b-chat | int8 (gptq) | 1.97 | 22.20 |
| qwen-7b-chat | int4 (bnb) | 2.41 | 23.85 |
| qwen-14b-chat | bf16 | 2.60 | 40.14 |
| qwen-14b-chat | int4 (gptq) | 1.15 | 23.30 |
| qwen-14b-chat | int8 (gptq) | 1.08 | 29.13 |
| qwen-14b-chat | int4 (bnb) | 1.36 | 30.05 |
| qwen-72b-chat | bf16 | 0.59 (2*A100) | 73.71+78.54 |
| qwen-72b-chat | int4 (gptq) | 0.23 | 54.86 |
| qwen-72b-chat | int8 (gptq) | 0.21 | 78.44 |
| qwen-72b-chat | int4 (bnb) | 0.28 | 74.87 |
## Model Type & Max Length
### LoRA
The test script is:
```shell
swift sft \
    --model_type {MODEL_TYPE} \
    --max_length {MAX_LENGTH} \
    --sft_type lora \
    ...
```
| Model Type [LoRA] | Max Length | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ---------- | -------------------------- | ---------------- |
| qwen-1_8b-chat | 512 | 9.88 | 6.99 |
| qwen-1_8b-chat | 1024 | 9.90 | 10.71 |
| qwen-1_8b-chat | 2048 | 8.77 | 16.35 |
| qwen-1_8b-chat | 4096 | 5.92 | 23.80 |
| qwen-1_8b-chat | 8192 | 4.19 | 37.03 |
| qwen-7b-chat | 512 | 7.43 | 18.01 |
| qwen-7b-chat | 1024 | 6.51 | 21.73 |
| qwen-7b-chat | 2048 | 4.31 | 27.74 |
| qwen-7b-chat | 4096 | 2.05 | 35.31 |
| qwen-7b-chat | 8192 | 1.34 | 48.41 |
| qwen-14b-chat | 512 | 5.63 | 30.14 |
| qwen-14b-chat | 1024 | 4.36 | 34.43 |
| qwen-14b-chat | 2048 | 2.60 | 40.14 |
| qwen-14b-chat | 4096 | 1.17 | 47.95 |
| qwen-14b-chat | 8192 | 0.79 | 60.74 |
| qwen-72b-chat (2*A100) | 512 | 1.41 | 67.68+73.07 |
| qwen-72b-chat (2*A100) | 1024 | 1.02 | 70.25+77.11 |
| qwen-72b-chat (2*A100) | 2048 | 0.59 | 73.71+78.54 |
| qwen-72b-chat (2*A100) | 4096 | - | OOM |
| qwen-72b-chat (2*A100) | 8192 | - | OOM |
| chatglm3-6b | 512 | 6.72 | 13.94 |
| chatglm3-6b | 1024 | 6.16 | 12.99 |
| chatglm3-6b | 2048 | 4.20 | 17.20 |
| chatglm3-6b | 4096 | 1.92 | 29.80 |
| chatglm3-6b | 8192 | 1.24 | 66.82 |
| yi-6b-chat | 512 | 5.27 | 13.72 |
| yi-6b-chat | 1024 | 5.07 | 15.44 |
| yi-6b-chat | 2048 | 3.84 | 16.95 |
| yi-6b-chat | 4096 | 1.99 | 28.25 |
| yi-6b-chat | 8192 | 1.35 | 43.81 |
| yi-34b-chat | 512 | 2.32 | 66.72 |
| yi-34b-chat | 1024 | 1.76 | 69.10 |
| yi-34b-chat | 2048 | 1.05 | 71.34 |
| yi-34b-chat | 4096 | 0.47 | 78.72 |
| yi-34b-chat | 8192 | 0.31 (2*A100) | 47.01+65.03 |
| openbuddy-zephyr-7b-chat | 512 | 5.17 | 14.99 |
| openbuddy-zephyr-7b-chat | 1024 | 3.92 | 16.57 |
| openbuddy-zephyr-7b-chat | 2048 | 3.08 | 19.89 |
| openbuddy-zephyr-7b-chat | 4096 | 1.85 | 23.29 |
| openbuddy-zephyr-7b-chat | 8192 | 0.92 | 52.14 |
| baichuan2-7b-chat | 512 | 6.09 | 18.18 |
| baichuan2-7b-chat | 1024 | 5.36 | 17.45 |
| baichuan2-7b-chat | 2048 | 3.43 | 19.18 |
| baichuan2-7b-chat | 4096 | 1.69 | 34.22 |
| baichuan2-7b-chat | 8192 | 1.16 | 45.47 |
| baichuan2-13b-chat | 512 | 5.32 | 31.01 |
| baichuan2-13b-chat | 1024 | 3.91 | 31.58 |
| baichuan2-13b-chat | 2048 | 1.77 | 32.40 |
| baichuan2-13b-chat | 4096 | 0.65 | 49.63 |
| baichuan2-13b-chat | 8192 | 0.36 | 76.17 |
### Full
The test script is:
```shell
swift sft \
    --model_type {MODEL_TYPE} \
    --max_length {MAX_LENGTH} \
    --sft_type full \
    ...
```
| Model Type [FULL] | Max Length | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ---------- | -------------------------- | ---------------- |
| qwen-1_8b-chat | 512 | 10.77 | 18.16 |
| qwen-1_8b-chat | 1024 | 10.39 | 18.62 |
| qwen-1_8b-chat | 2048 | 8.73 | 35.11 |
| qwen-1_8b-chat | 4096 | 5.45 | 31.62 |
| qwen-1_8b-chat | 8192 | 3.81 | 38.93 |
| qwen-7b-chat | 512 | 5.96 | 73.37 |
| qwen-7b-chat | 1024 | 5.00 | 73.64 |
| qwen-7b-chat | 2048 | 3.30 | 74.26 |
| qwen-7b-chat | 4096 | 1.64 | 78.76 |
| qwen-7b-chat | 8192 | 1.11 (2*A100) | 61.34+73.00 |
| qwen-14b-chat (2*A100) | 512 | 3.66 | 60.42+72.31 |
| qwen-14b-chat (2*A100) | 1024 | 2.98 | 60.61+74.37 |
| qwen-14b-chat (2*A100) | 2048 | 1.93 | 60.70+78.22 |
| qwen-14b-chat (2*A100) | 4096 | 0.92 | 75.59+78.64 |
| qwen-14b-chat (2*A100) | 8192 | 0.62 | 76.59+77.68 |
## Batch Size
The test script is:
```shell
swift sft \
    --batch_size {BATCH_SIZE} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```
| Model Type [LoRA] | Batch Size | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | ---------- | -------------------------- | ---------------- |
| qwen-7b-chat | 1 | 4.31 | 27.74 |
| qwen-7b-chat | 2 | 3.60 | 43.11 |
| qwen-7b-chat | 4 | 3.02 | 63.81 |
| qwen-7b-chat | 8 | 2.77 | 76.14 |
## Use Flash Attn & Gradient Checkpointing
The test script is:
```shell
swift sft \
    --use_flash_attn {USE_FLASH_ATTN} \
    --gradient_checkpointing {GRADIENT_CHECKPOINTING} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```
| Model Type [LoRA] | Use Flash Attn | Gradient Checkpointing | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | -------------- | ---------------------- | -------------------------- | ---------------- |
| qwen-7b-chat | ✔ | ✔ | 4.31 | 27.74 |
| qwen-7b-chat | ✔ | ✘ | 6.19 | 37.70 |
| qwen-7b-chat | ✘ | ✔ | 3.13 | 27.71 |
| qwen-7b-chat | ✘ | ✘ | 4.45 | 57.67 |
## LoRA Rank & LoRA Target Modules
The test script is:
```shell
swift sft \
    --lora_rank {LORA_RANK} \
    --lora_target_modules {LORA_TARGET_MODULES} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```
| Model Type [LoRA] | LoRA Rank | LoRA Target Modules | Training Speed (samples/s) | GPU Memory (GiB) | Trainable Params (M) |
| ----------------- | --------- | ------------------- | -------------------------- | ---------------- | -------------------- |
| qwen-7b-chat | 2 | DEFAULT (c_attn) | 4.27 | 27.72 | 1.05 |
| qwen-7b-chat | 8 | DEFAULT | 4.31 | 27.74 | 4.19 |
| qwen-7b-chat | 64 | DEFAULT | 4.19 | 27.85 | 33.55 |
| qwen-7b-chat | 8 | ALL (all linear) | 3.22 | 27.87 | 17.89 |
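
The trainable-parameter column follows directly from the LoRA construction: each adapted weight matrix gains `rank × (d_in + d_out)` parameters. As a sanity check, assuming qwen-7b-chat's published shapes (hidden size 4096, fused `c_attn` output 3 × 4096 = 12288, 32 layers), the rank-8 DEFAULT row works out to:

```
rank × (d_in + d_out) × n_layers = 8 × (4096 + 12288) × 32
                                 = 4,194,304 ≈ 4.19 M
```

which matches the table; the rank-2 and rank-64 rows scale linearly (1.05 M and 33.55 M).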
## Gradient Accumulation Steps
The test script is:
```shell
swift sft \
    --gradient_accumulation_steps {GRADIENT_ACCUMULATION_STEPS} \
    --model_type qwen-7b-chat \
    --sft_type lora \
    ...
```
| Model Type [LoRA] | Gradient Accumulation Steps | Training Speed (samples/s) | GPU Memory (GiB) |
| ----------------- | --------------------------- | -------------------------- | ---------------- |
| qwen-7b-chat | 1 | 4.26 | 27.73 |
| qwen-7b-chat | 2 | 4.32 | 27.74 |
| qwen-7b-chat | 4 | 4.31 | 27.74 |
| qwen-7b-chat | 8 | 4.32 | 27.74 |
| qwen-7b-chat | 16 | 4.33 | 27.74 |
| qwen-7b-chat | 32 | 4.30 | 27.74 |
| qwen-7b-chat | 64 | 4.32 | 27.74 |
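
The effective batch size per optimizer step is `batch_size × gradient_accumulation_steps`, but only `batch_size` samples are resident on the GPU at a time, which is why both memory and speed stay essentially flat across this table. As a sketch using only the flags shown above, the following two runs take identical optimizer steps (effective batch size 16) but differ in memory footprint:

```shell
# Micro-batch 1, accumulate 16 steps: the default used throughout this document.
swift sft --model_type qwen-7b-chat --sft_type lora \
    --batch_size 1 --gradient_accumulation_steps 16 ...

# Micro-batch 4, accumulate 4 steps: same effective batch size,
# but more activation memory (compare the Batch Size table above).
swift sft --model_type qwen-7b-chat --sft_type lora \
    --batch_size 4 --gradient_accumulation_steps 4 ...
```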
## Tuners
| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params (M) | flash_attn | gradient_checkpointing | hypers | memory | train speed (samples/s) | infer speed (tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| -------- | ---------- | ------- | ------------------ | ----- | ------------ | -------------------- | ---------- | ---------------------- | ------ | ------ | ----------------------- | ---------------------- | ---------- | --------- | ------------------ | ---------------- | ------------------ |
| adalora | qwen-7b-chat | ms-agent | 2.0 | adalora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 26.8389(0.3464%) | True | True | lr=5e-05/epoch=2 | 32.55GiB | 0.92(87543 samples/95338.71 seconds) | 17.33(2345 tokens/135.29 seconds) | 0.57 | 1.07 | 0.391 | 0.665 | 0.569 |
| adapter | qwen-7b-chat | ms-agent | 2.0 | adapter |  | 33.6896(0.4344%) | True | True | lr=5e-05/epoch=2 | 32.19GiB | 1.48(87543 samples/59067.71 seconds) | 26.63(4019 tokens/150.90 seconds) | 0.55 | 1.03 | 0.438 | 0.662 | 0.565 |
| dora | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=True | 19.2512(0.2487%) | True | True | lr=5e-05/epoch=2 | 32.46GiB | 0.51(87543 samples/171110.54 seconds) | 4.29(2413 tokens/562.32 seconds) | 0.53 | 1.01 | 0.466 | 0.683 | 0.577 |
| full+galore128 | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=128/galore_per_parameter=false/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 47.02GiB | 1.10(87543 samples/79481.96 seconds) | 28.96(2400 tokens/82.88 seconds) | 0.55 | 1.00 | 0.358 | 0.688 | 0.577 |
| full+galore32 | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=32/galore_per_parameter=false/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 47.05GiB | 1.11(87543 samples/78989.74 seconds) | 29.17(2431 tokens/83.35 seconds) | 0.56 | 1.01 | 0.386 | 0.667 | 0.539 |
| full+galore64 | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=64/galore_per_parameter=false/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 46.91GiB | 1.11(87543 samples/79200.36 seconds) | 28.94(2448 tokens/84.60 seconds) | 0.56 | 1.01 | 0.397 | 0.674 | 0.544 |
| full+galore_emb | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=128/galore_per_parameter=false/galore_with_embedding=true | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 44.53GiB | 1.10(87543 samples/79775.02 seconds) | 29.45(2433 tokens/82.62 seconds) | 0.55 | 1.00 | 0.398 | 0.670 | 0.568 |
| full+galore_perparam | qwen-7b-chat | ms-agent | 2.0 | full | galore_rank=128/galore_per_parameter=true/galore_with_embedding=false | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 47.02GiB | 1.25(87543 samples/69821.89 seconds) | 29.02(2478 tokens/85.39 seconds) | 0.54 | 1.00 | 0.372 | 0.669 | 0.524 |
| full+no_mix | qwen-7b-chat | ms-agent | 0.0 | full |  | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 72.56GiB | 1.27(29698 samples/23356.97 seconds) | 30.31(11738 tokens/387.29 seconds) | 0.57 | 0.44 | 0.174 | 0.652 | 0.553 |
| full | qwen-7b-chat | ms-agent | 2.0 | full |  | 7721.3245(100.0000%) | True | True | lr=5e-05/epoch=2 | 73.53GiB | 1.43(87543 samples/61022.97 seconds) | 29.51(3382 tokens/114.62 seconds) | 0.54 | 0.95 | 0.343 | 0.536 | 0.495 |
| llamapro | qwen-7b-chat | ms-agent | 2.0 | llamapro | num_blocks=4 | 809.5826(9.4900%) | True | True | lr=5e-05/epoch=2 | 38.11GiB | 1.53(87543 samples/57294.42 seconds) | 25.80(2374 tokens/92.02 seconds) | 0.53 | 1.00 | 0.434 | 0.645 | 0.357 |
| lora+ | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=16.0/use_rslora=False/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.95(87543 samples/91923.80 seconds) | 18.81(3329 tokens/176.94 seconds) | 0.53 | 0.98 | 0.432 | 0.647 | 0.344 |
| lora+neftune | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False/neftune_alpha=15.0 | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.96(87543 samples/91525.50 seconds) | 19.84(161792 tokens/8156.02 seconds) | 0.53 | 1.02 | 0.456 | 0.671 | 0.401 |
| lora+no_mix | qwen-7b-chat | ms-agent | 0.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 30.86GiB | 0.91(29698 samples/32570.15 seconds) | 19.89(36308 tokens/1825.26 seconds) | 0.53 | 0.53 | 0.470 | 0.666 | 0.574 |
| lora | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.95(87543 samples/91974.29 seconds) | 18.11(2415 tokens/133.32 seconds) | 0.53 | 1.01 | 0.462 | 0.676 | 0.304 |
| qwen-7b-chat-eval | qwen-7b-chat | None | 0.0 | None |  | None(None) |  |  |  | None |  | 30.81(13765 tokens/446.83 seconds) |  |  | 0.517 | 0.679 | 0.568 |
| rslora | qwen-7b-chat | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=True/use_dora=False | 17.8913(0.2312%) | True | True | lr=5e-05/epoch=2 | 32.35GiB | 0.94(87543 samples/92758.63 seconds) | 18.87(2762 tokens/146.34 seconds) | 0.53 | 0.99 | 0.451 | 0.679 | 0.339 |
| full+lisa_2 | qwen-7b-chat | ms-agent | 2.0 | full | lisa_activated_layers=2/lisa_step_interval=20 | - | True | True | lr=5e-05/epoch=2 | 31.11GiB | 2.66(76837 samples/28881.28 seconds) | 36.10(134469 tokens/3725.21 seconds) | 0.62 | 1.06 | 0.349 | 0.653 | 0.592 |
| full+lisa_4 | qwen-7b-chat | ms-agent | 2.0 | full | lisa_activated_layers=4/lisa_step_interval=20 | - | True | True | lr=5e-05/epoch=2 | 31.87GiB | 2.63(76837 samples/29215.15 seconds) | 36.75(135477 tokens/3686.17 seconds) | 0.63 | 1.06 | 0.377 | 0.656 | 0.607 |
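
As a guide to reading the table, the `lora` row (rank=8, target=ALL, alpha=32, lr=5e-05, 2 epochs, ms-bench mix ratio 2.0) would be reproduced with a command along the following lines. This is a hedged sketch: `--lora_alpha`, `--learning_rate`, `--num_train_epochs`, and `--train_dataset_mix_ratio` are flag names we assume here, and they may differ across swift versions.

```shell
swift sft \
    --model_type qwen-7b-chat \
    --dataset ms-agent \
    --train_dataset_mix_ratio 2.0 \
    --sft_type lora \
    --lora_rank 8 \
    --lora_target_modules ALL \
    --lora_alpha 32 \
    --learning_rate 5e-05 \
    --num_train_epochs 2 \
    ...
```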
## unsloth
| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params (M) | flash_attn | gradient_checkpointing | hypers | memory | train speed (samples/s) | infer speed (tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| -------- | ---------- | ------- | ------------------ | ----- | ------------ | -------------------- | ---------- | ---------------------- | ------ | ------ | ----------------------- | ---------------------- | ---------- | --------- | ------------------ | ---------------- | ------------------ |
| unsloth+lora+q4 | llama3-8b-instruct | ms-agent | 2.0 | lora |  | 4.7186(0.1038%) | True | True | lr=5e-05/epoch=2 | 21.69GiB | 1.76(76839 samples/43763.01 seconds) | 15.22(160885 tokens/10570.90 seconds) | 0.58 | 1.03 | 0.668 | 0.755 | 0.501 |
## Export
| exp_name | model_type | calibration dataset | quantization method | quantization bits | infer speed (tokens/s) | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| -------- | ---------- | ------------------- | ------------------- | ----------------- | ---------------------- | ------------------ | ---------------- | ------------------ |
| awq-ms-bench-mini | qwen-7b-chat | ms-bench-mini | awq | 4 | 27.25(16501 tokens/605.47 seconds) | 0.494 | 0.665 | 0.571 |
| awq-pileval | qwen-7b-chat | pileval | awq | 4 | 26.92(12994 tokens/482.72 seconds) | 0.497 | 0.675 | 0.577 |
| gptq-ms-bench-mini | qwen-7b-chat | ms-bench-mini | gptq | 4 | 31.16(15349 tokens/492.54 seconds) | 0.482 | 0.642 | 0.556 |
| gptq-pileval | qwen-7b-chat | pileval | gptq | 4 | 31.67(15185 tokens/479.54 seconds) | 0.478 | 0.654 | 0.559 |
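
Each export experiment quantizes qwen-7b-chat against a calibration dataset and then evaluates the quantized checkpoint. A minimal sketch of the export step, assuming `swift export` with `--quant_method`/`--quant_bits` and the calibration set passed via `--dataset` (flag names may vary between swift versions):

```shell
swift export \
    --model_type qwen-7b-chat \
    --quant_method awq \
    --quant_bits 4 \
    --dataset ms-bench-mini
```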
## AWQ
| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params (M) | flash_attn | gradient_checkpointing | hypers | memory | train speed (samples/s) | infer speed (tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| -------- | ---------- | ------- | ------------------ | ----- | ------------ | -------------------- | ---------- | ---------------------- | ------ | ------ | ----------------------- | ---------------------- | ---------- | --------- | ------------------ | ---------------- | ------------------ |
| qwen1half-7b-chat-awq | qwen1half-7b-chat-awq | ms-agent | 2.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 19.9885(1.5802%) | True | True | lr=5e-05/epoch=2 | 24.26GiB | 0.45(87543 samples/194746.58 seconds) | 16.08(2469 tokens/153.58 seconds) | 0.55 | 1.19 | 0.505 | 0.737 | 0.656 |
## AQLM
| exp_name | model_type | dataset | ms-bench mix ratio | tuner | tuner_params | trainable params (M) | flash_attn | gradient_checkpointing | hypers | memory | train speed (samples/s) | infer speed (tokens/s) | train_loss | eval_loss | gsm8k weighted acc | arc weighted acc | ceval weighted acc |
| -------- | ---------- | ------- | ------------------ | ----- | ------------ | -------------------- | ---------- | ---------------------- | ------ | ------ | ----------------------- | ---------------------- | ---------- | --------- | ------------------ | ---------------- | ------------------ |
| llama2-7b-aqlm-2bit-1x16 | llama2-7b-aqlm-2bit-1x16 | dureader-robust-zh | 0.0 | lora | rank=8/target=ALL/alpha=32/lr_ratio=None/use_rslora=False/use_dora=False | 19.9885(1.6510%) | True | True | lr=5e-05/epoch=2 | 4.04GiB | 0.17(14994 samples/86140.71 seconds) |  | 0.48 | 0.74 |  |  |  |