Model | MMBench Test (EN) | MMMU Val | SEED-IMG | AI2D Test | ScienceQA Test | HallusionBench aAcc | POPE | GQA | TextVQA | MME | MMStar | Configs |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LLaVA-v1.5-7B | 66.5 | 35.3 | 60.5 | 54.8 | 70.4 | 44.9 | 85.9 | 62.0 | 58.2 | 1511/348 | 30.3 | - |
LLaVA-Llama-3-8B | 68.9 | 36.8 | 69.8 | 60.9 | 73.3 | 47.3 | 87.2 | 63.5 | 58.0 | 1506/295 | 38.2 | Pretrain / Fine-tune |
LLaVA-Llama-3-8B-v1.1 | 72.3 | 37.1 | 70.1 | 70.0 | 72.9 | 47.7 | 86.4 | 62.6 | 59.0 | 1469/349 | 45.1 | Pretrain / Fine-tune |
LLaVA-Phi-3-mini | 69.2 | 41.4 | 70.0 | 69.3 | 73.7 | 49.8 | 87.3 | 61.5 | 57.8 | 1477/313 | 43.7 | Pretrain / Fine-tune |
Model weights are available in the following formats:

- xtuner/llava-phi-3-mini: 🤗 HuggingFace / 🤖 ModelScope
- xtuner/llava-phi-3-mini-hf: 🤗 HuggingFace / 🤖 ModelScope
- xtuner/llava-phi-3-mini-xtuner: 🤗 HuggingFace / 🤖 ModelScope
- xtuner/llava-phi-3-mini-gguf: 🤗 HuggingFace / 🤖 ModelScope

For data preparation, please refer to here.
Pretrain (ShareGPT4V-PT):

NPROC_PER_NODE=8 xtuner train llava_phi3_mini_4k_instruct_clip_vit_large_p14_336_e1_gpu8_sharegpt4v_pretrain --deepspeed deepspeed_zero2 --seed 1024
Fine-tune (InternVL-SFT):

NPROC_PER_NODE=8 xtuner train llava_phi3_mini_4k_instruct_full_clip_vit_large_p14_336_full_e2_gpu8_internvl_finetune --deepspeed deepspeed_zero2 --seed 1024
Step 0. Convert the .pth file to a LLaVA model in xtuner format (LLaVA-Phi-3-mini-xtuner)

After training, we obtain a set of weights (i.e., iter_xxx.pth) that are not in the universal HuggingFace format, so we first need to convert them to a LLaVA model in xtuner format.
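The $PTH_PATH in the conversion command below depends on how long training ran. A minimal sketch for locating the newest checkpoint; it assumes the default work_dirs/<config_name> output directory, which is an assumption about your setup:

```python
from pathlib import Path

# Assumption: xtuner writes checkpoints to ./work_dirs/<config_name>/ by default;
# adjust work_dir if you trained with a custom --work-dir.
work_dir = Path("work_dirs/llava_phi3_mini_4k_instruct_full_clip_vit_large_p14_336_full_e2_gpu8_internvl_finetune")
ckpts = sorted(work_dir.glob("iter_*.pth"), key=lambda p: int(p.stem.split("_")[1]))
print(ckpts[-1] if ckpts else "no checkpoints found")
```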
xtuner convert pth_to_hf $FINETUNE_CFG $PTH_PATH $SAVE_PATH
# e.g., xtuner convert pth_to_hf llava_phi3_mini_4k_instruct_full_clip_vit_large_p14_336_full_e2_gpu8_internvl_finetune ./iter_39620.pth ./iter_39620_xtuner
./iter_39620_xtuner
├── added_tokens.json
├── config.json
├── model-00001-of-00004.safetensors
├── model-00002-of-00004.safetensors
├── model-00003-of-00004.safetensors
├── model-00004-of-00004.safetensors
├── model.safetensors.index.json
├── projector
│ ├── config.json
│ ├── configuration_projector.py
│ ├── modeling_projector.py
│ └── model.safetensors
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
├── tokenizer.model
└── visual_encoder
├── config.json
├── model.safetensors
└── preprocessor_config.json
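Before converting further, it can be worth verifying that the sharded weights are complete. A minimal, stdlib-only sketch (the directory name follows the example above):

```python
import json
from pathlib import Path

model_dir = Path("./iter_39620_xtuner")
index = json.loads((model_dir / "model.safetensors.index.json").read_text())

# Every shard referenced by the weight map should exist on disk.
shards = set(index["weight_map"].values())
missing = [s for s in shards if not (model_dir / s).exists()]
print("missing shards:", missing or "none")
```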
At this point, the xtuner-format LLaVA model can be used for conversation with xtuner chat:
xtuner chat ./iter_39620_xtuner \
--llava ./iter_39620_xtuner \
--prompt-template phi3_chat \
--image $IMAGE_PATH
and for MMBench evaluation with xtuner mmbench:
xtuner mmbench ./iter_39620_xtuner \
--llava ./iter_39620_xtuner \
--prompt-template phi3_chat \
--data-path $DATA_PATH \
--work-dir $RESULT_PATH
Here, $DATA_PATH refers to one of the MMBench datasets, which can be downloaded with:
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_DEV_EN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_TEST_EN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_DEV_CN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_TEST_CN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/CCBench.tsv
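Each file is a TSV with one benchmark question per row. A quick way to inspect a downloaded split (a sketch assuming pandas is available):

```python
import pandas as pd

df = pd.read_csv("MMBench_DEV_EN.tsv", sep="\t")
print(len(df), "questions")
print(df.columns.tolist())  # question text, options, answers, plus a base64-encoded image column
```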
Since the official LLaVA format and the HuggingFace LLaVA format only support the Llama architecture as the LLM, we first need to convert the Phi-3 model to an equivalent Llama LLM.
python ./convert_phi_to_llama.py --phi_path ./iter_39620_xtuner --save_path ./iter_39620_xtuner_llama_llm
Here, --phi_path specifies the path to the Phi-3 LLM, i.e., the xtuner-format LLaVA model obtained in Step 0, and --save_path specifies where the converted Llama LLM is saved.
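To sanity-check the conversion, the output directory should load as a plain Llama model in transformers. A minimal sketch using the path from the command above:

```python
from transformers import AutoConfig, AutoModelForCausalLM

cfg = AutoConfig.from_pretrained("./iter_39620_xtuner_llama_llm")
print(cfg.architectures)  # expected: ['LlamaForCausalLM'] after conversion

llm = AutoModelForCausalLM.from_pretrained("./iter_39620_xtuner_llama_llm")
print(f"{llm.num_parameters() / 1e9:.2f}B parameters")
```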
We can use the following command to obtain the LLaVA model in the official LLaVA format:
python ./convert_xtuner_weights_to_llava.py --text_model_id ./iter_39620_xtuner_llama_llm --vision_model_id ./iter_39620_xtuner/visual_encoder --projector_weight ./iter_39620_xtuner/projector/model.safetensors --save_path ./iter_39620_llava
Here, the converted LLaVA model in the official LLaVA format is saved to ./iter_39620_llava:
./iter_39620_llava
├── added_tokens.json
├── config.json
├── generation_config.json
├── model-00001-of-00005.safetensors
├── model-00002-of-00005.safetensors
├── model-00003-of-00005.safetensors
├── model-00004-of-00005.safetensors
├── model-00005-of-00005.safetensors
├── model.safetensors.index.json
├── preprocessor_config.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model
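This directory is intended for the official LLaVA codebase. A hedged sketch of loading it there, assuming haotian-liu/LLaVA is installed and that its load_pretrained_model helper keeps this signature (verify against your installed copy):

```python
from llava.model.builder import load_pretrained_model

# Assumption: "llava" must appear in model_name so the builder
# dispatches to the LLaVA model class.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="./iter_39620_llava",
    model_base=None,
    model_name="llava-phi-3-mini",
)
print(context_len)
```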
We can use the following command to obtain the LLaVA model in the HuggingFace LLaVA format:
python ./convert_xtuner_weights_to_hf.py --text_model_id ./iter_39620_xtuner_llama_llm --vision_model_id ./iter_39620_xtuner/visual_encoder --projector_weight ./iter_39620_xtuner/projector/model.safetensors --save_path ./iter_39620_hf
Here, the converted LLaVA model in the HuggingFace LLaVA format is saved to ./iter_39620_hf:
./iter_39620_hf
├── added_tokens.json
├── config.json
├── generation_config.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── model.safetensors.index.json
├── preprocessor_config.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model
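The HuggingFace-format model can then be used with the standard transformers image-to-text pipeline. A minimal sketch; the Phi-3 chat markers in the prompt are an assumption based on the phi3_chat template used above, and the image URL is a placeholder:

```python
import requests
from PIL import Image
from transformers import pipeline

pipe = pipeline("image-to-text", model="./iter_39620_hf", device=0)

# Placeholder image URL; substitute any local or remote test image.
image = Image.open(requests.get("https://example.com/test.jpg", stream=True).raw)

# Assumed Phi-3 chat markers, mirroring the phi3_chat prompt template above.
prompt = "<|user|>\n<image>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"

out = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(out[0]["generated_text"])
```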