Model | MMBench Test (EN) | MMMU Val | SEED-IMG | AI2D Test | ScienceQA Test | HallusionBench aAcc | POPE | GQA | TextVQA | MME | MMStar | Configs |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LLaVA-v1.5-7B | 66.5 | 35.3 | 60.5 | 54.8 | 70.4 | 44.9 | 85.9 | 62.0 | 58.2 | 1511/348 | 30.3 | - |
LLaVA-Llama-3-8B | 68.9 | 36.8 | 69.8 | 60.9 | 73.3 | 47.3 | 87.2 | 63.5 | 58.0 | 1506/295 | 38.2 | Pretrain / Fine-tune |
LLaVA-Llama-3-8B-v1.1 | 72.3 | 37.1 | 70.1 | 70.0 | 72.9 | 47.7 | 86.4 | 62.6 | 59.0 | 1469/349 | 45.1 | Pretrain / Fine-tune |
LLaVA-Phi-3-mini | 69.2 | 41.4 | 70.0 | 69.3 | 73.7 | 49.8 | 87.3 | 61.5 | 57.8 | 1477/313 | 43.7 | Pretrain / Fine-tune |
Model weights are available in the following formats:

- xtuner/llava-phi-3-mini: 🤗 HuggingFace / 🤖 ModelScope
- xtuner/llava-phi-3-mini-hf: 🤗 HuggingFace / 🤖 ModelScope
- xtuner/llava-phi-3-mini-xtuner: 🤗 HuggingFace / 🤖 ModelScope
- xtuner/llava-phi-3-mini-gguf: 🤗 HuggingFace / 🤖 ModelScope

For data preparation, please refer to here.
Pretrain (ShareGPT4V-PT):

NPROC_PER_NODE=8 xtuner train llava_phi3_mini_4k_instruct_clip_vit_large_p14_336_e1_gpu8_sharegpt4v_pretrain --deepspeed deepspeed_zero2 --seed 1024
Fine-tune (InternVL-SFT):

NPROC_PER_NODE=8 xtuner train llava_phi3_mini_4k_instruct_full_clip_vit_large_p14_336_full_e2_gpu8_internvl_finetune --deepspeed deepspeed_zero2 --seed 1024
Step 0. Convert the .pth file to a LLaVA model in xtuner format (LLaVA-Phi-3-mini-xtuner)

After training, we obtain a set of weights (i.e., iter_xxx.pth) that are not in the universal HuggingFace format, so we first need to convert them to a LLaVA model in xtuner format.
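The $PTH_PATH in the conversion command below depends on how long training ran. A minimal sketch for locating the newest checkpoint; it assumes the default work_dirs/<config_name> output directory, which is an assumption about your setup:

```python
from pathlib import Path

# Assumption: xtuner writes checkpoints to ./work_dirs/<config_name>/ by default;
# adjust work_dir if you trained with a custom --work-dir.
work_dir = Path("work_dirs/llava_phi3_mini_4k_instruct_full_clip_vit_large_p14_336_full_e2_gpu8_internvl_finetune")
ckpts = sorted(work_dir.glob("iter_*.pth"), key=lambda p: int(p.stem.split("_")[1]))
print(ckpts[-1] if ckpts else "no checkpoints found")
```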
xtuner convert pth_to_hf $FINETUNE_CFG $PTH_PATH $SAVE_PATH
# e.g., xtuner convert pth_to_hf llava_phi3_mini_4k_instruct_full_clip_vit_large_p14_336_full_e2_gpu8_internvl_finetune ./iter_39620.pth ./iter_39620_xtuner
./iter_39620_xtuner
├── added_tokens.json
├── config.json
├── model-00001-of-00004.safetensors
├── model-00002-of-00004.safetensors
├── model-00003-of-00004.safetensors
├── model-00004-of-00004.safetensors
├── model.safetensors.index.json
├── projector
│ ├── config.json
│ ├── configuration_projector.py
│ ├── modeling_projector.py
│ └── model.safetensors
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
├── tokenizer.model
└── visual_encoder
├── config.json
├── model.safetensors
└── preprocessor_config.json
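Before converting further, it can be worth verifying that the sharded weights are complete. A minimal, stdlib-only sketch (the directory name follows the example above):

```python
import json
from pathlib import Path

model_dir = Path("./iter_39620_xtuner")
index = json.loads((model_dir / "model.safetensors.index.json").read_text())

# Every shard referenced by the weight map should exist on disk.
shards = set(index["weight_map"].values())
missing = [s for s in shards if not (model_dir / s).exists()]
print("missing shards:", missing or "none")
```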
At this point, the xtuner-format LLaVA model can be used for conversation with xtuner chat:
xtuner chat ./iter_39620_xtuner \
--llava ./iter_39620_xtuner \
--prompt-template phi3_chat \
--image $IMAGE_PATH
and for MMBench evaluation with xtuner mmbench:
xtuner mmbench ./iter_39620_xtuner \
--llava ./iter_39620_xtuner \
--prompt-template phi3_chat \
--data-path $DATA_PATH \
--work-dir $RESULT_PATH
Here, $DATA_PATH refers to one of the MMBench datasets, which can be downloaded with:
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_DEV_EN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_TEST_EN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_DEV_CN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/MMBench_TEST_CN.tsv
wget https://opencompass.openxlab.space/utils/VLMEval/CCBench.tsv
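Each file is a TSV with one benchmark question per row. A quick way to inspect a downloaded split (a sketch assuming pandas is available):

```python
import pandas as pd

df = pd.read_csv("MMBench_DEV_EN.tsv", sep="\t")
print(len(df), "questions")
print(df.columns.tolist())  # question text, options, answers, plus a base64-encoded image column
```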
Since the official LLaVA format and the HuggingFace LLaVA format only support the Llama architecture as the LLM, we first need to convert the Phi-3 model to an equivalent Llama LLM.
python ./convert_phi_to_llama.py --phi_path ./iter_39620_xtuner --save_path ./iter_39620_xtuner_llama_llm
Here, --phi_path specifies the path to the Phi-3 LLM, i.e., the xtuner-format LLaVA model obtained in Step 0, and --save_path specifies where the converted Llama LLM is saved.
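To sanity-check the conversion, the output directory should load as a plain Llama model in transformers. A minimal sketch using the path from the command above:

```python
from transformers import AutoConfig, AutoModelForCausalLM

cfg = AutoConfig.from_pretrained("./iter_39620_xtuner_llama_llm")
print(cfg.architectures)  # expected: ['LlamaForCausalLM'] after conversion

llm = AutoModelForCausalLM.from_pretrained("./iter_39620_xtuner_llama_llm")
print(f"{llm.num_parameters() / 1e9:.2f}B parameters")
```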
We can use the following command to obtain the LLaVA model in the official LLaVA format:
python ./convert_xtuner_weights_to_llava.py --text_model_id ./iter_39620_xtuner_llama_llm --vision_model_id ./iter_39620_xtuner/visual_encoder --projector_weight ./iter_39620_xtuner/projector/model.safetensors --save_path ./iter_39620_llava
Here, the converted LLaVA model in the official LLaVA format is saved to ./iter_39620_llava:
./iter_39620_llava
├── added_tokens.json
├── config.json
├── generation_config.json
├── model-00001-of-00005.safetensors
├── model-00002-of-00005.safetensors
├── model-00003-of-00005.safetensors
├── model-00004-of-00005.safetensors
├── model-00005-of-00005.safetensors
├── model.safetensors.index.json
├── preprocessor_config.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model
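This directory is intended for the official LLaVA codebase. A hedged sketch of loading it there, assuming haotian-liu/LLaVA is installed and that its load_pretrained_model helper keeps this signature (verify against your installed copy):

```python
from llava.model.builder import load_pretrained_model

# Assumption: "llava" must appear in model_name so the builder
# dispatches to the LLaVA model class.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="./iter_39620_llava",
    model_base=None,
    model_name="llava-phi-3-mini",
)
print(context_len)
```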
We can use the following command to obtain the LLaVA model in the HuggingFace LLaVA format:
python ./convert_xtuner_weights_to_hf.py --text_model_id ./iter_39620_xtuner_llama_llm --vision_model_id ./iter_39620_xtuner/visual_encoder --projector_weight ./iter_39620_xtuner/projector/model.safetensors --save_path ./iter_39620_hf
Here, the converted LLaVA model in the HuggingFace LLaVA format is saved to ./iter_39620_hf:
./iter_39620_hf
├── added_tokens.json
├── config.json
├── generation_config.json
├── model-00001-of-00002.safetensors
├── model-00002-of-00002.safetensors
├── model.safetensors.index.json
├── preprocessor_config.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model
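The HuggingFace-format model can then be used with the standard transformers image-to-text pipeline. A minimal sketch; the Phi-3 chat markers in the prompt are an assumption based on the phi3_chat template used above, and the image URL is a placeholder:

```python
import requests
from PIL import Image
from transformers import pipeline

pipe = pipeline("image-to-text", model="./iter_39620_hf", device=0)

# Placeholder image URL; substitute any local or remote test image.
image = Image.open(requests.get("https://example.com/test.jpg", stream=True).raw)

# Assumed Phi-3 chat markers, mirroring the phi3_chat prompt template above.
prompt = "<|user|>\n<image>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"

out = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(out[0]["generated_text"])
```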