# Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation (ACL 2021)
This repository contains four directories:

- `poattention` (modified from Fairseq): trains the Position-Aware Embedding Generator for seq2seq models.
- `use_poattention` (modified from Fairseq): generates embeddings for unseen tokens and fine-tunes the seq2seq model with the downstream vocabulary on the downstream task.
- `bert_poattention` (modified from Transformers): trains the Position-Aware Embedding Generator for BERT-like models.
- `bert_use_poattention` (modified from Fairseq): generates embeddings for unseen tokens, converts the parameters of a BERT-like model to a seq2seq one, and fine-tunes the seq2seq model with the newly generated vocabulary on the downstream task.

## For seq2seq pretrained models
### poattention
Preprocess the upstream and downstream data (refer to Fairseq for details). Binarized data and vocabularies will be stored in `data-bin`.
Move the seq2seq pretrained model (generated by Fairseq) to `./checkpoints` and rename it `checkpoint_last.pt`:

```bash
cp path_to_pretrained_model ./checkpoints/checkpoint_last.pt
```
Train the embedding generator:

```bash
pip install .; bash train.sh
```

Stop training when the model tends to converge.
### use_poattention
Preprocess the upstream and downstream data (refer to Fairseq for details). Binarized data and vocabularies will be stored in `data-bin`.
Get the mapping between the upstream and downstream vocabularies (a conceptual sketch follows this step):

```bash
python get_map_index.py
```

Note: please change the data name in `get_map_index.py`.
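For intuition only, here is a minimal sketch of what such a vocabulary mapping can look like, assuming Fairseq-style dictionary files (one `token count` pair per line); the paths and the `-1` convention for unseen tokens are assumptions, and the actual logic lives in `get_map_index.py`:

```python
# Minimal sketch of a vocabulary index map (not the actual get_map_index.py).
# Assumes Fairseq-style dictionaries with one "token count" pair per line;
# both paths below are placeholders.
def load_tokens(path):
    with open(path, encoding="utf-8") as f:
        return [line.split()[0] for line in f if line.strip()]

upstream_index = {
    tok: i for i, tok in enumerate(load_tokens("data-bin/upstream/dict.txt"))
}

# Tokens shared with the upstream vocabulary keep their pretrained embedding
# row; tokens unseen upstream are marked -1 so their embeddings can instead
# be produced by the embedding generator.
mapping = [
    upstream_index.get(tok, -1)
    for tok in load_tokens("data-bin/downstream/dict.txt")
]
```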
Move the well-trained embedding generator checkpoint (generated by `poattention`) to `./checkpoints` and rename it `checkpoint_last.pt`:

```bash
cp path_to_embedding_generator ./checkpoints/checkpoint_last.pt
```
Generate embeddings for unseen tokens and fine-tune the downstream model with the downstream vocabulary:

```bash
pip install .; bash train.sh
```
## For BERT-like pretrained models
### bert_poattention
Prepare the upstream data (plain text) at `./examples/language-modeling/data`.
Train the embedding generator:

```bash
pip install .
cd ./examples/language-modeling
bash train_mlm.sh
```
### bert_use_poattention
Preprocess the upstream and downstream data (refer to Fairseq for details). Binarized data and vocabularies will be stored in `data-bin`.

Note: sentences should be segmented with WordPiece; we suggest bert-vocab-builder for building the vocabulary of the downstream data. One possible way to pre-segment the text is sketched below.
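As one possibility (not part of this repository), raw text can be pre-segmented with a WordPiece tokenizer from HuggingFace Transformers; the tokenizer checkpoint and file names below are placeholders:

```python
# Pre-segment raw text into WordPiece units so that Fairseq preprocessing
# operates on WordPiece tokens; tokenizer choice and paths are placeholders.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
with open("train.raw", encoding="utf-8") as src, \
        open("train.wp", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(" ".join(tokenizer.tokenize(line.strip())) + "\n")
```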
Get the mapping between the upstream and downstream vocabularies:

```bash
python get_map_index.py
```

Note: please change the data name in `get_map_index.py`.
Generate embeddings for unseen tokens and fine-tune the downstream model with the downstream vocabulary:

```bash
pip install path_to_bert_poattention
pip install .; bash train.sh
```
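For intuition, the conversion mentioned above (BERT-like parameters into a seq2seq model) conceptually amounts to copying compatible parameter tensors into a Fairseq checkpoint under new names. A loose, hypothetical illustration, not the repository's actual conversion code; all paths and state-dict key names are assumptions:

```python
# Hypothetical sketch of moving BERT parameters into a Fairseq-style
# seq2seq checkpoint; key names assume a HuggingFace BertModel state dict
# and a Fairseq transformer encoder, and the file paths are placeholders.
import torch

bert_state = torch.load("pytorch_model.bin", map_location="cpu")
ckpt = torch.load("./checkpoints/checkpoint_last.pt", map_location="cpu")

# e.g. reuse BERT's (newly generated) token embeddings as the seq2seq
# encoder embeddings; Fairseq stores model weights under the "model" key.
ckpt["model"]["encoder.embed_tokens.weight"] = bert_state[
    "embeddings.word_embeddings.weight"
]

torch.save(ckpt, "./checkpoints/checkpoint_last.pt")
```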