BioERP: a biomedical heterogeneous network-based self-supervised representation learning approach for entity relationship predictions.
BioERP is tested to work under:
Download the source code of BERT.
Manually replace run_pretraining.py and run_classifier.py.
The network representation model and training regime in BioERP are similar to the original implementation described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". The network representation code can therefore be downloaded from https://github.com/google-research/bert. BERT, however, combines two pre-training tasks, i.e., masked language modeling and next sentence prediction. Unlike sentences in natural language, meta paths have no consecutive relationship, so BioERP omits the next sentence prediction task. To run BioERP, manually replace run_pretraining.py and run_classifier.py in BERT with the files provided in this repository.
Download the BERT-Base, Uncased model: 12-layer, 768-hidden, 12-heads.
You can construct a vocab file (vocab.txt) of nodes and modify the config file (bert_config.json) which specifies the hyperparameters of the model.
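As a rough illustration of the vocab step, the sketch below builds a vocab.txt from meta path samples. It assumes (this is not stated in the README) that each line of metapath.txt is one meta path with whitespace-separated node identifiers; `build_vocab` and `write_vocab` are hypothetical helper names. The special tokens listed first are the ones BERT's tokenizer and masking code expect to find in the vocabulary.

```python
# Hypothetical helpers: build a BERT-style vocab.txt from meta path samples.
# Assumption: one meta path per line, node identifiers separated by whitespace.

SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(metapath_lines, lower_case=True):
    """Return the special tokens followed by unique nodes in first-seen order."""
    seen = {}
    for line in metapath_lines:
        for node in line.split():
            # Mirror --do_lower_case=True so vocab entries match tokenized input.
            token = node.lower() if lower_case else node
            seen.setdefault(token, None)
    return SPECIAL_TOKENS + [t for t in seen if t not in SPECIAL_TOKENS]


def write_vocab(tokens, path):
    """Write one token per line, the layout BERT reads vocab.txt in."""
    with open(path, "w", encoding="utf-8") as handle:
        for token in tokens:
            handle.write(token + "\n")


if __name__ == "__main__":
    sample = ["drug1 protein7 drug3", "drug3 disease2 drug1"]
    vocab = build_vocab(sample)
    print(len(vocab), "tokens")
```

If you build the vocabulary this way, `vocab_size` in bert_config.json should be set to the number of lines in vocab.txt. Note also that BERT's basic tokenizer splits on punctuation, so node identifiers containing punctuation may need to be renamed first.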
Run create_pretraining_data.py to mask the meta path samples.
python create_pretraining_data.py \
  --input_file=~path/metapath.txt \
  --output_file=~path/tf_examples.tfrecord \
  --vocab_file=~path/uncased_L-12_H-768_A-12/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5
max_predictions_per_seq is the maximum number of masked-node predictions per meta path sample. masked_lm_prob is the probability that a token is masked.
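To make the interaction of the two parameters concrete, here is a simplified sketch of the masking step. `num_to_predict` follows the rule used in BERT's create_pretraining_data.py; `mask_path` is a hypothetical helper that always substitutes `[MASK]`, whereas BERT itself replaces a selected token with `[MASK]` 80% of the time, a random token 10% of the time, and leaves it unchanged 10% of the time.

```python
import random


def num_to_predict(num_tokens, masked_lm_prob=0.15, max_predictions_per_seq=20):
    """Number of masked positions: at least 1, at most max_predictions_per_seq."""
    return min(max_predictions_per_seq,
               max(1, int(round(num_tokens * masked_lm_prob))))


def mask_path(nodes, masked_lm_prob=0.15, max_predictions_per_seq=20, seed=12345):
    """Simplified masking of one meta path (hypothetical helper, [MASK] only)."""
    rng = random.Random(seed)
    tokens = ["[CLS]"] + list(nodes) + ["[SEP]"]
    # Only node positions are candidates; [CLS] and [SEP] are never masked.
    candidates = list(range(1, len(tokens) - 1))
    rng.shuffle(candidates)
    k = num_to_predict(len(tokens), masked_lm_prob, max_predictions_per_seq)
    chosen = sorted(candidates[:k])
    labels = {i: tokens[i] for i in chosen}  # position -> original node
    for i in chosen:
        tokens[i] = "[MASK]"
    return tokens, labels


if __name__ == "__main__":
    masked, labels = mask_path(["drug1", "protein7", "drug3", "disease2", "drug5"])
    print(masked)
    print(labels)
```

With the default settings a short meta path gets a single masked node, and the cap of 20 only takes effect for sequences of roughly 130 tokens or more.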
python run_pretraining.py \
  --input_file=~path/tf_examples.tfrecord \
  --output_dir=~path/Local_RLearing_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=~path/uncased_L-12_H-768_A-12/bert_config.json \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=20000 \
  --num_warmup_steps=10 \
  --learning_rate=2e-5
python run_classifier.py \
  --task_name=CoLA \
  --do_train=true \
  --do_eval=true \
  --data_dir=~path/all_path \
  --vocab_file=~path/vocab.txt \
  --bert_config_file=~path/bert_config.json \
  --max_seq_length=128 \
  --train_batch_size=256 \
  --learning_rate=2e-5 \
  --num_train_epochs=10 \
  --output_dir=~path/Global_RLearing_output
python extract_features.py \
  --input_file=~path/node.txt \
  --output_file=~path/output.jsonl \
  --vocab_file=~path/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=~path/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=~path/Local_RLearing_output/model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=7 \
  --batch_size=8
To extract features from the globally trained model instead, replace Local_RLearing_output with Global_RLearing_output in --init_checkpoint.
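The output.jsonl written by BERT's extract_features.py holds one JSON object per input line, with a `features` list containing a `token` and its per-layer `values`. The sketch below shows one possible way to turn that file into node vectors. It assumes that each line of node.txt contains a single node, so the node sits at position 1 after `[CLS]`, and it concatenates the requested layers; how BioERP itself combines the layers is not stated here, and `load_node_embeddings` is a hypothetical helper.

```python
import json


def load_node_embeddings(jsonl_path, token_index=1):
    """Map each node token to the concatenation of its extracted layer vectors.

    Assumption: one node per input line, so the node is at position 1,
    right after [CLS]. Layers are concatenated in the order -1, -2, -3, ...
    """
    embeddings = {}
    with open(jsonl_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            features = record["features"]
            if len(features) <= token_index:
                continue  # nothing at the expected position
            feature = features[token_index]
            # Sort so the last layer (-1) comes first, regardless of file order.
            layers = sorted(feature["layers"], key=lambda l: l["index"], reverse=True)
            vector = []
            for layer in layers:
                vector.extend(layer["values"])
            embeddings[feature["token"]] = vector
    return embeddings
```

Summing or averaging the layers instead of concatenating them is an equally plausible choice and keeps the vector at the hidden size.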
python TDI_NeoDTI.py
@article{BioERP2021,
title = {BioERP: biomedical heterogeneous network-based self-supervised representation learning approach for entity relationship predictions},
author = {Wang, Xiaoqi and Yang, Yaning and Li, Kenli and Li, Wentao and Li, Fei and Peng, Shaoliang},
journal = {Bioinformatics},
year = {2021},
doi = {10.1093/bioinformatics/btab565}
}
If you have any questions or comments, please feel free to email: xqw@hnu.edu.cn.