DeepR2cov is a deep representation learning method on heterogeneous drug networks for discovering potential agents to treat the excessive inflammatory response in COVID-19 patients.
DeepR2cov is tested to work under:
Download the source code of BERT.
Manually replace run_pretraining.py in BERT with the version from this repository.
The network representation model and training regime in DeepR2cov are similar to the original implementation described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", so the network representation code can be downloaded from https://github.com/google-research/bert. BERT is pre-trained on a combination of two tasks, i.e., masked language modeling and next sentence prediction (classifying whether two sentences are consecutive). Unlike sentences in natural language, meta paths have no consecutive relationship, so DeepR2cov does not use the next-sentence task. To run DeepR2cov, manually replace run_pretraining.py in BERT with the file provided in this repository.
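As a rough illustration of what dropping the next-sentence task means for the objective, the sketch below computes a masked-prediction loss in plain Python. It is not the repository's run_pretraining.py; the function names `masked_lm_loss` and `pretraining_loss` are hypothetical, and the weighted-mean form is an assumption modeled on how the upstream BERT code averages over masked positions.

```python
import math


def masked_lm_loss(log_probs, label_ids, label_weights):
    """Weighted mean negative log-likelihood over masked positions.

    log_probs: one list of per-node log-probabilities for each masked slot.
    label_ids: index of the true node for each slot.
    label_weights: 1.0 for real predictions, 0.0 for padding slots.
    """
    numerator = 0.0
    denominator = 0.0
    for slot_log_probs, label, weight in zip(log_probs, label_ids, label_weights):
        numerator += weight * -slot_log_probs[label]
        denominator += weight
    # Small epsilon avoids division by zero when every slot is padding.
    return numerator / (denominator + 1e-5)


def pretraining_loss(log_probs, label_ids, label_weights):
    # Assumption for illustration: the total loss is the masked-node term only,
    # with no next-sentence classification term added.
    return masked_lm_loss(log_probs, label_ids, label_weights)
```

With a uniform prediction over four nodes, the loss for the non-padding slots is close to ln(4), which is a quick sanity check for the averaging.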
Download the BERT-Base, Uncased model: 12-layer, 768-hidden, 12-heads.
You can construct a vocab file (vocab.txt) of nodes and modify the config file (bert_config.json) which specifies the hyperparameters of the model.
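One possible way to build the node vocabulary is sketched below. It assumes each line of the metapath file is a whitespace-separated sequence of node identifiers, which the README does not state, and the helper names `build_vocab` and `write_vocab` are hypothetical. The five special tokens are the ones the BERT tokenizer and masking code expect; `[PAD]` is placed first because BERT pads inputs with id 0.

```python
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(metapath_lines, lowercase=True):
    """Return special tokens followed by unique node names in first-seen order."""
    vocab = list(SPECIAL_TOKENS)
    seen = set(vocab)
    for line in metapath_lines:
        for node in line.split():
            # Lowercase to match --do_lower_case=True in the commands of this README.
            token = node.lower() if lowercase else node
            if token not in seen:
                seen.add(token)
                vocab.append(token)
    return vocab


def write_vocab(vocab, path):
    """Write one token per line, the layout BERT reads vocab.txt in."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(vocab) + "\n")
```

If you build the vocabulary this way, `vocab_size` in bert_config.json should be at least the number of lines in vocab.txt. Note also that the BERT tokenizer splits on punctuation, so node identifiers containing characters such as `_` or `:` may be broken into several tokens.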
Run create_pretraining_data.py to mask the metapath samples.
python create_pretraining_data.py \
  --input_file=../example_metapath.txt \
  --output_file=../tf_examples.tfrecord \
  --vocab_file=../uncased_L-12_H-768_A-12/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5
max_predictions_per_seq is the maximum number of masked predictions per metapath sample, and masked_lm_prob is the probability that a token is masked.
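The two flags interact: in the upstream BERT create_pretraining_data.py, the number of masked positions in a sample is, as far as we can tell, the rounded product of sequence length and masked_lm_prob, at least 1 and capped at max_predictions_per_seq. The sketch below reproduces that arithmetic; the function name `num_masked_positions` is hypothetical.

```python
def num_masked_positions(num_tokens, masked_lm_prob, max_predictions_per_seq):
    """How many tokens of one sample get masked, following the upstream rule."""
    wanted = int(round(num_tokens * masked_lm_prob))
    # At least one position is masked, and never more than the cap.
    return min(max_predictions_per_seq, max(1, wanted))
```

With the values used above (128 tokens, probability 0.15, cap 20), a full-length sample gets 19 masked positions, so the cap of 20 is just above the expected count and is not a binding limit.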
python run_pretraining.py \
  --input_file=../tf_examples.tfrecord \
  --output_dir=../RLearing_output \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=../uncased_L-12_H-768_A-12/bert_config.json \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=20 \
  --num_warmup_steps=10 \
  --learning_rate=2e-5
python extract_features.py \
  --input_file=../node.txt \
  --output_file=../output.jsonl \
  --vocab_file=../uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=../uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=../RLearing_output/bert_model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8
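The resulting output.jsonl can be turned into one vector per node with a few lines of Python. The sketch below assumes the JSONL layout written by the upstream BERT extract_features.py (a `features` list per line, each entry holding a `token` and a list of `layers` with `values`) and that node.txt contains one node per line. Averaging the exported layers is an illustrative choice, not necessarily what DeepR2cov does, and `node_embeddings` is a hypothetical helper name.

```python
import json

SPECIAL = {"[CLS]", "[SEP]"}


def node_embeddings(jsonl_lines):
    """Map each node token to the element-wise mean of its exported layers."""
    embeddings = {}
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for feature in record["features"]:
            token = feature["token"]
            if token in SPECIAL:
                continue  # skip the markers BERT adds around every input
            layers = [layer["values"] for layer in feature["layers"]]
            # zip(*layers) walks the layers dimension by dimension.
            embeddings[token] = [sum(column) / len(layers) for column in zip(*layers)]
    return embeddings
```

A typical call would be `node_embeddings(open("../output.jsonl", encoding="utf-8"))`, which streams the file line by line instead of loading it whole.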
python PDI_drug_cov.py
python top_rank.py
@article{DeepR2cov2021,
title = {DeepR2cov: deep representation learning on heterogeneous drug networks to discover anti-inflammatory agents for COVID-19},
author = {Wang, Xiaoqi and Xin, Bin and Tan, Weihong and Xu, Zhijian and Li, Kenli and Li, Fei and Zhong, Wu and Peng, Shaoliang},
journal = {Briefings in Bioinformatics},
year = {2021},
doi = {10.1093/bib/bbab226}
}
If you have any questions or comments, please feel free to email: xqw@hnu.edu.cn.