Code for mining adverse drug reactions (ADRs)
Python 3.6
TensorFlow 1.12
**PsyTAR dataset**

Split | Size |
---|---|
Training set | 1533 |
Test set | 326 |

NER results (SAC lambda = 0.1)

Model | P | R | F | Run script |
---|---|---|---|---|
BiLSTM+CRF | 0.6489 | 0.7007 | 0.6738 | /home/cs/NCRF-SAC/model/NCRFSAC/predict_bykt_v1.sh |
BERT+CRF | 0.6454 | 0.7913 | 0.7170 | /home/cs/bert/run_bykt_ner_psytar.sh |
BiLSTM+CRF+SAC | 0.6598 | 0.7072 | 0.6826 | /home/cs/NCRF-SAC/model/NCRFSAC/predict_bykt_v2.sh |
BERT+CRF+SAC(0.1) | 0.6533 | 0.8136 | 0.7247 | /home/cs/bert/run_bykt_ner_sac_psytar.sh |
BiLSTM+CRF+SAC+Drug_Emb | 0.6375 | 0.7425 | 0.6860 | /home/cs/NCRF-SAC/model/NCRFSAC/predict_bykt_v3.sh |
BERT+CRF+SAC+Drug_Emb | 0.6592 | 0.8123 | 0.7278 | /home/cs/bert/run_bykt_ner_sac_de_psytar.sh |
BERT+MRC | 0.7427 | 0.7320 | 0.7373 | /home/cs/bert/run_ner_qa.sh |
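
As a quick sanity check on rows like the ones above, F can be recomputed from P and R. The sketch below assumes the F column is the standard F1 (harmonic mean of precision and recall); the function name `f1_score` is illustrative and not part of the repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# BiLSTM+CRF row of the PsyTAR table: P = 0.6489, R = 0.7007
print(round(f1_score(0.6489, 0.7007), 4))  # 0.6738
```
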
**CADEC dataset**

Split | Size |
---|---|
Training set | 5825 |
Test set | 1696 |

NER results (SAC lambda = 0.1)

Model | P | R | F | Run script |
---|---|---|---|---|
BiLSTM+CRF | 0.6399 | 0.6909 | 0.6645 | predict_bykt_v4.sh |
BERT+CRF | 0.6555 | 0.7192 | 0.6858 | run_bykt_ner_cadec.sh |
BiLSTM+CRF+SAC | 0.6647 | 0.6791 | 0.6719 | predict_bykt_v5.sh |
BERT+CRF+SAC(0.1) | 0.6519 | 0.7417 | 0.6939 | run_bykt_ner_sac_cadec.sh |
BiLSTM+CRF+SAC+Drug_Emb | 0.6685 | 0.6988 | 0.6833 | predict_bykt_v6.sh |
BERT+CRF+SAC+Drug_Emb | 0.6527 | 0.7561 | 0.7006 | run_bykt_ner_sac_de_cadec.sh |
BERT+MRC | 0.7415 | 0.6870 | 0.7132 | run_ner_qa.sh |
Candidate generation, using similarity scores computed with several statistical methods: http://10.249.40.248:8888/notebooks/Work2/bykt/dataset_process_src/process_adr2mdr.ipynb

Dataset | Training set | Test set |
---|---|---|
cadec | 2137 | 535 |
psytar | 1933 | 483 |
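
The candidate-generation step can be pictured roughly as follows. This is only a sketch: the README does not say which statistical similarity measures the notebook uses, so token-level Jaccard overlap and the character-level ratio from `difflib` are stand-ins, and the function names and toy concept list are hypothetical.

```python
from difflib import SequenceMatcher


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap between two phrases."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def char_ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def recall_candidates(mention, concepts, k=10):
    """Return the k concept names most similar to the mention.

    Assumption: the score is the maximum of the two stand-in measures.
    """
    scored = [(max(jaccard(mention, c), char_ratio(mention, c)), c) for c in concepts]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by name
    return [c for _, c in scored[:k]]


# Toy concept names, for illustration only.
concepts = ["headache", "nausea", "weight gain", "insomnia", "dry mouth"]
print(recall_candidates("gained weight", concepts, k=2))
```

The `k` argument plays the role of the recall size (10, 30, 50, 100) used in the ranking experiments.
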
Classification-based
cadec /home/cs/bert/run_multicall_bykt_cadec.sh
psytar /home/cs/bert/run_multicall_bykt_psytar.sh
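Dataset | Accuracy | Run script | Notes |
---|---|---|---|
cadec | 0.6018 | /home/cs/bert/run_multicall_bykt_cadec.sh | In the .py file, comment out line 271 and use line 272; comment out line 239 and use line 240 |
psytar | 0.7143 | /home/cs/bert/run_multicall_bykt_psytar.sh | In the .py file, comment out line 239 and use line 240; comment out line 271 and use line 272 |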
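Ranking-based
Ranking builds on the multi-class classification output, so run the multi-class step first when testing.

Recall size | cadec | Notes | psytar | Notes |
---|---|---|---|---|
10 | 0.697196261682243 | Step 1: run_multicall_bykt_cadec.sh; in the .py file, comment out line 272 and use line 271; comment out line 240 and use line 239. Step 2: in run_rank_bykt_cadec.py, comment out line 243 and use line 244; comment out lines 736-765 and use lines 767-798; threshold 0.001 | 0.7577639751552795 | Step 1: run_multicall_bykt_cadec.sh; in the .py file, comment out line 272 and use line 271; comment out line 240 and use line 239. Step 2: in run_rank_bykt_cadec.py, comment out line 243 and use line 244; comment out lines 736-765 and use lines 767-798 |
30 | 0.6953271028037383 | Same as above | 0.7701863354037267 | Same as above |
50 | 0.7046728971962617 | Same as above | 0.7639751552795031 | Same as above |
100 | 0.7009345794392523 | Same as above | 0.7577639751552795 | Same as above |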
Run script:
cadec: run_rank_bykt_cadec.sh
Adjust the command-line arguments and the threshold inside the Python file.
CUDA_VISIBLE_DEVICES=3 python run_rank_bykt_cadec.py \
--task_name=match \
--do_train=False \
--do_eval=True \
--do_predict=True \
--data_dir=/home/cs/bykt/dataset_process_src/ \
--vocab_file=uncased_L-12_H-768_A-12/vocab.txt \
--bert_config_file=uncased_L-12_H-768_A-12/bert_config.json \
--init_checkpoint=uncased_L-12_H-768_A-12/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=128 \
--eval_batch_size=128 \
--predict_batch_size=128 \
--learning_rate=3e-5 \
--num_train_epochs=10 \
--recall_nums=100 \
--output_dir=./output/bykt_rank_cadec_v100/
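
Since the same command is run for several recall sizes, one option is to generate it rather than edit it by hand. The helper below is a hypothetical sketch, not part of the repository: `build_rank_command` is an invented name, and it assumes the output directory always follows the `bykt_rank_cadec_v<recall_nums>` pattern seen for 100. The threshold inside the Python file still has to be changed manually.

```python
def build_rank_command(recall_nums, gpu="3"):
    """Return (env, argv) for one evaluation run of run_rank_bykt_cadec.py."""
    bert_dir = "uncased_L-12_H-768_A-12"
    env = {"CUDA_VISIBLE_DEVICES": gpu}
    argv = [
        "python", "run_rank_bykt_cadec.py",
        "--task_name=match",
        "--do_train=False",
        "--do_eval=True",
        "--do_predict=True",
        "--data_dir=/home/cs/bykt/dataset_process_src/",
        f"--vocab_file={bert_dir}/vocab.txt",
        f"--bert_config_file={bert_dir}/bert_config.json",
        f"--init_checkpoint={bert_dir}/bert_model.ckpt",
        "--max_seq_length=128",
        "--train_batch_size=128",
        "--eval_batch_size=128",
        "--predict_batch_size=128",
        "--learning_rate=3e-5",
        "--num_train_epochs=10",
        f"--recall_nums={recall_nums}",
        # Assumption: the output directory follows the v<recall_nums> pattern.
        f"--output_dir=./output/bykt_rank_cadec_v{recall_nums}/",
    ]
    return env, argv


# Print the command line for each recall size used in the table above.
for n in (10, 30, 50, 100):
    env, argv = build_rank_command(n)
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
```

To launch a run instead of printing it, the pair can be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)`.
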
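The ranking pipeline first needs the corresponding candidates recalled for the test set.
Pipeline-based normalization results
Run procedure:
1) Classification-based method:
First output the normalization results: cadec: run_multicall_bykt_cadec.sh
The file is saved to output/multiclass_cadec/multi_cadec_pred.txt
Then run the CADEC joint (end-to-end) evaluation.
2) Ranking-based method:
First output the normalization prediction table: cadec: run_rank_bykt_cadec.sh
Results are saved under ./output/bykt_rank_cadec_v100/
output/bykt_rank_cadec_v100/rank_v100_cadec_pred.txt
Then run the joint evaluation code.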
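
The joint evaluation can be thought of along the following lines. The contents of the evaluation notebook are not shown here, so this is a sketch under an assumed criterion: a prediction counts as correct only when both the extracted span and the normalized concept match the gold annotation. The tuple layout and the name `pipeline_prf` are illustrative.

```python
def pipeline_prf(gold, pred):
    """Precision, recall and F1 over (sentence_id, start, end, concept_id) tuples.

    Assumption: strict matching, so a wrong span or a wrong concept is an error.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # items where both span and concept are correct
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy data: one fully correct item, one wrong concept, one wrong span boundary.
gold = {(0, 3, 5, "C1"), (0, 9, 10, "C2"), (1, 0, 2, "C3")}
pred = {(0, 3, 5, "C1"), (0, 9, 10, "C9"), (1, 0, 1, "C3")}
print(pipeline_prf(gold, pred))
```

Under this criterion NER errors propagate into the final score, which is in line with the pipeline F values below being lower than the standalone NER results.
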
CADEC
Model | P | R | F | Notes |
---|---|---|---|---|
multiclass_new | 0.611810261374637 | 0.5668161434977579 | 0.5884543761638734 | |
rank_v10 | 0.6389157792836399 | 0.5919282511210763 | 0.6145251396648045 | http://localhost:8888/notebooks/Desktop/%E8%AF%BE%E9%A2%98/bykt/dataset_process_src/cadec_liantiao/evaluate.ipynb |
rank_v30 | 0.6389157792836399 | 0.5919282511210763 | 0.6145251396648045 | Same as above |
rank_v50 | 0.6418199419167473 | 0.5946188340807175 | 0.61731843575419 | Same as above |
rank_v100 | 0.6369796708615683 | 0.590134529147982 | 0.6126629422718808 | Same as above |
PSYTAR
Model | P | R | F |
---|---|---|---|
multiclass_new | 0.6326259946949602 | 0.6235294117647059 | 0.6280447662936143 |
rank_v10 | 0.6538461538461539 | 0.6444444444444445 | 0.6491112574061884 |
rank_v30 | 0.6631299734748011 | 0.6535947712418301 | 0.6583278472679394 |
rank_v50 | 0.6551724137931034 | 0.6457516339869281 | 0.6504279131007241 |
rank_v100 | 0.6525198938992043 | 0.6431372549019608 | 0.6477946017116525 |
Project: adverse drug reaction extraction
Languages: Python, Shell
License: Apache-1.0