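configs/ -- configuration files
data/ -- datasets and external knowledge data
data_processing/ -- data processing code
embeds/ -- word embeddings
jena/ -- Jena, used to load and parse the MeSH triple data for CDR
OpenKE/ -- trains the knowledge graph embeddings
networks/ -- main network architecture code; the key files are listed below
| layers/ -- code for the network layers
\ walks.py -- walk aggregation layer, the core of EoG
\ modules.py -- word embedding layer, local information encoding layer, and global information encoding layer
\ loss.py -- Soft F-Measure loss function
\ attention.py -- Argument Based Attention, needed for the MM edge representation
\ gcn.py -- GCN, a node-oriented graph neural network used for comparison
\ gat.py -- GAT, a node-oriented graph neural network used for comparison
\ trainer.py -- flow control for data processing, model construction, training, validation, and prediction
\ baseline.py -- the original EoG, with an improved SS edge representation and a fix for this [bug](https://github.com/fenchri/edge-oriented-graph/issues/12)
\ nog.py -- comparison model using the node-oriented graph neural networks GCN and GAT
\ baseline_fm.py -- adds the Soft F-Measure loss
\ baseline_fm_doc_sec_by_enc.py -- can add document nodes and section nodes, depending on the configuration file
\ baseline_fm_doc_sec_ne_by_enc_nonmissing.py -- adds structural knowledge; the configuration file selects +KS/TransE or +KS/RESCAL
\ baseline_fm_doc_sec_ne_by_enc_cat.py -- comparison model that adds structural knowledge by concatenation, +Cat/TransE
\ baseline_fm_doc_sec_ne_desc_by_max_nonmissing.py -- +KS/TransE+KD/Emb
\ baseline_fm_doc_sec_ne_desc_by_enc4_nonmissing.py -- +KS/TransE+KD/Enc
\ baseline_fm_doc_sec_ne_desc_by_enc_nonmissing_d2v -- +KS/TransE+KD/D2V
\ ... the remaining code can be ignored
scripts/ -- run scripts
dataset.py -- loads the whole dataset
evaluate.py -- evaluates result files
loader.py -- serves the data as mini-batches
reader.py -- reads the formatted data from the dataset
run.py -- entry point
utils.py -- utility code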
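As a rough illustration of the idea behind the Soft F-Measure loss listed for loss.py, the sketch here replaces the hard TP/FP/FN counts with probability-weighted sums so that the F-measure becomes differentiable. This is an assumption about the general technique, not a copy of loss.py; the function name `soft_f_measure_loss` and its parameters `beta` and `eps` are hypothetical, and the repository's version may differ in formulation and operates on tensors rather than plain lists.

```python
def soft_f_measure_loss(probs, labels, beta=1.0, eps=1e-8):
    """Return 1 - soft F_beta for predicted probabilities and 0/1 labels.

    Illustrative only: a plain-Python sketch, not the repository's loss.py.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    # Probability-weighted ("soft") counts replace the hard TP/FP/FN counts.
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    b2 = beta * beta
    # eps keeps the denominator non-zero when there are no positives at all.
    f_beta = (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp + eps)
    return 1.0 - f_beta


if __name__ == "__main__":
    # Uncertain predictions on one positive and one negative example.
    print(soft_f_measure_loss([0.5, 0.5], [1, 0]))
```

Minimizing this value pushes predicted probabilities toward the labels while optimizing the F-measure directly, which is the metric usually reported for relation extraction.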
All experiment results are in the outputs directory at 10.249.145.243:/disk3/lt/Coding/G4RE.
Result files are named by timestamp; the corresponding records are at:
https://docs.qq.com/sheet/DYmhuRWJ4VlR5ZExj?tab=BB08J2
Clone OpenKE:
git clone https://github.com/thunlp/OpenKE
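Virtual environment used
After extracting it, place it in the envs directory of the Anaconda installation; the executable paths inside it must be replaced before it will work.
It can also be found in /home/lt/anaconda3/envs on 111, or in the corresponding directory on 247.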
cd data/MESH
wget ftp://ftp.nlm.nih.gov/online/mesh/rdf/2017/mesh2017.nt.gz
wget ftp://ftp.nlm.nih.gov/online/mesh/rdf/2015/mesh2015.nt.gz
gunzip mesh2017.nt.gz
gunzip mesh2015.nt.gz
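Use the SPARQL interface of Jena Fuseki to query the data in mesh.nt and obtain the triples of the MeSH knowledge graph, which are used to train the knowledge graph embeddings.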
cd jena/apache-jena-3.13.0
bash main.sh
cd ..
cd apache-jena-fuseki-3.13.0
bash main.sh
cd jena
python connect_medic_to_mesh.py
cd -
cd data/MESH
python create_trans_data.py
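For readers unfamiliar with the SPARQL step, the following minimal sketch shows how triples for one predicate could be fetched from a Fuseki endpoint with the Python standard library. Everything specific in it is an assumption: the endpoint URL (Fuseki's default port 3030 and a dataset named `mesh`), the helper names, and the placeholder predicate URI. It is not the logic of connect_medic_to_mesh.py or create_trans_data.py.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Fuseki runs locally on its default port 3030 and the dataset is
# named "mesh"; adjust this to match the actual Fuseki configuration.
ENDPOINT = "http://localhost:3030/mesh/sparql"


def build_query(predicate_uri, limit=10):
    """Build a SELECT query returning subject/object pairs for one predicate."""
    return "SELECT ?s ?o WHERE { ?s <%s> ?o } LIMIT %d" % (predicate_uri, limit)


def parse_bindings(result, predicate_uri):
    """Turn a SPARQL JSON result document into (head, relation, tail) triples."""
    return [
        (b["s"]["value"], predicate_uri, b["o"]["value"])
        for b in result["results"]["bindings"]
    ]


def fetch_triples(predicate_uri, limit=10, endpoint=ENDPOINT):
    """POST the query to the endpoint and return the parsed triples."""
    body = urllib.parse.urlencode({"query": build_query(predicate_uri, limit)})
    request = urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response), predicate_uri)


if __name__ == "__main__":
    # Hypothetical predicate URI; replace it with a predicate of interest.
    for triple in fetch_triples("http://example.org/vocab#relatedTo", limit=5):
        print(triple)
```

Keeping query construction and result parsing separate from the network call makes both easy to check without a running server.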
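The MeSH knowledge graph is pruned at this step. Three relation types are extracted, with the following predicates:
Three entity types are extracted: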
conda activate openke
python data/MESH/create_trans_data.py
cd OpenKE
python train_cdr_transe.py # knowledge graph embeddings learned with TransE
python train_cdr_rescal.py # knowledge graph embeddings learned with RESCAL
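To make the TransE option more concrete: TransE models a triple (h, r, t) as a translation, so that h + r should land close to t, and it trains with a margin-based ranking loss against corrupted triples. The sketch here illustrates that scoring idea in plain Python; it is not OpenKE's implementation, and the function names and the L1 distance choice are assumptions made for illustration.

```python
def transe_score(head, relation, tail):
    """L1 distance ||h + r - t||_1; a lower score means a more plausible triple."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin ranking loss: zero once the true triple beats the corrupted one by `margin`."""
    return max(0.0, margin + pos_score - neg_score)


if __name__ == "__main__":
    h, r = [1.0, 0.0], [0.0, 1.0]
    true_tail, corrupted_tail = [1.0, 1.0], [0.0, 0.0]
    pos = transe_score(h, r, true_tail)
    neg = transe_score(h, r, corrupted_tail)
    print(pos, neg, margin_loss(pos, neg))
```

RESCAL, the other option, scores triples with a bilinear form (one matrix per relation) instead of a translation vector.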
cd data/CTD
gunzip CTD_chemicals.csv.gz
gunzip CTD_diseases.csv.gz
python merge_ctd.py # merge the Chemicals and Diseases descriptions so they can be used together
cd data_processing
bash process_cdr.sh
python statistics.py --data ../data/CDR/processed/train.data
python statistics.py --data ../data/CDR/processed/dev.data
python statistics.py --data ../data/CDR/processed/test.data
# Process the descriptive knowledge
python description2pkl.py ../data/CTD/CTD.csv ../data/CTD/CTD.pkl
# Train Doc2Vec
python desc_doc2vec.py
python get_doc2vec_pkl.py
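Process the CHR dataset by removing self-relations, then download the triples formed by chemical entities in the Biochem4j graph, which are used to train the knowledge graph embeddings.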
cd data/CHR
python process_chr.py
cd data/BioChem4j
python download.py
python create_trans_data.py
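The self-relation removal applied to CHR can be pictured with the short sketch here. It assumes that a self-relation is a candidate pair whose two arguments are the same entity, and that each record is an `(arg1, arg2, label)` tuple; both the record layout and the function name are hypothetical, and process_chr.py may define and handle this differently.

```python
def remove_self_relations(pairs):
    """Keep only (arg1, arg2, label) records whose two arguments differ."""
    return [pair for pair in pairs if pair[0] != pair[1]]


if __name__ == "__main__":
    # Made-up entity identifiers, for illustration only.
    records = [
        ("chem_a", "chem_b", "react"),
        ("chem_a", "chem_a", "react"),  # self-relation, dropped
        ("chem_c", "chem_a", "no_relation"),
    ]
    print(remove_self_relations(records))
```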
Relation types kept in BioChem4j after pruning:
The only entity type is Chemicals.
conda activate openke
python data/BioChem4j/create_trans_data.py
cd OpenKE
conda activate openke
python train_chr_transe.py # knowledge graph embeddings learned with TransE
python train_chr_rescal.py # knowledge graph embeddings learned with RESCAL
cd data_processing
bash process_chr.sh
python statistics.py --data ../data/CHR/processed/train.data
python statistics.py --data ../data/CHR/processed/dev.data
python statistics.py --data ../data/CHR/processed/test.data
Configuration files are in the configs/ directory.
Run scripts are in the scripts/ directory.
To run, for example:
bash scripts/cdr/run_baseline_fm_doc_sec_ne_desc.sh
bash scripts/cdr/run_evaluate.sh
Data processing was mainly done in 219.223.251.111:/home/lt/Coding/G4RE.
Experiments were mainly run in 10.249.145.243:/disk3/lt/Coding/G4RE.
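The original code comes from https://github.com/fenchri/edge-oriented-graph. Reading the corresponding paper, "Connecting the Dots: Document-level Neural Relation Extraction with Edge-oriented Graphs", is recommended for a better understanding of this code.
The data preprocessing part is retained here, but all of the network code was rewritten, and the CHR dataset and the preprocessing of external knowledge were added.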
T. Li, W. Peng, Q. Chen, X. Wang and B. Tang, "KEoG: A knowledge-aware edge-oriented graph neural network for document-level relation extraction," 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2020, pp. 1740-1747, doi: 10.1109/BIBM49941.2020.9313590.
Document-level named entity relation extraction
Apache-1.0