Distantly supervised biomedical relation extraction using piecewise attentive convolutional neural network and reinforcement learning
This repository contains the source code, model file and dataset for the paper:
Tiantian Zhu, Yang Qin, Yang Xiang, Baotian Hu*, Qingcai Chen*, Weihua Peng. Distantly supervised biomedical relation extraction using piecewise attentive convolutional neural network and reinforcement learning. JAMIA.
Overview
Biomedical Relation Extraction (RE) is one of the most important tasks in bioinformatics and plays a significant role in biomedical information extraction. Most existing methods are trained in a supervised manner and rely on large-scale labeled data. To reduce the cost of data labeling, we propose PACNN+RL, a method trained with distant supervision, which generates labeled data automatically. The predictor is built from a Piecewise Attentive Convolutional Neural Network (PACNN) and a Reinforcement Learning (RL) agent.
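The "piecewise" idea comes from piecewise CNNs for distantly supervised RE (PCNN, Zeng et al., 2015): the convolution output of a sentence is split into three segments at the two entity positions and each segment is summarized separately. The sketch below shows only the standard piecewise max pooling step, assuming a (sequence length × filters) NumPy array. It is not the authors' PACNN code, whose "attentive" part is not reproduced here, and the function name `piecewise_max_pool` is hypothetical.

```python
import numpy as np


def piecewise_max_pool(conv_out, e1_pos, e2_pos):
    """Standard PCNN-style piecewise max pooling (illustrative sketch).

    conv_out: array of shape (seq_len, num_filters), the convolution output.
    e1_pos, e2_pos: token indices of the two entities, with e1_pos < e2_pos.
    Returns an array of shape (3 * num_filters,).
    """
    # Split the sentence into three segments at the two entity positions;
    # each entity token is kept in the segment that ends at it.
    segments = [
        conv_out[: e1_pos + 1],
        conv_out[e1_pos + 1 : e2_pos + 1],
        conv_out[e2_pos + 1 :],
    ]
    pooled = []
    for seg in segments:
        if seg.shape[0] == 0:
            # Empty segment (e.g. second entity is the last token): use zeros.
            pooled.append(np.zeros(conv_out.shape[1]))
        else:
            # Max over positions, separately for every filter.
            pooled.append(seg.max(axis=0))
    return np.concatenate(pooled)
```

Pooling each segment separately keeps coarse information about where a feature fired relative to the two entities, which a single global max pool would discard.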
Requirements
This repo was tested on Python 3.6.10 and TensorFlow 1.14.0. The main requirements are:
- python = 3.6.10
- tensorflow-gpu = 1.14.0
- CUDA = 10.0
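Because the code targets these specific versions, a quick check before training can save debugging time. The following is a minimal, hypothetical helper (`parse_version` and `check_versions` are not part of the repo) that compares installed versions against the tested ones. It assumes plain numeric version strings such as `1.14.0` and does not check the CUDA version.

```python
import sys


def parse_version(text):
    """Turn a plain version string such as '1.14.0' into (1, 14, 0)."""
    return tuple(int(part) for part in text.split(".")[:3])


def check_versions(found, required):
    """Return a list of (package, found_version, required_version) mismatches.

    found and required map package names to version strings; a package
    absent from found is reported with found_version=None.
    """
    mismatches = []
    for name, want in required.items():
        have = found.get(name)
        if have is None or parse_version(have) != parse_version(want):
            mismatches.append((name, have, want))
    return mismatches


if __name__ == "__main__":
    required = {"python": "3.6.10", "tensorflow": "1.14.0"}
    found = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        # The tensorflow-gpu package is also imported as `tensorflow`.
        import tensorflow as tf
        found["tensorflow"] = tf.__version__
    except ImportError:
        pass  # reported below as a missing package
    for name, have, want in check_versions(found, required):
        print("{}: found {}, tested with {}".format(name, have, want))
```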
Datasets
- [May-prevent]
- [May-treat]
- [DDI]
- [PPI]
- [PRETRAINED_FILE]
Usage
- Get the pre-trained word embeddings
  Download the pre-trained word embedding files and decompress them under data/medicine/.
- Restore the weight files and fine-tune the model
  Put the trained model files under the root directory, then specify the trained model path in train.py.
- Prepare the training data
  Put the datasets under data/medicine/, then specify the training and test data file names when launching training.
- Train the model
  python train.py data/medicine seed train_data_name test_data_name
  Here seed is the random seed used to initialize the model parameters, e.g. 1234.
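The setup steps above boil down to three inputs for training: the positional command-line arguments, a random seed, and the pre-trained embeddings. The sketch below illustrates each under stated assumptions; it is not the repo's train.py. The helper names `parse_args`, `set_seed` and `load_text_embeddings` are hypothetical, and the embedding loader assumes a whitespace-separated text format (`word v1 v2 ... vd`), which may not match the actual PRETRAINED_FILE.

```python
import random

import numpy as np


def parse_args(argv):
    """Split `train.py <data_dir> <seed> <train_data_name> <test_data_name>`."""
    if len(argv) != 5:
        raise SystemExit(
            "usage: python train.py data_dir seed train_data_name test_data_name")
    _, data_dir, seed, train_name, test_name = argv
    return data_dir, int(seed), train_name, test_name


def set_seed(seed):
    """Seed Python's and NumPy's RNGs so initialization is repeatable.

    Under TensorFlow 1.14 one would also call tf.set_random_seed(seed);
    that call is omitted so this sketch runs without TensorFlow.
    """
    random.seed(seed)
    np.random.seed(seed)


def load_text_embeddings(lines):
    """Parse embedding lines of the form 'word v1 v2 ... vd'.

    ASSUMPTION: the embedding file uses this text format. Returns a
    word -> row index dict and a float32 matrix of shape (vocab, d).
    """
    vocab, vectors = {}, []
    for line in lines:
        parts = line.rstrip().split(" ")
        # Skip blank lines and a word2vec-style "count dim" header line.
        if len(parts) < 3:
            continue
        # Keep the first vector if a word appears more than once.
        if parts[0] in vocab:
            continue
        vocab[parts[0]] = len(vectors)
        vectors.append([float(x) for x in parts[1:]])
    return vocab, np.asarray(vectors, dtype=np.float32)
```

For example, `parse_args(sys.argv)` on `python train.py data/medicine 1234 train.txt test.txt` (hypothetical file names) yields `("data/medicine", 1234, "train.txt", "test.txt")`, and calling `set_seed(1234)` before building the model makes repeated runs start from the same initialization.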