Are you sure you want to delete this task? Once this task is deleted, it cannot be recovered.
huwenxing 6cb26819af | 4 years ago | |
---|---|---|
THUCNews | 4 years ago | |
models | 4 years ago | |
LICENSE | 4 years ago | |
README.md | 4 years ago | |
run.py | 4 years ago | |
train_eval.py | 4 years ago | |
utils.py | 4 years ago | |
utils_fasttext.py | 4 years ago |
中文文本分类,TextCNN,TextRNN,FastText,TextRCNN,BiLSTM_Attention, DPCNN, Transformer, 基于pytorch,开箱即用。
模型介绍、数据流动过程:我的博客
数据以字为单位输入模型,预训练词向量使用 搜狗新闻 Word+Character 300d,点这里下载
python 3.7
pytorch 1.1
tqdm
sklearn
tensorboardX
我从THUCNews中抽取了20万条新闻标题,已上传至github,文本长度在20到30之间。一共10个类别,每类2万条。
类别:财经、房产、股票、教育、科技、社会、时政、体育、游戏、娱乐。
数据集划分:
数据集 | 数据量 |
---|---|
训练集 | 18万 |
验证集 | 1万 |
测试集 | 1万 |
python run.py --model TextCNN --word True
模型 | acc | 备注 |
---|---|---|
TextCNN | 91.22% | Kim 2014 经典的CNN文本分类 |
TextRNN | 91.12% | BiLSTM |
TextRNN_Att | 90.90% | BiLSTM+Attention |
TextRCNN | 91.54% | BiLSTM+池化 |
FastText | 92.23% | bow+bigram+trigram, 效果出奇的好 |
DPCNN | 91.25% | 深层金字塔CNN |
Transformer | 89.91% | 效果较差 |
bert | 94.83% | bert + fc |
ERNIE | 94.61% | 比bert略差(说好的中文碾压bert呢) |
bert和ERNIE模型代码我放到另外一个仓库了,传送门:Bert-Chinese-Text-Classification-Pytorch,后续还会搞一些bert之后的东西,欢迎star。
# 训练并测试:
# TextCNN
python run.py --model TextCNN
# TextRNN
python run.py --model TextRNN
# TextRNN_Att
python run.py --model TextRNN_Att
# TextRCNN
python run.py --model TextRCNN
# FastText, embedding层是随机初始化的
python run.py --model FastText --embedding random
# DPCNN
python run.py --model DPCNN
# Transformer
python run.py --model Transformer
模型都在models目录下,超参定义和模型定义在同一文件中。
[1] Convolutional Neural Networks for Sentence Classification
[2] Recurrent Neural Network for Text Classification with Multi-Task Learning
[3] Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification
[4] Recurrent Convolutional Neural Networks for Text Classification
[5] Bag of Tricks for Efficient Text Classification
[6] Deep Pyramid Convolutional Neural Networks for Text Categorization
[7] Attention Is All You Need
中文文本分类,TextCNN,TextRNN,FastText,TextRCNN,BiLSTM_Attention, DPCNN, Transformer, 基于pytorch,开箱即用。
Text Pickle Python
Dear OpenI User
Thank you for your continuous support to the Openl Qizhi Community AI Collaboration Platform. In order to protect your usage rights and ensure network security, we updated the Openl Qizhi Community AI Collaboration Platform Usage Agreement in January 2024. The updated agreement specifies that users are prohibited from using intranet penetration tools. After you click "Agree and continue", you can continue to use our services. Thank you for your cooperation and understanding.
For more agreement content, please refer to the《Openl Qizhi Community AI Collaboration Platform Usage Agreement》