News
- [2021-10-12] PaddleNLP 2.1 has been officially released! 🎉 For more information, please refer to the Release Note.
Introduction
PaddleNLP is a powerful NLP library with awesome pre-trained Transformer models and an easy-to-use interface, supporting a wide range of NLP tasks from research to industrial applications.
Installation
Prerequisites
- python >= 3.6
- paddlepaddle >= 2.1
For more information about PaddlePaddle installation, please refer to PaddlePaddle's website.
Python pip Installation
pip install --upgrade paddlenlp
Easy-to-use API
Taskflow: Off-the-shelf Industrial NLP Pre-built Tasks
Taskflow aims to provide off-the-shelf, pre-built NLP tasks covering both NLU and NLG scenarios, with extremely fast inference that meets the demands of industrial applications.
from paddlenlp import Taskflow
# Chinese Word Segmentation
seg = Taskflow("word_segmentation")
seg("第十四届全运会在西安举办")  # "The 14th National Games are held in Xi'an"
>>> ['第十四届', '全运会', '在', '西安', '举办']
# POS Tagging
tag = Taskflow("pos_tagging")
tag("第十四届全运会在西安举办")  # "The 14th National Games are held in Xi'an"
>>> [('第十四届', 'm'), ('全运会', 'nz'), ('在', 'p'), ('西安', 'LOC'), ('举办', 'v')]
# Named Entity Recognition
ner = Taskflow("ner")
ner("《孤女》是2010年九州出版社出版的小说,作者是余兼羽")  # "Gu Nü (The Orphan Girl) is a novel published by Jiuzhou Press in 2010; its author is Yu Jianyu"
>>> [('《', 'w'), ('孤女', '作品类_实体'), ('》', 'w'), ('是', '肯定词'), ('2010年', '时间类'), ('九州出版社', '组织机构类'), ('出版', '场景事件'), ('的', '助词'), ('小说', '作品类_概念'), (',', 'w'), ('作者', '人物类_概念'), ('是', '肯定词'), ('余兼羽', '人物类_实体')]
# Dependency Parsing
ddp = Taskflow("dependency_parsing")
ddp("9月9日上午纳达尔在亚瑟·阿什球场击败俄罗斯球员梅德韦杰夫")  # "On the morning of September 9, Nadal defeated Russian player Medvedev at Arthur Ashe Stadium"
>>> [{'word': ['9月9日', '上午', '纳达尔', '在', '亚瑟·阿什球场', '击败', '俄罗斯', '球员', '梅德韦杰夫'], 'head': [2, 6, 6, 5, 6, 0, 8, 9, 6], 'deprel': ['ATT', 'ADV', 'SBV', 'MT', 'ADV', 'HED', 'ATT', 'ATT', 'VOB']}]
# Sentiment Analysis
senta = Taskflow("sentiment_analysis")
senta("这个产品用起来真的很流畅,我非常喜欢")  # "This product is really smooth to use, I like it very much"
>>> [{'text': '这个产品用起来真的很流畅,我非常喜欢', 'label': 'positive', 'score': 0.9938690066337585}]
For more usage, please refer to the Taskflow Docs.
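The dependency-parsing result above pairs each word with a `head` index and a `deprel` label. Reading `head` as 1-based positions into `word`, with 0 marking the root (an interpretation consistent with the example output, where `击败` "defeated" has head 0 and relation `HED`), the result can be turned into readable arcs with plain Python. This is a minimal sketch under that assumption; the helper name `to_arcs` is hypothetical and not part of Taskflow.

```python
from typing import Dict, List, Tuple


def to_arcs(parse: Dict[str, List]) -> List[Tuple[str, str, str]]:
    """Convert one dependency-parsing result into (dependent, relation, head) triples.

    Assumption: `head` holds 1-based indices into `word`, and 0 means ROOT.
    """
    words = parse["word"]
    arcs = []
    for word, head, rel in zip(words, parse["head"], parse["deprel"]):
        head_word = "ROOT" if head == 0 else words[head - 1]
        arcs.append((word, rel, head_word))
    return arcs


# The Taskflow output shown above, copied verbatim
result = [{'word': ['9月9日', '上午', '纳达尔', '在', '亚瑟·阿什球场', '击败', '俄罗斯', '球员', '梅德韦杰夫'],
           'head': [2, 6, 6, 5, 6, 0, 8, 9, 6],
           'deprel': ['ATT', 'ADV', 'SBV', 'MT', 'ADV', 'HED', 'ATT', 'ATT', 'VOB']}]

for dependent, rel, head_word in to_arcs(result[0]):
    print(f"{dependent} --{rel}--> {head_word}")
```

The first arc printed is `9月9日 --ATT--> 上午`, and `纳达尔 --SBV--> 击败` marks Nadal as the subject of the root verb.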
Transformer API: Awesome Pre-trained Model Ecosystem
We provide 30 network architectures and over 100 pretrained models. These include not only all of the SOTA models released by Baidu, such as ERNIE, PLATO and SKEP, but also most of the high-quality Chinese pretrained models developed by other organizations. Use AutoModel to download pretrained models of different architectures. We welcome all developers to contribute their Transformer models to PaddleNLP! 🤗
from paddlenlp.transformers import AutoModel, AutoModelForPretraining
ernie = AutoModel.from_pretrained('ernie-1.0')
ernie_gram = AutoModel.from_pretrained('ernie-gram-zh')
bert = AutoModel.from_pretrained('bert-wwm-chinese')
albert = AutoModel.from_pretrained('albert-chinese-tiny')
roberta = AutoModel.from_pretrained('roberta-wwm-ext')
electra = AutoModel.from_pretrained('chinese-electra-small')
gpt = AutoModelForPretraining.from_pretrained('gpt-cpm-large-cn')
PaddleNLP also provides a unified API experience for NLP tasks such as semantic representation, text classification, sentence matching, sequence labeling and question answering.
import paddle
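from paddlenlp.transformers import (
    AutoModel,
    AutoModelForQuestionAnswering,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
)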
tokenizer = AutoTokenizer.from_pretrained('ernie-1.0')
text = tokenizer('natural language understanding')
# Semantic Representation
model = AutoModel.from_pretrained('ernie-1.0')
# Wrap input_ids in an extra list so the model receives a batch of size 1
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
# Text Classification and Matching
model = AutoModelForSequenceClassification.from_pretrained('ernie-1.0')
# Sequence Labeling
model = AutoModelForTokenClassification.from_pretrained('ernie-1.0')
# Question Answering
model = AutoModelForQuestionAnswering.from_pretrained('ernie-1.0')
For more pretrained model usage, please refer to the Transformer API.
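In the example above, `text['input_ids']` is wrapped in an extra list before `paddle.to_tensor` so that the model receives a batch of one sentence. Batching several sentences additionally requires padding them to a common length. The pure-Python sketch below shows that step together with a 1/0 mask marking real tokens versus padding; the helper `pad_batch`, the default pad id 0 and the toy id lists are illustrative assumptions, so in practice take the pad id from the tokenizer's pad token rather than hard-coding it.

```python
from typing import List, Tuple


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token id lists to the length of the longest one in the batch.

    Returns the padded ids and a mask with 1 for real tokens and 0 for padding.
    Assumption: pad_id=0; use the tokenizer's actual pad token id instead.
    """
    if not batch:
        return [], []
    max_len = max(len(ids) for ids in batch)
    padded, mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        padded.append(ids + [pad_id] * n_pad)
        mask.append([1] * len(ids) + [0] * n_pad)
    return padded, mask


# Toy id lists standing in for tokenizer(...)['input_ids'] of two sentences
ids_a = [1, 17, 42, 2]
ids_b = [1, 99, 2]
input_ids, attention_mask = pad_batch([ids_a, ids_b])
print(input_ids)       # [[1, 17, 42, 2], [1, 99, 2, 0]]
print(attention_mask)  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

The padded `input_ids` list could then be passed to `paddle.to_tensor` in place of the single-sentence list.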
Dataset API: Abundant Dataset Integration and Quick Loading
from paddlenlp.datasets import load_dataset
train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])
For more Dataset API usage, please refer to the Dataset API.
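`load_dataset` returns one dataset per requested split, in the order given by `splits`, and each dataset can be iterated example by example. As a quick sanity check before training, you might look at how labels are distributed. The sketch below assumes each example is a dict with a `label` key, which should be verified against the actual `chnsenticorp` examples; it runs on toy records so it does not need PaddleNLP installed. The helper `label_distribution` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def label_distribution(examples: Iterable[Dict]) -> Dict[int, float]:
    """Return the fraction of examples carrying each label, sorted by label.

    Assumption: every example is a dict with a 'label' key.
    """
    counts = Counter(example["label"] for example in examples)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Toy records shaped like sentiment-classification examples (not real ChnSentiCorp data)
toy_ds = [
    {"text": "toy positive review 1", "label": 1},
    {"text": "toy negative review", "label": 0},
    {"text": "toy positive review 2", "label": 1},
    {"text": "toy positive review 3", "label": 1},
]
print(label_distribution(toy_ds))  # {0: 0.25, 1: 0.75}
```

With PaddleNLP installed, the same function could be applied to `train_ds`, provided its examples have the assumed shape.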
Embedding API: Quick Loading for Word Embedding
from paddlenlp.embeddings import TokenEmbedding
wordemb = TokenEmbedding("fasttext.wiki-news.target.word-word.dim300.en")
wordemb.cosine_sim("king", "queen")
>>> 0.77053076
wordemb.cosine_sim("apple", "rail")
>>> 0.29207364
For more TokenEmbedding usage, please refer to the Embedding API.
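As its name indicates, `cosine_sim` scores two words by the cosine of the angle between their embedding vectors, cos(u, v) = u·v / (‖u‖ ‖v‖), which ranges from -1 to 1; this is why related words such as "king" and "queen" score higher than "apple" and "rail". The standalone sketch below computes that quantity with the standard library only. The 3-dimensional vectors are made up for illustration (the fastText embedding above is 300-dimensional), and this is not PaddleNLP's own implementation.

```python
import math
from typing import Sequence


def cosine_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Made-up vectors, not real word embeddings
print(cosine_sim([3.0, 4.0, 0.0], [6.0, 8.0, 0.0]))    # same direction -> 1.0
print(cosine_sim([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))    # orthogonal -> 0.0
print(cosine_sim([3.0, 4.0, 0.0], [-3.0, -4.0, 0.0]))  # opposite -> -1.0
```

Because the score depends only on direction, scaling a vector leaves it unchanged, which makes it a common choice for comparing embeddings whose norms vary.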
More API Usage
Please find more API references in our readthedocs.
Wide-range NLP Task Support
PaddleNLP provides rich application examples covering mainstream NLP tasks to help developers accelerate problem solving.
NLP Basic Techniques
NLP Systems
NLP Extended Applications
Tutorials
Please refer to our official AI Studio account for more interactive tutorials: PaddleNLP on AI Studio
Special Interest Group (SIG)
You are welcome to join the PaddleNLP SIG and contribute datasets, models, toolkits and more.
Slack
To connect with other users and contributors, you are welcome to join our Slack channel.
QQ
Join our QQ Technical Group for technical exchange right now!
ChangeLog
For more details about our releases, please refer to the ChangeLog.
Acknowledgements
We have borrowed from the excellent pretrained-model design of Hugging Face's Transformers 🤗, and we would like to express our gratitude to the authors of Hugging Face and its open-source community.
License
PaddleNLP is provided under the Apache-2.0 License.