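PaddleNLP 2.0 offers a model zoo covering many scenarios, a concise and easy-to-use end-to-end API, and high-performance distributed training that unifies dynamic and static graphs. It aims to make text modeling more efficient for PaddlePaddle developers and to provide NLP best practices built on PaddlePaddle 2.0.
A model zoo covering many scenarios
A concise, easy-to-use end-to-end API
High-performance distributed training with unified dynamic and static graphs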
pip install --upgrade paddlenlp -i https://pypi.org/simple
To try the latest version, install from source with one of the following commands; both GitHub and Gitee are supported.
pip install --upgrade git+https://github.com/PaddlePaddle/PaddleNLP.git
pip install --upgrade git+https://gitee.com/PaddlePaddle/PaddleNLP.git
For detailed instructions on installing PaddlePaddle and PaddleNLP, see Installation.
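After running one of the install commands, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical, and the distribution names `paddlepaddle` and `paddlenlp` are assumed to match the names used with pip (a GPU build of PaddlePaddle may be published under a different distribution name).

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them to match your install.
for name in ("paddlepaddle", "paddlenlp"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

A result of "not installed" for `paddlenlp` usually means pip installed into a different interpreter than the one running the script.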
from paddlenlp.datasets import load_dataset
train_ds, dev_ds, test_ds = load_dataset("chnsenticorp", splits=["train", "dev", "test"])
See the Dataset documentation for more datasets.
from paddlenlp.embeddings import TokenEmbedding
wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
print(wordemb.cosine_sim("国王", "王后"))  # "king" vs. "queen"
# 0.63395125
print(wordemb.cosine_sim("艺术", "火车"))  # "art" vs. "train"
# 0.14792643
More than 50 Chinese word embeddings are built in; see the Embedding documentation for further usage.
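The scores returned by `cosine_sim` above are presumably standard cosine similarities: the dot product of two word vectors divided by the product of their lengths. The sketch below spells out that formula in plain Python. It is not PaddleNLP's implementation, and the 3-dimensional vectors are made up for illustration (the embedding loaded above has 300 dimensions).

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors, invented for this example; they are not real embeddings.
king = [0.9, 0.1, 0.3]
queen = [0.8, 0.2, 0.4]
train = [-0.1, 0.9, -0.2]

print(cosine_similarity(king, queen))  # close to 1: similar direction
print(cosine_similarity(king, train))  # near or below 0: unrelated direction
```

Values near 1 indicate vectors pointing in nearly the same direction, which is why related words score higher than unrelated ones.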
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel, GPTForPretraining
ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
gpt = GPTForPretraining.from_pretrained('gpt-cpm-large-cn')
import paddle
from paddlenlp.transformers import ErnieTokenizer, ErnieModel
tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
model = ErnieModel.from_pretrained('ernie-1.0')
text = tokenizer('自然语言处理')
# ErnieModel returns the per-token sequence output first, then the pooled output.
sequence_output, pooled_output = model(input_ids=paddle.to_tensor([text['input_ids']]))
See the Transformer API documentation for the currently supported pretrained models.
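The example above wraps a single `input_ids` list in an outer list to form a batch of one. With several texts the id lists usually differ in length, so they need padding before they can be stacked into one tensor. Here is a minimal sketch of that step in plain Python; `pad_batch` is a hypothetical helper, the token ids are made up, and a padding id of 0 is an assumption that should be checked against the tokenizer's actual pad token.

```python
from typing import List


def pad_batch(sequences: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """Right-pad every id sequence to the length of the longest one."""
    if not sequences:
        return []
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


# Made-up ids standing in for tokenizer(...)['input_ids'] of two texts.
batch = pad_batch([[1, 101, 102, 103, 2], [1, 201, 2]])
print(batch)  # [[1, 101, 102, 103, 2], [1, 201, 2, 0, 0]]
```

The padded nested list has a rectangular shape, so it can be passed to `paddle.to_tensor` in the same way as the single-text example.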
For an overview of the PaddleNLP model zoo, see PaddleNLP Model Zoo.
For model application scenarios, see PaddleNLP Examples.
For more tutorials, see PaddleNLP on AI Studio.
PaddleNLP is released under the Apache-2.0 license.
Hackathon task_55: adds three classes to RoBERTa in PaddleNLP (MultipleChoice, MaskedLM, and CausalLM), 7 model weights, and a new BPETokenizer.