@@ -16,3 +16,6 @@ git push -u origin master | |||
Use this image:
dockerhub.pcl.ac.cn:5000/user-images/xuym:lhl_cuda101_py36torch171gpu
New project: PaddlePaddle end-to-end FAQ question-answering system
Use this image: 192.168.242.22:443/default-workspace/fccb038c23234b9e80105d4ccd152117/image:xmm
@@ -0,0 +1,504 @@ | |||
{ | |||
"cells": [ | |||
  {
   "cell_type": "markdown",
   "id": "9a583d55",
   "metadata": {},
   "source": [
    "# PaddlePaddle End-to-End FAQ Question-Answering System\n",
    "Documentation: https://openi.pcl.ac.cn/PaddlePaddle/PaddleNLP/src/branch/develop/pipelines/examples/FAQ"
] | |||
}, | |||
  {
   "cell_type": "markdown",
   "id": "ed49d413",
   "metadata": {},
   "source": [
    "## System features\n",
    "\n",
    "* End-to-end\n",
    "  * Provides a complete end-to-end FAQ question-answering system, covering data indexing, model service deployment, and a WebUI for visualization.\n",
    "  * Multi-source data support: parses and recognizes Txt, Word, PDF, and Image data and writes it into the ANN database.\n",
    "* Strong results\n",
    "  * Built on Baidu's leading NLP technology, including ERNIE semantic understanding and RocketQA open-domain question answering.\n",
    "  * Ships with leading pre-built deep learning models.\n",
    "\n",
    "## Environment setup\n",
    "\n",
    "The following image already has PaddlePaddle and PaddleNLP installed: 192.168.242.22:443/default-workspace/fccb038c23234b9e80105d4ccd152117/image:xmm\n",
    "\n",
    "If the environment is not set up yet, follow the steps below.\n",
    "\n",
    "First, upgrade PaddlePaddle to 2.5.1:"
] | |||
}, | |||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2678514",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Alternative build for CUDA 10.2:\n",
    "# !python -m pip install paddlepaddle-gpu==2.5.1.post102 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html\n",
    "# Remove any existing install, then install the CUDA 11.7 build:\n",
    "!pip uninstall paddlepaddle-gpu -y\n",
    "!python -m pip install paddlepaddle-gpu==2.5.1.post117 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html"
] | |||
}, | |||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ead664f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install git if it is not already available\n",
    "!apt update\n",
    "!apt install git -y"
] | |||
}, | |||
  {
   "cell_type": "markdown",
   "id": "3c2223e6",
   "metadata": {},
   "source": [
    "### Installing pipelines often hangs, so install it step by step, one library at a time"
] | |||
}, | |||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "08a82b58",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Download the PaddleNLP repository\n",
    "!git clone https://openi.pcl.ac.cn/PaddlePaddle/PaddleNLP.git\n",
    "# !pip uninstall paddlenlp paddle-pipelines -y\n",
    "# Assumes the clone above ran in /code, so the repository is at /code/PaddleNLP\n",
    "%cd /code/PaddleNLP/\n",
    "!pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple -q\n",
    "!python setup.py install "
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "f8e6a204", | |||
"metadata": { | |||
"scrolled": true | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"%cd /code/PaddleNLP/pipelines\n", | |||
"!pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple \n", | |||
"!python setup.py install " | |||
] | |||
}, | |||
  {
   "cell_type": "markdown",
   "id": "9675e387",
   "metadata": {},
   "source": [
    "### Split PaddleNLP/pipelines/requirements.txt into several files and install them one by one"
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "642ea577", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"!pip install -r /code/work/rq1.txt -i https://mirror.baidu.com/pypi/simple " | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "0f834717", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"!pip install preshed " | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "3f0a2b2b", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"!pip install -r /code/work/rq2.txt -i https://mirror.baidu.com/pypi/simple " | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "719231ae", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"!pip install -r /code/work/rq3.txt -i https://mirror.baidu.com/pypi/simple " | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "0209100c", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"!pip install fastapi uvicorn markdown numba -i https://mirror.baidu.com/pypi/simple " | |||
] | |||
}, | |||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2b8eb6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Quote the version specifier so the shell does not treat > as an output redirect\n",
    "!pip install 'pymilvus>=2.1' wordcloud==1.8.2.2 boilerpy3 events -i https://mirror.baidu.com/pypi/simple\n"
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "29ccb336", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"!pip install sseclient-py==1.7.2 -i https://mirror.baidu.com/pypi/simple " | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "096cbed5", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"!pip install typing_extensions==4.5 -i https://mirror.baidu.com/pypi/simple " | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "10ee5ed0", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"!pip install spacy -i https://mirror.baidu.com/pypi/simple " | |||
] | |||
}, | |||
  {
   "cell_type": "markdown",
   "id": "ea4fcede",
   "metadata": {},
   "source": [
    "Now check whether the PaddlePaddle libraries are installed correctly.\n",
    "\n",
    "Restarting the kernel is sometimes necessary."
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "9a1af1de", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"!pip list |grep paddle" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "2d04df77", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"import paddle" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "3458b6a2", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [ | |||
"paddle.randn??" | |||
] | |||
}, | |||
  {
   "cell_type": "markdown",
   "id": "1af08e2a",
   "metadata": {},
   "source": [
    "## One-click launch of the end-to-end FAQ question-answering system\n",
    "If it starts successfully, the environment is working and you can move on to the exercises that follow."
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "95866015", | |||
"metadata": { | |||
"scrolled": true | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"!cd /code/PaddleNLP/pipelines && python examples/FAQ/dense_faq_example.py --device gpu\n" | |||
] | |||
}, | |||
  {
   "cell_type": "markdown",
   "id": "bba09de2",
   "metadata": {},
   "source": [
    "## 3.4 Building the Web-based FAQ question-answering demo\n",
    "\n",
    "The Web-based FAQ question-answering demo consists of three components:\n",
    "\n",
    "1. An ANN service based on ElasticSearch\n",
    "2. A model service built on a RestAPI\n",
    "3. A WebUI built with Streamlit\n",
    "\n",
    "We set up these three services in turn to form the complete visual FAQ question-answering system.\n",
    "\n",
    "## 3.4.1 Starting the ANN service\n",
    "\n",
    "The official documentation says to download elasticsearch-8.3.2 and unpack it. In this environment, Elasticsearch has already been downloaded and installed in the work/elasticsearch-8.8.2 directory.\n",
    "\n",
    "Before starting the ES service, edit config/elasticsearch.yml and set:\n",
    "\n",
    "`xpack.security.enabled: false`\n",
    "\n",
    "Then start the service by running the following command from the installation directory:\n",
    "\n",
    "`./bin/elasticsearch`"
] | |||
}, | |||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4553fbe6",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Check that the ES service started successfully\n",
    "!curl 'http://localhost:9200/_aliases?pretty=true'"
] | |||
}, | |||
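  {
   "cell_type": "markdown",
   "id": "a7c1e001",
   "metadata": {},
   "source": [
    "The same health check can be scripted in Python, which is convenient when later cells should only run once ES is up. This is a minimal sketch that assumes ES listens on localhost:9200 as in the curl command above; the helper names `aliases_url` and `es_is_up` are illustrative and not part of PaddleNLP.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import urllib.error\n",
    "import urllib.request\n",
    "\n",
    "\n",
    "def aliases_url(host='localhost', port=9200):\n",
    "    # Same endpoint as the curl command above.\n",
    "    return 'http://{}:{}/_aliases?pretty=true'.format(host, port)\n",
    "\n",
    "\n",
    "def es_is_up(host='localhost', port=9200, timeout=3.0):\n",
    "    # True only if the endpoint answers and the body parses as JSON.\n",
    "    try:\n",
    "        with urllib.request.urlopen(aliases_url(host, port), timeout=timeout) as resp:\n",
    "            json.loads(resp.read().decode('utf-8'))\n",
    "        return True\n",
    "    except (urllib.error.URLError, OSError, ValueError):\n",
    "        return False\n",
    "\n",
    "\n",
    "print(es_is_up())\n",
    "```\n",
    "\n",
    "A result of `False` usually means the service is not running yet or is still starting."
   ]
  },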
  {
   "cell_type": "markdown",
   "id": "13769e5e",
   "metadata": {},
   "source": [
    "## 3.4.2 Writing document data into the ANN index\n",
    "\n",
    "Build the ANN index using the insurance dataset as an example:\n",
    "\n",
    "python utils/offline_ann.py --index_name insurance \\\n",
    " --doc_dir data/insurance \\\n",
    " --split_answers \\\n",
    " --delete_index\n",
    "\n",
    "Parameters:\n",
    "\n",
    "* index_name: name of the index\n",
    "* doc_dir: path to the txt data\n",
    "* host: IP address of Elasticsearch\n",
    "* port: port number of Elasticsearch\n",
    "* split_answers: whether to split each line into a query part and an answer part\n",
    "* delete_index: whether to delete the existing index and its data, which clears the data in ES; defaults to false\n",
    "\n",
    "To print a few records, run:\n",
    "\n",
    "`curl http://localhost:9200/insurance/_search`\n",
    "\n",
    "This produces output like the following:\n",
"\n", | |||
"{\"took\":2,\"timed_out\":false,\"_shards\":{\"total\":1,\"successful\":1,\"ski" | |||
] | |||
}, | |||
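  {
   "cell_type": "markdown",
   "id": "a7c1e002",
   "metadata": {},
   "source": [
    "To make the parameter list above concrete, the sketch below assembles the same command line as an argument list. The function `build_offline_ann_cmd` is a hypothetical helper, and the `--host` and `--port` flag spellings are assumed from the parameter names listed above; check them against `utils/offline_ann.py` before relying on them.\n",
    "\n",
    "```python\n",
    "import shlex\n",
    "\n",
    "\n",
    "def build_offline_ann_cmd(index_name, doc_dir, host=None, port=None,\n",
    "                          split_answers=True, delete_index=False):\n",
    "    # host/port are only passed when given, so the script defaults apply otherwise.\n",
    "    cmd = ['python', 'utils/offline_ann.py',\n",
    "           '--index_name', index_name, '--doc_dir', doc_dir]\n",
    "    if host is not None:\n",
    "        cmd += ['--host', host]\n",
    "    if port is not None:\n",
    "        cmd += ['--port', str(port)]\n",
    "    if split_answers:\n",
    "        cmd.append('--split_answers')\n",
    "    if delete_index:\n",
    "        cmd.append('--delete_index')\n",
    "    return cmd\n",
    "\n",
    "\n",
    "cmd = build_offline_ann_cmd('insurance', 'data/insurance', delete_index=True)\n",
    "print(shlex.join(cmd))\n",
    "```\n",
    "\n",
    "The resulting list can be passed to `subprocess.run(cmd, cwd=...)` with the pipelines directory as the working directory."
   ]
  },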
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03f238fd",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Build the ANN index using the insurance dataset as an example\n",
"\n", | |||
"# !cd ~/PaddleNLP/pipelines && python utils/offline_ann.py --index_name insurance \\\n", | |||
"# --doc_dir data/insurance \\\n", | |||
"# --split_answers \\\n", | |||
"# --delete_index" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "673c8f59", | |||
"metadata": { | |||
"scrolled": true, | |||
"tags": [] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"!curl http://localhost:9200/insurance/_search" | |||
] | |||
}, | |||
  {
   "cell_type": "markdown",
   "id": "7f390f9e",
   "metadata": {},
   "source": [
    "## 3.4.3 Starting the RestAPI model service\n",
    "\n",
    "Specify the YAML configuration file of the FAQ question-answering system:\n",
    "\n",
    "`export PIPELINE_YAML_PATH=rest_api/pipeline/dense_faq.yaml`\n",
    "\n",
    "Start the model service on port 8891:\n",
    "\n",
    "`python rest_api/application.py 8891`\n",
    "\n",
    "Linux users are advised to start the service with the shell script instead:\n",
    "\n",
    "`sh examples/FAQ/run_faq_server.sh`\n",
    "\n",
    "Once the service is running, verify it with curl:\n",
    "\n",
    "`curl -X POST -k http://localhost:8891/query -H 'Content-Type: application/json' -d '{\"query\": \"企业如何办理养老保险?\",\"params\": {\"Retriever\": {\"top_k\": 5}, \"Ranker\":{\"top_k\": 5}}}'`"
] | |||
}, | |||
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd90a0f4",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Each `!` command runs in its own subshell, so a separate `!export` would not carry over.\n",
    "# Set the variable inline and start the model service on port 8891:\n",
    "# !cd ~/PaddleNLP/pipelines && PIPELINE_YAML_PATH=rest_api/pipeline/dense_faq.yaml python rest_api/application.py 8891"
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "244c35ac", | |||
"metadata": { | |||
"scrolled": true, | |||
"tags": [] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"!curl -X POST -k http://localhost:8891/query -H 'Content-Type: application/json' -d '{\"query\": \"企业如何办理养老保险?\",\"params\": {\"Retriever\": {\"top_k\": 5}, \"Ranker\":{\"top_k\": 5}}}'\n" | |||
] | |||
}, | |||
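  {
   "cell_type": "markdown",
   "id": "a7c1e003",
   "metadata": {},
   "source": [
    "The request above can also be built in Python, which avoids shell quoting issues with the JSON body. This is a sketch under the assumption that the service accepts exactly the body shown in the curl command; `build_query_payload` and `build_request` are illustrative names, not part of the pipelines API.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import urllib.request\n",
    "\n",
    "\n",
    "def build_query_payload(query, retriever_top_k=5, ranker_top_k=5):\n",
    "    # Same JSON body as the curl command above.\n",
    "    return {\n",
    "        'query': query,\n",
    "        'params': {\n",
    "            'Retriever': {'top_k': retriever_top_k},\n",
    "            'Ranker': {'top_k': ranker_top_k},\n",
    "        },\n",
    "    }\n",
    "\n",
    "\n",
    "def build_request(query, url='http://localhost:8891/query'):\n",
    "    body = json.dumps(build_query_payload(query)).encode('utf-8')\n",
    "    return urllib.request.Request(\n",
    "        url, data=body, headers={'Content-Type': 'application/json'}, method='POST')\n",
    "\n",
    "\n",
    "req = build_request('企业如何办理养老保险?')\n",
    "print(req.get_method(), req.full_url)\n",
    "# Sending the request requires the model service to be running:\n",
    "# with urllib.request.urlopen(req, timeout=30) as resp:\n",
    "#     print(json.loads(resp.read().decode('utf-8')))\n",
    "```"
   ]
  },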
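  {
   "cell_type": "markdown",
   "id": "1cfeb300",
   "metadata": {},
   "source": [
    "It is not yet clear why the connection to port 8891 fails. One likely cause is that the cell that starts the model service above is commented out, so nothing is listening on the port. The next cell checks which ports in the 889x range are in use."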
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "97ad68bc", | |||
"metadata": { | |||
"scrolled": true, | |||
"tags": [] | |||
}, | |||
"outputs": [], | |||
"source": [ | |||
"!netstat -an |grep 889" | |||
] | |||
}, | |||
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "cef87ba1", | |||
"metadata": { | |||
"scrolled": true | |||
}, | |||
"outputs": [], | |||
"source": [] | |||
}, | |||
  {
   "cell_type": "markdown",
   "id": "ccaf3cee",
   "metadata": {},
   "source": [
    "# Troubleshooting\n",
    "## Connecting to port 8891 fails with curl: (7) Failed to connect to localhost port 8891: Connection refused\n",
    "From the section on starting the RestAPI model service:\n",
    "```\n",
    "!curl -X POST -k http://localhost:8891/query -H 'Content-Type: application/json' -d '{\"query\": \"企业如何办理养老保险?\",\"params\": {\"Retriever\": {\"top_k\": 5}, \"Ranker\":{\"top_k\": 5}}}'\n",
    "\n",
    "Error: curl: (7) Failed to connect to localhost port 8891: Connection refused\n",
    "```\n",
    "## Error: cannot import name 'deprecated' from 'typing_extensions'\n",
    "File \"/opt/conda/lib/python3.7/site-packages/fastapi-0.100.1-py3.7.egg/fastapi/params.py\", line 6, in <module>\n",
    "    from typing_extensions import Annotated, deprecated\n",
    "ImportError: cannot import name 'deprecated' from 'typing_extensions' (/opt/conda/lib/python3.7/site-packages/typing_extensions.py)\n",
    "\n",
    "Upgrading typing_extensions from 4.4 to 4.5 resolved the issue.\n",
    "\n",
    "## Error: cannot import name 'TypedDict' from 'typing'\n",
    "    from typing import (\n",
    "ImportError: cannot import name 'TypedDict' from 'typing' (/opt/conda/lib/python3.7/typing.py)\n",
    "\n",
    "This error occurs under Python 3.7, because `typing.TypedDict` was only added in Python 3.8. The environment now uses Python 3.10, so it should no longer occur."
] | |||
}, | |||
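  {
   "cell_type": "markdown",
   "id": "a7c1e004",
   "metadata": {},
   "source": [
    "The typing_extensions fix above can be checked before starting the service. The sketch below compares the installed version against 4.5, the version that resolved the import error in this notebook; `version_tuple` and `needs_upgrade` are illustrative helpers, and the simple parser assumes plain dotted numeric versions.\n",
    "\n",
    "```python\n",
    "from importlib import metadata\n",
    "\n",
    "\n",
    "def version_tuple(version):\n",
    "    # '4.5.0' -> (4, 5, 0); stops at the first non-numeric part.\n",
    "    parts = []\n",
    "    for piece in version.split('.'):\n",
    "        if not piece.isdigit():\n",
    "            break\n",
    "        parts.append(int(piece))\n",
    "    return tuple(parts)\n",
    "\n",
    "\n",
    "def needs_upgrade(installed, minimum='4.5'):\n",
    "    return version_tuple(installed) < version_tuple(minimum)\n",
    "\n",
    "\n",
    "try:\n",
    "    installed = metadata.version('typing_extensions')\n",
    "    print(installed, 'needs upgrade:', needs_upgrade(installed))\n",
    "except metadata.PackageNotFoundError:\n",
    "    print('typing_extensions is not installed')\n",
    "```"
   ]
  },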
{ | |||
"cell_type": "code", | |||
"execution_count": null, | |||
"id": "6b9aca69", | |||
"metadata": {}, | |||
"outputs": [], | |||
"source": [] | |||
} | |||
], | |||
"metadata": { | |||
"kernelspec": { | |||
"display_name": "Python 3 (ipykernel)", | |||
"language": "python", | |||
"name": "python3" | |||
}, | |||
"language_info": { | |||
"codemirror_mode": { | |||
"name": "ipython", | |||
"version": 3 | |||
}, | |||
"file_extension": ".py", | |||
"mimetype": "text/x-python", | |||
"name": "python", | |||
"nbconvert_exporter": "python", | |||
"pygments_lexer": "ipython3", | |||
"version": "3.10.10" | |||
} | |||
}, | |||
"nbformat": 4, | |||
"nbformat_minor": 5 | |||
} |