#1 master

Merged
jikuai merged 2 commits from skywalk163/airoot:master into master 9 months ago
  1. README.md (+3, -0)
  2. faq.ipynb (+504, -0)

README.md (+3, -0)

@@ -16,3 +16,6 @@ git push -u origin master

Use this image:
dockerhub.pcl.ac.cn:5000/user-images/xuym:lhl_cuda101_py36torch171gpu

New project: PaddlePaddle end-to-end FAQ question-answering system
Use this image: 192.168.242.22:443/default-workspace/fccb038c23234b9e80105d4ccd152117/image:xmm

faq.ipynb (+504, -0)

@@ -0,0 +1,504 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9a583d55",
"metadata": {},
"source": [
"# PaddlePaddle End-to-End FAQ Question-Answering System\n",
"Documentation: https://openi.pcl.ac.cn/PaddlePaddle/PaddleNLP/src/branch/develop/pipelines/examples/FAQ"
]
},
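{
"cell_type": "markdown",
"id": "ed49d413",
"metadata": {},
"source": [
"## System features\n",
"\n",
"* End-to-end\n",
"\n",
"Provides a complete end-to-end FAQ question-answering stack: building the data index, deploying the model service, and a WebUI for visualization.\n",
"\n",
"Multi-source data support: parses and recognizes Txt, Word, PDF, and Image data and writes the results to the ANN database.\n",
"\n",
"* Strong results\n",
"\n",
"Builds on Baidu's NLP technology, including ERNIE semantic understanding and RocketQA open-domain question answering.\n",
"\n",
"Ships with built-in deep learning models.\n",
"\n",
"## Environment setup\n",
"\n",
"The following image already has PaddlePaddle and PaddleNLP installed: 192.168.242.22:443/default-workspace/fccb038c23234b9e80105d4ccd152117/image:xmm\n",
"\n",
"If the environment is not already set up, follow the steps below.\n",
"\n",
"Upgrade PaddlePaddle to 2.5.1:"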
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2678514",
"metadata": {},
"outputs": [],
"source": [
"# !python -m pip install paddlepaddle-gpu==2.5.1.post102 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html\n",
"!pip uninstall paddlepaddle-gpu -y\n",
"!python -m pip install paddlepaddle-gpu==2.5.1.post117 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ead664f5",
"metadata": {},
"outputs": [],
"source": [
"# Install git if it is not already available\n",
"!apt update\n",
"!apt install git -y"
]
},
{
"cell_type": "markdown",
"id": "3c2223e6",
"metadata": {},
"source": [
"### Installing pipelines tends to hang, so install the dependencies step by step, a few packages at a time"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "08a82b58",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Clone the PaddleNLP repository\n",
"!git clone https://openi.pcl.ac.cn/PaddlePaddle/PaddleNLP.git\n",
"# !pip uninstall paddlenlp paddle-pipelines -y\n",
"%cd /code/PaddleNLP/\n",
"!pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple -q\n",
"!python setup.py install "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8e6a204",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%cd /code/PaddleNLP/pipelines\n",
"!pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple \n",
"!python setup.py install "
]
},
{
"cell_type": "markdown",
"id": "9675e387",
"metadata": {},
"source": [
"### Split PaddleNLP/pipelines/requirements.txt into several files and install them one at a time"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "642ea577",
"metadata": {},
"outputs": [],
"source": [
"!pip install -r /code/work/rq1.txt -i https://mirror.baidu.com/pypi/simple "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0f834717",
"metadata": {},
"outputs": [],
"source": [
"!pip install preshed "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f0a2b2b",
"metadata": {},
"outputs": [],
"source": [
"!pip install -r /code/work/rq2.txt -i https://mirror.baidu.com/pypi/simple "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "719231ae",
"metadata": {},
"outputs": [],
"source": [
"!pip install -r /code/work/rq3.txt -i https://mirror.baidu.com/pypi/simple "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0209100c",
"metadata": {},
"outputs": [],
"source": [
"!pip install fastapi uvicorn markdown numba -i https://mirror.baidu.com/pypi/simple "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2b8eb6c",
"metadata": {},
"outputs": [],
"source": [
"# Quote the version specifier; unquoted, the shell treats >=2.1 as an output redirect\n",
"!pip install 'pymilvus>=2.1' wordcloud==1.8.2.2 boilerpy3 events -i https://mirror.baidu.com/pypi/simple \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29ccb336",
"metadata": {},
"outputs": [],
"source": [
"!pip install sseclient-py==1.7.2 -i https://mirror.baidu.com/pypi/simple "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "096cbed5",
"metadata": {},
"outputs": [],
"source": [
"!pip install typing_extensions==4.5 -i https://mirror.baidu.com/pypi/simple "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10ee5ed0",
"metadata": {},
"outputs": [],
"source": [
"!pip install spacy -i https://mirror.baidu.com/pypi/simple "
]
},
{
"cell_type": "markdown",
"id": "ea4fcede",
"metadata": {},
"source": [
"Now check whether the Paddle-related libraries are installed.\n",
"\n",
"A kernel restart is sometimes needed first."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9a1af1de",
"metadata": {},
"outputs": [],
"source": [
"!pip list |grep paddle"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d04df77",
"metadata": {},
"outputs": [],
"source": [
"import paddle"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3458b6a2",
"metadata": {},
"outputs": [],
"source": [
"paddle.randn??"
]
},
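{
"cell_type": "markdown",
"id": "1af08e2a",
"metadata": {},
"source": [
"## One-command launch of the end-to-end FAQ system\n",
"If this starts successfully, the environment is working and you can continue with the exercises below."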
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "95866015",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"!cd /code/PaddleNLP/pipelines && python examples/FAQ/dense_faq_example.py --device gpu\n"
]
},
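{
"cell_type": "markdown",
"id": "bba09de2",
"metadata": {},
"source": [
"## 3.4 Build the web-based FAQ question-answering system\n",
"\n",
"The web-based FAQ system has three main components:\n",
"\n",
"1. an ANN service based on ElasticSearch\n",
"2. a model service built on RestAPI\n",
"3. a WebUI built with Streamlit\n",
"\n",
"The steps below set up these three services in turn to produce the visual FAQ system.\n",
"\n",
"## 3.4.1 Start the ANN service\n",
"\n",
"Following the official documentation, download elasticsearch-8.3.2 and unpack it. (In this environment, Elasticsearch is already downloaded and installed under work/elasticsearch-8.8.2.)\n",
"\n",
"To start the ES service, first edit config/elasticsearch.yml and set:\n",
"\n",
"xpack.security.enabled: false\n",
"\n",
"Then run the following from the installation directory:\n",
"\n",
"./bin/elasticsearch"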
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4553fbe6",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"# Check that the ES service started successfully\n",
"!curl http://localhost:9200/_aliases?pretty=true"
]
},
{
"cell_type": "markdown",
"id": "13769e5e",
"metadata": {},
"source": [
"## 3.4.2 Write document data into the ANN index\n",
"\n",
"Build the ANN index, using the insurance dataset as the example:\n",
"\n",
"python utils/offline_ann.py --index_name insurance \\\n",
" --doc_dir data/insurance \\\n",
" --split_answers \\\n",
" --delete_index\n",
"\n",
"Parameters:\n",
"\n",
"* index_name: name of the index\n",
"* doc_dir: path to the txt data\n",
"* host: IP address of Elasticsearch\n",
"* port: port number of Elasticsearch\n",
"* split_answers: whether to split each line into a query part and an answer part\n",
"* delete_index: whether to delete the existing index and its data, which clears ES; defaults to false\n",
"\n",
"\n",
"Print a few records:\n",
"\n",
"curl http://localhost:9200/insurance/_search\n",
"\n",
"This prints output like the following:\n",
"\n",
"{\"took\":2,\"timed_out\":false,\"_shards\":{\"total\":1,\"successful\":1,\"ski"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03f238fd",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"# Build the ANN index, using the insurance dataset as the example\n",
"\n",
"# !cd ~/PaddleNLP/pipelines && python utils/offline_ann.py --index_name insurance \\\n",
"# --doc_dir data/insurance \\\n",
"# --split_answers \\\n",
"# --delete_index"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "673c8f59",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"!curl http://localhost:9200/insurance/_search"
]
},
{
"cell_type": "markdown",
"id": "7f390f9e",
"metadata": {},
"source": [
"## 3.4.3 Start the RestAPI model service\n",
"\n",
"Specify the YAML configuration file for the FAQ system:\n",
"\n",
"export PIPELINE_YAML_PATH=rest_api/pipeline/dense_faq.yaml\n",
"\n",
"Start the model service on port 8891:\n",
"\n",
"python rest_api/application.py 8891\n",
"\n",
"Linux users are encouraged to start the service with the shell script instead:\n",
"\n",
"sh examples/FAQ/run_faq_server.sh\n",
"\n",
"Once the service is running, verify it with curl:\n",
"\n",
"curl -X POST -k http://localhost:8891/query -H 'Content-Type: application/json' -d '{\"query\": \"企业如何办理养老保险?\",\"params\": {\"Retriever\": {\"top_k\": 5}, \"Ranker\":{\"top_k\": 5}}}'"
]
},
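{
"cell_type": "code",
"execution_count": null,
"id": "bd90a0f4",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"# Each `!` command runs in its own subshell, so `!export` would not persist; use %env instead\n",
"# %env PIPELINE_YAML_PATH=rest_api/pipeline/dense_faq.yaml\n",
"# Start the model service on port 8891 (the cell blocks while the server runs)\n",
"# !cd ~/PaddleNLP/pipelines && python rest_api/application.py 8891"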
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "244c35ac",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"!curl -X POST -k http://localhost:8891/query -H 'Content-Type: application/json' -d '{\"query\": \"企业如何办理养老保险?\",\"params\": {\"Retriever\": {\"top_k\": 5}, \"Ranker\":{\"top_k\": 5}}}'\n"
]
},
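{
"cell_type": "markdown",
"id": "1cfeb300",
"metadata": {},
"source": [
"It is not yet clear why the connection to port 8891 fails. If the model service from the previous step is not running, nothing is listening on that port."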
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "97ad68bc",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"!netstat -an |grep 889"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cef87ba1",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "ccaf3cee",
"metadata": {},
"source": [
"# Debugging\n",
"## Port 8891 connection error: curl: (7) Failed to connect to localhost port 8891: Connection refused\n",
"From the section on starting the RestAPI model service:\n",
"```\n",
"!curl -X POST -k http://localhost:8891/query -H 'Content-Type: application/json' -d '{\"query\": \"企业如何办理养老保险?\",\"params\": {\"Retriever\": {\"top_k\": 5}, \"Ranker\":{\"top_k\": 5}}}'\n",
"\n",
"Error: curl: (7) Failed to connect to localhost port 8891: Connection refused\n",
"```\n",
"## Error: cannot import name 'deprecated' from 'typing_extensions'\n",
"File \"/opt/conda/lib/python3.7/site-packages/fastapi-0.100.1-py3.7.egg/fastapi/params.py\", line 6, in <module>\n",
" from typing_extensions import Annotated, deprecated\n",
"ImportError: cannot import name 'deprecated' from 'typing_extensions' (/opt/conda/lib/python3.7/site-packages/typing_extensions.py)\n",
" \n",
"Upgrading typing_extensions from 4.4 to 4.5 resolved the problem.\n",
" \n",
"## Error: cannot import name 'TypedDict' from 'typing'\n",
" from typing import (\n",
"ImportError: cannot import name 'TypedDict' from 'typing' (/opt/conda/lib/python3.7/typing.py)\n",
" \n",
"This error occurs under Python 3.7, because `typing.TypedDict` was only added in Python 3.8. The environment now uses Python 3.10, so it should no longer appear."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6b9aca69",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
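
The notebook leaves open why `curl` to port 8891 is refused. One way to narrow it down is to check whether anything is listening before sending a query. The sketch below assumes both services run on localhost with the ports used in the notebook (9200 for Elasticsearch, 8891 for the RestAPI model service); `port_is_open` is a hypothetical helper written for this note, not part of PaddleNLP or pipelines.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports are the ones used in the notebook; adjust if the services differ.
    services = (("Elasticsearch", 9200), ("RestAPI model service", 8891))
    for name, port in services:
        state = "listening" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

If 9200 is reachable but 8891 is not, the model service most likely never started or exited early, so its own startup log is the next place to look.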
