AutoX_nlp is an auxiliary tool for processing text data.
Its features include:
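```bash
# Option 1 (recommended): install the latest version from GitHub.
# If GitHub is slow to access, clone from the Gitee mirror instead: https://gitee.com/poteman/autox
git clone https://github.com/4paradigm/autox.git
pip install ./autox

# Option 2: install from PyPI. The pip package may lag behind the GitHub version.
# In a Jupyter or Kaggle notebook, prefix the command with "!".
pip install automl-x -i https://www.pypi.org/simple/
```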
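The pip package may lag behind the GitHub version, so it can help to check which version is actually installed. The following is a minimal sketch that uses the standard library's `importlib.metadata` (Python 3.8+). It assumes the distribution is registered as `automl-x` for both installation methods. That name matches the pip command above but is not confirmed for the GitHub install. `installed_version` is an illustrative helper, not part of AutoX.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "automl-x") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed.

    Assumption: AutoX is registered under the distribution name "automl-x".
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"automl-x {v}" if v else "automl-x is not installed")
```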
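See the performance comparison and the processing-efficiency comparison below.
Click the hyperlinks in the table to open the corresponding online demo on Kaggle. The demos run directly, with no environment setup. For RMSE, lower is better; for AUC, higher is better.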
Task type | Dataset | Metric | AutoX | AutoGluon | H2O |
---|---|---|---|---|---|
Regression | CommonlitReadability | RMSE | 0.597 | 1.022 | 1.023 |
Regression | Amazonbookprice | RMSE | 629.792 | 687.870 | 642.167 |
Regression | MercariPrice | RMSE | 32.042 | 34.500 | 43.960 |
Classification | Titanic | AUC | 0.794 | 0.780 | 0.768 |
Classification | Stumbleupon | AUC | 0.855 | 0.503 | 0.707 |
Classification | DisasterTweets | AUC | 0.786 | 0.746 | 0.721 |
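Each text-processing tool converts one specific text column of a dataset into numerical features. The columns differ in their average number of characters. The processing efficiency (TPS) is the number of texts divided by the total wall-clock time of the whole pipeline, in items per second.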
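The TPS measurement described above can be sketched in Python as follows. This is a minimal, hypothetical harness, not the benchmark script behind the table. `measure_tps` and `toy_featurize` are illustrative names. `toy_featurize`, which counts characters and whitespace-separated tokens, only stands in for a real text-to-feature pipeline such as AutoX_nlp's. Throughput is computed as texts divided by elapsed seconds, which matches the item/s unit.

```python
import time
from typing import Callable, List, Sequence


def measure_tps(texts: Sequence[str], featurize: Callable[[Sequence[str]], object]) -> float:
    """Processing efficiency in item/s: number of texts / seconds for the whole pipeline."""
    if len(texts) == 0:
        raise ValueError("texts must not be empty")
    start = time.perf_counter()
    featurize(texts)  # run the full text-to-numeric-feature pipeline once
    elapsed = time.perf_counter() - start
    return len(texts) / max(elapsed, 1e-9)  # guard against a zero timer reading


def toy_featurize(texts: Sequence[str]) -> List[List[int]]:
    """Hypothetical stand-in featurizer: [character count, whitespace-token count] per text."""
    return [[len(t), len(t.split())] for t in texts]


if __name__ == "__main__":
    sample = ["short text", "a somewhat longer piece of text"] * 10_000  # toy input
    print(f"{measure_tps(sample, toy_featurize):.2f} item/s")
```

Dividing by the item count would give seconds per item instead of item/s. That is why the harness puts the item count in the numerator.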
Note: each tool processes text differently. To see the exact pipeline, click the hyperlink in the table and open the corresponding demo.
Dataset | Text column | Average text length (characters) | AutoX (item/s) | AutoGluon (item/s) | H2O (item/s) |
---|---|---|---|---|---|
MercariPrice | BrandName | 6 | 3480.66 | 127.15 | 979.18 |
MercariPrice | CategoryName | 30 | 2215.40 | 118.92 | 656.80 |
MercariPrice | ItemDescription | 150 | 466.73 | 65.46 | 183.14 |
TMDBBoxOffice | Overview | 300 | 282.73 | 20.74 | 79.18 |
CommonlitReadability | Excerpt | 1000 | 103.99 | 12.39 | 30.30 |
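To read the efficiency table as relative speedups, divide AutoX's throughput by another tool's throughput on the same column. Throughput can also be converted into an average per-item latency of 1000 / TPS milliseconds. The sketch below applies this arithmetic to the figures copied from the table. `speedups` and `ms_per_item` are illustrative helpers, not part of AutoX.

```python
from typing import Dict, Tuple

# Throughput in item/s, copied from the processing-efficiency table.
TPS: Dict[Tuple[str, str], Dict[str, float]] = {
    ("MercariPrice", "BrandName"): {"AutoX": 3480.66, "AutoGluon": 127.15, "H2O": 979.18},
    ("MercariPrice", "CategoryName"): {"AutoX": 2215.40, "AutoGluon": 118.92, "H2O": 656.80},
    ("MercariPrice", "ItemDescription"): {"AutoX": 466.73, "AutoGluon": 65.46, "H2O": 183.14},
    ("TMDBBoxOffice", "Overview"): {"AutoX": 282.73, "AutoGluon": 20.74, "H2O": 79.18},
    ("CommonlitReadability", "Excerpt"): {"AutoX": 103.99, "AutoGluon": 12.39, "H2O": 30.30},
}


def speedups(row: Dict[str, float], reference: str = "AutoX") -> Dict[str, float]:
    """How many times faster `reference` is than each other tool on the same column."""
    ref_tps = row[reference]
    return {tool: ref_tps / tps for tool, tps in row.items() if tool != reference}


def ms_per_item(tps: float) -> float:
    """Convert throughput (item/s) into average latency per item in milliseconds."""
    return 1000.0 / tps


if __name__ == "__main__":
    for (dataset, column), row in TPS.items():
        s = speedups(row)
        print(
            f"{dataset}/{column}: {s['AutoGluon']:.1f}x vs AutoGluon, "
            f"{s['H2O']:.1f}x vs H2O, {ms_per_item(row['AutoX']):.2f} ms/item with AutoX"
        )
```

On these figures, AutoX is roughly 7x (ItemDescription) to 27x (BrandName) faster than AutoGluon. It is roughly 2.5x to 3.6x faster than H2O.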
AutoX is an efficient AutoML tool aimed mainly at data-mining tasks on tabular data.