Octopus is a one-stop computing platform that unifies multiple computing scenarios. It is designed mainly for computing and resource management needs in AI, HPC, and similar scenarios. It gives users functions for managing and using data, algorithms, images, models, and computing power, so that they can conveniently build a one-stop computing environment and run their computations.
At the same time, it provides cluster administrators with functions such as cluster resource management and monitoring and computing task management and monitoring, so that they can operate and analyze the system as a whole.
Octopus is built on the container orchestration platform Kubernetes and makes full use of the agility, light weight, and isolation of containers to meet the needs of diverse computing scenarios.
For detailed documentation, please refer to here.
Octopus has the following characteristics:
Octopus is suitable for use in the following scenarios:
Octopus manages computing resources and optimizes computing tasks for scenarios such as AI and HPC. Decoupling computing hardware from software through image and container technology (Docker) enables easy switching between different computing environments.
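To make the idea of decoupling hardware from software more concrete, here is a minimal sketch of how a container image and a resource request can be combined in a standard Kubernetes Job manifest. This is not Octopus's own API: the helper `build_job_manifest`, the job names, and the image names under `example.org` are all hypothetical and chosen only for illustration. The manifest fields themselves (`batch/v1` Job, `image`, `resources.limits`, and the `nvidia.com/gpu` resource name exposed by the NVIDIA device plugin) are standard Kubernetes.

```python
import json


def build_job_manifest(name, image, command, gpus=0):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    Hypothetical helper for illustration; it is not part of Octopus.
    """
    container = {
        "name": name,
        "image": image,      # the software environment lives in the image
        "command": command,
    }
    if gpus > 0:
        # The hardware request is declared separately from the image.
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [container],
                    "restartPolicy": "Never",
                }
            }
        },
    }


# Two different computing environments on the same cluster:
# only the image, command, and resource request change.
train = build_job_manifest(
    "train-demo", "example.org/pytorch:latest", ["python", "train.py"], gpus=1
)
hpc = build_job_manifest(
    "hpc-demo", "example.org/openmpi:latest", ["mpirun", "./solver"]
)

print(json.dumps(train, indent=2))
```

Switching between an AI training environment and an HPC environment in this sketch means changing the image and the resource request, while the underlying cluster stays the same.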
Octopus users usually have two different roles:
Octopus provides end-to-end manuals for cluster users and administrators.
Documents related to cluster administrators include the following:
Cluster Deployment Guide: This part covers preparing and installing the environment and components that the cluster depends on, deploying the Octopus system, and upgrading the system afterwards, to simplify installation and maintenance. For details, please refer to here.
Cluster Management Manual: This part introduces the operations a cluster administrator can perform after entering the Octopus management system through its page entrance. The functions described include platform monitoring, resource management, user management, machine time management, data management, algorithm management, development and training management, and others. For details, please refer to here.
The main documents related to cluster users are as follows:
For detailed contribution guidelines, please refer to here.