Version: 2.0
XuanPolicy is an open-source ensemble of Deep Reinforcement Learning (DRL) algorithm implementations.
We also call it XuanCe: in Chinese, "Xuan" means magic box and "Ce" means policy.
DRL algorithms are sensitive to hyper-parameter tuning, vary in performance with different tricks, and suffer from unstable training processes, so they can sometimes seem elusive and "Xuan".
This project provides a thorough, high-quality, and easy-to-understand implementation of RL algorithms, and we hope it sheds some light on the magic of reinforcement learning.
We expect it to be compatible with multiple deep learning toolboxes (torch, mindspore, and tensorlayer), and hope it can truly become a zoo full of DRL algorithms.
This project is supported by Peng Cheng Laboratory.
Step 1: Create and activate a new conda environment (python=3.7 is suggested):
$ conda create -n xuanpolicy python=3.7
$ conda activate xuanpolicy
Step 2: Install the required Python modules with:
$ pip install -r requirements.txt
Note: Some modules need to be installed manually, depending on your device.
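For example, PyTorch usually needs a device-specific build. After installing, you can quickly check that the backend you intend to use is importable and sees your hardware (a minimal check, assuming the torch backend):

```python
# Minimal sanity check for the PyTorch backend after installation.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```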
You can disable any of the last five tricks as you like by changing the corresponding default parameters in the functions.
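As an illustration only (the function and parameter names below are hypothetical, not XuanPolicy's actual API), disabling a trick usually amounts to overriding a boolean default:

```python
# Hypothetical illustration; the real function and parameter names in
# XuanPolicy may differ. Each trick is guarded by a keyword default.
def build_learner(double_q: bool = True, dueling: bool = True) -> dict:
    return {"double_q": double_q, "dueling": dueling}

# Disable one trick while keeping the others at their defaults.
config = build_learner(double_q=False)
```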
The following command is enough to start training an RL agent:
$ python main.py --method dqn
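Under the hood, an entry point like this typically parses the --method flag and dispatches to the corresponding agent. The sketch below is hypothetical; only the --method flag comes from this README:

```python
# Hypothetical sketch of a main.py-style dispatcher; XuanPolicy's actual
# entry point may be organized differently.
import argparse

def train(method: str) -> None:
    # Stand-in for building and training the selected agent.
    print(f"Training an agent with method={method}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--method", default="dqn", help="algorithm name, e.g. dqn")
    args = parser.parse_args()
    train(args.method)
```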
Our project supports multi-process communication via mpi4py, so you can start training with K sub-processes using the following command:
$ mpiexec -n K python test_agent.py
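Each of the K processes gets its own MPI rank, which can be used to seed environments differently and to aggregate statistics across workers. A minimal mpi4py sketch (the reduced quantity here is only a stand-in):

```python
# Run with: mpiexec -n 4 python mpi_demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this worker's index, 0..K-1
size = comm.Get_size()   # K, set by mpiexec -n K

local_return = float(rank)  # stand-in for a locally collected episode return
# Sum the scalar across all workers; the result arrives on rank 0.
total = comm.reduce(local_return, op=MPI.SUM, root=0)
if rank == 0:
    print(f"mean return across {size} workers: {total / size:.2f}")
```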
You can use TensorBoard to visualize the training process. After training, log files are automatically generated in the "./results/" directory, and you should see the training data after running:
$ tensorboard --logdir ./results/
If everything goes well, you should see a display similar to the one below.
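TensorBoard picks up any scalar event files under --logdir, so you can also log your own quantities alongside the library's output. A small sketch using PyTorch's SummaryWriter (the tag name and sub-directory are assumptions, not the library's conventions):

```python
# Write a custom scalar curve that TensorBoard will display under ./results/.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="./results/custom")
for step in range(100):
    writer.add_scalar("train/episode_return", 0.5 * step, global_step=step)
writer.close()
```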
To visualize the training scores, training times, and overall performance, initialize the environment as
env = MonitorVecEnv(DummyVecEnv(...))
Then, after training terminates, two extra files, "xxx.npy" and "xxx.gif", will be generated in the "./results/" directory. The "xxx.npy" file records the score and clock time of each episode during training. We have not yet provided a plotter.py to draw these curves.
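Until a plotter ships with the project, a few lines of numpy and matplotlib can draw the score curve. Note that the array layout inside "xxx.npy" is an assumption here (one score and one clock time per episode, as described above):

```python
# Hedged sketch of a plotter for the generated "xxx.npy" file; the exact
# array layout is assumed, not confirmed by the project.
import numpy as np
import matplotlib.pyplot as plt

data = np.load("./results/xxx.npy")              # substitute your run's file name
scores = data[:, 0] if data.ndim == 2 else data  # assume column 0 holds scores

plt.plot(scores)
plt.xlabel("episode")
plt.ylabel("score")
plt.title("Training scores")
plt.savefig("./results/scores.png")
```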
We train our agents on the MuJoCo benchmark (HalfCheetah, ...) for 1M environment steps and compare against other implementations (stable-baselines, stable-baselines3, ...). The performance is shown below. We noticed that the reward scale in our experiments is different, mainly, we believe, because of the MuJoCo version and the number of timesteps per episode. For a fair comparison, we use the same hyper-parameters for all implementations.
| Environments (1M, 4 parallels) | Ours | Stable-baselines (tf) | Stable-baselines3 (torch) |
|---|---|---|---|
| HalfCheetah-v3 | ~3283 | ~1336.76 (std ~133.12) | |
| Hopper-v3 | ~2764.86 (std ~1090.03) | | |
| Walker2d-v3 | ~3094.35 (std ~83.41) | | |
| Ant-v3 | ~2508.44 (std ~106.25) | | |
| Swimmer-v3 | ~43.13 (std ~1.58) | | |
| Humanoid-v3 | ~549.35 (std ~92.78) | | |
| Reacher-v3 | ~360.45 (std ~43.95) | | |
| InvertedPendulum-v3 | | | |
| InvertedDoublePendulum-v3 | | | |