

Reproduce PPO with PARL

This example reproduces the PPO (Proximal Policy Optimization) deep reinforcement learning algorithm with PARL and reaches the same level of performance as the paper on the Mujoco benchmarks.

Paper: PPO in Proximal Policy Optimization Algorithms
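
For orientation, the central idea of the cited paper is the clipped surrogate objective, which limits how far a single update can move the policy. The following is a minimal pure-Python sketch for one sample, not PARL's implementation; the function name and the default clipping range of 0.2 are illustrative assumptions.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio     -- new_policy_prob / old_policy_prob for the taken action
    advantage -- advantage estimate for that action
    epsilon   -- clipping range (0.2 here is an illustrative default)
    """
    # Keep the probability ratio inside [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Taking the minimum makes the objective a pessimistic bound, so the
    # policy gains nothing from pushing the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # A large ratio with a positive advantage is capped by the clip.
    print(clipped_surrogate(1.5, 1.0))
    # A large ratio with a negative advantage is not capped.
    print(clipped_surrogate(1.5, -1.0))
```

In training, this quantity is averaged over a batch and its negative is used as the policy loss.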

Mujoco/Atari games introduction

PARL currently supports the open-source version of Mujoco provided by DeepMind, so users do not need to download Mujoco binaries, install mujoco-py, or obtain a license. For more details, please visit Mujoco.

Benchmark result

1. Mujoco games results

mujoco-result

2. Atari games results

atari-result

Each experiment was run three times with different random seeds.

How to use

Mujoco-Dependencies:

python3.7+
paddle>=2.3.1
parl>=2.1.1
gym>=0.26.0
mujoco>=2.2.2

Atari-Dependencies:

paddle>=2.3.1
parl>=2.1.1
gym==0.18.0
atari-py==0.2.6
opencv-python
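
The two dependency lists above can be checked against the current environment before training. The sketch below is one possible way to do that with the standard library only (Python 3.8+ for `importlib.metadata`). The helper names are hypothetical, the version comparison is deliberately coarse (numeric components only), and the distribution name `paddlepaddle` for the `paddle` package is an assumption about how it was installed.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Requirements copied from the lists above: (operator, version) per package.
# Assumption: Paddle was installed under the distribution name "paddlepaddle".
MUJOCO_REQUIREMENTS = {
    "paddlepaddle": (">=", "2.3.1"),
    "parl": (">=", "2.1.1"),
    "gym": (">=", "0.26.0"),
    "mujoco": (">=", "2.2.2"),
}
ATARI_REQUIREMENTS = {
    "paddlepaddle": (">=", "2.3.1"),
    "parl": (">=", "2.1.1"),
    "gym": ("==", "0.18.0"),
    "atari-py": ("==", "0.2.6"),
    "opencv-python": (None, None),  # no version constraint listed
}


def version_tuple(text):
    """Turn '2.3.1' into (2, 3, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Compare two version strings with '>=' or '=='."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '0.18' equals '0.18.0'
    b += (0,) * (width - len(b))
    return a >= b if op == ">=" else a == b


def check(requirements):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for name, (op, required) in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if op is not None and not satisfies(installed, op, required):
            problems.append(f"{name}: found {installed}, need {op}{required}")
    return problems


if __name__ == "__main__":
    for label, reqs in [("Mujoco", MUJOCO_REQUIREMENTS), ("Atari", ATARI_REQUIREMENTS)]:
        print(label, check(reqs) or "OK")
```

Because the Mujoco list needs gym>=0.26.0 while the Atari list pins gym==0.18.0, the two sets cannot both be satisfied in one environment; a separate virtual environment for each is one way to handle this.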

Training:

# To train an agent for a discrete-action game (Atari: PongNoFrameskip-v4 by default)
python train.py

# To train an agent for a continuous-action game (Mujoco)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000

Distributed Training

When environment simulation is slow, you can speed up training by setting xparl_addr and an env_num greater than 1, so that environments are simulated in parallel on a PARL cluster.
First, start a local cluster with 8 CPUs:

xparl start --port 8010 --cpu_num 8

If a master is already running, you can skip the command above. For more
information about the cluster, please refer to our documentation.
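
Before launching distributed training, it can be useful to confirm that something is listening at the address you plan to pass as xparl_addr. The following is a small standard-library sketch; the helper names are hypothetical and not part of PARL, and a successful TCP connection only shows that the port is open, not that the listener is an xparl master.

```python
import socket


def split_addr(addr):
    """Split 'host:port' (the xparl_addr format used above) into (host, port)."""
    host, _, port = addr.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {addr!r}")
    return host, int(port)


def master_reachable(addr, timeout=2.0):
    """Return True if a TCP connection to addr succeeds within the timeout."""
    host, port = split_addr(addr)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    addr = "localhost:8010"  # the address used in the commands in this README
    if master_reachable(addr):
        print(f"{addr} is accepting connections")
    else:
        print(f"nothing is listening at {addr}; start the cluster with xparl start")
```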

Then we can start the distributed training by running:

# To train an agent in distributed mode

# for a discrete-action game (Atari games)
python train.py --env "PongNoFrameskip-v4" --env_num 8 --xparl_addr 'localhost:8010'

# for a continuous-action game (Mujoco games)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000 --env_num 5 --xparl_addr 'localhost:8010'
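
To run the same configuration over several environments, the commands above can be assembled programmatically. This is a hedged sketch that uses only the flags shown in this README; the helper name is hypothetical, and the extra environment names (Hopper-v4, Walker2d-v4) are examples from Gym's Mujoco suite that train.py is assumed, not confirmed, to accept.

```python
import shlex


def build_train_command(env, continuous_action=False, train_total_steps=None,
                        env_num=None, xparl_addr=None):
    """Assemble the argv list for train.py from the flags shown above."""
    cmd = ["python", "train.py", "--env", env]
    if continuous_action:
        cmd.append("--continuous_action")
    if train_total_steps is not None:
        cmd += ["--train_total_steps", str(train_total_steps)]
    if env_num is not None:
        cmd += ["--env_num", str(env_num)]
    if xparl_addr is not None:
        cmd += ["--xparl_addr", xparl_addr]
    return cmd


if __name__ == "__main__":
    # Example environment names; whether train.py supports each is an assumption.
    for env in ["HalfCheetah-v4", "Hopper-v4", "Walker2d-v4"]:
        cmd = build_train_command(env, continuous_action=True,
                                  train_total_steps=1000000,
                                  env_num=5, xparl_addr="localhost:8010")
        # Print the shell-quoted command; to launch it instead, pass cmd to
        # subprocess.run(cmd, check=True).
        print(" ".join(shlex.quote(part) for part in cmd))
```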
