Reproduce PPO with PARL
This example reproduces the PPO deep reinforcement learning algorithm with PARL and reaches performance comparable to the results reported in the paper on the Mujoco benchmarks.
Paper: PPO in Proximal Policy Optimization Algorithms
Mujoco/Atari games introduction
PARL currently supports the open-source version of Mujoco provided by DeepMind, so users do not need to download Mujoco binaries, install mujoco-py, or obtain a license. For more details, please visit Mujoco.
Benchmark result
1. Mujoco games results
2. Atari games results
- Each experiment was run three times with different seeds
How to use
Mujoco-Dependencies:
Atari-Dependencies:
Training:
# To train an agent for a discrete-action game (Atari; PongNoFrameskip-v4 by default)
python train.py
# To train an agent for a continuous-action game (Mujoco)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000
Distributed Training
When environment simulation is very slow, accelerate training by setting xparl_addr and using env_num > 1.
First, start a local cluster with 8 CPUs:
xparl start --port 8010 --cpu_num 8
Note that if you have started a master before, you don't have to run the above
command. For more information about the cluster, please refer to our
documentation.
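The note about not restarting an existing master can be sketched as a small guard. This is only an illustration under stated assumptions: `port_open` and `ensure_master` are hypothetical helper names, not part of PARL, the check relies on bash's `/dev/tcp` redirection, and an open port does not prove that the listener is actually an xparl master.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of PARL.

# Return 0 if something accepts TCP connections on host:port.
# Relies on bash's /dev/tcp redirection; other shells do not support it.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Start a local master only when the port is not already in use.
# Caveat: an open port does not prove the listener is an xparl master.
ensure_master() {
  local port="$1" cpu_num="$2"
  if port_open localhost "$port"; then
    echo "port $port already in use, assuming a master is running"
  else
    xparl start --port "$port" --cpu_num "$cpu_num"
  fi
}
```

After sourcing these functions, `ensure_master 8010 8` would run the same `xparl start` command shown above only if nothing is listening on port 8010.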
Then start distributed training by running:
# To train an agent in distributed mode
# for a discrete-action game (Atari games)
python train.py --env "PongNoFrameskip-v4" --env_num 8 --xparl_addr 'localhost:8010'
# for a continuous-action game (Mujoco games)
python train.py --env 'HalfCheetah-v4' --continuous_action --train_total_steps 1000000 --env_num 5 --xparl_addr 'localhost:8010'
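The two distributed commands above differ only in the environment, the number of parallel environments, and the extra flags. A minimal dry-run sketch is given below: the hypothetical helper `train_cmd` (not part of PARL) prints the command instead of running it. It also assumes that each parallel environment occupies one cluster CPU, so it rejects an env_num larger than the cpu_num given to `xparl start`. That assumption is consistent with the examples above (8 and 5 environments on an 8-CPU cluster) but is not stated in this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a distributed training command.
# Assumption: one cluster CPU per parallel environment, so env_num must not
# exceed the cpu_num passed to `xparl start`.
train_cmd() {
  local env="$1" env_num="$2" cpu_num="$3" addr="$4"
  shift 4
  if [ "$env_num" -gt "$cpu_num" ]; then
    echo "env_num ($env_num) exceeds cluster CPUs ($cpu_num)" >&2
    return 1
  fi
  # Any remaining arguments are passed through as extra train.py flags.
  local extra=""
  if [ "$#" -gt 0 ]; then
    extra=" $*"
  fi
  echo "python train.py --env '$env'$extra --env_num $env_num --xparl_addr '$addr'"
}

# Atari (discrete actions), 8 environments on an 8-CPU cluster
train_cmd PongNoFrameskip-v4 8 8 localhost:8010
# Mujoco (continuous actions), 5 environments on the same cluster
train_cmd HalfCheetah-v4 5 8 localhost:8010 --continuous_action --train_total_steps 1000000
```

The printed lines match the commands listed above and can be reviewed before being run by hand.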