This folder contains the winning solution of our team Firework in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts. The first part is our final submitted model, a sensible controller that can follow random target velocities. The second part performs curriculum learning, to learn a natural and efficient gait for low-speed walking. The last part trains the final agent in the random-velocity environment for the Round 2 evaluation.
For more technical details about our solution, we provide:
Note: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to ensure that our code is reproducible, testing each training sub-task three times. However, some factors still prevent exact replication of our performance. One problem is deciding when a model has converged during curriculum learning: visually selecting a sensible and natural gait is crucial for subsequent training, but what counts as a good gait varies from person to person.
For the final submission, we tested our model on 500 CPUs, running 10 episodes per CPU with different random seeds.
| Avg reward of all episodes | Avg reward of complete episodes | Falldown rate | Evaluated episodes |
|---|---|---|---|
| 9968.5404 | 9980.3952 | 0.0026 | 5000 |
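These statistics are straightforward to reproduce from per-episode logs. The sketch below assumes each record is a `(reward, fell)` pair, where `fell` marks an episode that ended in a fall before the 1000-frame horizon; the record format is an assumption for illustration, not the repo's actual logging format.

```python
# Aggregate per-episode records into the submission statistics.
# The (reward, fell) record format is hypothetical.

def summarize(episodes):
    rewards = [r for r, _ in episodes]
    complete = [r for r, fell in episodes if not fell]
    return {
        "avg_reward_all": sum(rewards) / len(rewards),
        "avg_reward_complete": sum(complete) / len(complete),
        "falldown_rate": sum(fell for _, fell in episodes) / len(episodes),
        "episodes": len(episodes),
    }

# Toy demo: 98 complete episodes plus 2 early falls.
demo = [(9980.0, False)] * 98 + [(5000.0, True)] * 2
print(summarize(demo))
```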
## How to Run

### final_submit

```shell
tar zxvf saved_model.tar.gz
python test.py
```
```shell
# server
python simulator_server.py --port [PORT] --ensemble_num 1

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type RunFastest
```
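`simulator_server.py` and `simulator_client.py` split training into one learner and many actors communicating over gRPC: each client runs OpenSim episodes and ships transitions to the server, which fills the replay memory and trains DDPG. A minimal in-process sketch of that pattern, with queues standing in for gRPC and all names hypothetical:

```python
import queue
import threading

def client(env_step, out_q, episodes):
    """Actor side: roll out episodes and push transitions to the learner."""
    for _ in range(episodes):
        obs = 0.0
        for _ in range(5):  # 5-step toy episodes
            next_obs, reward = env_step(obs)
            out_q.put((obs, reward, next_obs))
            obs = next_obs
    out_q.put(None)  # tell the server this client is finished

def server(in_q, n_clients):
    """Learner side: drain incoming transitions into a replay buffer."""
    replay, done = [], 0
    while done < n_clients:
        item = in_q.get()
        if item is None:
            done += 1
        else:
            replay.append(item)
    return replay

q = queue.Queue()
step = lambda s: (s + 1.0, 1.0)  # toy stand-in for the OpenSim simulator
workers = [threading.Thread(target=client, args=(step, q, 2)) for _ in range(3)]
for w in workers:
    w.start()
buf = server(q, n_clients=3)
for w in workers:
    w.join()
print(len(buf))  # 3 clients x 2 episodes x 5 steps = 30
```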
```shell
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [RunFastest model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 3.0 \
    --act_penalty_lowerbound 1.5
```
```shell
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 3.0m/s model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 2.0 \
    --act_penalty_lowerbound 0.75
```
```shell
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 2.0m/s model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 1.25 \
    --act_penalty_lowerbound 0.6
```
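The three FixedTargetSpeed stages above share one command template and differ only in target velocity, action-penalty lower bound, and the model being restored. A small helper (hypothetical, not part of the repo) makes the curriculum schedule explicit:

```python
# Build the server/client command lines for each curriculum stage.
# Purely illustrative; flag names and values are taken from the README,
# the model-path labels are placeholders.

STAGES = [  # (target_v, act_penalty_lowerbound, model to restore)
    (3.0, 1.5, "[RunFastest model]"),
    (2.0, 0.75, "[FixedTargetSpeed 3.0m/s model]"),
    (1.25, 0.6, "[FixedTargetSpeed 2.0m/s model]"),
]

def stage_commands(port, server_ip):
    for target_v, lowerbound, restore in STAGES:
        server = (f"python simulator_server.py --port {port} --ensemble_num 1 "
                  f"--warm_start_batchs 1000 --restore_model_path {restore}")
        client = (f"python simulator_client.py --port {port} --ip {server_ip} "
                  f"--reward_type FixedTargetSpeed --target_v {target_v} "
                  f"--act_penalty_lowerbound {lowerbound}")
        yield server, client

for srv, cli in stage_commands("[PORT]", "[SERVER_IP]"):
    print(srv)
    print(cli)
```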
As mentioned before, the choice of model used for fine-tuning influences later training. For those who cannot obtain the expected performance from the previous steps, we provide a pre-trained model that walks naturally at 1.25 m/s (Baidu Pan or Google Drive).
```shell
# server
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head

# client (Suggest: 100+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 --act_penalty_lowerbound 0.75 \
    --act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```
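With `--ensemble_num 12` the agent maintains multiple actor/critic heads (see `multi_head_ddpg.py` and `opensim_agent.py`). One common way to combine such heads at test time, sketched here as an assumption rather than the repo's exact rule, is to average the actions the heads propose:

```python
import numpy as np

# Sketch: combine ensemble heads by averaging their proposed actions.
# Simple averaging is an assumption; the repo's exact combination rule
# lives in multi_head_ddpg.py / opensim_agent.py.

ACT_DIM = 19  # muscle activations in the prosthetics environment

def ensemble_action(heads, obs):
    """Average the actions proposed by every head for one observation."""
    actions = np.stack([head(obs) for head in heads])  # (ensemble_num, ACT_DIM)
    return actions.mean(axis=0)

# Toy heads: each proposes a constant action, so the average is easy to check.
heads = [lambda obs, b=b: np.full(ACT_DIM, float(b)) for b in range(12)]
action = ensemble_action(heads, obs=np.zeros(8))
print(action.shape)  # (19,)
```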
```shell
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```
Following the above steps, you can train an agent that scores around 9960, slightly below our final submitted model. The gap comes from the lack of a multi-stage training paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode, which degrades the performance of a single model, since it is hard to fit one model to different data distributions. We therefore trained four models, each aiming to perform well under a different velocity distribution. These four models are trained successively: we first train a model that specializes in the start stage (the first 60 frames), then fix that start model for the first 60 frames and train another model for the remaining 940 frames, and so on. We do not provide this part of the code, since it would reduce the readability of the codebase. Feel free to open an issue if you have any problems :)
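The multi-stage scheme described above can be sketched as a frame-indexed dispatch over specialist models. Everything here is hypothetical except the stated 60-frame start stage; the later boundaries are purely illustrative:

```python
# Dispatch to a stage-specialist policy by frame index: a dedicated start
# model covers the first 60 frames, later models cover the rest of the
# 1000-frame episode. Boundaries after 60 are made up for illustration.

STAGE_BOUNDARIES = [60, 400, 700, 1000]  # frame where each model hands over

def select_model(models, frame):
    """Return the specialist model responsible for this frame."""
    for boundary, model in zip(STAGE_BOUNDARIES, models):
        if frame < boundary:
            return model
    return models[-1]

models = ["start", "stage2", "stage3", "stage4"]
print(select_model(models, 0))    # start
print(select_model(models, 59))   # start
print(select_model(models, 60))   # stage2
print(select_model(models, 999))  # stage4
```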
We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.