This folder contains the winning solution of our team Firework in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts. The first part is our final submitted model, a sensible controller that can follow random target velocities. The second part performs curriculum learning, to learn a natural and efficient gait for low-speed walking. The last part trains the final agent in the random-velocity environment for the Round 2 evaluation.
For more technical details about our solution, we provide:
Note: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to ensure that our code is reproducible, testing each training sub-task three times. However, some factors still prevent exact replication of our performance. One problem is deciding when a model has converged during curriculum learning: visually selecting a sensible and natural gait is crucial for subsequent training, but what counts as a good gait varies from person to person.
For the final submission, we tested our model on 500 CPUs, running 10 episodes per CPU with different random seeds.
| Avg reward of all episodes | Avg reward of complete episodes | Falldown rate | Evaluated episodes |
|---|---|---|---|
| 9968.5404 | 9980.3952 | 0.0026 | 5000 |
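These statistics are straightforward to reproduce from per-episode logs. The sketch below assumes each record is a `(reward, fell)` pair, where `fell` marks an episode that ended in a fall before the 1000-frame horizon; the record format is an assumption for illustration, not the repo's actual logging format.

```python
# Aggregate per-episode records into the submission statistics.
# The (reward, fell) record format is hypothetical.

def summarize(episodes):
    rewards = [r for r, _ in episodes]
    complete = [r for r, fell in episodes if not fell]
    return {
        "avg_reward_all": sum(rewards) / len(rewards),
        "avg_reward_complete": sum(complete) / len(complete),
        "falldown_rate": sum(fell for _, fell in episodes) / len(episodes),
        "episodes": len(episodes),
    }

# Toy demo: 98 complete episodes plus 2 early falls.
demo = [(9980.0, False)] * 98 + [(5000.0, True)] * 2
print(summarize(demo))
```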
## How to Run

### final_submit

```shell
tar zxvf saved_model.tar.gz
python test.py
```
```shell
# server
python simulator_server.py --port [PORT] --ensemble_num 1

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type RunFastest
```
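`simulator_server.py` and `simulator_client.py` split training into one learner and many actors communicating over gRPC: each client runs OpenSim episodes and ships transitions to the server, which fills the replay memory and trains DDPG. A minimal in-process sketch of that pattern, with queues standing in for gRPC and all names hypothetical:

```python
import queue
import threading

def client(env_step, out_q, episodes):
    """Actor side: roll out episodes and push transitions to the learner."""
    for _ in range(episodes):
        obs = 0.0
        for _ in range(5):  # 5-step toy episodes
            next_obs, reward = env_step(obs)
            out_q.put((obs, reward, next_obs))
            obs = next_obs
    out_q.put(None)  # tell the server this client is finished

def server(in_q, n_clients):
    """Learner side: drain incoming transitions into a replay buffer."""
    replay, done = [], 0
    while done < n_clients:
        item = in_q.get()
        if item is None:
            done += 1
        else:
            replay.append(item)
    return replay

q = queue.Queue()
step = lambda s: (s + 1.0, 1.0)  # toy stand-in for the OpenSim simulator
workers = [threading.Thread(target=client, args=(step, q, 2)) for _ in range(3)]
for w in workers:
    w.start()
buf = server(q, n_clients=3)
for w in workers:
    w.join()
print(len(buf))  # 3 clients x 2 episodes x 5 steps = 30
```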
```shell
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [RunFastest model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 3.0 \
    --act_penalty_lowerbound 1.5
```
```shell
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 3.0m/s model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 2.0 \
    --act_penalty_lowerbound 0.75
```
```shell
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 2.0m/s model]

# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 1.25 \
    --act_penalty_lowerbound 0.6
```
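The three FixedTargetSpeed stages above share one command template and differ only in target velocity, action-penalty lower bound, and the model being restored. A small helper (hypothetical, not part of the repo) makes the curriculum schedule explicit:

```python
# Build the server/client command lines for each curriculum stage.
# Purely illustrative; flag names and values are taken from the README,
# the model-path labels are placeholders.

STAGES = [  # (target_v, act_penalty_lowerbound, model to restore)
    (3.0, 1.5, "[RunFastest model]"),
    (2.0, 0.75, "[FixedTargetSpeed 3.0m/s model]"),
    (1.25, 0.6, "[FixedTargetSpeed 2.0m/s model]"),
]

def stage_commands(port, server_ip):
    for target_v, lowerbound, restore in STAGES:
        server = (f"python simulator_server.py --port {port} --ensemble_num 1 "
                  f"--warm_start_batchs 1000 --restore_model_path {restore}")
        client = (f"python simulator_client.py --port {port} --ip {server_ip} "
                  f"--reward_type FixedTargetSpeed --target_v {target_v} "
                  f"--act_penalty_lowerbound {lowerbound}")
        yield server, client

for srv, cli in stage_commands("[PORT]", "[SERVER_IP]"):
    print(srv)
    print(cli)
```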
As mentioned before, the choice of model used for fine-tuning influences later training. For those who cannot obtain the expected performance from the previous steps, we provide a pre-trained model that walks naturally at 1.25 m/s (Baidu Pan or Google Drive).
```shell
# server
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
    --restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head

# client (Suggest: 100+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 --act_penalty_lowerbound 0.75 \
    --act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```
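With `--ensemble_num 12` the agent maintains multiple actor/critic heads (see `multi_head_ddpg.py` and `opensim_agent.py`). One common way to combine such heads at test time, sketched here as an assumption rather than the repo's exact rule, is to average the actions the heads propose:

```python
import numpy as np

# Sketch: combine ensemble heads by averaging their proposed actions.
# Simple averaging is an assumption; the repo's exact combination rule
# lives in multi_head_ddpg.py / opensim_agent.py.

ACT_DIM = 19  # muscle activations in the prosthetics environment

def ensemble_action(heads, obs):
    """Average the actions proposed by every head for one observation."""
    actions = np.stack([head(obs) for head in heads])  # (ensemble_num, ACT_DIM)
    return actions.mean(axis=0)

# Toy heads: each proposes a constant action, so the average is easy to check.
heads = [lambda obs, b=b: np.full(ACT_DIM, float(b)) for b in range(12)]
action = ensemble_action(heads, obs=np.zeros(8))
print(action.shape)  # (19,)
```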
```shell
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```
Following the above steps, you can train an agent that scores around 9960, slightly below our final submitted model. The gap comes from the lack of a multi-stage training paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout the episode, which degrades the performance of a single model, since it is hard to fit one model to different data distributions. We therefore trained four models, each aiming to perform well under a different velocity distribution. These four models are trained successively: we first train a model that specializes in the start stage (the first 60 frames), then fix that start model for the first 60 frames and train another model for the remaining 940 frames, and so on. We do not provide this part of the code, since it would reduce the readability of the codebase. Feel free to open an issue if you have any problems :)
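The multi-stage scheme described above can be sketched as a frame-indexed dispatch over specialist models. Everything here is hypothetical except the stated 60-frame start stage; the later boundaries are purely illustrative:

```python
# Dispatch to a stage-specialist policy by frame index: a dedicated start
# model covers the first 60 frames, later models cover the rest of the
# 1000-frame episode. Boundaries after 60 are made up for illustration.

STAGE_BOUNDARIES = [60, 400, 700, 1000]  # frame where each model hands over

def select_model(models, frame):
    """Return the specialist model responsible for this frame."""
    for boundary, model in zip(STAGE_BOUNDARIES, models):
        if frame < boundary:
            return model
    return models[-1]

models = ["start", "stage2", "stage3", "stage4"]
print(select_model(models, 0))    # start
print(select_model(models, 59))   # start
print(select_model(models, 60))   # stage2
print(select_model(models, 999))  # stage4
```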
We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.