Hanlard 0e55ac941f update 3 months ago
__init__.py update 3 months ago
accelerate_base_trainer.py update 3 months ago
accelerate_copr_trainer.py update 3 months ago
accelerate_cppo_trainer.py update 3 months ago
accelerate_dpo_trainer.py update 3 months ago
accelerate_dvpo_trainer.py update 3 months ago
accelerate_ilql_trainer.py update 3 months ago
accelerate_ppo_trainer.py update 3 months ago
accelerate_rdpo_trainer.py update 3 months ago
accelerate_sft_trainer.py update 3 months ago
accelerate_spin_trainer.py update 3 months ago
nemo_ilql_trainer.py update 3 months ago
nemo_sft_trainer.py update 3 months ago

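This repository reproduces a series of offline alignment algorithms, including DPO, PRO, RRHF, and SPIN, as well as our team's CPPO (published at ICLR 2024) and our latest research work, COPR. Discussion and feedback are welcome.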
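As an illustration of what an offline alignment objective looks like, here is a minimal sketch of the standard DPO loss for one preference pair: the negative log-sigmoid of β times the difference between the chosen and rejected responses' policy-to-reference log-ratios. This is not taken from `accelerate_dpo_trainer.py`. The function names `log_sigmoid`, `dpo_loss`, and `batch_dpo_loss` and the default `beta=0.1` are hypothetical, and the inputs are assumed to be summed per-sequence log-probabilities that have already been computed.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)) for scalar z."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical default; beta is a tunable hyperparameter
) -> float:
    """DPO loss for one (chosen, rejected) pair.

    Each argument is assumed to be the summed token log-probability of a
    full response under the trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: log pi(y|x) - log pi_ref(y|x)
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(margin): small when the chosen response is preferred
    return -log_sigmoid(margin)


def batch_dpo_loss(batch, beta: float = 0.1) -> float:
    """Mean DPO loss over a list of 4-tuples of log-probabilities."""
    losses = [dpo_loss(*example, beta=beta) for example in batch]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Policy favors the chosen response more than the reference does.
    print(dpo_loss(-5.0, -10.0, -8.0, -8.0))
```

When the policy and reference agree exactly, the margin is zero and the loss equals log 2. The loss falls below that as the policy shifts probability toward the chosen response relative to the reference.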

Language: Python
