A Powerful Tool for Optimizing Large Models: PPO and DPO in RLHF
