DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition

Abstract: Preference-based reinforcement learning (PBRL) enables policy learning through simple queries comparing trajectories from a single policy. While human responses to these queries make it possible to learn policies aligned with human preferences, PBRL suffers from low query efficiency, as policy bias limits trajectory diversity and reduces the number of …

