**Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification**

NeurIPS 2021, Oral (<1%)

Benjamin Eysenbach,   Sergey Levine,   Ruslan Salakhutdinov

Paper,   Code,   Blog post

![](images/teaser.png)

*__tldr__: In many scenarios, the user is unable to describe the task in words or numbers, but can readily provide examples of what the world would look like if the task were solved. Motivated by this observation, we derive a control algorithm that learns a policy for solving tasks, given only examples of successful outcome states. Our method, based on recursive classification, learns a value function directly from transitions and success examples. Our method outperforms prior methods at learning from success examples. The key difference from prior work is that our method does not learn an auxiliary reward function, and therefore requires fewer hyperparameters to tune and fewer lines of code to debug. We show that our method satisfies a new data-driven Bellman equation, where examples take the place of the typical reward function term.*

Videos of Learned Policies
===============================================================================

Below, we visualize examples of the behavior learned by our method. The green images on the left are samples of the success examples our method uses to learn these tasks. Note that these success examples are not expert trajectories, but rather examples of states where the task is solved (e.g., where the nail is hammered into the wall). We emphasize that our method does not use any reward function.

**TASK:** Hammer the nail into the board.

![Success Examples. Note that the nail has already been inserted in all examples.](images/hammer.gif width="100%") ![SQIL (best prior method)](videos/sqil_hammer.mp4 width="100%") ![RCE (our method)](videos/hammer.mp4 width="100%")

**TASK:** Put the green object in the blue bin.

![Success Examples](images/sawyer_bin_picking.gif width="100%") ![SQIL (best prior method)](videos/sqil_sawyer_bin_picking.mp4 width="100%") ![RCE (our method)](videos/sawyer_bin_picking.mp4 width="100%")

**TASK:** Place the lid on the box.

![Success Examples](images/sawyer_box_close.gif width="100%") ![SQIL (best prior method)](videos/sqil_sawyer_box_close.mp4 width="100%") ![RCE (our method)](videos/sawyer_box_close.mp4 width="100%")

**TASK:** Open the door.

![Success Examples](images/door.gif width="100%") ![SQIL (best prior method)](videos/sqil_door.mp4 width="100%") ![RCE (our method)](videos/door.mp4 width="100%")

**TASK:** Open the drawer.

![Success Examples](images/sawyer_drawer_open.gif width="100%") ![SQIL (best prior method)](videos/sqil_sawyer_drawer_open.mp4 width="100%") ![RCE (our method)](videos/sawyer_drawer_open.mp4 width="100%")

**TASK:** Lift the object. (The colored spheres are irrelevant.)

![Success Examples](images/sawyer_lift.gif width="100%") ![SQIL (best prior method)](videos/sqil_sawyer_lift.mp4 width="100%") ![RCE (our method)](videos/sawyer_lift.mp4 width="100%")

**TASK:** Push the red object to the green sphere.

![Success Examples](images/sawyer_push.gif width="100%") ![SQIL (best prior method)](videos/sqil_sawyer_push.mp4 width="100%") ![RCE (our method)](videos/sawyer_push.mp4 width="100%")

Additional videos of behaviors learned by RCE
--------------------------------------------------------------------

**TASK:** Close the drawer.

![Success Examples](images/sawyer_drawer_close.gif width="100%") ![RCE (our method)](videos/sawyer_drawer_close.mp4 width="100%")

**TASK:** Clear the object from the table. (image observations)

![Success Examples](images/sawyer_clear_image.gif width="100%") ![RCE (our method)](videos/sawyer_clear_image.mp4 width="100%")

**TASK:** Reach for the red object. (image observations)

![Success Examples](images/sawyer_reach_image.gif width="100%") ![RCE (our method)](videos/sawyer_reach_image.mp4 width="100%")

Failure Cases
-------------------------------------------------------------------------------

For the task below, the agent makes some headway on the task, but is unable to keep the object in the desired location.

**TASK:** Pick up the ball.

![Success Examples](images/relocate.gif width="100%") ![RCE (our method)](videos/relocate.mp4 width="100%")

--------------------------