**C-Learning: Learning to Achieve Goals via Recursive Classification** *__tldr__: We reframe goal-conditioned RL as the problem of predicting and controlling the future state distribution of an autonomous agent. We solve this problem indirectly by training a classifier to predict whether an observation comes from the future. Importantly, an off-policy variant of our algorithm allows us to predict the future state distribution of a new policy, without collecting new experience. While conceptually similar to Q-learning, our approach provides a theoretical justification for goal-relabeling methods employed in prior work and suggests how the goal-sampling ratio can be optimally chosen. Empirically our method outperforms these prior methods.* **Code**: https://github.com/google-research/google-research/tree/master/c_learning Videos of Learned Policies =============================================================================== Below, we visualize examples of the behavoir learned by our method. We emphasize that our method does not use any reward shaping or hand-crafted distance functions. Note that the robot has learned to retry if it initially fails to solve the task, and has learned to finely adjust the position of the objects if its initial attempt slightly missed the goal. Sawyer Pushing ------------------------------------------------------------------------------- In this task, the robot is supposed to move the red puck to the goal, which is indicated by the green circle. ![1](videos/sawyer_push_1.mp4 width="100%") ![2](videos/sawyer_push_2.mp4 width="100%") ![3](videos/sawyer_push_3.mp4 width="100%") ![4](videos/sawyer_push_4.mp4 width="100%") ![5](videos/sawyer_push_5.mp4 width="100%") ![6](videos/sawyer_push_6.mp4 width="100%") ![7](videos/sawyer_push_7.mp4 width="100%") ![8](videos/sawyer_push_8.mp4 width="100%") ![9](videos/sawyer_push_9.mp4 width="100%") ![10](videos/sawyer_push_10.mp4 width="100%") Sawyer Pick ------------------------------------------------------------------------------- In this task, the robot is supposed to pick up the puck and lift it to the green circle. ![1](videos/sawyer_lift_1.mp4 width="100%") ![2](videos/sawyer_lift_2.mp4 width="100%") ![3](videos/sawyer_lift_3.mp4 width="100%") ![4](videos/sawyer_lift_4.mp4 width="100%") ![5](videos/sawyer_lift_5.mp4 width="100%") ![6](videos/sawyer_lift_6.mp4 width="100%") Sawyer Window ------------------------------------------------------------------------------- In this task, the robot is supposed to slide the window so that the handle is aligned with the green circle. ![1](videos/sawyer_window_1.mp4 width="100%") ![2](videos/sawyer_window_2.mp4 width="100%") ![3](videos/sawyer_window_3.mp4 width="100%") ![4](videos/sawyer_window_4.mp4 width="100%") ![5](videos/sawyer_window_5.mp4 width="100%") ![6](videos/sawyer_window_6.mp4 width="100%") Sawyer Drawer ------------------------------------------------------------------------------- In this task, the robot is supposed to pull or push the drawer so the handle is aligned with the green circle. ![1](videos/sawyer_drawer_1.mp4 width="100%") ![2](videos/sawyer_drawer_2.mp4 width="100%") ![3](videos/sawyer_drawer_3.mp4 width="100%") ![4](videos/sawyer_drawer_4.mp4 width="100%") ![5](videos/sawyer_drawer_5.mp4 width="100%") ![6](videos/sawyer_drawer_6.mp4 width="100%") Sawyer Faucet ------------------------------------------------------------------------------- In this task, the robot is supposed to turn the faucet to the handle is aligned with the (tiny) green circle. ![1](videos/sawyer_faucet_1.mp4 width="100%") ![2](videos/sawyer_faucet_2.mp4 width="100%") ![3](videos/sawyer_faucet_3.mp4 width="100%") ![4](videos/sawyer_faucet_4.mp4 width="100%") ![5](videos/sawyer_faucet_5.mp4 width="100%") ![6](videos/sawyer_faucet_6.mp4 width="100%") ![7](videos/sawyer_faucet_7.mp4 width="100%") ![8](videos/sawyer_faucet_8.mp4 width="100%") Sawyer Push Two ------------------------------------------------------------------------------- In this task, the robot is supposed to push two pucks to their respective goals (green goal for green puck, red goal for red puck). ![1](videos/sawyer_push_two_1.mp4 width="100%") ![2](videos/sawyer_push_two_2.mp4 width="100%") ![3](videos/sawyer_push_two_3.mp4 width="100%") ![4](videos/sawyer_push_two_4.mp4 width="100%") ![5](videos/sawyer_push_two_5.mp4 width="100%") ![6](videos/sawyer_push_two_6.mp4 width="100%") Predictions of the Future State Distribution =============================================================================== Our method predicts where an agent is likely to be in the time-discounted future. Below, we give our model an observation (top) and action (not shown) and it predicts a distribution over future states (bottom). As expected, the model predicts more distant states when we increase the discount factor $\gamma$. ![Ant-v2, $\gamma = 0.5$](images/Ant-v2_gamma_0.50.gif) ![Ant-v2, $\gamma = 0.9$](images/Ant-v2_gamma_0.90.gif) ![Ant-v2, $\gamma = 0.99$](images/Ant-v2_gamma_0.99.gif) ![HalfCheetah-v2, $\gamma = 0.5$](images/HalfCheetah-v2_gamma_0.50.gif) ![HalfCheetah-v2, $\gamma = 0.9$](images/HalfCheetah-v2_gamma_0.90.gif) ![Hopper-v2, $\gamma = 0.5$](images/Hopper-v2_gamma_0.50.gif) ![Hopper-v2, $\gamma = 0.9$](images/Hopper-v2_gamma_0.90.gif) ![Walker2d-v2, $\gamma = 0.5$](images/Walker2d-v2_gamma_0.50.gif) ![Walker2d-v2, $\gamma = 0.9$](images/Walker2d-v2_gamma_0.90.gif) ![Walker2d-v2, $\gamma = 0.99$](images/Walker2d-v2_gamma_0.99.gif) --------------------------