# COS 435 / ECE 433: Introduction to Reinforcement Learning
* Q: How do I join the waitlist? A: Fill out [this form](https://forms.gle/4hBk8NPkWaiWfk847). No need to email us.
* Q: Can I audit the course? A: No.
* Lectures: Tuesdays and Thursdays, 11:00am - 12:20pm (Thomas Lab 003)
* Precepts:
  * **Zihao Li**: Thursdays, 1:30pm - 2:20pm (Friend Center 006)
  * **Yulai Zhao**: Thursdays, 3:30pm - 4:20pm (Sherrerd Hall 101)
  * **Kurtland Chua**: Fridays, 11:00am - 11:50am (Friend Center 006)
* Office Hours (no office hours during Spring Break):
  * **Ben Eysenbach**: Tuesdays, 4:30pm - 5:30pm, CS Building 416
  * **Mengdi Wang**: [sign up for a 1:1 slot](https://calendly.com/wangmd03/ece433-cos435-1-1-oh-mengdi-wang)
  * **Kurtland Chua**: Fridays, 2:30pm - 4:30pm, CS Building 302
  * **Yulai Zhao**: Thursdays, 7:30pm - 9:30pm, Friend Center 010
  * **Zihao Li**: Wednesdays, 1:30pm - 3:30pm, 4th Floor CS Space (02/14 and 02/21), EQuad J401 (02/28 onward)
* Prerequisites: linear algebra (e.g., [MAT202](https://registrar.princeton.edu/course-offerings/course-details?courseid=004150&term=1242) or [COS302](https://registrar.princeton.edu/course-offerings/course-details?courseid=015411&term=1242)) and machine learning (e.g., [COS324](https://registrar.princeton.edu/course-offerings/course-details?courseid=014294&term=1242) or [ECE435](https://registrar.princeton.edu/course-offerings/course-details?courseid=014725&term=1242))
* Questions? Ask on [Ed](https://edstem.org/us/courses/54890/discussion/). See the instructors during class to get added to Ed.
* Final Project! The specs can be found [here](https://docs.google.com/document/d/16mR1vU1GXz0mh5d0IKbuWOKcRWyUreq0WYwSFahQY0I/edit#heading=h.kpt5f3i8l3go).

![...](ideogram.jpeg width=200px)

_**Reinforcement learning (RL)** is a core technology at the heart of modern intelligent systems that learn to make good decisions in complex environments.
It encompasses techniques such as continuous-variable optimization, Q-learning, neural networks, policy search, and bandit exploration. In this course, we aim to give an introductory overview of reinforcement learning, its core challenges, and approaches to those challenges, including exploration and generalization. In parallel, we will present a collection of case studies from intelligent systems, games, and healthcare. Through a combination of lectures, written assignments, and coding assignments, students will become well-versed in key ideas and techniques for RL._

### Assignments

For each assignment we provide a PDF and the corresponding TeX file for use as a template. Please type up your solutions in TeX and submit both your compiled PDF and your finished .ipynb on Gradescope.

* **Homework 0**: [[PDF](hw/hw0/hw0.pdf)] [[TeX](hw/hw0/hw0.tex)] [[.ipynb](hw/hw0/hw0.ipynb)] [due 2/4, 11:59pm]
* **Homework 1**: [[PDF](hw/hw1/hw1.pdf)] [[TeX](hw/hw1/hw1.tex)] [[.ipynb](hw/hw1/hw1.ipynb)] [due 2/12, 11:59pm]
* **Homework 2**: [[PDF](hw/hw2/HW2.pdf)] [[TeX](hw/hw2/HW2.tex)] [[.ipynb](hw/hw2/HW2.ipynb)] [[datasets](hw/hw2/HW2_datasets.zip)] [due 2/19, 11:59pm]
* **Homework 3**: [[PDF](hw/hw3/HW3_v2.pdf)] [[TeX](hw/hw3/HW3_v2.tex)] [[.ipynb](hw/hw3/HW3.ipynb)] [due 2/26, 11:59pm]
* **Homework 4**: [[PDF](hw/hw4/HW4.pdf)] [[TeX](hw/hw4/HW4.tex)] [[.ipynb](hw/hw4/HW4.ipynb)] [due 3/4, 11:59pm]
* **Homework 5**: [[PDF](hw/hw5/HW5.pdf)] [[TeX](hw/hw5/HW5.tex)] [[.ipynb](hw/hw5/HW5.ipynb)] [due 3/25, 11:59pm]
* **Homework 6**: [[.ipynb](hw/hw6/HW6.ipynb)] [due 4/1, 11:59pm]
* **Homework 7**: [[PDF](hw/hw7/HW7.pdf)] [[TeX](hw/hw7/HW7.tex)] [[.ipynb](hw/hw7/HW7_MaxEnt.ipynb)] [due 4/9, 11:59pm]
* **Homework 8**: [[PDF](hw/hw8/HW8.pdf)] [[TeX](hw/hw8/HW8.tex)] [[.ipynb](hw/hw8/HW8_PPO.ipynb)] [due 4/19, 11:59pm]

### Solutions

* **Homework 0**: [[Written](hw/hw0/hw0sol.pdf)]
* **Homework 1**: [[Written](hw/hw1/hw1sol.pdf)]
* **Homework 2**:
[[Code](hw/hw2/HW2_solns.ipynb)]
* **Homework 3**: [[Written](hw/hw3/hw3sols.pdf)]
* **Homework 4**: [[Written](hw/hw4/hw4sols.pdf)] [[Code](hw/hw4/HW4_Sols.ipynb)]
* **Homework 5**: [[Code](hw/hw5/HW5_solns.ipynb)]
* **Homework 6**: [[Code](hw/hw6/HW6_solns.ipynb)]
* **Homework 7**: [[Code](hw/hw7/hw7_soln.ipynb)]
* **Midterm**: [[Solutions](exam/midterm_sol.pdf)]

### Lecture Notes

* **Lecture 1**: [Introduction Slides](https://docs.google.com/presentation/d/1Rx_9yWElFHeQlW1Rg1oQ1yDy_lsr-ODINKXDp3RdcuU/edit#slide=id.g292866793b8_0_8)
* **Lecture 2**: [Multi-armed Bandits, ε-Greedy, UCB](notes/lecture2.pdf)
* **Lecture 2 (math)**: [Math Review](notes/lecture2_math_review.pdf)
* **Lecture 3**: [Contextual Bandits, Markov Chains, MDPs](notes/lecture3.pdf)
* **Lecture 4**: [The RL Problem](notes/lecture_4_rl_problem.pdf)
* **Lecture 5**: [Imitation Learning](notes/lecture_5_imitation_updated.pdf)
* **Lecture 6**: [Value Functions and Dynamic Programming](notes/lecture6.pdf)
* **Lecture 7**: [Policy Gradient](notes/lecture7.pdf)
* **Lecture 8**: [REINFORCE, Value Iteration](notes/lecture8.pdf)
* **Lecture 9**: [Q-learning](notes/lecture9.pdf)
* **Lecture 10**: [DQN](notes/lecture10.pdf)
* **Lecture 11**: [Actor-Critic](notes/lecture11.pdf)
* **Lecture 13**: [Actor-Critic Continued](notes/lecture13.pdf)
* **Lecture 14**: Model-based RL (Kurtland's Lecture) [Slides](notes/lecture14.pdf) [Draft Notes (WIP)](notes/lecture14-draft-notes.pdf)
* **Lecture 15**: [Model-based RL for Actor-Critic Methods](notes/lecture15.pdf) [Slides](https://docs.google.com/presentation/d/1Jy7FRMnWS5CZPTjaYqDWRssWMNfiKlCeKtVdx7ZGYKQ/edit#slide=id.g2c601cf71ff_0_40)
* **Lecture 16**: [Guest Lecture](notes/princeton_talk_yuandong_tian.pdf)
* **Lecture 17**: [MaxEnt Methods](notes/lecture17.pdf) [Slides](https://docs.google.com/presentation/d/1GRr1Lsy5q28pytX6RBmW9tDHkaObJpQRsYT_Y-mVxuc/edit#slide=id.g2c7a1190c7d_0_48)
* **Lecture 18**: [Inverse RL and Intent Inference](notes/lecture18.pdf)
* 
**Lecture 19**: [Advanced Policy Gradient Methods](notes/lecture19_notes.pdf) [Slides](notes/lecture19.pdf)
* **Lecture 20**: [Game Theory and MARL](notes/lecture20.pdf)
* **Lecture 21**: [Guest Lecture on RLHF by Leqi Liu](notes/leqi_princeton_rl_lec.pdf)
* **Lecture 22**: [Goal-conditioned RL](notes/lecture22.pdf)

### Precept Notes

* **Week 2**: [MDP Review](notes/week2_precept.pdf)
* **Week 3**: [Imitation Learning and Value Functions](notes/week3_precept.pdf)
* **Week 4**: [Dynamic Programming](notes/week4_precept.pdf)
* **Week 5**: [Q-learning + DQN](notes/week5_precept.pdf)
* **Week 6**: [A2C + DDPG + TD3](notes/week6_precept.pdf)
* **Week 7**: [MBRL](notes/Week_7_Precept_Notes.pdf)
* **Week 8**: [MaxEnt + Inverse RL](notes/Week_8_Precept_Notes.pdf)
* **Week 9**: [PPO](notes/Week_9_Precept_Notes.pdf)

### Course Staff

![[Mengdi Wang](https://mwang.princeton.edu/)](https://ece.princeton.edu/sites/g/files/toruqf1836/files/styles/3x4_750w_1000h/public/people/wang01.jpeg height=150)
![[Ben Eysenbach](https://ben-eysenbach.github.io/)](https://ben-eysenbach.github.io/assets/img/prof_pic.jpg height=150)
![[Yulai Zhao](https://yulaizhao.com/)](https://yulaizhao.com/images/headshot.jpg height=150)
![[Kurtland Chua](https://kchua.github.io/)](kurtland.jpg height=150)
![[Zihao Li](https://zihaoli0629.github.io/)](zihao.jpg height=150)
![[Ananya Parashar](https://jrc.princeton.edu/people/ananya-parashar)](https://avatars.githubusercontent.com/u/53277109?v=4 height=150)
![[Alex Zhang](https://alexzhang13.github.io)](alzhang.jpg height=150)

------------