# COS 435 / ECE 433: Introduction to Reinforcement Learning
* Q: How do I join the waitlist? A: Fill out [this form](https://forms.gle/4hBk8NPkWaiWfk847). No need to email us.
* Q: Can I audit the course? A: No.
* Lectures: Tuesdays and Thursdays, 11:00am - 12:20pm (Thomas Lab 003)
* Precepts:
  * **Zihao Li**: Thursdays, 1:30pm - 2:20pm (Friend Center 006)
  * **Yulai Zhao**: Thursdays, 3:30pm - 4:20pm (Sherrerd Hall 101)
  * **Kurtland Chua**: Fridays, 11:00am - 11:50am (Friend Center 006)
* Office Hours (no office hours during Spring Break):
  * **Ben Eysenbach**: Tuesdays, 4:30pm - 5:30pm, CS Building 416
  * **Mengdi Wang**: [sign up for a 1:1 slot](https://calendly.com/wangmd03/ece433-cos435-1-1-oh-mengdi-wang)
  * **Kurtland Chua**: Fridays, 2:30pm - 4:30pm, CS Building 302
  * **Yulai Zhao**: Thursdays, 7:30pm - 9:30pm, Friend Center 010
  * **Zihao Li**: Wednesdays, 1:30pm - 3:30pm, 4th Floor CS Space (02/14 and 02/21), EQuad J401 (02/28 onward)
* Prerequisites: linear algebra (e.g., [MAT202](https://registrar.princeton.edu/course-offerings/course-details?courseid=004150&term=1242) or [COS302](https://registrar.princeton.edu/course-offerings/course-details?courseid=015411&term=1242)) and machine learning (e.g., [COS324](https://registrar.princeton.edu/course-offerings/course-details?courseid=014294&term=1242) or [ECE435](https://registrar.princeton.edu/course-offerings/course-details?courseid=014725&term=1242))
* Questions? Ask on [Ed](https://edstem.org/us/courses/54890/discussion/). See the instructors during class to get added to Ed.
* Final Project! The specs can be found [here](https://docs.google.com/document/d/16mR1vU1GXz0mh5d0IKbuWOKcRWyUreq0WYwSFahQY0I/edit#heading=h.kpt5f3i8l3go).

![...](ideogram.jpeg width=200px)

_**Reinforcement learning (RL)** is a core technology at the heart of modern intelligent systems that learn to make good decisions in complex environments.
It encompasses techniques such as continuous-variable optimization, Q-learning, neural networks, policy search, and bandit exploration. In this course, we aim to give an introductory overview of reinforcement learning, its core challenges, and approaches to those challenges, including exploration and generalization. In parallel, we will present a collection of case studies from intelligent systems, games, and healthcare. Through a combination of lectures, written assignments, and coding assignments, students will become well-versed in key ideas and techniques for RL._

### Assignments

For each assignment we provide a PDF and the corresponding TeX file for use as a template. Please type up your solutions in TeX and submit both your compiled PDF and your finished .ipynb on Gradescope.

* **Homework 0**: [[PDF](hw/hw0/hw0.pdf)] [[TeX](hw/hw0/hw0.tex)] [[.ipynb](hw/hw0/hw0.ipynb)] [due 2/4, 11:59pm]
* **Homework 1**: [[PDF](hw/hw1/hw1.pdf)] [[TeX](hw/hw1/hw1.tex)] [[.ipynb](hw/hw1/hw1.ipynb)] [due 2/12, 11:59pm]
* **Homework 2**: [[PDF](hw/hw2/HW2.pdf)] [[TeX](hw/hw2/HW2.tex)] [[.ipynb](hw/hw2/HW2.ipynb)] [[datasets](hw/hw2/HW2_datasets.zip)] [due 2/19, 11:59pm]
* **Homework 3**: [[PDF](hw/hw3/HW3_v2.pdf)] [[TeX](hw/hw3/HW3_v2.tex)] [[.ipynb](hw/hw3/HW3.ipynb)] [due 2/26, 11:59pm]
* **Homework 4**: [[PDF](hw/hw4/HW4.pdf)] [[TeX](hw/hw4/HW4.tex)] [[.ipynb](hw/hw4/HW4.ipynb)] [due 3/4, 11:59pm]
* **Homework 5**: [[PDF](hw/hw5/HW5.pdf)] [[TeX](hw/hw5/HW5.tex)] [[.ipynb](hw/hw5/HW5.ipynb)] [due 3/25, 11:59pm]
* **Homework 6**: [[.ipynb](hw/hw6/HW6.ipynb)] [due 4/1, 11:59pm]
* **Homework 7**: [[PDF](hw/hw7/HW7.pdf)] [[TeX](hw/hw7/HW7.tex)] [[.ipynb](hw/hw7/HW7_MaxEnt.ipynb)] [due 4/9, 11:59pm]
* **Homework 8**: [[PDF](hw/hw8/HW8.pdf)] [[TeX](hw/hw8/HW8.tex)] [[.ipynb](hw/hw8/HW8_PPO.ipynb)] [due 4/19, 11:59pm]

### Solutions

* **Homework 0**: [[Written](hw/hw0/hw0sol.pdf)]
* **Homework 1**: [[Written](hw/hw1/hw1sol.pdf)]
* **Homework 2**:
[[Code](hw/hw2/HW2_solns.ipynb)]
* **Homework 3**: [[Written](hw/hw3/hw3sols.pdf)]
* **Homework 4**: [[Written](hw/hw4/hw4sols.pdf)] [[Code](hw/hw4/HW4_Sols.ipynb)]
* **Homework 5**: [[Code](hw/hw5/HW5_solns.ipynb)]
* **Homework 6**: [[Code](hw/hw6/HW6_solns.ipynb)]
* **Homework 7**: [[Code](hw/hw7/hw7_soln.ipynb)]
* **Midterm**: [[Solutions](exam/midterm_sol.pdf)]

### Lecture Notes

* **Lecture 1**: [Introduction Slides](https://docs.google.com/presentation/d/1Rx_9yWElFHeQlW1Rg1oQ1yDy_lsr-ODINKXDp3RdcuU/edit#slide=id.g292866793b8_0_8)
* **Lecture 2**: [Multi-armed Bandits, ε-Greedy, UCB](notes/lecture2.pdf)
* **Lecture 2 (math)**: [Math Review](notes/lecture2_math_review.pdf)
* **Lecture 3**: [Contextual Bandits, Markov Chains, MDPs](notes/lecture3.pdf)
* **Lecture 4**: [The RL Problem](notes/lecture_4_rl_problem.pdf)
* **Lecture 5**: [Imitation Learning](notes/lecture_5_imitation_updated.pdf)
* **Lecture 6**: [Value Functions and Dynamic Programming](notes/lecture6.pdf)
* **Lecture 7**: [Policy Gradient](notes/lecture7.pdf)
* **Lecture 8**: [REINFORCE, Value Iteration](notes/lecture8.pdf)
* **Lecture 9**: [Q-learning](notes/lecture9.pdf)
* **Lecture 10**: [DQN](notes/lecture10.pdf)
* **Lecture 11**: [Actor-Critic](notes/lecture11.pdf)
* **Lecture 13**: [Actor-Critic Continued](notes/lecture13.pdf)
* **Lecture 14**: Model-based RL (Kurtland's Lecture) [Slides](notes/lecture14.pdf) [Draft Notes (WIP)](notes/lecture14-draft-notes.pdf)
* **Lecture 15**: [Model-based RL for Actor-Critic Methods](notes/lecture15.pdf) [Slides](https://docs.google.com/presentation/d/1Jy7FRMnWS5CZPTjaYqDWRssWMNfiKlCeKtVdx7ZGYKQ/edit#slide=id.g2c601cf71ff_0_40)
* **Lecture 16**: [Guest Lecture](notes/princeton_talk_yuandong_tian.pdf)
* **Lecture 17**: [MaxEnt Methods](notes/lecture17.pdf) [Slides](https://docs.google.com/presentation/d/1GRr1Lsy5q28pytX6RBmW9tDHkaObJpQRsYT_Y-mVxuc/edit#slide=id.g2c7a1190c7d_0_48)
* **Lecture 18**: [Inverse RL and Intent Inference](notes/lecture18.pdf)
* 
**Lecture 19**: [Advanced Policy Gradient Methods](notes/lecture19_notes.pdf) [Slides](notes/lecture19.pdf)
* **Lecture 20**: [Game Theory and MARL](notes/lecture20.pdf)
* **Lecture 21**: [Guest Lecture on RLHF by Leqi Liu](notes/leqi_princeton_rl_lec.pdf)
* **Lecture 22**: [Goal-conditioned RL](notes/lecture22.pdf)

### Precept Notes

* **Week 2**: [MDP Review](notes/week2_precept.pdf)
* **Week 3**: [Imitation Learning and Value Functions](notes/week3_precept.pdf)
* **Week 4**: [Dynamic Programming](notes/week4_precept.pdf)
* **Week 5**: [Q-learning + DQN](notes/week5_precept.pdf)
* **Week 6**: [A2C + DDPG + TD3](notes/week6_precept.pdf)
* **Week 7**: [MBRL](notes/Week_7_Precept_Notes.pdf)
* **Week 8**: [MaxEnt + Inverse RL](notes/Week_8_Precept_Notes.pdf)
* **Week 9**: [PPO](notes/Week_9_Precept_Notes.pdf)

### Course Staff

![[Mengdi Wang](https://mwang.princeton.edu/)](https://ece.princeton.edu/sites/g/files/toruqf1836/files/styles/3x4_750w_1000h/public/people/wang01.jpeg height=150)
![[Ben Eysenbach](https://ben-eysenbach.github.io/)](https://ben-eysenbach.github.io/assets/img/prof_pic.jpg height=150)
![[Yulai Zhao](https://yulaizhao.com/)](https://yulaizhao.com/images/headshot.jpg height=150)
![[Kurtland Chua](https://kchua.github.io/)](kurtland.jpg height=150)
![[Zihao Li](https://zihaoli0629.github.io/)](zihao.jpg height=150)
![[Ananya Parashar](https://jrc.princeton.edu/people/ananya-parashar)](https://avatars.githubusercontent.com/u/53277109?v=4 height=150)
![[Alex Zhang](https://alexzhang13.github.io)](alzhang.jpg height=150)

------------