CS 4803/7643 Deep Learning - Coding Questions for HW4
In this homework, we will implement algorithms for two kinds - (1) dynamic programming and (2) reinforcement learning with Q-Learning for solving Markov Decision Processes (MDPs).
Note that this homework is adapted from the Stanford CS234: Reinforcement Learning Winter 2019 course.
Download the starter code here.
Setup
Python 3.7.X is required for this assignment. Either install it directly or create a virtual environment with conda:
conda create -n hw4 python=3.7
First, install dependencies in requirements.txt
pip install -r requirements.txt
Then, install PyTorch 1.2 from pytorch.org - either the CPU or GPU version depending on what your machine supports.
Part 1: Dynamic Programming (30 points)
Open the jupyter notebook dynamic_programming/dp.ipynb
and follow the instructions to implement policy iteration (policy evaluation + policy improvement) and value iteration.
Part 2: Q-Learning and Deep Q-Networks (30 points + 5 bonus points)
Open the jupyter notebook q_learning/q_learning.ipynb
and follow the instructions to implement parts of the Q-Learning training procedure and two types of functions for Q networks - a linear Q network and a convolutional Q network.
Submission
Step 1: In dynamic_programming/dp.ipynb
, make sure you run the entire notebook with the RENDER_ENV
variable set to False
, before proceeding to the next step.
Step 2: Convert your notebook files to PDF
Option 1: Install wkhtmltopdf and run bash convert_to_pdf.sh
to generate the PDF files.
Option 2: If option 1 installation doesn’t work, run bash convert_to_pdf.sh --no-pdf
and manually generate the pdf files by opening the each intermediate HTML file (dynamic_programming/dp.html
and q_learning/q_learning.html
) in your browser and saving them as PDFs with the name (dynamic_programming/dp.pdf
and q_learning/q_learning.pdf
).
Step 3: Submit the two PDF files generated in Step 2 to the assignment titled “HW4” in Gradescope. Assign all pages of dynamic_programming/dp.pdf
to Question 5.1, and all pages of q_learning/q_learning.pdf
to Question 5.2.
Step 4: Run bash collect_submission.sh
to generate hw4.zip
. Submit this zip file to the assignment titled “HW4 Code” in Gradescope.