Brief
- Due: Tuesday Mar 19 11:55pm
- Starter code: starter code
- Submit to Gradescope
- This portion (HW3) counts 12% of your total grade
In this (short) homework, we will implement vanilla recurrent neural networks (RNNs) and Long Short-Term Memory (LSTM) networks and apply them to image captioning on COCO.
Note that this homework is adapted from the Stanford CS231n course.
Setup
Assuming you already have homework 2 dependencies installed, here is some prep work you need to do. First, download the data:
cd cs231n/datasets
./get_assignment_data.sh
We will use PyTorch (v1.0) to complete the problems in this homework; the code has been tested with Python 2.7 on Linux and Mac.
Part 1: Captioning with Vanilla RNNs (25 points)
Open the RNN_Captioning.ipynb Jupyter notebook, which will walk you through implementing the forward and backward pass for a vanilla RNN, first 1) for a single timestep and then 2) for entire sequences of data. Code to check gradients has already been provided.
You will overfit a captioning model on a tiny dataset, implement sampling from the softmax distribution, and visualize predictions on the training and validation sets.
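As a reference for the single-timestep part, here is a minimal NumPy sketch of a vanilla RNN step and its backward pass. The function names and cache layout are illustrative and may differ from the starter code's exact API; the recurrence is h_t = tanh(x W_x + h_{t-1} W_h + b).

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    # Single timestep: h_t = tanh(x @ Wx + h_{t-1} @ Wh + b)
    next_h = np.tanh(x @ Wx + prev_h @ Wh + b)
    cache = (x, prev_h, Wx, Wh, next_h)
    return next_h, cache

def rnn_step_backward(dnext_h, cache):
    x, prev_h, Wx, Wh, next_h = cache
    # Backprop through tanh: d tanh(a)/da = 1 - tanh(a)^2
    da = dnext_h * (1 - next_h ** 2)
    dx = da @ Wx.T
    dprev_h = da @ Wh.T
    dWx = x.T @ da
    dWh = prev_h.T @ da
    db = da.sum(axis=0)
    return dx, dprev_h, dWx, dWh, db
```

The provided gradient-checking code should confirm these analytic gradients against numerical ones once your implementation matches the expected interface.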
Part 2: Captioning with LSTMs (25 points)
Open the LSTM_Captioning.ipynb Jupyter notebook, which will walk you through implementing Long Short-Term Memory (LSTM) RNNs and applying them to image captioning on MS-COCO.
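For orientation, a single LSTM timestep can be sketched as follows. This is a hedged NumPy illustration, not the starter code's implementation: it assumes the four gate pre-activations are packed into one (N, 4H) matrix in the order input, forget, output, candidate, which is a common convention but may not match the notebook's exact layout.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
    # Wx: (D, 4H), Wh: (H, 4H), b: (4H,) -- all four gates computed at once
    H = prev_h.shape[1]
    a = x @ Wx + prev_h @ Wh + b       # (N, 4H) pre-activations
    i = sigmoid(a[:, :H])              # input gate
    f = sigmoid(a[:, H:2 * H])         # forget gate
    o = sigmoid(a[:, 2 * H:3 * H])     # output gate
    g = np.tanh(a[:, 3 * H:])          # candidate cell state
    next_c = f * prev_c + i * g        # cell state update
    next_h = o * np.tanh(next_c)       # hidden state
    return next_h, next_c
```

The key difference from the vanilla RNN is the additive cell-state update, which is what lets gradients flow over longer sequences.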
Part 3: Train a good captioning model (15 points Extra Credit for CS4803DL; 5 points Regular Credit and 10 points Extra Credit for CS7643)
Using the pieces you implemented in parts 1 and 2, train a captioning model that gives decent qualitative results (better than the random garbage you saw with the overfit models) when sampling on the validation set.
Code for evaluating models using the BLEU unigram precision metric has already been provided. Feel free to use PyTorch for this section if you'd like to train faster on a GPU. The starter code is provided in LSTM_Captioning.ipynb.
Here is how the scoring works for this section: a BLEU score of 0.20-0.25 earns 5 points, 0.25-0.30 earns 10 points, and >0.30 earns 15 points. For CS4803, this section is entirely Extra Credit. For CS7643, we want you to achieve a score of >0.20, which gives you 5 points Regular Credit; if you beat 0.25, you get Extra Credit according to the score you achieve.
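To make the metric concrete, here is a toy sketch of clipped unigram precision, the core of the provided BLEU evaluation. This is an illustration only (it scores against a single reference and omits the brevity penalty); use the provided evaluation code for your actual scores.

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Clipped unigram precision of a candidate caption vs. one reference."""
    cand = candidate.split()
    ref_counts = Counter(reference.split())
    cand_counts = Counter(cand)
    # Each candidate word's count is clipped by its count in the reference,
    # so repeating a word cannot inflate the score.
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return clipped / max(len(cand), 1)

# "a" appears twice in the candidate but never in the reference,
# so only cat/on/mat count: 3 matches out of 5 words -> 0.6
score = unigram_precision("a cat on a mat", "the cat sat on the mat")
```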
Here are a few pointers:
- Attention-based captioning models
- Discriminative captioning
- Novel object captioning
Summary Deliverables
Code Submission
Submit the results by uploading a zip file called hw3.zip created with the following command:
cd assignment/
./collect_submission.sh
Write-Up Submission
Convert all IPython notebooks to PDF files with the following command:
jupyter-nbconvert --to pdf filename.ipynb
Notes
- You should only upload ONE PDF file to the HW3 Writeup section, and then assign the pages properly as you did for PS3.
- You should upload hw3.zip, which includes no PDF file, to the HW3 Code section.