CS 4641 Project 3: Markov Chains and Reinforcement Learning

Due: 2012-08-01

Markov Chains

The Resources/Project 3 folder on T-Square contains a file named markov-chains.jar and several sample corpus files. On startup, markov-chains.jar reads the corpus files in a directory and trains one language model per corpus. It then prompts you for sentences and predicts the source of each one. Assuming your corpus files are in a data/ directory, you can run the program like this:

$ ls data/
english.corpus       french.corpus   hiphop.corpus   lisp.corpus     spanish.corpus
$ java -jar markov-chains.jar -d data/
Training lisp model.
Training hiphop model.
Training english model.
Training french model.
Training spanish model.
Enter Q to quit or a sentence to get a source prediction.
> rollin with my homies
List((hiphop,0.6932792936519998), (english,0.30666894659680843),
(lisp,3.769545575443409E-5), (french,8.521692909800597E-6),
(spanish,5.542602527486509E-6))
> q
$

markov-chains.jar uses corpus files ending in .corpus to train language models, which it then uses to predict the source of sentences typed at the prompt. Each corpus file contains sentences that end in "." characters. (A future version will recognize arbitrary punctuation.) Your task in this part of Project 3 is to create a language corpus for another "language." Run markov-chains.jar with your corpus and the supplied corpora and report the resulting predictions.
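To build intuition for how a Markov chain model can rank candidate sources, here is a minimal Python sketch. It is not the markov-chains.jar implementation, whose internals are not described here. The word-level bigram model, the add-one smoothing, the toy corpora, and every function name below are assumptions chosen for illustration.

```python
import math
from collections import Counter

def read_corpus(text):
    """Split corpus text into sentences on '.' characters."""
    return [s.strip() for s in text.split(".") if s.strip()]

def train(sentences):
    """Count word bigrams, with <s> and </s> marking sentence boundaries."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(words)
        for prev, word in zip(words, words[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab

def log_likelihood(model, sentence):
    """Log P(sentence | model) under add-one (Laplace) smoothing."""
    bigrams, contexts, vocab = model
    v = len(vocab) + 1  # one extra slot for words never seen in training
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + v))
        for prev, word in zip(words, words[1:])
    )

def predict(models, sentence):
    """Return (name, probability) pairs, most likely source first."""
    scores = {name: log_likelihood(m, sentence) for name, m in models.items()}
    top = max(scores.values())
    # Subtract the top score before exponentiating to avoid underflow.
    weights = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(weights.values())
    return sorted(((n, w / total) for n, w in weights.items()),
                  key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for corpus files (assumed data, not the course corpora).
models = {
    "english": train(read_corpus("the cat sat on the mat. the dog ate the bone.")),
    "spanish": train(read_corpus("el gato come pescado. el perro come carne.")),
}
ranking = predict(models, "the dog sat on the mat")
print(ranking)
```

In this sketch the probabilities are normalized across the trained models, so they sum to 1 in the same way as the list printed at the prompt above. A larger corpus generally separates the sources more sharply.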

Note: a nice extra credit project would train markov-chains.jar on corpora that represent different kinds of text, that is, not languages but text that comes from identifiable sources.

The source for markov-chains is on GitHub: https://github.com/csimpkins/markov-chains

Reinforcement Learning

Borrow or create an MDP solver and a Q-learning implementation, and run them on a world of your own design. The world should contain fewer than 1000 states; an interesting grid world needs only about 25. Report the optimal policy found by your MDP solver (value iteration or policy iteration) and the learning curve of your Q-learner.
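As a starting point, the following Python sketch runs value iteration and tabular Q-learning on a tiny deterministic grid world. Everything about the world is an assumption made for illustration: the 3x4 layout, the goal and pit cells, the rewards, the discount factor, and the learning parameters. Your own world and solver may differ substantially.

```python
import random

ROWS, COLS = 3, 4
GOAL, PIT = (0, 3), (1, 3)  # terminal cells (hypothetical layout)
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
GAMMA, STEP_REWARD = 0.9, -0.04
STATES = [(r, c) for r in range(ROWS) for c in range(COLS)]

def is_terminal(state):
    return state in (GOAL, PIT)

def step(state, action):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    nxt = (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state
    reward = 1.0 if nxt == GOAL else -1.0 if nxt == PIT else STEP_REWARD
    return nxt, reward

def value_iteration(tol=1e-6):
    """Apply Bellman optimality backups until no value changes by tol or more."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if is_terminal(s):
                continue
            best = max(r + GAMMA * V[n] for n, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def greedy_policy(V):
    """Pick, in each non-terminal state, the action with the best one-step lookahead."""
    def lookahead(s, a):
        nxt, r = step(s, a)
        return r + GAMMA * V[nxt]
    return {s: max(ACTIONS, key=lambda a: lookahead(s, a))
            for s in STATES if not is_terminal(s)}

def q_learning(episodes=500, alpha=0.5, epsilon=0.1, start=(2, 0), seed=0):
    """Epsilon-greedy tabular Q-learning; returns Q and per-episode total reward."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    returns = []
    for _ in range(episodes):
        s, total = start, 0.0
        for _ in range(200):  # cap episode length
            if rng.random() < epsilon:
                a = rng.choice(list(ACTIONS))
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            nxt, r = step(s, a)
            future = 0.0 if is_terminal(nxt) else GAMMA * max(Q[(nxt, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + future - Q[(s, a)])
            s, total = nxt, total + r
            if is_terminal(s):
                break
        returns.append(total)
    return Q, returns

V = value_iteration()
policy = greedy_policy(V)
Q, returns = q_learning()
```

Plotting returns against the episode index (possibly smoothed with a moving average) gives one form of learning curve, and printing policy as a grid of arrows is a compact way to report the solver's result.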

Project Submission

Submit a Zip archive or gzipped tarball called <yourgtaccount>-project3.{zip|tar.gz} with the following contents: