A Data-Driven Emotional Lexicon

Many natural language processing tasks require understanding the emotions expressed in text. However, the same word may carry a different emotional color in different contexts. In this work, I attempt to build an emotional lexicon from a large corpus of fictional works, in the hope that such a lexicon can facilitate the understanding and generation of fictional texts.

The corpus includes more than 9000 books (the complete list) from Project Gutenberg that are labeled as fiction and written in English. Based on the principle that neighboring words carry similar sentiments, I propagate sentiment values from SentiWordNet using a Gaussian kernel function. I find that this procedure can correct errors in SentiWordNet and helps generate better stories.
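
The propagation idea can be illustrated with a minimal sketch. This is not the exact procedure used in the paper. It assumes that "neighboring" means nearby token positions in the text, that a word's estimate is the kernel-weighted average of the seed values around its occurrences, and that the seed dictionary stands in for SentiWordNet scores. The function name `propagate_sentiment` and the parameters `sigma` and `window` are hypothetical.

```python
import math
from collections import defaultdict


def propagate_sentiment(tokens, seed, sigma=2.0, window=5):
    """Estimate a sentiment value per word from nearby seed words.

    tokens: list of words in reading order.
    seed:   dict mapping some words to known sentiment values
            (an assumed stand-in for SentiWordNet scores).
    sigma:  width of the Gaussian kernel, in token positions (assumed).
    window: maximum distance, in tokens, at which a neighbor counts (assumed).
    """
    totals = defaultdict(float)   # kernel-weighted sum of seed values per word
    weights = defaultdict(float)  # sum of kernel weights per word
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            neighbor = tokens[j]
            if neighbor not in seed:
                continue
            # Gaussian kernel: closer seed words contribute more.
            w = math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
            totals[word] += w * seed[neighbor]
            weights[word] += w
    # Weighted average; words with no seed word in range get no estimate.
    return {w: totals[w] / weights[w] for w in totals if weights[w] > 0}
```

Because each estimate is a weighted average, it always lies between the smallest and largest seed values within the window. A word that has its own seed value also counts itself at distance zero, so its original score is blended with its context rather than replaced.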

The resulting sentiment values, which are used in my paper at the Social Believability Workshop, can be found here.