The Joint Effects of Tweet Content Similarity and Tweet Interactions for Topic Derivation
Robertus Nugroho, Weiliang Zhao, Jian Yang, Cecile Paris and Surya Nepal
Macquarie University, Macquarie University, Macquarie University, CSIRO – ICT Centre, CSIRO

Interactions among tweets (mentions, retweets, and replies) are important factors in the quality of topic derivation on Twitter. When incorporated correctly, tweet interactions can significantly improve the quality of topic derivation compared with approaches based mainly on content similarity analysis. However, how to measure interactions and integrate them with content similarity for topic derivation remains a challenge. In previous work, the strength of a tweet-to-tweet relationship was computed by simply adding measures of content similarity, mentions, and reply-retweets. This simple linear addition does not accurately reflect the different impacts these factors have on tweet relationships. To address this issue, we propose a joint probability model that effectively integrates the effects of content similarity, mentions, and reply-retweets to measure tweet relationships for the purpose of topic derivation. The proposed method is based on matrix factorization techniques, which enable a flexible, incremental implementation on a distributed system. Experimental results show that the proposed model significantly improves the quality of topic derivation over existing methods.
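
To make the contrast between simple linear addition and a probabilistic combination concrete, the following minimal sketch compares the two on a single pair of tweets. The abstract does not state the formula of the joint probability model, so the noisy-OR combination used here is only an illustrative stand-in and should not be read as the proposed model. The function names and the assumption that each measure is a score in [0, 1] are likewise hypothetical.

```python
def linear_strength(content, mention, reply_retweet):
    """Simple linear addition of the three measures (the earlier approach)."""
    return content + mention + reply_retweet


def joint_strength(content, mention, reply_retweet):
    """Illustrative noisy-OR combination (an assumption, not the paper's formula).

    Each input is treated as an independent probability in [0, 1] that the
    two tweets are related; the result is the probability that at least one
    of the three factors links them.
    """
    no_link = (1.0 - content) * (1.0 - mention) * (1.0 - reply_retweet)
    return 1.0 - no_link


if __name__ == "__main__":
    # Hypothetical scores for one tweet pair: similar content, one mention
    # in common, and a reply-retweet relationship.
    scores = (0.8, 1.0, 1.0)
    print("linear addition:", linear_strength(*scores))
    print("noisy-OR (illustrative):", joint_strength(*scores))
```

Under these assumptions, the linear sum is unbounded and can exceed 1 when several factors are present, whereas the probabilistic combination stays within [0, 1] and saturates once any single factor is certain. This is one way to see why the choice of combination rule changes the relationship matrix that is later factorized.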