Networked Stochastic Multi-Armed Bandits with Combinatorial Strategies
Shaojie Tang, Yaqin Zhou, Kai Han, Zhao Zhang, Jing Yuan and Weili Wu
University of Texas at Dallas, SUTD, University of Science and Technology of China, Zhejiang Normal University, University of Texas at Dallas, University of Texas at Dallas

In this paper, we investigate a largely extended version of classical MAB problem, called networked combinatorial bandit problems. In particular, we consider the setting of a decision maker over a networked bandits as follows: each time a combinatorial strategy, e.g., a group of arms, is chosen, and the decision maker receives a reward resulting from her strategy and also receives a side bonus resulting from that strategy for each arms neighbor. This is motivated by many real applications such as on-line social networks where friends can provide their feedback on shared content, therefore if we promote a product to a user, we can also collect feedback from her friends on that product. To this end, we consider two types of side bonus in this study: side observation and side reward. Upon the number of arms pulled at each time slot, we study two cases: single-play and combinatorial-play. Consequently, this leaves us four scenarios to investigate in the presence of side bonus: Single-play with Side Observation, Combinatorial-play with Side Observation, Single-play with Side Reward, and Combinatorial-play with Side Reward. For each case, we present and analyze a series of zero regret polices where the expect of regret over time approaches zero as time goes to infinity. Extensive simulations validate the effectiveness of our results.