Segmentation of Time Series based on Kinetic Characteristics for Storage Consumption Prediction
Beibei Miao, Yu Chen, Xuebo Jin, Bo Wang, Xianping Qu, Dong Wang, Shimin Tao and Zhi Zang
Baidu, Inc, Baidu, Inc, School of Computer and Information Engineering, Beijing Technology and Business University, Baidu, Inc, Baidu, Inc, Baidu, Inc, Baidu, Inc, Baidu, Inc

Internet services generate huge amounts of data, which require large amounts of storage space. Determining a device purchase plan is therefore very important for service providers: under-purchasing might lead to data loss, while over-purchasing results in waste. In this paper, we propose a linear regression based approach that predicts storage demand from the time series of storage consumption. We partition the storage consumption time series into several linear segments and perform prediction on the last segment using linear regression. Since the positions of the turning points between adjacent segments and the total number of segments are both unknown, achieving online segmentation is a major challenge. To solve this problem, we propose the KalmanAnova segmentation method. Experimental results show that our method achieves good precision, recall and F-measure values. Moreover, the method is able to segment nonlinear time series as well, suggesting potentially wider application. The proposed method has been deployed at Baidu, Inc. and has saved about 45 thousand dollars in one of its device purchase programs.
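The prediction step described above (fit a line to the last segment, then extrapolate) can be illustrated with a minimal sketch. It assumes the index of the last turning point is already known; it does not implement the KalmanAnova segmentation itself. The function name `predict_from_last_segment`, its parameters, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def predict_from_last_segment(usage, last_turning_point, horizon):
    """Fit a line to usage[last_turning_point:] and extrapolate `horizon` steps.

    `last_turning_point` is assumed to be given; finding it is the job of
    the segmentation method, which this sketch does not reproduce.
    """
    y = np.asarray(usage[last_turning_point:], dtype=float)
    t = np.arange(last_turning_point, last_turning_point + len(y))
    # Ordinary least-squares fit of a degree-1 polynomial (a straight line).
    slope, intercept = np.polyfit(t, y, 1)
    future_t = np.arange(len(usage), len(usage) + horizon)
    return slope * future_t + intercept

# Synthetic consumption: 1 unit/day for 10 days, then 3 units/day for 10 days.
usage = [float(i) for i in range(10)] + [9.0 + 3.0 * k for k in range(1, 11)]
forecast = predict_from_last_segment(usage, last_turning_point=10, horizon=3)
print(forecast)
```

Fitting only the last segment means the forecast follows the most recent growth rate (3 units/day in this synthetic series) rather than an average over the whole history, which is why locating the last turning point accurately matters.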