With the increase of the scale and intensity of the parallel I/O workloads generated by those scientific applications running on high-performance computing facilities, understanding the I/O dynamics, especially the root cause of the I/O performance variability and degradation in HPC environment, have become extremely critical to the HPC community. In this paper, we run extensive I/O measuring tests on a production leadership-class storage system to capture the performance variabilities of large-scale parallel I/O. Analyzing these results and its statistic correlation revealed some valuable insights into the characteristics of the storage system and the root cause of I/O performance variability. Further, we leverage these findings and propose an I/O middleware design refactoring which can improve the performance of the parallel I/O by optimizing the data striping and placement. Our preliminary evaluation results demonstrate the effectiveness of proposed approach.