Existing graph processing frameworks greatly improve the performance of the memory subsystem, but they remain constrained by the underlying modern processor, which leaves graph processing with low instruction-level parallelism and frequent branch misprediction. According to our comprehensive micro-architectural study, these inefficiencies arise mainly from abundant data dependencies, the serial semantics of instruction streams, and complex conditional instructions in graph processing. In this paper, we argue that a fundamental shift of approach, to the dataflow paradigm, is necessary to overcome the inefficiencies of the underlying processor. We find that applying the dataflow approach to graph processing is appealing for two reasons. First, because the execution and retirement of instructions in the dataflow model depend only on the availability of input data, a high degree of parallelism can be exposed, relaxing the heavy dependencies and serial semantics. Second, dataflow can reduce the cost of branch misprediction by executing all branches of a conditional instruction simultaneously. We therefore make a preliminary attempt to develop this dataflow insight into a specialized graph accelerator. We believe that our work opens a wide range of opportunities to improve the performance of both computation and memory access for large-scale graph processing.
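To make the two properties above concrete, the following is a minimal software sketch of data-driven execution, assuming a single edge relaxation (as in shortest-path computation) as the workload. It is an illustration only, not the accelerator design proposed in this paper; the names `Node`, `run`, and `relax` are hypothetical. Each operator fires as soon as all of its operands are available, and the conditional is expressed as a select over both arms instead of a predicted branch.

```python
from collections import deque


class Node:
    """One dataflow operator; it may fire once every input operand exists."""

    def __init__(self, name, op, inputs):
        self.name = name      # name of the token this operator produces
        self.op = op          # function applied to the input operands
        self.inputs = inputs  # names of the tokens this operator consumes


def run(nodes, tokens):
    """Data-driven execution: there is no program counter, only readiness."""
    values = dict(tokens)     # tokens that are currently available
    pending = deque(nodes)
    order = []                # firing order, recorded for inspection
    while pending:
        progressed = False
        for _ in range(len(pending)):
            node = pending.popleft()
            if all(name in values for name in node.inputs):
                values[node.name] = node.op(*(values[n] for n in node.inputs))
                order.append(node.name)
                progressed = True
            else:
                pending.append(node)  # operands missing, retry next round
        if not progressed:
            raise ValueError("some operators can never receive their inputs")
    return values, order


def relax(dist_u, weight, dist_v):
    """Relax one edge (u, v) without a control-flow branch (hypothetical)."""
    # Operators are listed in a deliberately scrambled order: the result
    # depends on operand availability, not on the textual order.
    nodes = [
        # Select: both arms (candidate, dist_v) are already computed;
        # the predicate only chooses which one to forward.
        Node("result", lambda p, taken, kept: taken if p else kept,
             ["is_shorter", "candidate", "dist_v"]),
        Node("is_shorter", lambda a, b: a < b, ["candidate", "dist_v"]),
        Node("candidate", lambda d, w: d + w, ["dist_u", "weight"]),
    ]
    return run(nodes, {"dist_u": dist_u, "weight": weight, "dist_v": dist_v})
```

Calling `relax(3, 4, 10)` forwards the candidate distance, while `relax(3, 4, 5)` keeps the existing one; in both cases the same operators fire and no branch outcome has to be predicted. A hardware dataflow engine would fire all ready operators in parallel, whereas this sketch scans them sequentially for simplicity.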