How to Cite the Downloads

If you use these data in your research, please cite the original source and/or one of the following:

David A. Bader, Andrea Kappes, Henning Meyerhenke, Peter Sanders, Christian Schulz and Dorothea Wagner. Benchmarking for Graph Clustering and Partitioning. In Encyclopedia of Social Network Analysis and Mining, pages 73-82. Springer, 2014.

David A. Bader, Henning Meyerhenke, Peter Sanders, Dorothea Wagner (eds.): Graph Partitioning and Graph Clustering. 10th DIMACS Implementation Challenge Workshop. February 13-14, 2012. Georgia Institute of Technology, Atlanta, GA. Contemporary Mathematics 588. American Mathematical Society and Center for Discrete Mathematics and Theoretical Computer Science, 2013.

Downloads

The testbed for the Challenge comprises a large number of different graphs, both synthetic and real-world ones, coming from different applications. Please follow the individual links to reach their respective download pages with more information. The complete testbed, for example downloadable with the command wget -r -np http://www.cc.gatech.edu/dimacs10/archive/data/, has a size of roughly 20GB.

We are still looking for more instances, in particular those based on real-world applications. If you can share your data, please contact us. Your contribution is very much appreciated!

Note that Tim Davis also included the current state (Apr 15, 2011) in the University of Florida Sparse Matrix Collection. At this site he also made his MATLAB® code available with which graphs can be imported to MATLAB from the Metis format.

File Formats

We use the popular Metis input/output formats for I/O (and extend them in certain cases). This applies to files for input graphs and solutions (partition/cluster assignments). All files are simple ASCII text files. In the following descriptions, n denotes the number of vertices in the graph.

Graph Format

Note that for the Challenge the input graphs might be seen as unweighted, i. e. weighted instances might be seen as having unit weights for both vertices and edges. Details on this aspect will be announced later.

Following the Metis User's Guide, Version 4.0, the graph files may contain comment lines that begin with the character '%'. Such lines are ignored. At its simplest a correct input contains n+1 uncommented lines. The first of these lines contains either two, three, or four integers, separated by space. The first two obligatory entries are the number of vertices and the number of edges in the graph. Note that in this case the number of edges is only half of the sum of the vertex degrees. The third and fourth parameter in the first line are optional and control input of weighted graphs. For details on weighted graphs see Section 4.5.1 of the Metis User Guide, Version 4.0.

Important: There is one exception to the use of the format described in Section 4.5.1. If the graph contains self-loops and/or multiple edges, the third parameter in the first line is set to 100. Since Metis does not allow such graphs, this extension of the format is necessary. To exclude these graphs upfront would mean to exclude many real-world instances. Also note that in this case the second parameter is the actual number of edge entries in the file, i. e., the sum of the vertex degrees. This deviation is necessary due to self-loops. They do not appear twice in the file, as undirected (= bidirected) edges do.

In the unweighted case, the remaining n lines contain neighbor lists for each vertex from 1 to n in order. These lists are sets of integers separated by spaces and contain all the neighbors of a given vertex. The neighbors may be listed in any order. Note that vertices are numbered 1 to n, not from 0 to n-1. For possible extensions, please refer again to Section 4.5.1 in the Metis User Guide, Version 4.0. You will find there examples of small graphs and their respective representations, too.

Output Format

The partition/cluster assignment files have n uncommented lines. In line i the assignment value for vertex i is given as an integer between 0 and p-1, where p is the number of partitions.

Geometry Format

Vertex coordinates can optionally be supplied in a separate file. This file should have the same name as the graph file. But instead of the file extension .graph, the coordinate file has the extension .xyz. These coordinate files must have n uncommented lines, with line i containing the coordinates of vertex i. Each line must contain one, two, or three floating-point values, depending on the geometry type.