CS 6500 - First Programming Project - Due June 15th This project asks you to systematically explore and analyze a class of algorithms. First, pick one of the following four topics: + String matching: Chapter 34; + Computational geometry: Chapter 35; + Hashing: Chapter 12; + Graph algorithms: Chapter 23. After selecting a topic and reading the corresponding chapter, you should select the algorithms and their variants that you choose to study. For example, if you choose to study String Matching, you should study the following approaches: + Naive algorithm + Rabin-Karp + Finite automata + Knuth-Morris-Pratt + Boyer-Moore Some chapters, like String Matching and Hashing, are devoted to a set of comparable algorithms. The others, Computational Geometry and Graph Algorithms, study several algorithms. In the latter case, you should select the most central problem described in the chapter. For example, Computational Geometry focuses on finding the Convex Hull of a set of points, and Graph algorithms focuses on Search. You should also look for variations concerning representation. For example, graphs can be represented with adjacency matrices and with edge lists. For your problem, you should end up with a set of variants to compare. After you have selected the variants, you should perform a complexity analysis on their running times using the approaches described in the class. All variants should be treated the same way. For example, in comparing sorting algorithms, you could choose to look at all operations or just comparisons. (Of course, sorting is not one of the options for this assignment.) But you should do the same for all variants. In the cases where the chapter presents an analysis, you should recreate it in your own words. Summarize the results in a table which, for each variant, describes its performance growth as the input increases in size. After you have analyzed the variants, you should implement them. In the implementation, you should strive to produce comparable results. For example, it would be unfair to include I/O in one variant and not in another. Part of the implementation includes testing your algorithms extensively enough that you have confidence that they are correct. Next, you should prepare some data sets with which you will compare the performance of your variants. The data sets should be of two types. In the first type, you should have pathological data. In the sorting example, you might have data that is already sorted, data that is completely out of order, and data that is all the same. The second type of data is random data of various sizes. In the sorting example, you might have data sets of size 1, 10, 100, 1000, 10000, 100000, and 1000000. Now decide how you are going to measure your tests. For example, you might choose, to time an entire test execution using the UNIX time command. Or you might embed calls to the system timer in your code. Or you might simply count the number of times a given operation or function call is made. You might also wish to consider averaging several runs of the same size and compute the variance. Run your experiments. For each variant and each size, your output should be a pair indicating the size of the input and the execution time of the variant. Finally, prepare an analysis of your results. Do the measured times conform to the analytical predictions? If not, why not. (In this case you might want to vary your experimental design and try again.) What happened with the pathological data? What practical considerations compromise theoretical ones? Which variants would you choose for which kinds of problems. To summarize, here is what you should turn in. Part 1 List and describe the variants that you intend to compare. Part 2 For each variant, provide an analysis that predicts the running time as a function of the size of the input. Summarize the results in a table. Part 3 Implement and test your variants. Turn in a listing of the code, a list of the test cases, and the output from running the tests on the code. You don't need to turn in a listing of your scaffolding code for doing I/O or otherwise setting up the execution, just include the code for the algorithm itself. Part 4 Prepare pathological and random data sets of various sizes with which to compare your variants. Turn in a short description of why you chose the pathological cases that you did. Also, include a description of your strategy for randomization and your method of generating the random data. Note that these two are different. In the sorting example, the former might indicate that you chose uniformly distributed integers in the range [-1000..+1000] and why. The latter would indicate that you used the C library rand function with a predetermined seed. Part 5 Provide a short description of how you are going to measure the performance of your variants. Part 6 Prepare a summary table, indicating for each variant and each size, your measurement of the performance of the variant. Also graph the results in a common graph. The X axis should indicate size. Your may want to use a logarithmic scale. The Y axis should indicate performance. Use different colors or line styles to denote the different variants. Include a legend to annote your choices. Excel or gnuplot are two packages that can perform the plotting for you. Part 7 Turn in your analysis. -------------------- Note: you may work in groups of three on this project. Also, you are not limited to the variants described in the book. If you are aware of others, you may try them as well.