The main component of this course will be an original research project.

Students will organize into groups and choose a project that:

  • Is relevant to the topics discussed in class,
  • Requires a significant programming effort from all team members, and
  • Is unique (i.e., two groups may not choose the same project topic).

The projects will vary in both scope and topic, but they must satisfy these criteria. Students are expected to address a novel problem or to address an existing problem in a novel way. Students should actively consider submitting a paper at the end of the course to a top-tier conference in data management (SIGMOD, VLDB, etc.), like the papers in your reader. Because time is limited, this goal is hard to reach, and I will reward those who aim high even if they do not completely succeed. The key is ensuring that some aspects of your work are completely done; it is very hard to grade a project where the program did not quite work.

Each project group will present their project proposals to the class to get feedback from their peers. They will then meet individually with the instructor afterwards for additional discussion and clarification of the project idea. Late submissions will not be accepted.

Project Ideas

Projects typically fall under one of these categories:

  • Novel problem/task/application.
  • Formulation: formulate a new model or algorithm for a new/old problem.
  • Application/survey: compare several algorithms on an application domain of interest. These make the most sense if you expect the analysis to reveal an interesting trend or finding.
  • Analysis: analyze an existing algorithm.


You are encouraged to come up with your own topic. Ideally, the topic will be related to your current research interests.

There can be value in combining your project with other work in one of two ways. First, if you have active research in a related area (e.g., as an RA), you can do a deeper project by combining your current (supported) research with your course project. Second, it is also possible to do a single multi-disciplinary project to satisfy multiple courses. However, this must be coordinated with the other instructor(s), and you will be expected to do two (or more) projects' worth of work. In all such cases, it is imperative that such overlaps be disclosed at the proposal stage so that we can address the scope of research that will be required.


Everyone must work in a team of three people (preferred) or two people, depending on enrollment. Groups are allowed to discuss high-level details about the project with others.

There will be no more than 15 teams in the class in total. You may combine this project with another course project, but you must delineate the different parts.

Project Milestones

Each project consists of several tasks that are due at different times during the semester:

  • Project Proposal + Presentation: (Week 4) Each group will meet with the instructor to discuss their plans for the project and prepare a proposal.
  • Project Progress Report + Presentation: (Week 8) Each group will submit a revised version of the proposal describing accomplishments so far. Concentrate on describing sub-tasks completed, rather than the tasks started.
  • Code Reviews: (Weeks 6, 10) Each group is required to review the code of another group and provide feedback on system architecture, correctness, coding style, and assumptions.
  • Final Report + Presentation: (Week 12) Each group will present the final status of their project and submit a technical report along with the relevant source code and documentation.

Each group must use a single GitHub repository for all development.

Project Proposal + Presentation

Each group will meet with the instructor in private during office hours to discuss project ideas. Each group will need to turn in a three-page project proposal. This proposal should contain the following information:

  • What is the problem being addressed by the project?
  • Why is this problem important?
  • How will the team solve this problem?
  • How will you validate your implementation?
  • How will you evaluate its performance?
  • What resources will you need? (e.g., software, hardware, data sets, or workloads)

Your proposal should also provide three types of goals: 75% goals, 100% goals, and 125% goals. Think of these as the equivalent of a C grade, a B grade, and an A grade. Each goal can either depend on or be independent of the prior goals. Each group can meet individually with the instructor during office hours afterwards for additional discussion and clarification of the project idea. The instructor's experience can help you avoid repeating many past mistakes. Each group should submit their proposal report to Autolab.

In addition to the written proposal, you will also present your project idea to the class in order to receive feedback from your peers. This 5-minute presentation should convince the instructor that your idea pertains to the course, that you will be able to complete it, and that we will be able to evaluate it. Each group should submit their slides to Autolab.

Project Progress Report + Presentation

Each group will meet with the instructor in private and discuss the current status of the project. This will be a preview of the group's status update presentation in the subsequent class. Students should bring up any unexpected challenges or issues with their project implementation.

Each group will need to turn in a six-page interim report on the current status of their implementation. The report should contain the following:

  • An updated introduction and motivation, revised based upon feedback from your proposal and any changes you have made as a result of your research so far.
  • A re-iteration of your proposed goals, with explicit discussion of what progress you have made to date on those goals and what your timeline is for accomplishing the rest of them by the end of the semester.
  • A well-developed background and related work section citing the appropriate work from the literature.
  • A sketch of your evaluation section outlining your evaluation plan and listing the major components you plan for your evaluation (an itemized list is sufficient).

Feel free to include additional material in your interim report. If you include, for example, your design section or preliminary results, we will give you feedback to help ensure that your final report is great.

The 5-minute presentation should contain the following information:

  • An overview of the development status of the project as related to the goals discussed in the initial proposal.
  • Any information about whether the group's original plans have changed and an explanation as to why.
  • Commentary about any surprises or unexpected issues that the group encountered during coding.

Concentrate on describing sub-tasks completed, rather than the tasks started. For example, say "completed component X" rather than "started component modifications."

Code Reviews

Each group will be paired with two other groups and provide feedback on their code. The development group (i.e., the group that implemented the project) will provide the reviewing group with a brief summary of the files they want the reviewing group to examine. The reviewing group will later post their pull request URL on the course spreadsheet.

The Pull Request Date is when the development group should provide the reviewing group with their pull request. The Review Date is when the reviewing group must complete their review and provide feedback. The development group will then have until either the next Code Review due date or the Final Code Drop due date to update their project in response to the last code review.

The code reviews do not all need to happen exactly on the due date, but they must be finished by the due date. The groups are free to schedule with each other when they are ready for the review. Grading will be based on participation, both in providing a useful review to other students and in incorporating the feedback received into your own implementation. The review will be completed on GitHub.

Each group should consider the following questions when examining the code:

General Questions:

  • Does the code work?
  • Is the code as modular as possible?
  • Is there any redundant or duplicate code?
  • Is all the code easily understood?

Documentation Questions:

  • Do comments describe the intent of the code?
  • Are all functions commented?
  • Is any unusual behavior described?
  • Is the use of 3rd-party libraries documented?

Testing Questions:

  • Do tests exist and are they comprehensive?
  • Are the tests actually testing the feature?

Final Report + Presentation

Each group will need to turn in a ten-page final report and give a 5-minute presentation on the final status of their project. This presentation should contain the following information:

  • A re-iteration of your proposed goals, with explicit discussion about what progress you have made to date on those goals.
  • A discussion of how you tested the correctness of your implementation.
  • A discussion of the experimental results that the group collected to evaluate their implementation.
  • An outline of concrete tasks for future work to expand or improve your implementation.

All group members should deliver part of the talk. The talk should give highlights of the final report, including the problem, motivation, results, conclusions, and possible future work. Time limits will be enforced so that everyone can present. Please practice your talk, both to improve it and to check how long it runs. Have a plan for which slides to skip if you fall behind.

Formatting Guidelines

All reports should be written using the following format guidelines:

  • Double-column
  • Single-spaced
  • 10pt font
  • Reasonable margins

I suggest that you write all of your documents using LaTeX. It is the de facto tool in which most CS research papers are written. Although it has a bit of a start-up cost, it is much easier to write complex research papers collaboratively using LaTeX than with Microsoft Word. You can use the ACM SIG Proceedings Templates alternate LaTeX style for your reports.
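
As a starting point, the formatting guidelines above could be met with a skeleton like the one below. This is only a sketch under stated assumptions: it uses the standard `article` class rather than the ACM template mentioned above, the 1-inch margin is an assumed reading of "reasonable margins," and the title, author names, and section headings are placeholders.

```latex
% Minimal report skeleton (a sketch; not the official course or ACM template).
\documentclass[10pt,twocolumn]{article}  % 10pt font, double-column layout
\usepackage[margin=1in]{geometry}        % assumed value for "reasonable margins"
% The article class is single-spaced by default, so no spacing package is needed.

\title{Project Title (placeholder)}
\author{Member One \and Member Two \and Member Three}  % placeholder names
\date{}

\begin{document}
\maketitle

\section{Introduction}
% What problem does the project address, and why is it important?

\section{Goals}
% 75\%, 100\%, and 125\% goals from the proposal.

\section{Evaluation}
% How the implementation is validated and how its performance is measured.

\end{document}
```

If you use the ACM template instead, its own class options control the column layout and font size, so the `geometry` line should be dropped.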


The above ideas are gratefully borrowed from courses developed by Andy Pavlo.