HOMEWORK 4: Implementing Recognizers

This is an INDIVIDUAL assignment.


In this assignment we'll shift gears away from low-level Swing stuff, and explore how to build recognizers for digital ink and gestures. You'll learn how to implement a simple recognizer that's robust enough to work for a wide range of gestures, and how to integrate such a recognizer into your application.

In this assignment you'll implement the SiGeR recognizer discussed in class and in the lecture notes, and integrate it into the Photo Album application. You'll use the recognizer to detect command gestures that let you tag photos with the pen/mouse, move and delete on-screen annotations, and delete photos, all without using the regular GUI controls.

The learning goals for this assignment are:

Please see the note flagged IMPORTANT in the deliverables section at the end of this document.


In this homework, we'll implement the SiGeR recognizer and integrate it into your Photo Album application as a way to perform commands (meaning: certain gestures will be recognized as commands to control the application, rather than as simple digital ink for annotation).

You should extend your application to allow gestures to be performed either on the photo (to tag and/or delete the photo), or the flipped annotation side (to allow tagging, deletion of the photo, or movement or deletion of annotations). We'll use a modal method of telling the system which strokes should be interpreted as command gestures. Strokes made with the "normal" (left) mouse button down should be interpreted as ink, while strokes made with the "context/menu" (right) button should be interpreted as command gestures. This means that in the code that handles strokes you'll now need to look at the modifiers to figure out whether to process the strokes as ink or pass them off to the recognizer.

Command gestures should have a different visual appearance than ink, and then disappear once the command has been completed. For example, if your ink is black, you might draw command gestures in red, and then have them disappear once the gesture has been completed.

You should define several distinct templates for different sorts of command gestures. There are two gestures that should work on either the flipped or unflipped photo, and two gestures that only need to work on the flipped (annotation) side of the photo:

These gestures should let you do the following:

Implementing the Recognizer

See the slides for details on how to implement SiGeR. Here are a few additional tips.

Decide on the representations you want to use first. By this I mean, figure out how you'll specify templates in your code, and how you'll represent the direction vector of input strokes.

I'd suggest defining a bunch of constants for the 8 ordinal directions SiGeR uses (N, S, E, W, NE, NW, SE, SW). Both direction vectors and templates will be defined as lists of these. You may also want to define some special "combination" constants (a "generally east" constant that means either E, SE, or NE for example). These latter combination constants will only be used in the definition of templates, not in the vector strings you will produce from input gesture strokes. In other words, they allow you to define your templates a bit more loosely than just the specific 8 directions.
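One possible encoding (the class name, constant names, and letter choices here are just assumptions, not part of the assignment) maps each of the 8 directions to a single character, so that a direction vector can later be joined into a plain string, and the combination constants become regexp character classes:

```java
// Hypothetical constants for SiGeR's 8 ordinal directions, plus a few
// "combination" classes used only when defining templates.
// Each direction is a single character so that a direction vector can be
// joined into a plain String for regexp matching later.
public class Directions {
    public static final char N = 'n', S = 's', E = 'e', W = 'w';
    public static final char NE = 'a', NW = 'b', SE = 'c', SW = 'd';

    // Combination classes: "generally east" matches E, NE, or SE, etc.
    // These are regexp character classes, used only in template definitions.
    public static final String GEN_E = "[" + E + NE + SE + "]";
    public static final String GEN_W = "[" + W + NW + SW + "]";
    public static final String GEN_N = "[" + N + NE + NW + "]";
    public static final String GEN_S = "[" + S + SE + SW + "]";
}
```

Any encoding works as long as direction vectors and templates use it consistently; single characters just make the regexp step later almost free.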

While defining such a set of human-readable constants isn't specifically necessary (you could just do everything in terms of the strings and regexp patterns described below), it can be very helpful for debugging to be able to write a template as a set of directions, rather than a raw regexp pattern.

Next, write a routine that takes an input gesture and produces the direction vector from it. In other words, given a series of points, it'll spit out a list of the 8 ordinal direction constants. This direction vector captures the shape of the input gesture.
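Here's a sketch of one way to write that routine, assuming each direction is encoded as a single character ('e' for E, 'a' for NE, 'n' for N, 'b' for NW, 'w' for W, 'd' for SW, 's' for S, 'c' for SE); the class and method names are illustrative, and the run-collapsing shown (turning "eeee" into "e") is a common simplification you may or may not want:

```java
import java.awt.Point;
import java.util.List;

// Sketch: quantize a stroke into SiGeR's 8 ordinal directions.
// The names and the single-character encoding are assumptions.
public class DirectionVector {
    // Map a segment (dx, dy) to one of 8 direction characters.
    static char dirChar(double dx, double dy) {
        // Screen coordinates: y grows downward, so negate dy for compass sense.
        double angle = Math.toDegrees(Math.atan2(-dy, dx)); // -180..180
        if (angle < 0) angle += 360;
        // Each 45-degree slice, centered on the axes, maps to one direction.
        int slice = (int) Math.round(angle / 45.0) % 8;
        return "eanbwdsc".charAt(slice); // E, NE, N, NW, W, SW, S, SE
    }

    // Turn a list of points into a run-collapsed direction string.
    public static String toDirectionVector(List<Point> points) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < points.size(); i++) {
            Point a = points.get(i - 1), b = points.get(i);
            if (a.equals(b)) continue; // skip zero-length segments
            char c = dirChar(b.x - a.x, b.y - a.y);
            // Collapse consecutive duplicates so a long straight drag
            // produces one symbol rather than hundreds.
            if (sb.length() == 0 || sb.charAt(sb.length() - 1) != c) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

In practice you may also want to filter out very short segments first, since pixel-level jitter at mouse speed produces spurious directions.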

Here's the only tricky part: you'll need to write a routine that turns the direction vector into an actual string of characters that contain the same information as in the vector, and another routine that takes the template info and produces a regexp pattern from it. The idea is that you'll see if the regexp pattern for the template actually matches the stringified representation of the direction vector.

There's a lot of flexibility in how you define the symbols in these strings. For the direction vector string, you'll probably just use 8 separate letters, one per ordinal direction, concatenated into a string.

For the regexp pattern, you'll want to generate a pattern that can match occurrences of any of these 8 letters, as well as "either-or" combinations of them ("generally east," for example, might be a pattern that matches any of the letters representing E, SE, or NE). You'll also need to generate a pattern that can deal with "noise" at the start and end of the input string. The slides have some examples that show how to do this.
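As one illustration, assuming templates are given as arrays whose elements are either a single direction letter or a regexp character class like "[eac]" for "generally east" (both the representation and the noise allowance of 2 symbols are assumptions you should tune):

```java
import java.util.regex.Pattern;

// Sketch: compile a template (an array of direction letters and/or
// character classes like "[eac]") into a regexp that tolerates a little
// noise at the start and end of the stringified direction vector.
public class TemplateCompiler {
    public static Pattern compile(String[] template) {
        StringBuilder sb = new StringBuilder();
        sb.append(".{0,2}"); // allow up to 2 noise symbols at the start
        for (String step : template) {
            // Each step must occur one or more times in sequence.
            sb.append("(").append(step).append(")+");
        }
        sb.append(".{0,2}"); // ...and up to 2 at the end
        return Pattern.compile(sb.toString());
    }
}
```

For example, compile(new String[]{"e", "s"}) yields a pattern that matches "es", "ees", or "aees" (leading noise), but not "nnn".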

The actual matching process will then just compare an input stroke string to the list of template patterns, and report which (if any) it matches.
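A minimal sketch of that loop (class, method, and command names are hypothetical):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: try each named template pattern against the stringified
// direction vector; return the first command that matches, or null
// for an unrecognized gesture.
public class Recognizer {
    private final Map<String, Pattern> templates = new LinkedHashMap<String, Pattern>();

    public void addTemplate(String command, Pattern pattern) {
        templates.put(command, pattern);
    }

    public String recognize(String directionString) {
        for (Map.Entry<String, Pattern> e : templates.entrySet()) {
            if (e.getValue().matcher(directionString).matches()) {
                return e.getKey(); // name of the recognized command
            }
        }
        return null; // no template matched
    }
}
```

Note that insertion order matters if two templates can match the same input, which is one reason to keep your gesture set clearly distinguishable.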

Defining the Templates

Your templates will be defined in your code (unless you get really fancy and save them to a config file or something), most likely as a set of declarations that look something like this:
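For example (the gesture shapes, names, and single-letter encoding here are all illustrative assumptions, not a required vocabulary):

```java
// Hypothetical template declarations; letters assume e=E, a=NE, c=SE,
// s=S, n=N, w=W, b=NW, d=SW, with [..] as "either-or" character classes.
public class Templates {
    // "L" shape: down, then generally east (E, NE, or SE).
    public static final String[] DELETE_PHOTO = { "s", "[eac]" };
    // Caret: a northeast stroke, then a southeast stroke.
    public static final String[] TAG_FAVORITE = { "a", "c" };
    // "Z"-ish shape: east, then generally southwest, then east again.
    public static final String[] DELETE_CONTENT = { "e", "[dsw]", "e" };
    // Loose circle for select-to-move, specified only as four quadrants.
    public static final String[] SELECT_MOVE = { "[wbd]", "[scd]", "[eac]", "[nab]" };
}
```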
Remember that you'll need to define templates for each of your tagging buttons, as well as photo delete, content delete, and move. It may take a bit of tweaking to come up with a gesture set that's distinguishable, and may also require some tweaks to define the templates at the proper level of specificity.

Integrating the Recognizer

Remember that we're using a mode to distinguish ink (left mouse button) input versus gesture input (right mouse button). Gesture input should be drawn on screen while the gesture is being made, so that it provides feedback to the user. The gesture should disappear once the mouse is released.

One way to get this effect is to augment your BasicPhotoUI slightly, to keep a reference to the single current gesture being drawn (which may be null if no gesture is in progress). The paint code then draws the stroke and text display lists from the model, then the current gesture (if there is one), so that the gesture appears over the top of everything else.

Note that since the gesture is only a transient feature, it's ok to put this in the BasicPhotoUI, and not the model.

When the gesture is complete, it can be removed from the set of items to be displayed, and handed off to the recognizer to be processed.

If the gesture is not recognized, you should indicate this by displaying a message in the status bar that says "unrecognized gesture" or something.

If the gesture is recognized, what you do next depends on exactly what command was recognized.

For the tagging gestures, you should just update the tag data associated with that photo; make sure that any changes are also reflected in the state of the tagging buttons. Making a tag gesture on a photo that already has that tag should remove it from the photo.

The delete photo gesture should work just the same as the Delete Photo menu item.

For the delete content gesture, once you've recognized the gesture you need to figure out what object(s) the gesture was drawn over. The way I'd suggest doing this is to take the bounding box of the gesture and then identify which objects have bounding boxes that are strictly contained in the gesture's box. Deleted items should simply be taken out of the display list(s) so that they do not appear; be sure to notify your listeners when the model changes so that the display is updated correctly.
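One way to sketch that containment test with java.awt.Rectangle (the class and method names are illustrative; Rectangle.contains(Rectangle) does the strict-containment check):

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

// Sketch: find which objects' bounding boxes fall entirely inside the
// gesture's bounding box.
public class GestureScope {
    public static List<Rectangle> contained(Rectangle gestureBounds,
                                            List<Rectangle> objectBounds) {
        List<Rectangle> hits = new ArrayList<Rectangle>();
        for (Rectangle r : objectBounds) {
            if (gestureBounds.contains(r)) { // r lies wholly within the gesture
                hits.add(r);
            }
        }
        return hits;
    }
}
```

In your application you'd map each hit back to the stroke or text object it belongs to, rather than working with bare rectangles.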

The select-to-move gesture is perhaps the weirdest, because it introduces a new mode into the UI. First, when items are selected (perhaps through a circling gesture), they should be drawn differently on screen to indicate that they are selected. Again, you probably want to compare the bounding box of the gesture to the bounding boxes of the objects to determine what objects the user intends to select.

You need to keep track of the selected objects in some way; I'd suggest adding yet another item to the BasicPhotoUI, which is a list of the selected objects (strokes or text); the paint() code will then display anything in this list differently (through color or a highlighted bounding box or whatever). Again, since this is transient state it does not need to go in the model, so can live in the BasicPhotoUI.

As long as something is selected, you're potentially in "move mode." My suggestion for how to implement this is to look at any mouse press that happens; if something is selected, and the mouse press is inside one of the selected objects, then dragging the mouse moves the object (which should just be a matter of updating the X,Y coordinates of the items in the model, and notifying any listeners). If the press happens outside of a selected item, you can "de-select" the selected stuff (take it out of the selected list and just draw it normally). This ends "move mode." The basic behavior here should be much like any paint program--when something is selected you can click into it to drag it; but as soon as you click outside, the object is de-selected.
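The press-handling decision and the move itself might be sketched like this (class and method names are assumptions; in your application this logic would live in your mouse listener, operating on the model's items rather than bare rectangles):

```java
import java.awt.Point;
import java.awt.Rectangle;
import java.util.List;

// Sketch of "move mode": a press inside any selected object's bounds
// starts a drag-to-move; a press outside should clear the selection.
public class MoveMode {
    // Returns true if dragging from this press should move the selection.
    public static boolean pressStartsMove(Point press, List<Rectangle> selectedBounds) {
        for (Rectangle r : selectedBounds) {
            if (r.contains(press)) return true;
        }
        return false; // caller should de-select and end move mode
    }

    // Moving is just translating each selected item's origin; in the real
    // app, update the model's X,Y coordinates and notify listeners.
    public static void moveBy(List<Rectangle> selectedBounds, int dx, int dy) {
        for (Rectangle r : selectedBounds) {
            r.translate(dx, dy);
        }
    }
}
```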

Extra Credit

As usual, there are a lot of ways to make this fancier than described.

As usual, if you do something else that's above and beyond the call of duty, let us know in your README file and we may assign some extra credit for it.


This is an INDIVIDUAL assignment; while you may ask others for help on Java or Swing details, please write the code on your own. While you can use any development environment you choose, you are not allowed to use "GUI builder" type tools (like JBuilder or Eclipse GUI Builder).

To turn in the assignment, please follow the same process as last time:

0. Make sure your program is runnable from the command line using the command "java PhotoAlbum". Use exactly this name (with no package) to make things easier on me and the TAs.

1. Create a new directory using your last name as the name of the directory.

2. Compile your application and place both the sources and classfiles into this directory (they can be at the top-level or in a subdirectory, whatever you want).

3. Put a README.txt file (described below) into the top level of the directory. This file should contain your name and email address, the version of Java you used (1.5.x or 1.6.x, please) as well as any special info I might need in order to run your code (command line arguments, etc.)

4. ZIP this directory and submit it via T-Square (instructions are here).

If you do any extra credit work, be sure to let us know about it in the README file so that we don't miss it!!

IMPORTANT: I'm letting people come up with their own gesture vocabulary for this project. What this means though is that YOU MUST provide a description to us of what that gesture set is. Remember that SiGeR gestures are directional also (a square bracket started at the top is a different gesture than a square bracket started at the bottom). You need to provide us with enough detail that we're not having to reverse engineer your code to figure out how to make your gestures. A short graphical cheat-sheet that shows the actual gestures you're using would be a great way to do this. We will deduct points if we have to spend lots of time figuring out your gestures!

Please take care to remove any platform dependencies, such as hardcoded Windows path names or dependence on a particular look-and-feel that may not exist on all platforms. Also, if you use any images in your application, please make sure that you include these in your ZIP file and that your code will refer to them and load them properly when run from inside the directory that's created when I unZIP your code.

Grading for this assignment, and future assignments, will roughly follow this breakdown:

Please let the TA or me know if you have any questions.