HOMEWORK 4: Interpreting and Processing Digital Ink

This is an INDIVIDUAL assignment.

Objective

In this assignment we'll continue our focus on digital ink. Here, we'll explore how to use and extend the recognizer we created in the previous assignment in order to tell the Courier application to "interpret" pre-existing regions of digital ink in certain ways.

The learning goals for this assignment are:

Description

In this homework, we'll extend the SiGeR recognizer built in the previous assignment to do some new things. In particular, we'll add some new gestures that tell the system to interpret pre-existing content in ways that allow the system to support more intelligent list behavior:

Drawing a "left square bracket" gesture ([) to the left of a region of ink indicates that that ink should be interpreted as a list of lines, such as a to-do list. Such an "interpreted" list should then appear differently, and should acquire additional behaviors related to lists. For example, imagine that you have a shopping list written in your journal. You want the system to treat these lines of digital ink as items in a list, so you draw the square bracket gesture to the side. This causes the list to change appearance to indicate that it is now being treated as a list, not just "dumb" digital ink (this might be done by a special border around the list, or shading its background, or some other graphical effect). Once the region has been interpreted as a list, new gestures become available that let you do list-like things on it. These should include:

Handling the Content Interpretation

There are several things you need to do in order to complete this assignment.

First, you need to be able to detect the gesture input. Just as in the previous assignment, we're using a mode to distinguish ink (left mouse button) input versus gesture input (right mouse button). Gesture input should be drawn on screen while the gesture is being made, so that it provides feedback to the user. The gesture should disappear once the mouse is released. The basic interaction pattern with these new gestures should thus be the same as in the previous assignment, and so this should be relatively straightforward.

Second, once you recognize the bracket gesture you then need to determine what region of the inked content the list behavior should be applied to. You don't have to be fancy with this: assume that the user will draw the bracket to the left of the content to be "listified"; the bracket establishes the top and bottom (Y axis) bounds of the list, as well as the left (X axis) bound of the list. To get the right (X axis) bound, simply move rightward until you encounter a band of whitespace that runs the full vertical extent of your region. You can figure this out by examining the bounding boxes of everything contained within the vertical region marked out by the bracket. Once you have the bounds of the contained ink, you can render it differently to indicate that it is now being interpreted as a list.

NOTE: You *don't* have to worry about what happens if the bracket's bounds cut through one or more strokes--in other words, you don't have to be concerned with splitting a single stroke, shape, or block of text into "listified" and "non-listified" pieces. We won't test for this.
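
As a rough illustration of the region-finding step, here is a minimal Java sketch. It assumes you can get a java.awt.Rectangle bounding box for the bracket gesture and for each shape in your display list. The class name ListRegionFinder, the method findListRegion, and the minGapWidth parameter (how wide a band of whitespace must be before it ends the list) are hypothetical names chosen for this example, not part of the assignment. It also assumes the first ink to the right of the bracket belongs to the list.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ListRegionFinder {
    /**
     * Returns the list region implied by a bracket gesture, or null if no ink
     * lies to the right of the bracket inside its vertical band.
     * minGapWidth is an assumed tuning value, in pixels.
     */
    static Rectangle findListRegion(Rectangle bracketBounds,
                                    List<Rectangle> shapeBounds,
                                    int minGapWidth) {
        int top = bracketBounds.y;
        int bottom = bracketBounds.y + bracketBounds.height;
        int left = bracketBounds.x;

        // Keep only shapes that overlap the bracket's vertical band
        // and start at or to the right of the bracket.
        List<Rectangle> candidates = new ArrayList<>();
        for (Rectangle r : shapeBounds) {
            boolean inBand = r.y < bottom && r.y + r.height > top;
            if (inBand && r.x >= left) {
                candidates.add(r);
            }
        }
        if (candidates.isEmpty()) {
            return null;
        }
        candidates.sort(Comparator.comparingInt((Rectangle r) -> r.x));

        // Sweep left to right, extending the right edge until the next shape
        // starts beyond a whitespace band at least minGapWidth wide.
        Rectangle first = candidates.get(0);
        int right = first.x + first.width;
        for (Rectangle r : candidates) {
            if (r.x - right >= minGapWidth) {
                break;
            }
            right = Math.max(right, r.x + r.width);
        }
        return new Rectangle(left, top, right - left, bottom - top);
    }
}
```

Because the shapes are sorted by their left edge, any horizontal stretch not covered by a bounding box shows up as a jump between the running right edge and the next shape's left edge, which is the "band of whitespace" described above.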

Third, you'll need to examine the ink in this region to break it out into individual list items. This is called "segmenting," and is a common need in many digital-ink applications. The tricky part is that, while ideally users would write in clean, evenly-spaced lines (so that you could just search for horizontal bands of whitespace that delineate lines), in actuality people write much more sloppily. Hand-drawn ascenders and descenders, for example, may mean that there is no clean whitespace between items.

Note that this is the most complex part of this assignment, so make sure you give yourself plenty of time to work on it.

Below, I'm presenting a multi-stage algorithm for dealing with this problem that should work pretty well. You're free to use this, or another algorithm of your own devising.

To start to segment the ink into line items, first walk across each horizontal row of pixels in the contained area, counting ink pixels versus whitespace pixels, and then say that rows whose ink-pixel count falls below some threshold are part of the gaps between lines. Rather than having an absolute threshold in terms of numbers of pixels--which won't cope well with lines of varying length--you'll likely want to have some sort of ratio threshold (perhaps: "all rows with no more than 10% of the inked pixels of the most heavily inked row are considered whitespace," although you'll want to play around with this).
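
The ratio threshold might look something like the following sketch. It assumes you already have an array holding the number of ink pixels in each row of the region. RowClassifier and classifyRows are hypothetical names, and the ratio (0.10 in the example from the text) is a value you would tune.

```java
final class RowClassifier {
    /**
     * Marks each row as ink (true) or whitespace (false).
     * A row counts as ink only if it has more than ratio * (largest row count)
     * ink pixels, so the threshold scales with how heavily inked the region is.
     */
    static boolean[] classifyRows(int[] inkCounts, double ratio) {
        int max = 0;
        for (int count : inkCounts) {
            max = Math.max(max, count);
        }
        double threshold = max * ratio;

        boolean[] isInkRow = new boolean[inkCounts.length];
        for (int y = 0; y < inkCounts.length; y++) {
            isInkRow[y] = inkCounts[y] > threshold;
        }
        return isInkRow;
    }
}
```

Note that a completely blank region (largest count of zero) comes out as all whitespace, since no row can exceed a threshold of zero.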

How do you count the pixels? You can render the region to an off-screen image, then simply iterate over it examining color values of the pixels. There are lots of tutorials on the web about how to render Swing components to off-screen images (such as here).
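
Here is one way the off-screen rendering and pixel counting could be sketched, using java.awt.image.BufferedImage. The names InkRowCounter, renderRegion, and countInkPerRow are made up for this example. It assumes your canvas component paints an opaque background in a single known color, that the region has non-zero width and height, and that any pixel differing from the background color is ink.

```java
import java.awt.Color;
import java.awt.Component;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

final class InkRowCounter {
    /** Paints the part of the canvas covered by region into an off-screen image. */
    static BufferedImage renderRegion(Component canvas, Rectangle region) {
        BufferedImage image = new BufferedImage(
                region.width, region.height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            // Shift the origin so the region's top-left corner lands at (0, 0).
            g.translate(-region.x, -region.y);
            canvas.paint(g);
        } finally {
            g.dispose();
        }
        return image;
    }

    /** Counts, for each row of the image, the pixels that differ from the background. */
    static int[] countInkPerRow(BufferedImage image, Color background) {
        int backgroundRgb = background.getRGB() & 0xFFFFFF; // ignore alpha
        int[] counts = new int[image.getHeight()];
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                if ((image.getRGB(x, y) & 0xFFFFFF) != backgroundRgb) {
                    counts[y]++;
                }
            }
        }
        return counts;
    }
}
```

If your canvas also draws page decorations (ruled lines, borders, selection highlights), those would be counted as ink too, so you may prefer to paint only the ink shapes into the image rather than the whole component.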

You can also take advantage of the fact that horizontal rows will tend to be in clusters of either items or gaps. In other words, you're probably not going to alternate item/gap/item/gap in every row of pixels. So you could say you're in a gap region only if you see multiple consecutive rows that are below threshold, and likewise, that you're in an item only if you see multiple consecutive rows that are above the threshold.
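
The "multiple consecutive rows" idea can be sketched as a small state machine: stay in the current state (item or gap) until you see enough consecutive rows that disagree with it. RowSegmenter, Segment, and minRun below are hypothetical names, and the sketch assumes the region starts in a gap.

```java
import java.util.ArrayList;
import java.util.List;

final class RowSegmenter {
    /** A run of rows [startRow, endRow) that is either a tentative item or a gap. */
    static final class Segment {
        final int startRow;
        final int endRow; // exclusive
        final boolean item;

        Segment(int startRow, int endRow, boolean item) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.item = item;
        }
    }

    /** Switches between gap and item only after minRun consecutive disagreeing rows. */
    static List<Segment> segment(boolean[] isInkRow, int minRun) {
        List<Segment> segments = new ArrayList<>();
        if (isInkRow.length == 0) {
            return segments;
        }
        boolean inItem = false; // assumed starting state: gap
        int segmentStart = 0;
        int runStart = -1;      // first row of the current disagreeing run, or -1

        for (int y = 0; y < isInkRow.length; y++) {
            if (isInkRow[y] == inItem) {
                runStart = -1; // the disagreeing run was too short; absorb it
                continue;
            }
            if (runStart < 0) {
                runStart = y;
            }
            if (y - runStart + 1 >= minRun) {
                if (runStart > segmentStart) {
                    segments.add(new Segment(segmentStart, runStart, inItem));
                }
                inItem = !inItem;
                segmentStart = runStart;
                runStart = -1;
            }
        }
        segments.add(new Segment(segmentStart, isInkRow.length, inItem));
        return segments;
    }
}
```

Short runs that never reach minRun rows, such as a single near-empty row in the middle of a line of handwriting, are simply absorbed into the surrounding segment.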

At this point, you should have created some list-like data structure that contains the coordinate ranges for each item and each gap. Note, however, that this is still just a rough cut at plausible items and gaps in your list. We'll call these "tentative items" because there still may be errors here (for example, thin ascenders and descenders that pass through a gap region may confuse the algorithm). While you could stop here, and this would likely work to some degree, there's a way to make it much more robust. So for the next phase of the algorithm you're going to work from the assumption that every shape is contained in exactly one item of the list, and each item in the list has at least one shape inside of it.

To implement this phase of the segmenting algorithm, you'll walk through all of the shapes contained within your bounded list region. Figure out which of your tentative list items (rows) the majority of the shape lies in. An easy way to do this is to just see what percentage of the shape's bounding box lies in each row, and then assign it to the row that contains the maximum percentage. This has the effect of dealing with weird problems that may occur if a shape has an ascender or descender that sticks into a gap region a bit, since you're analyzing items based on whole shapes. Finally, once you've assigned all shapes to list items, you can walk down your list of items and gaps, and merge any item that ended up with no shapes into the surrounding whitespace.
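
A minimal sketch of this assignment phase follows. It assumes the tentative items are given as {top, bottom} row ranges (bottom exclusive) in the same coordinate system as the shapes' bounding boxes. ShapeAssigner and ListItem are hypothetical names. One detail goes beyond the description above: a shape that overlaps no tentative item at all is assigned to the item whose vertical center is nearest, which is just one reasonable fallback.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

final class ShapeAssigner {
    /** A list item: a vertical range plus the shapes assigned to it. */
    static final class ListItem {
        final int top;
        final int bottom; // exclusive
        final List<Rectangle> shapes = new ArrayList<>();

        ListItem(int top, int bottom) {
            this.top = top;
            this.bottom = bottom;
        }
    }

    /** Assigns each shape to the tentative item it overlaps most, then drops empty items. */
    static List<ListItem> assign(List<int[]> tentativeRows, List<Rectangle> shapeBounds) {
        List<ListItem> items = new ArrayList<>();
        for (int[] row : tentativeRows) {
            items.add(new ListItem(row[0], row[1]));
        }
        if (items.isEmpty()) {
            return items;
        }
        for (Rectangle shape : shapeBounds) {
            int shapeTop = shape.y;
            int shapeBottom = shape.y + shape.height;
            int shapeCenter = shape.y + shape.height / 2;

            ListItem best = null;
            int bestOverlap = -1;
            int bestDistance = Integer.MAX_VALUE;
            for (ListItem item : items) {
                // Vertical overlap in pixels; dividing by the shape's height would
                // give the percentage, but the winner is the same either way.
                int overlap = Math.max(0,
                        Math.min(shapeBottom, item.bottom) - Math.max(shapeTop, item.top));
                int distance = Math.abs(shapeCenter - (item.top + item.bottom) / 2);
                if (overlap > bestOverlap
                        || (overlap == bestOverlap && distance < bestDistance)) {
                    best = item;
                    bestOverlap = overlap;
                    bestDistance = distance;
                }
            }
            best.shapes.add(shape);
        }
        // Items that received no shape are treated as whitespace.
        items.removeIf(item -> item.shapes.isEmpty());
        return items;
    }
}
```

Dropping an empty item from the returned list has the same effect as merging it into the neighboring gaps, since only the surviving items are used for later list operations.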

Finally, you need to actually implement the list item movement and deletion behavior. Once you've got the segmenting algorithm above working, this part should be pretty easy. The output of the segmenting algorithm will be a data structure that indicates which sets of display list shapes are in which list items, and from here, it's pretty easy to do the rest of the list manipulations: recognize the up and down caret gestures and the deletion gesture. When a gesture happens over a list item, you implement the recognized action by simply translating the bounds of the inked shapes in that list item (to move up or down), or by removing that item's ink from the display list (to delete).
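
The two manipulations can be sketched as follows. For simplicity this example stands in for your shape class with java.awt.Rectangle, and ListItemActions, moveItem, and deleteItem are hypothetical names. Your own shapes will likely need their points translated rather than just a bounding box, and how far to move an item (and whether its neighbor moves the other way) is left to the caller.

```java
import java.awt.Rectangle;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class ListItemActions {
    /** Moves every shape in a list item vertically; negative dy moves up. */
    static void moveItem(List<Rectangle> itemShapes, int dy) {
        for (Rectangle shape : itemShapes) {
            shape.translate(0, dy);
        }
    }

    /** Removes a list item's shapes from the display list. */
    static void deleteItem(List<Rectangle> displayList, List<Rectangle> itemShapes) {
        // Compare by identity so that a different shape that happens to have
        // identical bounds is not removed by accident.
        Set<Rectangle> doomed = Collections.newSetFromMap(new IdentityHashMap<>());
        doomed.addAll(itemShapes);
        displayList.removeIf(doomed::contains);
    }
}
```

After either operation you would repaint the canvas so the change becomes visible.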

Extra Credit

As usual, there are a lot of ways you might make this assignment much fancier than described:

Deliverable

See here for instructions on how to submit your homework. These instructions will be the same for each assignment.