Painting the Data

Outside inquiries into his research were nothing new for John Stasko, but when he hit the pages of Fraud, his phone really started ringing.

In spring 2010, the bimonthly magazine—the periodical of record for those whose professional interests lie in preventing, detecting and investigating fraud—printed an article on using suspicious activity reports (SARs) to battle fraud. Figuring prominently in the article and its sidebar were Stasko and his information visualization tool, Jigsaw. That’s when the calls started coming, from law enforcement, bankers, insurance companies—you name it.

“This morning I got a call from NCIS,” says Stasko, professor in the School of Interactive Computing and director of the Information Interfaces Research Group. “The real NCIS.”


John Stasko

All the callers are interested in what Jigsaw has to offer: an interactive tool that can enable them to process and analyze large data sets. Not massive data sets, which fall more in the realm of visual analytics (more on which later), but data sets on the order of 5,000 to 10,000 documents, each of which contains several paragraphs. Just about the right size for a few years’ worth of sales reports and other financial documents for a large company. More than adequate for a murder investigation.

Indeed, Jigsaw has its genesis in law enforcement and intelligence, Stasko says, but recently other academic disciplines have started to see the value of “InfoVis,” as it’s called, in making sense of the imposing mountains of data at their disposal.

“Say you’re a genomicist, and you study this one particular gene,” he says. “There are probably a gajillion medical papers out there where this gene is mentioned. You can’t keep up with everything.”

But what you can do is use a tool like Jigsaw, which processes the data and begins to connect the dots. First it identifies the entities involved—whether they’re genes, people, companies, locations, etc.—and then it really goes to work. What are the relationships? Who’s been talking to whom, and where were they talking? What was the context of these interactions? What other factors were common? And so on.

Jigsaw then displays the information with 10 different views, using a range of visualization tools. And now the user’s expertise comes back into play. A random person off the street may not be able to identify a person of interest from a Jigsaw display of drug-related arrests in a certain neighborhood of Atlanta. But a narcotics detective, who knows the area and is familiar with the major players, could use the tool to take enforcement to another level.

John Stasko has been invited by law enforcement to apply his Jigsaw tool for information visualization to actual criminal investigations. “[Recently] I got a call from NCIS,” he says. “The real NCIS.”

In fact, Stasko was invited in recent years by a jurisdiction in the Pacific Northwest to use Jigsaw to examine some cold-case evidence. The case, they told him, was one the local police would throw to new detectives to see what they came up with; typically, in a month or so, the newbies arrived at the same dead end as the original investigation. Without telling him anything more, the police gave Stasko all the case documents (digitized for Jigsaw’s consumption, of course). Within a couple of hours, he’d figured out what the crime was. A few hours later, he’d identified the three main hypotheses pursued by the original investigators. And then the next day, the same dead end. He’d never spoken to anyone.

Now imagine that kind of sleuthing on a truly massive scale, and you’ve got something approaching the field of data and visual analytics (DAVA). This emerging field, which combines techniques from information visualization and high-dimensional data analysis, is drawing big interest and pulling in large research dollars from agencies like the U.S. Department of Homeland Security, which hopes to use DAVA to process and act upon possible security threats.

Stasko, along with Professor Haesun Park of the School of Computational Science & Engineering (CSE), is helping to define the discipline through FODAVA (Foundations of Data and Visual Analytics), a joint NSF/DHS initiative to explore the field and its applications not just in homeland security, but in meteorology, bioinformatics, network security and other fields. FODAVA is a collaborative effort involving researchers from nearly a dozen universities, with Georgia Tech taking the lead.

“The two camps [of information visualization and computational data analysis] are sometimes in competition, and FODAVA puts them together in a rich combination,” Stasko says. “The CSE faculty are more versed in computational analysis, and FODAVA brings us together. It focuses more on the underlying mathematics of visualization.

“These days,” he quips, “I talk more to the CSE folks than I do to the [Interactive Computing] folks.”

“John’s expertise in information visualization has been invaluable to the FODAVA effort,” says Park. “Interactive visualization is a key component that distinguishes FODAVA from established areas such as data mining and machine learning. John’s work in FODAVA, through the visual analytics systems such as Jigsaw that he has developed and the FODAVA research test-bed system that is being developed, has contributed greatly to its success.”