A More Clever Query: Hongyuan Zha Tunes Up the Search Engine

Wednesday, January 5, 2011


Every day, hundreds of millions of people around the world use Internet search engines. As they type, the vast majority of those people are blissfully unaware of the sophisticated mathematics deployed by their fingers. Hongyuan Zha, on the other hand, has spent a sizable part of his career on it.

Zha, a professor in the School of Computational Science & Engineering, works in information retrieval and machine learning applications, trying to design algorithms that can process and learn from data faster and more efficiently. To that end, he looks at the world of Google, Bing and Yahoo! (not to mention predecessors such as Inktomi, Lycos, WebCrawler and HotBot) as one big test track.

“Web search is a fertile ground for machine learning techniques,” Zha says. “It’s data on a large scale—a massive scale—and it’s extremely noisy data. It’s a very good real-life laboratory.”

Zha has been working with search engines since before he came to Georgia Tech nine years ago. From 1999 to 2001, while working for Inktomi Corp., he developed search ranking algorithms and document classification techniques (Inktomi eventually was acquired by Yahoo!).

Nowadays Zha investigates how to use social computing, particularly collaborative filtering methods, to improve web searches. Not to be confused with social media, social computing refers simply to the intersection of human social behavior with computation. When viewed through the lens of web search, social computing can raise some interesting questions: Instead of simply using page content to recommend documents for search queries, what other data sources could be used? How about authors of documents? Or links among documents? Or perhaps author affiliations?

“Really, web search and recommendation systems are the same kind of thing. All these relationships among multiple types of entities tell you more about the connections among documents, queries, users and their activities,” Zha says. “This type of general methodology allows you to make connections by making use of other connections, and it’s useful for lots of applications beyond search.”
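To make the idea of "connections through other connections" concrete, here is a minimal sketch of user-based collaborative filtering in Python. It is an illustration only, not Zha's method: the click data, the user and document names, and the `recommend` function are all hypothetical, and Jaccard overlap is just one simple choice of similarity.

```python
from collections import defaultdict


def recommend(clicks, user, k=2):
    """Suggest documents clicked by users whose history overlaps with `user`.

    `clicks` maps each (hypothetical) user to the set of documents they clicked.
    """
    seen = clicks[user]
    scores = defaultdict(float)
    for other, docs in clicks.items():
        if other == user:
            continue
        overlap = len(seen & docs)
        if overlap == 0:
            continue  # no shared clicks, so no evidence of similar taste
        similarity = overlap / len(seen | docs)  # Jaccard similarity
        for doc in docs - seen:
            scores[doc] += similarity  # similar users vote for unseen documents
    # Highest score first; break ties alphabetically so the order is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc for doc, _ in ranked[:k]]


# Made-up click histories, for illustration only.
clicks = {
    "ana": {"d1", "d2", "d3"},
    "ben": {"d1", "d2", "d4", "d5"},
    "cai": {"d2", "d3", "d5"},
    "dee": {"d6"},
}

print(recommend(clicks, "ana"))  # ['d5', 'd4']
```

Document d5 ranks first for "ana" because two different users who share clicks with her both clicked it. The same pattern could in principle be applied to authors, links or affiliations instead of clicks.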

Modern web search relies heavily on massive amounts of user interaction data collected over long periods of time. Begin typing anything into Google’s search box, and the engine immediately supplies a number of possible terms to complete the query. These suggestions are generated from past searches that began with the same characters.
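A toy version of this kind of query completion can be sketched in a few lines of Python. This is an assumption-laden illustration, not how Google or any production engine works: the query log is made up, the `suggest` function is hypothetical, and real systems use far larger logs and more elaborate ranking signals.

```python
from collections import Counter


def suggest(query_log, prefix, k=3):
    """Return up to k past queries that start with `prefix`, most frequent first."""
    prefix = prefix.lower()
    counts = Counter(q.lower() for q in query_log)
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically to break ties.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:k]]


# A made-up log of past searches, for illustration only.
past_queries = [
    "georgia tech", "georgia tech", "georgia aquarium",
    "georgia tech football", "george washington", "georgia tech",
    "georgia aquarium",
]

print(suggest(past_queries, "georgia"))
# ['georgia tech', 'georgia aquarium', 'georgia tech football']
```

Scanning every past query on each keystroke would be far too slow at web scale; a prefix tree or a precomputed table of top completions per prefix would be a more realistic data structure.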

“How do you know what people are looking for? You can look at click patterns,” Zha says. “It’s a fruitful direction for search, to link contextual information to the query. You’re really examining user behavior.”
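One simple way to use click patterns is to reorder results for a query by how often users clicked them when they were shown. The Python sketch below is a hypothetical illustration under that assumption: the log format, the `rerank_by_clicks` function and the add-one smoothing are choices made for the example, not a description of any particular engine.

```python
from collections import defaultdict


def rerank_by_clicks(results, click_log, query):
    """Reorder `results` for `query` by smoothed click-through rate.

    `click_log` is a list of (query, document, was_clicked) records.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for logged_query, doc, was_clicked in click_log:
        if logged_query != query:
            continue
        shown[doc] += 1
        if was_clicked:
            clicked[doc] += 1

    def click_rate(doc):
        # Add-one smoothing, so documents with little data land near 0.5.
        return (clicked[doc] + 1) / (shown[doc] + 2)

    # sorted() is stable, so documents with equal rates keep their original order.
    return sorted(results, key=click_rate, reverse=True)


# Made-up impressions and clicks, for illustration only.
click_log = [
    ("jaguar", "a", False), ("jaguar", "b", True),
    ("jaguar", "a", False), ("jaguar", "b", True),
    ("jaguar", "c", False), ("python", "a", True),
]

print(rerank_by_clicks(["a", "b", "c"], click_log, "jaguar"))  # ['b', 'c', 'a']
```

The same result list can be ordered differently for different queries, which is one way of linking contextual information to the query.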

Internet user behavior is not something Zha might have imagined studying back in the early 1980s, when he was an undergraduate majoring in mathematics at Shanghai’s Fudan University. Back then he spent a lot of time on abstract algebra, but after graduating he moved more into the concrete world of computational mathematics. Soon he was off to Stanford, working toward the Ph.D. in scientific computing he earned in 1993.

“A math education doesn’t always necessarily have to lead to a math profession; it enables you to do lots of things,” he says of his career arc. “Machine learning is a very diverse field, with contributors coming in from all over the place. A lot of people are dealing with the massive data problem, so it’s a very active field simply because there are so many applications.”

Zha also contributes to the future of machine learning and computer science by directing the School of Computational Science & Engineering’s graduate programs, a task he likens to tinkering with a web search engine: It’s practical, complicated and definitely has an impact. CSE is one of the first schools of its kind; by raising it to the level of a school, Georgia Tech is taking a leadership role in defining the discipline of computational science and engineering for the rest of the world.

It also accepts the challenge of designing a rigorous, computation-based curriculum for students whose backgrounds are quite diverse. For every CSE grad student who majored in computer science and needs little introduction to computation, there’s another who majored in biology, or mechanical engineering, or some other non-computing field of the sciences and engineering. And though the School of CSE is housed in the College of Computing, its degrees are offered in partnership with a number of other schools and colleges across Georgia Tech.

“CSE is inherently interdisciplinary, combining principles from mathematics, science and engineering in addition to computing,” says Richard Fujimoto, Regents’ Professor and chair of the School of CSE. “The CSE graduate programs reflect this interdisciplinary nature and include eight schools across three colleges. Managing the CSE program is a challenging coordination problem.”

“As director of the CSE graduate programs,” he continues, “Professor Zha takes on this task with great enthusiasm, and gives much attention to defining and implementing well-defined processes to ensure that the participating units have real ownership of both the program and its students.”

“CSE, as a discipline, is fairly new, and a lot of people, including students, don’t know exactly what it is,” Zha says. “So we have a somewhat clean slate, and we have to design a program that can be tailored to all these people’s needs. A typical computer science curriculum has fairly homogeneous subjects. For us, it’s different: some things, our students have to learn on the fly. That can be extremely challenging.”