GVU Technical Report Number:
GIT-GVU-96-10
Title:
Silk from a Sow's Ear: Extracting Usable Structures from the Web
Authors:
Peter Pirolli
James Pitkow
Ramana Rao
Abstract:
In its current implementation, the World-Wide Web lacks much of the explicit
structure and strong typing found in many closed hypertext systems. While
this property probably relates to the explosive acceptance of the Web, it
further complicates the already difficult problem of identifying usable
structures and aggregates in large hypertext collections. These reduced
structures, or localities, form the basis for simplifying visualizations of
and navigation through complex hypertext systems. Much of the previous
research into identifying aggregates utilizes graph-theoretic algorithms based
upon structural topology, i.e., the linkages between items. Other research
has focused on content analysis to form document collections. This paper
presents our exploration into techniques that utilize both the topology and
textual similarity between items as well as usage data collected by servers
and page meta-information like title and size. Linear equations and spreading
activation models are employed to arrange Web pages based upon functional
categories, node types, and relevancy.
Keywords:
Information visualization, World Wide Web, hypertext
You can access this technical report via:
PDF
Postscript
 