Research 

 

      Introduction

         Representation

Similarity Metric

Clustering

Typical Class Member

Activity Classification

Anomaly Detection

Anomaly Explanation

Experiments

 
 

 

On Typicality of A Class Member

Having clustered a given set of activities, we are now looking for a way of representing the characteristics of a particular cluster in terms of some general, tractable and concise form. This idea is similar to that of having a mental model for a particular class which essentially describes the general characteristics of the class in an optimal way. Such a class model is required to: a- classify some new instance of information as to which class does it belong to and, b- compare and analyze the new class member in terms of how similar/dissimilar it is with respect to the "general characteristics" of the class, so that decision about the normality of a new class member could be made.

There are many ways to approach the problem of creating a representative model of a class. One could assume that the process which creates class instances to be stochastic, and then try to learn the particular distribution which dictates that underlying process. This approach has two main difficulties - firstly, it requires one to assume an analytical form of the underlying process (e.g. Gaussian, Mixture of Gaussians, etc.) which is not always known. Secondly, the number of class instances needed to learn a general distribution is far more than the number of samples that we have at hand.

Therefore, we resort to other methods of finding a general representation of a cluster. More specifically we choose the idea of instance based approach where we formulate the problem as finding the node which is the "best representative" of the rest of the cluster nodes - essentially converting the problem of learning, into one of search. From now on we will call the best representative node of a cluster as the "Typical Node".

But before we could move on with this approach, we need to define precisely what we mean by being a typical member of a cluster. Essentially, we believe that the question of typicality is closely related to the idea of how similar a node is to the other members of the cluster. There are many ways in which this idea of the similarity of a node with respect to other nodes could be exploited to find the typical node. The classic graph theoretic literature provides a potential answer to this problem in terms of finding the "centeroid" of the cluster, i.e. finding the node which minimizes the maximum distance (inverse of similarity) between the rest of the nodes and itself (also known as the min max algorithm). While this method is theoretically sound, it is prune to noisy clusters and would work well only in cases where the clusters are well-behaved.

Another method proposed in graph theory for such a problem relates to finding the maximum in-degree of every node of the cluster, labeling the node with maximum in-degree as the typical node. We could consider the number of nodes maximally close to a particular node as it in-degree, transforming our undirected graph into a directed one. Stated otherwise, the idea is to consider that node as the typical member of the cluster, to which most nodes are maximally similar. Indeed, the approach of labeling a node as typical or not, based on its in-degree usually works very well. It still however retains some major problems in terms being completely agnostic about the more global structure of the cluster. More specifically, due to the maximization operation which we have to do to transform of our undirected graph to a directed one, we are forced to look at a very local view of our landscape which of course could lead to problems.

The idea of fining the best representative member of a cluster has been studied in some other fields such as Computer Networks, where the problem is very similar to finding the web-page which best represents a collection of web-pages (see e.g. [17]).  Along the lines of [17],we propose the idea of Typical nodes (mentioned as "authoritative sources" in [17] )  and "Similar to Typical (STT)" nodes (mentioned as "hubs" in [17]). Typical and  STT nodes exhibit a mutually reinforcing relationship - a good STT node is one which is closer to a Typical node, while a Typical node is one closer to more STT nodes. Like [17], we associate a non-negative Typicality weight (xp) and a non-negative STT weight (yp) to each node in the cluster where p denotes the index of nodes in a cluster. Naturally, if p is closer to many nodes with large x values, it should receive a large y value. On the other hand if p is closer to nodes with large y values, it should receive large x value. We define two coupled processes to update the weights xp and yp iteratively, i.e.

As we iterate the above two equations k times in the limit k–›∞, xp and yp converge to x* and y*. The node which has the largest component in the converged vector x* would correspond to the node which has the greatest Typical weight and hence is the best representative of the nodes of clusters. It has been proven in [17] that x* can be computed from the Eigen Analysis of the matrix ATA where A is the symmetric similarity matrix of all the nodes of the cluster. Essentially x* is the principal eigenvector (the one with greatest corresponding eigen value) of ATA, the largest component of which corresponds to the Typical Node of the cluster (for the proof of this result, please consult to [17]).

Essentially the above mentioned algorithm (a.k.a. hits algorithm) finds the node which has the greatest sum of similarity to the rest of the nodes. Notice that it does not consider the "number" of nodes a particular node is close to. The result is that a node could be labeled as the Typical node even if it is very close to only a few of the nodes. Such a result would be ineffectual in cases of 1/0 graphs (as in the case of Computer Networks) since the total weight would in itself incorporates the number of nodes close to a particular node. Since ours is the case of (continuous) weighted graphs, we need to modify the hits algorithm to incorporate the effect of number of nodes close to a particular node.

A simple modification that we could make to convert our continuous weighted graph to a 1/0 graph is to threshold the weights, i.e. consider all the weights above a particular value to be 1 and those less than that threshold as zero. A better solution (in terms of robustness) would be an approach similar to what is known as 'robust estimation', where the weights are transferred according to an exponentially decreasing function of the number of nodes to which a particular node is minimally close to.

Recap: In this section we have introduced the notion of typical node of a cluster, and considered some methods from graph theory to determine the typical node. Finally, we have proposed to adapt a method used in solving a similar problem in Computer Networks as a potential solution of our problem of finding the Typicality of a node.

 

[17] J. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46(1999). Also appears as IBM Research Report RJ 10076, May 1997.