RESEARCH PROJECTS

Architectures and Techniques to Improve Availability in Large Scale Storage Systems (Aug 2005 onwards)

Enterprise storage systems are the foundation of most data centers today, and extremely high availability is expected of them as a basic requirement. With the rapid, exponential growth of digital information and the increasing popularity of multi-core architectures, the demand for large-scale storage systems with extremely high availability (approaching seven nines, i.e. 99.99999%) continues to grow. At the same time, embedded storage software systems (controllers) are becoming much more complex and difficult to test, especially given concurrent development and quality assurance processes and the fact that legacy systems are being adapted to newer hardware. With software failures and bugs now an accepted fact, focusing on recovery and reducing time to recovery has become essential in many modern storage systems. In current system architectures, even with redundant controllers, most microcode failures trigger a system-wide recovery that causes the system to lose availability for at least a few seconds, after which it waits for higher layers to redrive the operation. Customers see this unavailability as a service outage, and it will only increase as the platform continues to grow on the legacy architecture. How can we improve the availability of a highly concurrent storage controller and scale the recovery process without re-architecting legacy code?
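
To make the seven-nines target concrete, the following minimal Python sketch converts an availability of N nines into a yearly downtime budget, assuming a 365.25-day year. Seven nines leaves roughly 3.2 seconds of downtime per year, so a single system-wide recovery lasting a few seconds can consume the entire annual budget.

```python
# Allowed downtime per year for an availability target of N "nines".
# Assumption: a 365.25-day year (31,557,600 seconds).
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def downtime_per_year(nines: int) -> float:
    """Return the yearly downtime budget, in seconds, for availability 1 - 10**-nines."""
    unavailability = 10.0 ** -nines
    return unavailability * SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in range(3, 8):
        print(f"{n} nines: {downtime_per_year(n):10.2f} s/year")
```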

Optimization Techniques for Large Scale Distributed Stream Query Processing Services (Aug 2005 onwards)

This project addresses the problem of optimizing multiple distributed stream queries that execute simultaneously in distributed data stream systems. A static "plan, then deploy" query optimization approach is inadequate for handling distributed queries that involve multiple streams and the node dynamics found in distributed data stream systems and applications. The selection of an optimal execution plan in such dynamic, networked computing systems must therefore consider operator ordering, operator reuse, network placement, and search space reduction. How do we quickly choose efficient plans from the large space of possibilities while taking both network and processing costs into account?
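
As a toy illustration of why plan selection must weigh ordering, placement, and search space together, the sketch below exhaustively enumerates orderings and node placements for a chain of commutative filters under a simple cost model (processing cost per tuple plus a per-tuple network cost between nodes). The Operator class, the linear link-cost function, and all numbers are hypothetical and are not the project's cost model; the k! * n**k enumeration for k operators and n nodes is exactly the blow-up that search space reduction has to avoid.

```python
from dataclasses import dataclass
from itertools import permutations, product


@dataclass(frozen=True)
class Operator:
    name: str
    cost: float         # hypothetical processing cost per input tuple
    selectivity: float  # fraction of input tuples that pass the filter


def plan_cost(order, placement, source, sink, rate, link_cost):
    """Cost of running the filters in `order` on the nodes in `placement`.

    rate: tuples per second leaving the source.
    link_cost(a, b): hypothetical cost per tuple shipped from node a to node b.
    """
    total = 0.0
    node = source
    for op, target in zip(order, placement):
        total += rate * link_cost(node, target)  # network cost to reach the operator
        total += rate * op.cost                  # processing cost at the operator
        rate *= op.selectivity                   # only surviving tuples flow onward
        node = target
    total += rate * link_cost(node, sink)        # deliver results to the query's sink
    return total


def best_plan(ops, nodes, source, sink, rate, link_cost):
    """Exhaustive search over orderings x placements: k! * n**k plans."""
    best = None
    for order in permutations(ops):
        for placement in product(nodes, repeat=len(ops)):
            cost = plan_cost(order, placement, source, sink, rate, link_cost)
            if best is None or cost < best[0]:
                best = (cost, order, placement)
    return best


if __name__ == "__main__":
    pos = {"S": 0, "A": 1, "B": 3, "T": 4}  # nodes on a line; distance = per-tuple link cost

    def link(a, b):
        return abs(pos[a] - pos[b])

    ops = [Operator("price_filter", 1.0, 0.1), Operator("symbol_filter", 0.5, 0.5)]
    cost, order, placement = best_plan(ops, ["A", "B"], "S", "T", 100.0, link)
    print(cost, [op.name for op in order], placement)
```

With the demo values, the search places both filters on node A and runs the cheaper, less selective symbol_filter before price_filter, for a total cost of 215 units.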
We also observe that stream queries are typically processed by a set of collaborating nodes and often share similar stream filters (such as stream selection or stream projection filters). The ability to reuse existing operators during query deployment, especially for long-running queries, is critical to the performance and scalability of a distributed stream query processing service. Concretely, we argue that by reusing the same distributed operators for multiple, different concurrent queries and intelligently consolidating operator computation across queries, we can reduce the cost of query deployment and minimize duplicated in-network processing. The technical challenges of reuse in streaming systems include dealing with large and time-varying workloads, dynamically exploiting similarities between queries, and applying network knowledge at runtime. We believe that an effective reuse approach for high-performance, highly scalable distributed stream query services should incorporate both network locality awareness and awareness of stream query operator semantics into its reuse decisions.
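
A minimal sketch of reuse decisions that combine operator semantics with network locality: a newly requested range filter can reuse an already deployed filter if that filter is at least as general (its output can be narrowed by a cheap residual filter) and it runs within a hop budget of the requesting node. RangeFilter, ReuseRegistry, the hops function, and the max_hops threshold are all hypothetical, chosen for illustration rather than taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeFilter:
    """Hypothetical selection operator: keeps tuples with low <= attr < high."""
    stream: str
    attr: str
    low: float
    high: float

    def covers(self, other: "RangeFilter") -> bool:
        """True if every tuple that passes `other` also passes this filter."""
        return (self.stream == other.stream and self.attr == other.attr
                and self.low <= other.low and other.high <= self.high)


@dataclass
class Deployment:
    op: RangeFilter
    node: str


class ReuseRegistry:
    """Tracks deployed operators and reuses them when semantics and locality allow."""

    def __init__(self, hops):
        self.hops = hops  # hops(a, b): hypothetical network distance between nodes
        self.deployed = []

    def deploy(self, op, near, max_hops):
        """Return (deployment, reused, needs_residual_filter) for a requested operator."""
        candidates = [d for d in self.deployed
                      if d.op.covers(op) and self.hops(d.node, near) <= max_hops]
        if candidates:
            # Prefer an exact match (no residual filtering), then the nearest node.
            best = min(candidates, key=lambda d: (d.op != op, self.hops(d.node, near)))
            return best, True, best.op != op
        fresh = Deployment(op, near)
        self.deployed.append(fresh)
        return fresh, False, False
```

Checking semantic coverage before locality keeps the sketch correct (a reused operator never drops tuples the new query needs), while the hop budget stops reuse from dragging traffic across the network.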

InfoBeacons: Guiding Users to Internet Information Sources (Aug 2004 - Aug 2005)

The Internet provides a wealth of useful information in a vast number of dynamic information sources, but it is difficult to determine which sources are useful for a given query. Most existing techniques either require explicit source cooperation (for example, exporting data summaries) or build a relatively static source characterization (for example, assigning a topic to the source). We present a system called InfoBeacons that takes a different approach: data and sources are left as is, and a peer-to-peer network of beacons uses past query results to guide queries to sources, which do the actual query processing. This approach has several advantages, including requiring minimal changes to sources, tolerance of dynamism and heterogeneity, and the ability to scale to large numbers of sources. We present the architecture of the system and discuss the advantages of our design. We then focus on how a beacon can choose good sources for a query despite the loose coupling of beacons to sources. Beacons cache responses to previous queries and adapt the cache to changes at the sources. The cache is then used to select good sources for future queries. We discuss results from a detailed experimental study using our beacon prototype, which demonstrate that our loosely coupled approach is effective: a beacon needs to contact only sixty percent or less of the sources contacted by existing, tightly coupled approaches, while providing results of equivalent or better relevance to queries.
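
The following sketch shows one simple way a beacon could turn cached responses into source rankings: it accumulates, per source, how many results past queries containing each term returned, periodically decays those weights so the cache tracks changes at sources, and ranks sources for a new query by summed evidence. The Beacon class, the additive scoring, and the decay factor are hypothetical simplifications, not the ranking function used by InfoBeacons.

```python
from collections import defaultdict


class Beacon:
    """Hypothetical beacon: ranks sources using term statistics from cached responses."""

    def __init__(self, decay=0.9):
        self.decay = decay  # hypothetical aging factor so stale evidence fades
        self.stats = defaultdict(lambda: defaultdict(float))  # source -> term -> weight

    def record_response(self, source, query_terms, num_results):
        """Cache the outcome of sending `query_terms` to `source`."""
        for term in query_terms:
            self.stats[source][term] += num_results

    def age(self):
        """Decay all weights (e.g. periodically) so the cache follows source changes."""
        for terms in self.stats.values():
            for term in terms:
                terms[term] *= self.decay

    def choose_sources(self, query_terms, k):
        """Return up to k sources with the highest cached evidence for the query terms."""
        scores = {src: sum(terms.get(t, 0.0) for t in query_terms)
                  for src, terms in self.stats.items()}
        ranked = sorted((s for s in scores if scores[s] > 0),
                        key=lambda s: (-scores[s], s))
        return ranked[:k]
```

Because only the beacon's own cache is consulted, sources need no changes, which mirrors the loose coupling described above.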


Clustering Models for High-Availability in Scale-out Storage Systems (May 2005 - Aug 2005)

Worked on the design and analysis of clustering models for high availability in scale-out storage systems. Analyzed flat and hierarchical clustering models, focusing on system availability, complexity, and scalability. Clustering is an obvious approach to maximizing availability and performance and to balancing load in storage systems. Most often, high availability and reliability are achieved through redundancy at the software, hardware, and network levels. However, while the number of system components grows only linearly, the system's state space grows exponentially, making the system very difficult to test and verify. Such systems are potentially more prone to failures arising from software defects that escape detection during testing and are also difficult to reproduce. In this project we investigate the trade-offs between system complexity and availability in clustered storage systems. We evaluate designs and policies that help maximize availability while minimizing the number of system states.
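
A back-of-the-envelope sketch of the tension described above, under the simplifying assumptions that each component is either up or down and that replicas fail independently: redundancy raises availability as 1 - (1 - a)^n, while the number of global states a tester must consider grows as 2^n. Real software failures are often correlated, which is part of what makes these systems hard to verify, so the availability figures here are optimistic.

```python
def state_space(components: int, states_per_component: int = 2) -> int:
    """Global states when each component is independently in one of k local states."""
    return states_per_component ** components


def redundant_availability(a: float, replicas: int) -> float:
    """Availability of n independent replicas when any one suffices: 1 - (1 - a)**n."""
    return 1.0 - (1.0 - a) ** replicas


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(f"{n:2d} components -> {state_space(n):,} states")
    for r in (1, 2, 3):
        print(f"{r} replica(s) at 99% each -> availability {redundant_availability(0.99, r):.6f}")
```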


CubeCache: Semantic Caching in Peer-to-Peer OLAP Networks (Aug 2004 onwards)

Peer-to-peer systems have attracted much recent activity due to their cost-effectiveness, scalability, and ability to distribute the overhead of sharing and storing data and performing computations. These characteristics make such systems well suited to OLAP query processing. In this project, we describe the framework for a system that uses a peer-to-peer network to process OLAP queries efficiently. Specifically, we consider the problem of searching for data in such a network and present techniques to locate data and perform query processing with load balancing.
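
A minimal sketch of semantic caching for OLAP in a peer network, assuming peers cache cuboids of SUM aggregates keyed by their group-by attributes. A cuboid grouped by a superset of a query's attributes can answer the query by rolling up, so the sketch picks the smallest usable cuboid and breaks ties by peer load as a crude form of load balancing. Peer, roll_up, answer, and the selection policy are hypothetical and are not CubeCache's actual protocol; non-distributive aggregates such as medians cannot be rolled up this way.

```python
from collections import defaultdict


class Peer:
    """Hypothetical peer holding cached cuboids of SUM aggregates."""

    def __init__(self, name):
        self.name = name
        self.load = 0    # number of queries this peer has served (simple load signal)
        self.cache = {}  # frozenset(group-by attrs) -> (attr order, {group key: sum})


def roll_up(attrs, rows, target):
    """Aggregate a cuboid grouped by `attrs` up to the coarser `target` attributes (SUM)."""
    idx = [attrs.index(a) for a in target]
    out = defaultdict(float)
    for key, value in rows.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def answer(peers, target):
    """Answer a group-by query from any cached cuboid whose attributes cover `target`.

    Prefers the smallest usable cuboid, then the least-loaded peer.
    """
    want = set(target)
    candidates = [(len(rows), p.load, p.name, p, attrs, rows)
                  for p in peers
                  for groupby, (attrs, rows) in p.cache.items()
                  if want <= groupby]
    if not candidates:
        return None  # a real system would fall back to the base data
    *_, peer, attrs, rows = min(candidates, key=lambda c: c[:3])
    peer.load += 1
    return peer.name, roll_up(attrs, rows, tuple(target))
```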