SANGEETHA SESHADRI

Contact: College of Computing, 801 Atlantic Drive
KACB 3201
Atlanta, GA 30332
Work Phone: 1-817-287-1299
Homepage: http://www.cc.gatech.edu/~sangeeta
email:

RESEARCH INTERESTS

Storage systems, distributed middleware overlay systems, distributed data stream systems, especially techniques and architectures for improving availability, performance and scalability of these systems.

EDUCATION

PhD in Computer Science
College of Computing, Georgia Tech (GPA: 3.94/4.00)

(expected graduation: Jan 2009)
Fall 2004 onwards, Atlanta, GA

M.Sc (Hons) Mathematics

Birla Institute of Technology and Science (BITS) (GPA:3.61/4.00)

2002
1997-2002, Pilani, India

Bachelor of Engineering (Honors) in Computer Science

Birla Institute of Technology and Science (BITS) (GPA:3.61/4.00)

2002
1997-2002, Pilani, India


RESEARCH AND PROFESSIONAL EXPERIENCE

Research Assistant, College of Computing, Georgia Tech

Atlanta, GA

Architectures and techniques for improving availability of large scale storage systems and services. (2005-2008)

Optimization techniques for large scale distributed stream query processing services. (2005-2007)

Information retrieval based routing techniques in peer-to-peer systems for searching the deep web. (2004-2005)

Summer Intern, IBM Almaden Research Center

San Jose, CA

Architectures for flexible and scalable recovery and state restoration in storage software. (Summer 2008)

Enhancing storage system availability on multi-core platforms through recovery conscious scheduling. (Summer 2007)

Improving availability and reliability in scale-out storage systems using hierarchical architectures. (Summer 2006)

Analyzed clustering models for high-availability in scale-out storage systems. (Summer 2005)


Senior Applications Engineer, Oracle Corporation, IDC

Hyderabad, India

Member of the Oracle Exchange development team, responsible for the online Catalog, Spot Purchase and XML transactions modules of this B2B product. (2002-2004)

Intern, Motorola Inc.

Bangalore, India

Involved in the design and coding of the Resource parser for ENTITE (ENhanced Tool for Integrated Test Execution). (Jan-Jun 2002)

JOURNAL PUBLICATIONS

[1] Sangeetha Seshadri, Vibhore Kumar, Brian F. Cooper and Ling Liu. A Distributed Stream Query Optimization Framework Through Integrated Planning and Deployment. To appear in the IEEE Trans. on Parallel and Distributed Systems (TPDS).

[2] Sangeetha Seshadri, Ling Liu and Lawrence Chiu. Recovery scopes, recovery groups, and fine-grained recovery in enterprise storage controllers with multi-core processors. To appear in the IBM Systems Journal.

[3] Sangeetha Seshadri and Brian F. Cooper. Routing Queries through a Peer-to-Peer InfoBeacons Network Using Information Retrieval Techniques. In IEEE Trans. on Parallel and Distributed Systems (TPDS). 18(12): 1754-1765, 2007.

CONFERENCE PUBLICATIONS

[4] Sangeetha Seshadri, Lawrence Chiu and Ling Liu. A Systematic Approach to System State Restoration during Storage Controller Micro-Recovery. To appear in FAST 2009.

[5] Sangeetha Seshadri, Bhuvan Bamba, Brian F. Cooper, Vibhore Kumar, Ling Liu, Karsten Schwan and Gong Zhang (in alphabetical order), "Grouping Distributed Stream Query Services by Operator Similarity and Network Locality", To Appear In the Proceedings of IEEE Congress on Services 2008 (SCC 2008), Honolulu, Hawaii, USA, July 2008.

[6] Bhuvan Bamba, Sangeetha Seshadri and Ling Liu "Scaling Location-based Services with Dynamically Composed Location Index", To Appear In the Proceedings of IEEE International Conference on Services Computing (SCC 2008), Honolulu, Hawaii, USA, July 2008.

[7] Sangeetha Seshadri, Lawrence Chiu, Cornel Constantinescu, Subashini Balachandran, Clem Dickey and Ling Liu. Enhancing Storage System Availability on Multi-Core Architectures with Recovery-Conscious Scheduling. To Appear in the 6th USENIX Conference on File and Storage Technologies (FAST'08).

[8] Sangeetha Seshadri, Lawrence Chiu, Karan Gupta, Paul Muench, Ling Liu and Brian F. Cooper. A Fault-Tolerant Middleware Architecture for High-Availability Storage Services. IEEE International Conference on Services Computing (SCC) 2007.

[9] Sangeetha Seshadri, Vibhore Kumar, Brian F. Cooper and Ling Liu. Optimizing Multiple Queries in Distributed Stream Systems Using Hierarchical Network Partitions. IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2007.

TECHNICAL REPORTS, WORKSHOP PAPERS AND DEMOS

[10] Sangeetha Seshadri, Vibhore Kumar and Brian F. Cooper. Optimizing Multiple Queries in Distributed Data Stream Systems. 2nd IEEE International Workshop on Networking Meets Database (NetDB), in conjunction with ICDE 2006.

[11] Vibhore Kumar, Brian F. Cooper, Greg Eisenhauer, Srihari Govindharaj, Chaitanya Karlekar, Mohamed Mansour, Karsten Schwan, Sangeetha Seshadri, Balasubramanian Seshasayee. Policy-Driven Autonomic Management in Enterprise-Scale Information Flows. 4th IEEE International Conference on Autonomic Computing ICAC-2007. (Demo)

[12] Sangeetha Seshadri, Brian F. Cooper, Ling Liu. CubeCache: Efficient and Scalable Processing of OLAP Aggregation Queries in a Peer-to-Peer Network. CERCS Technical Report. GIT-CERCS-07-12, 2005.

[13] Sangeetha Seshadri, Paul Muench, Lawrence Chiu and Karan Gupta. Cluster Models – An analysis of availability, complexity and scalability Trade-offs. Intern Report, IBM Almaden Research Center, August 2005.

UNDER SUBMISSION

[14] Sangeetha Seshadri, Brian Cooper, Vibhore Kumar, Ling Liu and Karsten Schwan. Scaling Distributed Stream Query Services with STREAMREUSE. In preparation.

[15] Gong Zhang, Ling Liu, Sangeetha Seshadri. Scaling Reliable Location-based Overlay Service. Under submission.

[16] Sangeetha Seshadri, Lawrence Chiu, Cornel Constantinescu, Subashini Balachandran, Clem Dickey, Ling Liu and Paul Muench. A Recovery-Conscious Framework for Fault Resilient Storage Systems. In preparation.

PATENTS

[17] "Improving Performance for Read Requests using Race-based Reads, Inverted Read Paths and Adaptive Read Request Routing." V. Hsu, S. Seshadri and L. Chiu. Disclosure submitted June 2008. Patent in progress.

[18] "Log(lock) Architecture for System State Restore during Thread-level Micro-Recovery." S. Seshadri and L. Chiu. Disclosure submitted August 2008. Patent in progress.

TALKS

RESEARCH PROJECTS

Architectures and Techniques to Improve Availability in Large Scale Storage Systems (Aug 2005 onwards)

Enterprise storage systems are the foundation of most data centers today, and extremely high availability is a basic requirement of these systems. With the rapid, exponential growth of digital information and the increasing popularity of multi-core architectures, the demand for large scale storage systems with extremely high availability (approaching seven nines) continues to grow. At the same time, embedded storage software systems (controllers) are becoming much more complex and difficult to test, especially given concurrent development and quality assurance processes and the fact that legacy systems are being adapted to newer hardware. With software failures and bugs now an accepted fact, focusing on recovery and reducing time to recovery has become essential in many modern storage systems. In current system architectures, even with redundant controllers, most microcode failures trigger system-wide recovery, causing the system to lose availability for at least a few seconds and then wait for higher layers to redrive the operation. This unavailability is visible to customers as a service outage and will only increase as the platform continues to grow on the legacy architecture. How can we improve the availability of a highly concurrent storage controller and scale the recovery process without re-architecting legacy code?

Optimization Techniques for Large Scale Distributed Stream Query Processing Services (Aug 2005 onwards)

This project addresses the problem of optimizing multiple distributed stream queries that execute simultaneously in distributed data stream systems. A static query optimization approach of "plan, then deployment" is inadequate for handling distributed queries involving multiple streams and the node dynamics faced in distributed data stream systems and applications. Thus, the selection of an optimal execution plan in such dynamic and networked computing systems must consider operator ordering, reuse, network placement, and search space reduction. How do we quickly choose efficient plans from the large space of possibilities while taking into consideration both network and processing costs?
Next, it is observed that stream queries are typically processed by a selection of collaborative nodes and often share similar stream filters (such as stream selection or stream projection filters). The ability to reuse existing operators during query deployment, especially for long running queries, is critical to the performance and scalability of a distributed stream query processing service. Concretely, we argue that by taking advantage of opportunities to reuse the same distributed operators for multiple and different concurrent queries and intelligently consolidate operator computation across multiple queries, we can reduce the cost of query deployment and minimize duplicated in-network processing. The technical challenges of reuse in streaming systems include dealing with large and time-varying workloads, dynamically exploiting similarities between queries and the runtime application of network knowledge. We believe that an effective reuse approach to providing high performance and high scalability for distributed stream query services should embody both network locality awareness and operator semantic awareness of stream queries in reuse decisions.

InfoBeacons: Guiding Users to Internet Information Sources (Aug 2004 - Aug 2005)

The Internet provides a wealth of useful information in a vast number of dynamic information sources, but it is difficult to determine which sources are useful for a given query. Most existing techniques either require explicit source cooperation (for example, by exporting data summaries), or build a relatively static source characterization (for example, by assigning a topic to the source). We present a system, called InfoBeacons, that takes a different approach: data and sources are left 'as is', and a peer-to-peer network of beacons uses past query results to guide queries to sources, which do the actual query processing. This approach has several advantages, including requiring minimal changes to sources, tolerance of dynamism and heterogeneity, and the ability to scale to large numbers of sources. We present the architecture of the system, and discuss the advantages of our design. We then focus on how a beacon can choose good sources for a query despite the loose coupling of beacons to sources. Beacons cache responses to previous queries and adapt the cache to changes at the source. The cache is then used to select good sources for future queries. We discuss results from a detailed experimental study using our beacon prototype, which demonstrates that our loosely coupled approach is effective: a beacon only has to contact sixty percent or less of the sources contacted by existing, tightly coupled approaches, while providing results of equivalent or better relevance to queries.


Clustering Models for High-Availability in Scale-out Storage Systems (May 2005 - Aug 2005)

Worked on the design and analysis of clustering models for high-availability in scale-out storage systems. Analyzed flat and hierarchical clustering models focusing on system availability, complexity and scalability. Clustering is an obvious solution for maximizing availability and performance and balancing load in storage systems. Most often, high availability and reliability are achieved through redundancy at the software, hardware and network levels. However, while the number of system components increases only linearly, the system's state space increases exponentially, thereby making it very difficult to test and verify the system. Potentially, these systems are more prone to failures arising from software defects that escape detection during testing and are also difficult to reproduce. In this project we investigate the trade-offs between system complexity and availability from the perspective of clustered storage systems. We evaluate designs and policies that help us maximize availability while minimizing the number of system states.


CubeCache: Semantic Caching in Peer-to-Peer OLAP Networks (Aug 2004 onwards)

Peer-to-peer systems are an area of much recent activity due to their cost-effectiveness, scalability and ability to distribute the overhead of sharing and storing data and performing computations. These characteristics make such systems ideal for OLAP query processing. In this project, we describe the framework for a system that utilizes a peer-to-peer network to efficiently process OLAP queries. Specifically, we consider the problem of searching for data in such a network and present techniques to locate data and perform query processing with load balancing.

PROFESSIONAL ACTIVITIES/AWARDS

- IBM Ph.D Scholarship 2007-2008
- Recipient of the K.C.Mahindra scholarship for doctoral studies
- Stood 3rd in the class of M.Sc Mathematics
- Awarded University Scholarship for all semesters of study
- Coordinator, Department of Mathematics, Apogee 2000
- External Reviewer (VLDB 2006; ICDE 2006; SIGMOD 2006, 2007; MobiQuitous 2007; IBM Systems Journal 2007; ACM TOIT 2008; PPNA 2008; ICDCS 2008)

REFERENCES

Available on request.