Fang Zheng
Ph.D. candidate
College of Computing
Georgia Institute of Technology
Address: Klaus Advanced Computing Building Room 1210
Georgia Tech, 266 Ferst Dr, Atlanta, GA 30332-0765
Phone: (404)385-1406 (office)
E-Mail: fzheng at cc dot gatech dot edu


 
Biography
I have been a Ph.D. student in the College of Computing at Georgia Tech since Fall 2007. My advisor is Prof. Karsten Schwan. I am a research assistant in the Center for Experimental Research in Computer Science (CERCS). I received my B.E. and M.E. degrees in Computer Science from Xi'an Jiaotong University, China, in 2004 and 2007, respectively.
 
Research
I am interested in High Performance Computing and Distributed Systems, with a focus on scalable I/O and data management for large-scale scientific applications.

I am involved in the following projects.

PreDatA Project
PreDatA, short for Preparatory Data Analytics, is an approach for preparing and characterizing data while it is being produced by large-scale simulations running on peta-scale machines. By dedicating additional compute nodes on the peta-scale machine as staging nodes and routing the simulation's output data through these nodes, PreDatA can exploit their computational power to perform selected data manipulations with lower latency than is attainable by first moving data into file systems and storage. The PreDatA middleware supports such in-transit manipulations through RDMA-based data movement that reduces write latency, application-specific operations on streaming data that can discover latent data characteristics, and data reorganization and metadata annotation that speed up subsequent data access. As a result, PreDatA enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and inspection, and data exchange between concurrently running simulation models.

Performance evaluations with several production peta-scale applications on Oak Ridge National Lab's Leadership Computing Facility demonstrate the feasibility of the PreDatA approach, including its minimal impact on the total execution time of large-scale simulations. They also demonstrate advantages derived from using PreDatA, such as improved simulation time, timely insight into output data and improved read performance of output files.
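
The staging idea described above can be illustrated with a toy sketch. It is only a single-process analogy and not the PreDatA middleware: a Python thread stands in for a staging node, an in-memory queue stands in for the RDMA transfer, and a dictionary stands in for the file system. All names (staging_worker, run_simulation, the chunk layout) are hypothetical.

```python
import queue
import threading


def staging_worker(inbox, store, index):
    """Hypothetical staging node: process each chunk while it is in transit."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: the simulation has finished
            break
        step, values = item
        ordered = sorted(values)  # reorganize data to help later reads
        store[step] = ordered     # stand-in for the write to storage
        # Metadata annotation: record the value range of this chunk.
        index[step] = {
            "min": ordered[0],
            "max": ordered[-1],
            "count": len(ordered),
        }


def run_simulation(num_steps):
    """Toy simulation loop that hands output to the staging thread."""
    inbox = queue.Queue()  # stand-in for the data transfer (assumption)
    store, index = {}, {}
    worker = threading.Thread(target=staging_worker, args=(inbox, store, index))
    worker.start()
    for step in range(num_steps):
        # Made-up output data for this time step.
        values = [(step * 7 + i * 5) % 11 for i in range(6)]
        inbox.put((step, values))  # hand off, then keep computing
    inbox.put(None)
    worker.join()
    return store, index
```

In this sketch the simulation loop never waits for sorting or indexing; it only enqueues a chunk and moves on, which is the property the staging nodes are meant to provide. A later reader could consult index to decide which steps to load without scanning the stored data.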

 
ADIOS (Adaptable I/O System) Project
The Adaptable IO System provides a simple, flexible way for scientists to describe the data in their code that may need to be written, read, or processed outside of the running simulation. Given an XML file, external to the code, that describes the various data elements, their types, and how they should be processed in a given run, the routines in the host code (either Fortran or C) can transparently change how they process the data. ADIOS uses a log-structured file format termed BP for data storage. This design, together with other techniques such as buffering and scheduling, helps ADIOS achieve portable, high-performance I/O on state-of-the-art High End Computing platforms.
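
The role of the external XML file can be sketched as follows. This is a minimal illustration in Python rather than Fortran or C, and it does not use the ADIOS API or its actual XML schema: the element names (io-config, group, var, method), the method kinds, and the helper functions are all hypothetical. The point is only that the host code's write call stays the same while the XML decides what happens to the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration, kept outside the simulation code.
CONFIG = """
<io-config>
  <group name="restart">
    <var name="step" type="integer"/>
    <var name="temperature" type="double"/>
  </group>
  <method group="restart" kind="names_only"/>
</io-config>
"""

# Illustrative processing methods; real I/O methods would write or send data.
METHODS = {
    "full": lambda selected: dict(selected),
    "names_only": lambda selected: list(selected),
}


def load_config(xml_text):
    """Map each group name to (declared variable names, method kind)."""
    root = ET.fromstring(xml_text)
    variables = {
        g.get("name"): [v.get("name") for v in g.findall("var")]
        for g in root.findall("group")
    }
    methods = {m.get("group"): m.get("kind") for m in root.findall("method")}
    return {name: (variables[name], methods[name]) for name in variables}


def write_group(config, group, values):
    """Host-code call site: unchanged no matter which method is configured."""
    names, kind = config[group]
    selected = {name: values[name] for name in names}  # declared vars only
    return METHODS[kind](selected)
```

Changing kind="names_only" to kind="full" in the XML changes the result of write_group without touching the calling code, which mirrors, in a much simplified form, how an external description lets a run switch its I/O behavior.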

ADIOS has been adopted by several production parallel scientific codes. Performance evaluation on Oak Ridge National Lab's Leadership Computing Facility shows substantial write and read performance advantages of ADIOS. ADIOS is implemented as a high-level I/O library and is distributed as open-source software.

ADIOS has been featured in HPCWire.

 
Publications (Google Scholar Profile)
  • Fang Zheng, Hongfeng Yu, Can Hantas, Matthew Wolf, Greg Eisenhauer, Karsten Schwan, Hasan Abbasi, Scott Klasky. "GoldRush: Resource Efficient In Situ Scientific Data Analytics Using Fine-Grained Interference Aware Execution". To appear in ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC 13). Denver, CO. November 2013.
  • Hongbo Zou, Magda Slawinska, Karsten Schwan, Matthew Wolf, Greg Eisenhauer, Fang Zheng, Jai Dayal, Jeremy Logan, Scott Klasky, Tanja Bode, Matthew Kinsey, Michael Clark. "FlexQuery: An Online In-situ Query System for Interactive Remote Visual Data Exploration at Large Scale". To appear in 2013 IEEE International Conference on Cluster Computing (Cluster 2013). Indianapolis, IN. September 2013.
  • Fang Zheng, Chitra Venkatramani, Karsten Schwan, Rohit Wagle. "Cache Topology Aware Mapping of Stream Processing Applications onto CMPs". In Proc. of 33rd International Conference on Distributed Computing Systems (ICDCS 2013). Philadelphia, PA. July 2013.
  • Fang Zheng, Hongbo Zou, Greg Eisenhauer, Karsten Schwan, Matthew Wolf, Jai Dayal, Tuan-Anh Nguyen, Jianting Cao, Hasan Abbasi, Scott Klasky, Norbert Podhorszki, Hongfeng Yu. "FlexIO: I/O Middleware for Location-Flexible Scientific Data Analytics". In Proc. of 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2013). Boston, MA. May 2013.
  • Jai Dayal, Jianting Cao, Greg Eisenhauer, Karsten Schwan, Matthew Wolf, Fang Zheng, Hasan Abbasi, Scott Klasky, Norbert Podhorszki, Jay Lofstead. "I/O Containers: Managing the Data Analytics and Visualization Pipelines of High End Codes". In Proc. of International Workshop on High Performance Data Intensive Computing (HPDIC 2013), held in conjunction with IPDPS 2013. Boston, MA. May 2013. (Best Paper Award).
  • Fang Zheng, Hongbo Zou, Greg Eisenhauer, Karsten Schwan, Matthew Wolf, Jai Dayal, Tuan-Anh Nguyen, Jianting Cao, Hasan Abbasi, Scott Klasky, Norbert Podhorszki, Hongfeng Yu. "FlexIO: Location-flexible Execution of In Situ Data Analytics for Large Scale Scientific Applications". Poster in 21st ACM International Symposium on High Performance Distributed Computing (HPDC 2012). Delft, The Netherlands. June 2012.
  • Hongbo Zou, Fang Zheng, Matthew Wolf, Greg Eisenhauer, Karsten Schwan, Hasan Abbasi, Qing Liu, Norbert Podhorszki, Scott Klasky. "Quality-Aware Data Management for Large Scale Scientific Applications". In Proc. of 3rd International Workshop on Petascale Data Analytics: Challenges and Opportunities (PDAC-12), held in conjunction with ACM/IEEE SC '12. Salt Lake City, UT. November 2012.
  • Fang Zheng, Hasan Abbasi, Jianting Cao, Jai Dayal, Karsten Schwan, Matthew Wolf, Scott Klasky, Norbert Podhorszki. "In-Situ I/O Processing: A Case for Location Flexibility". In Proc. of 6th Parallel Data Storage Workshop (PDSW-11), held in conjunction with ACM/IEEE SC '11. Seattle, WA. November 2011.
  • Fang Zheng, Jianting Cao, Jai Dayal, Greg Eisenhauer, Karsten Schwan, Matthew Wolf, Hasan Abbasi, Scott Klasky, Norbert Podhorszki. "High End Scientific Codes with Computational I/O Pipelines: Improving their End-to-End Performance". In Proc. of 2nd International Workshop on Petascale Data Analytics: Challenges and Opportunities (PDAC-11), held in conjunction with ACM/IEEE SC '11. Seattle, WA. November 2011.
  • Jay Lofstead, Fang Zheng, Qing Liu, Scott Klasky, Ron Oldfield, Todd Kordenbrock, Karsten Schwan, Matthew Wolf. "Managing Variability in the IO Performance of Petascale Storage Systems". In Proc. of ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC 10). New Orleans, LA. November 2010.
  • Philippe Selo, Yoonho Park, Sujay Parekh, Chitra Venkatramani, Hari Pyla, Fang Zheng. "Adding Stream Processing System Flexibility to Exploit Low-overhead Communication Systems". In Proc. of 3rd Workshop on High Performance Computational Finance (WHPCF 2010), held in conjunction with SC '10. New Orleans, LA. November 2010.
  • Hasan Abbasi, Matthew Wolf, Greg Eisenhauer, Scott Klasky, Karsten Schwan, Fang Zheng. "DataStager: Scalable Data Staging Services for Petascale Applications", Cluster Computing Journal, Volume 13 Issue 3, September 2010. (extended journal version of HPDC 2009 paper).
  • Fang Zheng, Hasan Abbasi, Ciprian Docan, Jay Lofstead, Qing Liu, Scott Klasky, Manish Parashar, Norbert Podhorszki, Karsten Schwan, and Matthew Wolf. "PreDatA - Preparatory Data Analytics on Peta-Scale Machines". In Proc. of 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010). Atlanta, GA. April 2010.
  • Hasan Abbasi, Jay Lofstead, Fang Zheng, Scott Klasky, Karsten Schwan, Matthew Wolf. "Extending I/O through High Performance Data Services". In Proc. of 2009 IEEE International Conference on Cluster Computing (Cluster 2009). New Orleans, LA. September 2009.
  • Jay Lofstead, Fang Zheng, Scott Klasky, Karsten Schwan. "Adaptable, Metadata Rich I/O Methods for Portable High Performance I/O". In Proc. of 23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2009). Rome, Italy. May 2009.
  • Hasan Abbasi, Matthew Wolf, Greg Eisenhauer, Scott Klasky, Karsten Schwan, Fang Zheng. "DataStager: Scalable Data Staging Services for Petascale Applications". In Proc. of 18th ACM International Symposium on High Performance Distributed Computing (HPDC 2009). Munich, Germany. June 2009.
  • Norbert Podhorszki, Scott Klasky, Qing Liu, Ciprian Docan, Manish Parashar, Hasan Abbasi, Jay Lofstead, Karsten Schwan, Matthew Wolf, Fang Zheng, Julian Cummings. "Plasma Fusion Code Coupling Using Scalable I/O Services and Scientific Workflows". In Proc. of 4th Workshop on Workflows in Support of Large-Scale Science (SC-WORKS 2009), held in conjunction with Supercomputing '09. Portland, OR. November 2009.
  • Jay Lofstead, Scott Klasky, Michael Booth, Hasan Abbasi, Fang Zheng, Matthew Wolf, Karsten Schwan. "Petascale IO Using The Adaptable IO System". 2009 Cray User's Group Meeting (CUG 2009). Atlanta, GA. May 2009.
  • Jay Lofstead, Fang Zheng, Scott Klasky, Karsten Schwan. "Input/Output APIs and Data Organization for High Performance Scientific Computing". In Proc. of 3rd Petascale Data Storage Workshop (PDSW 2008), held in conjunction with Supercomputing '08. Austin, TX. November 2008.
 
Courses
CS6210 Advanced Operating Systems
CS7210 Distributed Computing
CS6400 Database System Concepts & Design
CS6290 High Performance Computer Architecture
ISyE6500 Probabilistic Models and Their Applications
 
Links
ACM Digital Library
IEEE Xplore
HPCwire
 

Last Updated: Aug. 14th 2013