Welcome to Semih Sahin's Homepage
@ Georgia Institute of Technology
Semih Sahin, Ph.D. Student
Graduate Research Assistant
Computer Science
College of Computing
Georgia Institute of Technology,
Atlanta, Georgia, USA

Office:   KACB-3319
Phone:  +1 470 257 0517
Email:   ssahin7 at gatech dot edu

About me:
I was born in 1990 in Eskişehir, Turkey, and spent my childhood there until I was admitted to Istanbul Fatih Science High School with a scholarship in 2004. In high school, I competed in the National Olympiad in Informatics and won a bronze medal in 2006. In 2008, I was admitted to the Department of Computer Engineering at Bilkent University with a full scholarship. In 2013, I received my B.S. degree with "The Best Theorist" award among senior students.

I completed my M.S. degree in the same department in 2015, working with Assoc. Prof. Buğra Gedik on stream processing and distributed systems. My thesis project was "C-Stream: A co-routine based auto-parallel stream processing engine".

Currently, I am a third-year Ph.D. student working in the systems area with Prof. Ling Liu. My current research focuses on memory optimizations for memory-intensive big data frameworks. My broader research interests include Stream Processing, Data-Intensive Distributed Systems, Large-Scale Machine Learning, Graph Processing, Cloud Computing, and Performance Evaluation.

In the summers of 2016 and 2017, I was a Research Intern at the IBM T.J. Watson Research Center, working with Dr. Carlos Costa, Dr. Bruce D'Amora, and Dr. Yoonho Park.
CV/Bio:    You can access my full CV here.
Education:
  • Georgia Institute of Technology, Atlanta, USA
    Ph.D., Computer Science, 2015-2020 (Expected)
    Advisor: Prof. Ling Liu
  • Bilkent University, Ankara, Turkey
    M.S., Computer Engineering, 2013-2015
    GPA: 3.73
    Advisor: Assoc. Prof. Bugra Gedik
  • Bilkent University, Ankara, Turkey
    B.S., Computer Engineering, 2008-2013
    GPA: 3.41
    Honor Degree
Test Scores:
  • TOEFL iBT: 103 (Reading: 28, Listening: 26, Speaking: 22, Writing: 27)
  • GRE: Verbal: 144, Quantitative: 170, Writing: 3.5
Honors & Awards:
  • IEEE 5th International Congress on Big Data Best Student Paper Award - 2016
  • RCN BD Fellowship
  • European Union Master’s Degree Fellowship
  • Bilkent University M.S. Degree Full Scholarship - 2013
  • Bilkent University Best Theorist Award - 2013
  • Bilkent University B.S. Degree Full Scholarship - 2008
  • National Olympiad in Informatics, Bronze Medal - 2006

Research Interests: Distributed Systems, Stream Processing, Parallel Computing, Social Network Analysis, Big Data Analysis, Large Scale Machine Learning, Operating Systems and Cloud Computing

Selected Projects:

Leveraging Idle Memory and Fast RDMA Transfer for Spark Shuffle
The shuffle phase in Spark consists of two parts: shuffle write and shuffle read. In this project, we improve the shuffle write performance of Spark applications by leveraging idle memory in the executors. To improve shuffle read, we distribute data among executors over RDMA.
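
The sketch below is ordinary PySpark, not the modified engine, and the file paths are hypothetical placeholders; it only marks where the two stock shuffle phases happen that this project targets.

    # Standard PySpark job; comments mark the stock shuffle write/read phases
    # that this project replaces with idle-memory buffering and RDMA transfer.
    from pyspark import SparkContext

    sc = SparkContext(appName="shuffle-phases-sketch")
    words = sc.textFile("input.txt").flatMap(lambda line: line.split())  # hypothetical input path

    # Map stage: each task partitions its (word, 1) records by reducer and performs
    # the "shuffle write", normally spilling the partitioned blocks to local disk.
    counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

    # Reduce stage: tasks perform the "shuffle read", fetching their blocks from
    # every map task's output over the network (TCP in stock Spark).
    counts.saveAsTextFile("word-counts")  # hypothetical output path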

Stage-Level Shuffle Aggregation in Spark
Every Spark job performs its computation on a distinct set of RDD partitions. The shuffle data generated by map tasks is aggregated on the resulting RDD. In this project, we enable Spark executors to perform stage-level aggregation on the RDD partitions used by the same map operation. This design improves application performance by reducing the amount of shuffle data distributed over the network among executors.
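
As a rough analogy in the stock RDD API (not the stage-level mechanism itself), map-side combining already illustrates how aggregating records before the shuffle shrinks what crosses the network:

    # reduceByKey combines values per key inside each map task before the shuffle,
    # so only partial aggregates cross the network; groupByKey ships every record.
    from pyspark import SparkContext

    sc = SparkContext(appName="shuffle-aggregation-sketch")
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3), ("b", 4)], 4)

    all_records  = pairs.groupByKey().mapValues(sum)       # every record is shuffled
    partial_sums = pairs.reduceByKey(lambda x, y: x + y)   # only partial sums are shuffled

    print(sorted(partial_sums.collect()))                  # [('a', 4), ('b', 6)]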

RDD Memory Management and Optimization
In this project, we improve the memory utilization of Spark's JVM executors by explicitly managing intermediate data (RDDs). First, we address memory balancing problems in executors by enabling them to make use of shared off-heap memory on demand. Second, we allow Spark executors to use remote memory, over both RDMA and TCP/IP, in environments where cluster nodes are not homogeneous and memory disaggregation is available.
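
For context, stock Spark already exposes a basic per-executor off-heap pool through the settings below; the project's shared off-heap and remote-memory mechanisms go beyond these knobs, so the snippet is only illustrative.

    # Stock Spark off-heap settings (illustrative only; the project's shared
    # off-heap and remote-memory mechanisms are not configured this way).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("offheap-config-sketch")
             .config("spark.executor.memory", "4g")            # on-heap JVM memory per executor
             .config("spark.memory.offHeap.enabled", "true")   # enable the off-heap pool
             .config("spark.memory.offHeap.size", "2g")        # off-heap pool size per executor
             .getOrCreate())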

A co-routine based auto-parallel stream processing engine
In this research project, we developed a co-routine-based stream processing engine in C++, in which user-defined operators can be implemented alongside built-in operators. The stream programming model naturally exposes data and pipeline parallelism. We propose several scheduling algorithms to exploit data and pipeline parallelism more effectively. Furthermore, we profile operator behavior at runtime and propose an auto-parallelization method to exploit data parallelism.
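
A toy Python sketch of the programming model (C-Stream itself is written in C++; the operator names here are made up) shows how operators compose as co-routines, which is what lets a scheduler overlap stages and replicate operators:

    # Toy co-routine-style pipeline: each operator pulls from its upstream and
    # yields downstream, so a scheduler could overlap stages (pipeline parallelism)
    # or run several replicas of an operator on split input (data parallelism).
    def source(n):
        for i in range(n):
            yield i                          # produce a stream of tuples

    def map_op(stream, fn):
        for item in stream:
            yield fn(item)                   # user-defined operator logic

    def sink(stream):
        for item in stream:
            print(item)

    sink(map_op(source(5), lambda x: x * x)) # prints 0 1 4 9 16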

Incremental k-truss decomposition
A k-truss is a maximal connected subgraph in which each edge is supported by at least k-2 vertices; a vertex v supports an edge e if v forms a triangle with e. The research has two main parts: truss decomposition and incremental k-truss decomposition. In the first part, we find, for each edge, the value K of the maximal K-truss containing that edge. In the second part, we propose a method to update these values efficiently when an edge is added or removed.
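
A minimal sketch of the support computation that truss decomposition starts from (illustrative only; it is not the incremental algorithm):

    # Support of edge (u, v) = number of triangles containing it = |N(u) ∩ N(v)|.
    # An edge can belong to a k-truss only if its support is at least k - 2;
    # decomposition repeatedly peels edges whose support falls below that bound.
    from collections import defaultdict

    def edge_support(edges):
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        return {(u, v): len(adj[u] & adj[v]) for u, v in edges}

    print(edge_support([(1, 2), (2, 3), (1, 3), (3, 4)]))
    # {(1, 2): 1, (2, 3): 1, (1, 3): 1, (3, 4): 0}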

Publications:
  • Semih Sahin and Ling Liu. "Supporting Disaggregated RDD Memory with DORMY", Under preparation.

  • Semih Sahin, Wenqi Cao and Ling Liu. "Performance Impact of SPARK RDD on Big Data Analytics", Under submission to USENIX ATC 2018.

  • Semih Sahin and Bugra Gedik. "C-Stream: A co-routine based auto-parallel stream processing engine." To appear in ACM TOPC 2018.

  • Semih Sahin and Ling Liu. "Improving Spark Memory Management with FastRDD", Accepted in Research Track of 2018 Southern Data Science Conference

  • Wenqi Cao, Ling Liu, Calton Pu, Semih Sahin and Qi Zhang. "Efficient Host and Remote Memory Sharing with FastSwap", Under submission to USENIX ATC 2018.

  • Qi Zhang, Ling Liu, Calton Pu, Wenqi Cao, Semih Sahin. "Towards Demand Driven Memory Slicing Through Shared Memory Management", Under submission to IEEE ICDCS 2018.

  • Ling Liu, Yanzhao Wu, Wenqi Wei, Wenqi Cao, Semih Sahin, Qi Zhang. "Benchmarking Deep Learning Frameworks: Design Considerations, Metrics and Beyond", Under submission to IEEE ICDCS 2018.

  • Semih Sahin, Wenqi Cao, Qi Zhang and Ling Liu. "JVM Configuration Management and Its Performance Impact for Big Data Applications", Proceedings of the 2016 IEEE Big Data Congress (BigData 2016).

  • Wenqi Cao, Semih Sahin, Ling Liu, Xianqiang Bao. "Evaluation and Analysis of In-Memory Key-Value Systems", Proceedings of the 2016 IEEE Big Data Congress (BigData 2016).
Work Experience:
  • Research Intern, IBM T.J. Watson Research Center, Summer 2017
  • Teaching Assistant, Georgia Tech CS6675 - Advanced Internet Application Development, Spring 2016
  • Research Intern, IBM T.J. Watson Research Center, Summer 2016
  • Teaching Assistant, Bilkent University CS 315 - Programming Languages, Fall 2014
  • Teaching Assistant, Bilkent University CS 315 - Programming Languages, Fall 2013
  • Teaching Assistant, Bilkent University CS 101 - Algorithms and Programming I, Fall 2013
Contact:

Address:   Office 3319, Klaus Advanced Computing Building (KACB)
           Georgia Institute of Technology
           Atlanta, Georgia, USA
Phone:     +1 470 257 0517
E-mail:    ssahin7 at gatech dot edu