CS6220 Big Data Systems and Analytics
Lecture: 1:30pm - 2:45pm TR; Klaus 2447 (Aug. 20 - Dec. 13, Lectures ending on Dec. 4)
Office hours: TR 11am-12 noon or by appointment
Course TAs: Mehmet Emre Gursoy, Stacey Truex
Office hours: by appointment
Data has been the fastest-growing phenomenon on the Internet over the last decade. Big data demands both high performance computing and elastic, utility-driven computing. Big data analytics holds the potential to reveal deep insights, such as social influence among customers, by analyzing business transactions, user-generated feedback ratings, and social and geographical data. In the past 40 years, data was primarily used to record and report business activities and scientific events; in the next 40 years, data will also be used to derive new insights, to influence business decisions, and to accelerate scientific discovery. One of the key challenges is to provide the right platforms and tools to make reasoning about big data easy and simple. Another key challenge is to revolutionize the ways of collecting, processing, and analyzing massive data that exceeds the processing capacity of existing computing systems. Big data education should cover big data systems, big data algorithms, big data technology, big data programming, and big data applications from both research and development perspectives.
This course reviews concepts, techniques, algorithms, and systems issues in big data education and research, with strong emphasis on systems and analytics. It explores big data opportunities from a variety of science and engineering applications, and examines research problems and challenges that are critical for developing big data systems and big data applications. Main topics to be covered include, but are not limited to: fundamentals of data storage systems and optimizations, fundamentals of data mining and knowledge discovery, fundamentals of big data aware computing systems and software design, fundamentals of cluster computing and distributed file systems, and fundamentals of geographically distributed data intensive systems. We will also cover big data applications that pose new challenges to big data systems and analytics, such as healthcare, mobile commerce, social media, Internet of Things, software defined computing, cyber manufacturing, and cyber-physical systems, to name a few. This course is designed to provide fundamental training for big data scientists, ranging from high performance big data computing systems to big data applications and big data analysis and management algorithms, and to look beyond the present status of big data and conjecture what future technologies and applications may evolve. The course will include a significant project component that will typically require Java/C++/CGI/HTML5 programming.
Students are expected to have taken Operating Systems (CS 2200 or equivalent) and Introduction to Database Systems (CS 4400/6400 or equivalent). In addition, students are expected to have a solid grasp of Java/C/C++/CGI programming. Socket programming experience is desirable but not required.
A detailed description of course structure and administration can be found in Course Introduction.
Grades will be computed using the tentative weighting scheme below:
There are a total of 4 homework assignments, on average one assignment every 2 weeks. Each assignment is usually posted 2-3 weeks before its due date. For each assignment, a student chooses between two types: reading based or programming based.
The technology review serves as the take-home final exam for the course. Topics will come from weekly lectures, class discussions, guest presentations, and homework assignments. You are required to write a technology review of 10-15 pages in single column and 1.2-1.5pt spacing, including figures and references. The technology review paper is due by 11:55pm on the final exam day.
The course project is a team project with 3-4 members per project group. Individual projects are approved on a case-by-case basis. The topic of the project has to be in the Big Data systems or Big Data analytics areas. Projects should demonstrate some innovation in either project design or project implementation.
In principle, you can propose anything you wish: algorithms, implementation, benchmarking, evaluation, or interesting Big Data applications, to name a few. For students who are currently working part time in companies, it is possible to propose a work-related project. However, all course project related material must be non-proprietary, i.e., I will not sign any non-disclosure agreement just to evaluate a project. Students are encouraged to come up with their own project ideas and to discuss them with the instructor.
Important Dates: The due dates for the project proposal, project demo, project presentation, and project code and documentation deliverables can be found on the TSquare Wiki page.
Useful References and Texts
These will be posted under the course area on TSquare.gatech.edu.