CS6220 Big Data Systems and Analytics

(Fall 2020)

Welcome to CS6220. There is no attendance requirement for this course in the fall 2020 semester because of COVID-19, which makes the campus environment unsafe and puts the health and lives of our students and their families at risk. As the instructor, I will make my best effort to provide an experience as close as possible to face-to-face learning. I am open to suggestions and discussions from students on Canvas.





General Information

Instructor: Professor Ling Liu
Office: KACB 3340, Phone: 5-1139, Email: lingliu AT cc DOT gatech DOT edu
Lecture: 2pm - 3:15pm TR; Klaus 2443 (Aug. 17 - Dec. 10, lectures end on Nov. 24)
Office hours: TR 3:15pm - 4:15pm or by appointment

Course TAs: Stacey Truex, Wenqi Wei
Office hours: by appointment

Course Objectives and Description

Data has been the fastest-growing phenomenon on the Internet over the last decade. Big data demands both high-performance computing and elastic, utility-driven computing. Big data analytics has the potential to reveal deep insights, such as social influence among customers, by analyzing business transactions, user-generated feedback ratings, and social and geographical data. In the past 40 years, data was primarily used to record and report business activities and scientific events; in the next 40 years, data will also be used to derive new insights, to influence business decisions, and to accelerate scientific discovery. One key challenge is to provide the right platforms and tools to make reasoning about big data easy and simple. Another is to revolutionize the ways of collecting, processing, and analyzing massive data that exceeds the processing capacity of existing computing systems. Big data education should cover big data systems, algorithms, technology, programming, and applications from both research and development perspectives.

This course reviews concepts, techniques, algorithms, and systems issues in big data education and research, with a strong emphasis on systems and analytics. It explores big data opportunities in a variety of science and engineering applications, and examines research problems and challenges that are critical for developing big data systems and applications. Main topics include, but are not limited to: fundamentals of data storage systems and optimizations, fundamentals of data mining and knowledge discovery, fundamentals of big data aware computing systems and software design, fundamentals of cluster computing and distributed file systems, and fundamentals of geographically distributed data-intensive systems. We will also cover big data applications that pose new challenges to big data systems and analytics, such as healthcare, mobile commerce, social media, Internet of Things, software-defined computing, cyber manufacturing, and cyber-physical systems, to name a few. The course is designed to provide fundamental training for big data scientists, ranging from high-performance big data computing systems to big data applications and big data analysis and management algorithms, and to look beyond the present state of big data and conjecture what future technologies and applications may evolve. The course includes a significant project component that will typically require Java/C++/CGI/HTML5 programming.

Prerequisites:
Students are expected to have taken Operating Systems (CS 2200 or equivalent) and Introduction to Database Systems (CS 4400/6400 or equivalent). In addition, students are expected to have a solid grasp of Java/C/C++/CGI programming. Socket programming experience is desirable but not required.

A detailed description of course structure and administration can be found in Course Introduction.


Grading

Grades will be computed using the tentative weighting scheme below:

The grading policy can be found in the Course Introduction and FAQ (Important! Read Me).


Important Dates


All deadlines are Friday at midnight, with a penalty-free grace period until 9am Saturday.

Weeks 2-3
Assignment 1: posted on Week 2 Sunday, due on Week 3 (Friday midnight)
Project team formation due via Wiki signup: Week 3 Friday midnight

Weeks 4-5
Assignment 2: posted on Week 4 Sunday, due on Week 5 (Friday midnight)
Sign up on the Wiki for a proposal team meeting with the professor on Monday, Sept 30.

Week 6
Project Proposal due: Week 6 (Friday midnight)

Weeks 7-8
Assignment 3: posted on Week 7 Sunday, due on Week 8 (Friday midnight)
Sign up on the Wiki for the workshop presentation and the project demo meeting with the professor (due Friday midnight of Week 8).
Fall Recess week (Oct 14-15): no homework in Week 9

Weeks 10-11:
Assignment 4: posted on Week 10 Sunday, due on Week 11 (Friday midnight)

Weeks 13-15:
Course project workshop presentations

Week 15:
Demo with the professor in the CERCS small conference room near the professor's office (KACB 3340).
Due to COVID-19, the demo will be done remotely; more instructions will be given in the first week of the fall 2020 semester.

Final Exam:
Technology Review (take-home exam) due on Canvas at 5:30pm on the final exam day, with a grace period until midnight.


Homework Assignments

There are 4 homework assignments in total, on average one every 2 weeks. Each assignment is usually posted 2-3 weeks before its due date. For each assignment, a student chooses one of two types: reading-based or programming-based.

Technology Review

The technology review serves as the take-home final exam for the course. Topics will come from weekly lectures, class discussions, guest presentations, and homework assignments. You are required to write a technology review of 10-15 pages in single column and 1.2-1.5pt spacing, including figures and references. The technology review paper is due by 11:55pm on the final exam day.


Project

The course project is a team project with 3-4 members per project group. Individual projects are approved on a case-by-case basis. The topic of the project has to be in the Big Data systems or Big Data analytics areas. Projects should demonstrate some innovation in either project design or project implementation.

In principle, you can propose anything you wish: algorithms, implementation, benchmarking, evaluation, or interesting Big Data applications, to name a few. Students who are currently working part time in companies may propose a work-related project. However, all course project material must be non-proprietary, i.e., I will not sign any non-disclosure agreement just to evaluate a project. You are encouraged to come up with your own project ideas and to discuss them with the instructor.

Important Dates: The due dates for the project proposal, project demo, project presentation, and project code and documentation deliverables can be found on the Canvas Course Wiki page.

Useful References and Texts

To be posted under the course area on Canvas.gatech.edu.

 


Last updated on Aug. 8, 2018 by Ling Liu (lingliu AT cc DOT gatech DOT edu)