CS6220 Big Data Systems and Analytics (Fall Semester Annually)
General Information
Lecture: Check on Canvas for each year
Office hours: Check on Canvas
Course TA: Check on Canvas course website
Office hours: by appointment
Course Objectives and Description
Data has been the fastest-growing phenomenon on the Internet for the last decade. Big data systems demand both high-performance computing and elastic, utility-driven computing. Big data analytics holds the potential to reveal deep insights, such as social influence among customers, by analyzing business transactions, user-generated feedback ratings, and social and geographical data. In the past 40 years, data was primarily used to record and report business activities and scientific events; in the next 40 years, data will also be used to derive new insights, to influence business decisions, and to accelerate scientific discovery. One key challenge is to provide the right platforms and tools to make reasoning about big data easy and efficient. Another key challenge is to learn and master the different platforms, models, and mechanisms for collecting, processing, and analyzing massive data. Big data education should cover big data systems, big data algorithms, big data programming, and big data optimizations from both research and development perspectives.
This course reviews concepts, techniques, algorithms, and systems issues in big data computing, with a strong emphasis on systems and analytics. It explores big data opportunities from a variety of business, science, and engineering applications, and examines various open issues and technical challenges that are critical for developing big data systems and analytic applications. Main topics to be covered include, but are not limited to: fundamentals of big data computing systems and software design; fundamentals of cluster computing; fundamentals of AI/ML models for single-task learning, multi-task learning of one data modality, and cross-modality learning; and fundamentals of distributed learning and edge analytics. We will use big data applications, such as healthcare, mobile commerce, social media, Internet of Things, cyber manufacturing, and cyber-physical systems, to name a few, to illustrate the new challenges in both ML/AI model learning and big data system development. The course will include a significant project component that will typically require Python/Java/C++/C programming.
Prerequisites:
There are no hard prerequisites. Students are expected to have taken Operating Systems (CS 2200 or equivalent) and Introduction to Database Systems (CS 4400/6400 or equivalent). In addition, students are expected to have a solid grasp of Java/C/C++/Python programming. Sockets programming is not required but is desirable.
A detailed description of the course structure and administration can be found on the Canvas course website for each year.
Grades will be computed using the tentative weighting scheme below:
Weeks 2-3
Assignment 1: posted on Week 2 Sunday, due on Week 3 (Friday midnight)
Project team formation due via Wiki signup: Week 3 (Friday midnight)
Weeks 4-5
Assignment 2: posted on Week 4 Sunday, due on Week 5 (Friday midnight)
Sign up on the Wiki for the proposal team meeting with the professor on Monday, Sept 30.
Week 6
Project Proposal due: Week 6 (Friday midnight)
Weeks 7-9
Assignment 3: posted on Week 7 Sunday, due on Week 8 (Friday midnight). If fall break falls in Week 8, the due date will be the Friday of Week 9.
Sign up on the Wiki for the workshop presentation and project demo meeting with the professor (due Friday midnight of Week 8).
Meeting with the instructor for project proposal feedback in Week 7 (signup schedule on the course Wiki on Canvas).
Weeks 10-11:
Assignment 4: posted on Week 10 Sunday, due on Week 11 (Friday midnight)
Weeks 13-16:
Course project workshop presentations
Week 16:
Demo with the professor in the CERCS small conference room near the professor's office, KACB 3340.
Final Exam:
Technology Review (take-home exam) due at 5:30pm, with a grace extension until midnight of the final exam day, which is posted on Canvas or on the Georgia Tech final exam schedule page.
There are a total of 4 homework assignments, on average one assignment every 2 weeks. Usually each assignment is posted 2-3 weeks before its due date. For each assignment, a student chooses between two types: reading-based or programming-based.
The technology review is used as the take-home final exam for the course. Topics will come from weekly lectures, class discussions, guest presentations, and homework assignments. You are required to write a technology review of 10-15 pages in single-column format with 1.2-1.5pt spacing, including figures and references. This technology review paper is due by 11:55pm on the final exam day.
In principle, you can propose anything you wish: algorithms, implementation, benchmarking, evaluation, or interesting big data applications, to name a few. Students who are currently working part time in companies may propose a work-related project. However, all course-project-related material must be non-proprietary, i.e., I will not sign any non-disclosure agreement just to evaluate a project. Students are encouraged to come up with their own project ideas and to discuss them with the instructor.
Important Dates: The due dates for the project proposal, project demo, project presentation, and project code and documentation deliverables can be found on the Canvas course Wiki page.
Useful References and Texts
To be posted under the course area on Canvas.gatech.edu.