Data science is the future, and it must be used responsibly. This was the message of Professor Jeannette Wing’s lecture, Data for Good: Data Science at Columbia. The Columbia University computer science professor delivered the Mary Jean Harrold Memorial Distinguished Lecture on Dec. 7.
Named after former School of Computer Science (SCS) Professor Mary Jean Harrold, the lecture honors women in computing who are changing the field.
“We started this as a memorial to Mary Jean to find a lecturer who would really capture her excellence for service and spirit,” SCS Professor Dana Randall said in her introduction. “Jeannette has had an incredibly distinguished career.”
Wing is the Avanessians Director of Columbia’s Data Science Institute (DSI). She has also served as corporate vice president of Microsoft Research, assistant director of the Computer and Information Science and Engineering Directorate at the National Science Foundation, and chair of the Carnegie Mellon University School of Computer Science. Her research spans a wide breadth of computer science—including programming languages, software engineering, security, and distributed systems—all areas that are vital to data science.
DSI comprises 300 affiliated faculty across 12 schools and touches degrees in cybersecurity, smart cities, media, and more. Its mission is to advance research, bring data science to all fields, and ensure data is used ethically.
“We have a chance with data science to get it right, and I think it’s important to start teaching data ethics from day one,” Wing said.
Wing’s institute is part of an international focus on data science, and Georgia Tech is also contributing to the conversation. Randall co-leads the Institute for Data Engineering and Science (IDEaS) to connect industry, academia, and government to our data science expertise and work on big data problems together.
SCS Professor Ellen Zegura makes sure students incorporate responsible data science into their work. She has created the Computing for Good program to bring students into the communities they’re studying, works with Serve-Learn-Sustain to provide tools for using data in the classroom, co-leads a data summer program for college students from across the country, and is studying how to bring care to data science research.
Wing discussed how ethics must be applied to the entire data science process, as follows:
- Generating data. Everything from sensors to cameras generates data. It’s vital for researchers to know that companies and other third parties can find value in this data and use it to their advantage.
- Processing data. In this stage, data is wrangled, cleaned, compressed, encrypted, and managed to make it easier to use. Throughout this process, data must be kept secure.
- Interpreting data. Data is only useful when analyzed. Data visualization gives a narrative to what the data means, and the story researchers tell can alter how that data is used.
Wing believes it’s fundamental that academics embed ethics in all parts of the process.
“It is academia’s responsibility to understand the fundamentals of why the world works or why something we engineer works,” Wing said. “Industry doesn’t have time to step back and figure it out. If we don’t have the academics thinking beyond deep learning, then it’s not good for the entire community.”