New Machine Learning Algorithms Reduce Bias in Identifying Groups
A new approach is the first of its kind to address fairness in a popular graph analysis technique used by banks, social networks, and others to identify groups, according to Georgia Tech researchers.
The approach couples new machine learning (ML) algorithms with spectral clustering analysis – a common method for identifying groups by revealing often hidden connections between people in massive, multi-dimensional graphs. The researchers found that groups identified using the new method were as much as 34 percent more diverse than those identified using spectral clustering alone.
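For readers unfamiliar with spectral clustering, here is a minimal sketch of the standard technique on a toy graph. It shows only the plain, unconstrained method, not the researchers' fairness-aware algorithms. The function name `spectral_bipartition` and the six-node example graph are illustrative assumptions, and the sketch relies on NumPy.

```python
import numpy as np

def spectral_bipartition(adjacency):
    """Split a connected graph into two groups (hypothetical helper).

    Uses the sign of the eigenvector belonging to the second-smallest
    eigenvalue of the unnormalized graph Laplacian (the Fiedler vector).
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    laplacian = np.diag(degrees) - A
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]
    return (fiedler > 0).astype(int)

# Toy graph (an assumption for illustration): two triangles, {0, 1, 2}
# and {3, 4, 5}, joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

labels = spectral_bipartition(A)
print(labels)
```

Each triangle ends up in its own group. Which triangle is labeled 0 and which is labeled 1 depends on the sign of the eigenvector that the solver happens to return.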
“Obviously, you want to figure out who the communities are, but you also want them to be diverse,” said Samira Samadi, a School of Computer Science (SCS) Ph.D. student and member of the research team.
As a starting point, the team defined diversity for the project as proportional representation: each demographic should make up the same share of an identified group, also known as a cluster, as it does of the original dataset. The researchers tested their algorithms on a natural variant of the stochastic block model, a well-known random graph model used to study the performance of clustering algorithms. On this model, they proved that the new algorithms return diverse clusters.
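The definition of diversity above can be illustrated with a small sketch that compares each demographic's share inside every cluster with its share in the whole dataset. The function names `group_shares` and `representation_gap`, the gap measure itself, and the toy labels are assumptions made for illustration. They are not taken from the paper, which may quantify diversity differently.

```python
from collections import Counter

def group_shares(groups):
    """Return each demographic group's share of the given members."""
    counts = Counter(groups)
    total = len(groups)
    return {group: count / total for group, count in counts.items()}

def representation_gap(cluster_labels, group_labels):
    """Hypothetical measure: the largest absolute difference between a
    group's share in any cluster and its share in the full dataset.
    A value of 0.0 means every cluster is perfectly proportional.
    """
    overall = group_shares(group_labels)
    clusters = {}
    for cluster, group in zip(cluster_labels, group_labels):
        clusters.setdefault(cluster, []).append(group)
    worst = 0.0
    for members in clusters.values():
        shares = group_shares(members)
        for group, target in overall.items():
            worst = max(worst, abs(shares.get(group, 0.0) - target))
    return worst

# Toy data (assumed): eight people, half in group "a", half in group "b".
group_labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
balanced = [0, 0, 1, 1, 0, 0, 1, 1]  # each cluster mirrors the 50/50 split
skewed = [0, 0, 0, 0, 1, 1, 1, 1]    # each cluster holds a single group

print(representation_gap(balanced, group_labels))  # 0.0
print(representation_gap(skewed, group_labels))    # 0.5
```

In this toy example, the balanced clustering has a gap of 0.0 because both clusters reproduce the dataset's 50/50 split, while the skewed clustering has a gap of 0.5 because each cluster contains only one demographic.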
Beyond the theoretical analysis, the researchers also tested the algorithms on empirical datasets. These tests confirmed the theoretical findings: the new approach can deliver more proportionally balanced clusters with minimal impact on the interconnectivity or quality of the clusters.
“Designing fair clustering algorithms helps ML to draw a more diverse image of communities in a network,” Samadi said. “This not only leads to less representational bias toward specific demographics, but could also help marketers to maximize their full potential customer base.”
The researchers presented their work in the paper “Guarantees for Spectral Clustering with Fairness Constraints” at the International Conference on Machine Learning (ICML), held in Long Beach, California, from June 9 to 15. Samadi co-wrote the paper with SCS Assistant Professor Jamie Morgenstern, Rutgers postdoctoral researcher Matthäus Kleindessner, and Rutgers Assistant Professor Pranjal Awasthi.