As machine-learning algorithms become more prominent in everyday life – from helping judges with courtroom decisions to determining who will get a bank loan – it is critical that they be as transparent and as fair as possible.
To make transparency and fairness easier for developers to achieve, two Georgia Tech Ph.D. students in the School of Computer Science are pushing forward on a tool that reduces the bias that can be introduced in low-dimensional representations of large data sets. Uthaipon (Tao) Tantipongpipat and Samira Samadi recently published a new paper that details how large data sets can be compressed, which saves time and computing resources, while preserving essential traits for all groups identified through the analysis.
Computer Science Ph.D. student Tao Tantipongpipat.
Samadi and Tantipongpipat’s previous work uses principal component analysis (PCA), a dimension reduction technique that has been the gold standard for analyzing large data sets more efficiently. Their own version, Fair-PCA, uses the strength of PCA and retains more information so that algorithms can, in theory, have better data for decision-making.
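To make the underlying concern concrete, the following is a minimal sketch of standard PCA, not the researchers' Fair-PCA algorithm. It assumes NumPy and uses synthetic data. The helper names `pca_reconstruct` and `group_errors` and the two groups "A" and "B" are hypothetical, chosen only for illustration. It compresses a data set to a single dimension and then measures how much information each group loses on average.

```python
import numpy as np


def pca_reconstruct(X, k):
    """Compress X to its top-k principal components, then map it back."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T  # shape (n_features, k)
    return Xc @ V_k @ V_k.T + mean


def group_errors(X, groups, k):
    """Average squared reconstruction error per group (hypothetical helper)."""
    X_hat = pca_reconstruct(X, k)
    per_row = np.sum((X - X_hat) ** 2, axis=1)
    return {str(g): float(per_row[groups == g].mean()) for g in np.unique(groups)}


# Synthetic data (an assumption for illustration): a large group A that varies
# mostly along the first feature and a small group B that varies mostly along
# the second feature.
rng = np.random.default_rng(0)
A = rng.normal(size=(900, 5)) * np.array([5.0, 0.1, 0.1, 0.1, 0.1])
B = rng.normal(size=(100, 5)) * np.array([0.1, 3.0, 0.1, 0.1, 0.1])
X = np.vstack([A, B])
groups = np.array(["A"] * 900 + ["B"] * 100)

errors = group_errors(X, groups, k=1)
print(errors)
```

In this synthetic setup, the single retained direction is dominated by the larger group, so the smaller group's data is reconstructed far less faithfully. That kind of gap between groups is what a fairness-aware variant of dimension reduction would aim to narrow.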
In their latest work, the duo is optimizing fair dimensionality reduction, allowing populations to be represented more accurately not only with PCA but also with a wider class of dimension reduction techniques.
The updated algorithm incorporates multiple equity measurements for populations – i.e., with respect to social and economic welfare – and takes into account multiple demographic attributes. For example, gender is usually analyzed as male and female, but this leaves transgender people and other non-binary people out of an algorithm’s calculations, leading to unfair or biased assessments.
This new work is designed to allow machine learning researchers to analyze complex data sets more accurately, potentially leading to less bias.
"I feel like if fairness and bias are not being taken seriously into account at this point, then our problems are only going to compound. Machine learning algorithms are dominating our lives every day and they learn to behave based on previous outcomes. If we just let this build up and if we don't take care of it now, it will have a huge impact, one that may not be as positive as we had hoped,” said Samadi.
The team will present their paper, "Multi-Criteria Dimensionality Reduction with Applications to Fairness," in December at the 33rd Annual Conference on Neural Information Processing Systems (NeurIPS 2019) in Vancouver, British Columbia.