Traditional & algorithmic bias-fraught data a reality

Monday, 15 March 2021 | Karthik Venkatesh / Shefali Mehta

The offline biases that humans hold are codified into numbers. The figures often fail to translate the structural issues of the offline world into representative data points.

We have long heard that data, like photographs, cannot lie. These notions are changing. Tools like Photoshop disproved the claim about photographs, and it is increasingly clear that data do not always tell the whole truth. Lee Baker, statistician and author of the book ‘Truth, Lies and Statistics: How to lie with Statistics’, said: “Data doesn’t lie. People do. If your data is biased, it is because it has been sampled incorrectly or you asked the wrong question (whether deliberately or otherwise).” However, the concept of algorithmic bias is more nuanced and is gaining traction with the increase in automation and the use of Artificial Intelligence (AI) in virtually all fields.

Bias is not a new concept. Throughout its evolution, the human race has seen various biases affect its trajectory and how we perceive ourselves and our surroundings. The biases that shape our society have also crept into the functioning of technology because it is, after all, conceptualised by individuals with their own priorities and notions. In the context of data, bias arises when the data result from an insufficient or non-inclusive sample, or when the person interpreting the data influences the outcome or its interpretation.

Beyond this straightforward notion of bias, the proliferation of “smart” or AI-driven technologies has brought algorithmic bias to the fore. Algorithms often display or function with the same biases as those held by the individuals who create them. This is a matter of concern because information and communication technologies, along with AI-based tools, have found their way into every facet of human life. Such biases can be especially harmful to marginalised communities and individuals, who have faced discrimination in other walks of life and are now at risk in digital environments as well.

In the age of automated decision-making, machine learning models are trained on whatever data are available to the systems. These data are the raw material that the machines “learn” from, and they form a world view for the machine within its limited scope of functioning. This makes it all the more crucial that the training data for these models are unbiased. But are they really? Many have held that the offline biases that humans hold are codified into numbers. The figures often fail to translate the structural issues of the offline world into representative data points. The Gender Shades study by Buolamwini and Gebru highlighted disparate performance in commercial facial recognition systems. It showed how designers fail to think beyond their own world view and identity, namely skin colour or sex, which results in adverse externalities for communities that hold vulnerable identities. The lack of dark-skinned and female faces in the data led to inaccurate predictions for people in those categories. In 2020, a PhD student in Canada drew attention to a problem with the virtual background feature of the videoconferencing service Zoom: the algorithm was “removing” the heads of Black users when a virtual background was switched on. Zoom responded to these concerns and gave an assurance that it was working to create a more inclusive platform for all. Inclusiveness at the stage when such algorithms are conceptualised is critical.
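A gap of the kind Gender Shades reported becomes visible only when accuracy is broken down by subgroup rather than reported as a single figure. The sketch below is illustrative and is not the study's own method or data: the function name `accuracy_by_group`, the subgroup labels and the eight hand-made records are all assumptions chosen to show the idea.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for (group, true_label, predicted_label) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        if truth == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Hypothetical records, invented for illustration only:
# (subgroup, true label, label predicted by the system)
sample = [
    ("lighter-skinned men", "M", "M"),
    ("lighter-skinned men", "M", "M"),
    ("lighter-skinned men", "M", "M"),
    ("lighter-skinned men", "M", "M"),
    ("darker-skinned women", "F", "F"),
    ("darker-skinned women", "F", "M"),
    ("darker-skinned women", "F", "F"),
    ("darker-skinned women", "F", "M"),
]

per_group = accuracy_by_group(sample)
overall = sum(1 for _, truth, predicted in sample if truth == predicted) / len(sample)
gap = max(per_group.values()) - min(per_group.values())

print("overall accuracy:", overall)
print("accuracy by group:", per_group)
print("largest gap between groups:", gap)
```

On this made-up sample the overall figure looks respectable while one subgroup is served far worse than the other, which is the pattern a single headline accuracy number conceals.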

The implications of algorithmic bias are most serious when biased datasets train algorithms that make decisions in areas such as crime, education and healthcare, where errors of exclusion are costly. A false equivalence is drawn between machine-generated results and on-ground reality, which leads to denial of healthcare benefits, difficulty in finding jobs, profiling of vulnerable groups and so on.

In December 2019, a non-partisan entity of the US Federal Government, the National Institute of Standards and Technology (NIST), released a study that acknowledged the existence of demographic bias in facial recognition algorithms. The research, which was the first of its kind to study the issue of “identification” rather than only recognition, clearly showed that most facial recognition technology (FRT) fared poorly in examining the faces of women, people of colour, the elderly and children. This is alarming because the use of FRT by law enforcement in the US and around the world, including in India, is increasing rapidly, and such biases have real-life implications that could harm someone’s life.

Even the healthcare sector has not been spared. Though AI and machine learning are intended to make healthcare more inclusive and reduce inequalities in services, it is increasingly being found that they exacerbate those inequalities instead. Algorithms built on data that are not representative of society have serious and far-reaching implications in healthcare, because they undermine the effectiveness of the automated diagnostic systems that are increasingly being employed across the globe. For instance, algorithms used to diagnose skin cancer lesions are largely trained on images of light-skinned individuals and are less accurate for individuals with darker skin, placing them at risk of not being diagnosed in time.

In India, we are very far from having open conversations about the need for algorithmic accountability and transparency. Though India has recently begun its journey towards a framework for data protection, the Government’s policy attempts have not touched upon issues such as algorithmic inequality or transparency. The Government is rapidly digitising its approach to governance and formulating India’s data governance frameworks to build a fairer and more equitable digital ecosystem. However, none of the frameworks governing digital programmes, nor the proposed data governance legislation, addresses the issue of bias in data and algorithms. Another pressing issue with the Government’s use of algorithms is the lack of transparency and the absence of any practice of auditing the outcomes of such algorithms to assess probable bias.

Though the Government has previously acknowledged the need to standardise AI systems and development in India through the AI Stack, announced in 2019, not much progress has been made since. Many experts in the field have suggested that the way forward for India would be to formulate an algorithmic transparency and accountability Bill, which would incorporate safeguards such as algorithmic outcome auditing pertaining to the usage of automated decision-making.
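One simple form that an algorithmic outcome audit could take is to compare how often an automated system decides in favour of each group. The sketch below is an assumption-laden illustration, not a method prescribed by any Indian framework or by the experts cited: the function names, the group labels and the decision records are hypothetical, and the 0.8 threshold is borrowed from the “four-fifths” rule of thumb used in US employment guidelines purely as an example benchmark.

```python
def selection_rates(decisions):
    """Return {group: share of favourable decisions} for (group, approved) pairs."""
    counts = {}
    for group, approved in decisions:
        approved_n, total_n = counts.get(group, (0, 0))
        counts[group] = (approved_n + int(approved), total_n + 1)
    return {group: a / t for group, (a, t) in counts.items()}


def impact_ratio(rates):
    """Ratio of the lowest group selection rate to the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favourable decision, so no disparity
    return min(rates.values()) / highest


# Hypothetical audit log, invented for illustration: (group, was the outcome favourable?)
audit_log = (
    [("group_a", True)] * 8 + [("group_a", False)] * 2
    + [("group_b", True)] * 4 + [("group_b", False)] * 6
)

THRESHOLD = 0.8  # assumed benchmark, not a legal standard in India

rates = selection_rates(audit_log)
ratio = impact_ratio(rates)
flagged = ratio < THRESHOLD

print("selection rates:", rates)
print("impact ratio:", ratio)
print("flag for review:", flagged)
```

A flag of this kind does not prove discrimination; it marks outcomes that merit closer scrutiny, which is the role an audit requirement in such a Bill would play.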

Making data representative would address one aspect of algorithmic bias. By making the field more diverse, including members of disadvantaged groups in the process and looking at issues through an interdisciplinary lens, many exclusionary outcomes can be pre-empted. Along with this, strong bias detection and mitigation frameworks that constantly check and improve on existing design will help ensure a more inclusive system.

From those who design these algorithms to the data sets used to create them, we must ensure that both reflect our society and are as diverse as society itself. It is also imperative that the people involved represent fields such as social science, law, ethics and philosophy in addition to engineering, to ensure that the final product is privacy-respecting, ethical and inclusive. We must strive not to view the efficacy of algorithms or models myopically but broadly, through a lens of inclusivity, in the hope that they serve the most marginalised just as effectively as they serve the majority.

Venkatesh is research coordinator and Mehta is strategic engagement and research coordinator, The Dialogue. The views expressed are personal.