
Ranjit Singh on India’s Biometric Identification System, Representation and Governance

In this episode we chat to Ranjit Singh, a postdoctoral scholar for the AI On The Ground team at the New York-based Data & Society Research Institute. We discuss India’s Biometric Identification System, the problems with verifying a population of a billion people, and the difficulties in having to check whether beneficiaries of state pensions are still alive. We also talk about the problems with classification systems, and how we can better understand the risks posed by biometrics through looking at the world from the perspective of a biometric system, in high and low resolution.


Ranjit is a Postdoctoral Scholar at the AI on the Ground Initiative of the Data & Society Research Institute. He studies the intersection of data infrastructures, global development, and public policy. He is currently leading a research project on mapping the conceptual vocabularies and sites of AI in the Global South, and collaborating on building a research community around that work. His dissertation research examines on-the-ground problems in, and practices of, building and appropriating Aadhaar (translation: Foundation), the biometrics-based national identification infrastructure of India. It advances public understanding of the affordances and limits of biometrics-based data infrastructures in practically achieving inclusive development and reshaping the nature of Indian citizenship.


Reading List:


Singh, Ranjit, and Steven J. Jackson. 2021. “Seeing like an Infrastructure: Low-resolution Citizens and the Aadhaar Identification Project.” Proceedings of the ACM on Human-Computer Interaction 5 (CSCW2): Article 315.



Singh, Ranjit, and Steven J. Jackson. 2017. “From Margins to Seams: Imbrication, Inclusion, and Torque in the Aadhaar Identification Project.” In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. Denver, CO: ACM.


Singh, Ranjit. 2020. “‘The Living Dead’: Orphaning in Aadhaar-Enabled Distribution of Welfare Pensions in Rajasthan.” Public 30 (60): 92–104. https://doi.org/10.1386/public_00008_7.


Transcript:


KERRY MACKERETH:

Hi! We're Eleanor and Kerry. We're the hosts of The Good Robot podcast, and join us as we ask the experts: what is good technology? Is it even possible? And what does feminism have to bring to this conversation? If you wanna learn more about today's topic, head over to our website, where we've got a full transcript of the episode and a specially curated reading list with work by, or picked by, our experts. But until then, sit back, relax, and enjoy the episode.


ELEANOR DRAGE:

Today we’re talking to Ranjit Singh, a postdoctoral scholar for the AI On The Ground team at the New York-based Data & Society Research Institute. We discuss India’s Biometric Identification System, the problems with verifying a population of a billion people, and the difficulties in having to check whether beneficiaries of state pensions are still alive. We also talk about the problems with classification systems, and how we can better understand the risks posed by biometrics through looking at the world from the perspective of a biometric system, in high and low resolution. I hope you enjoy the show.


KERRY MACKERETH:

Hi, thank you so much for joining us. I was wondering if you wouldn’t mind introducing us, telling us a little bit about what you do, and explaining what brings you to the topic of good technology?


RANJIT SINGH:

Okay, my name is Ranjit. I am a postdoctoral scholar on the AI on the Ground team at the Data & Society Research Institute, which is based out of New York City. Before doing this postdoctoral work, I was doing my dissertation at Cornell University, where I was studying India's national biometrics-based identification infrastructure called Aadhaar, which translates to foundation in English. I was interested in looking at how the number becomes the foundation for securing welfare rights in the country, and then how it becomes the foundation of becoming a citizen in the country, and that's what my dissertation is about. And now that I have moved on from my dissertation project, I look at this larger issue of thinking about the conceptual vocabulary and the sites of appropriating and building AI in the Global South. And the project is basically oriented towards understanding the different ways in which scholars from different disciplines, ranging from Law and Media Studies to Information Science and Anthropology and Science and Technology Studies, kind of talk about issues of living with data-driven systems, but in concepts and ideas which are very different from each other. But they simultaneously talk about the same things in a way. So I'm interested in what these commonalities are, and how to articulate the differences between the different approaches that people have. And that's in broad strokes what I do.


ELEANOR DRAGE:

Fantastic. As we are The Good Robot, we want your take on our million-dollar question: what is good technology? And specifically, in the context of data-driven governance, what might good technology do, and how do we go about creating it?


RANJIT SINGH:

I think it's ... you know, I thought about this question, because you also mentioned it earlier to me. And it kind of raises quite a few interesting ... One approach that I initially took was to actually problematize what you mean by good, in a way. And that, to a certain extent, was kind of focused on the different ways in which we think about concepts that emerge in the context of thinking about data-driven technology. So for example, there are a variety of different ways in which we define fairness, and there are a variety of different ways in which we define accountability. So in that sense, there are a variety of ways in which we can talk about what is good in the context of how governance would work using these technologies. One of the ways in which I look at it in my work is through the lens of, how are people represented through data-driven technologies? There is this particular notion of data doubles or data representations that are intrinsically a part of how we think about the way in which people get accounted for in information systems. So if you are thinking about these information systems as being resources to organise services, whether it's for government or whether it's private services, they kind of depend on what is stored about you, as a person, which allows people who are providing these services to actually then categorise you in one way or the other. So I'm interested in basically looking at how adequately these technologies can represent the lifeworld of a person. And I kind of use the word effective as a way of trying to get at, simultaneously, the issue that there has to be a way in which people get represented in these systems, and at the same time, that these systems inevitably simplify life. So it becomes increasingly difficult to actually account for the complexities of life in the way in which broad categorizations are created. So then the trade-off between these two is what I, in my work, call effective representation. And I would say a good technology, especially in the context of data-driven technology, is one that creates effective data representations of people. But that is, you know, that in itself is a word and a concept that has multiple meanings. What effectiveness is, is open to interpretation not only from the perspective of people who build these systems, but also from the perspective of people who experience these systems on an everyday basis. So what I would say is that a good technology is one that allows us to not only build representations, but also allows us to question them. And so any good technology should be built with a thought process that goes into due process as well as redress in the face of harms that these technologies create in the process of being implemented. And if due process is a part of building that technology, and there is an adequate feedback mechanism, irrespective of whether the technology is designed for personalization, or whether that technology is designed to deliver services of one kind or another or target certain communities of people as opposed to others, it would work, because it allows for the conversation to continue. The problem that I am basically trying to get at is that often these systems are designed without thinking through what due process would look like when people face harms from these systems. And any good technology should account for the way in which we inevitably expect that, you know, these technologies are going to do certain kinds of harms.
And there has to be a way by which people who are subject to these harms have an opportunity to speak back. And that would be a good technology.


KERRY MACKERETH:

That’s absolutely fascinating and I really love the problems that you’re bringing out around the harms that can come from, for example, oversimplifying data that can lead to ineffective or problematic representations. I was wondering since we ended on this point of harms, would you mind going into a bit more detail on specifically what kinds of harm might emerge from these new technologies and forms of data collection?


RANJIT SINGH:

Absolutely. There are quite a few different ways in which I can actually address this question. One, of course, is that you can imagine any data set to be a bunch of rows and columns, right. And every person would basically be a row, and the different attributes would basically be columns. Now that's a very simplified version of thinking about a data set. But what it allows us to do is to think about particular categories of information, and what meaning they have. So one of the events that I specifically talk about in my research, and you know, I have a paper on this issue as well, is the issue of what I call the living dead. And it's kind of written around the idea that in the distribution of elderly welfare pensions in India, one of the categories that was, you know, being negotiated and verified was whether the elderly people to whom these pensions had to be distributed were alive or not. Right, so you have to basically certify that you're alive in order to continue receiving pensions. Now, that's a column of data, right? Which basically says, is this person alive, yes or no? Right. And it's based on a set of verifications that happen where street-level bureaucrats go to individual houses, figure out whether this person is there or not, have a conversation with them, mark them as being alive, and then they continue to receive their pensions for another year. And the process, this ritual of verification, continues, right. Interestingly, when the government shifted in India, specifically a state government in Rajasthan, which is in the west of India, shifted from issuing money orders to the elderly to putting money into their bank accounts, there was a whole process of verification that went into this system, where they were trying to verify whether the people to whom the pensions were to be given were alive or not. And a lot of people were basically declared dead on record, without verification. And there are a lot of reasons for this, one primary reason being that, you know, there are just a lot of beneficiaries, and for one person, or, you know, a handful of street-level bureaucrats, to be able to verify a population of a million people who need to be given these pensions, it's hard. So you know, you can easily imagine somebody who's had a really long day of work, and basically at the end of the day realises that he still needs to finish verification of 20 more people, and just realises that, you know, marking them dead is easier than actually going and meeting them. And that, to a certain extent, is a part of this distance that is enacted by these data records, where we simultaneously think of them as records rather than people. But they are people as well. And this tension happens because of the fact that, you know, in a simplified representation, we take the representation as being all that there is, rather than imagining what lies beyond the representation, which is real people having real lives. That, to a certain extent, is the challenge of thinking through what it means to provide services based on data, right. It's the same issue that, you know, emerges in the way in which we basically deny people loans or mortgages based on their credit scores in America, right?
It's the same kind of an issue, where what we are simultaneously trying to do is to figure out good and effective ways of scaling up these processes, which don't require us to actually meet people in person, while simultaneously forgetting that there are real people behind these data representations, and that the changes that we make to data then have real consequences for the people who are behind them, you know, or that are hidden by them. So that's the kind of harms that I usually talk about in my work, a kind of representational harm. But at the same time, you know, there are whole sets of procedures that have to be put in place in order to basically make data-driven governance work on the ground, which requires not only building these datasets, but also digitising government records, right. So you know, a lot of countries in the Global South are going through this process of digitalising statecraft. And that requires a sort of reinterpretation of a variety of different procedures that existed before the digitalisation process began, and now these new procedures have been added to the whole system, because of the fact that, you know, there's a transition going on in the way in which governance happens. And that transition is especially hard for people who were used to the existing procedures and are now expected to get used to these new procedures almost immediately, because the government has shifted very quickly, but people have not basically been able to actually secure the right amount of data infrastructure literacy to be able to understand and navigate these systems. And that creates another set of harms, which are kind of focused more on, you know, how do I manage the circulation of my data from one, you know, government department to another? How do I manage the interpretation of my data, which is being done by somebody who doesn't know my situation, right. And all of these are different kinds of ways in which data kind of becomes not only a feature of data-driven governance, but also its work in a way. And that relationship goes beyond the question of representation, and it becomes a question of what I was talking about initially, about due process, where, you know, there have to be mechanisms of grievance redressal in place, and there have to be mechanisms of what I call in my work backward compatibility of bureaucratic procedures. And what I mean by that is that, just like information systems are backward compatible across versions, we cannot shift to a new way of governance without actually having some kind of backward compatibility with existing procedures, so that we can shift back to them in case of failures and then slowly move towards the new version over time.


KERRY MACKERETH:

Thank you so much, it's really, really interesting work, and I see such fascinating parallels between the kind of work you're doing and the work we're doing in gender studies and feminism when we think about harms, right. And something I really like about your work is that you talk about these ideas of high-resolution and low-resolution forms of harm, and to me this bears lots of parallels with work in gender studies which looks at how often certain kinds of gender-related violence aren't recognised as crises, or they're framed as being very normal, very normalised, chronic, mundane, things that scholars like Lauren Berlant call ‘slow death’, or that Elizabeth Povinelli talks about as operating in these temporalities of the normal. So I was wondering if you wouldn't mind explaining or exploring a bit more these kinds of harm, high- and low-resolution kinds of harm, and how that relates to data governance?


RANJIT SINGH:

Right. So to give a bit of context to where high and low resolution comes from, it's kind of based on the idea of thinking about, how do we place ourselves in the perspective of these systems and look at the world through their eyes, right? So the work is based on what I call the conceptual frame of seeing like an infrastructure, right. It's kind of borrowed from a kind of standpoint epistemology, where there is a recognition of the fact that, you know, our view of the world is always partial and perspectival. And if we change that perspective to these data-driven systems, and look at the world through their eyes, what do we see? Right. And different actors have different ways of seeing like an infrastructure. So for example, designers are looking at people through the eyes of these infrastructures, and one of the insights that, you know, I got from this way of seeing is when one of the designers talked about how, you know, we had to reduce the mandatory data categories for enrollment into this biometric project to only four demographic categories, primarily because the more data categories there are, the more filters of exclusion we are putting on people. So if anybody cannot provide data for a particular data category which is mandatory, they would be excluded from the system. And that, to a certain extent, also speaks to the notion of gender, because one of the demographic categories that was important, and made mandatory in the system, is gender. And, interestingly enough, Aadhaar is the first ID document in India that recognises transgender as a separate sex, which kind of created its own set of challenges for transgender people, because initially, if you had to enrol into the system ... because your previous ID documents either listed you as a male or a female, you could not use your previous ID documents to certify that you are transgender. Which created this additional requirement of documents issued by doctors certifying that you are transgender, which created a barrier for a lot of people in basically enrolling for the system and declaring themselves as being transgender. So this led to an entire court case where, in ... you know, I think in the early 2010s, I don't remember the year now, the Supreme Court decided that you can simply declare your gender without having to provide proof of it. And that, to a certain extent, was a transformational moment for the project itself, because it allowed people to simply declare their sex, rather than having to prove it. So what I mean by this is that now this system is looking at people through the lens of this gender and creating this category of whether you are male, female, or transgender. And that's one way by which, you know, the system defines your identity. But the other way of looking at it would be, you know, how do people look at what they need to do in order to participate in the system, right. So they need to provide these sets of documents, which allows them to actually become a part of the system. The classic case, again, keeping the transgender community in India in mind, is that the challenges the transgender community has faced in India in enrolling into Aadhaar were also based on being able to provide a proof of residence.
And this kind of comes from, you know, one of the ways in which transgender communities live in India, where, you know, there'll be a group of 40 to 50 transgender people who live in the same house. And that, to a certain extent, is the way in which this, you know, community works together. But at the same time, the expectation of nuclear families means that people cannot claim that they live in the same place on record, which creates a bunch of issues in terms of just providing proof of residence to be able to enrol into the system and then basically claim welfare based on that, right. So what I mean by the issues of high resolution and low resolution is that the examples that I just provided are of how the system struggles to see certain communities of people. Whether it's because people have to provide proof that they're transgender, or whether they have to provide proof of residence, enrolling into the system is in itself a hard process. And that, to a certain extent, means, or at least in my framework it means, that they manifest to these systems in low resolution. And what I mean by low resolution here is simply, you know, again, think about any optical image in low resolution, which kind of provides a very pixelated view of, you know, anything that is being imaged. And that, to a certain extent, impedes the way you can see that image, right. So you can look at it in terms of, you know, how computer monitors work, right: you can have high-resolution monitors, and you can have low-resolution monitors, and you can basically imagine the difference in the way in which you can see an image on each of these monitors. So along these lines, there are also people who are readily visible in these systems for a variety of reasons. So one larger argument here, for example, is that minoritised communities, especially in America, have been subject to long-term surveillance. You know, there is this beautiful book, Dark Matters by Simone Browne, I think it's called, which kind of focuses on this particular issue of how minorities are the most surveilled communities in the world. You know, we do that with the poor as well, where, you know, the poor are the most surveilled communities across the world in that sense. So to a certain extent, what we are also simultaneously doing is that we are producing more and more information about particular communities or people as a way of surveilling them, which basically means that we are creating high-resolution images, data representations rather, of these people through these systems. To a certain extent that basically lends itself to the challenges of surveillance, as well as invasion of privacy, which is a different set of harms, and it also depends on the politics of why a system is being put into use. So for example, if I'm trying to provide welfare through a data-driven system of welfare delivery, high resolution in this context would basically mean that you're being included in the system of welfare distribution, which is something that, you know, a lot of people might actually want, right? There is literature, in the context of India, which kind of focuses on this idea that, you know, ID cards are not just a resource to secure citizenship rights, they are also a resource that people use to protect themselves against harassment from the police and other sorts of, you know, state-based interventions, right?
If you don't have an ID, you are stateless, and that to a certain extent produces its own set of challenges. So to some extent, being high resolution in systems that actually recognise you and allow you to represent yourself to the state is something that, you know, a lot of people actually need in order to basically participate in the functioning of the state itself. But at the same time, if the same system is being used for criminal investigations or for differentiating between immigrants and citizens, then you can easily imagine that, you know, being high resolution in such systems renders you more likely to face harms in the context of, you know, how you are treated as an immigrant, or how you could basically be suspected of a crime that you haven't committed. So in a way, it's not simply a question of, you know, low resolution meaning marginalisation and high resolution meaning surveillance. It's a question of looking at the spectrum of different ways in which different kinds of data representations, in the context of the politics of the infrastructure that is being used to differentiate between people on one ground or the other, can produce these spectrums of harms, which are different according to how we imagine the politics of these systems to be.


ELEANOR DRAGE:

As Kerry said, we use feminist theory to ground and orient our understanding of what makes good technology, and we have texts and scholars that we particularly love and that drive and orient our work. But what key ideas or scholars, from science and technology studies or from other disciplines, ground your work?


RANJIT SINGH:

Absolutely. So you know how, when you're doing your A exams, advancement exams, in an American PhD programme, you're also supposed to, you know, do field surveys of literature, in a way. So one of the questions that I was basically interested in at that point of time was, where does this notion of infrastructure come from? And what I found, interestingly, is that, you know, there's a lineage that can be drawn between various ways in which old-school STS [Science and Technology Studies] thought about issues of infrastructure, and to a certain extent, the notion of infrastructure developed by Susan Leigh Star and Karen Ruhleder is actually a feminist critique of, you know, large technical systems and actor-network theory. So, to a certain extent, you know, part of my work has always been about engaging productively with feminist critiques of science and technology. One of the papers that, you know, always comes to my mind when somebody asks me this question of where I draw my inspiration is the piece by Susan Leigh Star on being allergic to onions. And the piece is fundamentally about how she went to McDonald's to order a burger. And she's allergic to onions, so she basically asked them not to put onions on her burger. And because the standardised version of making a McDonald's burger requires you to have onions on it, the simple request of not having an onion on a burger effectively meant that she had to wait for 45 minutes to get a burger. And she writes a paper about this [laughs]. And fascinatingly, what she's talking about in that paper is these particular processes by which we standardise things, and how any deviation from the standard results in a form of being rendered residual, in one way or the other. And Star goes on to develop these ideas in a variety of different ways. Right. So you know, there's this work which she does with Geoffrey Bowker, where she's basically talking about torque, or the experience of time as you would experience it when, you know, your time doesn't actually match with the time of a classification system. And the core example that they use there is the experience of people during apartheid in South Africa, as a way of grounding that, you know, you could be classified in terms of your race in one way, and then only be able to access certain spaces because of the racial segregation of geography itself. I kind of draw a lot on that. I've also followed Star's work as she continued to develop these ideas over time; in their later work, they talk about residual categories, and what it means to be residual within a system. And the core argument there is primarily centred on the idea that, you know, for a classification system, the simplest example of a residual category would be ‘none of the above’, right. So you know, you don't fit into any of these categories that are given to you, so you kind of become a part of ‘none of the above’. And this is a sort of double silencing of people. One is that you basically are not represented in the system, but also any kind of identity or social history that you might have is simply erased, because of the fact that ‘none of the above’ in and of itself doesn't mean anything, right. It's a way of creating non-people.


I kind of use this notion of residual category as a way of thinking about how the distribution of welfare through the biometrics-based system in India creates new kinds of people who are residual to data-driven governance or welfare in new ways. And one of the classic examples that I use here is that what biometrics does for data records is that it kind of enables the government to fundamentally argue that, for people who are able to provide their biometric IDs for government records or welfare distribution, these data records are unique. Whereas any other record that was on this database prior to this addition of the biometric ID is now a duplicate entry. Right? It's a ghost entry, it's a duplicate entry, it doesn't exist anymore, because, you know, all people who are unique have already claimed their data records. So now, increasingly, one of the core tasks that a lot of marginal people have to do in the country in order to continue receiving welfare is to prove that they are not duplicates, and that they are real people who have not forsaken their entitlements. Right. So these categories of people, for example duplicate records, or people who have forsaken their entitlements because they didn't claim their data record, are new categories of residuality that you can see in data-driven governance and the governance of welfare in India. So I think that's the core of my work in terms of how I basically think through some of these issues using Susan Leigh Star's work. In her later life, Susan Leigh Star was thinking about a concept which she called “orphans of an infrastructure”. And what she meant by it was that, you know, there are people who simply struggle with representing themselves for a variety of reasons. One is that they're beyond the representational scope of an infrastructure, or, you know, it could simply be that the data clerks don't believe them and don't record their information, right. There can be a variety of ways in which you can be an orphan to the system. I kind of look at it in terms of, you know, orphan is a noun. And I kind of look at this whole process as being a practice, right? So it's not that people are orphans and that's a given. There's a continuous process by which people become orphans, which I call orphaning. Right? So, you know, there's this particular way in which we think about infrastructures as being given sometimes, right. It's a way by which they are the invisible background on which our life happens. But with data, we are currently looking at how this infrastructure is being created. Right? It's in the process of happening; it's becoming rather than being, right, in a way. And that, to a certain extent, has been the focus of my work, where I'm looking at these consequential infrastructural processes that are underway right now, which are creating new forms of distinctions between people who are able to leverage these systems and are better able to represent themselves, at the expense of others who struggle with their representation and are not able to leverage these systems to secure their own life chances. And that distinction and that difference is something that, you know, I am deeply interested in and try to theorise in my work.


KERRY MACKERETH:

Thank you so much, that was really fascinating, and honestly there are so many things that I now want to go away from this and look into, like the residual categories in particular, and the onion story as well; it's such an amazing illustration of those kinds of ordinary frustrations that we know and experience but that actually have much bigger implications for how we think about data and processes and how they get used. We both just want to say thank you so much for taking the time to talk to us today about your work; we're both really excited to continue following what you're doing. Before we sign off, we believe that you're also moving into the podcast world with your work, so could you give us a brief description of what you're planning to do?


RANJIT SINGH:

I am particularly interested in storytelling, and what I hope to do is to organise a set of workshops on stories of AI from the Global South, and then do a podcast which is kind of like a story slam. So bring people together, ask them to tell stories of their work, and then discuss the commonalities between the stories that are narrated, right. So get two or three people together, listen to their stories, and then have a discussion on what these stories mean for these larger conceptual arguments that we make about AI in the Global South. Whether it is about leapfrogging or the digital transformation of societies, or about extractivism and data colonialism, there is a broad spectrum of concepts and ideas that are basically used in order to describe what the impact of AI would be on countries which are still trying to wrap their heads around how to basically make these technologies work for themselves. And I find these stories to be effective in that sense, and I'm hoping that my podcast will probably be about what the stories are, where they come from, and how people actually tell these stories in the first place.
