Under the new DPDP Act, collecting personal user data from the Internet, a fundamental aspect of generative AI, may prove challenging
When OpenAI popularised ChatGPT at the end of last year, it took a while for the world to realise that the trade-off for a human-like chatbot lay in the amount of data it gobbled up. Much of this data has been collected over time by numerous entities, including the likes of OpenAI, Microsoft and Google, largely because of the absence of clear data protection and privacy regulations around the world. Last month, however, India introduced its Digital Personal Data Protection (DPDP) Act, and while it is far from a perfect data protection law, it does introduce a layer that could make matters tricky for stakeholders in the generative AI industry.
DOES GENERATIVE AI REALLY COLLECT SO MUCH DATA?
In a nutshell, yes. The core technology behind generative AI is the transformer model, a neural network architecture introduced by a team of Google researchers back in 2017. A transformer works by consuming massive amounts of data to learn how humans construct sentences and react to various phrases; in short, how they hold conversations.
Models built on this architecture, known as large language models (LLMs), are therefore exposed to vast amounts of data (the more, the merrier) so that they see the widest possible variety of human conversation. This is what helps them learn to speak and respond like a human, and thus ‘generate’ human-like responses. Hence the term ‘generative’ AI.
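To see what ‘generation’ means in practice, here is a minimal sketch, assuming the open-source Hugging Face transformers library and the small, publicly available GPT-2 model (both chosen purely for illustration; commercial chatbots run on far larger models trained on far more data):

```python
# Minimal illustration of a transformer-based language model generating text.
# Assumes the Hugging Face 'transformers' library and a PyTorch backend are installed;
# GPT-2 is used here only because it is a small, public example model.
from transformers import pipeline

# Load a pretrained language model as a text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")

# The model continues the prompt the way a human writer plausibly might,
# based on patterns learned from the large text corpus it was trained on.
result = generator("Data protection laws in India", max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])
```

The quality of such continuations depends directly on the breadth of the training corpus, which is why collecting data at scale matters so much to the companies building these models.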
“Any breach of what is permitted under the DPDP Act is likely to attract all clauses of penalties and strictures that a data fiduciary will be exposed to.”- Kirti Mahapatra, Partner, Shardul Amarchand Mangaldas & Co.
For all of this to work, collecting data at scale is fundamental. Historically, such collection has been possible in a near-unchecked manner because of the lack of stringent data regulations in most parts of the world. That is gradually changing, however, with major geographies such as the European Union, Singapore, the USA and now India implementing data regulations.
DPDP ACT AND GENERATIVE AI
To be sure, the DPDP Act does not mention generative AI explicitly, but it does speak about the collection, or scraping, of personal data from social media platforms. That scraping clause matters, because scraping at scale is what enables the fundamental idea behind generative AI.
According to Section 3 of the DPDP Act, companies will be required to establish a consent procurement mechanism if they collect personal data that has not been posted by the users themselves. In other words, while companies will be able to treat a user’s self-posted information on social media as consent for data collection, if the same information is posted by another person, they will have to find a way to seek that individual’s consent.
This is confusing, complicated and potentially expensive. The whole point of collecting data at scale is to minimise the cost of finding data to train AI models on. Under such a law, companies will need a mechanism to identify first-party posts, distinguish them from third-party posts, and enable a consent process for the latter, all while ensuring that the same personal information is not duplicated across the two sets, so that they remain compliant and are not exposed to litigation at any point.
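What might such a mechanism look like? The sketch below is purely illustrative and rests on assumptions not spelled out in the Act: the Post structure, the distinction between an ‘author’ and a ‘data subject’, and the queue_consent_request helper are all hypothetical. It simply shows the kind of sorting, consent-gating and de-duplication logic a data collector might need to build.

```python
# Hypothetical sketch of a DPDP-style pre-processing step for scraped posts.
# Post, queue_consent_request and the field names are illustrative assumptions,
# not part of the Act or of any real platform's API.
import hashlib
from dataclasses import dataclass

@dataclass
class Post:
    author_id: str    # who posted the content
    subject_id: str   # whose personal data the content contains
    text: str

def fingerprint(post: Post) -> str:
    """Hash the data subject and normalised text to spot duplicated personal information."""
    return hashlib.sha256(f"{post.subject_id}:{post.text.strip().lower()}".encode()).hexdigest()

def queue_consent_request(post: Post) -> None:
    # Placeholder: in reality this would reach out to the data subject for consent.
    print(f"Consent needed from {post.subject_id} for a post made by {post.author_id}")

def partition_posts(posts: list[Post]) -> tuple[list[Post], list[Post]]:
    """Split first-party (self-posted) from third-party posts, dropping duplicates."""
    seen: set[str] = set()
    first_party, third_party = [], []
    for post in posts:
        fp = fingerprint(post)
        if fp in seen:          # the same personal information has already been captured once
            continue
        seen.add(fp)
        if post.author_id == post.subject_id:
            first_party.append(post)      # self-posted: may be collected
        else:
            third_party.append(post)      # posted by someone else: consent required first
            queue_consent_request(post)
    return first_party, third_party
```

Note that the duplicate check runs across both sets; otherwise the same piece of personal information could slip through via the first-party path even after consent was refused on the third-party side. Even this toy version hints at why compliance adds real engineering cost.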
WHAT DO LEGAL EXPERTS SAY?
Legal experts agree that the DPDP Act introduces difficulties for generative AI in India. Kirti Mahapatra, Partner, Shardul Amarchand Mangaldas & Co., says, “Under the Indian digital personal data protection law, businesses processing digital personal data in India or elsewhere will face a significant impact. Entities developing and offering generative AI models will have to take into account the fact that if their AI solutions rely on personal data originating in India and it is used to offer goods and services in the country, they will be treated as ‘data fiduciaries’ under the Act.”
In simpler terms, any breach of what is permitted under the Act is likely to attract all clauses of penalties and strictures that a data fiduciary will be exposed to. As Mahapatra adds, “Non-compliance with the Act will not only attract substantial monetary penalties, it may also now impact business continuity in India.”
WILL INDIA REGULATE AI EVEN FURTHER?
An initial statement from Union IT Minister Ashwini Vaishnaw said that the Centre was not looking at regulating AI in any form, in order to preserve innovation and the progress of the technology. However, in July, during the ongoing consultative process for the upcoming Digital India Act, Rajeev Chandrasekhar, Union Minister of State for Electronics and IT, said that the Centre would consider some form of AI regulation to safeguard users from “harm” caused by AI.
At the recently concluded B20 India summit, top officials and executives, including G20 sherpa Amitabh Kant, Microsoft president Brad Smith, Adobe chief Shantanu Narayen and IBM chief Arvind Krishna, spoke about the need to regulate AI to foster responsible and ethical development. They also spoke about the need for major economies, including India, to guard AI against bias, and personal data is likely to play a major role in that effort.
By Vernika Awal
feedbackvnd@cybermedia.co.in