Relevant documents: 3rd Report from the Constitution Committee and 9th Report from the Delegated Powers Committee. Scottish, Welsh and Northern Ireland Legislative Consent sought.
My Lords, I remind the Committee that if there is a Division in the Chamber, the Committee will adjourn for 10 minutes from the sound of the Division Bells.
Clause 67: Meaning of research and statistical purposes
59: Clause 67, page 75, line 9, after “processing” insert “solely”.
Member’s explanatory statement
This amendment prevents misuse of the scientific research exceptions for data reuse by ensuring that the only purpose for which the reuse is permissible is for the scientific research—with no additional purposes.
I have tabled Amendments 59, 62, 63 and 65, and I thank the noble Lord, Lord Clement-Jones, my noble friend Lady Kidron and the noble Viscount, Lord Camrose, for adding their names to them. I am sure that the Committee will agree that these amendments have some pretty heavyweight support. I also support Amendment 64, in the name of the noble Lord, Lord Clement-Jones, which is an alternative to my Amendment 63. Amendments 68 and 69 in this group also warrant attention.
I very much support the Government’s aim in Clause 67 to ensure that valuable research does not get discarded due to a lack of clarity around its use or because of an overly narrow distinction between the original and new purposes of the use of the data. The Government’s position is that this clause clarifies the law by incorporating into the Bill recitals to the original GDPR. However, while the effect is to encourage scientific research and development, it has to be seen in the context of the fast-evolving world of developments in AI and the way that AI developers, given the need for huge amounts of data to train their large language models, are reusing data.
My concern is that the scraping of vast amounts of data by these AI companies is often positioned as scientific research and in some cases is even supported by the production of academic papers. I ask the Minister to understand my concerns and those of many in the data community and beyond. The fact is that the lines between scientific research, as set out in Clause 67, and AI product development are blurred. This might not be the concern of the original recitals, but I beg to suggest to the Minister that, in the new world of AI, there should be concern about the definition presented in the Bill.
Like other noble Lords, I very much hope to make this country a centre of AI development, but I do not want this to happen at the expense of data subjects’ privacy and data protection. It costs at least £1 billion—even more, sometimes—to develop a large language model and, although the cost will soon go down, there is a huge financial incentive to scrape data that pushes the boundaries of what is legitimate. In this climate, it is important that the Bill closes any loopholes that allow AI developers to claim the protections offered by Clause 67. My Amendments 59, 62, 63 and 65 go some way to ensuring that this will not happen.
My Lords, I support the amendments from the noble Viscount, Lord Colville, which I have signed, and will put forward my Amendments 64, 68, 69, 130 and 132 and my Clause 85 stand part debate.
This part of the GDPR is a core component of how data protection law functions. It makes sure that organisations use personal data only for the reason that it was collected. One of the exceptional circumstances is scientific research. Focus on the definitions and uses of data in research increased in the wake of the Covid-19 pandemic, when some came to the view that legal uncertainty and related risk aversion were a barrier to clinical research.
There is a legitimate government desire to ensure that valuable research does not have to be discarded because of a lack of clarity around reuse or very narrow distinctions between the original and new purpose. The Government’s position seems to be that the Bill will only clarify the law, incorporating recitals to the original GDPR in the legislation. While this may be the policy intention, the Bill must be read in the context of recent developments in artificial intelligence and the practice of AI developers.
The Government need to provide reassurance that the intention and impact of the research provisions are not to enable the reuse of personal data, as the noble Viscount said, scraped from the internet or collected by tech companies under legitimate interest for training AI. Large tech companies could abuse the provisions to legitimise mass data scraping of personal data from the internet or to collect via legitimate interest—for example, by a social media platform, about its users. This could be legally reused for training AI systems under the new provisions if developers can claim that it constitutes scientific research. That is why we very much support what the noble Viscount said.
In our view, the definition of scientific research adopted in the Bill is too broad and will permit abuse by commercial interests outside the policy intention. The Bill must recognise the reality that companies will likely position any AI development as “reasonably described as scientific”. Combined with the inclusion of commercial activities in the Bill, that opens the door to data reuse for any data-driven product development under the auspices that it represents scientific research, even where the relationship to real scientific progress is unclear or tenuous. That is not excluded in these provisions.
4:00 pm
Businesses already routinely conduct trials or profit from children’s use of educational technology for product development, without their knowledge or parental permission. This is contrary to the UNCRC Article 32 principle of a right to protection from economic exploitation or public engagement, which work suggests parents want.
I turn to Amendments 68 and 69. There is a danger of what can be described as a clubcard culture of sharing data—however useful and without consideration of the data subject—permeating this Government’s approach to data. These amendments probe whether a researcher who is self-described as scientific would be able to use the data of those who have objected to their data being used in that way. They add safeguards to Clause 68 to ensure that confidence in research and government uses of data is maintained. They are designed to make it clear that, when the purpose limitations are changed, a choice must be offered to data subjects, and to ensure that existing data subject dissents are respected and cannot be ignored.
On the clause stand part notice, Clause 85, despite its title, actually removes safeguards on the use of data for research purposes, as the noble Viscount mentioned and as I explained. The powers in the clause, particularly in new Article 84D, provide wide discretion to the Secretary of State without meaningful parliamentary scrutiny. These powers, as the noble Viscount has mentioned, were identified by EU stakeholders as a main source of concern regarding the continuation of the UK adequacy decision, a review of which is due in 2025—as we have referred to throughout proceedings. The risks these powers constitute to the UK adequacy decision are more than hypothetical. If the need to establish a delegated legislative power is justified, it needs to be subject to clear restraints and the Secretary of State should not be given unfettered discretion to override the rights and freedoms of individuals under the GDPR.
My Lords, I will speak to Amendments 59, 62, 63 and 65 in the name of my noble friend Lord Colville, and Amendment 64 in the name of the noble Lord, Lord Clement-Jones, to which I added my name. I am also very much in sympathy with the other amendments in this group more broadly.
My noble friend Lord Colville set out how he is seeking to understand what the Government intend by “scientific research” and to make sure that the Bill does not offer a loophole so big that any commercial company can avoid data protections of UK citizens in the name of science.
At Second Reading, I read out a dictionary definition of science:
“The systematic study of the structure and behaviour of the physical and natural world through observation, experimentation, and the testing of theories against the evidence obtained”—
i.e. everything. I also ask the Minister if the following scenarios could reasonably be considered scientific. Is updating or improving a new tracking app for fitness, or a bot for an airline, scientific? Is the behavioural science of testing children’s response to persuasive design strategies in order to extend the stickiness of commercial products scientific? These are practical scenarios, and I would be grateful for an answer in order to understand what is in and out of the scope of the Bill.
When I raised Clause 67 at a briefing meeting, it was said that it was, as my noble friend Lord Colville suggested, just housekeeping. The law firm Taylor Wessing suggests that what can
“‘reasonably be described as scientific’ is arguably very wide and fairly vague, so it will be interesting to see how this is interpreted, but the assumption is that it is intended to be a very broad definition”.
Each of the 14 law firm blogs and briefings that I read over the weekend described it variously as loosening, expanding or broadening. Not one suggested that it was a tightening and not one said that it was a no-change change. As we have heard, the European Data Protection Supervisor published an opinion stating that
My Lords, I have in subsequent groups a number of amendments that touch on many of the issues that are raised here, so I will not detain the Committee by going through them at this stage and repeating them later. However, I feel that, although the Government have had the best intentions in bringing forward a set of proposals in this area that were to update and to bring together rather conflicting and difficult pieces of legislation that have been left because of the Brexit arrangements, they have managed to open up a gap between where we want to be and where we will be if the Bill goes forward in its present form. I say that in relation to AI, which is a subject requiring a lot more attention and a lot more detail than we have before us. I doubt very much whether the Government will have the appetite for dealing with that in time for this Bill, but I hope that at the very least—it would be a minor concession at this stage—they will commit at the Dispatch Box to seeking to resolve these issues in the legislation within a very short period because, as we have heard from the arguments made today, it is desperately needed.
More importantly, if, by bringing together documentation that is thought to represent the current situation, either inadvertently or otherwise, the Government have managed to open up a loophole that will devalue the way in which we currently treat personal data—I will come on to this when I get to my groups in relation to the NHS in particular—that would be a grievous situation. I hope that, going forward, the points that have been made here can be accommodated in a statement that will resolve them, because they need to be resolved.
My Lords, it is a pleasure to take part in today’s Committee proceedings. In doing so, I declare my technology interests as set out in the register, not least as adviser to Socially Recruited, an AI business.
I support the noble Viscount, Lord Colville, in his amendments and all the other amendments in this group. They were understandably popular, to the extent that when I got my pen out, there was no space left for me to co-sign them, so I was left with the oral tradition in which to reflect my support for them. Before going into the detail, I just say that we have had three data Bills in just over three years: DPDI, DISD and this Bill. Over that period, though the names have changed, much of the meat remains the same in the legislation. Yet, in that period, everything and nothing has changed: everything in terms of what has happened with generative AI.
Considering that seismic shift that has occurred over these three Bills, could the Minister say what in this Bill specifically has changed, not least in this part, to reflect that seismic change? Regarding “nothing has changed”, nothing has changed in terms of the incredibly powerful potential of AI for positive or negative outcomes, ably demonstrated with this set of amendments.
If you went on to Main Street and polled the public, I believe that you would get a pretty clear understanding of what they considered scientific research to be. You know it. You understand why we would want to have a specified definition of scientific research and what that would mean for the researchers and for the country.
However, if we are to draw that definition as broadly as it currently is in the Bill, why would we bother to have such a definition at all? If the Government’s intention is to enable so much to come within the perimeter, let us not have the definition at all and let us allow to continue what is happening right now, not least in the reuse of scraped data or in how data is being treated in these generative AI models.
I start by apologising because, due to a prior commitment, I am not able to stay for many of the proceedings today, but I see these groupings and others as critical. In the few words that I will say, I hope to bring to bear to this area some of my experience as a Health Minister, particularly in charge of technology and development of AI.
I can see a lot of good intent behind these clauses, to make sure that we do not stop a lot of the research that we need. I was recently very much involved in the negotiation of the pandemic accord regarding the next pandemic and how you make sure that any vaccines that you develop on a worldwide basis can be distributed on a worldwide basis as well. One of the main stumbling blocks was that the so-called poorer countries were trying to demand, as part of that, the intellectual property to be able to develop the vaccines in their own countries.
The point we were trying to make was that, although we could see the good intentions behind that, it would have a real chilling effect on pharmaceutical companies investing the hundreds of millions or even billions of pounds, which you often need with vaccines, to find a cure, because if they felt that they were going to lose their intellectual property and rights at the end, it would be much harder for them to justify the investment up front.
4:15 pm
One thing that got me excited about the potential of AI in the health space was said by some Harvard professors. For years and years, we have not been able to make any inroads into dementia because we just do not know any of the causes and what we are trying to go after. The reason we were able to get a Covid vaccine so quickly was that we knew exactly what we were trying to attack. With dementia, we do not have those avenues of attack, but the professors said that if you take the data we have in the UK—yes, it would involve scraping—and look at the people who are suffering from dementia today, wind the clock back 10, 15 or 20 years and look at what they were seeing their GP about, you will start to see some of the early warning indicators. If you throw all that at AI, you might suddenly have whole new avenues of attack, because it identifies patterns that you did not realise existed.
There absolutely are scientific research reasons for doing that, and it is done from a very tricky position; but of course, the pharmaceutical companies would do it for the commercial benefit, because if you can find a cure or something to ameliorate the progression of dementia, that would obviously be incredibly valuable. It is about getting that balance right.
The definition of scientific research in proposed new paragraph 2, in Clause 67(1)(b), is drawn broadly. My concern is that many commercial developments of digital products, particularly those involving AI, could still claim to be, in the words of the clause, “reasonably … described as scientific”. AI model development usually involves a mix of purposes—not just developing its capabilities but also commercialising as it develops services. The exemption allowed for “purposes of technological development” makes me concerned that this vague area creates a threat whereby AI developers will misuse the provisions of the Bill to reuse personal data for any AI developments, provided that one of their goals is technological advancement.
Amendments 59 and 62, by inserting the word “solely” into proposed new paragraphs 2 and 3 in Clause 67, would disaggregate reuse of data for scientific research purposes from other purposes, ensuring that the only goal of reuse is scientific research.
An example of the threat under the present definition is shown by Meta’s recently allowing the reuse of Instagram users’ data to train its new generation of Llama models. When the news got out, it created a huge backlash, with more than half a million people reposting a viral hoax image that claimed to deny Meta the right to reuse their data to train AI. This caused the ICO to say that it was pleased that Meta had paused its data processing in response to users’ concerns, adding:
“It is crucial that the public can trust that their privacy rights will be respected from the outset”.
However, Meta could well claim under this clause that it is creating technological advancement which would allow it to reuse any data collected from users under the legitimate interest grounds for training the model. The Bill as it stands would not require the company to conduct its research in accordance with any of the features of genuine scientific research. These amendments go some way to rectify that.
Amendment 63 increases the test for what is deemed to be scientific interest. At the moment, the public interest test is applied only to public health. I am pleased that NHS researchers will have to recognise this threshold, but why should all researchers doing scientific work not have to adhere to this threshold? Why should that test not be applied to all data reuse for scientific research? By deleting the public health exception, the public interest test would apply to all data reuse for scientific purposes.
The original intention of the RAS purpose of the GDPR supports public health for scientific interests. This is complemented by Amendment 65, which uses the tests for consent already laid out in Clause 68. The inclusion of ethical thresholds in the reuse of data should meet the highest levels of academic rigour and oversight envisaged in the original GDPR. It will demand not just ethical standards in research but for it to be supervised by an independent research ethics committee that meets UKRI guidance. These requirements will ensure that the high standards of ethics that we expect from scientific research will be applied in evaluating the exemption in Clause 67.
I do not want noble Lords to think that these amendments are thwarting the development of AI. There is plenty of AI research that is clearly scientific. Look at DeepMind AlphaFold, which uses AI to analyse the shape of proteins so that they can be incorporated in future drug treatment and will move pharmaceutical development. It is an AI model developed in accordance with the ethical standards expected from modern scientific research.
The Minister will argue that the definition has been taken straight from EU recitals. I therefore ask her to consider very seriously what has been said about this definition by the EU’s premier data body, the European Data Protection Supervisor, in its preliminary opinion on data protection and scientific research. In its executive summary, it states:
“The boundary between private sector research and traditional academic research is blurrier than ever, and it is ever harder to distinguish research with generalisable benefits for society from that which primarily serves private interests. Corporate secrecy, particularly in the tech sector, which controls the most valuable data for understanding the impact of digitisation and specific phenomena like the dissemination of misinformation, is a major barrier to social science research … there have been few guidelines or comprehensive studies on the application of data protection rules to research”.
It suggests that the rules should be interpreted in such a way that permits reuse only for genuine scientific research.
For the purpose of this preliminary opinion by the EDPS, the special data protection regime for scientific research is understood to apply if each of three criteria is met: first, personal data is processed; secondly, relevant sectorial standards of methodology and ethics apply, including the notion of informed consent, accountability and oversight; and, thirdly, the research is carried out with the aim of growing society’s collective knowledge and well-being as opposed to serving primarily one or several private interests. I hope that noble Lords will recognise that these are features that the amendments before the Committee would incorporate into Clause 67.
In the circumstances, I hope that the Minister, who I know has thought deeply about these issues, will recognise that the EU’s institutions are worried about the definition of scientific research that has been incorporated into the Bill. If they are worried, I suggest that we should be worried. I hope that these amendments will allay those fears and ensure that true scientific research is encouraged by Clause 67 and that it is not abused by AI companies. I beg to move.
I turn to Amendments 64, 68, 69, 130 and 132 and the Clause 85 stand part debate. The definition of scientific research in proposed new paragraph 2 under Clause 67(1)(b) is drawn so broadly that most commercial development of digital products and services, particularly those involving machine learning, could ostensibly be claimed by controllers to be “reasonably described as scientific”. Amendment 64, taken together with those tabled by the noble Viscount that I have signed, would radically reduce the scope for misuse of data reuse provisions by ensuring that controllers cannot mix their commercial purposes with scientific research and that such research must be in the public interest and conducted in line with established academic practice for genuine scientific research, such as ethics approval.
Since the Data Protection Act was introduced in 2018, based on the 2016 GDPR, the education sector has seen enormous expansion of state and commercial data collection, partly normalised in the pandemic, of increased volume, sensitivity, intrusiveness and high risk. Children need particular care in view of the special environment of educational settings, where pupils and families are disempowered and have no choice over the products procured, which they are obliged to use for school administrative purposes, for learning in the classroom, for homework and for digital behavioural monitoring.
The implications of broadening the definition of research activities conducted within the state education sector include questions of the appropriateness of applying the same rules where children are in a compulsory environment without agency or routine practice for research ethics oversight, particularly if the definition is expanded to commercial activity.
Parental and family personal data is often inextricably linked to the data of a child in education, such as home address, heritable health conditions or young carer status. The Responsible Technology Adoption Unit within DSIT commissioned research in the Department for Education to understand how parents and pupils feel about the use of AI tools in education and found that, while parents and pupils did not expect to make specific decisions about AI optimisation, they did expect to be consulted on whether and by whom pupil work and data can be used. There was widespread consensus that work and data should not be used without parents’ and/or pupils’ explicit agreement.
“scientific research is understood to apply where … the research is carried out with the aim of growing society’s collective knowledge and wellbeing, as opposed to serving primarily one or several private interests”.
When the Minister responds, perhaps she could say whether the particular scenarios I have set out fall within the definition of scientific and why the Government have failed to reflect the critical clarification of the European Data Protection Supervisor in transferring the recital into the Bill.
I turn briefly to Amendment 64, which would limit the use of children’s personal data for the purposes of research and education by making it subject to a public interest requirement and opt-in from the child or a parent. I will speak in our debate on a later grouping to amendments that would enshrine children’s right to higher protection and propose a comprehensive code of practice on the use of children’s data in education, which is an issue of increasing scandal and concern. For now, it would be good to understand whether the Government agree that education is an area of research where a public interest requirement is necessary and appropriate and that children’s data should always be used to support their right to learn, rather than to commoditise them.
During debate on the DPDI Bill, a code of practice on children’s data and scientific research was proposed; the Minister added her name to it. It is by accident rather than by design that I have failed to lay it here, but I will listen carefully to the Minister’s reply to see whether children need additional protections from scientific research as the Government now define it.
We have seen what has happened in terms of the training, but when you look at what could be called development and improvement, as the noble Viscount has rightly pointed out, all this and more could easily fit within the scientific research definition. It could even more easily fit in when lawyers are deployed to ensure that that is so. I know we are going to come on to rehearsing a number of these subjects in the next group but, for this group, I support all the amendments as set out.
I ask the Minister these two questions. First, what has changed in all the provisions that have gone through all these three iterations of the data Bill? Secondly, what is the Government’s intention when it comes to scientific research, if it is not truly to mean scientific research, if it is not to have ethics committee involvement and if it is not to feel sound and be defined as what most people on Main Street would recognise as scientific research?