AI content labelling
This Library briefing paper provides an overview of what AI content labelling is, why it is being used and how it works.
‘Generative artificial intelligence’ (AI) refers to systems that use machine learning to create new content such as text, images, audio, video and code. Content produced by generative AI is becoming increasingly realistic. This makes it harder for people to distinguish between content created with, and without, AI.
What is AI content labelling and why is it being used?
AI content labelling is used to alert people when they are engaging with content that has not been created by humans. It involves marking content that has been generated or altered by AI to help people understand its origins and assess its reliability.
AI content labelling is particularly relevant in the context of ‘deepfakes’, which are AI-generated videos, images or audio files deliberately designed to appear real. Deepfakes can spread disinformation and be used to commit fraud and depict individuals in misleading ways. Labels applied to deepfakes are sometimes described as “impact-based labels”, as they highlight the potential for harm.
Not all AI-generated or AI-altered content sets out to deceive or is inherently misleading. “Process-based labels” aim to communicate to users how a particular piece of content was created, including whether AI was involved, rather than comment on its potential to cause harm or other consequences.
How does AI content labelling work?
AI content labelling can take different forms. Visible disclaimers are explicit labels placed on content, such as text overlays, captions, watermarks or audio prompts. For example, AI chatbots may be labelled as “simulation” or “parody” to clarify that users are interacting with software rather than a real person. Tags may also be embedded in the metadata for a piece of content to provide information about its origins and to indicate how AI was involved in its creation.
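As a rough illustration of the two forms described above, the sketch below derives both a visible disclaimer and a machine-readable metadata tag from a single label record. The class name, field names, permitted values and caption wording are all hypothetical and are not drawn from any platform's actual labelling scheme.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AILabel:
    """Hypothetical label record; field names and values are illustrative only."""
    ai_involvement: str  # assumed values: "none", "ai_edited", "ai_generated"
    tool: str            # name of the tool used, if any

    def visible_disclaimer(self) -> str:
        """Return caption text a viewer would see, or an empty string if no AI was used."""
        if self.ai_involvement == "ai_generated":
            return "Generated with AI"
        if self.ai_involvement == "ai_edited":
            return "Edited with AI"
        return ""

    def metadata_tag(self) -> str:
        """Return the same information as a JSON string suitable for embedding in metadata."""
        return json.dumps(asdict(self), sort_keys=True)


label = AILabel(ai_involvement="ai_edited", tool="example-editor")
print(label.visible_disclaimer())
print(label.metadata_tag())
```

The point of the sketch is that a visible label and a metadata tag can carry the same underlying information, presented for a human reader in one case and for software in the other.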
The Coalition for Content Provenance and Authenticity (C2PA), for example, has developed “content credentials”, an open-source protocol using cryptography to encode details about a piece of content’s origin and editing history. A watermark label, usually a speech bubble containing “cr”, accompanies the content and provides provenance information when it is clicked. Companies such as Adobe have adopted this system and integrated it into their software, and platforms like LinkedIn and Meta use it to label AI-generated content.
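The general idea of cryptographically binding provenance details to a piece of content can be sketched as follows. This is not the C2PA format or API: the record structure, field names and key are hypothetical, and a keyed hash (HMAC) with a shared demonstration key stands in for the certificate-based digital signatures that a real provenance system would use.

```python
import hashlib
import hmac
import json

# Hypothetical demonstration key. A real system would sign with a private key
# tied to a certificate rather than a shared secret.
DEMO_KEY = b"demo-signing-key"


def make_manifest(content: bytes, generator: str, actions: list[str]) -> dict:
    """Build a provenance record bound to the content by its SHA-256 hash."""
    claim = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "generator": generator,  # assumed field: tool that produced the content
        "actions": actions,      # assumed field: editing history
    }
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    signature = hmac.new(DEMO_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_manifest(content: bytes, manifest: dict) -> bool:
    """Check that the record is untampered and still matches the content."""
    payload = json.dumps(manifest["claim"], sort_keys=True).encode("utf-8")
    expected = hmac.new(DEMO_KEY, payload, hashlib.sha256).hexdigest()
    signature_ok = hmac.compare_digest(expected, manifest["signature"])
    current_hash = hashlib.sha256(content).hexdigest()
    hash_ok = manifest["claim"]["content_sha256"] == current_hash
    return signature_ok and hash_ok


image = b"example image bytes"
manifest = make_manifest(image, "example-model", ["created", "cropped"])
print(verify_manifest(image, manifest))            # content unchanged
print(verify_manifest(image + b"edit", manifest))  # content altered after signing
```

In this sketch, verification fails if either the content or the recorded history is changed after signing, which is the property that allows a label to vouch for a piece of content's origin and editing history.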
Invisible digital watermarks are also being developed for a range of digital content. They embed technical signals in content that reveal its origin and/or composition. While these signals are invisible to people viewing the content, they can be detected by specialised algorithms.
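A toy example can show how a signal may be hidden in content and later recovered by software. The sketch below uses least-significant-bit embedding on a list of greyscale pixel values, a deliberately simple technique chosen for illustration. It is an assumption for this example, not a description of any deployed watermarking system, and production schemes are designed to survive editing and compression in ways this one would not. The function names are hypothetical.

```python
def embed_watermark(pixels: list[int], bits: list[int]) -> list[int]:
    """Hide one watermark bit in the lowest bit of each leading pixel value (0-255)."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the available pixels")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the watermark bit.
        # Each pixel value changes by at most 1, which is imperceptible to a viewer.
        marked[i] = (marked[i] & 0xFE) | (bit & 1)
    return marked


def extract_watermark(pixels: list[int], length: int) -> list[int]:
    """Recover the first `length` hidden bits by reading each pixel's lowest bit."""
    return [value & 1 for value in pixels[:length]]


original = [200, 13, 77, 254, 91, 160, 33, 8]
signal = [1, 0, 1, 1]
marked = embed_watermark(original, signal)
print(marked)
print(extract_watermark(marked, len(signal)))
```

A detector that knows where to look can read the signal back, while the altered pixel values remain visually indistinguishable from the originals.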
There is not yet a consensus on the most effective design for an AI content label. The answer will partly be determined by what a label is setting out to achieve. For example, is it to highlight the use of generative AI in the creation of a piece of content (a ‘process-based’ label), or is it to alert people that a piece of content could be misleading (an ‘impact-based’ label)?
Are there requirements to label AI content?
In the UK, there is no legislation requiring AI-generated content to be labelled. The UK Government’s consultation on Copyright and Artificial Intelligence, which closed in December 2024, acknowledged the benefits of clear AI labelling, but noted that there were “technical challenges involved”. Proposals for wider AI regulation in the UK have been delayed, and it is unclear whether future legislation will include labelling requirements.
In the European Union, article 50 of the EU AI Act sets transparency rules for content produced by generative AI. For example, providers of AI systems that interact with humans must alert users when they are interacting with AI, while providers of AI systems that generate or manipulate content must mark outputs in a machine-readable way. These rules are due to take effect in August 2026, although the European Commission has proposed delaying implementation until 2027.
Company policies on AI content labelling
Social media companies, news organisations, search engines and gaming platforms have adopted varying approaches to labelling AI-generated content. Social media companies variously rely on automatic detection technologies and user disclosures to label AI content, and in some cases differentiate between AI-edited and AI-generated content. Many news organisations include guidelines on the use of AI-generated content in their editorial policies, although published, publicly available guidelines are not universal. The policies of search engines and gaming platforms on AI-generated content vary significantly and are often partly determined by the policies of the companies that own or operate them.