What is data protection in the context of AI?

Data protection is the body of law and practice governing how personal data is collected, used, stored and shared, built around durable principles such as lawfulness, purpose limitation, data minimisation, accuracy, storage limitation, security and accountability. In the context of AI, it is the backbone regime most systems interact with, because training data, model inputs, model outputs, profiling and automated decisions routinely involve information about identifiable people. It is broader than any single statute and distinct from privacy as a right.

Reviewed by Jackie, Head of Learning & Development, Levellers · Last reviewed 8 June 2026

What this means

Data protection is the field of rules and operational practice that controls what organisations may do with personal data, meaning any information relating to an identified or identifiable living person. It sets conditions for collecting, using, storing, sharing and deleting that data, and it gives individuals rights over information about them. It applies whenever a person can be identified, directly or indirectly, even if a name is not attached.

It helps to separate three things that are often blurred. Privacy is a broad human right, recognised in instruments such as the Universal Declaration of Human Rights, concerned with private life and freedom from intrusion. Data protection is a related but distinct field focused specifically on the processing of personal data; in EU law it is even treated as a separate fundamental right. The GDPR is one specific instrument that implements data protection in the EU and, in domesticated form, the UK; it is not the whole field. Information security is narrower still: it is one component of data protection (keeping data safe), not a substitute for the lawful, fair and accountable handling the wider regime requires.

In an AI setting, data protection is the regime that most often bites first. Models are trained on large datasets that frequently contain personal data; prompts and inputs can contain personal data; outputs can reveal or infer it; and many AI uses involve profiling or decisions about people. Regulators have been clear that there is no "AI exemption" from these rules.

Why it matters

For anyone building, buying or governing AI, data protection is usually the first hard legal constraint to clear, and the one with the sharpest enforcement teeth. Across most regimes the core obligations are similar enough that getting the principles right travels well across borders, while getting them wrong is expensive and public.

The stakes are concrete. Infringements of the basic processing principles sit in the highest tier of fines under the UK GDPR, up to GBP 17.5 million or 4 per cent of total worldwide annual turnover, whichever is higher. Regulators have already applied data protection law to AI: the Italian authority fined Clearview AI EUR 20 million in 2022 over its facial recognition database, and on 3 September 2024 the Dutch DPA (Autoriteit Persoonsgegevens) imposed a EUR 30.5 million fine on the same company, the largest EU privacy penalty Clearview has faced, over a database described as "more than 30 billion photos" converted into unique biometric codes. Italy's Garante also fined OpenAI EUR 15 million on 20 December 2024, the first GDPR fine against a generative AI company, for processing personal data to train ChatGPT without an adequate legal basis, failing transparency obligations and lacking age verification. These are not edge cases; they show that scraping public data, building biometric systems or training large models on personal data all fall squarely within the regime.
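To make the "whichever is higher" cap concrete, here is a minimal sketch of the arithmetic. The function name and the turnover figures are illustrative assumptions, and the result is a statutory ceiling, not a prediction of what any regulator would actually impose.

```python
from decimal import Decimal

# Figures taken from the paragraph above (UK GDPR higher tier).
FIXED_CAP_GBP = Decimal("17500000")  # GBP 17.5 million
TURNOVER_RATE = Decimal("0.04")      # 4 per cent of total worldwide annual turnover


def higher_tier_maximum(worldwide_turnover_gbp: Decimal) -> Decimal:
    """Return the ceiling: the greater of the fixed cap and 4% of turnover.

    Hypothetical helper for illustration only.
    """
    return max(FIXED_CAP_GBP, TURNOVER_RATE * worldwide_turnover_gbp)


# Turnover of GBP 100 million: 4% is GBP 4 million, so the fixed cap applies.
small_firm_cap = higher_tier_maximum(Decimal("100000000"))

# Turnover of GBP 1 billion: 4% is GBP 40 million, which exceeds the fixed cap.
large_firm_cap = higher_tier_maximum(Decimal("1000000000"))
```

The two limbs cross at a turnover of GBP 437.5 million (17.5 million divided by 0.04); above that point the percentage limb sets the ceiling.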

Beyond fines, data protection shapes whether a product can lawfully ship at all. A poorly evidenced lawful basis, an undeclared purpose, an unmanaged high-risk use or an inability to honour individual rights can each stop deployment, force retraining or trigger orders to delete data or models. Treating data protection as a design input rather than a late compliance check is the difference between a system that scales and one that is recalled.

How it works

The common principles that travel across regimes

Most modern data protection regimes share a recognisable core. The UK GDPR sets out seven principles in its Article 5: lawfulness, fairness and transparency; purpose limitation; data minimisation; accuracy; storage limitation; integrity and confidentiality (security); and accountability. The OECD Privacy Guidelines, first agreed in 1980 and updated in 2013, express a closely related set of eight: collection limitation, data quality, purpose specification, use limitation, security safeguards, openness, individual participation and accountability. The wording differs but the logic is the same: collect only what you need, for a clear and declared purpose, keep it accurate and secure, hold it no longer than necessary, be open about what you do, and be able to demonstrate compliance. These principles are the durable spine of the field and the right anchor for AI work, because they do not depend on any particular technology.

Personal data, special categories and identifiability

The regime applies to personal data: information relating to an identified or identifiable person. A subset is treated as more sensitive. Under the GDPR's Article 9, "special category data" includes data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data used for the purpose of uniquely identifying a person, and data concerning health, sex life or sexual orientation. Biometric data is defined narrowly: a photograph becomes biometric data only when processed through specific technical means to identify someone. This matters intensely for AI, because facial recognition, voice analysis and similar systems routinely create special category data and trigger stricter conditions.
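The category list above can be sketched as a simple lookup, which is sometimes useful when tagging dataset fields during an inventory. This is an illustrative sketch only: the function name and the string labels are assumptions, and a real classification needs legal judgement rather than string matching.

```python
# Labels paraphrase the Article 9 categories listed above; they are
# hypothetical tags, not terms defined by any library or regulator.
SPECIAL_CATEGORIES = frozenset({
    "racial or ethnic origin",
    "political opinions",
    "religious or philosophical beliefs",
    "trade union membership",
    "genetic data",
    "health",
    "sex life",
    "sexual orientation",
})


def is_special_category(kind: str, used_to_uniquely_identify: bool = False) -> bool:
    """Flag a data kind as special category.

    Biometric data is handled separately because it only counts when it is
    used for the purpose of uniquely identifying a person.
    """
    if kind == "biometric data":
        return used_to_uniquely_identify
    return kind in SPECIAL_CATEGORIES


# A face template used by a recognition system to identify people is flagged;
# the same label without the identification purpose is not.
flagged = is_special_category("biometric data", used_to_uniquely_identify=True)
not_flagged = is_special_category("biometric data")
```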

Lawful basis, profiling and automated decisions

Processing personal data needs a lawful basis (such as consent, contract or legitimate interests). For AI training on personal data, regulators have homed in on whether legitimate interests can apply and under what conditions. Separately, the regime restricts solely automated decisions with legal or similarly significant effects. The GDPR's Article 22 gives individuals the right not to be subject to such decisions except in defined circumstances, with safeguards including the right to obtain human intervention, to express a view and to contest the decision. Profiling and automated decision-making are therefore governed directly by data protection, which is why so much AI governance routes through it.
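The Article 22 condition described above has two limbs that must both be met. A minimal sketch of that test follows; the class, field and function names are hypothetical, and the defined exceptions are deliberately not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical description of a decision made about a person."""
    solely_automated: bool         # reached without human involvement
    legal_or_similar_effect: bool  # legal or similarly significant effect


# Safeguards named in the paragraph above.
SAFEGUARDS = (
    "obtain human intervention",
    "express a view",
    "contest the decision",
)


def article_22_engaged(decision: Decision) -> bool:
    """True only when the decision is solely automated AND has a legal or
    similarly significant effect. Exceptions are out of scope for this sketch."""
    return decision.solely_automated and decision.legal_or_similar_effect


# An automated rejection with no human in the loop engages the restriction;
# the same outcome reviewed by a person before it takes effect does not.
auto_reject = Decision(solely_automated=True, legal_or_similar_effect=True)
reviewed = Decision(solely_automated=False, legal_or_similar_effect=True)
```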

Accountability tools: by design, DPIAs and records

Accountability is operationalised through specific duties. Data protection by design and by default requires controllers to build the principles into systems from the outset and to default to minimal processing. A Data Protection Impact Assessment (DPIA) is required before processing likely to result in a high risk to people, which expressly includes systematic large-scale evaluation through profiling, large-scale processing of special category data, and the use of innovative technologies such as AI. The DPIA forces an organisation to describe the processing, assess necessity and proportionality, evaluate risks and set mitigations before going live.
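The three triggers named above lend themselves to a first-pass screening check. The sketch below assumes a hypothetical `ProcessingPlan` record filled in by the project team; it covers only the triggers mentioned in this guide and is not a substitute for the assessment itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProcessingPlan:
    """Hypothetical screening answers for a planned processing activity."""
    systematic_large_scale_profiling: bool
    large_scale_special_category_data: bool
    innovative_technology: bool  # for example, AI


def dpia_triggers(plan: ProcessingPlan) -> List[str]:
    """Return the reasons, if any, that point towards a DPIA."""
    reasons = []
    if plan.systematic_large_scale_profiling:
        reasons.append("systematic large-scale evaluation through profiling")
    if plan.large_scale_special_category_data:
        reasons.append("large-scale processing of special category data")
    if plan.innovative_technology:
        reasons.append("innovative technology such as AI")
    return reasons


def dpia_required(plan: ProcessingPlan) -> bool:
    """Any single trigger is enough to require a DPIA before going live."""
    return bool(dpia_triggers(plan))
```

Returning the reasons rather than a bare yes or no keeps a record of why the DPIA was started, which supports the accountability principle.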

Institutions: supervisory authorities and standards bodies

Independent supervisory authorities, commonly called data protection authorities, monitor and enforce the law. They hold investigative, corrective and advisory powers, including the power to order changes, ban processing, order erasure and impose fines. Above national bodies, coordinating institutions such as the European Data Protection Board issue guidance and resolve cross-border disputes between authorities. Standards and frameworks supplement the law without replacing it: the NIST Privacy Framework is a voluntary tool, deliberately agnostic to any particular technology, sector, law or jurisdiction, that helps organisations manage privacy risk through enterprise risk management.

The international layer

Data protection is increasingly global. The Council of Europe's Convention 108, opened for signature in 1981, was the first binding international treaty in the field; its modernised version, Convention 108+, was finalised in 2018 and adds provisions on biometric data, algorithmic transparency and stronger oversight. Many jurisdictions have aligned their laws with the GDPR through what IAPP principal researcher Muge Fazlioglu describes as a worldwide "Brussels effect", with countries including Brazil, Thailand, China, Saudi Arabia and India enacting their first comprehensive data privacy laws in the 2020s. According to the IAPP's Global Privacy Law and DPA Directory, data protection and privacy laws are now in effect in 144 countries, covering roughly 6.64 billion people, about 82 per cent of the world's population. The result is a large and growing patchwork of national regimes that share the same principled core but differ in detail.

Examples

Training a generative model on web data. When AI developers scrape the open web to build models, regulators treat the personal data caught up in that scraping as in scope. The European Data Protection Board's Opinion 28/2024, adopted in December 2024, found that AI models trained on personal data cannot in all cases be considered anonymous, that anonymity must be assessed case by case, and that legitimate interests can be a lawful basis for training only if a three-step necessity and balancing test is satisfied. The UK's ICO has taken a comparable line, indicating that legitimate interests is likely the only available basis for training-data scraping and pressing developers to improve transparency and to honour individual rights.

Deploying facial recognition. Clearview AI built a database of billions of facial images scraped from the web and sold matching services. The Italian authority found this unlawful under multiple principles, including lawfulness, transparency, purpose limitation and storage limitation, and the special category rules on biometric data, and imposed a EUR 20 million fine in 2022 with an order to delete data on people in Italy. On 3 September 2024 the Dutch DPA imposed a further EUR 30.5 million fine, finding violations of GDPR Articles 5(1)(a), 6, 9(1) and 12; it noted that Clearview "has not objected to this decision and is therefore unable to appeal against the fine". This is a worked example of why biometric AI almost always engages the strictest parts of the regime.

A generative AI assistant in the EU. Italy's Garante imposed a temporary block on ChatGPT's processing of Italian users' data on 30 March 2023, then on 20 December 2024 issued a EUR 15 million fine plus a mandatory six-month public-awareness campaign across Italian radio, TV, print and online media. The case turned on classic data protection points: a lawful basis for using personal data to train the model, transparency towards users, and age verification. The example shows how the principles, not any AI-specific statute, framed the enforcement.

Common misunderstandings

"Data protection and the GDPR are the same thing." No. The GDPR is one instrument within a much broader field. Many countries have their own data protection laws, and international instruments such as the OECD Guidelines and Convention 108 predate the GDPR and, together with the modernised Convention 108+, sit alongside it.

"Data protection is just privacy." They overlap but are distinct. Privacy is a broad right concerning private life; data protection is a specific regime regulating the processing of personal data, treated in EU law as a separate fundamental right with its own detailed rules.

"Data protection is the same as information security." Security (integrity and confidentiality) is only one principle among several. Lawful, fair, transparent and accountable handling is required even when data is perfectly secure.

"There is an AI exemption." There is not. Regulators have stated plainly that AI systems processing personal data must comply, and that even incidental processing counts.

"If a model only outputs statistics, no personal data is involved." Not necessarily. Personal data can remain absorbed in a model's parameters and be extractable through queries, so a trained model is not automatically anonymous.

Risks and boundaries

Data protection is not a complete AI rulebook. It governs personal data, so it does not by itself address safety, accuracy of non-personal outputs, intellectual property, competition or product liability, and it is not a substitute for dedicated AI legislation. It is often misapplied as a box-ticking exercise: a DPIA written after launch, or a privacy notice that nobody can act on, satisfies neither the letter nor the spirit of the regime.

Several questions are genuinely unsettled. The precise conditions under which a trained model is anonymous, and when legitimate interests justifies training on scraped data, remain contested and are being worked out case by case rather than through bright-line rules. Guidance is also moving: regulators have signalled further work on web scraping and on generative and agentic AI, and some national laws are themselves being amended. Enforcement against entities outside a jurisdiction can be legally valid but practically hard to collect.

Specific legal statuses can also change after a decision is issued. A fine that is announced may be appealed, suspended by a court or reduced, so any single enforcement figure should be read as the position at a point in time rather than a settled, final liability.

What to do next

Start from the principles, not the tooling. Map where personal data enters your AI lifecycle: training data, fine-tuning data, prompts and inputs, retrieval sources, logs, and outputs. For each, identify the purpose, the lawful basis and the minimum data actually needed.
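The mapping exercise above can be kept as a small structured record, one entry per lifecycle stage, so that gaps are easy to spot. This is a minimal sketch: the `StageRecord` fields and the `find_gaps` helper are hypothetical names, and the example values are placeholders rather than recommended answers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Lifecycle stages listed in the paragraph above.
STAGES = (
    "training data",
    "fine-tuning data",
    "prompts and inputs",
    "retrieval sources",
    "logs",
    "outputs",
)


@dataclass
class StageRecord:
    """Hypothetical inventory entry for one stage of the AI lifecycle."""
    stage: str
    contains_personal_data: bool = True
    purpose: Optional[str] = None
    lawful_basis: Optional[str] = None
    minimum_fields: Tuple[str, ...] = ()


def find_gaps(records: List[StageRecord]) -> List[Tuple[str, str]]:
    """Return (stage, missing item) pairs for stages that are unmapped or
    that hold personal data without a purpose, lawful basis or minimum set."""
    by_stage = {record.stage: record for record in records}
    gaps = []
    for stage in STAGES:
        record = by_stage.get(stage)
        if record is None:
            gaps.append((stage, "no record"))
            continue
        if not record.contains_personal_data:
            continue
        if not record.purpose:
            gaps.append((stage, "purpose"))
        if not record.lawful_basis:
            gaps.append((stage, "lawful basis"))
        if not record.minimum_fields:
            gaps.append((stage, "minimum data"))
    return gaps


# Placeholder inventory: one stage fully mapped, one partly mapped.
inventory = [
    StageRecord("training data", purpose="model training",
                lawful_basis="legitimate interests",
                minimum_fields=("document text",)),
    StageRecord("logs", purpose="debugging"),
]
open_items = find_gaps(inventory)
```

Running the check on a partial inventory like this one lists every stage that still needs an answer, which gives the DPIA a concrete starting point.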

Run a DPIA before deployment for any high-risk use, which includes profiling, large-scale special category data and innovative technologies such as AI. Treat it as a design instrument that can change the build, not as paperwork filed afterwards.

Build data protection by design into the architecture: minimise collection, set privacy-protective defaults, enable the identification and deletion of specific individuals' data where feasible, and make it possible to honour access, objection and erasure requests across the system, including training sets.

Get the special category and biometric analysis right early, because facial recognition, emotion or trait inference and similar uses face the highest thresholds and attract the most enforcement. Document your reasoning.

Clarify roles in the supply chain (controller, processor, joint controller), reflect them in contracts, and ask vendors how their systems support your obligations. Keep records so you can demonstrate compliance to a supervisory authority.

Watch the moving parts: track guidance from your relevant authority on training data, scraping and generative AI, and treat named laws and enforcement decisions as current examples that may change.

FAQs

Is data protection the same as the GDPR?

No. Data protection is the broad field of law and practice governing personal data. The GDPR is one instrument that implements it in the EU, and in domesticated form in the UK. Many other countries have their own laws, and international instruments such as the OECD Privacy Guidelines and Convention 108+ also form part of the field.

How is data protection different from privacy?

Privacy is a broad human right concerning private life and freedom from intrusion. Data protection is a narrower, more technical regime focused on how personal data is processed. In EU law the two are treated as separate fundamental rights.

What are the core data protection principles?

Commonly: lawfulness, fairness and transparency; purpose limitation; data minimisation; accuracy; storage limitation; integrity and confidentiality (security); and accountability. The OECD expresses a closely related set of eight principles. These are stable across most regimes.

Does data protection law apply to AI systems?

Yes, whenever they process personal data. Regulators have stated there is no AI exemption. Training data, inputs, outputs, profiling and automated decisions all commonly involve personal data and so fall within the regime.

Can a trained AI model contain personal data?

It can. Regulators have found that personal data may remain absorbed in a model's parameters and be extractable through queries, so a model trained on personal data is not automatically anonymous and must be assessed case by case.

When is a DPIA required for AI?

Before processing likely to result in a high risk to people. This expressly includes systematic large-scale profiling, large-scale processing of special category data, and the use of innovative technologies such as AI.

What is a data protection authority?

An independent supervisory authority that monitors and enforces data protection law, with powers to investigate, order changes, ban processing, require erasure and impose fines. Coordinating bodies issue guidance and resolve cross-border disputes.

What happens if we get it wrong?

Consequences range from corrective orders and bans on processing to deletion orders and large fines, plus reputational harm. Breaches of the basic principles sit in the highest fine tier under the UK GDPR.
