
What is PII?


PII means personally identifiable information. It is widely used in security, cloud and US-style privacy language to describe information that can identify, distinguish or be linked to an individual. In UK and EU data protection work, the more important legal term is usually personal data. For AI workflows, the practical point is to identify information that relates to an identifiable person before it is copied into tools, prompts, logs, datasets or outputs.

Reviewed by Jackie, Head of Learning & Development, Levellers · Last reviewed 8 June 2026

What this means

PII means personally identifiable information. The term is common in security policies, cloud contracts, technical documentation and US-oriented privacy material. NIST definitions describe PII as information that can distinguish or trace an individual's identity, either alone or combined with other linked or linkable information.

UK and EU data protection practice usually centres on "personal data", not PII. The ICO describes personal data as information that relates to an identified or identifiable individual. That includes obvious identifiers, such as name and email address, and less obvious identifiers, such as identification numbers, location data, IP addresses, cookie identifiers and combinations of information that can identify someone indirectly.

For Levellers-style AI work, the safest practical approach is to treat PII as a useful security shorthand, but use personal data for UK GDPR analysis. The question is not only "does this field identify someone on its own?" It is also "could this information relate to an identifiable person in this workflow, with this context and these linked records?"

Why it matters

AI makes terminology discipline more important because information moves quickly between documents, prompts, databases, retrieval indexes and outputs. A name in a spreadsheet is easy to spot. A customer reference embedded in a transcript, a support summary, a location trace, a device identifier or a free-text note may be less obvious.

If staff think PII only means direct identifiers such as passports or bank account numbers, they may paste personal data into tools without recognising it. If leaders treat every data point as equally sensitive, they may block useful low-risk workflows. The aim is to classify data clearly enough to apply proportionate controls.

Good classification helps with data minimisation, data subject access requests (DSARs), records of processing activities (ROPA), vendor review and security design. It also helps teams understand when anonymisation is real and when data is only pseudonymised or masked.

How it works

A practical classification model should separate direct identifiers, indirect identifiers, sensitive context and linked records. Direct identifiers include names, email addresses, telephone numbers, national identifiers, account numbers and biometric records. Indirect identifiers include combinations such as job title plus employer plus location, or event data that can be linked back to an individual.
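The four-way split above can be expressed as a small lookup table. The sketch below is illustrative only: the field names, the `assess` function and the rule that two indirect identifiers together count as identifying are all assumptions chosen for the example, not a legal test. Unknown fields are never treated as safe, which mirrors the point that free text and unfamiliar fields need review.

```python
from enum import Enum
from typing import Iterable


class Category(Enum):
    DIRECT = "direct identifier"
    INDIRECT = "indirect identifier"
    SENSITIVE = "sensitive context"
    LINKED = "linked record key"
    NOT_PERSONAL = "not personal"


# Hypothetical field names for illustration; map your own schema here.
FIELD_CATEGORIES = {
    "name": Category.DIRECT,
    "email": Category.DIRECT,
    "phone": Category.DIRECT,
    "account_number": Category.DIRECT,
    "job_title": Category.INDIRECT,
    "employer": Category.INDIRECT,
    "location": Category.INDIRECT,
    "health_note": Category.SENSITIVE,
    "crm_contact_id": Category.LINKED,  # points straight at a person's record
    "product_code": Category.NOT_PERSONAL,
}


def assess(fields: Iterable[str]) -> str:
    """Return 'personal data', 'review' or 'low risk' for a set of fields."""
    # Unknown fields map to None and are never treated as safe.
    cats = [FIELD_CATEGORIES.get(f) for f in fields]
    if any(c in (Category.DIRECT, Category.LINKED) for c in cats):
        return "personal data"
    # Two or more indirect identifiers together are treated as identifying
    # (a cautious threshold chosen for this sketch, not a legal rule).
    if sum(c is Category.INDIRECT for c in cats) >= 2:
        return "personal data"
    if any(c in (None, Category.SENSITIVE, Category.INDIRECT) for c in cats):
        return "review"
    return "low risk"
```

For example, `assess(["job_title", "employer", "location"])` returns "personal data" even though no single field names anyone, while an unrecognised free-text field such as `ticket_text` returns "review".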

The ICO's personal data guidance stresses context. The same information may be personal data for one controller and not for another, depending on whether it relates to an identifiable individual and what other information is reasonably available. Pseudonymised data remains personal data if it can be attributed to a person using additional information. Truly anonymous data is outside the UK GDPR, but anonymisation must be effective: the person should not be identifiable by any means reasonably likely to be used, including linking the data with other information.
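To see why pseudonymised data is still personal data, consider a minimal sketch that assumes a keyed hash (HMAC-SHA256 from Python's standard library) as the pseudonymisation method. The key, the email addresses and the function names are hypothetical. The token looks meaningless, but anyone holding the key and a list of likely identifiers can attribute it back to the person.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash token (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def reidentify(token: str, candidates: Iterable[str], key: bytes) -> Optional[str]:
    """Whoever holds the key and a candidate list can reverse a token."""
    for candidate in candidates:
        if hmac.compare_digest(pseudonymise(candidate, key), token):
            return candidate
    return None


# Hypothetical key and customer list, for illustration only.
key = b"example-secret-key"
token = pseudonymise("jane@example.com", key)
print(reidentify(token, ["bob@example.com", "jane@example.com"], key))  # jane@example.com
```

The "additional information" here is the key plus a customer list. Keeping the key separate and access-controlled reduces risk, but the tokens remain personal data for anyone who can reach both.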

In AI workflows, classification should happen before data is used. Teams should ask what personal data is necessary, whether it can be removed, generalised, masked, kept local or replaced with synthetic examples, and whether the AI vendor will retain or use the data for other purposes.
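The remove-and-generalise step can be sketched as an allowlist applied before a record is used in a prompt. This is a hypothetical example: the field names, the ten-year age bands and the assumption that UK postcodes are written with a space (so the outward code is the part before it) are choices made for illustration.

```python
def generalise_age(age: int) -> str:
    """Replace an exact age with a ten-year band, e.g. 37 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def generalise_postcode(postcode: str) -> str:
    """Keep only the outward code, e.g. 'SW1A 1AA' -> 'SW1A'.

    Assumes the postcode is written with a space between its two parts.
    """
    return postcode.strip().split()[0].upper()


# Fields the hypothetical task actually needs, with an optional generaliser each.
ALLOWED = {
    "issue": None,
    "product": None,
    "age": generalise_age,
    "postcode": generalise_postcode,
}


def minimise(record: dict) -> dict:
    """Drop every field not on the allowlist; generalise the ones that are."""
    out = {}
    for field, value in record.items():
        if field not in ALLOWED:
            continue  # name, email, account numbers etc. are removed entirely
        transform = ALLOWED[field]
        out[field] = transform(value) if transform else value
    return out
```

An allowlist fails closed: a new field added to the source system is dropped until someone decides it is needed. Note that free text such as `issue` passes through unchanged and may still contain personal data.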

A good AI data classification process should also consider where the data appears after the initial use. Personal data can move into prompt history, cached context, vector databases, exports, audit logs, screenshots and generated drafts. A field may be removed from the prompt but still remain in the source document or retrieval index. Teams should therefore classify the workflow, not just the single text box the user sees.
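Classifying the workflow rather than the text box can be as simple as mapping which fields land in each stage. In this hypothetical sketch, the stage names, fields and functions are invented for illustration. The customer's name has been removed from the prompt, yet it still sits in the source document and the retrieval index.

```python
# Hypothetical workflow map: each stage lists the fields it stores or passes on.
WORKFLOW: dict[str, set[str]] = {
    "source_document": {"name", "email", "complaint"},
    "retrieval_index": {"name", "complaint"},
    "prompt": {"complaint"},  # name removed before prompting
    "prompt_history": {"complaint"},
    "audit_log": {"email"},
    "generated_draft": {"complaint"},
}

# The complaint text is treated as personal data because it relates to the customer.
PERSONAL_FIELDS = {"name", "email", "complaint"}


def stages_holding(field: str, workflow: dict[str, set[str]]) -> list[str]:
    """Every stage where a field persists, not only the prompt."""
    return sorted(stage for stage, fields in workflow.items() if field in fields)


def workflow_report(workflow: dict[str, set[str]], personal: set[str]) -> dict[str, list[str]]:
    """Personal data found at each stage; stages without any are omitted."""
    return {
        stage: sorted(fields & personal)
        for stage, fields in workflow.items()
        if fields & personal
    }


print(stages_holding("name", WORKFLOW))  # ['retrieval_index', 'source_document']
```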

Examples

A support ticket contains a customer name, address, complaint detail and order number. The name and address are obvious identifiers, but the complaint text may also be personal data because it relates to the customer. If the ticket is summarised by AI, the summary may also contain personal data.

A sales operations team exports CRM notes for lead scoring. Individual names, emails and phone numbers are direct identifiers. Job titles, company names, territories, interaction history and scoring labels may also relate to identifiable contacts in context.

A product team uses call transcripts to find common issues. If transcripts include names, voices, account details or unique circumstances, they should be treated as personal data unless properly anonymised. Masking names alone may not be enough if the remaining story identifies the person.
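One rough way to test whether masking names is enough is to borrow the k-anonymity idea: count how many records share each combination of remaining attributes (quasi-identifiers). A combination that appears only once may point to one person. The records and field names below are hypothetical, and passing this check is not proof of anonymisation, because outside knowledge can still single someone out.

```python
from collections import Counter


def _key(row: dict, quasi_identifiers: list[str]) -> tuple:
    return tuple(row[q] for q in quasi_identifiers)


def group_sizes(rows: list[dict], quasi_identifiers: list[str]) -> Counter:
    """Count rows sharing each combination of quasi-identifier values."""
    return Counter(_key(row, quasi_identifiers) for row in rows)


def unique_rows(rows: list[dict], quasi_identifiers: list[str]) -> list[dict]:
    """Rows whose combination appears only once: likely re-identifiable despite masking."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [row for row in rows if sizes[_key(row, quasi_identifiers)] == 1]


# Hypothetical transcript metadata with names already masked.
rows = [
    {"name": "[MASKED]", "role": "nurse", "town": "Leeds", "issue": "billing"},
    {"name": "[MASKED]", "role": "nurse", "town": "Leeds", "issue": "refund"},
    {"name": "[MASKED]", "role": "head teacher", "town": "Settle", "issue": "billing"},
]

k = min(group_sizes(rows, ["role", "town"]).values())
print(k)  # 1
print(unique_rows(rows, ["role", "town"]))
```

Here the head teacher in Settle is the only record with that combination, so masking the name has not made the transcript anonymous.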

An internal search assistant indexes policy documents and HR guidance. The documents may look generic, but embedded examples, disciplinary records, sickness notes or named approvals can create personal data exposure inside the retrieval system.

Common misunderstandings

"PII and personal data are identical terms." They overlap, but they are not always used in the same way. UK GDPR analysis should normally use the term personal data.
"Only names and email addresses count." Indirect identifiers, online identifiers, location data and linked records can also identify people.
"Pseudonymised data is anonymous." It is not. The ICO says pseudonymised data remains personal data for UK GDPR purposes.
"Free text is low risk because it has no fixed fields." Free text can contain names, health details, complaints, opinions, addresses, reference numbers and other personal data.
"AI redaction is always reliable." Automated detection can help, but it can miss context, unusual names, combined identifiers and data embedded in documents or images.

Risks and boundaries

The main risk is under-classification. If teams label only a narrow set of fields as PII, they may leave personal data in prompts, exports, logs and training examples. The second risk is false anonymisation. Removing obvious identifiers may not be enough if the remaining data can still identify the person through context or linkage.

There is also a boundary between privacy and security language. Security teams often use PII to decide handling controls. Data protection teams need to decide whether UK GDPR applies, what lawful basis is used, what rights apply and whether additional safeguards are needed. The two views should support each other rather than compete.

For sensitive or high-impact workflows, classification should not be left to users guessing at the point of prompt entry. Use approved data categories, examples, tool restrictions and review steps. This article is not legal advice and does not replace a data protection assessment.

Another risk is treating PII detection as purely technical. Automated redaction tools are useful, but they can struggle with local formats, unusual names, context, images, handwriting and combinations of facts that identify someone only to people inside the organisation. Human review and workflow limits are still needed where the consequence of exposure is material.
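The limits of pattern-based redaction are easy to demonstrate. The sketch below uses two simple regular expressions as a stand-in for a redaction tool; the patterns, the `redact` function and the sample note are hypothetical, and real tools are more sophisticated. The structured identifiers are caught, but the name and the identifying context pass straight through.

```python
import re

# Simple patterns for two common identifier formats (illustrative, not exhaustive).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
UK_MOBILE = re.compile(r"\b07\d{3}\s?\d{6}\b")


def redact(text: str) -> str:
    """Replace matched emails and UK mobile numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return UK_MOBILE.sub("[PHONE]", text)


note = (
    "Call Siobhan Ní Bhriain on 07700 900123 or email siobhan@example.com. "
    "She is the only midwife at our Ely clinic who speaks Welsh."
)
print(redact(note))
# Call Siobhan Ní Bhriain on [PHONE] or email [EMAIL]. She is the only midwife at our Ely clinic who speaks Welsh.
```

Even if a name detector removed "Siobhan Ní Bhriain", the second sentence would still identify her to anyone who knows the clinic, which is why human review matters where exposure has real consequences.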

What to do next

Create a short data classification guide for AI use. Include examples of direct identifiers, indirect identifiers, special category data, confidential business data and data that should never be entered into unapproved tools. Make the examples specific to your workflows, not generic privacy training.

Then add controls at the points where data moves. For prompts, use rules and templates. For uploads, use approved locations and redaction steps. For RAG or search, review what is indexed and who can retrieve it. For vendors, check the data processing agreement (DPA) and retention settings. The aim is to reduce unnecessary personal data before it enters the AI workflow.

Add examples to the AI policy that people can recognise immediately. For instance: do not paste full customer complaints into open tools; remove names and account references before asking for wording help; do not upload HR files unless the tool and workflow are approved; treat transcripts as personal data until reviewed. Clear examples reduce reliance on staff interpreting abstract privacy language under pressure.

For leaders, the useful habit is to ask for concrete examples from each team. What personal data appears in sales notes, complaints, transcripts, invoices, HR records and screenshots? Which examples are safe for an AI tool, which need masking, and which should stay out completely? This turns classification into a practical operating rule rather than a theoretical definition.
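Once teams have answered those questions, the answers can be recorded as a simple decision table. This sketch is hypothetical: the category names, the outcomes and the `decide` function are illustrative placeholders for an organisation's own rules. Any combination nobody has explicitly decided is kept out by default.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "safe to use"
    MASK = "mask first"
    BLOCK = "keep out"


# Hypothetical team answers, keyed by (data category, tool approved?).
POLICY = {
    ("public", True): Action.ALLOW,
    ("public", False): Action.ALLOW,
    ("personal", True): Action.MASK,
    ("personal", False): Action.BLOCK,
    ("special_category", False): Action.BLOCK,
}


def decide(category: str, tool_approved: bool) -> Action:
    """Look up the agreed action; anything not explicitly decided is kept out."""
    return POLICY.get((category, tool_approved), Action.BLOCK)
```

Defaulting to "keep out" means a new data category or tool prompts a conversation rather than a silent approval.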


FAQs

Is PII a UK GDPR term?

Not usually. UK GDPR and ICO guidance focus on personal data. PII remains useful as a security and technical shorthand, but it should not replace UK personal data analysis.

Is an email address PII?

Usually yes, if it identifies or can be linked to an individual. It may also be personal data under UK GDPR if it relates to an identified or identifiable person.

Is an IP address personal data?

It can be. The ICO includes online identifiers such as IP addresses and cookie identifiers as possible personal data, depending on context.

Is masked data safe to use in AI tools?

It depends on the quality of masking and the remaining context. Pseudonymised or redacted data can still be personal data if people remain identifiable.

What is the best first control?

Stop unnecessary personal data entering the workflow. Data minimisation is usually more reliable than trying to clean everything after it has spread through tools, logs and outputs.

Sources
