Figure: The machine learning lifecycle, with attack surfaces at the data, model, platform and deployment stages.

What is adversarial machine learning?

Adversarial machine learning, often shortened to AML, is the field concerned with attacks that exploit the way machine learning systems are trained, queried, or deployed. It is the umbrella concept for AI-specific attacks such as evasion, data poisoning, privacy extraction, and some prompt-based attacks against generative systems. For leaders, it matters because it turns AI reliability into a security question: the model, its data, and its surrounding workflow can all be manipulated.

Reviewed by Jackie, Head of Learning & Development, Levellers · Last reviewed 8 June 2026

What this means

Traditional cyber security often focuses on servers, accounts, networks and code. Adversarial machine learning focuses on something more specific: the weaknesses created by the fact that a system learns from data and makes statistically driven predictions or generations.

A useful way to think about it is this. Ordinary software does what it has been explicitly programmed to do. Machine learning systems infer patterns from data. That gives them flexibility, but it also creates unusual attack paths. A malicious actor may be able to change the model's input, influence the data it learns from, probe it to learn sensitive information, or exploit the way it handles natural language and context.

AML is therefore a broad umbrella, not one single attack. In older predictive systems it includes evasion attacks, poisoning attacks, and privacy attacks such as extracting information from a model. In generative systems it also extends to attack classes such as direct prompting and indirect prompt injection. Taxonomies are still evolving, and different frameworks group attacks slightly differently, but the overall idea is stable: the behaviour of the ML system is being manipulated through ML-specific weaknesses.

This matters because many organisations now trust models with work that affects money, security, customer interactions, content, fraud detection, logistics, and internal knowledge. If those systems can be manipulated, you do not just have a model problem. You have a business control problem.

Why it matters

Adversarial machine learning matters because it sits at the intersection of trust, automation, and attack surface. The more an organisation relies on a model, the more attractive that model becomes as a target.

If a fraud model can be gamed, losses rise. If a computer vision model can be misled, safety and quality controls weaken. If a language model can be manipulated into leaking or acting outside policy, internal controls weaken. If a recommendation or ranking model is skewed, commercial behaviour changes. In each case, the model does not need to be "hacked" in the old sense for damage to happen.

Leaders should also care because AML cuts across teams. It is not owned by data science alone. Security teams, platform teams, risk owners, procurement, and business operators all influence the exposure. A strong model with poor access control, weak data governance, or no incident path is still a weak system.

How it works

Adversarial machine learning can affect a model at different stages of its life cycle. That is one reason the term is broad. The threat may arise before training, during training, at deployment, or after release through repeated queries and probing.

One major category is training-time manipulation. Here the attacker influences what the model learns. That may involve poisoning training data, corrupting labels, altering the training process, or introducing hidden behaviours that only appear later. Data poisoning is the best-known example and deserves its own treatment, but it makes sense as part of the wider AML family.
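To make training-time manipulation concrete, here is a deliberately tiny sketch of label poisoning. Everything in it is an assumption made for illustration: the single numeric feature, the transaction values, the "legit" and "fraud" labels, and the nearest-centroid classifier itself. Real models and real attacks are far more complex, but the mechanism is the same: a few mislabelled records shift what the model learns.

```python
from statistics import mean

# Illustrative only: one invented feature per transaction and a toy classifier.


def train_centroids(samples):
    """Learn the average feature value for each label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Assign the label whose learned average is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


clean_data = [
    (1.0, "legit"), (2.0, "legit"), (3.0, "legit"),
    (8.0, "fraud"), (9.0, "fraud"), (10.0, "fraud"),
]

# Hypothetical attacker contribution: fraud-like records labelled as legitimate.
poison = [(9.0, "legit"), (10.0, "legit"), (11.0, "legit"), (12.0, "legit")]

clean_model = train_centroids(clean_data)
poisoned_model = train_centroids(clean_data + poison)

suspicious_value = 7.0
print(predict(clean_model, suspicious_value))     # fraud
print(predict(poisoned_model, suspicious_value))  # legit
```

The training code is identical in both runs. Only the data changed, which is why provenance and review of training data matter as much as review of code.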

A second major category is deployment-time manipulation. Here the model is already trained, but an attacker crafts inputs that lead to wrong or unsafe behaviour. In predictive AI this can mean altered images, text, signals or other inputs that cause misclassification or degraded performance. In language models it often appears as prompt-based attacks that try to alter how the model interprets instructions and context.
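A minimal sketch of an evasion attack follows, assuming a simple linear classifier whose weights the attacker already knows. The weights, bias and feature values are invented for illustration. In practice attackers often have to estimate the direction of change through repeated probing rather than reading the weights directly, but the principle is the same: a small, targeted change to the input flips the decision.

```python
# Illustrative only: the weights, bias and feature values are invented.


def score(weights, bias, features):
    """Linear decision score; a positive score means the input is flagged."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def is_flagged(weights, bias, features):
    return score(weights, bias, features) > 0


def sign(value):
    return (value > 0) - (value < 0)


def craft_evasion(weights, features, epsilon):
    """Shift each feature by at most epsilon in the direction that lowers the score."""
    return [x - epsilon * sign(w) for w, x in zip(weights, features)]


weights = [0.8, -0.5, 1.2]  # assumed to be known to the attacker
bias = -1.0
original = [1.0, 0.5, 0.6]
adversarial = craft_evasion(weights, original, epsilon=0.2)

print(is_flagged(weights, bias, original))     # True
print(is_flagged(weights, bias, adversarial))  # False
```

No feature moves by more than 0.2, yet the input is no longer flagged. This is why robustness testing looks at small perturbations around realistic inputs, not only at accuracy on a clean test set.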

A third category is privacy compromise. Attackers may probe the system to infer what data was used in training, reconstruct sensitive information, or learn enough about the model to imitate or extract it. These attacks matter especially when the model has been trained on sensitive or proprietary material, or when the service exposes rich query interfaces.

A fourth category involves resource and system abuse. Some attacks aim less at wrong answers and more at wasting compute, reducing availability, or creating conditions in which the system becomes unreliable or expensive to operate.
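Both probing attacks and resource abuse depend on an attacker being able to send many queries. One widely used mitigation is a per-client query budget. The sketch below is a simple sliding-window limiter; the class name, the limits and the client identifier are hypothetical, and a production service would normally rely on its API gateway for this rather than hand-written code.

```python
from collections import defaultdict, deque


class QueryBudget:
    """Allow at most max_queries per client within a sliding time window."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client id -> timestamps of allowed queries

    def allow(self, client_id, now):
        """Return True and record the query if the client is within budget."""
        timestamps = self._history[client_id]
        # Forget queries that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False
        timestamps.append(now)
        return True


# Hypothetical policy: three queries per client per 60 seconds.
budget = QueryBudget(max_queries=3, window_seconds=60)
decisions = [budget.allow("client-a", now) for now in (0, 1, 2, 3, 61)]
print(decisions)  # [True, True, True, False, True]
```

Time is passed in explicitly so the behaviour is easy to test. A budget like this does not stop a patient attacker, but it raises the cost of probing and makes unusual query volumes visible in logs.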

Generative AI has broadened the picture further. NIST's current taxonomy distinguishes attacks on predictive systems from attacks on generative systems, while still treating them inside one adversarial ML field. That matters because many board-level conversations focus only on chatbots. In reality, AML also applies to computer vision, audio systems, ranking systems, cyber defence models, anomaly detection, and industrial ML.

It is also important to separate AML from ordinary cyber weaknesses. If an attacker steals credentials to access a model server, that is a serious security problem, but it is not necessarily an AML attack. Many current frameworks now draw a distinction between traditional cyber attacks on the surrounding environment and attacks that specifically exploit the learning or inference properties of the model. Both matter. They are just not identical.

For leaders, the practical view is that an ML system has several attack surfaces at once.

There is the environment, meaning infrastructure, networks, compute and storage.

There is the platform, meaning the application code, serving stack, APIs, and orchestration.

There is the model, meaning the learned artefact itself, including its behaviour, weights, prompts or configuration.

There is the data, meaning training, tuning, evaluation, retrieval and monitoring datasets.

AML becomes easier to manage when you see all four together. A secure model on an insecure platform is still exposed. A well-controlled API using untrusted training data is still exposed. A carefully guarded assistant with excessive tool permissions is still exposed.
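The last case, an assistant with excessive tool permissions, is the easiest to illustrate. The sketch below shows one way a platform might check every tool call against an explicit allowlist before it runs. The role name, tool names and exception class are hypothetical, and this is an assumption about how such a gate could look rather than a description of any particular product.

```python
# Hypothetical allowlist: each assistant role may call only the tools listed for it.
ALLOWED_TOOLS = {
    "support_assistant": {"search_knowledge_base", "create_ticket"},
}


class ToolNotPermitted(Exception):
    """Raised when an assistant requests a tool outside its allowlist."""


def authorise_tool_call(agent_role, tool_name, allowed=ALLOWED_TOOLS):
    """Permit the call only if the tool is explicitly listed for this role."""
    permitted = allowed.get(agent_role, set())  # unknown roles get no tools
    if tool_name not in permitted:
        raise ToolNotPermitted(f"{agent_role} may not call {tool_name}")
    return True


print(authorise_tool_call("support_assistant", "create_ticket"))  # True

try:
    authorise_tool_call("support_assistant", "issue_refund")
except ToolNotPermitted as error:
    print(error)  # support_assistant may not call issue_refund
```

The check sits outside the model. Even if a prompt-based attack persuades the assistant to request a refund, the platform refuses because that tool was never granted.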

Defence therefore has to be system-level. Current guidance from standards and security bodies tends to converge on a few durable ideas.

Build security in from the start. Do not bolt it on after deployment. Threat model the data flows, training inputs, model exposures, tools, and user roles before scale-up.

Treat supply chain integrity as fundamental. Know where your models, datasets, labels, libraries and artefacts came from. Verify them, version them, and keep records.

Apply least privilege. Models and agents should not have broad permissions just because broad permissions are convenient.

Test adversarially. Use red teaming, robustness testing, and continuous evaluation to see how the model behaves under pressure, not just under ideal prompts and benchmark tasks.

Monitor in production. Look for changes in input patterns, behaviour drift, unusual resource usage, leakage indicators, and new failure modes.

Plan response and recovery. If a model shows compromised behaviour, teams should know how to contain it, roll back, preserve evidence, and notify the right owners.
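Of these ideas, supply chain integrity is perhaps the simplest to show in code. A minimal sketch, assuming the organisation records a SHA-256 digest for each model artefact when it is approved, and checks that digest again before the artefact is loaded. The file name and contents are placeholders written to a temporary folder purely for the demonstration.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large model artefacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artefact_matches(path, expected_sha256):
    """Compare the file's current digest with the digest recorded at approval."""
    return hmac.compare_digest(sha256_of_file(path), expected_sha256.lower())


# Demonstration with a placeholder artefact in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.bin")  # hypothetical artefact name
    with open(path, "wb") as handle:
        handle.write(b"approved model weights")
    recorded_digest = sha256_of_file(path)  # stored in the artefact register

    intact_before = artefact_matches(path, recorded_digest)

    with open(path, "ab") as handle:
        handle.write(b"unexpected change")  # simulated tampering
    intact_after = artefact_matches(path, recorded_digest)

print(intact_before, intact_after)  # True False
```

A digest check only proves the artefact has not changed since it was recorded. It says nothing about whether the artefact was trustworthy in the first place, which is why provenance records and review still matter.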

One further point is worth stressing. The taxonomy is still moving. Different organisations classify the same phenomenon in slightly different ways. NCSC has recently proposed seven attack classes to help security teams think more clearly. NIST maintains a broader taxonomy that spans predictive and generative systems. MITRE ATLAS provides a living knowledge base of adversary tactics and techniques. This is not a sign that the field is confused beyond use. It is a sign that the field is maturing quickly. Leaders should respond by insisting on plain definitions in internal policy rather than pretending every term is settled forever.

Examples

A finance team uses a model to flag anomalous transactions. If a malicious actor learns how the model reacts to certain patterns and adjusts behaviour accordingly, the defensive model can be sidestepped even without access to the underlying code.

A manufacturing or logistics team uses computer vision to classify items or detect defects. If the model is vulnerable to altered inputs, quality checks can be weakened and operational decisions skewed.

A customer support organisation deploys a language model connected to knowledge sources and internal tools. Here AML is not just about toxic outputs. It includes prompt-based manipulation, indirect instructions hidden in content, privacy compromise and tool misuse.

A security team uses ML to rank alerts. If an adversary can bias what the model sees or infer how it scores, the model may stop helping defenders and start helping the attacker.

Common misunderstandings

One misunderstanding is that adversarial ML is only about strange image tricks against computer vision. That is a classic example, but the field is much broader and now clearly includes generative systems.

Another is that this is only relevant to frontier or public AI tools. In practice, ordinary enterprise models can be exposed if they shape real decisions or interact with sensitive data.

A third is that stronger models automatically fix the problem. Better models can improve resilience in some cases, but they do not remove the underlying attack surface created by learning systems.

A fourth is that AML replaces standard cyber security. It does not. Standard cyber controls remain foundational. AML adds model-specific concerns on top.

Risks and boundaries

Not every organisation needs the same depth of AML programme. The right level depends on how central the model is, what harm it could cause, and how much access it has. Over-engineering controls for a low-risk internal drafting tool can waste effort. Under-engineering them for a fraud, identity, safety, or agentic system is harder to justify.

It is also important to accept that some defences remain immature. Testing methods, detection methods, and robustness claims can vary widely. Leaders should prefer evidence from realistic internal evaluation over broad marketing statements.

This article is a practical explainer, not professional security advice. If an ML system affects money movement, physical safety, regulated advice, or access to sensitive data, specialist review is sensible.

What to do next

First, identify which ML systems actually matter. List the models that influence decisions, automate actions, or touch sensitive data. These are your priority AML assets.

Second, threat model each one across the life cycle. Ask what could go wrong at training, tuning, deployment, querying, monitoring and update stages.

Third, map dependencies. Record datasets, third-party models, artefact stores, serving infrastructure, tool connections, and who can change what.

Fourth, set baseline controls. Access control, versioning, provenance, logging, deployment review, rollback mechanisms, and incident ownership should exist before scale-up.

Fifth, test beyond normal use. Run adversarial evaluation, not just happy-path testing. Include attempts to degrade performance, leak information, skew behaviour, or misuse connected actions.

Sixth, put monitoring into operations. Track drift, unusual inputs, resource abuse, and behaviour changes after updates or data refreshes.

Finally, ask suppliers direct questions. What AML testing do they do, what risks do they cover, how often do they retest, how do they handle disclosure, and what evidence can they provide?
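The monitoring step in the list above can start very simply. The sketch below compares the average of a recent batch of one input feature with a trusted baseline and raises a flag when the gap is statistically large. The feature values and the threshold of 3.0 are assumptions chosen for illustration; real monitoring usually tracks many features and model outputs, and uses more robust tests.

```python
from math import sqrt
from statistics import mean, stdev


def drift_z_score(baseline, recent):
    """How many standard errors the recent batch mean sits from the baseline mean."""
    standard_error = stdev(baseline) / sqrt(len(recent))
    difference = mean(recent) - mean(baseline)
    if standard_error == 0:
        # A constant baseline: any difference at all counts as extreme.
        return 0.0 if difference == 0 else float("inf")
    return difference / standard_error


def has_drifted(baseline, recent, threshold=3.0):
    """Flag the batch when its mean is further from baseline than the threshold allows."""
    return abs(drift_z_score(baseline, recent)) > threshold


# Invented values for a single input feature.
baseline = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]
normal_batch = [10, 11, 10, 12, 11]
shifted_batch = [14, 15, 13, 16, 14]

print(has_drifted(baseline, normal_batch))   # False
print(has_drifted(baseline, shifted_batch))  # True
```

A flag like this does not say whether the shift is an attack, a data pipeline fault or a genuine change in the business. It tells an owner that something needs looking at, which is the point of monitoring.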

FAQs

Is adversarial machine learning only about generative AI?

No. It applies to predictive and generative systems alike. Generative AI has made the topic more visible, but AML also covers vision, fraud, ranking, speech and other machine learning applications.

What sits under the AML umbrella?

Common examples include evasion attacks, poisoning attacks, privacy compromise, model extraction, and some prompt-based attacks in generative systems. Exact taxonomy varies by framework.

Is prompt injection the same thing as AML?

Prompt injection is generally treated as one class of attack inside the broader AML landscape for generative systems. AML is the wider parent concept.

Do standard cyber controls still matter?

Very much so. Identity, network security, logging, patching, supply chain controls and incident response are still foundational. AML adds model-specific concerns on top.

How do we know whether AML is a board issue?

It becomes a board or senior leadership issue when ML systems shape material decisions, interact with customers, hold sensitive data, or have enough privileges that failure could create major harm.

Is the field settled?

No. Terminology and taxonomies are still evolving. That is why organisations should define terms clearly in policy and focus on practical risk reduction rather than perfect vocabulary.

Sources

Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (National Institute of Standards and Technology). Primary. Core taxonomy for adversarial ML across predictive and generative systems, including evasion, poisoning, privacy, direct prompting and indirect prompt injection.
Understanding adversarial attacks against Machine Learning and AI (National Cyber Security Centre). Primary. Supports the distinction between ML-specific attacks and broader cyber attacks, the seven attack classes, and the point that terminology remains overloaded in places.
Machine learning principles (National Cyber Security Centre). Primary. Supports the secure by design framing that ML vulnerabilities must be considered throughout the system life cycle.
SAFE-AI: A Framework for Securing AI-Enabled Systems (MITRE). Primary. Supports the model, data, platform and environment view of AI-enabled system security and the need for AI-specific controls.
Securing Machine Learning Algorithms (ENISA). Primary. Corroborates the broader European security view that ML systems face threats including data poisoning, adversarial attacks and data exfiltration.
Guidelines for secure AI system development (National Cyber Security Centre). Primary and corroborative. Supports the secure design, development, deployment and maintenance framing for AI systems.
