What is AI security?
AI security is the practice of protecting AI systems, the data and models they rely on, and the organisation using them from attack, misuse, leakage and manipulation. It includes ordinary software and cloud security, but also AI-specific risks such as prompt injection, data poisoning, model theft, memorised data leakage, insecure agent actions and hidden risk in third-party models, datasets and tools.
What this means
AI security matters because modern AI systems are not just another application on the network. They combine software, models, prompts, data stores, user conversations, external tools and, increasingly, the ability to take actions. That creates a wider and stranger attack surface than most leaders are used to thinking about.
A normal business system generally does what it was programmed to do unless someone breaks in, changes the code or abuses a weak process. An AI system can also be manipulated through the content it is allowed to read. A document, email, web page, support ticket or chat message can become part of the attack path. If the system can search internal files, call tools or update records, a bad answer can turn into a real business event.
This is why AI security is not only about keeping attackers out. It is also about controlling what the model can see, what it is allowed to do, how much trust you place in its output, and how quickly you can spot and contain failure. For leaders, the practical question is simple: where does this AI get its instructions, where does it get its data, what can it influence, and what is the damage if it is wrong or manipulated?
Why it matters
Traditional cyber security asks familiar questions. Can someone break into the system, steal data, alter records, run code, disrupt service or move laterally through the estate? Those questions still matter for AI. Every AI system still depends on ordinary foundations such as identity, access control, logging, patching, supplier control and secure infrastructure.
What changes is that AI systems introduce a new layer of behaviour between input and action. Large language models and other machine learning systems do not respond through fixed logic alone. They interpret patterns, probabilities and context. That means ordinary-looking content can influence behaviour in ways that are harder to predict, test and lock down than a conventional rules-based system.
What is genuinely new
The biggest shift is that AI systems often blur the line between instructions and data. In a conventional app, a user query, a PDF or a database record is usually treated as data. In an AI system, especially one built on a large language model, that same content may also shape behaviour because the model processes it as language. This is why prompt injection exists and why indirect prompt injection matters so much. The model may not reliably separate trusted guidance from untrusted content.
The second shift is that the data pipeline becomes part of the security boundary. Training data, fine-tuning data, user feedback, retrieved documents and memory stores can all become attack vectors. If someone can manipulate the material an AI system learns from or consults at answer time, they may change behaviour without touching the core application code.
The third shift is that AI systems can be non-deterministic. The same prompt may not always produce exactly the same response, and model behaviour can change after a vendor update, a different retrieval result, a configuration tweak or a small change in context. That does not make them unusable, but it does make assurance harder. Security testing cannot stop at one successful check.
The fourth shift is agency. A model that only drafts text is one thing. A model that can search the web, read inboxes, query internal drives, open tickets, write code, change records or trigger payments is another. Once an AI system can act in external systems, the security question moves from "can it say the wrong thing?" to "can it do the wrong thing?"
The fifth shift is dependency. Many organisations do not build their own models. They buy AI capability through cloud providers, productivity suites, copilots, model APIs, agent frameworks and connectors. That is sensible, but it means the organisation depends on third-party model behaviour, training practices, update cycles, sub-processors and tool ecosystems that may be partly opaque.
What is familiar, but changed
Many AI risks are not brand new. Supplier risk, data leakage, weak permissions, insecure APIs and poor monitoring all existed before generative AI. What changes is speed, scale and ease of abuse. A misconfigured shared drive was already a problem. Put a natural-language copilot on top of it and the same problem becomes simpler to exploit and harder to notice. A weak approval process was already risky. Add voice cloning or an over-permitted finance agent and the same weakness becomes more dangerous.
So AI security is best understood as an extension of cyber security, not a replacement for it. The fundamentals still matter. In fact, weak fundamentals usually become more costly once AI is added. But leaders also need to understand the genuinely new attack surface: models that can be manipulated by allowed inputs, data pipelines as attack paths, probabilistic behaviour, tool-using agents and hidden dependency on third-party model stacks.
How it works
The AI threat map
No single taxonomy covers every AI risk, but two are especially useful. The OWASP Top 10 for Large Language Model Applications gives business and product teams a practical application-level map. MITRE ATLAS gives defenders a living knowledge base of adversary tactics and techniques against AI-enabled systems. Taken together, they help separate headline noise from the threats that matter in real deployments.
Prompt injection, including indirect prompt injection
Prompt injection is the most distinctive threat in many generative AI deployments. It happens when a model is steered by malicious or manipulative instructions in a way the system owner did not intend. A direct prompt injection comes from the user entering text into the interface. An indirect prompt injection comes from untrusted content the model reads as part of its work, such as a web page, file, email, support ticket, note field or retrieved document snippet.
For a leader, the core point is this: if an AI system reads untrusted content and is also trusted to answer questions, reveal information or take actions, then that content may become part of the control path. An attacker does not always need to break into the system. They may only need to get carefully crafted material into a place the model will later read. This is why retrieval-based assistants and agentic systems need extra care. Prompt injection is not just about rude answers. It can lead to data exposure, hidden instruction leakage, unsafe tool use, false summaries and workflow manipulation.
This threat also differs from classic injection problems in one uncomfortable way. Flaws such as SQL injection can be closed off by strictly separating commands from data, for example with parameterised queries. Language models have no equally reliable way to keep the two apart. Good design can reduce likelihood and cut damage, but leaders should not assume a single filter will make the problem disappear.
Jailbreaks and guardrail evasion
A jailbreak is an attempt to bypass a model's behavioural restrictions. In plain terms, someone tries to get the AI to ignore its rules, reveal hidden instructions, produce disallowed material or carry out a restricted action. Guardrails are the controls used to keep the system within expected boundaries, such as hidden prompts, refusal training, safety classifiers, filters and tool restrictions. Guardrail evasion is the broader category of getting around those controls.
For business readers, the important point is that a jailbreak is not only a public-relations issue. If your assistant is meant to refuse sensitive customer queries, avoid speculative advice, stay within approved workflows or ask for human confirmation before acting, a successful bypass can undermine those business controls. Safety layers help, but they are not a secure boundary on their own. High-trust actions need external controls as well.
Data poisoning of training or fine-tuning data
Data poisoning means tampering with the data used to train, fine-tune or update an AI system so that its later behaviour is skewed. Fine-tuning means extra training on organisation-specific examples or feedback so that a model better matches a local task. If that material is manipulated, the model may become less reliable, more biased, easier to trigger in a certain direction or quietly backdoored.
Poisoning is not limited to frontier model builders. It matters to any organisation that curates a knowledge base, ingests user feedback, retrains on internal content, or allows staff and customers to submit material that later affects model behaviour. A support assistant retrained on bad ticket data, a moderation model updated from manipulated feedback, or a retrieval system fed with unverified documents can all be nudged off course. The business harm may show up as poor decisions, hidden failure, brand damage or security bypass rather than as a dramatic technical incident.
Model theft and extraction
Model theft is the unauthorised acquisition of a model or its valuable behaviour. That can happen through direct theft of model weights or through model extraction, where an attacker uses repeated queries to reproduce enough of the model's functionality to create a useful copy or understand how it behaves. In commercial terms, this can be intellectual-property loss. In security terms, it can also help an attacker study the model offline, discover weak points, bypass protections or infer sensitive details about what the model has learned.
This matters most where a model provides real competitive advantage, contains sensitive decision logic, or is used in a security-sensitive function. Leaders should think about model access, API exposure, logging, rate limits, anomalous query detection and the degree of detail returned to users. If a service returns detailed confidence scores or exposes internal state, it may give an attacker more material to work with than intended.
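To make the rate-limit idea concrete, here is a minimal sketch of a per-client sliding-window limiter in Python. The class name, the limits and the client identifier are all illustrative assumptions, not taken from any particular product, and a real service would pair this with alerting and anomaly detection.

```python
from collections import defaultdict, deque


class QueryRateLimiter:
    """Sliding-window limiter: at most `max_queries` per `window_seconds` per client."""

    def __init__(self, max_queries: int, window_seconds: float):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> timestamps of allowed queries

    def allow(self, client_id: str, now: float) -> bool:
        timestamps = self._history[client_id]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False  # over the limit: reject, and ideally raise an alert
        timestamps.append(now)
        return True


# Hypothetical limit: three queries per minute for each client.
limiter = QueryRateLimiter(max_queries=3, window_seconds=60.0)
decisions = [limiter.allow("client-a", now=t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
```

A limiter like this will not stop a patient attacker, but it raises the cost of high-volume extraction and gives monitoring a clear signal when one client keeps hitting the ceiling.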
Adversarial inputs and evasion attacks
Adversarial inputs are carefully crafted inputs designed to make a model misread what it is seeing. In a language model that may mean text. In other systems it may be an image, a voice sample, a document scan, sensor data or a behavioural pattern. Evasion means pushing the model into the wrong classification or judgement at the moment of use while the input still appears normal or acceptable to a human observer.
This matters beyond chatbots. Fraud scoring, spam detection, biometric checks, document processing, quality inspection, safety monitoring and computer-vision systems can all be affected. A business does not need to understand the underlying maths to govern the risk. It needs to know where the organisation relies on model judgement in place of hard rules, what harm follows from false negatives or false positives, and what fallback exists when model confidence is weak.
Sensitive data leakage and memorisation
AI systems can leak information in more than one way. The obvious way is by being granted access to material they should never see. The less obvious way is by revealing sensitive content through their outputs. That content may come from the current prompt, chat history, retrieved documents, tool responses, hidden instructions or, in some cases, memorised fragments from training data.
This is one reason leaders should avoid treating the model itself as a private vault. Hidden system prompts are not a strong secret store. Chat history can resurface in surprising ways. Staff can paste confidential data into public tools. Research has shown that language models can reproduce memorised training snippets under some conditions. The practical lesson is not that every model will leak everything, but that confidential data should be minimised, access should be scoped, retention should be understood, and outputs should be treated as something that may need filtering and review.
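Output filtering can be as simple as a redaction pass before text leaves the system. The sketch below is a minimal Python illustration. The two patterns and the placeholder format are assumptions chosen for the example, and a real deployment would need patterns matched to its own data, plus review for what pattern matching misses.

```python
import re

# Illustrative patterns only: an email address and a 13-16 digit card-like number.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace matches with a placeholder and report which pattern types fired."""
    found = []
    for label, pattern in REDACTION_PATTERNS.items():
        text, count = pattern.subn(f"[{label} REDACTED]", text)
        if count:
            found.append(label)
    return text, found


clean_text, flagged = redact("Contact jane@example.com, card 4111 1111 1111 1111.")
```

Returning the list of pattern types that fired lets the surrounding system log the event or route the response for human review rather than silently passing it on.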
Insecure tool use and excessive agency in agentic systems
An agentic system is an AI system that can decide which tools to call and in what order to pursue a goal. This is where AI security becomes very operational. A model may search internal systems, draft emails, modify records, run scripts, create tickets, approve requests or trigger downstream automations. If permissions are too broad, a manipulated or mistaken model can cause real business damage without any classic network intrusion.
Excessive agency means giving the AI too much freedom, too many actions, or too little supervision for the task. The risk does not only come from malicious prompts. It also comes from ambiguity. A model may misunderstand its goal, choose the wrong tool, loop excessively, use the wrong data source or act confidently on a false premise. Read-only access is safer than write access. Drafting is safer than dispatching. Recommendation is safer than execution. Good architecture keeps those distinctions clear.
Supply chain and third-party risk
Very few AI systems are single products. They are stacks. A business assistant may rely on a foundation model, model host, retrieval layer, document parser, embeddings service, agent framework, safety filter, plugin set, productivity suite, web search provider and cloud platform. Datasets, open-source libraries and fine-tuning services add more layers.
That means AI security is also supply chain security. A weak component, poisoned dataset, compromised library, unvetted plugin, unsafe connector or silent vendor update can change your risk profile. Traditional software supply chain concerns still apply, but AI adds model and dataset provenance, behavioural changes after model updates, and the chance that third-party content alters behaviour at runtime. Leaders should want a clear inventory of what the AI stack actually depends on, what changes without notice, and how the provider communicates incidents and model updates.
How AI changes the wider threat picture
AI security is not only about protecting AI systems themselves. AI also changes the wider threat picture around the organisation. The near-term pattern is not usually a brand new class of attack. It is a force multiplier. Familiar attacks become cheaper, faster, more personalised and easier to run at scale.
Scaled phishing and social engineering
Generative AI makes it easier to produce convincing text in the right tone, language and local style. That matters because phishing often works at the margins. Poor spelling, awkward phrasing and generic wording used to be warning signs. AI lowers that friction. It also helps attackers personalise messages more quickly, summarise public information about targets and sustain conversations for longer. The practical implication is not that everyone will suddenly be fooled, but that message quality alone is no longer a reliable comfort.
Deepfake-enabled impersonation and fraud
Audio and video impersonation raise the risk around payment approvals, executive requests, helpdesk resets and urgent exception handling. A fake voice note from a senior leader, a video call that appears to show a known contact, or seemingly authentic promotional media can all be used to create pressure and credibility. This is especially dangerous in organisations that still rely on urgency, familiarity or seniority as substitutes for verification.
The answer is procedural as much as technical. High-risk requests need independent verification paths, callback rules, dual approval and clear staff permission to slow things down when something feels wrong.
Faster reconnaissance and vulnerability work
AI helps attackers process information at pace. It can assist with code review, configuration analysis, data triage and the summarising of stolen material. For capable actors, it may speed up vulnerability discovery and exploitation planning. For less capable actors, it can lower the barrier to producing basic scripts, improving lures and handling more victims at once.
That does not mean criminals have acquired magical new powers. The strongest gains still tend to appear where attackers already have skill, good data and time. But it does mean defenders lose some of the advantage that came from attacker effort being expensive.
Why this should sharpen, not distort, judgement
Leaders should resist two bad reactions. The first is panic. The second is complacency. AI does not make basic security obsolete. If anything, it makes weak basics more costly. Multi-factor authentication, patching, access reviews, approval controls, shared-drive discipline, supplier management and staff verification habits still block a large share of real harm. AI changes the tempo and scale of attack far more than it changes the laws of security.
Data, identity and access
Data, identity and access are where many AI security issues become real. Most organisations do not suffer harm because a model wrote a clumsy answer. They suffer harm because the model could reach data it should not see, or because it was trusted to act with permissions that were too broad.
Data reach becomes the real issue
An internal assistant can inherit the messiness of the estate it sits on top of. Open folders, stale permissions, poorly labelled confidential files, legacy shared mailboxes and copied data stores become far easier to interrogate once a model can search and summarise them in plain language. The problem is often not "the AI leaked data" as a separate event. The problem is that the AI made already weak data governance instantly more exploitable.
This is especially important for systems that pull live context from document stores, knowledge bases or collaboration tools at answer time. If the source material is overexposed, unverified or badly segmented, the model will reflect those weaknesses. The safest data for AI is data you genuinely need, from sources you trust, under permissions you understand.
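One way to picture this is retrieval that checks permissions before anything reaches the model. The Python sketch below is a simplified illustration. The `Document` type, the group names and the keyword match (standing in for a real search or embedding lookup) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # groups permitted to read this document


def retrieve_for_user(query: str, user_groups: set[str], corpus: list[Document]) -> list[Document]:
    """Return matching documents the user is already entitled to read."""
    # Permission check happens first, so restricted text never reaches the model.
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    # Naive keyword match stands in for a real search or embedding lookup.
    return [d for d in visible if query.lower() in d.text.lower()]


corpus = [
    Document("handbook", "Expenses policy: claims need receipts.", frozenset({"all-staff"})),
    Document("payroll", "Expenses and salary data for directors.", frozenset({"hr"})),
]
results = retrieve_for_user("expenses", {"all-staff"}, corpus)
```

The design choice worth noting is the order of operations: filtering by the asking user's entitlements before retrieval means the assistant cannot summarise what that user could not have opened directly.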
Least privilege matters more with AI
Least privilege means giving the minimum access needed for the task, no more. With AI that principle becomes even more important because the system can combine, summarise and repurpose what it sees. A human might never manually browse ten connected systems in one sitting. An AI assistant can do the equivalent in seconds if you allow it.
For leaders, this means insisting on narrow scopes. Start with read-only where possible. Separate read from write. Separate drafting from sending. Separate recommendation from approval. Time-limit credentials where you can. Use role-based access control, or another disciplined permissions model, so the AI only reaches what the specific user or workflow should reach.
Identity for agents, tools and automations
Every meaningful tool-using AI workflow should have a clear identity model. That means knowing which user identity is involved, which service identity the AI uses, which connectors are authorised, and how actions are logged. Shared service accounts and broad inherited permissions are especially risky because they make it hard to know who did what and harder to contain damage.
A practical pattern is one identity per agent or automation, with tightly scoped rights and strong logging. If an assistant helps the sales team draft notes, it should not quietly inherit finance permissions. If an agent can create tickets, that does not mean it should be able to close them, change access rights or spend money.
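The one-identity-per-agent pattern can be sketched in a few lines of Python. The class, the scope strings such as `ticket:create` and the user name are illustrative assumptions. In practice the identity and its scopes would live in the organisation's identity provider rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class AgentIdentity:
    name: str
    scopes: frozenset[str]  # actions this agent may perform
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def authorise(self, action: str, on_behalf_of: str) -> bool:
        allowed = action in self.scopes
        # Record every attempt, allowed or not, against the human user involved.
        self.audit_log.append((on_behalf_of, action, allowed))
        return allowed


# Hypothetical agent that may create tickets and nothing else.
ticket_agent = AgentIdentity("support-ticket-agent", frozenset({"ticket:create"}))
can_create = ticket_agent.authorise("ticket:create", on_behalf_of="alice")
can_close = ticket_agent.authorise("ticket:close", on_behalf_of="alice")
```

Logging denied attempts alongside allowed ones matters: a run of refusals for actions outside an agent's scope is often the first visible sign of manipulation or misconfiguration.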
Human oversight still needs design
Human oversight is often presented as the answer to AI risk. It helps, but only if it is designed properly. A nominal approval step is weak if the reviewer has poor context, too little time or excessive trust in the system. Oversight works best when the AI presents its source basis clearly, the action is reversible where possible, the reviewer has the authority to stop the process, and high-impact decisions require deliberate confirmation rather than passive acceptance.
For mid-sized organisations, a useful rule is simple: the more irreversible, sensitive or regulated the action, the less freedom the AI should have to complete it without an accountable human decision.
Examples
Internal knowledge copilot over shared drives
A mid-sized professional services firm enables an internal assistant across collaboration tools and shared folders. Staff quickly discover they can ask natural-language questions across large amounts of internal material. The benefit is real, but so is the risk. Old permissions, stale HR folders and duplicated finance files become much easier to search. The sensible response is not to turn the tool off forever. It is to clean up access rights first, restrict the assistant to approved sources, segment sensitive areas and log queries against user identity.
Customer support bot with tool access
A company launches a support assistant that reads help-centre content and can create tickets. Customers can also upload screenshots and documents. A hidden instruction in an uploaded file nudges the model away from its intended behaviour. The right control set includes treating uploaded and retrieved content as untrusted, preventing the model from taking sensitive actions without confirmation, keeping tool permissions narrow, and routing edge cases to a human support team rather than letting the assistant improvise.
Finance assistant facing impersonation pressure
A finance manager receives what appears to be an urgent voice message from a senior executive asking for a supplier change and payment release. At the same time, an AI assistant is being piloted to prepare payment instructions from emails and invoices. The risk is not only the fake message. It is the combination of impersonation pressure and an automation path that can move too quickly. A safer design separates message handling from payment execution, requires dual approval, uses known callback routes, and prevents the assistant from releasing funds directly.
HR screening assistant using external AI services
An HR team adopts an AI tool to summarise CVs, interview notes and hiring discussions. The team gains speed, but the material contains sensitive personal data and commercially sensitive assessments. The security issue is not just whether the tool is accurate. It is whether the vendor retains prompts, whether data is reused, who can access the account, what logs exist, and whether the tool is connected to wider systems. Good control here means approved procurement, clear retention settings, minimal data transfer, strict access rights and a rule that final hiring decisions remain with accountable humans.
Common misunderstandings
"AI security is just ordinary cyber security with a new label." Not quite. Traditional controls still matter, but AI adds attack paths through content, data pipelines, model behaviour and tool-using agents.
"We only use off-the-shelf tools, so this is the vendor's problem." Vendor security matters, but your permissions, data handling, integrations, approval flows and staff behaviour still determine much of the real risk.
"Prompt injection is just bad prompting." No. It is a structural risk that comes from letting a model process untrusted content while also trusting it to follow instructions, reveal information or trigger actions.
"Hidden prompts and guardrails keep the system safe." They help, but they are not a strong security boundary. High-impact actions still need external controls such as permissions, confirmations, logging and kill switches.
"If a human reviews the answer, we are covered." Human review helps only when it is well designed. People miss things, get rushed and can over-trust polished AI output.
"Small organisations are too small to worry about this." Many of the easiest harms target weaker controls, not famous brands. Smaller firms can be especially exposed if they adopt AI quickly on top of messy permissions and informal approvals.
Risks and boundaries
Defences and controls
There is no single control that "solves" AI security. The right approach is layered. Some controls sit outside the model, such as access control and approvals. Some sit around it, such as monitoring and tool restrictions. Some sit inside the development process, such as threat modelling and testing. The aim is not perfection. The aim is to reduce likelihood, shrink damage and make failure easier to detect and contain.
Set boundaries before you build or buy
Start by deciding which use cases are suitable for AI and which are not. A drafting assistant for low-sensitivity content is very different from an agent that can change customer records or influence hiring decisions. If the task is high impact, difficult to reverse or heavily regulated, the organisation should set tighter rules from the start. Good boundaries are a security control.
Threat model the whole workflow
Do not threat model only the model. Map the full chain: users, prompts, hidden instructions, retrieved content, files, external tools, APIs, data stores, human approvals and final actions. Ask four questions for each step. What can enter here? What can be influenced here? What can leave here? What damage follows if this step is wrong, manipulated or over-permitted? This brings AI back into a practical risk conversation the business can govern.
Treat inputs, retrieved content and model outputs as untrusted
Many AI incidents start because organisations trust model input or output too early. Untrusted content should stay untrusted even when the model has read it. Retrieved text, user uploads, website content and tool responses should be validated, constrained and, where necessary, sanitised before they affect sensitive actions. Likewise, model output should not be executed, rendered with elevated trust or passed into downstream systems without checks.
This is one reason prompt injection filtering on its own is not enough. A better pattern is to assume some malicious or misleading content will get through and make sure the surrounding system limits what happens next.
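As an illustration of checking model output before it touches a downstream system, the Python sketch below validates a proposed ticket request against a strict shape. The field names, the length limit and the priority values are assumptions made up for the example.

```python
import json

ALLOWED_PRIORITIES = {"low", "normal", "high"}


def parse_ticket_request(model_output: str) -> dict:
    """Validate model output as a ticket request; raise ValueError if it does not conform."""
    try:
        data = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError("output is not valid JSON") from exc
    # Reject anything with missing or extra fields rather than ignoring them.
    if not isinstance(data, dict) or set(data) != {"title", "priority"}:
        raise ValueError("unexpected fields")
    title, priority = data["title"], data["priority"]
    if not isinstance(title, str) or not 0 < len(title) <= 120:
        raise ValueError("title must be 1-120 characters")
    if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
        raise ValueError("priority not allowed")
    return {"title": title, "priority": priority}


ticket = parse_ticket_request('{"title": "Printer offline", "priority": "high"}')
```

Validation of this kind does not detect a manipulated model. It narrows what a manipulated model can ask the rest of the system to do, which is the point of limiting what happens next.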
Constrain tools, actions and permissions
If an AI system can act, constrain what it can act on. Use allowlists for tools. Limit network access. Prefer read-only connectors first. Require human confirmation for irreversible actions. Split planning from execution where possible so the model can suggest but not directly complete a sensitive step. Make sure spending, account changes, privileged access, external publication and customer-facing commitments have explicit control points.
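A tool allowlist with a confirmation gate might look something like the following Python sketch. The tool names, the registry layout and the `confirmed_by` parameter are hypothetical. The point is only that the check sits outside the model, in code the model cannot talk its way around.

```python
from typing import Callable, Optional


def search_kb(query: str) -> str:
    return f"results for {query!r}"


def issue_refund(order_id: str) -> str:
    return f"refund issued for {order_id}"


# Hypothetical registry: tool name -> (function, needs human confirmation).
TOOL_ALLOWLIST: dict[str, tuple[Callable[[str], str], bool]] = {
    "search_kb": (search_kb, False),     # read-only, runs freely
    "issue_refund": (issue_refund, True),  # irreversible, needs a named approver
}


def run_tool(name: str, argument: str, confirmed_by: Optional[str] = None) -> str:
    if name not in TOOL_ALLOWLIST:
        raise PermissionError(f"tool {name!r} is not on the allowlist")
    tool, needs_confirmation = TOOL_ALLOWLIST[name]
    if needs_confirmation and confirmed_by is None:
        raise PermissionError(f"tool {name!r} requires human confirmation")
    return tool(argument)


search_result = run_tool("search_kb", "vpn setup")
refund_result = run_tool("issue_refund", "order-42", confirmed_by="finance.lead")
```

Anything the model requests that is not in the registry fails closed, and the irreversible action cannot run unless a named person has approved it.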
Test with adversarial pressure, not only happy paths
AI systems need more than ordinary user acceptance testing. They need structured adversarial testing, often called red-teaming, that tries to surface prompt injection, jailbreaks, leakage, unsafe tool use, misleading retrieval, edge-case failures and permission boundary breaks. This testing should happen before launch and after meaningful changes. If your provider updates the model, your security assumptions may need retesting even if your own code did not change.
Monitor behaviour, inputs and drift
Logging is essential because AI failures may look subtle before they look dramatic. Monitor unusual prompt patterns, repeated refusals, sudden behaviour shifts, spikes in tool use, unexpected cost growth, suspicious file access, anomalous connector activity and output that indicates hidden instruction leakage or excessive confidence. Monitoring should focus not only on availability but on changes in behaviour over time.
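A spike in tool use is one of the easier signals to check automatically. The Python sketch below flags a count that sits well above a recent baseline. The three-standard-deviation threshold and the sample numbers are illustrative assumptions, and real monitoring would usually account for time of day and seasonality.

```python
from statistics import mean, stdev


def is_spike(history: list[int], current: int, threshold: float = 3.0) -> bool:
    """Flag `current` if it is more than `threshold` standard deviations above the baseline mean."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline, spread = mean(history), stdev(history)
    if spread == 0:
        return current > baseline
    return (current - baseline) / spread > threshold


# Hypothetical hourly tool-call counts for one agent.
hourly_tool_calls = [12, 15, 11, 14, 13, 12]
normal_hour = is_spike(hourly_tool_calls, 14)
runaway_hour = is_spike(hourly_tool_calls, 60)
```

The same check can be pointed at other counters mentioned above, such as refusals, file reads or cost per hour.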
Manage vendors and the AI supply chain with discipline
Vendor due diligence should go beyond "is this a reputable name?" Ask what model is used, whether your data is retained or used for training, what sub-processors are involved, how permissions work, what logs are available, how incidents are handled, how model changes are communicated, how plugins and connectors are isolated, and whether you can disable features you do not need. A good provider reduces risk. It does not remove your part of it.
Prepare incident response for AI-specific failure
AI incidents do not always look like ordinary breaches. You may need to respond to a manipulated assistant, a leaking prompt, a runaway agent, a poisoned knowledge base or a harmful model update. Prepare for this in advance. Know how to disable a connector, revoke a token, turn off a tool, pull a model from service, preserve logs and communicate clearly to users. A simple kill switch is often more valuable than a sophisticated dashboard if something starts going wrong.
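A kill switch does not need to be elaborate. The Python sketch below shows the basic shape: a shared flag that every action checks before it runs. The class and function names are hypothetical, and in a real deployment the flag would more likely live in shared configuration or a feature-flag service than in process memory.

```python
import threading


class KillSwitch:
    """Shared off switch that action code checks before doing anything."""

    def __init__(self) -> None:
        self._tripped = threading.Event()
        self.reason = ""

    def trip(self, reason: str) -> None:
        self.reason = reason
        self._tripped.set()

    def check(self) -> None:
        if self._tripped.is_set():
            raise RuntimeError(f"AI actions disabled: {self.reason}")


def send_draft_email(switch: KillSwitch, body: str) -> str:
    switch.check()  # refuse to act once the switch has been tripped
    return f"sent: {body}"


switch = KillSwitch()
first_attempt = send_draft_email(switch, "Weekly update")
switch.trip("suspected prompt injection")
```

After `trip` is called, any further call to `send_draft_email` raises an error instead of acting, which gives responders time to investigate without the assistant continuing to operate.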
Train staff for the new failure modes
Staff awareness still matters, but it has to fit AI. People should know when they are allowed to paste data into AI tools, which tools are approved, how to verify unusual AI-generated requests, how to spot impersonation pressure, and when to escalate suspicious model behaviour. Managers also need enough understanding to challenge vendors, ask sensible questions and avoid approving risky deployments by accident.
Governance, standards and regulation
AI security sits inside wider AI governance and enterprise security. It is not a side topic for the technical team to sort out later. It affects procurement, legal commitments, data protection, operations, product design, risk ownership and board oversight. The practical task for leadership is to make AI security part of normal governance, with clear owners, clear escalation and clear rules on acceptable use.
Use lifecycle guidance, not one-off checks
A strong starting point is the Guidelines for Secure AI System Development, published in November 2023 by the UK National Cyber Security Centre and the US Cybersecurity and Infrastructure Security Agency together with partner agencies in other countries. Its value is not that it offers a clever trick. Its value is that it treats AI security as a full lifecycle issue, from design and build through deployment and ongoing operation. That is the right mental model for a business reader. AI systems change after launch, and their risk often grows through integrations, data expansion and quiet model updates.
Use an enterprise risk framework for AI, not ad hoc judgement
A voluntary enterprise framework helps organisations avoid making AI decisions from hype, urgency or vendor confidence alone. The NIST AI Risk Management Framework is useful here because it gives a structured way to place AI risk within organisational governance, context-setting, assessment and ongoing management. Its generative AI profile adds more specific guidance for the kinds of risks that show up in large language model systems and content-generation tools.
For a mid-sized organisation, this does not mean creating a giant bureaucracy. It means having a repeatable method for deciding what data an AI system can use, what actions it can take, what testing it needs, what controls are mandatory and who signs off on the risk.
Use a common language for adversarial machine learning
Security teams often struggle because AI risk language is inconsistent. "Jailbreak", "evasion", "poisoning", "indirect prompt injection", "model extraction" and "privacy compromise" are sometimes used loosely. NIST's adversarial machine learning taxonomy (NIST AI 100-2) helps by giving a more disciplined vocabulary across attacker goals, attack stages and mitigations. MITRE ATLAS helps in a similar way from a threat-informed angle by cataloguing tactics and techniques against AI-enabled systems. Together they are useful for threat modelling, testing and cross-team discussion.
Use management-system discipline if you need repeatability
ISO/IEC 42001 matters because it turns AI governance into a management-system question rather than a collection of disconnected good intentions. It is designed for organisations developing, providing or using AI-based products and services, and it supports continuous improvement. For many businesses, especially those dealing with larger customers or regulated environments, this matters because AI security has to become repeatable, auditable and maintainable, not merely well-meaning.
It is also helpful to see AI governance and information security together. AI security does not replace existing information security management standards such as ISO/IEC 27001. It sits alongside them and adds AI-specific discipline where the old controls are too broad on their own.
Understand what the EU AI Act expects
The EU AI Act is not only a legal topic for model developers in Silicon Valley. It matters to businesses that sell into Europe, buy AI capability from providers that operate there, or deploy systems in use cases that fall into its scope. As at 3 June 2026, the picture is staged and partly in motion. The Act itself has been in force since 1 August 2024, and its provisions apply in stages. Some already apply, including the prohibited practices and AI literacy duties (from 2 February 2025) and the general-purpose model and governance rules (from 2 August 2025). The main high-risk obligations were set for 2 August 2026, but a provisional political agreement reached in May 2026 under the Digital Omnibus would defer them, with the use-based Annex III obligations moving to 2 December 2027 and the product-embedded Annex I obligations to 2 August 2028. That agreement takes legal effect only on formal adoption and publication, so until then the original dates technically still govern. Either way, the direction of travel is clear.
For high-risk AI systems, the security-relevant expectations include risk management, governance of datasets, technical documentation, logging and traceability, transparency and information for deployers, human oversight, and requirements around accuracy, robustness and cybersecurity. Providers remain responsible throughout the lifecycle, and deployers have duties around use in line with instructions, monitoring, oversight and acting on risks or serious incidents. For general-purpose AI models classified as having systemic risk, providers also need to assess and mitigate that risk, report serious incidents and ensure adequate cybersecurity for the models and their physical infrastructure.
Even where a business is not directly subject to a specific AI Act obligation today, the Act is still useful as a signal. It shows what serious markets increasingly expect: structured risk management, transparency, control over data and strong security around AI capability that can affect people, operations or rights.
UK organisations still need globally credible practice
A UK business does not need to wait for a single domestic AI law to take AI security seriously. Practical UK guidance already exists, and customers, partners and insurers are likely to expect mature practice regardless of jurisdiction. The commercial reality is simple. If your AI use is weakly governed, buyers and partners will eventually treat that as a business risk even before a regulator does.
What to do next
1. Find where AI already exists. Do not start with policy language. Start with an inventory of approved tools, pilots, vendor add-ons and shadow use across teams.
2. Sort use cases by two questions: what data can this AI reach, and what can it influence or do? These two questions usually tell you more about risk than the marketing category.
3. Set a simple baseline rule. Low-risk drafting can move faster. Any system that touches confidential data, regulated activity, customer commitments, employment decisions, code deployment, access rights or payments needs formal review.
4. Fix permissions before you connect AI to internal content. Review shared drives, mailboxes, CRM roles and service accounts. An AI assistant will expose weak access control faster than a normal interface will.
5. Start with least privilege. Keep early deployments read-only where possible. Separate drafting from sending, and recommendation from execution. Give every agent or automation a scoped identity and turn on audit logs.
6. Test the real failure modes before launch. That means prompt injection, leakage, over-broad retrieval, unsafe tool use, impersonation pressure and edge-case prompts, not just normal user journeys.
7. Put operational controls in place. Know how to disable connectors, roll back model access, revoke credentials, preserve logs and contact the vendor if behaviour changes suddenly.
8. Train managers and front-line staff. They need to know which AI tools are approved, what data must stay out of them, how to verify suspicious requests and when to escalate unusual behaviour.
9. Make one person accountable for each meaningful AI deployment. Shared enthusiasm is not the same as ownership. Every system needs a named risk owner.
10. Review regularly. AI estates change quickly, especially when vendors add features by default. Quarterly review is a sensible minimum for most mid-sized organisations.
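Steps 2 and 3 can be expressed as a small triage rule. The following is a minimal sketch and not a complete policy. The `UseCase` type, the `review_tier` function and the tier names are hypothetical, while the trigger categories are taken directly from step 3.

```python
from dataclasses import dataclass

# Categories from step 3 that always require formal review.
FORMAL_REVIEW_TRIGGERS = frozenset({
    "confidential data",
    "regulated activity",
    "customer commitments",
    "employment decisions",
    "code deployment",
    "access rights",
    "payments",
})


@dataclass(frozen=True)
class UseCase:
    """Hypothetical description of one AI use case for triage."""

    name: str
    data_reach: frozenset[str]  # step 2: what data can this AI reach?
    influence: frozenset[str]   # step 2: what can it influence or do?


def review_tier(use_case: UseCase) -> str:
    """Apply the baseline rule: any trigger category means formal review."""
    touched = (use_case.data_reach | use_case.influence) & FORMAL_REVIEW_TRIGGERS
    return "formal review" if touched else "fast track"


if __name__ == "__main__":
    drafting = UseCase(
        "marketing copy drafts",
        data_reach=frozenset({"public web content"}),
        influence=frozenset({"draft text"}),
    )
    invoicing = UseCase(
        "invoice agent",
        data_reach=frozenset({"confidential data"}),
        influence=frozenset({"payments"}),
    )
    print(review_tier(drafting))   # fast track
    print(review_tier(invoicing))  # formal review
```

The rule deliberately ignores the product's marketing category. Only the data it can reach and the things it can do decide the tier, and anything the inventory cannot classify is better treated as a trigger until someone has looked at it.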
FAQs
What is AI security in one sentence?
It is the discipline of protecting AI systems, their data, their models and the business using them from deliberate attack, misuse, leakage and unsafe behaviour.
Doesn't our existing cyber security already cover this?
It covers part of it, and the basics still matter. What it often misses is the AI-specific attack surface: prompt injection, data poisoning, model extraction, memorised data leakage, probabilistic behaviour, tool-using agents and dependence on third-party model providers.
If we only use big-name AI providers, are we safe?
No. A strong provider can reduce some infrastructure and model risk, but it cannot fix your access model, your data sprawl, your approval process, your unsafe connectors or your staff putting the wrong information into the tool.
Are small organisations really at risk?
Yes. Smaller firms may not build frontier models, but they often adopt AI through productivity tools, assistants and vendors connected to real business data. Weak permissions, informal approvals and low monitoring can make the damage sharper.
Is prompt injection just another name for SQL injection?
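No. The comparison is useful only up to a point. SQL injection can largely be engineered out because parameterised queries keep commands and data strictly apart. Prompt injection comes from the model's difficulty in reliably separating trusted instructions from untrusted content, and there is no equivalent hard boundary inside the model, which makes the problem harder to eliminate cleanly.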
Do we need a separate AI security team?
Not necessarily. Many mid-sized organisations are better served by a cross-functional approach that brings together IT, security, data, legal, procurement and the business owner of each deployment. Clear ownership matters more than a new org chart.
Is AI security only about chatbots?
No. It also applies to fraud models, document processing, computer vision, biometrics, coding assistants, recommendation engines, copilots and autonomous or semi-autonomous agents linked to business systems.
Are agentic systems riskier than ordinary assistants?
Usually, yes. The ability to choose tools and take actions increases both value and exposure. The main issue is not only what the AI says, but what it is allowed to do when it is wrong or manipulated.
Can human approval make agentic AI safe enough?
Human approval helps, but only if it is meaningful. A rushed reviewer with poor context can become a rubber stamp. High-impact actions need strong permissions, clear evidence, reversible steps where possible and reviewers with time and authority.
Should we stop using AI until the technology matures?
Usually no. A better approach is proportional adoption. Start with lower-risk use cases, narrow permissions, good data discipline, clear ownership and strong review for higher-risk actions. The answer is controlled use, not blind speed or blanket bans.
Sources
Guidelines for secure AI system development (NCSC and CISA). Joint multi-agency lifecycle guidance for secure AI design, development, deployment, monitoring and incident handling, published by the NCSC and CISA with partner agencies.
Understanding adversarial attacks against Machine Learning and AI (NCSC). Current attack classes across the AI lifecycle, including poisoning, evasion, prompt injection, agent security and supply chain manipulation.
Impact of AI on cyber threat from now to 2027 (NCSC). Evidence-based framing of how AI changes the wider threat picture, especially phishing, social engineering, vulnerability research and exploit development.
AI Risk Management Framework (NIST). NIST's main AI RMF entry point: a voluntary enterprise framework for managing AI risk across design, development, use and evaluation.
Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile (NIST). Companion guidance specific to generative AI, for identifying and managing distinct GenAI risks within the AI RMF.
AI 100-2e2025, Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations (NIST). Current NIST terminology and taxonomy for adversarial machine learning, including attacker goals, attack stages, attack types and mitigations.
OWASP Top 10 for Large Language Model Applications (OWASP). Recognised application-level framing for prompt injection, sensitive data disclosure, supply chain risk, data and model poisoning, improper output handling, excessive agency and related GenAI risks.
MITRE ATLAS (MITRE). Living knowledge base of adversary tactics and techniques against AI-enabled systems, useful for threat modelling and linking business risks to attacker behaviour.
ISO/IEC 42001:2023 - Information technology - Artificial intelligence - Management system (ISO). International AI management system standard for establishing, maintaining and continually improving AI governance and risk management.
Navigating the AI Act (European Commission). Official EU explanation of the AI Act's scope, high-risk system obligations, transparency duties, GPAI obligations, cybersecurity expectations and applicability dates.
Extracting Training Data from Large Language Models (USENIX). Evidence that language models can reveal memorised training snippets, including personal information and other sensitive material, under some conditions.
Criminals Use Generative Artificial Intelligence to Facilitate Financial Fraud (FBI Internet Crime Complaint Center). Official evidence on AI-enabled fraud, including realistic phishing text, fake identities, vocal cloning and deceptive video used in impersonation and financial scams.
