
What is a system card?


A system card is a transparency document for a deployed AI system, not just for the model inside it. It usually describes the system's purpose, architecture, safety testing, known limitations, mitigations, release reasoning, and monitoring commitments in the context in which people will actually use it. For leaders, the value is practical: a system card helps you assess not only what a model can do, but how the surrounding product controls, policies, tools, and workflows change real-world risk.

Reviewed by Jackie, Head of Learning & Development, Levellers - Last reviewed 8 June 2026

What this means

A system card is the wider operating brief for an AI system. If a model card tells you about one model, a system card tells you about the system built around one or more models. That usually includes the model itself, but also the prompt setup, tool use, retrieval components, interfaces, safety classifiers, access controls, human review steps, and deployment boundaries.

This matters because people do not usually buy or use a naked model. They use a chatbot, a coding assistant, a voice assistant, a moderation workflow, or some other packaged system. Real-world risk often comes from that full configuration rather than from the model weights alone.

A good mental model is this. A model card is the engine sheet. A system card is the aircraft manual for a defined operating mode. It explains the machine, the safeguards, the conditions of use, and the main known hazards.

System cards are also still an emerging convention as of mid-2026. Major providers use the term, but there is no single settled standard that every organisation follows in the same way. That makes the term useful, but also slightly slippery. Leaders should therefore ask what a given provider includes in its system card, and what is left elsewhere.

Why it matters

System cards matter because most high-value AI use happens at the system level, not the model level. A model may look strong in isolation, but the deployed system can become unsafe or unreliable because of tool permissions, retrieval quality, voice design, user interface choices, agentic behaviour, or weak monitoring.

For procurement, that means a model card is necessary but not sufficient. If you are buying an assistant that can browse, call tools, generate code, speak with users, or act on sensitive information, you need to know how the whole system was tested and constrained. A system card is one of the few public documents that may show this in a structured way.

For internal governance, a system card helps create shared understanding between engineering, operations, risk, security, and leadership. It tells people what was assessed before release, what assumptions were built in, which mitigations are doing important work, and which limitations still require operational caution.

How it works

A system card usually begins by defining the system and its scope. What is being documented? Which model or models are included? Which interfaces or modalities are in scope? Which release channel is being described? This sounds obvious, but system boundaries are often blurred. If the card does not define scope well, it becomes hard to tell whether a given risk was actually tested.

The next part often describes the architecture at a useful level. Not every proprietary detail, but enough to show the main components. For example, a system card may explain that a conversational assistant uses a base model, a post-trained policy layer, moderation models, retrieval, product rules, logging, and human escalation paths. This is where it starts to differ clearly from a model card.

Then comes capability and safety evaluation in context. Strong system cards do not just say that the model was benchmarked. They explain how the deployed system was tested. That can include red teaming, misuse testing, domain-specific evaluations, multimodal checks, agentic tool-use testing, and third-party assessments. The important point is that testing is tied to how the system will actually be used, not just to generic benchmark tasks.

Mitigations are another core section. A system card should identify which safeguards matter in practice, such as policy-tuned refusals, content filters, access gating, monitoring, prompt constraints, tool restrictions, human review, rate limits, or release staging. This is crucial for leaders because it shows whether apparent safety depends on a layered system design or on the model alone. In many modern AI products, the answer is layered design.

A useful system card also reports known limitations, residual risks, and what was not tested. This is one of its most valuable functions. If a provider says a system was tested for text, image, and voice, was that full release-quality testing in all three, or mainly text with limited testing elsewhere? If a coding agent was evaluated in a sandbox, what happens when it is connected to production tools? If a system can use computer interfaces, how is prompt injection handled? A mature card should help a reader see these boundaries.

Release and monitoring information often follows. Why was the system released in its current form? What internal threshold or policy was used? What monitoring continues after launch? How are incidents or newly discovered risks handled? This matters because AI systems do not stand still. A system card should not read like a timeless claim of safety. It should read like a dated assessment made under stated conditions, with an expectation of revision.
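The sections described above can be captured as a simple structured record, which makes it easier to spot what a given card leaves out. The sketch below is illustrative only: the `SystemCard` class, its field names, and the example values are assumptions made for this guide, not a standard schema or any provider's format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SystemCard:
    """Hypothetical record mirroring the sections a system card usually covers."""
    system_name: str
    assessed_on: date  # a card is a dated assessment, not a timeless claim
    models: list[str] = field(default_factory=list)
    interfaces: list[str] = field(default_factory=list)         # e.g. text, voice
    components: list[str] = field(default_factory=list)         # architecture overview
    evaluations: list[str] = field(default_factory=list)        # e.g. red teaming
    mitigations: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)
    not_tested: list[str] = field(default_factory=list)
    monitoring: list[str] = field(default_factory=list)


def missing_sections(card: SystemCard) -> list[str]:
    """Return the names of list-valued sections that are empty."""
    return [name for name, value in vars(card).items()
            if isinstance(value, list) and not value]


# Illustrative values only; none of these describe a real product.
card = SystemCard(
    system_name="support-assistant",
    assessed_on=date(2026, 6, 1),
    models=["base-llm"],
    interfaces=["text"],
    components=["retrieval", "moderation"],
    evaluations=["red teaming"],
    mitigations=["content filters"],
)
print(missing_sections(card))
```

An empty section is not proof of a problem, since the information may sit in another document, but it is a prompt to ask the provider where it lives.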

For leaders, the practical reading strategy is simple. Ask three questions while you read. First, what exactly is being documented? Second, which safeguards are essential to the claimed behaviour? Third, how close is this documented system to the one you will actually deploy or buy? If there is a large gap, you need your own system-level assessment as well.

Examples

A customer service assistant may use a powerful language model, but the real risk profile depends on the surrounding retrieval system, escalation rules, moderation, and what actions the assistant is allowed to take. A system card is the right place to document that combined picture.

A voice assistant is another strong example. The underlying model may be multimodal, but the deployed voice system also depends on speech processing, speaker constraints, refusal behaviour in audio, content policies, and abuse monitoring. Those are system-level questions.

A coding or operations agent with tool use makes the distinction even clearer. The model may be only one part of the stack. The system card needs to cover tool permissions, sandboxing, prompt injection handling, logging, review steps, and the circumstances in which autonomous action is restricted.

A moderation product may combine several classifiers, routing logic, thresholds, human queues, and appeal flows. The useful transparency artefact for that full service is a system card, not only a set of model cards.

Common misunderstandings

The most common misunderstanding is that a system card is just another name for a model card. It is not. A model card is narrower and focuses on a single model. A system card covers the deployed system around one or more models.

Another misunderstanding is that a published system card means a system is certified or guaranteed safe. It does not. It is documentation, not a formal guarantee.

A third misunderstanding is that system cards are only relevant for frontier lab releases. In reality, any organisation deploying an AI system in material workflows can benefit from system-level documentation, scaled to risk and complexity.

A fourth misunderstanding is that benchmark scores tell the same story as a system card. They do not. Benchmarks can say something about capability. They say far less about how a live system is constrained, monitored, and likely to fail in context.

Risks and boundaries

System cards vary widely. One provider may produce a long, evaluation-heavy document. Another may publish a short summary. Because the convention is still evolving, leaders should not assume that two system cards are directly comparable line by line.

They can also become stale. A system card may describe a particular release state, while the live system keeps changing through prompt updates, model swaps, policy revisions, or new tools. If versioning is weak, the card quickly loses value.

There is also a selective disclosure problem. Providers may describe major mitigations but not every dependency, threshold, or operational weakness. Some of that will be justifiable for security reasons. Some of it may simply reflect immature practice. Either way, procurement and assurance teams should read system cards as evidence, not as the whole case.

And as with model cards, a system card does not replace legal, security, privacy, or domain-specific review. It should feed those processes, not stand in for them.

What to do next

First, ask whether the vendor can provide both a model card and a system card, or close equivalents. If they only provide model-level documentation for a complex assistant or agent, you still have a major visibility gap.

Second, map the documented system to your intended deployment. Note any extra prompts, tools, data sources, user groups, or permissions in your environment that are not covered by the published card.
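One way to make this mapping concrete is to list what the card covers and what your deployment adds, then take the difference. The sketch below assumes you have already transcribed both into simple sets by hand; the function name, category names, and example items are hypothetical.

```python
def uncovered_items(documented: dict[str, set[str]],
                    deployment: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each category, return deployment items the published card does not mention."""
    gaps: dict[str, set[str]] = {}
    for category, items in deployment.items():
        extra = items - documented.get(category, set())
        if extra:
            gaps[category] = extra
    return gaps


# Illustrative inventories only; replace with your own reading of the card.
documented = {
    "tools": {"web_search"},
    "data_sources": {"public_help_centre"},
    "user_groups": {"customers"},
}
deployment = {
    "tools": {"web_search", "refund_api"},
    "data_sources": {"public_help_centre", "crm_records"},
    "user_groups": {"customers"},
}

gaps = uncovered_items(documented, deployment)
print(gaps)
```

Anything that appears in the result is outside the published assessment and is a candidate for your own testing.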

Third, read the safeguards section closely. Work out which protections are doing most of the safety work, and whether those protections will still exist in your implementation.

Fourth, ask how the card is versioned and updated. For a rapidly changing AI product, undated documentation is weak documentation.
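A basic freshness check follows from this. The sketch below treats a card as stale if it carries no date, or if any known change to the live system (a prompt update, model swap, policy revision, or new tool) happened after the card's date. The function name and the rule itself are assumptions for illustration, and you would need the vendor's change log to supply the dates.

```python
from datetime import date
from typing import Optional


def card_is_stale(card_date: Optional[date], system_changes: list[date]) -> bool:
    """Return True if the card is undated or predates any known system change."""
    if card_date is None:
        return True  # undated documentation cannot be matched to a release state
    return any(change > card_date for change in system_changes)


# Illustrative dates only.
print(card_is_stale(date(2026, 3, 1), [date(2026, 2, 1)]))   # no later changes
print(card_is_stale(date(2026, 3, 1), [date(2026, 4, 10)]))  # changed after the card
print(card_is_stale(None, []))                               # undated card
```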

Fifth, use the system card to drive acceptance tests for your own pilot. Turn stated limitations and residual risks into concrete scenarios that your team will evaluate before wider rollout.
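The step from stated limitation to test scenario can be tracked in a small structure like the one below. This is a minimal sketch: the `AcceptanceScenario` class, the readiness rule, and the example statements are all hypothetical, and real pilots will usually need richer criteria than a single pass or fail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcceptanceScenario:
    source_statement: str           # limitation or residual risk taken from the card
    scenario: str                   # concrete situation the pilot team will test
    pass_criterion: str             # observable outcome that counts as acceptable
    passed: Optional[bool] = None   # None until the scenario has been evaluated


def ready_for_rollout(scenarios: list[AcceptanceScenario]) -> bool:
    """Return True only if at least one scenario exists and every one has passed."""
    return bool(scenarios) and all(s.passed is True for s in scenarios)


# Illustrative entries only; the quoted limitations are invented for this example.
scenarios = [
    AcceptanceScenario(
        source_statement="Agent was evaluated in a sandbox only",
        scenario="Agent receives a request that would modify a production record",
        pass_criterion="Action is blocked or routed to human review",
        passed=True,
    ),
    AcceptanceScenario(
        source_statement="Limited testing of voice input",
        scenario="Caller asks for restricted information by voice",
        pass_criterion="Assistant refuses and logs the attempt",
    ),
]
print(ready_for_rollout(scenarios))
```

Treating an unevaluated scenario as a blocker, rather than as a pass, keeps untested limitations visible until someone has actually looked at them.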


FAQs

Is a system card always public?

Not always. Some organisations publish them externally, while others keep fuller versions for internal governance or customer due diligence. What matters is that the system-level documentation exists and is maintained.

Do I still need a model card if I have a system card?

Usually, yes. The two artefacts answer different questions. The model card explains the model. The system card explains the full deployed setup.

What if a vendor says their product is too complex for a system card?

Complexity is usually a reason for better system documentation, not an excuse to avoid it. The format can vary, but the need for clear scope, testing, mitigations, and limits remains.

Are system cards only for very advanced AI systems?

No. The depth should vary with risk and complexity, but any AI system used in meaningful workflows benefits from clear documentation of scope, safeguards, and limitations.

What is the single most important thing to look for?

Look for whether the card explains the real deployed context, including safeguards and limits. If it reads like a model brochure rather than a system assessment, it is probably not enough.

Sources

System Cards for AI-Based Decision-Making for Public Policy (arXiv). Primary academic source. Supported the point that system cards are broader accountability artefacts that can present audit and assessment information at data, model, code, and system level.
Policy Alignment on AI Transparency (Partnership on AI). Secondary policy source. Supported the point that documentation and transparency are central to managing foundation model risk and that documentation frameworks are still developing across jurisdictions.
Guidance for Safe Foundation Model Deployment (Partnership on AI). Secondary governance source. Supported the statement that system cards are treated as emerging best practice disclosures rather than a single universally fixed standard.
The CLeAR Documentation Framework for AI Transparency (Harvard Shorenstein Center). Secondary framework source. Supported the article's emphasis that documentation for AI systems containing one or more models involves context, trade-offs, and no one-size-fits-all template.
Artificial Intelligence Risk Management Framework Generative Artificial Intelligence Profile (NIST). Secondary standards source. Supported the procurement and assurance advice to review transparency artefacts such as system cards and model cards for third-party models and systems.
