What is IaC?


IaC means Infrastructure as Code. It is the practice of defining infrastructure and related configuration in versioned, reviewable files instead of relying mainly on manual setup in consoles and admin screens. Those files describe the resources, settings, policies, and relationships an environment should have, and tools then compare that desired state with the real environment and apply approved changes. Done well, IaC makes infrastructure more repeatable, easier to review, easier to rebuild, and easier to govern over time.

What this means

A simple way to understand IaC is to compare it with "ClickOps", where an administrator creates networks, permissions, storage, gateways, and settings manually through web interfaces. Manual work is sometimes unavoidable, but it is hard to review after the fact, hard to recreate consistently, and easily forgotten by the time an outage, an audit question, or a staff change arrives.

With IaC, the intended configuration lives in code or structured files. A team can review the change in a pull request, run checks, approve it, and then apply it through a defined workflow. If staging needs to mirror production closely, IaC makes that more achievable because both environments can be built from the same definitions. If an environment is damaged or simply becomes messy, IaC gives the team a baseline from which to recover.

Small and mid-sized organisations do not need to codify everything. They usually gain most by codifying the infrastructure that supports important services: networks, identities, logging, storage policies, gateways, and core environments.

Why it matters

IaC matters because infrastructure choices are business choices once live services depend on them. If your customer portal, internal analytics stack, document store, or model-serving endpoint relies on cloud resources configured by memory and screenshots, you have an operational fragility problem.

Versioned infrastructure changes improve repeatability and auditability. Reviewable pull requests make it easier to spot risky permissions, exposed endpoints, missing logs, or expensive resource choices before they reach production. Environment consistency matters too. Teams often want staging to behave like production, but manual configuration drifts quickly. IaC narrows that gap.

IaC also supports change discipline because infrastructure can join the same workflow as application code: version control, review, testing, CI/CD, approvals, logs, and rollback planning. For AI-enabled systems, that matters because a model feature still depends on storage, networks, roles, secrets, APIs, queues, observability, and cost controls. If those foundations are configured manually and poorly understood, the "AI" layer inherits the weakness.

How it works

Most IaC workflows revolve around a desired state. A team declares what resources should exist and how they should be configured. The IaC tool reads the current environment, compares it with the declared state, and proposes changes required to close the gap. In mature practice, teams do not jump straight from edited files to live production changes. They review the plan first.
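The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular IaC tool works internally: the resource names, the settings, and the `plan` function are all hypothetical, and real tools also track dependencies and ordering.

```python
from typing import Any

Resources = dict[str, dict[str, Any]]

def plan(desired: Resources, actual: Resources) -> dict[str, list[str]]:
    """Compare declared resources with what exists and list proposed actions."""
    return {
        # Declared but not present in the environment.
        "create": sorted(desired.keys() - actual.keys()),
        # Present in both, but with different settings.
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
        # Present in the environment but no longer declared.
        "delete": sorted(actual.keys() - desired.keys()),
    }

# Hypothetical desired and actual states, for illustration only.
desired = {
    "subnet-app": {"cidr": "10.0.1.0/24"},
    "bucket-logs": {"public": False, "versioning": True},
}
actual = {
    "bucket-logs": {"public": False, "versioning": False},
    "bucket-old": {"public": False},
}

proposed = plan(desired, actual)
print(proposed)
```

The result is a proposal only. Nothing changes until someone, or some approved automation, decides to apply it.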

That proposal step is valuable because it shows intent clearly. A plan can reveal that a new subnet will be created, an old storage bucket will be destroyed, a role policy will widen, or an API gateway rule will change. In many tools, teams can separate the plan from the apply step so that a reviewed plan, rather than a fresh ad hoc calculation, is what automation executes.
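One way to picture "apply the reviewed plan, not a fresh calculation" is to fingerprint the plan at review time and refuse to apply anything that does not match. The sketch below assumes a plan is a plain dictionary; the function names `fingerprint` and `apply_reviewed` are hypothetical and do not correspond to any specific tool's interface.

```python
import hashlib
import json

def fingerprint(plan: dict) -> str:
    """Hash a canonical JSON form of the plan so reviewers can approve it."""
    canonical = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def apply_reviewed(plan: dict, approved_fingerprint: str) -> str:
    """Apply only if the plan is exactly the one that was approved."""
    if fingerprint(plan) != approved_fingerprint:
        raise ValueError("plan differs from the one that was reviewed")
    return "applied"  # a real tool would make the changes here

# The plan a reviewer saw and approved (hypothetical content).
reviewed = {"create": ["subnet-app"], "delete": ["bucket-old"]}
approved = fingerprint(reviewed)
result = apply_reviewed(reviewed, approved)

# A plan that changed after review is rejected.
changed = {"create": ["subnet-app"], "delete": ["bucket-old", "bucket-logs"]}
try:
    apply_reviewed(changed, approved)
    rejected = False
except ValueError:
    rejected = True
```

The design choice here is that approval attaches to a specific set of changes rather than to the files that produced them, so an environment that shifts between review and apply cannot silently alter what gets executed.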

State is another operational idea leaders should understand. IaC tools often need a record that maps declared resources to real ones in the environment. Without that state, the tool cannot reliably tell what it already manages and what should change next. That is why state needs to be stored safely, protected from conflicting updates, and restricted to the people and automation that genuinely need it.
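As a rough illustration of what that record does, the sketch below treats state as a mapping from declared names to real resource identifiers. The names, the identifiers, and the `classify` function are hypothetical; real state formats hold far more detail.

```python
def classify(state: dict[str, str], real_ids: set[str]) -> dict[str, list[str]]:
    """Use the state record to work out what the tool owns in the environment."""
    managed_ids = set(state.values())
    return {
        # Real resources the tool created and still tracks.
        "managed": sorted(managed_ids & real_ids),
        # Real resources the tool knows nothing about (for example, made by hand).
        "unmanaged": sorted(real_ids - managed_ids),
        # Declared names whose real resource has disappeared.
        "missing": sorted(name for name, rid in state.items() if rid not in real_ids),
    }

# Hypothetical state record and environment contents.
state = {"bucket-logs": "res-0a1", "subnet-app": "res-7f3", "queue-jobs": "res-4c2"}
real_ids = {"res-0a1", "res-7f3", "res-9zz"}

ownership = classify(state, real_ids)
print(ownership)
```

If the state record is lost or corrupted, every real resource looks unmanaged, which is why a tool without reliable state cannot safely decide what to change.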

Drift is the other half of the story. Desired state is what the files say. Actual state is what really exists. Those diverge when people make manual changes, when emergency actions occur, or when connected services alter behaviour. Good IaC practice includes drift detection and a response rule: either restore the environment so it matches the approved configuration, or update the configuration so the code reflects reality.
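Drift detection and the two-way response rule can be sketched as follows. The settings shown are hypothetical, and `detect_drift` and `reconcile` are illustrative names rather than functions from a real tool.

```python
def detect_drift(declared: dict, observed: dict) -> dict:
    """Return {setting: (declared_value, observed_value)} for each mismatch."""
    keys = declared.keys() | observed.keys()
    return {
        key: (declared.get(key), observed.get(key))
        for key in sorted(keys)
        if declared.get(key) != observed.get(key)
    }

def reconcile(declared: dict, observed: dict, code_wins: bool) -> dict:
    """Pick the settings both sides should share after reconciliation.

    code_wins=True  -> restore the environment to the approved configuration.
    code_wins=False -> update the configuration so the code reflects reality.
    """
    return dict(declared) if code_wins else dict(observed)

# Hypothetical storage bucket: someone made it public in the console.
declared = {"public": False, "versioning": True, "log_retention_days": 90}
observed = {"public": True, "versioning": True, "log_retention_days": 90}

drift = detect_drift(declared, observed)
agreed = reconcile(declared, observed, code_wins=True)
print(drift)
```

Which side wins is a governance decision, not a technical one. The code only makes the disagreement visible.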

In practical terms, IaC often sits inside CI/CD. A change to infrastructure code triggers validation, policy checks, perhaps security scanning, a visible plan, and then a controlled apply against a particular environment. That workflow can cover networks, IAM roles, logging destinations, API gateway configuration, managed databases, and other platform components.
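The shape of such a pipeline, where each stage must pass before the next runs, might look like the sketch below. The stage names follow the paragraph above; the checks themselves, including the "no public buckets" policy, are simplified placeholders chosen for illustration.

```python
from typing import Callable

Stage = tuple[str, Callable[[dict], bool]]

def no_public_buckets(change: dict) -> bool:
    """Example policy check: reject any resource marked public."""
    return not any(r.get("public") for r in change["resources"].values())

STAGES: list[Stage] = [
    ("validate", lambda c: isinstance(c.get("resources"), dict)),
    ("policy", no_public_buckets),
    ("plan", lambda c: True),      # placeholder: a real stage would compute a plan
    ("approval", lambda c: bool(c.get("approved_by"))),
    ("apply", lambda c: True),     # placeholder: a real stage would make changes
]

def run_pipeline(change: dict, stages: list[Stage] = STAGES) -> list[str]:
    """Run stages in order and stop at the first failure."""
    log = []
    for name, check in stages:
        ok = check(change)
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            break
    return log

# Hypothetical changes: one exposes a bucket, one does not.
risky = {"resources": {"bucket-docs": {"public": True}}, "approved_by": "reviewer"}
safe = {"resources": {"bucket-docs": {"public": False}}, "approved_by": "reviewer"}

risky_log = run_pipeline(risky)
safe_log = run_pipeline(safe)
print(risky_log)
print(safe_log)
```

The risky change never reaches the apply stage, even though someone approved it, because the policy check runs first.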

Examples

Consider a services company with development, staging, and production environments in a public cloud. Originally, each environment was created manually. Over time, production acquired stricter network rules, different storage settings, and extra logging that staging never received. Releases started to fail only in production because the environments were no longer comparable. Moving those core resources into IaC lets the team define the baseline once, review changes explicitly, and rebuild environments more consistently.

Now take an AI-enabled example. A company builds a document analysis workflow that uses managed object storage, a queue, a serverless function, a model API, a private database, and an API gateway for internal access. The workflow works in one account because a senior engineer configured everything by hand. When the company needs a second region or a realistic staging copy, the lack of codified infrastructure becomes painful. IaC makes it possible to recreate the architecture in a reviewable way instead of relying on memory and screenshots.

A third case is incident recovery. Engineers may make urgent console changes to restore service. The governance question afterwards is whether the approved infrastructure definition was updated and drift reconciled, rather than leaving the environment in a partly documented state.

Common misunderstandings

One misunderstanding is that IaC means infrastructure becomes automatically safe. It does not. Misconfigured infrastructure can be deployed more quickly and more consistently too. If a storage bucket is public in code, IaC will help you recreate that mistake faithfully.

Another misunderstanding is that IaC removes the need to understand the platform. It does not. Teams still need judgement about resource relationships, identity boundaries, state handling, module reuse, and the consequences of applying changes.

It is also wrong to treat IaC as a big-company luxury. Smaller organisations may gain even more from it because they often depend on a handful of people and managed services. Codifying important parts of the estate reduces concentration of knowledge and lowers recovery risk.

Risks and boundaries

The big risk with IaC is not that code exists, but that teams trust it too easily. Hard-coded secrets are an obvious problem. So are copy-paste modules that carry unnecessary permissions, exposed endpoints, outdated defaults, or cost-heavy resource choices into every new environment. Review quality matters because IaC changes can have a very large blast radius.
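A basic check for hard-coded secrets can run before review. The sketch below is deliberately crude and assumes a simple `name = "value"` configuration style; the pattern and the sample configuration are hypothetical, and dedicated secret scanners are considerably more thorough.

```python
import re

# Flags settings whose name suggests a secret and whose value is a quoted
# literal of 8+ characters. Values containing "$" are skipped on the
# assumption that they are references to an external secret store.
SECRET_PATTERN = re.compile(
    r"(?i)(\w*(?:password|secret|api_key|token)\w*)\s*[:=]\s*[\"']([^\"'$]{8,})[\"']"
)

def find_hardcoded_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, setting_name) for each suspicious line."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SECRET_PATTERN.search(line)
        if match:
            findings.append((number, match.group(1)))
    return findings

# Hypothetical configuration file contents.
config = '''\
db_user = "app"
db_password = "hunter2hunter2"
api_key = "${VAULT_API_KEY}"
'''

findings = find_hardcoded_secrets(config)
print(findings)
```

A check like this catches only the obvious cases. It complements review rather than replacing it.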

Permissions deserve special scrutiny. The role used to apply infrastructure changes is often highly privileged. If that identity is poorly protected, or if too many people can trigger production applies, IaC becomes a neat way to accelerate compromise. State files can be sensitive too because they may reveal resource identifiers, configuration details, and sometimes secrets if teams are careless.

Drift is another operational risk. If engineers make console changes for convenience, the code and the environment diverge. After a while nobody knows which is authoritative. Then the next automated apply can overwrite an emergency fix or reintroduce an insecure setting that someone thought they had corrected.

There is also a business boundary. IaC helps govern infrastructure, but it is not a guarantee of good architecture, security, or compliance.

Weak state management undermines the workflow in a similar way to drift. If teams share state casually across environments, fail to lock it properly, or store it where the wrong people can access it, the reliability of the workflow drops fast. The result may be accidental deletion, confused ownership, or exposure of sensitive operational detail.
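Locking is the easiest of those safeguards to illustrate. The sketch below uses a local lock file so that a second apply cannot start while the first holds the lock. It is an assumption-laden simplification: the `state_lock` helper and the file name are hypothetical, and shared state stored remotely would need a lock that every runner can see.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def state_lock(lock_path: str):
    """Hold an exclusive lock for the duration of an apply."""
    # O_EXCL makes creation fail if the lock file already exists.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)

# Hypothetical lock file for a production state record.
lock_path = os.path.join(tempfile.mkdtemp(), "prod.state.lock")

second_apply_blocked = False
with state_lock(lock_path):
    try:
        # A concurrent apply against the same state is refused.
        with state_lock(lock_path):
            pass
    except FileExistsError:
        second_apply_blocked = True

print(second_apply_blocked)
```

Without a lock, two applies can read the same state, make conflicting changes, and leave a record that matches neither outcome.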

What to do next

Leaders should start by asking which parts of the estate cause the most pain when they change. Focus first on the infrastructure that supports important services, regulated workloads, external integrations, or expensive outages. Networks, access roles, logging, storage policies, gateways, and core environments are usually better starting points than trying to codify everything at once.

Ask your team to make a few rules explicit. Which environments are controlled through IaC? What manual actions are still allowed, and how are they reconciled afterwards? Where is state stored? How are secrets kept out of configuration? What approvals are required before production applies? How is drift detected and handled?

Then align IaC with broader operating disciplines such as version control, peer review, CI/CD checks, cost awareness, IAM discipline, and incident learning. The goal is not more code. It is infrastructure that is simpler, safer, less dependent on individual memory, and easier to understand, recover, and govern.

FAQs

Is IaC only relevant if we run everything ourselves in the cloud?

No. Even organisations that rely heavily on managed services still configure networks, identities, logging, storage, gateways, policies, and environments. Those settings often shape exposure, resilience, and cost, so they benefit from being versioned and reviewable.

Does IaC eliminate configuration drift?

Not automatically. IaC makes drift easier to detect and easier to correct, but drift still happens whenever people change infrastructure manually, emergency fixes are applied, or connected services change unexpectedly. Good practice is to decide which source of truth wins and reconcile deviations quickly.

Is rollback easy with IaC?

Sometimes, but not always. Reverting a configuration file is often easy. Reversing the real-world impact of a change can be harder, especially if data has moved, resources have been destroyed, or downstream systems have already adapted. IaC improves control and traceability, but it does not create a perfect undo button.

How does IaC relate to FinOps and cost control?

IaC makes infrastructure choices visible before they are applied, which helps teams challenge unnecessary resource sizes, duplicated environments, or forgotten services. It does not automatically optimise costs, but it creates a better review surface for cost decisions.
