What is noindex?

Crawl, indexing and structured data

noindex is a directive that asks search engines not to include a page or resource in their results. It is set with a meta robots tag in the HTML head for web pages, or with an X-Robots-Tag HTTP header for non-HTML files such as PDFs and images. For it to work, the crawler must be able to access the page to see the instruction, so blocking the same URL in robots.txt defeats it. noindex controls listing in results, not access, so it is not a privacy control for sensitive data.

What this means

noindex is the instruction that says, in effect, "you may look at this page, but please do not list it in your catalogue". A compliant search engine that reads the instruction will drop the page from its results, even if other sites link to it.

There is a catch that trips many teams up. The crawler has to be allowed to fetch the page to see the noindex instruction in the first place. If you also block the page in robots.txt, the crawler never reads it, never sees the noindex, and the page can still appear in results. The two controls do different jobs. robots.txt manages crawling, and noindex manages whether a crawled page is listed.

Why it matters

noindex is the right tool for pages that are reachable but bring no value in search. Think of staging leftovers, filtered utility pages, duplicate routes, expired campaign landing pages, placeholders, internal search results and thin pages. Keeping these out of results tidies up how a site appears and stops weak pages competing with the ones that matter.

The commercial risk runs in both directions. Apply noindex by accident through a template, and you can quietly remove important pages from search. Fail to apply it where needed, and low-value pages clutter results and dilute the picture a search engine builds of your site. Because the directive is powerful and often set in shared templates, it deserves deliberate handling and a release check.

How it works

Two ways to set it

For an HTML page, you place a meta robots tag in the head of the document, for example <meta name="robots" content="noindex">. For non-HTML files such as PDFs, images and video, there is no HTML head to edit, so you return an X-Robots-Tag HTTP header with the value noindex instead, sent as X-Robots-Tag: noindex. The two methods are treated the same way once the crawler sees the directive. The X-Robots-Tag is also useful for applying rules in bulk across a file type.
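As a rough illustration of the two delivery methods, the Python sketch below builds the meta tag for an HTML page and picks the extra response header for a non-HTML file. The helper names and the list of file suffixes are assumptions made for this example, not part of any library. Only the tag and header formats come from the text above.

```python
from html import escape

# Hypothetical list of non-HTML file types; adjust to what the site serves.
NON_HTML_SUFFIXES = (".pdf", ".png", ".jpg", ".jpeg", ".gif", ".mp4")


def robots_meta_tag(directive="noindex"):
    """Return the meta robots tag to place inside an HTML page's <head>."""
    return f'<meta name="robots" content="{escape(directive, quote=True)}">'


def noindex_headers(path):
    """Return extra HTTP response headers for files that have no HTML head."""
    if path.lower().endswith(NON_HTML_SUFFIXES):
        return {"X-Robots-Tag": "noindex"}
    return {}


print(robots_meta_tag())
print(noindex_headers("/docs/annual-report.pdf"))
```

In practice the header is more often set in web server or CDN configuration than in application code. The point here is only that the decision can be made by file type.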

The crawler must be able to see it

For noindex to take effect, the page or resource must not be blocked by robots.txt and must otherwise be reachable by the crawler. If the URL is blocked, the crawler cannot fetch it, cannot read the directive, and the URL can still appear in results when other pages link to it. A noindex line inside robots.txt was never an officially supported rule, and Google stopped honouring it on 1 September 2019, so noindex must be delivered through the page or the HTTP header.
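One way to catch the robots.txt trap before release is to test the URL against the site's robots.txt rules. The sketch below uses Python's standard urllib.robotparser. The robots.txt content and URLs are invented for the example, and Python's parser does not match every detail of how a given search engine interprets rules, so treat the result as a first check and not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /old-campaign/
"""


def crawler_can_fetch(robots_txt, user_agent, url):
    """Return True if the robots.txt rules let this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return bool(parser.can_fetch(user_agent, url))


# A blocked URL means any noindex on that page would never be read.
for url in ("https://example.com/old-campaign/summer", "https://example.com/products/"):
    allowed = crawler_can_fetch(EXAMPLE_ROBOTS_TXT, "Googlebot", url)
    print(url, "crawlable" if allowed else "blocked: noindex cannot be seen")
```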

How it differs from related controls

A canonical tag tells a search engine which of several similar URLs is the preferred one to index. It consolidates duplicates and is a hint, not a removal instruction. Pairing noindex with a canonical that points elsewhere sends conflicting signals and can be misread. Deletion and password protection are different again. Removing a page or putting it behind a login keeps content out of reach entirely, which is the appropriate route for genuinely private material.
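The conflicting pairing described above can be detected mechanically. The following sketch parses a page's HTML with Python's standard html.parser and flags a page that carries noindex while its canonical points at a different URL. The class and function names are assumptions for this example. It compares URLs as plain strings after resolving relative links, so differences such as trailing slashes would need extra normalisation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadSignals(HTMLParser):
    """Collect meta robots directives and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots = set()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.robots.update(d.strip().lower() for d in content.split(","))
        elif tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href") or None


def has_conflicting_signals(html, page_url):
    """True if the page is noindexed and its canonical points to another URL."""
    signals = HeadSignals()
    signals.feed(html)
    if "noindex" not in signals.robots or signals.canonical is None:
        return False
    return urljoin(page_url, signals.canonical) != page_url
```

A page whose canonical points to itself, or that has no noindex, is not flagged.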

Quick removal and the rendering pitfall

For urgent cases, the search engine's removals tool can temporarily hide a URL from results for about six months while a permanent fix takes effect, and it clears the cached copy. To make a removal permanent, apply noindex on a crawlable page, or remove the page and return a 404 or 410. There is also a JavaScript pitfall worth knowing. When a search engine sees a noindex instruction in the initial HTML, it may skip rendering and skip running JavaScript, so a script that is meant to change or remove that noindex may never run. If you want a page indexed, do not ship noindex in the original page code.

Examples

A team launches a campaign with several temporary landing pages. When the campaign ends, they apply noindex to the expired pages so they stop appearing in results, while leaving them reachable for anyone with a direct link.

A site hosts internal documents as PDFs that should not surface in search. Because a PDF has no HTML head, they configure the server to return an X-Robots-Tag noindex header for those files, applied in bulk across the directory.

A business discovers an old set of thin pages still ranking and confusing visitors. They confirm the pages are not blocked in robots.txt, apply noindex, and use the removals tool for a quick temporary hide while the directive is picked up on the next crawl.

Common misunderstandings

The most damaging misunderstanding is treating noindex as a privacy control. It only governs whether a page is listed, not whether it can be reached. Anyone with the URL can still open the page, so private data needs authentication, a restricted environment or removal.

A second is pairing noindex with a robots.txt block, which prevents the crawler from ever seeing the directive.

A third is assuming JavaScript can reliably change or remove a noindex after the page loads. It cannot, because a search engine that sees noindex in the initial HTML may decide not to render the page at all.

A fourth is confusing noindex with canonical. One asks for removal, the other expresses a preference among duplicates, and they should not be stacked on the same URL.

Risks and boundaries

The biggest risk is accidental noindex through a shared template, which can remove whole sections of a site from search without anyone noticing until traffic falls.

The second is the robots.txt pairing trap, where blocking the URL stops the directive being seen.

The third is the JavaScript pitfall, where a noindex in the initial HTML can lead the search engine to skip rendering, making any later script-based change unpredictable.

The governance boundary matters for UK organisations handling personal data. noindex does not satisfy data protection obligations. Where someone exercises a right to erasure, or where personal data must be controlled, the Information Commissioner's Office expects genuine erasure or restriction of the data, not merely hiding a page from search results.

What to do next

Classify your URLs into four buckets: pages to keep indexed, pages to noindex but keep reachable, pages to remove entirely, and pages to protect behind access controls. The right tool follows from the bucket.
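The four buckets can be written down as a small lookup so that every URL in an inventory gets an explicit treatment. This is only a sketch: the enum, the mapping and the example URLs are hypothetical, and the wording of each treatment paraphrases the guidance in this article.

```python
from enum import Enum


class Bucket(Enum):
    KEEP_INDEXED = "keep indexed"
    NOINDEX_REACHABLE = "noindex but keep reachable"
    REMOVE = "remove entirely"
    PROTECT = "protect behind access controls"


# Treatment per bucket, paraphrasing the article's guidance.
TOOL_FOR_BUCKET = {
    Bucket.KEEP_INDEXED: "crawlable, with no noindex in the HTML or headers",
    Bucket.NOINDEX_REACHABLE: "noindex via meta robots tag or X-Robots-Tag, left crawlable",
    Bucket.REMOVE: "delete the page and return 404 or 410",
    Bucket.PROTECT: "authentication or a restricted environment",
}


def plan(inventory):
    """Map each URL in the inventory to the treatment its bucket calls for."""
    return {url: TOOL_FOR_BUCKET[bucket] for url, bucket in inventory.items()}


# Hypothetical inventory for illustration.
inventory = {
    "/products/": Bucket.KEEP_INDEXED,
    "/campaign-2019/": Bucket.NOINDEX_REACHABLE,
    "/staging-test/": Bucket.REMOVE,
    "/customer-records/": Bucket.PROTECT,
}
for url, treatment in plan(inventory).items():
    print(url, "->", treatment)
```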

Audit the templates that inject robots meta tags, since a single template error can affect thousands of pages. Build a release check that inspects the live HTML and HTTP headers of key pages, confirming that noindex is present only where intended and absent where it is not. For anything involving personal or confidential data, treat erasure or access control as the requirement, with noindex as a tidiness measure rather than a safeguard.
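A release check of this kind might look like the sketch below. It assumes the HTML and response headers of each key page have already been fetched by whatever means the team prefers, and it only compares what was found against what was intended. The function names and the tuple layout are assumptions for this example. It looks only for a plain noindex directive and ignores crawler-specific variants, such as a header value prefixed with a user agent name.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def is_noindexed(html, headers):
    """True if noindex appears in the meta robots tag or the X-Robots-Tag header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives.update(d.strip().lower() for d in value.split(","))
    return "noindex" in parser.directives or "noindex" in header_directives


def release_check(pages):
    """Return failure messages; an empty list means every page matched.

    Each item in pages is (url, html, headers, should_be_noindexed).
    """
    failures = []
    for url, html, headers, should_be_noindexed in pages:
        found = is_noindexed(html, headers)
        if found and not should_be_noindexed:
            failures.append(f"{url}: unexpected noindex")
        elif should_be_noindexed and not found:
            failures.append(f"{url}: noindex missing")
    return failures
```

Running this against the live responses after each deployment, and failing the release when the list is not empty, would catch a template that starts injecting noindex where it should not.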

FAQs

How do I noindex a PDF or image?

You cannot put a meta tag inside a PDF or image, so return an X-Robots-Tag HTTP header with the value noindex for those files. This can be applied in bulk to a file type at the server level.

Why does a page I noindexed still show in search?

Usually because the crawler has not revisited it since you added the directive, or because the page is blocked in robots.txt and the crawler cannot see the noindex. Allow crawling, confirm the tag is live, and request a recrawl.

Should I use noindex for expired campaign pages?

Yes, if you want them reachable by direct link but out of search. Apply noindex while leaving the page crawlable. If they should not be reachable at all, remove them and return a 404 or 410.

Is noindex enough to keep something private?

No. noindex only stops a page being listed. The page is still reachable by anyone with the URL. Use authentication, a restricted environment or removal for private or personal data.

What is the difference between noindex and canonical?

noindex asks a search engine to drop a page from results. A canonical tag names the preferred version among similar pages so duplicates consolidate. Do not stack them on the same URL, as that sends conflicting signals.

Can I block a page in robots.txt and add noindex for extra safety?

No. If robots.txt blocks the URL, the crawler cannot fetch the page and never sees the noindex, so the page can stay in results. Allow crawling until the page drops out, then block the path if you wish.

Why should I avoid setting noindex with JavaScript?

If a noindex appears in the initial HTML, the search engine may skip rendering and never run your JavaScript, so script-based changes are unreliable. Keep noindex out of the original code for any page you want indexed.

How do I remove a page quickly?

Use the search engine's removals tool for a temporary hide of about six months, then make it permanent by applying noindex on a crawlable page, or by removing the page and returning a 404 or 410.

Sources

  • Block Search Indexing with noindex (Google Search Central). The definition of noindex, the meta robots tag and X-Robots-Tag header methods, and the requirement that the page must be crawlable and not blocked by robots.txt for the rule to be seen.

  • HTML Living Standard, meta name=robots (WHATWG). The standard definition of the robots meta tag and its values, including noindex, in HTML.

  • RFC 9309: Robots Exclusion Protocol (Internet Engineering Task Force (IETF)). The principle that crawl control and indexing control are separate and that the robots protocol is not a content security measure.

  • Right to erasure (Information Commissioner's Office (ICO)). The UK data protection boundary that personal data must be genuinely erased or restricted, not merely hidden from search.