What is llms.txt?

llms.txt is a proposed, Markdown-based file that some site owners publish at /llms.txt to help large language model tools or AI agents understand the most useful parts of a website. It is best treated as an experiment or emerging convention, not as a settled web standard and not as a guaranteed route into AI answers. It is also not a replacement for robots.txt, sitemap.xml, noindex, structured data or normal content quality work. For most organisations, llms.txt is only worth considering after the core crawlability, indexing, metadata and governance basics are already sound.

Reviewed by Jackie, Head of Learning & Development, Levellers - Last reviewed 8 June 2026

What this means

The appeal of llms.txt is easy to understand. Websites are usually built for humans and browsers, not for language models trying to find the most important page on a topic. A large documentation site might have many layers of navigation, templates, scripts, repeated boilerplate and outdated pages. From that point of view, a short machine-readable guide that says "here is what this site is about, and here are the pages that matter most" sounds sensible.

That is what the llms.txt proposal is trying to do. The proposal site describes it as a way to provide information that helps LLMs use a website at inference time. In other words, when an AI system is trying to answer a user's question, the file is meant to act as a concise signpost. The proposal uses Markdown rather than XML and recommends a structured file with a title, summary and grouped lists of relevant links. It also suggests clean Markdown versions of content pages where useful.

The important word here is "proposal". llms.txt is not equivalent to the established building blocks of web publishing. It is newer, less stable, not universally supported, and not required by major search platforms. Some official documentation sites already publish one. That shows real-world experimentation, not universal adoption or ranking benefit.

Why it matters

llms.txt matters because organisations are trying to understand how websites are discovered and reused in AI-assisted experiences. Content owners can see the direction of travel: more people ask questions in AI search tools, browser assistants and answer engines instead of relying only on classic search results. That creates a genuine documentation problem. If your best material is buried inside heavy navigation, mixed with stale pages and hard for machines to parse, you may want a lightweight overview that points systems to your strongest public pages.

There is also a practical internal benefit. Even if llms.txt delivered no external discovery advantage at all, the process of writing one can expose whether your site actually has a sensible public information architecture. You quickly learn whether you can identify the canonical pages, whether your policies are current, whether your documentation is clearly grouped, and whether you have pages you would not want agents to foreground.

However, leaders need to resist a common leap: "AI discovery is changing, therefore llms.txt must be the new must-have technical file." That conclusion is not supported by current official search guidance. Google's Search Central documentation now says there are no additional technical requirements to appear in AI Overviews or AI Mode, that the same SEO best practices remain relevant, and that site owners do not need to create new machine-readable files, AI text files or special schema to appear in those features. That does not make llms.txt useless. It does mean it should be framed as a supplementary experiment, not as a modern replacement for established crawl and indexing fundamentals.

How it works

The current llms.txt proposal says the file normally lives at the root path /llms.txt of a site, though a subpath is also possible. The proposal describes a specific Markdown structure. A file should contain an H1 naming the site or project, a blockquote summary, optional explanatory text, and one or more H2 sections listing important URLs with optional descriptions. There is also a special Optional section whose links can be skipped when a shorter context is needed. The proposal further suggests that sites may publish clean Markdown versions of key pages at the original URL with .md appended, and it discusses related context files such as llms-ctx.txt or llms-ctx-full.txt as implementation choices rather than mandated standards.
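As a rough illustration of that layout, the following Python sketch renders an H1, a blockquote summary and H2 link sections into llms.txt text. The class and function names, the site name and every URL are hypothetical, and the exact list-item syntax (a Markdown link followed by a colon and a note) is an assumption to check against the proposal itself.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional description shown after the link


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(name: str, summary: str, sections: list[Section]) -> str:
    """Render the H1 / blockquote / H2 link-list layout described above."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for section in sections:
        lines.append(f"## {section.heading}")
        lines.append("")
        for link in section.links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site and URLs, for illustration only.
example = render_llms_txt(
    "Example Docs",
    "Public documentation for a hypothetical product.",
    [
        Section(
            "Docs",
            [Link("Quick start", "https://example.com/docs/quickstart.md", "Install and first steps")],
        ),
        # Links under "Optional" are the ones a tool may skip for a shorter context.
        Section("Optional", [Link("Release notes", "https://example.com/releases.md")]),
    ],
)
print(example)
```

Generating the file from a small data structure, rather than editing it by hand, is one way to keep it tied to whatever the team already treats as its list of canonical pages.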

That makes llms.txt a guidance file, not an enforcement file. To understand what it is not, it helps to compare it with existing web controls.

robots.txt is about crawler access. Google's documentation says a robots.txt file tells crawlers which URLs they can access and is mainly used to manage crawl traffic. Google also states that robots.txt is not a mechanism for keeping a page out of Google and is not a secure way to hide private content. So robots.txt is about request behaviour, not rich semantic guidance for LLMs.
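To make the "access, not meaning" point concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the ExampleBot name and the URLs are invented for illustration. The only question the file answers is whether a given crawler may request a given path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# robots.txt yields a yes/no answer per URL; it says nothing about what a page means.
can_docs = parser.can_fetch("ExampleBot", "https://example.com/docs/")
can_admin = parser.can_fetch("ExampleBot", "https://example.com/admin/settings")
print(can_docs, can_admin)
```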

sitemap.xml is about URL discovery. Google says a sitemap is a file that provides information about pages and files on a site so search engines can crawl more efficiently. It is a broad inventory, not a curated explainer of the handful of pages that matter most to an AI system answering a question.
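The inventory nature of a sitemap shows up as soon as one is parsed. The sketch below reads a small, made-up sitemap with Python's standard XML library and returns a flat list of URLs, with no indication of which ones matter most. The sample XML and the helper name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap content, for illustration only.
SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2026-05-01</lastmod></url>
  <url><loc>https://example.com/docs/quickstart</loc></url>
</urlset>
"""

# Namespace used by the sitemap protocol's <urlset>, <url> and <loc> elements.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


print(sitemap_urls(SITEMAP_XML))
```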

noindex is about indexing control where supported. It is set per page, with a robots meta tag or an X-Robots-Tag response header. Google says noindex can block indexing of a page so it does not appear in Search results. That is a control signal about inclusion, not a content map for AI tools.
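A quick way to audit this on your own pages is to look for the directive in the HTML. The following sketch, using only Python's standard html.parser, checks for noindex in a robots meta tag. It deliberately ignores the X-Robots-Tag HTTP header and crawler-specific meta names, so treat it as a partial check. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(page))
```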

JSON-LD and other structured data are about machine-readable facts on a page. Google says it uses structured data to understand content and the wider web. That is different from handing an LLM-oriented guide to a collection of public resources.
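For comparison, a JSON-LD block states facts about one page. The sketch below builds a small schema.org Article object in Python and wraps it in the script tag that carries JSON-LD in HTML. The helper name and the URL are hypothetical, and the properties chosen are only an example, not a recommended minimum.

```python
import json


def json_ld_script(data: dict) -> str:
    """Serialise a dict as a JSON-LD <script> element."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so a string value cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Illustrative values; the URL is a placeholder.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "What is llms.txt?",
    "url": "https://example.com/guides/llms-txt",
}

snippet = json_ld_script(article)
print(snippet)
```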

The key outcome of those comparisons is simple. llms.txt may coexist with these files, but it does not replace them. In fact, Google's current AI-features guidance goes further and says there is no need to create new AI text files to appear in Google's AI features. So if you build llms.txt before fixing crawlability, internal links, indexability, canonical content, structured data and metadata hygiene, you are probably optimising the wrong layer first.

Examples

A documentation-heavy SaaS company is the easiest example. Suppose it has product docs, API docs, pricing pages, legal pages, release notes, blog posts and archived material spread across multiple sections. A carefully maintained llms.txt file could point only to the most useful public pages for understanding the product, implementation steps and support model. That may help some AI tools navigate the estate. Even if the external effect is modest, the file can still act as an editorial filter for the team maintaining the docs.

A university or training body could use llms.txt to point AI systems toward current programme information, admissions guidance, accessibility information and public policy pages, while leaving lower-value news items in an optional section or out of the file entirely. That is only sensible if the listed pages are public, current and clearly owned.

A content publisher building an explainer library could use llms.txt after fixing site basics to highlight evergreen cornerstone pages. The value there is not mystical "GEO juice" (GEO being shorthand for generative engine optimisation). It is a cleaner expression of which pages best describe the subject area. If the site is already internally coherent, llms.txt might reinforce that coherence.

But there are equally good reasons not to publish one yet. If a site still has broken internal links, duplication problems, weak canonicals, unclear page ownership, or public URLs that should not be promoted to agents, adding llms.txt can simply formalise the wrong map. It can become a neat summary of an untidy estate.

That is why llms.txt should be treated much more like information architecture work than like a quick technical hack.

Common misunderstandings

The biggest misunderstanding is that llms.txt is the new robots.txt. It is not. robots.txt is about crawler access instructions. llms.txt is proposed as a content guide for LLMs and agents. Different job, different guarantees.

The second misunderstanding is that publishing llms.txt will improve AI citations or rankings by default. There is no general official support for that claim. Google's own guidance explicitly says there are no special AI files required for its AI features. So the strongest honest description today is "possible low-cost experiment", not "visibility guarantee".

Another mistake is to treat llms.txt as a replacement for sitemap.xml, noindex or JSON-LD. Those each solve different problems: discovery, index control and structured understanding. A guide file does not replace established crawl and indexing controls.

A further misunderstanding is that llms.txt can protect private content. It cannot. If something should not be accessible or should not be indexed, rely on authentication, access control and the appropriate crawler or indexing directives. Do not rely on a voluntary guidance file to protect anything important.

Risks and boundaries

The first llms.txt risk is strategic distraction. Teams can spend hours debating a new AI-discovery file while the site still lacks clean internal linking, a maintained sitemap, trustworthy titles, useful metadata and consistent public content ownership. That is the wrong order of work.

The second risk is publishing a bad map. A stale llms.txt file can point agents to old pricing, retired documentation, superseded policies or weak pages that you did not actually mean to foreground. Because the file is intentionally concise, every listed URL carries weight.
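One low-effort guard against a stale map is to compare the URLs listed in llms.txt with the set of pages you currently treat as canonical. The sketch below assumes links are written as Markdown links and that the canonical set comes from somewhere you trust, such as a sitemap export. The file content, URLs and function names are all hypothetical.

```python
import re

# Matches Markdown links of the form [title](https://...).
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def listed_urls(llms_txt: str) -> list[str]:
    return [url for _title, url in LINK_RE.findall(llms_txt)]


def unknown_urls(llms_txt: str, current_urls: set[str]) -> list[str]:
    """Return listed URLs that are not in the current canonical set."""
    return [url for url in listed_urls(llms_txt) if url not in current_urls]


# Hypothetical file content and canonical set, for illustration only.
LLMS_TXT = """\
# Example Docs

> Public documentation for a hypothetical product.

## Docs

- [Quick start](https://example.com/docs/quickstart): Install and first steps
- [Pricing 2023](https://example.com/pricing-2023): Old price list
"""

CURRENT = {"https://example.com/docs/quickstart", "https://example.com/pricing"}

print(unknown_urls(LLMS_TXT, CURRENT))  # ['https://example.com/pricing-2023']
```

A check like this only catches URLs that have left the canonical set. It cannot tell whether a page that still exists has gone out of date, so it complements an editorial review rather than replacing one.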

The third risk is accidental exposure. Some teams may be tempted to include URLs that are technically public but should not be promoted, such as obscure archives, low-quality experiments or pages that create confusion when taken out of context. llms.txt is a signalling layer, so treat it like an editorial asset.

There is also a support boundary. The proposal site presents llms.txt as a proposal to standardise behaviour, not as a finished standard with universal adoption. At the same time, current AI crawler ecosystems are still organised mainly around crawler documentation and robots-style control. OpenAI documents OAI-SearchBot and GPTBot. Anthropic documents ClaudeBot, Claude-SearchBot and Claude-User. Perplexity documents PerplexityBot and Perplexity-User. Those official documents show that today's operational reality still revolves around crawler identity, fetch behaviour and access rules far more than around any universal llms.txt requirement.
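Because access rules are keyed to crawler identity, it can be useful to see how one robots.txt treats each named agent. The sketch below runs the user-agent tokens mentioned above through Python's standard urllib.robotparser against a made-up robots.txt. Python's parser uses its own matching logic, so real crawlers may interpret edge cases differently; check each vendor's documentation before relying on the result.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the vendors' crawler documentation.
AI_AGENTS = [
    "OAI-SearchBot",
    "GPTBot",
    "ClaudeBot",
    "Claude-SearchBot",
    "Claude-User",
    "PerplexityBot",
    "Perplexity-User",
]

# Hypothetical robots.txt: block GPTBot entirely, block /drafts/ for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""


def access_report(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Map each agent name to whether the rules allow it to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


report = access_report(ROBOTS_TXT, "https://example.com/docs/", AI_AGENTS)
for agent, allowed in report.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```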

Finally, llms.txt is not a substitute for governance. If you publish one, assign an owner, keep it concise, review it regularly and align it with the actual public source of truth. Because the space is still moving quickly, a three-month review cadence is sensible.

What to do next

Treat llms.txt as a possible finishing layer, not a starting point. First make sure the site has crawlable public pages, sensible internal links, a maintained sitemap, appropriate robots behaviour, correct noindex usage where needed, clear canonical content and any structured data that genuinely matches visible content.

Then decide whether a small curated guide would help. If yes, keep it short. List only public pages you would be comfortable having an AI system treat as your best summary of the topic. Avoid link dumping. Avoid speculative or temporary URLs. Reuse existing information architecture instead of inventing a second shadow taxonomy just for bots.
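Some of that advice can be checked mechanically. The sketch below flags three things in an llms.txt file: more links than a chosen cap, duplicate links, and URLs that look temporary. The cap of 20 and the list of "temporary" markers are arbitrary assumptions, not values from the proposal, and the function name is hypothetical.

```python
import re
from collections import Counter

# Captures the URL part of Markdown links: [title](url).
LINK_RE = re.compile(r"\[[^\]]+\]\((\S+?)\)")

MAX_LINKS = 20  # arbitrary cap chosen for this sketch
TEMP_MARKERS = ("staging", "preview", "draft")  # assumed hints of temporary URLs


def lint_llms_txt(text: str, max_links: int = MAX_LINKS) -> list[str]:
    """Return human-readable problems found in the file text."""
    urls = LINK_RE.findall(text)
    counts = Counter(urls)
    problems = []
    if len(urls) > max_links:
        problems.append(f"too many links: {len(urls)} > {max_links}")
    for url, count in counts.items():
        if count > 1:
            problems.append(f"duplicate link: {url}")
    for url in counts:
        if any(marker in url.lower() for marker in TEMP_MARKERS):
            problems.append(f"possibly temporary URL: {url}")
    return problems


# Hypothetical file content, for illustration only.
sample = (
    "## Docs\n\n"
    "- [A](https://example.com/a)\n"
    "- [A again](https://example.com/a)\n"
    "- [Beta](https://staging.example.com/beta)\n"
)
for problem in lint_llms_txt(sample, max_links=2):
    print(problem)
```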

Give the file an owner. Review it at least every three months. If nothing else, that review discipline will tell you whether llms.txt is helping as an editorial instrument. If external AI visibility improves, treat that as a bonus rather than a promise.

FAQs

Is llms.txt an official standard?

Not in the way robots.txt or sitemaps are commonly treated. The official proposal site describes llms.txt as a proposal to standardise a root-level file that helps LLMs use a website at inference time. That wording matters. It signals intent and structure, but not universal adoption, mandatory support or guaranteed downstream behaviour.

Does Google require llms.txt for AI Overviews or AI Mode?

No. Google's current Search Central guidance says there are no additional technical requirements to appear in AI Overviews or AI Mode and that site owners do not need to create new machine-readable files, AI text files or special schema for those features. If your goal is visibility in Google's AI search experiences, the fundamentals still matter more.

Should llms.txt replace robots.txt, sitemap.xml, noindex or JSON-LD?

No. Those all solve different problems. robots.txt manages crawler access instructions, sitemap.xml helps discovery, noindex controls indexing where supported, and JSON-LD provides structured page-level facts. llms.txt, at most, is a curated signpost for LLM-oriented navigation. Different layer, different purpose, different level of maturity.

Who should consider publishing llms.txt now?

Teams with strong public documentation or explainer content can consider it as a low-cost experiment once the core site is already tidy. It is most defensible where there is a clear set of canonical public pages and an owner willing to maintain the file. It is much less defensible as a shortcut for sites with weak information architecture, stale content or unresolved governance issues.

Sources

The /llms.txt file (llms-txt proposal site). Core definition of llms.txt as a proposal, plus its stated purpose, format and relationship to existing standards.