llms.txt Explained: The robots.txt for AI (Complete 2026 Guide)
If you're building a website in 2026 and haven't heard of llms.txt yet, you will soon. It's a plain-text file you put at your domain root — yoursite.com/llms.txt — that tells AI crawlers which pages matter most on your site and how to understand them.
It's been compared to robots.txt. That comparison is partially right and partially misleading. This guide covers what the spec actually is, what it does in practice, what it doesn't do, and how to write one that's useful for AI answer engines in 2026.
What llms.txt actually is
llms.txt is a proposed standard — originally floated by Jeremy Howard (Answer.ai, fast.ai) in September 2024 — for a human-readable, machine-parsable index of the most important content on your site, written in markdown.
The format looks like this:
# Site Name
> One-line description of what this site is and who runs it.
## Core pages
- [Home](/): what we do, one-line pitch
- [Services](/services): list of services offered
- [About](/about): team, history, credentials
## Blog posts
- [How to do X](/blog/how-to-do-x): short summary
- [Y in 2026](/blog/y-2026): short summary
## Contact
- Email: hello@example.com
- Phone: 555-1234
- Address: 123 Main St, Edmonton AB
That's it. Markdown headers, bullet lists, short descriptions. Purpose: give an AI crawler a curated, high-signal map of your site — the stuff you actually want it to understand and cite.
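Because the format is just markdown headers and bullet lists, it's trivial to work with programmatically. As a rough illustration (this parser is our own sketch, not part of any spec), here's minimal Python that pulls the title, description, and section links out of an llms.txt file:

```python
import re

def parse_llms_txt(text):
    """Parse llms.txt markdown into (title, description, sections)."""
    title, description = None, None
    sections = {}   # section heading -> list of (link text, url)
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> ") and description is None:
            description = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            # Match markdown links like "- [Home](/): pitch"
            m = re.match(r"- \[([^\]]+)\]\(([^)]+)\)", line)
            if m:
                sections[current].append((m.group(1), m.group(2)))
    return title, description, sections

sample = """# Site Name
> One-line description of what this site is and who runs it.
## Core pages
- [Home](/): what we do, one-line pitch
- [Services](/services): list of services offered
"""
title, desc, sections = parse_llms_txt(sample)
print(title)                      # Site Name
print(sections["Core pages"][0])  # ('Home', '/')
```

If a twenty-line script can extract clean facts from your file, a crawler's parser can too — that's the whole point of the format.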
What it's NOT
- Not an access-control mechanism. You can't use llms.txt to block AI crawlers. That's what robots.txt directives (User-agent: GPTBot, etc.) are for.
- Not a ranking signal in traditional SEO. Google doesn't read llms.txt. It's AI crawlers (ChatGPT, Perplexity, Claude, Gemini retrieval workers) that benefit from it.
- Not universally adopted yet. In 2026, support varies by crawler. Perplexity and some Claude retrieval use it. ChatGPT's crawler handling is improving. Gemini is inconsistent.
- Not a replacement for schema.org JSON-LD. Structured data still matters. llms.txt is an additional layer, not a substitute.
How it differs from robots.txt
| robots.txt | llms.txt |
|---|---|
| Tells crawlers what they CAN'T access | Tells AI crawlers what's MOST IMPORTANT |
| Plain-text allow/deny rules | Markdown-formatted content index |
| Exists since 1994 | Proposed 2024 |
| Universally respected | Inconsistently adopted |
| Hidden from users | Readable by humans |
You should have both. robots.txt controls access (including blocking specific AI crawlers if you want), llms.txt curates the subset of your site you want AI to actually understand.
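For concreteness, a hypothetical robots.txt that sits alongside llms.txt at the domain root might look like this (the paths and rules are illustrative, not a recommendation):

```
# Applies to every crawler, AI crawlers included
User-agent: *
Disallow: /admin/

# To shut a specific AI crawler out entirely, give it its own group:
# User-agent: GPTBot
# Disallow: /
```

robots.txt decides who gets in and where; llms.txt tells the crawlers you let in what's worth reading.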
Why it matters (practically) in 2026
Three reasons:
1. AI crawlers are time-budgeted. They can't parse your entire site. A clear index lets them find your best content fast and ignore archive noise.
2. Hallucination reduction. When an AI model is answering a user's question and sees your llms.txt, it has a reliable "table of contents" to pull from. Without one, it scrapes whatever fragments surface in its retrieval system — which may be outdated or out of context.
3. Control over how you're described. The one-line descriptions in your llms.txt often become the phrasing AI crawlers use when summarizing your site. This is your chance to write the answer you want them to give.
How big is the impact? In 2026, modest but growing. Perplexity and a few other retrieval-first engines give visible weight to llms.txt content. ChatGPT less so, but its browsing tool respects it when it sees one. For a site doing serious AEO work, llms.txt is table stakes — not a growth hack, but a foundational building block.
The full spec (as of 2026)
The proposed spec defines two files:
/llms.txt — high-level index of your site. What AI should fetch when answering general questions about your business or topic.
/llms-full.txt — full consolidated content of your site, optimized for one-shot context loading into an LLM. Usually much longer; includes the full text of key pages. Useful for documentation sites; less useful for marketing sites.
Most businesses only need /llms.txt. Unless you're running a documentation site or code library, skip /llms-full.txt — it's overkill.
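If you do run a documentation site, /llms-full.txt can be generated mechanically rather than maintained by hand. A minimal sketch, assuming your pages already live as markdown files (the function and separator choice are ours, not part of the spec):

```python
from pathlib import Path

def build_llms_full(page_paths, out_path="llms-full.txt"):
    """Concatenate key markdown pages into one file for one-shot LLM context."""
    parts = []
    for path in page_paths:
        # Keep each page's own headings; just join them in order
        parts.append(Path(path).read_text(encoding="utf-8").strip())
    # A horizontal rule between pages keeps document boundaries visible
    Path(out_path).write_text("\n\n---\n\n".join(parts) + "\n", encoding="utf-8")
```

Run it in your build step so llms-full.txt regenerates whenever the docs change.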
How to write a good llms.txt
Step 1 — Structure
# [Your Site Name]
> [One-sentence description. Who you are, where you are, what you do.]
## About
- Founded: [year]
- Location: [city, country]
- Website: https://[domain]
- Email: [contact email]
## Core services / products
- [Service 1]: [one-line description]
- [Service 2]: [one-line description]
## Pricing
- [Product/service]: [price range or "starts at $X"]
## FAQ
- [Common question]: [direct answer]
## Key pages
- [Page name](/path): [description]
## Recent blog posts
- [Post title](/blog/path): [description]
## Authority signals
- [LinkedIn](https://...)
- [GitHub](https://...)
- [Case studies](/case-studies)
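The structure above is regular enough to generate from data, which makes quarterly updates a one-line change instead of hand-editing markdown. A minimal sketch (the dict shape and field names here are our own convention, not part of any spec):

```python
def render_llms_txt(site):
    """Render a dict of site facts into llms.txt-style markdown."""
    lines = [f"# {site['name']}", f"> {site['description']}", ""]
    for heading, items in site["sections"].items():
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")   # blank line between sections
    return "\n".join(lines).rstrip() + "\n"

site = {
    "name": "Example Co",
    "description": "Example Co is a hypothetical Edmonton business.",
    "sections": {
        "Services": ["Service A: one-line description"],
        "Contact": ["Email: hello@example.com"],
    },
}
print(render_llms_txt(site))
```

Store the dict wherever your other site config lives; when a price or service changes, edit the data and re-render.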
Step 2 — Write for the LLM, not the human
Counterintuitive but important: llms.txt is read by an AI deciding what to cite and how to describe you. Write accordingly.
- Use specific numbers, not ranges. "Starts at $5,000" is quoted cleanly. "Affordable pricing" is noise.
- Include your city and province. AI doesn't know you're in Edmonton unless you say so explicitly.
- List your actual services, not marketing categories. "AI voice agents for clinics" is citable. "Digital transformation solutions" is not.
- Include a one-line "about" sentence the AI can quote verbatim. "Agency7 is an Edmonton-based AI agency focused on voice agents and lead generation for small-to-mid businesses." That's the sentence you want LLMs to use.
Step 3 — Keep it under 5,000 tokens
Roughly 3,500 words. Longer than that and crawlers start truncating. For most businesses, a good llms.txt is 800-2,500 words.
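You don't need an exact tokenizer to stay inside the budget; the common rough rule of about four characters per token is close enough for a sanity check. A quick sketch (the 4-chars-per-token figure is a heuristic, not a guarantee for any specific model):

```python
def rough_token_count(text):
    """Rough token estimate using the ~4-characters-per-token heuristic."""
    return len(text) // 4

def within_budget(text, max_tokens=5000):
    """True if the text is likely under the given token budget."""
    return rough_token_count(text) <= max_tokens

draft = "word " * 2000             # ~2,000 words, 10,000 characters
print(rough_token_count(draft))    # 2500
print(within_budget(draft))        # True
```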
Step 4 — Update it when facts change
Pricing changes. New services launch. Team members join. Treat your llms.txt like your homepage — review quarterly, update anytime a fact in it becomes wrong.
Real examples
A small agency
# Agency7
> Agency7 is an Edmonton-based AI agency focused on voice agents, lead generation, and web development for small-to-mid Alberta businesses.
## About
- Founded: 2017
- Location: Edmonton, Alberta, Canada
- Website: https://www.agency7.ca
- Focus: AI voice agents, AI lead generation, AI SEO, web development
## Services
- **AI Voice Agents**: Setup $3K-$8K, monthly $200-$600, usage $0.15-$0.30/min
- **AI Lead Generation**: Custom CRM + follow-up automation, $8K-$18K setup
- **AI SEO / AEO**: Schema, llms.txt, content — $2K-$5K/mo retainer
- **Web Development**: Next.js, $5K-$25K per project
## FAQ
- Who runs Agency7? Anders Kitson, founder.
- What's included in a typical engagement? Discovery, build, training, 3-6 months of support.
- Minimum project size? $5,000.
## Key pages
- [AI Voice Agents Edmonton](/ai-voice-agents-edmonton)
- [AI Lead Generation Edmonton](/ai-lead-generation-edmonton)
- [AI SEO Edmonton](/ai-seo-edmonton)
- [Web Development Edmonton](/web-development-edmonton)
- [Edmonton AI Agencies directory](/edmonton-ai-agencies)
- [Case studies](/case-studies)
## Recent posts
- [How to rank on ChatGPT in 2026](/blog/how-to-rank-on-chatgpt-in-2026-the-practical-edmonton-playbook)
- [AI voice agent cost in Edmonton](/blog/how-much-does-an-ai-voice-agent-cost-in-edmonton)
- [Best AI agency in Edmonton 2026](/blog/best-ai-agency-in-edmonton-2026-honest-comparison)
A local trade
# Smith Plumbing Edmonton
> Family-owned plumbing company serving Edmonton and surrounding areas since 2008. Emergency service available 24/7.
## About
- Founded: 2008
- Location: Edmonton, Alberta
- Service area: Edmonton, St. Albert, Sherwood Park, Spruce Grove
- License: [Alberta master plumber license #]
- Hours: 24/7 emergency, 8 AM - 6 PM standard
## Services
- Emergency plumbing (24/7)
- Furnace install and repair
- Water heater repair and replacement
- Drain cleaning
- Bathroom and kitchen renovations (plumbing work)
- Commercial plumbing
## Pricing
- Emergency service call: $149 + parts
- Standard service call: $89 + parts
- Water heater install: $1,800 - $4,500
- Furnace install: $4,500 - $9,000
## FAQ
- Do you offer financing? Yes, through [provider], 12-month 0% available.
- Warranty? 2 years on labour, manufacturer warranty on parts.
- Licensed and insured? Yes, all techs licensed Alberta journeymen or apprentices under supervision.
## Contact
- Phone: 780-555-1234
- Email: hello@smithplumbing.ca
- Address: 123 Main St, Edmonton, AB T5K 2M1
Notice what's in both: specific prices, specific service area, named owner/business, direct contact info. An AI crawler can cite any of these facts without hallucinating.
Common mistakes
Making it too long
A 10,000-word llms.txt gets truncated. Keep it focused. The goal is a curated index, not a full content dump.
Including outdated pricing
If your prices changed six months ago and llms.txt still says the old numbers, LLMs will confidently quote the wrong price to someone asking about you. Update it.
Writing it as marketing copy
"We deliver best-in-class transformational solutions that drive stakeholder value" is word-salad to an AI. Replace with "We build AI voice agents for Edmonton clinics. Setup starts at $5,000." Concrete, citable, useful.
Forgetting the byline sentence
Most AI summaries of your site start with a one-sentence description. If you don't give them one in llms.txt, they'll synthesize one from whatever scraps they find — often awkwardly. Always include a single clean one-liner at the top.
Not serving it correctly
llms.txt should return Content-Type: text/plain or text/markdown, HTTP 200, and be at exactly https://yourdomain.com/llms.txt. If it's at /docs/llms.txt or redirects through Cloudflare with weird headers, crawlers skip it.
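How you serve it depends entirely on your stack — usually it's a one-line web-server or framework config. Purely for illustration, here's a stdlib Python sketch that meets the three requirements (exact root path, HTTP 200, plain-text content type); treat it as a demo, not a production server:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class LlmsTxtHandler(BaseHTTPRequestHandler):
    """Serves /llms.txt as text/plain with an explicit 200; 404 otherwise."""

    def do_GET(self):
        if self.path == "/llms.txt":
            with open("llms.txt", "rb") as f:
                body = f.read()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet

# To run: HTTPServer(("0.0.0.0", 8000), LlmsTxtHandler).serve_forever()
```

Whatever you actually deploy, the checks are the same: right path, right status, right content type, no HTML wrapper.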
How to check if yours is working
- Fetch it manually. curl https://yoursite.com/llms.txt should return the content, not HTML.
- Check the headers. curl -I https://yoursite.com/llms.txt — should be 200 OK with a text/plain or text/markdown content type.
- Ask ChatGPT or Perplexity about your business. If the description matches what you wrote in llms.txt, it's working.
- Check your server logs. GPTBot, PerplexityBot, ClaudeBot, and others will fetch /llms.txt if they know about the spec.
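Those first two checks can be scripted. A minimal sketch that applies them to a fetched response — fed here with example values rather than a live request, since the function and its checks are our own illustration:

```python
def looks_healthy(status, content_type, body):
    """Basic llms.txt health checks on a fetched HTTP response."""
    if status != 200:
        return False
    if not content_type.startswith(("text/plain", "text/markdown")):
        return False
    # A real llms.txt is markdown, not an HTML error page or SPA shell
    if body.lstrip().lower().startswith(("<!doctype", "<html")):
        return False
    return True

print(looks_healthy(200, "text/plain; charset=utf-8", "# My Site\n> ..."))  # True
print(looks_healthy(200, "text/html", "<!doctype html>"))                   # False
```

Wire it to your uptime monitoring and a misconfigured redirect or content type gets caught before crawlers silently start skipping the file.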
Combining llms.txt with other AEO signals
llms.txt is one layer of a broader AEO practice. The full stack:
- robots.txt — controls crawler access
- llms.txt — curates what AI crawlers focus on
- Schema.org JSON-LD — machine-readable structured data on every page
- Semantic HTML — proper <h1>, <h2>, <h3> hierarchy, real text not images
- Server-rendered HTML — so crawlers can read content without executing JavaScript
- Off-site citations — Wikipedia, directories, podcasts, YouTube — where AI models learn about you
Getting quoted consistently by ChatGPT / Perplexity / Claude requires all six. See How to rank on ChatGPT in 2026 for the full playbook.
Frequently asked questions
Is llms.txt an official standard?
Not yet. It's a proposed standard with growing adoption. As of 2026, several AI crawlers respect it but no IETF or W3C formal ratification exists. That's fine — proposed standards often work in practice long before formal ratification.
Will llms.txt replace robots.txt?
No. They do different things. robots.txt is access control. llms.txt is content curation. Both are useful; neither replaces the other.
Does llms.txt help Google SEO?
Not directly. Google's traditional search crawlers don't use it. However, as Google integrates AI overviews and SGE-style features more deeply, the line between "traditional SEO" and "AEO" is blurring. Writing a good llms.txt is a proxy for thinking about AI discoverability in general, which does influence AI-enhanced search.
Can I block AI crawlers with llms.txt?
No. Use robots.txt directives for blocking. For example, these two lines in robots.txt block OpenAI's crawler from everything:

User-agent: GPTBot
Disallow: /
Should I put my llms.txt in the sitemap?
Not necessary. Crawlers look for it at the root by convention (yourdomain.com/llms.txt). You can link to it from your sitemap for completeness, but it's not required.
Does llms.txt work for non-English sites?
Yes, in any language. Write it in your site's primary language. If you serve multilingual content, you can include sections per language or maintain separate files per locale (some sites do /en/llms.txt, /fr/llms.txt).
How often do AI crawlers actually fetch llms.txt?
Varies. Perplexity's crawler fetches frequently, especially for sites it cites often. OpenAI's crawlers fetch opportunistically. Claude's retrieval layer fetches when answering specific queries. Your server logs will show the pattern for your site.
Is there a generator tool?
Several exist — we're building one for Edmonton businesses (part of Agency7's AI SEO Edmonton service). For now, writing one from scratch using the template above takes 30-60 minutes and produces better results than any generator.
Want an audit of your current AI discoverability? We'll check your robots.txt, llms.txt, schema markup, and see what ChatGPT/Perplexity/Claude currently say about your business. Book a free audit or see our full approach at AI SEO Edmonton.