What GEO actually is
Generative Engine Optimization is the practice of structuring a website so that AI systems can extract, understand, and cite its content accurately. The goal is not to rank higher in a list — it is to appear as a source in an AI-generated answer.
Traditional SEO optimises for PageRank signals: inbound links, keyword relevance, structured data for rich snippets. GEO optimises for extraction quality: can the AI system find your content, parse it correctly, understand what entity you are, and reproduce your information faithfully in a generated response?
These are related but different problems. A site can rank #1 on Google and never appear in ChatGPT answers because it blocks AI crawlers. A site can have zero Google ranking and be cited in every Perplexity response because it is architecturally optimised for machine extraction.
SEO vs AEO vs GEO — what each optimises for
| Dimension | Traditional SEO | AEO (Answer Engine) | GEO (Generative Engine) |
|---|---|---|---|
| Primary target | Search engine algorithm | Featured snippets, voice search | AI-generated answers |
| Success metric | Ranking position | Answer box ownership | Citation frequency in AI outputs |
| Content format | Keyword-optimised pages | Direct Q&A format | Entity-dense, extractable claims |
| Technical layer | Core Web Vitals, crawlability | Schema markup, page speed | SSR, llms.txt, JSON-LD @graph |
| Timeline to results | 3–6 months | 4–8 weeks | 4–12 weeks (AI crawl cycle) |
Layer 1: AI crawler access — the prerequisite
AI systems cannot cite your content if they cannot crawl your site. Every major AI platform runs its own crawler: OpenAI uses GPTBot, Anthropic uses ClaudeBot, Perplexity uses PerplexityBot, Microsoft (Copilot) uses Bingbot, and Google's AI features crawl with Googlebot (the separate Google-Extended token controls whether content is used for AI training).
Most sites do not explicitly block these crawlers. But two common configurations silently reject them: robots.txt with overly broad Disallow rules, and CDN or WAF bot-blocking rules that classify AI crawlers as "bad bots." Cloudflare's managed bot protection, when set to block mode, will reject AI crawlers without any entry in your server logs.
Verify access by checking server logs for these user agent strings. If they are absent, check your robots.txt explicitly and then check your CDN rules. The correct robots.txt configuration for AI indexing is explicit Allow rules for each crawler by user agent string — do not rely on the default `User-agent: *` permission because some CDN systems override it.
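A minimal robots.txt sketch of that approach (the domain is a placeholder; add any other crawlers you rely on):

```txt
# Explicit allow rules per AI crawler — do not rely on "User-agent: *" alone
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Bingbot
Allow: /

# Default for everything else
User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
```

To confirm the crawlers are actually arriving, search your access logs for their user agent strings, e.g. `grep -E "GPTBot|ClaudeBot|PerplexityBot" access.log` (the log path depends on your server setup).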
Layer 2: Server-side rendering — why client-render breaks GEO
AI crawlers process HTML. With few exceptions, they do not execute JavaScript to render the page — they take the raw HTML the server returns and extract content from it. If your site renders content through JavaScript after the initial HTML loads (client-side rendering), those crawlers see an empty shell.
You can test this in 30 seconds: open your homepage, right-click and select "View Page Source." If the body of the source HTML contains your actual content — headings, paragraphs, article text — the page is server-side rendered and AI crawlers can read it. If the source shows only script tags and empty divs, the content is client-rendered and invisible to crawlers that do not execute JavaScript.
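The same check works from a terminal. This sketch fetches the raw HTML as an AI crawler would see it and counts content tags (assumes curl and a Unix shell; the URL is a placeholder). A server-rendered page returns a meaningful count; a client-rendered shell returns near zero:

```bash
# Fetch the page with GPTBot's user agent, count paragraph and heading tags
curl -s -A "GPTBot" https://yourdomain.com/ | grep -o -E "<p[ >]|<h[1-6]" | wc -l
```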
WordPress renders server-side by default, which is one reason WordPress sites are well-indexed by AI systems. React and Vue applications built without an SSR framework (Next.js for React, Nuxt for Vue), or with lazy hydration patterns, can ship invisible content. The fix is ensuring all important content is present in the initial HTML response, not added by JavaScript after load.
Layer 3: llms.txt — the machine-readable manifest
llms.txt is a plain-text file placed at the root of your domain (e.g., danielmashkov.com/llms.txt) that provides a structured summary of who you are, what you do, and what your authoritative content is — in a format designed for large language models to process efficiently.
The format is simple: Markdown with sections for identity, service descriptions, case study summaries, and a list of canonical URLs. AI systems with longer context windows can request the extended version at llms-full.txt, which contains full descriptions rather than two-sentence summaries.
An effective llms.txt entry for a service is two sentences: what you build, and what outcome it produces. Avoid marketing language — AI systems extract factual claims, not brand positioning. Include your canonical URLs explicitly so AI systems reference the right page when they cite you.
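A minimal sketch of the layout, following the convention's Markdown format (all names and URLs here are placeholders):

```markdown
# Example Studio

> Independent studio building server-rendered marketing sites that AI
> systems can crawl, parse, and cite.

## Services

- [GEO audit](https://example.com/services/geo-audit): Reviews crawler
  access, rendering, and schema. Delivers a prioritised fix list.

## Case studies

- [SaaS relaunch](https://example.com/work/saas-relaunch): Rebuilt a
  client-rendered site with SSR and FAQPage schema. AI citations appeared
  within one crawl cycle.

## Canonical URLs

- https://example.com/services
- https://example.com/work
```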
Layer 4: JSON-LD schema — entity coherence
JSON-LD structured data is how you tell AI systems exactly what your site is, who operates it, and how its content relates to real-world entities. The schema that matters most for GEO is not FAQ schema or Article schema in isolation — it is the entity graph: a nested Person → Organization → CreativeWork hierarchy with stable @id anchors.
Stable @id anchors let AI knowledge graphs resolve your entity across multiple references. If your homepage declares @id: "https://yourdomain.com/#person" and your articles reference that same anchor as their author, AI systems can build a coherent entity model rather than treating each page as a disconnected source.
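A trimmed sketch of such a graph, placed in the site-wide layout (domain and names are placeholders):

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Person",
      "@id": "https://yourdomain.com/#person",
      "name": "Jane Doe",
      "worksFor": { "@id": "https://yourdomain.com/#organization" }
    },
    {
      "@type": "ProfessionalService",
      "@id": "https://yourdomain.com/#organization",
      "name": "Example Studio",
      "url": "https://yourdomain.com/",
      "founder": { "@id": "https://yourdomain.com/#person" }
    }
  ]
}
```

An article page then references the same anchor rather than redeclaring the entity: `"author": { "@id": "https://yourdomain.com/#person" }`.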
FAQPage schema is the single highest-impact addition for citation rate. A B2B site that added FAQPage schema to its comparison articles saw a 40% increase in AI citation frequency within 3 months. Each FAQ answer should be 40–60 words, self-contained, and entity-dense — AI systems extract and reproduce them nearly verbatim.
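A sketch of the markup, using a question from this article's own FAQ (the answer text should also appear in the visible HTML):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "How is GEO different from SEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Traditional SEO optimises for crawlers that rank pages in a list. GEO optimises for AI systems that extract, synthesise, and cite content in generated answers. SEO success is measured by ranking position; GEO success by citation frequency."
      }
    }
  ]
}
```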
Schema types and their GEO impact
| Schema Type | JSON-LD Type | GEO Impact | Priority |
|---|---|---|---|
| Person identity | Person | Entity coherence — who operates the site | Critical |
| Organisation | ProfessionalService | Service entity — what services are offered | Critical |
| FAQ content | FAQPage | Direct citation in AI answer outputs | High |
| Process/steps | HowTo | Cited in procedural AI answers | High |
| Articles | Article / BlogPosting | Author attribution, date freshness signals | Medium |
| Case studies | CreativeWork | Expertise evidence for AI knowledge graphs | Medium |
| Voice accessibility | SpeakableSpecification | Audio AI surfaces, Alexa, Google Assistant | Low |
Layer 5: Answer-first content structure
AI systems extract content in chunks — typically the first paragraph after each heading. If your first paragraph after an H2 is a preamble ("In this section, we will explore..."), the AI extraction misses the actual answer. The answer-first structure inverts this: the first 40–60 words after every heading directly answer the question the heading poses.
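A sketch of the inversion, borrowing a heading from this article's FAQ:

```markdown
## How long does GEO take to show results?

<!-- Preamble-first: the extracted chunk carries no answer -->
In this section, we will explore the factors that influence GEO timelines.

<!-- Answer-first: the first ~50 words carry the answer -->
Initial crawling and indexing of new GEO signals typically takes 4–12
weeks. AI systems update their knowledge bases on irregular schedules, so
measure by querying them directly with your target questions.
```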
Short paragraphs (2-3 sentences) outperform long ones in AI extraction rate. AI systems parse content for extractable claims — dense prose makes this harder. A paragraph that makes one clear, citable claim is extracted more reliably than a paragraph making three interconnected points.
The data-aeo attribute pattern (which this site uses on TL;DR blocks and section summaries) is a forward-compatible signal. It is not a recognised attribute in any current standard, but it serves as an explicit machine-readable marker for answer-extraction zones — useful as AI crawlers evolve their parsing heuristics.
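As used here, the marker is an ordinary data attribute on the summary block (a site-specific convention, not a standard; class name and wording are illustrative):

```html
<!-- TL;DR block flagged as an answer-extraction zone -->
<section class="tldr" data-aeo="summary">
  <p>
    GEO structures a site so AI systems can crawl it, parse it, and cite
    it: crawler access, server-side rendering, llms.txt, a JSON-LD entity
    graph, and answer-first content.
  </p>
</section>
```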
GEO implementation — step by step
1. **Audit AI crawler access in robots.txt and CDN rules.** Open your robots.txt and check for User-agent rules that might block GPTBot, ClaudeBot, PerplexityBot, or anthropic-ai. Check your CDN or WAF bot management rules separately — Cloudflare, Sucuri, and Wordfence all have independent bot-blocking rules that robots.txt does not control.
2. **Verify server-side rendering with View Source.** Right-click your most important pages and choose View Page Source. Confirm that article content, headings, and key paragraphs appear in the raw HTML — not in script tags or empty divs. Any content that requires JavaScript to render is invisible to AI crawlers.
3. **Create an llms.txt file at your domain root.** Create a plain text file at /llms.txt. Include: your name and role (2 sentences), service descriptions (1–2 sentences each), 3–5 case study summaries (2 sentences each), and your canonical service and portfolio URLs. Keep each entry factual and entity-dense.
4. **Implement Person → Organisation → CreativeWork JSON-LD.** Add JSON-LD to your site layout with stable @id anchors for your Person and Organization entities. Reference these anchors from article and case study pages using the author and publisher fields. This creates an entity graph that AI systems can traverse.
5. **Add FAQPage JSON-LD to every article and landing page.** Write 6–8 Q&A pairs per page. Each answer must be 40–60 words, self-contained (no pronouns requiring prior context), and factual. Add the FAQPage JSON-LD to every page that contains relevant Q&A content. This is the single highest-impact GEO change available.
6. **Restructure content for answer-first extraction.** Edit every article so the first 40–60 words after each H2 directly answer the question the heading poses. No preambles ("In this section..."). The answer first, the supporting detail after. Short paragraphs of 2–3 sentences. Mark key answer blocks with data-aeo attributes.
Want your site built for AI visibility from day one?
Describe your project in the intake form. I build all GEO layers into every site I deliver — llms.txt, JSON-LD entity graph, AI crawler permissions, and answer-first structure.
Start the brief.

Frequently Asked Questions
**How is GEO different from SEO?**
Traditional SEO optimises for crawlers that rank pages in a list. GEO optimises for AI systems that extract, synthesise, and cite content in generated answers. SEO success is measured by ranking position. GEO success is measured by citation frequency — whether your content appears as a source when an AI system answers a query in your domain.

**Does GEO replace traditional SEO?**
No. GEO builds on SEO — a site without solid technical SEO foundations (crawlability, canonical URLs, structured data, authoritative content) will not be cited by AI systems either. The relationship is additive: GEO adds layers of machine-readable signals on top of a healthy SEO foundation; it does not substitute for it.

**What is llms.txt and do I need it?**
llms.txt is an emerging convention (proposed by Jeremy Howard) for a plain-text file at the root of a domain that summarises the site's content, capabilities, and authoritative sources in a format optimised for large language models. It is not yet an official standard but is already indexed by several AI systems. Publishing one signals intentional AI-readiness and provides structured context that AI crawlers cannot always extract from HTML.

**Will adding FAQ schema actually get me cited in AI answers?**
FAQ schema (FAQPage JSON-LD) significantly increases the probability of content appearing in AI-generated responses. AI systems use structured data as a confidence signal — if the same information appears in both readable HTML and machine-readable JSON-LD, it is more likely to be extracted and cited as authoritative. Pages without any schema markup are consistently outperformed by schema-annotated pages in AI citation rate.

**How long does GEO take to show results?**
GEO results are slower than traditional SEO. AI systems update their knowledge bases on irregular schedules — some continuously, some quarterly. Initial crawling and indexing of new GEO signals typically takes 4–12 weeks. Measuring results requires actively querying AI systems (ChatGPT, Perplexity, Claude) with your target queries and tracking whether your site is cited.

**Can WordPress sites be GEO-optimised?**
Yes. WordPress renders server-side by default, which satisfies the SSR requirement. WordPress sites need: AI crawler permissions in robots.txt, FAQ and Article schema via a schema plugin or custom code, an llms.txt file created as a static file in the root, and answer-first content structure. Performance matters too — AI crawlers respect crawl-delay signals and skip slow or unstable sites.

**What is the most important GEO change to make first?**
Verify AI crawler access in robots.txt. If GPTBot, ClaudeBot, PerplexityBot, or anthropic-ai are blocked — either directly in robots.txt or indirectly through Cloudflare WAF rules — no other GEO work matters, because the site cannot be indexed by those systems. This is the prerequisite that most sites overlook when implementing GEO.

**How do I check if AI bots are accessing my site?**
Check your server access logs for these user agent strings: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, and Bingbot (which Microsoft uses for Copilot/AI search). If these user agents do not appear in your logs, verify that your robots.txt allows them, then check your CDN or WAF for bot-blocking rules. Cloudflare's managed bot detection, if set to block, will silently reject these crawlers.
Sources
1. Google Search Central — AI Overviews documentation. Official guidance from Google on how AI Overviews select sources and what signals influence inclusion.
2. llms.txt specification — Jeremy Howard (answer.ai). The original proposal for the llms.txt convention and its intended machine-readable format.
3. Schema.org — FAQPage, HowTo, Person, Organization documentation. Authoritative documentation for all structured data types referenced in this guide.
4. Bing Webmaster Tools — Copilot and AI crawler documentation. Microsoft documentation on how Bingbot crawl data is used in Copilot and AI-assisted search features.
5. Search Engine Land — GEO full guide 2026. Industry research on GEO implementation patterns and their correlation with AI citation rates.