Schema markup is the single biggest AI-search lever most sites never pull. When ChatGPT decides which five sources to synthesize into an answer, structured data is how it resolves ambiguity — which Lumina is this, who wrote it, when was it last updated, does this count as an FAQ. We audited the 10 top-ranking articles for "schema markup for AI search" across the US and DE markets. Zero of them ship FAQPage schema — on articles about schema. The playbook below is built from that gap.

This is the fourth piece in our GEO content cluster. The GEO vs SEO Pillar covers the big-picture comparison. SEO vs GEO vs AEO handles the three-way framework. SEO for AI Search walks through the six tactics. This one is the technical schema deep-dive.

Does Schema Markup Help AI Search?

Yes — but not the way most tools measure it. Rich-result eligibility (stars, FAQ accordions, recipe cards in the SERP) is the old payoff. The new payoff is entity clarity for the LLM.

When GPTBot or ClaudeBot fetches your HTML, the JSON-LD block tells the crawler exactly what entities live on the page: the author, the organization, the publication date, the topic. AI models weight pages with explicit entity declarations higher than pages where they have to guess from the body text.

Google's public documentation confirms AI Overviews use structured data to surface rich results and identify authoritative sources. OpenAI's ChatGPT Search relies on Bing, which indexes and parses JSON-LD. Perplexity operates its own crawler (PerplexityBot) that ingests full HTML including embedded JSON-LD. Gemini reads from Google's index and inherits those structured-data signals directly.

The honest version: a page without schema is not penalized. A page with clean, consistent, layered schema is favored in the 5-source shortlist that becomes the final AI answer. Schema is not a ranking factor — it is a trust and disambiguation signal. Sometimes that is worth more.
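To make that concrete, here is a minimal sketch of the kind of JSON-LD block a crawler finds in a page's head. Every name, date, and URL below is a placeholder, not a recommendation of specific values:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Schema Markup for AI Search",
  "datePublished": "2026-01-15",
  "dateModified": "2026-04-19",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": { "@type": "Organization", "name": "Example Co" }
}
```

That is the whole mechanism: the model no longer has to infer the byline, the brand, or the freshness from prose.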

SEO Schema vs GEO/AEO Schema

The two stacks overlap on most of the surface. The part that diverges is where every schema guide goes wrong.

Dimension | Classic SEO Schema | AI Search Schema (GEO/AEO)
Primary goal | Rich-result eligibility in Google SERP | Entity clarity for LLM retrievers
Critical types | Product, Review, Recipe, HowTo, FAQPage | Article/BlogPosting, Person, Organization, FAQPage — all linked via @id
Success metric | Rich snippet appears | Brand named in AI answer
Depth vs breadth | Narrow: 2-3 rich-result types per page | Layered: 4-6 connected entities per page
Drift tolerance | Strict FAQ text match (Google revokes rich result) | Strict FAQ + strict entity linking across pages
Freshness signal | dateModified for Google freshness crawls | dateModified and wordCount consistent with visible body
The overlap is real: an Article schema with author, publisher, datePublished, and FAQPage covers both stacks, roughly 70% of the types. The 30% that diverges: AI search cares more about @id entity linking across the site, and it is less forgiving of a stale or inaccurate wordCount.

The Schema Types AI Engines Actually Use

Based on public documentation from Google's structured-data team and reverse-engineering of which schemas get cited in AI Overviews, these are the types that move the needle:

  • Article / BlogPosting — tells LLMs this is editorial content and the author byline is authoritative. Required fields: headline, datePublished, author, publisher.
  • Person — the byline owner. Needs jobTitle, knowsAbout, and sameAs pointing to LinkedIn or a Wikidata profile if one exists. This is how AI resolves "is this author actually an expert on this topic."
  • Organization — the publisher. Logo, url, sameAs to social profiles. The sameAs array is what connects your brand to the broader entity graph.
  • FAQPage — question-answer pairs in a format AI retrievers can quote directly. The most-underused schema on the web, based on our audit of the top 10 articles above.
  • WebSite — site-level identity with a potentialAction for internal search. Still parsed by AI even when no SERP search box renders.
  • BreadcrumbList — hierarchical context. Helps AI understand topic scope (this page is under /blog/geo/, not /tools/).
  • HowTo — sunset as a Google Rich Result in September 2023 but still read by Perplexity, Gemini, and ChatGPT for step extraction. Keep it on tutorial content.
  • ImageObject with contextual metadata — Google Lens, Perplexity Vision, and multimodal ChatGPT use this to resolve what an image depicts. Especially valuable for product and diagram content.

What is not a big deal for most pages: SoftwareApplication (unless you ship an app), Event (unless the page is an event listing), Recipe (unless it is a recipe), Product (only on commercial pages). Adding irrelevant schema types adds bytes without moving any signal.
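Of the list above, FAQPage is both the most-skipped and the simplest to ship. A minimal sketch, with the question text borrowed from this article's own FAQ (the answer is truncated for illustration):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Does schema markup help AI search engines?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes. Schema.org JSON-LD tells AI retrievers exactly what entities live on your page."
      }
    }
  ]
}
```

The one hard rule: every name and text value must match the visible FAQ word for word.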

The 5 Schemas That Move the Needle

Ordered by impact, with the specific pattern that works:

1. BlogPosting with @id self-ref and entity refs. Every article should declare "@id":"https://your-site.com/blog/post/#article" and reference author and publisher via {"@id":"..."}. Why: AI retrievers treat @id as "this is the article entity on this URL" — not one of several candidates. They trace the graph to resolve author and publisher without ambiguity.
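The pattern looks like this in practice. The domain and the fragment identifiers (#article, #founder, #org) are illustrative conventions, not required values:

```json
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "@id": "https://your-site.com/blog/post/#article",
  "headline": "Your Article Headline",
  "datePublished": "2026-01-15",
  "author": { "@id": "https://your-site.com/#founder" },
  "publisher": { "@id": "https://your-site.com/#org" }
}
```

Note that author and publisher carry nothing but an @id: the full entities are declared elsewhere, once.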

2. Person schema declared once, referenced everywhere. One canonical Person block at /about/#founder or homepage #founder. Every article's author field is {"@id":"https://your-site.com/#founder"}. This prevents entity-splitting — if you declare Person inline on every article, AI may count them as different people with slightly different metadata.
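A sketch of the one canonical Person block (name, title, and profile URL are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://your-site.com/#founder",
  "name": "Jane Doe",
  "jobTitle": "Founder",
  "knowsAbout": ["Schema markup", "AI search"],
  "sameAs": ["https://www.linkedin.com/in/janedoe"]
}
```

Every article then points at it with {"@id":"https://your-site.com/#founder"} and nothing else.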

3. FAQPage strict-synced to visible FAQ. Every question in the schema exactly matches a visible .faq-item. Same order. Same wording. Same HTML entities. Google revokes FAQ rich results on drift. AI summarizers do not revoke but they score drift lower on trust. Lumina uses a 9-line Perl script to validate this automatically before every commit.
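The strict-sync idea is easy to automate in any language. Below is a minimal Python sketch of the same check, not Lumina's actual script; extracting the two question lists from the JSON-LD and the rendered HTML is assumed to happen upstream:

```python
# Sketch of a FAQ strict-sync check: the FAQPage schema must carry the
# same questions as the visible FAQ, in the same order, word for word.
# (Illustrative only; not Lumina's actual validation script.)

def faq_drift(schema_questions, visible_questions):
    """Return a list of human-readable mismatches; an empty list means in sync."""
    drift = []
    if len(schema_questions) != len(visible_questions):
        drift.append(
            f"count mismatch: schema has {len(schema_questions)}, "
            f"page has {len(visible_questions)}"
        )
    for i, (s, v) in enumerate(zip(schema_questions, visible_questions)):
        if s != v:
            drift.append(f"question {i + 1} differs: {s!r} vs {v!r}")
    return drift
```

Wire it into a pre-commit hook and fail the commit on any non-empty result.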

4. Organization with sameAs social profiles. Your Organization schema has sameAs:[LinkedIn company page, Twitter/X, GitHub, Crunchbase, Wikidata entry if present]. The sameAs array is what resolves your brand to the Wikidata and entity graphs AI models are trained on. This is the single biggest "anonymous brand to known entity" conversion.
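A sketch of the pattern. All URLs are placeholders; in particular, only include a Wikidata entry in sameAs if one actually exists for your brand:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://your-site.com/#org",
  "name": "Your Brand",
  "url": "https://your-site.com/",
  "logo": "https://your-site.com/logo.png",
  "sameAs": [
    "https://www.linkedin.com/company/your-brand",
    "https://x.com/yourbrand",
    "https://github.com/your-brand"
  ]
}
```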

5. Accurate dateModified. Update it when the content changes. Not when you fix a typo in an HTML comment. Reality check: the 9 top-ranking articles that declared it had a 164-day average staleness. Three were over six months old; the oldest hit 354 days. Accurate freshness is a differentiator, not a baseline.

Live Audit · 2026-04-19

Top 10 articles for "schema markup for AI search" — what they miss.

We ran Lumina's Schema Validator against 5 top-ranked English articles (thehoth.com, seoptimer.com, schemaapp.com, evertune.ai, cmimediagroup.com) and 5 top-ranked German articles (hubspot.de, tryrivo.ai, az-direct.ch, rato-digital.de, seoptimer.com/de). Search Engine Land and Third Wunder returned 403 on fetch — a data point in itself.

  • 10/10 miss FAQPage schema. Not a single top-ranking article ships FAQPage, the one schema AI engines most consistently quote. Every one writes about schema. None ships it. The lowest-effort first-mover win on the topic.
  • 164-day average dateModified age. Oldest: az-direct.ch at 354 days (April 2025). Three articles are over six months old. AI search tooling evolves monthly; six-month-old explainers are stale before they rank.
  • 1/5 EN articles use @id entity refs. Only thehoth.com links Article to Person via @id in the English top 5. DE is far ahead at 3/5 (hubspot, tryrivo, rato-digital). The one entity pattern AI summarizers reward most is the one US-English content skips.
  • 3/4 declared wordCount values are wrong. 6 articles skip wordCount entirely. Of the 4 that declare it, only tryrivo.ai is within ±10%. thehoth.com declares 1,612; the actual count is 2,700, off by 40%. A stale wordCount is an AI-citation red flag.
  • 5–14 schema types per article. cmimediagroup.com ships 8 types but zero Article or BlogPosting on an article page. schemaapp.com ships 13 types including DefinedTerm and PropertyValue, yet still no FAQPage. Breadth without targeting is the pattern.
  • 0/10 pair a visible FAQ with matching schema. No competitor has a visible FAQ section with matching FAQPage schema. Google penalizes FAQ drift with rich-result revocation — but you cannot drift from nothing. The easiest win on the page, and everyone leaves it unclaimed.

Run the same audit on any URL →

JSON-LD vs Microdata vs RDFa

Use JSON-LD. Google has officially preferred it since 2018. Every major AI retriever — ChatGPT, Claude, Perplexity, Gemini, Google AI Overviews — parses JSON-LD reliably. Microdata still works but adds HTML noise without benefits in 2026. RDFa is valid but rarely used.

The only reason to still ship Microdata: a CMS plugin generates it and you cannot turn it off. In that case, layer JSON-LD on top — the two coexist without conflict, and Google prefers the JSON-LD version when both are present.

One JSON-LD block, inserted at the bottom of <head>, with one @graph holding all the entities for the page. That is the canonical modern pattern.
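As a sketch, the @graph pattern looks like this, with each entity trimmed to its @id wiring (all URLs are placeholders):

```json
{
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Organization", "@id": "https://your-site.com/#org", "name": "Your Brand" },
    { "@type": "Person", "@id": "https://your-site.com/#founder", "name": "Jane Doe" },
    { "@type": "WebSite", "@id": "https://your-site.com/#website", "publisher": { "@id": "https://your-site.com/#org" } },
    {
      "@type": "BlogPosting",
      "@id": "https://your-site.com/blog/post/#article",
      "author": { "@id": "https://your-site.com/#founder" },
      "publisher": { "@id": "https://your-site.com/#org" }
    }
  ]
}
```

The whole object ships inside a single script tag with type="application/ld+json" at the bottom of the head.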

Common Schema Mistakes That Kill AI Citations

Six patterns we see repeatedly in client audits and in our own competitor analysis:

  • FAQ drift. Schema has five questions; HTML has seven. Or the schema text is paraphrased rather than exact-match. Google revokes FAQ rich results on this. AI summarizers do not revoke, but trust drops silently.
  • Orphan Person schema. You ship a Person block but never link it via @id from Article.author. AI cannot connect the byline to the brand, and the author signal does not land.
  • Stale dateModified. Content changes, dateModified does not. Or — the reverse, equally bad — dateModified bumps on a CSS-only change and Google learns to discount your freshness signal entirely.
  • wordCount that lies. Schema says 1,612. Page has 2,700. AI sees the mismatch and down-weights the trust score. Omit wordCount if you cannot keep it in sync.
  • Inline entity bloat. Every article declares Organization + Person + WebSite inline. AI counts each as a distinct entity and the brand signal fragments. Use one canonical declaration on the homepage plus @id references everywhere else.
  • Invented schema fields. applicationArea does not exist on schema.org. Invented fields do not throw errors but they silently invalidate the whole block in strict validators. Use only Schema.org-documented fields.
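Of these mistakes, wordCount drift is the easiest to catch mechanically. A minimal Python sketch, reusing the ±10% tolerance from the audit above (that threshold is our convention, not a schema.org rule):

```python
# Flag a declared wordCount that drifts from the actual visible word
# count by more than a tolerance. The 10% default mirrors the audit
# threshold used in this article; it is a convention, not a spec rule.

def wordcount_in_sync(declared: int, actual: int, tolerance: float = 0.10) -> bool:
    """True if the declared wordCount is within +/- tolerance of the actual count."""
    if declared <= 0 or actual <= 0:
        return False
    return abs(declared - actual) / actual <= tolerance
```

If it returns False, either update the field or drop it entirely; an absent wordCount beats a wrong one.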

How to Validate Schema for AI Search

Two tools, two distinct jobs:

  • Google Rich Results Test — the Google-specific source of truth. Catches gaps that would disqualify you from Google rich-result eligibility. Use for every schema commit that touches a page with rich-result potential.
  • Lumina's Schema Validator — the AI-retrieval source of truth. Validates entity linking across pages (@id resolution), FAQPage strict-sync against visible HTML, deprecated types, wordCount freshness. Dogfooded against 72 of Lumina's own pages before every release.

For AI search specifically, run both. Google tells you about rich results. Lumina tells you about citation signals. They are complementary, not redundant.

FAQ

Does schema markup help AI search engines?
Yes. Schema.org JSON-LD tells AI retrievers exactly what entities live on your page — author, organization, publication date, topic. AI models weight pages with clean structured data higher than pages where they have to guess from HTML. No schema is not a penalty. Clean layered schema is a favorable signal in the 5-source shortlist that becomes the final AI answer.
What is the difference between SEO schema and AI search schema?
SEO schema optimizes for Google rich results — stars, FAQ accordions, recipe cards in the SERP. AI search schema optimizes for entity clarity so LLMs can trace your content back to a named author and organization. About 70% of the types overlap (Article, FAQPage, Organization). The 30% that diverge: AI search cares more about @id entity linking and accurate dateModified.
Which schema types should I add first for AI search?
Four, in this order. Article or BlogPosting for editorial content with headline, datePublished, and author. Person for the author with jobTitle and sameAs LinkedIn. Organization for the publisher with logo and sameAs social profiles. FAQPage for any Q&A content. Link them via @id references so AI can trace the article back to a specific human and brand.
JSON-LD, Microdata, or RDFa for AI search?
JSON-LD. Google has preferred it since 2018. Every major AI retriever — ChatGPT, Claude, Perplexity, Gemini, Google AI Overviews — parses JSON-LD reliably. Microdata still works but adds HTML noise. RDFa is fine but rare. One block at the bottom of head, one graph, multiple @type entries. Done.
How do I know if my schema is helping AI citations?
Indirectly. There is no GSC for ChatGPT in 2026. What you can do: validate with Lumina's Schema Validator (pass means the infrastructure is sound), track GA4 referral traffic from chatgpt.com, perplexity.ai, claude.ai, and gemini.google.com, and manually check each major AI platform for your target queries and record when your brand appears. The trend line is the signal.
Does llms.txt replace schema markup?
No. llms.txt is a discovery and permissions file — here are my canonical pages, please use them. It has no documented ranking or citability effect from any major AI engine as of 2026. Schema.org JSON-LD is the semantic layer. The two are complementary: schema says what the content is, llms.txt says which pages matter. Ship schema first. llms.txt is an optional hint.

Where to Start

If you want AI-search-ready schema this week, do these five things in order:

1. Validate your current schema

Run Lumina's Schema Validator on your top 5 pages. Most sites find 2-3 gaps per page: missing @id refs, stale dateModified, FAQPage drift. Free, no signup.

Schema Validator →
2. Declare entities once, reference everywhere

Person and Organization live on your homepage or /about/. Every article's author and publisher are {"@id":"..."} refs. One source of truth, zero entity splitting.

See Lumina's pattern →
3. Add FAQPage to every FAQ section

Any <h2>FAQ</h2> block deserves FAQPage schema. Strict-match the text. Run a verify-sync check before every deploy — drift revokes rich results.

Schema Validator →
4. Audit the competitors ranking for your keyword

Run the same audit we ran here. You will find schemas they skip — those are first-mover wins. FAQPage is almost always one of them in 2026.

GEO Readiness Check →
5. Track AI referral traffic

Set up GA4 source tracking for chatgpt.com, perplexity.ai, claude.ai, gemini.google.com. Volumes are small today but the trend line in six months is what matters.

GA4 Dashboard →

Validate your schema against AI-citation signals

Lumina's free Schema Validator catches the exact gaps this audit found: missing FAQPage strict-sync, orphan Person blocks, stale dateModified, wordCount drift, broken @id refs. One paste or URL, no signup.

Run the Schema Validator →