Schema markup is the single biggest AI-search lever most sites never pull. When ChatGPT decides which five sources to synthesize into an answer, structured data is how it resolves ambiguity: which Lumina is this, who wrote it, when was it last updated, does this count as an FAQ. We audited the 10 top-ranking articles for "schema markup for AI search" across the US and DE markets. Zero of them ship FAQPage schema, on articles about schema. The playbook below is built from that gap.
This is the fourth piece in our GEO content cluster. The GEO vs SEO Pillar covers the big-picture comparison. SEO vs GEO vs AEO handles the three-way framework. SEO for AI Search walks through the six tactics. This one is the technical schema deep-dive.
Does Schema Markup Help AI Search?
Yes — but not the way most tools measure it. Rich-result eligibility (stars, FAQ accordions, recipe cards in the SERP) is the old payoff. The new payoff is entity clarity for the LLM.
When GPTBot or ClaudeBot fetches your HTML, the JSON-LD block tells them exactly what entities live on the page: the author, the organization, the publication date, the topic. AI models weight pages with explicit entity declarations higher than pages where they have to guess from the body text.
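To make the entity-declaration idea concrete, here is a minimal sketch of how any HTML consumer could pull the JSON-LD out of a page and list the entities it declares. It uses only the Python standard library; the sample page and the example.com URLs are illustrative assumptions, and this is not a description of how GPTBot or ClaudeBot are actually implemented.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


def list_entities(html):
    """Return (@type, @id) pairs for every entity declared in the page's JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    entities = []
    for block in parser.blocks:
        # A block is a single entity, a list of entities, or a wrapper with @graph.
        nodes = block.get("@graph", [block]) if isinstance(block, dict) else block
        for node in nodes:
            entities.append((node.get("@type"), node.get("@id")))
    return entities


# Hypothetical page used only for illustration.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org",
 "@graph": [
   {"@type": "BlogPosting", "@id": "https://example.com/blog/post/#article"},
   {"@type": "Person", "@id": "https://example.com/#founder"}
 ]}
</script>
</head><body><p>Body text.</p></body></html>
"""

if __name__ == "__main__":
    for entity_type, entity_id in list_entities(SAMPLE_HTML):
        print(entity_type, entity_id)
```

A page with no JSON-LD returns an empty list here, which is the "has to guess from the body text" case.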
Google's public documentation confirms AI Overviews use structured data to surface rich results and identify authoritative sources. OpenAI's ChatGPT Search relies on Bing, which indexes and parses JSON-LD. Perplexity operates its own crawler (PerplexityBot) that ingests full HTML including embedded JSON-LD. Gemini reads from Google's index and inherits those structured-data signals directly.
The honest version: a page without schema is not penalized. A page with clean, consistent, layered schema is favored in the 5-source shortlist that becomes the final AI answer. Schema is not a ranking factor — it is a trust and disambiguation signal. Sometimes that is worth more.
SEO Schema vs GEO/AEO Schema
The two stacks overlap on most of the surface. The part that diverges is where every schema guide goes wrong.
| Dimension | Classic SEO Schema | AI Search Schema (GEO/AEO) |
|---|---|---|
| Primary goal | Rich-result eligibility in Google SERP | Entity clarity for LLM retrievers |
| Critical types | Product, Review, Recipe, HowTo, FAQPage | Article/BlogPosting, Person, Organization, FAQPage — all linked via @id |
| Success metric | Rich snippet appears | Brand named in AI answer |
| Depth vs breadth | Narrow: 2-3 rich-result types per page | Layered: 4-6 connected entities per page |
| Drift tolerance | Strict FAQ text match (Google revokes rich result) | Strict FAQ + strict entity linking across pages |
| Freshness signal | dateModified for Google freshness crawls | dateModified and wordCount consistent with visible body |
The overlap is real. An Article schema with author, publisher, and datePublished, paired with a FAQPage block, covers both stacks. The 30% that diverges: AI search cares more about @id entity linking across the site, and it is less forgiving of a stale or inaccurate wordCount.
The Schema Types AI Engines Actually Use
Based on public documentation from Google's structured-data team and reverse-engineering of which schemas get cited in AI Overviews, these are the types that move the needle:
- Article / BlogPosting: tells LLMs this is editorial content and the author byline is authoritative. Required fields: headline, datePublished, author, publisher.
- Person: the byline owner. Needs jobTitle, knowsAbout, and sameAs pointing to LinkedIn or a Wikidata profile if one exists. This is how AI resolves "is this author actually an expert on this topic."
- Organization: the publisher. Logo, url, sameAs to social profiles. The sameAs array is what connects your brand to the broader entity graph.
- FAQPage: question-answer pairs in a format AI retrievers can quote directly. The most underused schema on the web, based on our audit of the top 10 articles above.
- WebSite: site-level identity with a potentialAction for internal search. Still parsed by AI even when no SERP search box renders.
- BreadcrumbList: hierarchical context. Helps AI understand topic scope (this page is under /blog/geo/, not /tools/).
- HowTo: sunset as a Google rich result in September 2023 but still read by Perplexity, Gemini, and ChatGPT for step extraction. Keep it on tutorial content.
- ImageObject with contextual metadata: Google Lens, Perplexity Vision, and multimodal ChatGPT use this to resolve what an image depicts. Especially valuable for product and diagram content.
What is not a big deal for most pages: SoftwareApplication (unless you ship an app), Event (unless the page is an event listing), Recipe (unless it is a recipe), Product (only on commercial pages). Adding irrelevant schema types adds bytes without moving any signal.
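The field lists above translate directly into a small completeness check. The sketch below is an assumption-laden illustration: the REQUIRED_FIELDS table simply encodes the fields this article names for each type (it is not schema.org's or Google's official requirement list), and the function name is hypothetical.

```python
# Fields this article recommends per type; not an official requirement list.
REQUIRED_FIELDS = {
    "Article": ["headline", "datePublished", "author", "publisher"],
    "BlogPosting": ["headline", "datePublished", "author", "publisher"],
    "Person": ["jobTitle", "knowsAbout", "sameAs"],
    "Organization": ["logo", "url", "sameAs"],
}


def missing_fields(node):
    """Return the recommended fields that are absent or empty on a JSON-LD node."""
    required = REQUIRED_FIELDS.get(node.get("@type"), [])
    return [field for field in required if not node.get(field)]


# Hypothetical node: a BlogPosting that forgot its publisher.
SAMPLE_POST = {
    "@type": "BlogPosting",
    "headline": "Schema Markup for AI Search",
    "datePublished": "2026-01-15",
    "author": {"@id": "https://your-site.com/#founder"},
}

if __name__ == "__main__":
    print(missing_fields(SAMPLE_POST))
```

Types that are not in the table return an empty list, so irrelevant schema is neither flagged nor encouraged by this check.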
The 5 Schemas That Move the Needle
Ordered by impact, with the specific pattern that works:
1. BlogPosting with @id self-ref and entity refs. Every article should declare "@id":"https://your-site.com/blog/post/#article" and reference author and publisher via {"@id":"..."}. Why: AI retrievers treat @id as "this is the article entity on this URL" — not one of several candidates. They trace the graph to resolve author and publisher without ambiguity.
2. Person schema declared once, referenced everywhere. One canonical Person block at /about/#founder or homepage #founder. Every article's author field is {"@id":"https://your-site.com/#founder"}. This prevents entity-splitting — if you declare Person inline on every article, AI may count them as different people with slightly different metadata.
3. FAQPage strict-synced to visible FAQ. Every question in the schema exactly matches a visible .faq-item. Same order. Same wording. Same HTML entities. Google revokes FAQ rich results on drift. AI summarizers do not revoke but they score drift lower on trust. Lumina uses a 9-line Perl script to validate this automatically before every commit.
4. Organization with sameAs social profiles. Your Organization schema has sameAs:[LinkedIn company page, Twitter/X, GitHub, Crunchbase, Wikidata entry if present]. The sameAs array is what resolves your brand to the Wikidata and entity graphs AI models are trained on. This is the single biggest "anonymous brand to known entity" conversion.
5. Accurate dateModified. Update it when the content changes. Not when you fix a typo in an HTML comment. Reality check: the 9 top-ranking articles that declared it had a 164-day average staleness. Three were over six months old; the oldest hit 354 days. Accurate freshness is a differentiator, not a baseline.
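The strict-sync rule in item 3 is easy to automate. The following is a rough Python sketch of that kind of check, not Lumina's Perl script. It assumes each visible question sits in an h3 element with a hypothetical faq-question class inside the .faq-item, compares question text only (not answers), and compares decoded text, so it will not notice a difference that exists purely in how HTML entities are written.

```python
from html.parser import HTMLParser


class FaqQuestionParser(HTMLParser):
    """Collects the text of every <h3 class="faq-question"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._buffer = []
        self.questions = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "faq-question" in classes:
            self._capturing = True
            self._buffer = []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._capturing:
            self._capturing = False
            self.questions.append("".join(self._buffer).strip())


def faq_drift(html, faq_page):
    """Return a list of mismatches between visible questions and FAQPage schema."""
    parser = FaqQuestionParser()
    parser.feed(html)
    parser.close()
    declared = [q["name"] for q in faq_page.get("mainEntity", [])]
    problems = []
    if len(parser.questions) != len(declared):
        problems.append(
            f"count mismatch: {len(parser.questions)} visible vs {len(declared)} in schema"
        )
    # Same order, same wording: compare position by position.
    for index, (visible, schema) in enumerate(zip(parser.questions, declared), 1):
        if visible != schema:
            problems.append(f"question {index} differs: {visible!r} vs {schema!r}")
    return problems


# Hypothetical page fragment and schema nodes for illustration.
SAMPLE_FAQ_HTML = """
<div class="faq-item"><h3 class="faq-question">Does schema help AI search?</h3><p>Yes.</p></div>
<div class="faq-item"><h3 class="faq-question">Is JSON-LD required?</h3><p>No.</p></div>
"""
IN_SYNC = {"@type": "FAQPage", "mainEntity": [
    {"@type": "Question", "name": "Does schema help AI search?"},
    {"@type": "Question", "name": "Is JSON-LD required?"},
]}
DRIFTED = {"@type": "FAQPage", "mainEntity": [
    {"@type": "Question", "name": "Does schema help AI search?"},
    {"@type": "Question", "name": "Do I need JSON-LD?"},
]}

if __name__ == "__main__":
    print(faq_drift(SAMPLE_FAQ_HTML, IN_SYNC))
    print(faq_drift(SAMPLE_FAQ_HTML, DRIFTED))
```

Wired into a pre-commit hook or deploy step, a non-empty result would be the signal to block the change.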
Top 10 Articles for "Schema Markup for AI Search": What They Miss
We ran Lumina's Schema Validator against 5 top-ranked English articles (thehoth.com, seoptimer.com, schemaapp.com, evertune.ai, cmimediagroup.com) and 5 top-ranked German articles (hubspot.de, tryrivo.ai, az-direct.ch, rato-digital.de, seoptimer.com/de). Search Engine Land and Third Wunder returned 403 on fetch, which is a data point in itself.
@id in the English top 5. DE is far ahead at 3/5 (hubspot, tryrivo, rato-digital). The one entity pattern AI summarizers reward most is the one US-English content skips.
JSON-LD vs Microdata vs RDFa
Use JSON-LD. Google has officially preferred it since 2018. Every major AI retriever — ChatGPT, Claude, Perplexity, Gemini, Google AI Overviews — parses JSON-LD reliably. Microdata still works but adds HTML noise without benefits in 2026. RDFa is valid but rarely used.
The only reason to still ship Microdata: a CMS plugin generates it and you cannot turn it off. In that case, layer JSON-LD on top — the two coexist without conflict, and Google prefers the JSON-LD version when both are present.
One JSON-LD block, inserted at the bottom of <head>, with one @graph holding all the entities for the page. That is the canonical modern pattern.
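As a sketch of that pattern, the snippet below assembles one @graph for an article page and renders it as a single script tag. The helper names, the #organization and #faq fragment identifiers, and the sample values are assumptions for illustration; author and publisher are emitted as @id references rather than inline entities, following the "declared once, referenced everywhere" advice earlier in this article.

```python
import json

SITE = "https://your-site.com"  # placeholder domain


def build_graph(post_url, headline, published, modified, faq):
    """Build one @graph for an article page; faq is a list of (question, answer)."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "BlogPosting",
                "@id": f"{post_url}#article",
                "headline": headline,
                "datePublished": published,
                "dateModified": modified,
                # References to entities declared once elsewhere on the site.
                "author": {"@id": f"{SITE}/#founder"},
                "publisher": {"@id": f"{SITE}/#organization"},  # hypothetical id
            },
            {
                "@type": "FAQPage",
                "@id": f"{post_url}#faq",  # hypothetical id
                "mainEntity": [
                    {
                        "@type": "Question",
                        "name": question,
                        "acceptedAnswer": {"@type": "Answer", "text": answer},
                    }
                    for question, answer in faq
                ],
            },
        ],
    }


def render_script_tag(graph):
    """Serialize the graph into a single JSON-LD script tag for <head>."""
    # Escape "</" so text inside the JSON cannot close the script element early.
    payload = json.dumps(graph, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    graph = build_graph(
        f"{SITE}/blog/post/",
        "Schema Markup for AI Search",
        "2026-01-15",
        "2026-02-01",
        [("Does schema markup help AI search?", "Yes, as a disambiguation signal.")],
    )
    print(render_script_tag(graph))
```

Generating the block from one function per page template keeps the @id values consistent, which is harder to guarantee when JSON-LD is hand-edited per article.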
Common Schema Mistakes That Kill AI Citations
Six patterns we see repeatedly in client audits and in our own competitor analysis:
- FAQ drift. Schema has five questions; HTML has seven. Or the schema text is paraphrased rather than exact-match. Google revokes FAQ rich results on this. AI summarizers do not revoke, but trust drops silently.
- Orphan Person schema. You ship a Person block but never link it via @id from Article.author. AI cannot connect the byline to the brand, and the author signal does not land.
- Stale dateModified. Content changes, dateModified does not. Or the reverse, equally bad: dateModified bumps on a CSS-only change and Google learns to discount your freshness signal entirely.
- wordCount that lies. Schema says 1,612. Page has 2,700. AI sees the mismatch and down-weights the trust score. Omit wordCount if you cannot keep it in sync.
- Inline entity bloat. Every article declares Organization + Person + WebSite inline. AI counts each as a distinct entity and the brand signal fragments. Use one canonical declaration on the homepage plus @id references everywhere else.
- Invented schema fields. applicationArea does not exist on schema.org. Invented fields do not throw errors but they silently invalidate the whole block in strict validators. Use only Schema.org-documented fields.
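Two of the mistakes above, orphan Person blocks and broken @id links, can be caught mechanically once all JSON-LD nodes from a site are collected into one list. The sketch below is one possible way to do that; the function names and sample nodes are hypothetical, and it treats any object whose only key is @id as a reference.

```python
def collect_refs(value, refs):
    """Recursively gather every {"@id": "..."} reference inside a node."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            refs.add(value["@id"])
        else:
            for key, child in value.items():
                if key != "@id":
                    collect_refs(child, refs)
    elif isinstance(value, list):
        for item in value:
            collect_refs(item, refs)


def audit_ids(nodes):
    """Report references that resolve to nothing and entities nobody references."""
    types = {node["@id"]: node.get("@type") for node in nodes if "@id" in node}
    refs = set()
    for node in nodes:
        collect_refs(node, refs)
    dangling = sorted(refs - set(types))
    # Articles are entry points, so only unreferenced Person/Organization count.
    orphans = sorted(
        entity_id
        for entity_id, entity_type in types.items()
        if entity_type in ("Person", "Organization") and entity_id not in refs
    )
    return {"dangling": dangling, "orphans": orphans}


# Hypothetical site-wide node list for illustration.
NODES = [
    {
        "@type": "BlogPosting",
        "@id": "https://your-site.com/blog/post/#article",
        "author": {"@id": "https://your-site.com/#founder"},
        "publisher": {"@id": "https://your-site.com/#organization"},
    },
    {"@type": "Person", "@id": "https://your-site.com/#founder", "name": "Founder"},
    {"@type": "Person", "@id": "https://your-site.com/#editor", "name": "Editor"},
]

if __name__ == "__main__":
    print(audit_ids(NODES))
```

In this sample the publisher reference points at an Organization that was never declared, and the editor Person is declared but never used as an author.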
How to Validate Schema for AI Search
Two tools, two distinct jobs:
- Google Rich Results Test: the Google-specific source of truth. Catches gaps that would disqualify you from Google rich-result eligibility. Use for every schema commit that touches a page with rich-result potential.
- Lumina's Schema Validator: the AI-retrieval source of truth. Validates entity linking across pages (@id resolution), FAQPage strict-sync against visible HTML, deprecated types, wordCount freshness. Dogfooded against 72 of Lumina's own pages before every release.
For AI search specifically, run both. Google tells you about rich results. Lumina tells you about citation signals. They are complementary, not redundant.
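For a sense of what a wordCount check can look like when done by hand, here is a small standalone sketch. It is not how either tool above works internally; the 5% tolerance, the whitespace-based word counting, and the function names are all assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class VisibleWordCounter(HTMLParser):
    """Counts whitespace-separated words outside <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def word_count_drift(html, declared, tolerance=0.05):
    """Return (actual_words, drifted) for a declared schema wordCount."""
    counter = VisibleWordCounter()
    counter.feed(html)
    counter.close()
    actual = counter.words
    drift = abs(actual - declared) / max(actual, 1)
    return actual, drift > tolerance


# Hypothetical page used only for illustration.
SAMPLE_PAGE = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>one two three four</p><script>var x = 1;</script></body></html>"
)

if __name__ == "__main__":
    print(word_count_drift(SAMPLE_PAGE, declared=4))
    print(word_count_drift(SAMPLE_PAGE, declared=10))
```

A real page would also need to exclude navigation, footers, and other chrome before counting, which this sketch does not attempt.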
FAQ
Where to Start
If you want AI-search-ready schema this week, do these five things in order:
1. Run Lumina's Schema Validator on your top 5 pages. Most sites find 2-3 gaps per page: missing @id refs, stale dateModified, FAQPage drift. Free, no signup.
2. Declare Person and Organization once, on your homepage or /about/. Every article's author and publisher are {"@id":"..."} refs. One source of truth, zero entity splitting.
3. Add FAQPage schema to any <h2>FAQ</h2> block. Strict-match the text. Run a verify-sync check before every deploy, because drift revokes rich results.
4. Run the same audit we ran here on your competitors' top-ranking pages. You will find schemas they skip, and those are first-mover wins. FAQPage is almost always one of them in 2026.
5. Set up GA4 source tracking for chatgpt.com, perplexity.ai, claude.ai, gemini.google.com. Volumes are small today but the trend line in six months is what matters.
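The source-tracking step comes down to matching referrer hostnames against a short list. The sketch below shows that matching logic in Python for use on exported referrer data; it is not GA4 configuration, and the labels and function name are assumptions. Only the four domains named in this article are included.

```python
from urllib.parse import urlparse

# Referrer domains named in this article, mapped to illustrative labels.
AI_SOURCES = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
}


def classify_referrer(referrer_url):
    """Return the AI source label for a referrer URL, or None if it is not one."""
    host = (urlparse(referrer_url).hostname or "").lower()
    for domain, label in AI_SOURCES.items():
        # Match the domain itself or any subdomain such as www.perplexity.ai.
        if host == domain or host.endswith("." + domain):
            return label
    return None


if __name__ == "__main__":
    for url in (
        "https://chatgpt.com/",
        "https://www.perplexity.ai/search?q=schema",
        "https://www.google.com/",
    ):
        print(url, classify_referrer(url))
```

Matching on the full hostname rather than a substring avoids counting look-alike domains as AI traffic.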
Validate your schema against AI-citation signals
Lumina's free Schema Validator catches the exact gaps this audit found: missing FAQPage strict-sync, orphan Person blocks, stale dateModified, wordCount drift, broken @id refs. One paste or URL, no signup.