Schema Markup: The Complete Guide (2026)

Schema markup is the cheapest SEO win most sites never ship. You add a JSON block to the head of a page, you get richer Google results, clearer AI citations, and an entity graph that tells search engines exactly who wrote what and when. It is a solved problem with a free open vocabulary at schema.org. And yet, when I ran Lumina's Schema Validator against the 5 top-ranking guides on Google for schema markup, the pattern was bleak: zero of them ship FAQPage schema on a page about schema, the average dateModified staleness across all 5 is 380 days (Ahrefs leads at 963), and Umbraco's 9,563-word guide ships only a bare WebPage block — no Article, no BlogPosting, no Organization.

This guide is the complete evergreen reference for schema markup in 2026. What the vocabulary is, which formats Google actually wants, the eight types that cover 95% of sites, how to add schema by hand and through WordPress, how to validate it, which types Google quietly sunset, and the six mistakes I see on client audits every single week. Code examples throughout. No fluff. If you want the narrower AI-search-specific angle instead, read Schema Markup for AI Search; this one covers everything.

What Schema Markup Actually Is

Schema markup is a shared vocabulary for describing what a page is about. It was launched in 2011 and is governed today by schema.org, a project founded by Google, Microsoft, Yahoo, and Yandex and coordinated via the W3C Schema.org Community Group since 2015. The vocabulary now defines more than 800 types and 1,500 properties covering articles, products, events, people, organizations, recipes, reviews, software, courses, medical conditions, and more.

The vocabulary itself is just a taxonomy. The markup part is how you wire it into an HTML page so crawlers can read it. In 2026 that means JSON-LD: a small script block placed in the head of the page that declares entities and their relationships.

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Schema Markup: The Complete Guide",
  "author": { "@type": "Person", "name": "Julien El-Bahy" },
  "datePublished": "2026-04-21"
}
</script>

That nine-line block tells every search engine and AI retriever on the web: this page is an Article, here is the headline, here is the author, here is when it was published. Without that block, Google reads your page as an HTML blob and has to guess at those answers from the body text. With the block, the answers are explicit. The ambiguity is gone.

Schema markup is not a ranking factor in the traditional "add it and move up 3 positions" sense. Google's Search Central team has said this on record repeatedly, most persistently via John Mueller across Search Off the Record episodes and SEO office-hours. What schema does is qualify you for rich results (the enhanced SERP blocks with stars, FAQs, prices, breadcrumbs, video thumbnails) and disambiguate your content for AI retrievers like ChatGPT Search, Perplexity, Claude, Gemini, and Google AI Overviews. Higher click-through rate on the same SERP position is not a ranking bump on paper, but the traffic looks identical.

JSON-LD vs Microdata vs RDFa

Three formats exist. One of them is the answer in 2026.

JSON-LD (JavaScript Object Notation for Linked Data) lives in a <script type="application/ld+json"> block in the head. It is separate from the visible HTML, which means you can add, edit, and validate it without touching the rest of the page. Google's Search Central docs explicitly recommend JSON-LD as the preferred format. Every major AI retriever parses it reliably.

Microdata sprinkles attributes directly into the visible HTML: itemscope, itemtype, itemprop. It works but it clutters your markup and makes diffs harder to read. Old CMS plugins still generate it. Leave them alone if they work and layer JSON-LD on top; the two coexist.

RDFa is the W3C-native format. Valid, rare, mostly used in academic publishing and government data portals. If your stack already emits it, fine. If you are starting fresh in 2026, you will not pick it.

Pick JSON-LD. Stop thinking about the format.

One script block in the head, one @graph wrapping all your entities, done. Every example in this guide uses JSON-LD. Every rich result Google supports accepts JSON-LD. Every AI retriever reads it. There is no scenario in 2026 where Microdata or RDFa is the better choice for a site you are actively building.

The Schema Types That Matter Most

Schema.org defines 823 types. The top eight cover 95% of real websites. Rank them by what your page actually is, not by what looks impressive in a validator.

1. Article / BlogPosting

Editorial content. Blog posts, news, guides. Tells crawlers this is not a product, not a landing page: it's a piece of writing with a byline and a date. Required fields per Google: none. Recommended: headline, author, datePublished, dateModified, image, publisher. The most-used schema on the web after Organization.

{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Schema Markup: The Complete Guide",
  "datePublished": "2026-04-21",
  "dateModified": "2026-04-21",
  "author": { "@id": "https://your-site.com/#founder" },
  "publisher": { "@id": "https://your-site.com/#organization" }
}

2. Product

Anything for sale. Google rewards this with price, availability, and star-rating display in the SERP. Required: name plus one of review, aggregateRating, or offers. Skip it at your own risk on a product page. The CTR lift from a star snippet is real and measurable.
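
A minimal sketch of a Product block that satisfies the "name plus one of review, aggregateRating, or offers" rule. Every value here (product name, SKU, price, rating numbers, URLs) is a placeholder I made up for illustration; swap in your real data and never ship rating numbers that are not backed by visible reviews on the page.

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Trail Running Shoe",
  "image": "https://your-site.com/images/trail-shoe.jpg",
  "description": "Placeholder product description.",
  "sku": "EXAMPLE-SKU-001",
  "brand": { "@type": "Brand", "name": "Example Brand" },
  "offers": {
    "@type": "Offer",
    "url": "https://your-site.com/products/trail-shoe",
    "price": "89.00",
    "priceCurrency": "EUR",
    "availability": "https://schema.org/InStock"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "128"
  }
}
```

The offers block is what feeds price and availability; aggregateRating is what feeds the stars. Either one alone is enough to meet the requirement alongside name.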

3. LocalBusiness

Any business with a physical address. Powers the Google Knowledge Panel, the Maps pack, opening hours display. Required: name and address. Recommended: telephone, openingHoursSpecification, geo, image, priceRange. The single highest-impact schema for any business with a storefront.
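
Here is roughly what that looks like in practice. Treat it as a sketch under stated assumptions: the business name, address, phone number, coordinates, and hours are all invented placeholders, and the @id fragment is just one naming choice.

```json
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "@id": "https://your-site.com/#localbusiness",
  "name": "Example Bakery",
  "image": "https://your-site.com/images/storefront.jpg",
  "telephone": "+1-555-010-0000",
  "priceRange": "$$",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Example Street",
    "addressLocality": "Springfield",
    "addressRegion": "IL",
    "postalCode": "62701",
    "addressCountry": "US"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 39.7817,
    "longitude": -89.6501
  },
  "openingHoursSpecification": [
    {
      "@type": "OpeningHoursSpecification",
      "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      "opens": "08:00",
      "closes": "18:00"
    }
  ]
}
```

If a more specific subtype fits (Bakery, Dentist, Restaurant), it is usually a better @type than the generic LocalBusiness.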

4. FAQPage

Question-answer blocks. Eligibility tightened in 2023. Google now only shows the FAQ rich result for authoritative government and health sites in most markets. But AI retrievers still quote FAQPage text directly when answering user queries. The strict rule: the schema question and answer text must match the visible HTML word-for-word. Drift revokes the rich result.
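
A minimal FAQPage sketch with one question. The question and answer strings are placeholders; on a real page, name must be the visible question and acceptedAnswer.text must be the full visible answer, copied exactly.

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Placeholder question, exactly as it appears on the page?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Placeholder answer, copied word for word from the visible HTML."
      }
    }
  ]
}
```

Add one Question object to the mainEntity array per visible question, in the same order as the page.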

5. BreadcrumbList

Hierarchical navigation. Shows breadcrumbs in the SERP instead of the raw URL. Tiny visual win, non-negotiable on sites with a deep structure (e-commerce, large content hubs). Required: itemListElement array with position, name, and item per entry.
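
A three-level example, assuming a hypothetical Home > Blog > article hierarchy on your-site.com. The URLs and names are placeholders.

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://your-site.com/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Blog",
      "item": "https://your-site.com/blog/"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "Schema Markup: The Complete Guide",
      "item": "https://your-site.com/blog/schema-markup/"
    }
  ]
}
```

Positions start at 1 and run in order from the site root down to the current page.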

6. Organization

Your brand identity. Logo, URL, social profiles via sameAs, legal name, founding date. Declared once on your homepage at /#organization, referenced everywhere else via @id. The sameAs array is how AI models connect your brand to its Wikidata entry and broader entity graph.
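
A sketch of the homepage declaration with the fields listed above. The legal name, founding date, and every sameAs URL are placeholders, including the Wikidata ID, which is deliberately fake; only list profiles that actually exist for your brand.

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://your-site.com/#organization",
  "name": "Your Brand",
  "legalName": "Your Brand Ltd",
  "url": "https://your-site.com",
  "logo": "https://your-site.com/logo.png",
  "foundingDate": "2020-01-15",
  "sameAs": [
    "https://www.linkedin.com/company/your-brand",
    "https://github.com/your-brand",
    "https://www.wikidata.org/wiki/Q00000000"
  ]
}
```

Every other page can then point at this entity with { "@id": "https://your-site.com/#organization" } instead of repeating the fields.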

7. Person

The author byline. Declared once at /about/#founder or similar, referenced on every article. Recommended: jobTitle, knowsAbout, sameAs pointing to LinkedIn, GitHub, Twitter. This is how AI resolves whether the byline is an actual expert on the topic.
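
A possible shape for that canonical author declaration, assuming it lives on an about page. The name, job title, topics, and profile URLs are placeholders.

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://your-site.com/about/#founder",
  "name": "Author Name",
  "jobTitle": "Founder",
  "url": "https://your-site.com/about/",
  "knowsAbout": ["Structured data", "Technical SEO"],
  "sameAs": [
    "https://www.linkedin.com/in/author-name",
    "https://github.com/author-name"
  ]
}
```

Whatever @id you pick, use the identical string in every article's author reference; a different fragment or path creates a second, unconnected entity.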

8. Event

Anything with a start date and a location: webinar, concert, conference, sale event. Required: name, startDate, location. Recommended: endDate, eventStatus, eventAttendanceMode (online / offline / mixed), offers for paid events.
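
A sketch for a free online webinar. The event name, dates, and URLs are invented for illustration; for an in-person event, location would be a Place with a PostalAddress and the attendance mode would be OfflineEventAttendanceMode.

```json
{
  "@context": "https://schema.org",
  "@type": "Event",
  "name": "Example Structured Data Webinar",
  "startDate": "2026-09-15T17:00:00+02:00",
  "endDate": "2026-09-15T18:00:00+02:00",
  "eventStatus": "https://schema.org/EventScheduled",
  "eventAttendanceMode": "https://schema.org/OnlineEventAttendanceMode",
  "location": {
    "@type": "VirtualLocation",
    "url": "https://your-site.com/webinars/example"
  },
  "offers": {
    "@type": "Offer",
    "url": "https://your-site.com/webinars/example",
    "price": "0",
    "priceCurrency": "EUR",
    "availability": "https://schema.org/InStock"
  }
}
```

Include the timezone offset in startDate and endDate so the time is unambiguous.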

Niche-but-valuable types

Recipe. Cooking content. Strong rich-result support with preparation time, calories, rating stars.
HowTo. Step-by-step tutorials. Google sunset the rich result in September 2023 but Perplexity, Claude, ChatGPT still read it for step extraction. Keep it on tutorials.
VideoObject. Any page where a video is the main content. Powers the video thumbnail in Google Search.
SoftwareApplication / WebApplication. Apps and tools. Lumina's tool pages use WebApplication.
Review / AggregateRating. User reviews of any reviewable thing. Usually embedded inside Product or LocalBusiness rather than standalone.
Course. Google documented a "Course info" rich result in late 2023 and it remains supported today. Not deprecated, despite what some older guides still claim.
JobPosting. Listings. Powers Google for Jobs.

What not to ship: schema types that do not match what your page actually is. A Dataset block on a blog post, a Vehicle block on a homepage, a MedicalCondition block on a marketing page. Irrelevant types add bytes and invite Google to flag your markup as misleading.

How to Add Schema Markup to a Page

Three common paths. Pick the one that matches your stack.

Plain HTML: paste and ship

If you control the HTML directly (static site, hand-coded page, framework template), add the JSON-LD block at the bottom of the <head>. One main block per page with one @graph wrapping all entities, plus a separate block for FAQPage if you have one:

<head>
  <!-- ...other head tags... -->
  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@graph": [
      {
        "@type": "BlogPosting",
        "@id": "https://your-site.com/blog/post/#article",
        "headline": "...",
        "author": { "@id": "https://your-site.com/#founder" },
        "publisher": { "@id": "https://your-site.com/#organization" },
        "datePublished": "2026-04-21",
        "dateModified": "2026-04-21"
      },
      {
        "@type": "Person",
        "@id": "https://your-site.com/#founder",
        "name": "Author Name",
        "jobTitle": "Title",
        "sameAs": ["https://www.linkedin.com/in/..."]
      },
      {
        "@type": "Organization",
        "@id": "https://your-site.com/#organization",
        "name": "Your Brand",
        "url": "https://your-site.com",
        "logo": "https://your-site.com/logo.png"
      }
    ]
  }
  </script>
</head>

That one block handles an entire article page. Person and Organization are declared once, referenced via @id across every other article. No duplication.

WordPress: plugin or theme

If you run WordPress, schema is almost certainly already half-set-up. Yoast SEO, Rank Math, and The SEO Framework all emit baseline Article, Organization, WebSite, and BreadcrumbList schema out of the box. Rank Math and Yoast both ship dedicated FAQPage and HowTo blocks in the Gutenberg editor.

The gotcha: default plugin schema is generic. Organization.sameAs will be empty until you fill in social URLs in plugin settings. Author bios often ship as bare Person with only a name, no jobTitle, no sameAs. The fix is the Schema Settings page in whichever plugin you use, not custom code. Set it once per site, inherit everywhere.

For complex schema that plugins do not cover (LocalBusiness with multiple locations, Product variants with ProductGroup, Event series), use a Custom Fields plugin plus an add_action('wp_head', ...) hook in a child theme to inject hand-written JSON-LD. Do not fight the plugin; layer alongside it.

Next.js, React, Vue, Astro: framework patterns

Every modern framework has a head-injection primitive. Next.js: next/head in the Pages Router, or in the App Router a plain <script type="application/ld+json"> rendered from the page or layout component. React with React Helmet: <Helmet><script type="application/ld+json">{JSON.stringify(schema)}</script></Helmet>. Vue with Nuxt: the useHead composable with a script entry. Astro: a plain <script> tag in the component head, with Astro stripping framework overhead so the markup ships as static HTML.

Common mistake across all frameworks: dangerouslySetInnerHTML with a JSON string that has been double-stringified, producing escaped quotes inside the output and an invalid block. Always pass a plain object through JSON.stringify() once, never twice. If you see &quot; in the rendered HTML, you double-encoded somewhere.

Does Schema Markup Still Help SEO in 2026?

Direct answer: it does not push your blue-link position up the page. It does three other things that still matter.

Rich results. Qualifying schema types unlock enhanced SERP blocks: stars on products, prices on offers, FAQ accordions on qualifying government and health sites, recipe cards with prep time and calories, breadcrumb trails replacing the raw URL, video thumbnails for VideoObject. CTR on rich-result listings is measurably higher than plain blue links at the same position across every third-party study I have seen (Ahrefs, SISTRIX, Backlinko, Rockdove among them). The exact lift varies by rich-result type and industry, but the direction is consistent.

AI citation. ChatGPT, Claude, Perplexity, and Gemini all parse JSON-LD when ingesting pages. The clearer your entity graph, the more likely your brand is cited by name (rather than summarized anonymously) in AI answers. There is no public study quantifying this yet. AI referral traffic in GA4 is still a single-digit-percent slice for most sites, but the trend line is what matters, not the current month. In Lumina's own GEO Readiness Checker, schema signals account for ~35% of the 42-check audit score: JSON-LD presence (weight 10), Organization schema (weight 8), absolute @id (5), WebSite schema (3), Person schema (3), sameAs array (3), FAQPage (3).

Knowledge Graph. Organization and Person schema with a strong sameAs array (LinkedIn, Wikidata, GitHub, Crunchbase, Twitter, industry directories) is how Google connects your brand to its Knowledge Graph entity. Once you have an entity, you get the blue-check brand panel in the SERP, your logo shows in answer cards, your name resolves without disambiguation.

What schema does not do: it does not fix thin content, broken internal links, slow pages, or a site that is not crawlable. If the underlying page is weak, no JSON-LD block rescues it.

Live Audit · 2026-04-21

Top 5 ranking pages for “schema markup” on Google: what they miss.

I ran Lumina's Schema Validator (JS-rendered fetch) against the 5 organic article results in positions 3 through 7 on google.com for “schema markup”: Semrush, Ahrefs, Umbraco, Schema App, Moz. These are the guides new readers most often land on.

0/5 ship FAQPage schema. Not Semrush, not Ahrefs, not Moz, not Schema App, not Umbraco. Every one writes about schema types including FAQPage. Zero demonstrate it. The easiest first-mover win on the topic.

WebPage only: Umbraco's entire stack. Umbraco's 9,563-word guide at position 5 on Google ships just a single WebPage block. No Article, no BlogPosting, no Organization, no BreadcrumbList, no Person. Bare minimum schema on a page literally titled "Schema Markup."

380 days: average dateModified staleness. Ahrefs leads at 963 days stale (Sept 2023). Schema App 435. Moz 378. Semrush fresh at 78. Umbraco freshest at 48. Wide gap between the top and bottom. Consistency across an evergreen topic is rare.

31 vs 1: @id ref range. Schema App ships 31 @id refs. The gold standard. Umbraco 1, Semrush 1, Moz 2, Ahrefs 12. Only Schema App and Ahrefs treat entity linking as a design principle; everyone else touches it accidentally.

23% off: Ahrefs wordCount accuracy. Ahrefs is the only one of the 5 that declares wordCount in schema. It says 1,838. Actual article body: 2,377 words. A 23% understatement, stale since September 2023. A wordCount that lies is an AI-retriever trust mismatch.

0/5 have visible FAQ sections. No visible <h2>FAQ</h2> block on any of the 5. No .faq-item elements. The FAQ gap is not just missing schema; the competitors do not structure the content as FAQ at all. Open goal.

Deprecated & Sunset Schema Types

Google has quietly deprecated a handful of schema types. Guides published in 2022 still list them as current. Lumina's Schema Validator tracks seven in its DEPRECATED_TYPES list, plus ClaimReview as a special case. Four categories cover every state a schema type can be in:

Active and fully supported. The big eight above (Article, Product, LocalBusiness, FAQPage, BreadcrumbList, Organization, Person, Event) plus Recipe, VideoObject, Review, AggregateRating, Course, JobPosting, QAPage, DiscussionForumPosting, ProfilePage, MathSolver, ProductGroup. All remain supported by Google; FAQPage is the one whose rich-result eligibility is restricted, as covered next. Keep shipping.

Active but with eligibility restrictions. FAQPage rich result is now limited to authoritative government and health sites in most markets. The schema is still valid for every site; the SERP accordion just will not appear for most. HowTo rich result was sunset by Google on September 14, 2023; the schema remains valid and Perplexity, Claude, and ChatGPT still read it for step extraction. ClaimReview is being phased out of Google Search display but remains supported by Google's Fact Check Explorer Tool.

Sunset entirely from Google Search. The exact list in Lumina's Schema Validator as of 2026-04-21: HowTo, EducationalOccupationalCredential, Vehicle, PracticeProblems, SpecialAnnouncement, EstimatedSalary, LearningVideo. All seven are valid schema.org markup and some still help AI retrievers with entity understanding, but none trigger a dedicated Google rich result anymore. If your CMS still ships SpecialAnnouncement blocks from the COVID era, clean them out on the next touch.

Commonly miscategorized as deprecated. Two trip people up. Course is the big one. Google documented a "Course info" rich result in late 2023 and it remains supported today, and Coursera, edX, MIT OpenCourseWare currently rank for it. Some 2022 guides still list it as sunset; they are wrong. QAPage is the other: its rich result is smaller than it used to be but the type remains supported (I wrongly had it in my own Schema Validator's DEPRECATED_TYPES list until an external-URL audit caught it on 2026-04-12). If a validator flags either of these as deprecated in 2026, the validator is out of date.

Severity matters. Deprecated schema is not broken schema.

A sunset type still validates. It is still valid schema.org markup. AI retrievers still use it for entity understanding. The only downside is that you will not get a Google SERP rich result for it. If the type accurately describes your content, leaving it in place is fine. Actively stripping valid-but-unsupported schema from a page is rarely worth the effort.

The Most Common Schema Mistakes

I have audited more than a hundred client sites and dogfooded Lumina's Schema Validator against 72 internal pages and 21 external sites spanning recipes, news, shopping, tutorials, reviews, and reference content. Six patterns produce most of the production failures.

1. FAQ drift. Schema declares five questions; the visible HTML has six. Or the schema's acceptedAnswer.text is a summary of the visible answer rather than the exact text. Google revokes FAQ rich result eligibility on this. Its guidelines require the marked-up text to match the visible text word for word. A single word edit in the HTML without a parallel edit in the schema is enough to break it. The fix is process, not code: strict-sync the schema and the HTML in the same commit, run a verify script before every deploy.

2. Orphan Person or Organization. You declare a Person block with author metadata but never link it via @id from Article.author. Or the Article's author is {"@type":"Person","name":"Author"} inline, while a separate Person block with the real sameAs array sits unconnected on the same page. AI retrievers treat the inline byline and the orphan block as two different people. Entity signal fragments. The fix: one canonical Person declaration, referenced via @id from every article.

3. Stale dateModified. Two mirror-image bugs. Version A: content changes, dateModified does not. AI retrievers and Google both use dateModified as a freshness signal, and they compare it against your actual last-edit date in their crawl history. A schema that says "modified 2023" on a page that visibly got a 2026 rewrite is a mismatch. Version B, equally bad: dateModified bumps on every touch. CSS-only changes, typo fixes in HTML comments, nav-menu updates. Google learns your freshness signal is unreliable and discounts it entirely. Fix: bump dateModified only on meaningful content changes.

4. Invented fields. applicationArea does not exist on schema.org. Neither does targetAudience on Article. Neither does coverImage. Inventing fields does not throw an error in most validators (they silently ignore unknown properties), but Google's Rich Results Test flags them and strict AI-parser pipelines discount the whole block. Use only schema.org-documented fields. When in doubt, check the vocabulary page for the type you are using.

5. Schema breadth without schema relevance. You ship 14 schema types on a blog post. Article, BlogPosting, WebPage, WebSite, Organization, Person, ImageObject, BreadcrumbList, ListItem, PropertyValue, DefinedTerm, ReadAction, EntryPoint, SearchAction. Half of those do not describe what the page is. They are hedges. Adding a DefinedTerm block when the page is not defining a term signals to Google that your markup is padding rather than describing. Ship 3-5 well-linked entities, not 14 drive-by ones.

6. Raw HTML tags in FAQ Answer.text. Google whitelists 13 tags inside Answer.text: h1-h6, br, ol, ul, li, a, p, div, b, strong, i, em. Every other raw tag is silently stripped. Prose like “elements such as <nav>, <main>, <aside>” renders as “elements such as , , ” after parsing. I caught this on Lumina's own site on May 9, 2026: 14 of 656 Answer texts across 102 FAQPage files had the bug, including the Semantic Checker and the Image SEO Checker, both of which mention HTML tag names in their FAQ prose. The fix: escape every angle bracket in Answer.text as &lt; / &gt; regardless of whether the tag is whitelisted. The visible HTML on the page already does this; the JSON-LD just needs to match.
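
To make mistake 6 concrete, here is a sketch of a single Question entry whose answer mentions tag names, with the angle brackets written as HTML entities so the tag names survive parsing. The question and answer wording are hypothetical.

```json
{
  "@type": "Question",
  "name": "Which HTML elements mark up page regions?",
  "acceptedAnswer": {
    "@type": "Answer",
    "text": "Use semantic elements such as &lt;nav&gt;, &lt;main&gt;, and &lt;aside&gt;."
  }
}
```

With the entities in place, the answer reads "elements such as <nav>, <main>, and <aside>" once decoded, instead of a string of orphaned commas.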

How to Validate Schema Markup

Three tools, three distinct jobs. Every schema-touching commit should run through all three, since they catch different failure classes.

Google Rich Results Test. The Google-specific source of truth. Takes a URL or a code snippet, reports which rich result types you qualify for, and flags missing required fields. This is what Google's crawler actually sees. Use it on every schema-touching commit to a production page.
Schema.org Validator. The vocabulary-level source of truth. Reports syntax errors, invalid fields, and ambiguous type references against the full 823-type schema.org vocabulary. Catches "invented fields" faster than the Google tester does.
Lumina Schema Validator. Free, no signup, bulk mode, and three checks the others do not run. Full breakdown below.

Lumina's Schema Validator exists because the other two miss three production failure classes I kept hitting on client audits:

FAQPage strict-sync against visible HTML. The validator extracts both the JSON-LD and every visible .faq-item on the page, compares them question-by-question, answer-by-answer. Google's tester passes schema that mismatches the visible text; Google Search then quietly revokes the rich result weeks later. Lumina's tool catches the mismatch before the deploy.
Cross-page @id resolution. If your Article's author is {"@id":"https://your-site.com/#founder"} (the canonical pattern), the validator follows that reference to the actual declaration on your homepage or about page, then validates the resolved Person entity. Google's tester validates each page in isolation and misses this entirely.
Up-to-date deprecation catalog. Built from Google's current structured-data docs, not generic schema.org theory. Flags HowTo as sunset, ClaimReview as phasing out, warns against invented fields like applicationArea. Does not wrongly flag Course or QAPage as deprecated (I fixed those during my own Schema Validator sweep on 2026-04-12).

The workflow: validate with Schema.org while you are writing the block, spot-check with Google Rich Results Test before you ship, run Lumina's Schema Validator on the live URL after deploy. Bulk audits across a site always go through Lumina's tool, since neither of the others does batch.

FAQ

What is schema markup in simple terms?

Schema markup is a shared vocabulary for labeling what a page is about so search engines and AI models can read it without guessing. The spec lives at schema.org. In practice, you drop a JSON-LD block in the head of the page that says: this is an Article, the author is a Person named X, the publisher is an Organization named Y, it was updated on Z. Google, Bing, ChatGPT, Perplexity, Gemini all parse it the same way.

Does schema markup help with SEO rankings in 2026?

Schema is not a direct ranking factor. Google's Search Central team has said this on record repeatedly, most persistently via John Mueller. It does not move positions on its own. What it does: qualifies you for rich results (stars, FAQ accordions, recipe cards, product prices in the SERP) and gives AI retrievers an unambiguous entity graph. Both lift click-through rate. Higher CTR on an existing position is indistinguishable from a ranking bump in practice.

What are the most important types of schema markup?

Eight cover 95% of sites. Article or BlogPosting for editorial pages. Product for e-commerce. LocalBusiness for any business with a physical location. FAQPage for Q&A sections. BreadcrumbList for site hierarchy. Organization and Person for the publisher and author identities. Event for anything with a startDate and location. Recipe is niche but has strong rich-result support when it fits. HowTo no longer triggers a Google rich result, but AI retrievers still read it for step extraction, so keep it on tutorials.

JSON-LD, Microdata, or RDFa: which format should I use?

JSON-LD. Google's Search Central docs explicitly recommend it as the preferred format. One script block in the head, no HTML pollution, clean diff in version control. Microdata still works but sprinkles attributes through your markup. RDFa is valid but rare outside academic publishing. Every AI retriever (ChatGPT, Claude, Perplexity, Gemini) reads JSON-LD reliably. Pick JSON-LD and never think about the format choice again.

How do I validate my schema markup?

Three tools, three jobs. Google Rich Results Test (search.google.com/test/rich-results) tells you whether you qualify for a specific rich-result type in Google SERP. The Schema.org validator (validator.schema.org) tells you whether the syntax is legal against the full vocabulary. Lumina's free Schema Validator does both plus cross-checks FAQPage text against visible HTML, and that last check is where most production schema silently fails.

What are the most common schema markup mistakes?

Six patterns account for most audit findings. FAQ drift: the schema text no longer matches the visible HTML word for word. Orphan Person blocks: author declared but never linked via @id from the Article. Stale dateModified: the timestamp never moves even when you edit the body, or it moves on every trivial touch. Inventing fields that do not exist on schema.org (applicationArea, targetAudience on an Article). Adding 14 types when 3 to 5 would do: breadth without relevance adds bytes, not signal. Raw HTML tags in FAQ Answer.text: Google strips anything outside the 13-tag whitelist, so prose mentioning tag names like <nav> needs escaping in the JSON.

Where to Start

Five moves, in order. Fewer than 90 minutes of work for a small site, two to three hours for a medium one:

Audit what you already have

Run Lumina's Schema Validator on your top 5 pages: homepage, 2 category or landing pages, 2 articles. Most sites discover that half the expected schema is missing and the other half has stale fields. Free, no signup.

Ship Organization + Person once

One Organization block on the homepage with logo, URL, and a rich sameAs array. One Person block on your about page. Every other page references them via @id. This is the entity backbone.

Add Article or Product to every content page

Blog posts get BlogPosting. Product pages get Product. Author fields reference your canonical Person. Publisher fields reference your canonical Organization. No inline duplicates.

Add FAQPage to every FAQ section

Any <h2>FAQ</h2> block on the site deserves matching FAQPage schema. Strict-match the text. Add a verify-sync check to your deploy pipeline, since drift revokes the rich result.

Validate before every deploy

Wire Google Rich Results Test into your release checklist. Lumina's Schema Validator for bulk URL audits. Catch regressions before they ship rather than when a client reports a missing rich result.

Audit your schema markup in 30 seconds

Lumina's free Schema Validator catches the exact gaps this guide describes: missing FAQPage strict-sync, orphan Person blocks, stale dateModified, deprecated types, invented fields. One paste or URL, no signup, bulk mode available.
