Most AI crawlers — GPTBot, ClaudeBot, CCBot, PerplexityBot — read the raw HTML your server returns and stop there. They do not execute JavaScript. For sites that ship a React, Vue, or Angular shell with the actual content hydrated client-side, the crawler sees a blank shell: navigation, placeholders, and a JavaScript bundle — no articles, no headlines, no teasers.
I ran 50 of the largest DACH media homepages through the Lumina JS vs No-JS tool: one fetch for raw HTML, one for Chrome-rendered DOM, word counts compared. 35 sites returned clean comparable data. Seven of those 35 hide a quarter or more of their text behind JavaScript. News.at hides 100 %: the raw HTML contains nothing but navigation.
Two ways a site can disappear from ChatGPT
Active block: robots.txt or WAF rules tell AI bots to go away. A deliberate decision, usually made once in 2023 and rarely revisited.
Passive hiding: the site technically lets AI bots in, but the content is rendered client-side. Raw HTML contains a shell with no actual text. This is usually not a decision at all; it is a side effect of a CSR-first framework choice.
A site can do both. Tagesschau blocks all AI bots in robots.txt AND hides 35 % of its homepage text behind Vue hydration. Double lockout.
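For illustration, the active block lives in robots.txt. A minimal version covering just the four bots named in the intro might look like this — real policy files (ARD's included) list many more user-agent tokens:

```text
# Active block: turn away AI training and answer crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```

The passive-hiding case has no such artifact to inspect — there is no file that says "hide content from crawlers", only a build configuration that happens to do so.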
What I found in their robots.txt files
A separate study on the same 50 sites, published last week. I checked the deliberate-block layer: robots.txt plus a live server-response check per bot. Four headline findings:
(The findings themselves are in that companion piece; this article covers the passive layer.)
Why JavaScript rendering matters for AI crawlers
Every big AI company splits its web presence into two kinds of bots. Training crawlers fetch pages in bulk and feed language models. Answer crawlers fetch on demand when a user asks something. For most of them, bulk crawling skips JavaScript entirely — rendering every page in a headless browser is expensive, slow, and unnecessary if the content is in the raw HTML.
OpenAI's documentation on GPTBot confirms it reads raw HTML. Anthropic's ClaudeBot does too. Common Crawl — the foundation for dozens of open-source LLMs and a training source for nearly all of them — snapshots raw HTML only. Perplexity's training crawler follows the same pattern. Googlebot and Gemini do render JavaScript, but with a delay of hours to days, and even then the rendered pages feed a different index than the immediate one.
So when a site renders its content client-side — meaning the text is not in the HTML the server sends, only in the JavaScript bundle that runs afterwards — classic AI crawlers see a shell. Navigation in the header, a footer, maybe a cookie banner. No articles. No teasers. No headlines. The site is technically "online", but invisible to everything except Google's and Gemini's second-pass renderers.
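The difference is easy to make concrete. The sketch below (plain Python, stdlib only; the HTML snippets are invented stand-ins for a CSR shell and its hydrated result, not fetched from any real site) extracts visible text the way a non-rendering crawler effectively does — read the markup, skip scripts, count words:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script>/<style>/<noscript> contents."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # how many skipped elements we are inside
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.words.extend(data.split())

def word_count(html: str) -> int:
    p = TextExtractor()
    p.feed(html)
    return len(p.words)

# What the server sends for a CSR-first page: a shell plus a bundle.
raw_shell = """
<html><body>
  <nav>Home Politik Wirtschaft</nav>
  <div id="app"></div>
  <script src="/bundle.js"></script>
</body></html>
"""

# What headless Chrome sees after the framework hydrates.
rendered = """
<html><body>
  <nav>Home Politik Wirtschaft</nav>
  <div id="app"><article>Breaking: a headline and three sentences of teaser text.</article></div>
</body></html>
"""

print(word_count(raw_shell), word_count(rendered))  # 3 vs 12 words
```

Same URL, same server — but a crawler that stops at the raw response sees only the navigation words.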
The core findings
Of the 35 sites with clean comparable data:
- 23 sites hide 5 % or less of their text without JavaScript — solid server-side rendering, AI crawlers see what users see
- 2 sites hide 6–10 % — mild hydration gap, mostly fine
- 3 sites hide 11–25 % — noticeable hydration gap
- 4 sites hide 26–50 % — serious hydration gap, the crawler story changes
- 3 sites hide 51–100 % — the content is effectively client-side-only
Median text-hidden across the 35: 1 %. Average: 13.6 % (pulled up by the extreme outliers). Per country, Austria stands alone at 26.5 % average, far above Germany (6.3 %) and Switzerland (4.3 %) — four Austrian heavy-JS outliers do all the lifting.
Average text hidden without JavaScript, by country
The 7 heavy-JS sites
These seven hide at least a quarter of their homepage text behind JavaScript. Every number was measured three times; spread across runs was 0–2 %, so these are reproducible, not one-off flukes.
- News.at (AT): 100 % hidden — 7 words raw vs 3,071 rendered. The raw HTML contains navigation and script tags, no articles.
- ServusTV (AT): 88 % hidden — 250 raw, 2,170 rendered. Next.js, CSR-first configuration.
- Tiroler Tageszeitung (AT): 79 % hidden — 582 raw, 2,808 rendered. Angular app.
- Kleine Zeitung (AT): 39 % hidden — 1,494 raw, 2,430 rendered. Vue.
- Tagesschau (DE): 35 % hidden — 1,771 raw, 2,723 rendered. Vue.
- BR (DE): 35 % hidden — 1,524 raw, 2,353 rendered.
- WirtschaftsWoche (DE): 33 % hidden — 1,303 raw, 1,950 rendered. Angular.
For comparison, Der Spiegel, Die Zeit, Bild, SRF, and ORF all hide less than 10 %. Svelte-based Spiegel and Zeit specifically hide 1 % each. The framework is not the cause. Next.js can do full SSR; Angular can render server-side; Vue has Nuxt. Every heavy-JS site above could ship a fully rendered HTML payload and chose not to. The cause is configuration.
Austria dominates the heavy-JS list
Four of the seven are Austrian: News.at, ServusTV, Tiroler Tageszeitung, Kleine Zeitung. The 13 cleanly-measured Austrian sites average 26.5 % text-gap. The rest of the DACH sample averages under 7 %.
The rest of the Austrian sample is normal: Der Standard (paywall), Salzburger Nachrichten, Oberösterreichische Nachrichten, Wiener Zeitung, Profil, Heute, Falter, Trend, ORF — all under 10 % text-gap. Four extreme outliers carry the average up. Without them, Austria would sit near Germany at around 5–7 %.
What the outliers have in common: tabloid or regional TV content, all rebuilt in the last three years on SPA frameworks (Next.js, Angular, Vue), all running in CSR-first mode. For a broadcaster like ServusTV or a regional daily like Tiroler Tageszeitung, that architectural choice means the homepage renders fine for human browsers but is empty for AI crawlers that do not run a full render pipeline.
The ARD double lockout
The first lock is the robots.txt. Tagesschau, BR, NDR, and WDR block all 16 AI crawlers in their robots.txt with an identical policy file — clearly coordinated ARD-wide (see the companion robots.txt study for the full breakdown). The second lock is JavaScript: Tagesschau and BR also hide 35 % of their homepage text behind Vue hydration. NDR and WDR were not in this study's top-50 sample, but they run the same ARD tech stack, so the same pattern likely applies.
This is double lockout. Even an AI crawler that ignores robots.txt — some do — still sees only 65 % of Tagesschau's homepage. A crawler that respects robots.txt sees nothing at all. The only bots that get the full page are Googlebot and Gemini (because they render JS), and those two are allowed everywhere anyway.
ZDF would have been the interesting comparison point — ZDF blocks zero AI crawlers at the robots.txt layer, the mirror opposite of ARD. But ZDF's homepage exceeds 2 MB of raw HTML, past the tool's size limit. I cannot measure it cleanly. Given how heavy the page is, it is likely also CSR-rendered, which would put it in the same JS-hidden bracket as Tagesschau despite the opposite robots.txt stance.
Case study: News.at shows 100 % only after JS
To double-check the most extreme finding, I fetched news.at raw HTML directly (bypassing the tool, same proxy endpoint) and counted words manually. The body contained 559 words total — but only 7 in the main-content region. The remaining 552 were navigation strings: "Suche", "ABO", "Menü", "Aktuell", "Politik", "Wirtschaft", "Menschen", and so on down the category tree.
No article text. No headline. No teaser. No author names. No dates. Nothing of the content that makes news.at a news site. All of that lives in the JavaScript bundle that Next.js hydrates after the page loads.
For a human browser, this is fine — you see the full site within 2 seconds. For GPTBot, ClaudeBot, or CCBot, the page is effectively a blank. When a user asks ChatGPT "What's in the news in Austria today?", news.at has zero sentences available to cite. Even when ChatGPT searches the web live for the answer, the raw HTML response has no news to work with. The site is invisible in AI search.
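The manual spot-check amounts to splitting words by page region. A simplified sketch of that split (stdlib only; the snippet is a made-up miniature of the news.at shell, not the real page, and the real measurement also strips scripts and cookie-banner classes):

```python
from html.parser import HTMLParser

class RegionCounter(HTMLParser):
    """Counts words inside page chrome (<nav>/<header>/<footer>/<aside>)
    separately from words in the remaining (content) regions."""
    CHROME = {"nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.chrome_depth = 0
        self.chrome_words = 0
        self.content_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.CHROME:
            self.chrome_depth += 1

    def handle_endtag(self, tag):
        if tag in self.CHROME and self.chrome_depth:
            self.chrome_depth -= 1

    def handle_data(self, data):
        n = len(data.split())
        if self.chrome_depth:
            self.chrome_words += n
        else:
            self.content_words += n

# Miniature CSR shell: navigation strings present, main content empty.
shell = """<body>
  <nav>Suche ABO Menü Aktuell Politik Wirtschaft Menschen</nav>
  <main></main>
</body>"""

c = RegionCounter()
c.feed(shell)
print(c.chrome_words, c.content_words)  # navigation words vs content words
```

On the real raw HTML, the same split yields 552 navigation words against 7 content words — the 100 % figure in the table.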
Consent walls — a different AI barrier
Six sites in the sample return fewer than 100 words in both raw and rendered fetches: Der Standard, Die Presse, Falter, t3n, NZZ, and 20 Minuten. That is consent-wall territory. The server ships a cookie dialog with maybe 40–60 words of legal boilerplate and nothing else until a human clicks "Accept".
This is not a JavaScript problem. It is a different access pattern. Classic HTTP crawlers see almost nothing (the consent dialog text), rendered Chrome sees almost nothing (the same dialog, because it cannot click Accept), and human visitors see full content only after interacting. For AI crawlers, the practical result is the same as full hiding: nothing to cite. Six of 50 tested sites (12 %) use this pattern.
Consent walls are legally motivated — German and Austrian DSGVO enforcement has made clear that cookie consent must be actively given before tracking. But the side effect for AI visibility is direct: these sites have optimized for consent-compliance and accidentally dropped out of AI answers.
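The distinction between consent walls and hydration gaps falls out of the two word counts. A hypothetical classifier (my own bucketing of the thresholds used in this article, not part of the Lumina tool):

```python
def classify(raw_words: int, rendered_words: int) -> str:
    """Bucket a site by its raw vs rendered word counts,
    using the thresholds from this article (hypothetical helper)."""
    if raw_words < 100 and rendered_words < 100:
        # Both fetches see only the cookie dialog boilerplate.
        return "consent-wall"
    gap = (rendered_words - raw_words) / rendered_words * 100
    if gap > 50:
        return "effectively client-side-only"
    if gap > 25:
        return "serious hydration gap"
    if gap > 10:
        return "noticeable hydration gap"
    if gap > 5:
        return "mild hydration gap"
    return "solid server-side rendering"

print(classify(42, 58))      # consent-wall
print(classify(7, 3071))     # effectively client-side-only (News.at)
print(classify(1771, 2723))  # serious hydration gap (Tagesschau)
```

The consent-wall case has to be checked first: a site shipping 40 words raw and 60 rendered has a 33 % "gap" by the formula, but the problem is the dialog, not hydration.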
Methodology
- Sample: 50 highest-reach DACH media (18 AT, 25 DE, 7 CH), same list as the companion robots.txt study
- Tool: Lumina JS vs No-JS — runs two parallel fetches via the same Cloudflare Worker: raw HTTP GET for No-JS, headless Chrome render via Cloudflare Browser Rendering for JS
- Measurement: text content from `<main>`, `<article>`, or the body minus `<nav>`/`<header>`/`<footer>`/`<aside>` and cookie-banner classes. Scripts and styles stripped. Words split on whitespace. Text-hidden % = (rendered − raw) / rendered × 100.
- Reproducibility check: the 7 heavy-JS sites plus 3 sanity sites (Spiegel, Zeit, ORF) were each re-run 3× to check stability. Spread across runs was 0–2 %. Findings are not one-off anomalies.
- Manual verification: the most extreme finding (news.at at 100 %) was spot-checked by fetching the raw HTML directly via the same worker and counting words outside the tool. Confirmed: 7 main-content words, 552 navigation words, zero article text.
- Browser-automated runner: every URL was executed through the tool's actual UI via `window.go(url)` in a live browser session, not through a shortcut API path. The numbers in this article are what any user running the tool on the same URL would see today.
- Raw data: the full JSON with all 50 results, plus the Perl analyzer that produces the summary, are on GitHub.
The 15 sites we could not cleanly measure
Out of 50 sites, 35 returned clean comparable data. The other 15 split into distinct categories, each a small finding in itself:
- Consent-wall / paywall (6 sites): Der Standard, Die Presse, Falter, t3n, NZZ, 20 Minuten. Under 100 words in both fetches. Covered in section above.
- Tool size limit (2 sites): Süddeutsche Zeitung and ZDF homepages exceed 2 MB raw / 3 MB rendered. The Lumina tool cannot fetch pages that large. For these two I have no data. An inner article URL would likely work; for a like-for-like homepage comparison, they are excluded.
- Raw HTTP blocked, JS works (2 sites): Focus.de returns HTTP 403 to a raw GET, Tages-Anzeiger redirects in a cookie-dependent loop. Both sites load fine in rendered Chrome. Classic AI crawlers, which fetch over plain HTTP, see nothing; Googlebot, which renders in a real browser engine, gets through. Same pattern I documented for kurier.at and capital.de in the robots.txt study.
- Both fetchers blocked (2 sites): oe24 and Heise return empty bodies to both the raw fetch and the rendered fetch. Aggressive bot detection on the Cloudflare Worker IP range. Real browsers still load the sites fine, and Googlebot's own IPs are almost certainly whitelisted.
- SSL/TLS cert problem (1 site): Puls24 returned HTTP 526 on both fetchers — Cloudflare reports an invalid cert at the origin. Until that is fixed, the site is unreachable for any client that validates certs.
- Browser rendering fails, raw HTML OK (1 site): Frankfurter Rundschau returns 3,558 words via raw HTTP but the rendered-Chrome endpoint errors out with a 422. Partial data — I can confirm the raw HTML has content, but cannot verify a JS-render gap.
- Rendered Chrome blocked at edge (1 site): Blick.ch is an interesting case. Raw HTTP returns 1,890 words of real content. The rendered-Chrome fetch comes back with a 15-word "Access Denied" page from Akamai. Blick's edge filter is configured to let classic HTTP through but block Cloudflare's Browser Rendering IP range. Manual verification confirmed this is an infrastructure-level block, not a consent overlay.
These 15 sites are excluded from all country averages and percentages in this article. Only the 35 cleanly measured sites are counted. I mention the exclusion categories here because each one is itself a small finding about how these sites interact with automated crawlers.
Takeaway
There are two ways to disappear from ChatGPT, and this article covers the one nobody decided. The explicit-policy decision — "we will block GPTBot" — at least has a name, an owner, a meeting where it was discussed. The JavaScript-hiding decision does not. Seven sites ship a homepage where the actual content only exists after a JavaScript framework hydrates. That was not a GEO decision. It was a frontend team picking the default configuration of Next.js, Angular, or Vue and shipping.
For News.at the effect is total: zero content available to AI crawlers that do not render JS. For Tagesschau the effect compounds with the deliberate robots.txt block: double lockout. For Kleine Zeitung and WirtschaftsWoche it is a third of their content silently missing from the AI training feed.
The fix is not migrating off the framework. The fix is flipping the configuration to SSR-first (every framework mentioned supports it) or shipping static site generation. Spiegel and Zeit are both on Svelte and hide 1 % of text each. It is achievable.
Whether Austrian regional media, BR, Tagesschau or News.at will do this work is a different question. The first step is just knowing it is there — most frontend teams I have talked to do not check their AI-crawler visibility at all.
Check your own JS-rendering gap
The same test I ran in this study — for your domain. Free, no signup, raw HTML vs Chrome-rendered in 20 seconds.
Open the JS vs No-JS Tool →