Most AI crawlers — GPTBot, ClaudeBot, CCBot, PerplexityBot — read the raw HTML your server returns and stop there. They do not execute JavaScript. For sites that ship a React, Vue, or Angular shell with the actual content hydrated client-side, the crawler sees a blank shell: navigation, placeholders, and a JavaScript bundle — no articles, no headlines, no teasers.

I ran 50 of the largest DACH media homepages through the Lumina JS vs No-JS tool: one fetch for raw HTML, one for Chrome-rendered DOM, word counts compared. 35 sites returned clean comparable data. Seven of those 35 hide a quarter or more of their text behind JavaScript. News.at hides 100 %: the raw HTML contains nothing but navigation.
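As a sketch of the comparison metric: the text-gap can be computed from the two word counts. The exact formula below is my assumption, not necessarily what the Lumina tool does internally:

```typescript
// Hypothetical sketch of the study's metric: share of the rendered page's
// words that are absent from the raw HTML, as a percentage (one decimal).
function textHiddenPercent(rawWords: number, renderedWords: number): number {
  if (renderedWords === 0) return 0;
  const hidden = Math.max(renderedWords - rawWords, 0); // never below zero
  return Math.round((hidden / renderedWords) * 1000) / 10;
}

// News.at pattern: raw HTML nearly empty, rendered DOM full.
console.log(textHiddenPercent(7, 1400));   // 99.5
// Spiegel/Zeit pattern: almost everything already in the raw HTML.
console.log(textHiddenPercent(990, 1000)); // 1
```

A site "hides" text only in one direction: if the raw fetch somehow contains more words than the rendered one (cookie banners can cause this), the gap is clamped to zero rather than reported as negative.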

Two ways a site can disappear from ChatGPT

Active block: robots.txt or WAF rules tell AI bots to go away. A deliberate decision, usually made once in 2023 and rarely revisited.

Passive hiding: the site technically lets AI bots in, but the content is rendered client-side. Raw HTML contains a shell with no actual text. This is usually not a decision at all; it is a side effect of a CSR-first framework choice.

A site can do both. Tagesschau blocks all AI bots in robots.txt AND hides 35 % of its homepage text behind Vue hydration. Double lockout.
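The active-block layer can be checked mechanically. Below is a deliberately simplified sketch of a robots.txt check: exact user-agent match and a full-site Disallow only; real parsers also handle path prefixes, wildcards, and multi-agent groups:

```typescript
// Simplified robots.txt check: returns true if the given bot is told
// "Disallow: /" in its (or the wildcard's) user-agent group.
function botBlocked(robotsTxt: string, bot: string): boolean {
  let applies = false;
  for (const raw of robotsTxt.split("\n")) {
    const [key, ...rest] = raw.trim().split(":");
    const value = rest.join(":").trim();
    if (/^user-agent$/i.test(key)) applies = value === bot || value === "*";
    else if (/^disallow$/i.test(key) && applies && value === "/") return true;
  }
  return false;
}

// ARD-style policy: AI bots locked out, Googlebot untouched.
const ardStyle = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: ClaudeBot\nDisallow: /";
console.log(botBlocked(ardStyle, "GPTBot"));    // true
console.log(botBlocked(ardStyle, "Googlebot")); // false
```

The passive-hiding layer cannot be detected this way at all, which is exactly why it goes unnoticed: a robots.txt audit of a CSR-first site comes back green.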

Companion study

What I found in their robots.txt files

A separate study on the same 50 sites, published last week. I checked the deliberate-block layer: robots.txt plus a live server-response check per bot. Four headline findings:

52 %: block GPTBot & ClaudeBot
0 %: block Googlebot
16/19: ARD blocks all AI bots (Tagesschau, NDR, WDR, BR)
0/19: ZDF blocks none (same country, opposite stance)
Read the full robots.txt study →

Why JavaScript rendering matters for AI crawlers

Every big AI company splits its web presence into two kinds of bots. Training crawlers fetch pages in bulk and feed language models. Answer crawlers fetch on demand when a user asks something. For most of them, bulk crawling skips JavaScript entirely — rendering every page in a headless browser is expensive, slow, and unnecessary if the content is in the raw HTML.

OpenAI's documentation on GPTBot confirms it reads raw HTML. Anthropic's ClaudeBot does too. Common Crawl — the foundation for dozens of open-source LLMs and a training source for nearly all of them — snapshots raw HTML only. Perplexity's training crawler follows the same pattern. GoogleBot and Gemini do render JavaScript, but with a delay of hours to days, and even then the rendered pages feed a different index than the immediate one.

So when a site renders its content client-side — meaning the text is not in the HTML the server sends, only in the JavaScript bundle that runs afterwards — classic AI crawlers see a shell. Navigation in the header, a footer, maybe a cookie banner. No articles. No teasers. No headlines. The site is technically "online", but invisible to everything except Google's and Gemini's second-pass renderers.
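A minimal sketch of what such a crawler "reads": strip the tags from the server's raw HTML, skip the script bundles, and count what remains. The two HTML snippets are invented for illustration:

```typescript
// What a non-rendering crawler sees is only the text already present in the
// server's HTML response. This is a toy extractor, not the Lumina tool.
function visibleWords(html: string): string[] {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // crawlers skip JS bundles
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ");                   // drop remaining tags
  return text.split(/\s+/).filter(Boolean);
}

// CSR shell: navigation labels, an empty mount point, a bundle reference.
const csrShell = `
  <nav>Aktuell Politik Wirtschaft</nav>
  <div id="root"></div>
  <script src="/bundle.js">window.__DATA__ = {}</script>`;

// SSR page: the article text is in the HTML itself.
const ssrPage = `
  <nav>Aktuell Politik Wirtschaft</nav>
  <article><h1>Headline</h1><p>Full teaser text the crawler can cite.</p></article>`;

console.log(visibleWords(csrShell).length); // 3 (navigation only)
console.log(visibleWords(ssrPage).length);  // 11
```

The CSR shell yields exactly the pattern measured on News.at below: a handful of navigation words, zero content words.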

The core findings

Of the 35 sites with clean comparable data:

Median text-hidden across the 35: 1 %. Average: 13.6 %, pulled up by the extreme outliers. Per country, Austria stands alone at a 26.5 % average, far above Germany (6.3 %) and Switzerland (4.3 %); four Austrian heavy-JS outliers account for almost all of that gap.

Average text hidden without JavaScript, by country

Based on 35 cleanly measurable DACH media homepages
Austria: 26.5 % (13 sites) · Germany: 6.3 % (19 sites) · Switzerland: 4.3 % (3 sites)

The 7 heavy-JS sites

These seven hide at least a quarter of their homepage text behind JavaScript. Every number was measured three times; spread across runs was 0–2 %, so these are reproducible, not one-off flukes.

For comparison, Der Spiegel, Die Zeit, Bild, SRF, and ORF all hide less than 10 %; Svelte-based Spiegel and Zeit hide just 1 % each. The framework is not the cause. Next.js can do full SSR, Angular can render server-side, and Vue has Nuxt. Each of the seven heavy-JS sites could ship a fully rendered HTML payload and chose not to. The cause is configuration.

Austria dominates the heavy-JS list

Four of the seven are Austrian: News.at, ServusTV, Tiroler Tageszeitung, Kleine Zeitung. The 13 cleanly-measured Austrian sites average 26.5 % text-gap. The rest of the DACH sample averages under 7 %.

The rest of the Austrian sample is normal, all under 10 % text-gap: Der Standard (paywall), Salzburger Nachrichten, Oberösterreichische Nachrichten, Wiener Zeitung, Profil, Heute, Falter, Trend, ORF. Four extreme outliers carry the average up; without them, Austria would sit near Germany at around 5–7 %.

What the outliers have in common: tabloid or regional TV content, all rebuilt in the last three years on React-family frameworks, all running in CSR-first mode. For a national broadcaster like ServusTV or a regional like Tiroler Tageszeitung, that architectural choice means the homepage renders fine for human browsers but is empty for AI crawlers that do not run a full render pipeline.

The ARD double lockout

The first lock is the robots.txt. Tagesschau, BR, NDR, and WDR block all 16 AI crawlers in their robots.txt with an identical policy file — clearly coordinated ARD-wide (see the companion robots.txt study for the full breakdown). The second lock is JavaScript: Tagesschau and BR also hide 35 % of their homepage text behind Vue hydration. NDR and WDR were not in this study's top-50 sample, but they run the same ARD tech stack, so the same pattern likely applies.

This is double lockout. Even an AI crawler that ignores robots.txt — some do — still sees only 65 % of Tagesschau's homepage. A crawler that respects robots.txt sees nothing at all. The only bots that get the full page are GoogleBot and Gemini (because they render JS), and those two are allowed everywhere anyway.

ZDF would have been the interesting comparison point — ZDF blocks zero AI crawlers at the robots.txt layer, the mirror opposite of ARD. But ZDF's homepage exceeds 2 MB of raw HTML, past the tool's size limit. I cannot measure it cleanly. Given how heavy the page is, it is likely also CSR-rendered, which would put it in the same JS-hidden bracket as Tagesschau despite the opposite robots.txt stance.

Case study: News.at shows 100 % only after JS

To double-check the most extreme finding, I fetched news.at raw HTML directly (bypassing the tool, same proxy endpoint) and counted words manually. The body contained 559 words total — but only 7 in the main-content region. The remaining 552 were navigation strings: "Suche", "ABO", "Menü", "Aktuell", "Politik", "Wirtschaft", "Menschen", and so on down the category tree.

No article text. No headline. No teaser. No author names. No dates. Nothing of the content that makes news.at a news site. All of that lives in the JavaScript bundle that Next.js hydrates after the page loads.

For a human browser, this is fine: you see the full site within 2 seconds. For GPTBot, ClaudeBot, or CCBot, the page is effectively blank. When a user asks ChatGPT "What's in the news in Austria today?", news.at has zero sentences available to cite. Even when ChatGPT searches the web live, the raw HTML response has no news to work with. The site is invisible in AI search.

Six sites in the sample return fewer than 100 words in both raw and rendered fetches: Der Standard, Die Presse, Falter, t3n, NZZ, and 20 Minuten. That is consent-wall territory. The server ships a cookie dialog with maybe 40–60 words of legal boilerplate and nothing else until a human clicks "Accept".

This is not a JavaScript problem. It is a different access pattern. Classic HTTP crawlers see almost nothing (the consent dialog text), rendered Chrome sees almost nothing (the same dialog, because it cannot click Accept), and human visitors see full content only after interacting. For AI crawlers, the practical result is the same as full hiding: nothing to cite. Six of 50 tested sites (12 %) use this pattern.

Consent walls are legally motivated — German and Austrian DSGVO enforcement has made clear that cookie consent must be actively given before tracking. But the side effect for AI visibility is direct: these sites have optimized for consent-compliance and accidentally dropped out of AI answers.

Methodology

The 15 sites we could not cleanly measure

Out of 50 sites, 35 returned clean comparable data. The other 15 split into distinct exclusion categories, each a small finding in itself about how these sites interact with automated crawlers.

These 15 sites are excluded from all country averages and percentages in this article; only the 35 cleanly measured sites are counted.

Takeaway

There are two ways to disappear from ChatGPT, and this article covers the one nobody decided. The explicit-policy decision — "we will block GPTBot" — at least has a name, an owner, a meeting where it was discussed. The JavaScript-hiding decision does not. Seven sites ship a homepage where the actual content only exists after a JavaScript framework hydrates. That was not a GEO decision. It was a frontend team picking the default configuration of Next.js, Angular, or Vue and shipping.

For News.at the effect is total: zero content available to AI crawlers that do not render JS. For Tagesschau the effect compounds with the deliberate robots.txt block: double lockout. For Kleine Zeitung and WirtschaftsWoche it is a third of their content silently missing from the AI training feed.

The fix is not migrating off the framework. The fix is flipping the configuration to SSR-first (every framework mentioned supports it) or shipping Static Site Generation. Spiegel and Zeit are both on Svelte and hide 1 % of text each. It is achievable.
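As a hedged illustration of what "flipping to SSR-first" means in practice: in Next.js with the Pages Router, moving the data fetch into getServerSideProps puts the article text into the HTML the server sends, so non-rendering crawlers can read it. The endpoint and data shape below are hypothetical:

```typescript
// pages/index.tsx (sketch). With getServerSideProps, Next.js resolves the
// data on the server and serializes the result into the initial HTML
// response. "https://cms.example.com/api/homepage" is a made-up endpoint.
import type { GetServerSideProps } from "next";

type Article = { headline: string; teaser: string };

export const getServerSideProps: GetServerSideProps = async () => {
  const res = await fetch("https://cms.example.com/api/homepage");
  const articles: Article[] = await res.json();
  // Because this runs server-side, the headlines and teasers end up in the
  // raw HTML that GPTBot, ClaudeBot, and CCBot fetch.
  return { props: { articles } };
};

export default function Home({ articles }: { articles: Article[] }) {
  return (
    <main>
      {articles.map((a) => (
        <article key={a.headline}>
          <h1>{a.headline}</h1>
          <p>{a.teaser}</p>
        </article>
      ))}
    </main>
  );
}
```

The same effect is available in Angular via Angular Universal and in Vue via Nuxt; with Static Site Generation (getStaticProps in Next.js) the HTML is even prebuilt at deploy time.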

Whether Austrian regional media, BR, Tagesschau or News.at will do this work is a different question. The first step is just knowing it is there — most frontend teams I have talked to do not check their AI-crawler visibility at all.

FAQ

Why does JavaScript rendering matter for AI search visibility?
Most classic AI crawlers — GPTBot, ClaudeBot, CCBot, and PerplexityBot — do not execute JavaScript. They only read the raw HTML the server returns. If a site renders its content client-side via React, Vue, or Angular hydration, those crawlers see an empty shell. GoogleBot and Gemini do render JS but with delay; the other major LLM training crawlers do not.
Which frameworks showed the worst text-gaps in this study?
Next.js (News.at, ServusTV) and Angular (Tiroler Tageszeitung, WirtschaftsWoche) had the largest text-gaps. Vue was mixed: Kleine Zeitung hid 39 %, while ORF — also on Vue — hid only 7 %. Svelte, used by Der Spiegel and Die Zeit, averaged 1 % across runs. The framework is not the cause; configuration is. SSR-first setups produce complete raw HTML. CSR-first setups do not.
Does Google see content hidden behind JavaScript?
Yes. GoogleBot renders JavaScript, though with a delay of hours to days. Gemini inherits the same rendering pipeline. But classic AI training crawlers do not: GPTBot, ClaudeBot, and CCBot (the feed for most open-source LLMs) only see the raw HTML. A site scoring 35 % text-hidden is fully indexed by Google and trained on by ChatGPT with only 65 % of its content.
Is this the same as blocking AI crawlers via robots.txt?
No. Blocking via robots.txt or WAF rules is an active decision. Hiding content behind JavaScript is usually accidental — a side effect of CSR-first framework choices. A site can block GPTBot deliberately AND still hide content unintentionally. Tagesschau does both: it blocks all AI crawlers in robots.txt AND hides 35 % of its homepage text behind Vue hydration.
How can I check JS-rendering gaps on my own site?
Run your URL through the Lumina JS vs No-JS tool. It fires two parallel fetches — raw HTTP and Chrome-rendered — then reports the text-hidden percentage, detected frameworks, and the specific words missing from raw HTML. Free, no signup. The same tool produced every number in this article.

Check your own JS-rendering gap

The same test I ran in this study — for your domain. Free, no signup, raw HTML vs Chrome-rendered in 20 seconds.

Open the JS vs No-JS Tool →