Most AI crawlers — GPTBot, ClaudeBot, CCBot, PerplexityBot — read the raw HTML your server returns and stop there. They do not execute JavaScript. For sites that ship a React, Vue, or Angular shell with the actual content hydrated client-side, the crawler sees a blank shell: navigation, placeholders, and a JavaScript bundle — no articles, no headlines, no teasers.
I ran 50 of the largest DACH media homepages through the Lumina JS vs No-JS tool: one fetch for raw HTML, one for Chrome-rendered DOM, word counts compared. 35 sites returned clean comparable data. Seven of those 35 hide a quarter or more of their text behind JavaScript. News.at hides 100 %: the raw HTML contains nothing but navigation.
Two ways a site can disappear from ChatGPT
Active block: robots.txt or WAF rules tell AI bots to go away. A deliberate decision, usually made once in 2023 and rarely revisited.
Passive hiding: the site technically lets AI bots in, but the content is rendered client-side. Raw HTML contains a shell with no actual text. This is usually not a decision at all; it is a side effect of a CSR-first framework choice.
A site can do both. Tagesschau blocks all AI bots in robots.txt AND hides 35 % of its homepage text behind Vue hydration. Double lockout.
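For illustration, the active block lives in robots.txt. A minimal version covering just the four bots named in the intro might look like this — real policy files (ARD's included) list many more user-agent tokens:

```text
# Active block: turn away AI training and answer crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```

The passive-hiding case has no such artifact to inspect — there is no file that says "hide content from crawlers", only a build configuration that happens to do so.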
What I found in their robots.txt files
A separate study on the same 50 sites, published last week. I checked the deliberate-block layer: robots.txt plus a live server-response check per bot. Four headline findings:
(The findings themselves are in that companion piece; this article covers the passive layer.)
Why JavaScript rendering matters for AI crawlers
Every big AI company splits its web presence into two kinds of bots. Training crawlers fetch pages in bulk and feed language models. Answer crawlers fetch on demand when a user asks something. For most of them, bulk crawling skips JavaScript entirely — rendering every page in a headless browser is expensive, slow, and unnecessary if the content is in the raw HTML.
OpenAI's documentation on GPTBot confirms it reads raw HTML. Anthropic's ClaudeBot does too. Common Crawl — the foundation for dozens of open-source LLMs and a training source for nearly all of them — snapshots raw HTML only. Perplexity's training crawler follows the same pattern. Googlebot and Gemini do render JavaScript, but with a delay of hours to days, and even then the rendered pages feed a different index than the immediate one.
So when a site renders its content client-side — meaning the text is not in the HTML the server sends, only in the JavaScript bundle that runs afterwards — classic AI crawlers see a shell. Navigation in the header, a footer, maybe a cookie banner. No articles. No teasers. No headlines. The site is technically "online", but invisible to everything except Google's and Gemini's second-pass renderers.
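The difference is easy to make concrete. The sketch below (plain Python, stdlib only; the HTML snippets are invented stand-ins for a CSR shell and its hydrated result, not fetched from any real site) extracts visible text the way a non-rendering crawler effectively does — read the markup, skip scripts, count words:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script>/<style>/<noscript> contents."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # how many skipped elements we are inside
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.words.extend(data.split())

def word_count(html: str) -> int:
    p = TextExtractor()
    p.feed(html)
    return len(p.words)

# What the server sends for a CSR-first page: a shell plus a bundle.
raw_shell = """
<html><body>
  <nav>Home Politik Wirtschaft</nav>
  <div id="app"></div>
  <script src="/bundle.js"></script>
</body></html>
"""

# What headless Chrome sees after the framework hydrates.
rendered = """
<html><body>
  <nav>Home Politik Wirtschaft</nav>
  <div id="app"><article>Breaking: a headline and three sentences of teaser text.</article></div>
</body></html>
"""

print(word_count(raw_shell), word_count(rendered))  # 3 vs 12 words
```

Same URL, same server — but a crawler that stops at the raw response sees only the navigation words.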
The core findings
Of the 35 sites with clean comparable data:
- 23 sites hide 5 % or less of their text without JavaScript — solid server-side rendering, AI crawlers see what users see
- 2 sites hide 6–10 % — mild hydration gap, mostly fine
- 3 sites hide 11–25 % — noticeable hydration gap
- 4 sites hide 26–50 % — serious hydration gap, the crawler story changes
- 3 sites hide 51–100 % — the content is effectively client-side-only
Median text-hidden across the 35: 1 %. Average: 13.6 % (pulled up by the extreme outliers). Per country, Austria stands alone at 26.5 % average, far above Germany (6.3 %) and Switzerland (4.3 %) — four Austrian heavy-JS outliers do all the lifting.
Average text hidden without JavaScript, by country
The 7 heavy-JS sites
These seven hide at least a quarter of their homepage text behind JavaScript. Every number was measured three times; spread across runs was 0–2 %, so these are reproducible, not one-off flukes.
- News.at (AT): 100 % hidden — 7 words raw vs 3,071 rendered. The raw HTML contains navigation and script tags, no articles.
- ServusTV (AT): 88 % hidden — 250 raw, 2,170 rendered. Next.js, CSR-first configuration.
- Tiroler Tageszeitung (AT): 79 % hidden — 582 raw, 2,808 rendered. Angular app.
- Kleine Zeitung (AT): 39 % hidden — 1,494 raw, 2,430 rendered. Vue.
- Tagesschau (DE): 35 % hidden — 1,771 raw, 2,723 rendered. Vue.
- BR (DE): 35 % hidden — 1,524 raw, 2,353 rendered.
- WirtschaftsWoche (DE): 33 % hidden — 1,303 raw, 1,950 rendered. Angular.
For comparison, Der Spiegel, Die Zeit, Bild, SRF, and ORF all hide less than 10 %. Svelte-based Spiegel and Zeit specifically hide 1 % each. The framework is not the cause. Next.js can do full SSR; Angular can render server-side; Vue has Nuxt. Every heavy-JS site above could ship a fully rendered HTML payload and chose not to. The cause is configuration.
Austria dominates the heavy-JS list
Four of the seven are Austrian: News.at, ServusTV, Tiroler Tageszeitung, Kleine Zeitung. The 13 cleanly-measured Austrian sites average 26.5 % text-gap. The rest of the DACH sample averages under 7 %.
The rest of the Austrian sample is normal: Der Standard (paywall), Salzburger Nachrichten, Oberösterreichische Nachrichten, Wiener Zeitung, Profil, Heute, Falter, Trend, ORF — all under 10 % text-gap. Four extreme outliers carry the average up. Without them, Austria would sit near Germany at around 5–7 %.
What the outliers have in common: tabloid or regional TV content, all rebuilt in the last three years on SPA frameworks (Next.js, Angular, Vue), all running in CSR-first mode. For a broadcaster like ServusTV or a regional daily like Tiroler Tageszeitung, that architectural choice means the homepage renders fine for human browsers but is empty for AI crawlers that do not run a full render pipeline.
The ARD double lockout
The first lock is the robots.txt. Tagesschau, BR, NDR, and WDR block all 16 AI crawlers in their robots.txt with an identical policy file — clearly coordinated ARD-wide (see the companion robots.txt study for the full breakdown). The second lock is JavaScript: Tagesschau and BR also hide 35 % of their homepage text behind Vue hydration. NDR and WDR were not in this study's top-50 sample, but they run the same ARD tech stack, so the same pattern likely applies.
This is double lockout. Even an AI crawler that ignores robots.txt — some do — still sees only 65 % of Tagesschau's homepage. A crawler that respects robots.txt sees nothing at all. The only bots that get the full page are Googlebot and Gemini (because they render JS), and those two are allowed everywhere anyway.
ZDF would have been the interesting comparison point — ZDF blocks zero AI crawlers at the robots.txt layer, the mirror opposite of ARD. But ZDF's homepage exceeds 2 MB of raw HTML, past the tool's size limit. I cannot measure it cleanly. Given how heavy the page is, it is likely also CSR-rendered, which would put it in the same JS-hidden bracket as Tagesschau despite the opposite robots.txt stance.
Case study: News.at shows 100 % only after JS
To double-check the most extreme finding, I fetched news.at raw HTML directly (bypassing the tool, same proxy endpoint) and counted words manually. The body contained 559 words total — but only 7 in the main-content region. The remaining 552 were navigation strings: "Suche", "ABO", "Menü", "Aktuell", "Politik", "Wirtschaft", "Menschen", and so on down the category tree.
No article text. No headline. No teaser. No author names. No dates. Nothing of the content that makes news.at a news site. All of that lives in the JavaScript bundle that Next.js hydrates after the page loads.
For a human browser, this is fine — you see the full site within 2 seconds. For GPTBot, ClaudeBot, or CCBot, the page is effectively a blank. When a user asks ChatGPT "What's in the news in Austria today?", news.at has zero sentences available to cite. Even when ChatGPT searches the web live for the answer, the raw HTML response has no news to work with. The site is invisible in AI search.
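The manual spot-check amounts to splitting words by page region. A simplified sketch of that split (stdlib only; the snippet is a made-up miniature of the news.at shell, not the real page, and the real measurement also strips scripts and cookie-banner classes):

```python
from html.parser import HTMLParser

class RegionCounter(HTMLParser):
    """Counts words inside page chrome (<nav>/<header>/<footer>/<aside>)
    separately from words in the remaining (content) regions."""
    CHROME = {"nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.chrome_depth = 0
        self.chrome_words = 0
        self.content_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.CHROME:
            self.chrome_depth += 1

    def handle_endtag(self, tag):
        if tag in self.CHROME and self.chrome_depth:
            self.chrome_depth -= 1

    def handle_data(self, data):
        n = len(data.split())
        if self.chrome_depth:
            self.chrome_words += n
        else:
            self.content_words += n

# Miniature CSR shell: navigation strings present, main content empty.
shell = """<body>
  <nav>Suche ABO Menü Aktuell Politik Wirtschaft Menschen</nav>
  <main></main>
</body>"""

c = RegionCounter()
c.feed(shell)
print(c.chrome_words, c.content_words)  # navigation words vs content words
```

On the real raw HTML, the same split yields 552 navigation words against 7 content words — the 100 % figure in the table.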
Consent walls — a different AI barrier
Six sites in the sample return fewer than 100 words in both raw and rendered fetches: Der Standard, Die Presse, Falter, t3n, NZZ, and 20 Minuten. That is consent-wall territory. The server ships a cookie dialog with maybe 40–60 words of legal boilerplate and nothing else until a human clicks "Accept".
This is not a JavaScript problem. It is a different access pattern. Classic HTTP crawlers see almost nothing (the consent dialog text), rendered Chrome sees almost nothing (the same dialog, because it cannot click Accept), and human visitors see full content only after interacting. For AI crawlers, the practical result is the same as full hiding: nothing to cite. Six of 50 tested sites (12 %) use this pattern.
Consent walls are legally motivated — German and Austrian DSGVO enforcement has made clear that cookie consent must be actively given before tracking. But the side effect for AI visibility is direct: these sites have optimized for consent-compliance and accidentally dropped out of AI answers.
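The distinction between consent walls and hydration gaps falls out of the two word counts. A hypothetical classifier (my own bucketing of the thresholds used in this article, not part of the Lumina tool):

```python
def classify(raw_words: int, rendered_words: int) -> str:
    """Bucket a site by its raw vs rendered word counts,
    using the thresholds from this article (hypothetical helper)."""
    if raw_words < 100 and rendered_words < 100:
        # Both fetches see only the cookie dialog boilerplate.
        return "consent-wall"
    gap = (rendered_words - raw_words) / rendered_words * 100
    if gap > 50:
        return "effectively client-side-only"
    if gap > 25:
        return "serious hydration gap"
    if gap > 10:
        return "noticeable hydration gap"
    if gap > 5:
        return "mild hydration gap"
    return "solid server-side rendering"

print(classify(42, 58))      # consent-wall
print(classify(7, 3071))     # effectively client-side-only (News.at)
print(classify(1771, 2723))  # serious hydration gap (Tagesschau)
```

The consent-wall case has to be checked first: a site shipping 40 words raw and 60 rendered has a 33 % "gap" by the formula, but the problem is the dialog, not hydration.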
Methodology
- Sample: 50 highest-reach DACH media (18 AT, 25 DE, 7 CH), same list as the companion robots.txt study
- Tool: Lumina JS vs No-JS — runs two parallel fetches via the same Cloudflare Worker: raw HTTP GET for No-JS, headless Chrome render via Cloudflare Browser Rendering for JS
- Measurement: text content from `<main>`, `<article>`, or the body minus `<nav>`/`<header>`/`<footer>`/`<aside>` and cookie-banner classes. Scripts and styles stripped. Words split on whitespace. Text-hidden % = (rendered − raw) / rendered × 100.
- Reproducibility check: the 7 heavy-JS sites plus 3 sanity sites (Spiegel, Zeit, ORF) were each re-run 3× to check stability. Spread across runs was 0–2 %. Findings are not one-off anomalies.
- Manual verification: the most extreme finding (news.at at 100 %) was spot-checked by fetching the raw HTML directly via the same worker and counting words outside the tool. Confirmed: 7 main-content words, 552 navigation words, zero article text.
- Browser-automated runner: every URL was executed through the tool's actual UI via `window.go(url)` in a live browser session, not through a shortcut API path. The numbers in this article are what any user running the tool on the same URL would see today.
- Raw data: the full JSON with all 50 results, plus the Perl analyzer that produces the summary, are on GitHub.
The 15 sites we could not cleanly measure
Out of 50 sites, 35 returned clean comparable data. The other 15 split into distinct categories, each a small finding in itself:
- Consent-wall / paywall (6 sites): Der Standard, Die Presse, Falter, t3n, NZZ, 20 Minuten. Under 100 words in both fetches. Covered in section above.
- Tool size limit (2 sites): Süddeutsche Zeitung and ZDF homepages exceed 2 MB raw / 3 MB rendered. The Lumina tool cannot fetch pages that large. For these two I have no data. An inner article URL would likely work; for a like-for-like homepage comparison, they are excluded.
- Raw HTTP blocked, JS works (2 sites): Focus.de returns HTTP 403 to a raw GET, Tages-Anzeiger redirects in a cookie-dependent loop. Both sites load fine in rendered Chrome. Classic AI crawlers, which fetch over plain HTTP, see nothing; Googlebot, which renders in a real browser engine, gets through. Same pattern I documented for kurier.at and capital.de in the robots.txt study.
- Both fetchers blocked (2 sites): oe24 and Heise return empty bodies to both the raw fetch and the rendered fetch. Aggressive bot detection on the Cloudflare Worker IP range. Real browsers still load the sites fine, and Googlebot's own IPs are almost certainly whitelisted.
- SSL/TLS cert problem (1 site): Puls24 returned HTTP 526 on both fetchers — Cloudflare reports an invalid cert at the origin. Until that is fixed, the site is unreachable for any client that validates certs.
- Browser rendering fails, raw HTML OK (1 site): Frankfurter Rundschau returns 3,558 words via raw HTTP but the rendered-Chrome endpoint errors out with a 422. Partial data — I can confirm the raw HTML has content, but cannot verify a JS-render gap.
- Rendered Chrome blocked at edge (1 site): Blick.ch is an interesting case. Raw HTTP returns 1,890 words of real content. The rendered-Chrome fetch comes back with a 15-word "Access Denied" page from Akamai. Blick's edge filter is configured to let classic HTTP through but block Cloudflare's Browser Rendering IP range. Manual verification confirmed this is an infrastructure-level block, not a consent overlay.
These 15 sites are excluded from all country averages and percentages in this article. Only the 35 cleanly measured sites are counted. I mention the exclusion categories here because each one is itself a small finding about how these sites interact with automated crawlers.
Takeaway
There are two ways to disappear from ChatGPT, and this article covers the one nobody decided. The explicit-policy decision — "we will block GPTBot" — at least has a name, an owner, a meeting where it was discussed. The JavaScript-hiding decision does not. Seven sites ship a homepage where the actual content only exists after a JavaScript framework hydrates. That was not a GEO decision. It was a frontend team picking the default configuration of Next.js, Angular, or Vue and shipping.
For News.at the effect is total: zero content available to AI crawlers that do not render JS. For Tagesschau the effect compounds with the deliberate robots.txt block: double lockout. For Kleine Zeitung and WirtschaftsWoche it is a third of their content silently missing from the AI training feed.
The fix is not migrating off the framework. The fix is flipping the configuration to SSR-first (every framework mentioned supports it) or shipping static site generation. Spiegel and Zeit are both on Svelte and hide 1 % of text each. It is achievable.
Whether Austrian regional media, BR, Tagesschau or News.at will do this work is a different question. The first step is just knowing it is there — most frontend teams I have talked to do not check their AI-crawler visibility at all.
Check your own JS-rendering gap
The same test I ran in this study — for your domain. Free, no signup, raw HTML vs Chrome-rendered in 20 seconds.
Open the JS vs No-JS Tool →