Generative AI has become a first-stop surface for shopping research. When a consumer asks “what's the best anti-age cream for sensitive skin”, three brands capture the consideration set — not forty.
How AI recommends French skincare.
A sector analysis of how ChatGPT, Gemini and Claude currently answer 201 shopping queries on anti-age skincare. 1,809 AI responses, 47 brands observed.
The largest independent measurement of AI skincare recommendations in France.
1,809
AI RESPONSES ANALYZED
Claude · Gemini · ChatGPT, 3 runs each.
201
SHOPPING QUERIES RUN
190 generic · 11 targeted (age + price).
3
FRONTIER ENGINES TESTED
claude-sonnet-4-5 · gemini-2.5-flash · gpt-4o
3
RUNS PER QUERY
Each (prompt, engine) fired 3× for stability.
47
SKINCARE BRANDS TRACKED
Dictionary of active French-market brands.
5,780
BRAND CITATIONS EXTRACTED
3,823 tracked · 1,957 untracked.
Every answer read, every citation mapped, every brand attributed — in French, on French consumer intent.
A new discovery layer has emerged — and most brands have not noticed.
Unlike Google, AI engines answer in prose. Rank and citation context matter more than CTR or backlinks. The levers that moved SEO do not move AEO.
We query in French, on brands French consumers actually buy. US-trained benchmarks under-represent how European shoppers phrase their questions.
47 anti-age skincare brands in the dictionary. 201 shopping queries (190 generic, 11 targeted). Three engines per prompt, three runs each. The sample is large enough to detect positioning, small enough to be interpretable.
1,809
AI responses analyzed for this report.
47 brands cited out of 47 tracked.
3,823
Brand citations extracted across the three engines.
Plus 1,957 untracked mentions (see §11).
47 brands observed. 42.3% of answers mention at least one of the top 3.
42.3%
of responses cite at least 1 of the top-3 brands.
57.1%
of responses cite at least 1 of the top-10 brands.
29.3%
of responses cite the leader.
31
brands share 21.1% of citation volume.
0.589
inequality across the brand distribution.
The category does not behave like a 47-horse race. It behaves like a three-horse race with a long, very long tail. Two metrics tell the same story: top 3 capture 31.8% of total citation volume, and appear in 42.3% of all AI responses. Everything else is statistical noise from the leader's perspective — and everything else is the entire strategic opportunity from a challenger's perspective.
But the race depends on the question. On 190 generic queries — "meilleure crème anti-âge", "routine efficace" — the top 3 brands capture 31.3% of citation volume. On the 11 targeted queries where the consumer specifies age or budget (n = 99 responses — indicative trend), the same top-3 share tightens to 36.9% — the leaders are even harder to dislodge once the query gets specific. Generic visibility ≠ specific visibility.
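The two concentration readings (top-k share of citation volume and the inequality index) can be reproduced from per-brand citation counts. A minimal sketch in Python, assuming the inequality figure is a Gini-style coefficient over citation totals; the example counts are illustrative, not the report's data:

```python
def gini(counts):
    """Gini coefficient over citation counts: 0 = perfectly even, 1 = one brand takes everything."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = (2 * sum(i * x_i)) / (n * total) - (n + 1) / n, with x sorted ascending and i starting at 1
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


def top_k_share(counts, k=3):
    """Share of total citation volume captured by the k most-cited brands."""
    xs = sorted(counts, reverse=True)
    return sum(xs[:k]) / sum(xs)


# Illustrative distribution for 47 brands (three leaders, a long tail), not the report's real counts
counts = [520, 410, 290] + [60] * 10 + [20] * 34
print(round(gini(counts), 3), round(top_k_share(counts, 3), 3))
```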
Three engines, three leaderboards.
Only 1 brand in top 3 across all 3 engines.
The same question. Three different answers. The AI market is already fragmenting — each engine has its own consideration set, and a brand's visibility on one does not translate to the next.
The questions behind the answers.
Our corpus of 201 shopping queries reflects real consumer intent observed in French skincare search patterns. Here are 20 representative queries, grouped by intent type; the concentration each intent type produces is quantified in the next section.
meilleure crème anti-âge
quelle crème hydratante choisir
routine skincare anti-âge efficace
produits anti-rides qui marchent
crème anti-âge moins de 40€
soin visage bio certifié
crème sans parabène ni sulfate
anti-âge pour peaux sensibles
comment traiter taches brunes
solution rides front profondes
crème pour cernes marqués
soin anti-âge ménopause
quelle marque française choisir
différence dermatologique vs bio
crème pharmacie ou parfumerie
marque heritage vs nouvelle génération
crème pour femme 45 ans
soin premiers signes vieillissement
routine après 50 ans
anti-âge peau mature sèche
20 queries shown. 181 others measured. Intent patterns vary — and so do the brands recommended.
Concentration varies by what consumers ask.
| Intent type | N responses | Top-3 response coverage | Avg brand density |
|---|---|---|---|
| Pure intent | 369 | 50.9% | 2.92 |
| Constrained | 585 | 56.2% | 2.48 |
| Problem–solution | 360 | 2.8% | 0.14 |
| Comparative | 270 | 38.9% | 2.11 |
| Persona-based | 225 | 59.1% | 2.99 |
| Age-segmented tag | 45· | 77.8% | 3.64 |
| Price-tiered tag | 54· | 53.7% | 3.09 |
· = n < 100 responses, indicative trend. "Tag" rows are pseudo-types derived from prompt tags and overlap with the five intent types above.
Top-3 response coverage ranges from 2.8% (most fragmented: problem–solution) to 77.8% (most concentrated: age-segmented). The more specific the question, the tighter the leaders' grip — and when the consumer asks about a real skin problem, the market fragments into ingredient-first answers that bypass brand names entirely.
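Both table columns reduce to simple per-intent aggregations over the extracted responses. A minimal sketch, assuming one flat record per response with an intent label and the list of tracked brands it cites; field names and example records are placeholders:

```python
from collections import defaultdict

# One record per AI response; intent labels and brand lists are placeholders, not the report's data
responses = [
    {"intent": "persona_based", "brands": ["Brand A", "Brand C"]},
    {"intent": "problem_solution", "brands": []},
    {"intent": "pure_intent", "brands": ["Brand B", "Brand A", "Brand F"]},
]
TOP3 = {"Brand A", "Brand B", "Brand C"}  # the three most-cited brands overall

stats = defaultdict(lambda: {"n": 0, "hits": 0, "mentions": 0})
for r in responses:
    s = stats[r["intent"]]
    s["n"] += 1
    s["hits"] += bool(TOP3 & set(r["brands"]))  # cites at least one top-3 brand?
    s["mentions"] += len(r["brands"])           # tracked brand mentions in the response

for intent, s in sorted(stats.items()):
    coverage = 100 * s["hits"] / s["n"]  # "Top-3 response coverage" column
    density = s["mentions"] / s["n"]     # "Avg brand density" column
    print(f"{intent:17s} n={s['n']:3d} coverage={coverage:5.1f}% density={density:.2f}")
```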
88.1% of prompts. No consensus on the #1 pick.
88.1%
of prompts where the 3 engines disagree on the #1 brand.
11.9%
of prompts where all 3 engines align on the same #1.
0.45
avg top-5 set similarity across the 3 engine pairs (0 = no overlap, 1 = identical).
Optimizing for one engine can mean being invisible on the other two. ChatGPT's and Gemini's top-5 sets for the same prompt overlap with a Jaccard similarity of 0.25: the two platforms converge on roughly a quarter of the combined consideration set.
Run-to-run instability compounds this divergence: only 30.3% of prompts produce the same top-3 brand set across 3 independent runs on the same engine (n = 165 prompts). The "AI recommends" answer is not a single signal — it's a distribution.
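Both consistency measures are plain set computations. A minimal sketch, assuming per-prompt top-5 brand lists per engine and per-run top-3 lists; the brand names are placeholders:

```python
def jaccard(a, b):
    """Overlap of two brand sets: |intersection| / |union| (0 = disjoint, 1 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Cross-engine divergence on one prompt (placeholder brands)
chatgpt_top5 = ["Brand A", "Brand B", "Brand C", "Brand D", "Brand E"]
gemini_top5 = ["Brand A", "Brand F", "Brand G", "Brand H", "Brand I"]
print(round(jaccard(chatgpt_top5, gemini_top5), 2))  # 0.11: one shared brand out of nine

# Run-to-run stability: a prompt counts as stable if all three runs return the same top-3 set
def same_top3(runs):
    return len({frozenset(r[:3]) for r in runs}) == 1

runs = [
    ["Brand A", "Brand B", "Brand C"],
    ["Brand B", "Brand A", "Brand C"],  # order differs, set is identical
    ["Brand A", "Brand B", "Brand D"],  # a different brand slips into the top 3
]
print(same_top3(runs))  # False
```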
The price AI recommends.
- Lowest median: Gemini · 25 €
- Highest median: Claude · 40 €
- Claude vs Gemini: Claude recommends 1.6× higher on the median
- Median recommended: ChatGPT 30 € · Gemini 25 € · Claude 40 €
- Why median, not mean: Luxury prompts pull Claude's arithmetic mean to 182 €; a single €800 outlier distorts the headline. The median is the price the typical answer suggests.
Three engines. Three relationships with price. The consumer's budget literally changes which brand wins. (n = 39 price-bearing responses; indicative trend.)
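The median-versus-mean choice is easy to verify on a toy price list; the prices below are illustrative, not the report's data:

```python
from statistics import mean, median

# Illustrative recommended prices (€) from one engine: mostly mid-range, one luxury outlier
prices = [25, 28, 30, 32, 35, 38, 40, 45, 800]

print(round(mean(prices), 1))  # 119.2: the single 800 € recommendation dominates the mean
print(median(prices))          # 35: the price a typical answer actually suggests
```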
Who does AI believe? — A measurement question.
97.7%
of Gemini answers rely on live search sources.
12.2
URLs per grounded response.
~11.9
avg source citations per Gemini response, across the sample.
1,281 distinct publishers ground 97.7% of Gemini's answers.
1,281
distinct domains appear in the grounding URLs we resolved.
8.9%
of citation weight goes to the 3 most-cited publishers.
36.1%
of citation weight comes from .fr / .be / .ch domains.
Publishers are public-facing media properties — naming them serves industry transparency. Brands are anonymized as measurement subjects.
SOURCE: resolved 7,155/7,183 Gemini grounding redirectors (99.6% resolution rate). Weighting: each (response, domain) pair = 1, deduplicated within response.
Gemini proxies every grounded URL through vertexaisearch.cloud.google.com; those redirectors expire within hours, blocking publisher-level attribution on archived runs. An empirical authority map, resolving each redirect to its final publisher in real time, ships on the AEOBrand roadmap. Meanwhile, the aggregate grounding metrics above are the best available proxies for the editorial surface area AI engines rely on.
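A minimal sketch of the resolve-and-weight step described above, assuming a list of grounding redirect URLs per response. Following the redirector with a plain HTTP client is an assumption about tooling, not a description of the production pipeline:

```python
from collections import Counter
from urllib.parse import urlparse

import requests

def final_domain(redirect_url, timeout=10):
    """Follow a grounding redirector to its landing page and return the publisher domain."""
    resp = requests.get(redirect_url, allow_redirects=True, timeout=timeout)
    return urlparse(resp.url).netloc.removeprefix("www.")

def domain_weights(responses):
    """One unit of weight per (response, domain) pair: a domain cited 5 times in one answer counts once."""
    weights = Counter()
    for grounding_urls in responses:  # each item: the redirect URLs attached to one Gemini response
        domains = {final_domain(u) for u in grounding_urls}  # de-duplicate within the response
        weights.update(domains)
    return weights

# weights.most_common(3) then gives the three most-cited publishers and their combined weight
```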
Being cited on ChatGPT does not mean being cited on Gemini.
±3.0 pp
between ChatGPT and Gemini share of voice (SoV) for the same brand.
88.1%
of prompts where the three engines do not agree on the #1 brand.
31.5%
of the leader's presence is single-engine — fragile to a single model update.
Physical market share — the ranking inside a pharmacy aisle — does not predict AI visibility. Training corpora and grounding mixes are the upstream determinants.
12.8%
of brands appear on only 1 of 3 engines.
74.5%
of brands appear on all 3 engines simultaneously.
±0.84
avg std-dev of a brand's rank across engines where it is cited.
Same brands, different rankings. 74.5% of brands appear on all 3 engines — but only 11.9% are agreed upon as #1 (see §07). The engines share a vocabulary but not a hierarchy.
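The presence and rank-dispersion figures reduce to per-brand aggregates across the three engines. A minimal sketch, with placeholder rank values rather than the report's data:

```python
from statistics import pstdev

# Average rank of each brand on each engine (None = never cited there); placeholder values
ranks = {
    "Brand A": {"chatgpt": 1.4, "gemini": 2.1, "claude": 3.0},
    "Brand K": {"chatgpt": None, "gemini": 5.2, "claude": None},
}

for brand, by_engine in ranks.items():
    cited = [r for r in by_engine.values() if r is not None]
    engines_present = len(cited)                       # cited on 1, 2 or 3 engines
    spread = pstdev(cited) if len(cited) > 1 else 0.0  # rank dispersion across engines where cited
    print(f"{brand}: cited on {engines_present}/3 engines, rank std-dev {spread:.2f}")
```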
AI mentions brands that our dictionary does not know.
40.2%
of AI responses contain at least one brand our dictionary doesn't cover.
1.08
untracked brand mentions per response.
1,957
citations in this sample that we could not map to any tracked brand.
- 1. Unknown #1 · 139 mentions
- 2. Unknown #2 · 122 mentions
- 3. Unknown #3 · 84 mentions
- 4. Unknown #4 · 66 mentions
- 5. Unknown #5 · 58 mentions
Not all are hallucinations in the strict sense — some are real brands that escaped our dictionary (dictionary expansion is the fix). But a non-trivial share are products invented on the fly by the engine, especially on comparative or persona-based queries. Product misattribution — where a real brand is credited with a product it does not make — is a second-order risk that manual sampling keeps surfacing. Both will be tracked explicitly in the Premium tier.
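Separating tracked from untracked mentions comes down to normalized alias lookup against the brand dictionary; dictionary expansion means adding aliases to that lookup. A minimal sketch, with placeholder aliases rather than the real dictionary:

```python
import re
import unicodedata

def normalize(name):
    """Lowercase, strip accents and punctuation so "Crème Éclat" and "creme eclat" match."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()

# Tracked brands mapped to their known aliases (placeholder entries, not the real dictionary)
BRAND_ALIASES = {
    "Brand A": ["brand a", "brand a paris"],
    "Brand B": ["brand b", "laboratoires brand b"],
}
LOOKUP = {normalize(alias): brand for brand, aliases in BRAND_ALIASES.items() for alias in aliases}

def attribute(mention):
    """Map an extracted brand mention to a tracked brand, or flag it as untracked."""
    return LOOKUP.get(normalize(mention), "UNTRACKED")
```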
Five patterns that separate cited brands from the rest.
A / CONSISTENCY
Consistency beats peak visibility.
A brand cited 12 times out of 60 responses with stable phrasing outperforms a brand cited 18 times with erratic positioning. AI engines reward editorial coherence — the same signal, repeated.
B / AUTHORITY
Health-press authority beats paid content.
The editorial surface AI grounds on is dominated by long-established verticals. A single mention on a trusted health publisher moves more weight than a dozen blog placements. Budget flows into the wrong places.
C / LANGUAGE
English content hurts French visibility.
French-language queries draw on French-language training data first. Brands that translate their US content instead of producing original French editorial pay a visibility tax that compounds every quarter.
D / CONSENSUS
Cross-engine consensus is the new brand equity signal.
Being cited on one engine is luck. Being cited on two is positioning. Being cited on all three on the same prompt is moat. Only 12% of prompts show that kind of agreement today — the highest-value target in AEO is moving from 1/3 to 3/3.
E / ABSENCE
Not being cited is worse than being cited last.
Rank 7 is a conversion problem. Absence is an identity problem. In this run, none of the 47 tracked brands went uncited across 1,809 responses; any brand that never surfaces faces a structural visibility issue before any optimization can help.
- Category: Skincare anti-age (France)
- Corpus: 201 shopping queries · 47 brands tracked · 47 observed in this run
- Prompt design: 190 generic queries (pure intent, constrained, problem–solution, comparative, persona-based) + 11 targeted queries (5 age, 6 price). Authored in French by native speakers; no machine translation from US prompts.
- Engines: Claude (claude-sonnet-4-5) · Gemini (gemini-2.5-flash) · ChatGPT (gpt-4o)
- Responses analyzed: 1,809 successful, error-free calls (201 × 3 engines × 3 runs)
- Citations extracted: 5,780 total · 3,823 tracked · 1,957 untracked (66.1% tracking rate)
- Avg response length: ~337 words overall. Per engine: Claude ~178, Gemini ~559, ChatGPT ~273.
- Avg brand density: 3.2 brand mentions per response overall (tracked + untracked). Tracked only, per engine: Claude 2.34, Gemini 2.17, ChatGPT 1.83 brands / response.
- Extraction: Claude tool_use, JSON schema-enforced. One structured extraction per response. Retries with exponential backoff on transient provider errors (a minimal retry sketch follows this list).
- Batch cost: $5.57 generation + ~$27.76 extraction = ~$33.33 total API spend on this corpus.
- Concurrency: 5 per engine at generation. Extraction throttled to 2 concurrent to stay under the Claude Sonnet 4.5 rate cap (50 RPM · 8,000 output tokens/min).
- Anonymization: No individual brand is named in this document. All brands are mapped to stable aliases "Brand A"…"Brand Z", "Brand AA"… so rank positions can be followed without disclosing identities.
- Conflicts: Zero commercial relationship with any tracked brand, engine provider or publisher.
- Sentiment: Not covered in this edition; sentiment analysis ships in the Premium tier.
- Engine response length (median / max): Claude median 178 / max 231 words · Gemini median 544 / max 1,932 words · ChatGPT median 268 / max 453 words
- Listing behavior: Share of responses listing ≥3 tracked brands: Claude 42.8% · Gemini 32.3% · ChatGPT 33.0%. The "discuss vs list" split reveals engine conversational style.
- Leader-first rate: Share of first-position citations that go to the engine's top brand on this corpus: Claude 14.1% · Gemini 9.7% · ChatGPT 22.9%.
- Batch start / end: 21 April 2026, 08:29 UTC → 21 April 2026, 08:57 UTC
- Run duration: Batch generation 27m09s. Extraction completed separately, throttled by the Sonnet 4.5 output rate cap. Total wall clock (generation + extraction) approx. 3h 30m.
- Token consumption: 45,684 input tokens · 953,022 output tokens. Per engine (output): Claude 222,881 · Gemini 516,733 · ChatGPT 213,408.
- Rate-limit retries: The extraction pass hit rate-limit responses on the Claude Sonnet 4.5 output cap (8,000 output tokens/min · 50 RPM), forcing a drop from concurrency 8 to 2 mid-run. All rate-limited calls were auto-retried with exponential backoff; final success rate 1,809/1,809.
- Reproducibility: Run identifier 81fc7648-28a2-42a0-9827-5ae60b9b1f08. Raw responses, structured extractions and the publisher-resolution cache are retained. Full dataset available on request.
- Field date: 21 April 2026 (single-day batch)
- Refresh: Quarterly.