The Quellan Method: How AI Brand Visibility Is Measured

The Problem Without a Method

Every CMO asks the same question in some form: when someone asks ChatGPT about my category, am I in the answer?

Nobody has a shared way to answer it. The result is that the conversation runs on anecdote. Someone at the agency runs one prompt, shows a screenshot, everyone nods. A competitor runs a different prompt, gets a different answer, takes a different screenshot, draws the opposite conclusion. Without a method, you can argue both sides of the same question forever.

This document is the method we use at Quellan. It exists so that when we say a brand is visible or invisible in AI answers, the claim is inspectable. It is the protocol behind every Quellan analysis. We are putting it in the open because a method that cannot be examined is not one.

AI answers are not search results. Every answer is a sample from a distribution.

For the same query, search results come back largely the same way every time: ten blue links, sometimes with a featured snippet on top. AI answers are generated in the moment. Ask the same question twice and you get two variations. Ask it through Claude and through ChatGPT and you get two different frames. Ask it with the word "best" versus "recommended" and you get two different shortlists.

That means two things. First, looking once and screenshotting it proves nothing. Second, looking a thousand times without structure proves little more. To say anything reliable about how a brand surfaces in AI answers, you need a protocol: a repeated, defined, comparable procedure. Otherwise you are collecting vibes.

Four Dimensions

Each one is a different way of being visible, or invisible, in an AI answer. A brand can be strong in one and absent in another.

Presence (weight 30%). Does the brand appear in the answer at all? The lowest bar, and the one most brands assume they clear.

Ranking (weight 25%). When it appears, where it appears. Summary, comparison table, or a six-hundred-word footnote.

Framing (weight 25%). How the brand is described. Present tense or past. Cultural or institutional. The part SEO has no equivalent for.

Reasoning depth (weight 20%). When asked to explain, what the model says. Structured understanding or lookup entry.

Nike is present in almost every answer about athletic footwear, but framed in past tense. Oatly has weaker presence in some prompts, but framed with cultural depth when it does appear. These are different visibility problems and require different responses. Conflating them into a single "are we visible" metric is what most quick audits get wrong.

Visibility is not a single thing. It is four things, and they move independently.

The Protocol

Five prompts. Five models. Three runs each. Seventy-five observations per brand. From those, the four dimensions are scored.

Prompts → Models → Runs → Observations → Score

The prompts. Five prompts per brand, designed to probe different points in the customer journey. The set catches both the moment someone already knows the brand and the moment they do not.

01 · Direct category query

What are the best running shoes?

02 · Brand-specific query

Tell me about Nike.

03 · Comparison query

Compare Nike to Adidas, New Balance, and Hoka.

04 · Recommendation query

I'm looking to buy running shoes, what should I get?

05 · Open-ended discovery query

What brands are doing interesting work in athletic footwear?

The models. Five models, current as of the publication date of each analysis. As of April 2026 that is ChatGPT, Perplexity, Gemini, Claude, and Copilot. The set changes as the model landscape changes. Each model is queried through its free-tier web interface, not through paid API endpoints, because that is what the majority of users actually experience when they ask an AI system a question. Each model is queried without personalization, in a fresh session, with no context carrying across prompts.

The runs. Each prompt is run three times per model to smooth sampling noise. Results are aggregated. We look at what the model says on average, not on its best day.
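A minimal sketch of the observation grid implied by the protocol, assuming each observation can be identified by a (prompt type, model, run) triple. The prompt types and model names come from the text above. The data layout and the name `observation_grid` are illustrative assumptions, not part of the published method.

```python
from itertools import product

# Prompt types and models as listed in the text; the structure is an assumption.
PROMPT_TYPES = [
    "direct category",
    "brand-specific",
    "comparison",
    "recommendation",
    "open-ended discovery",
]
MODELS = ["ChatGPT", "Perplexity", "Gemini", "Claude", "Copilot"]
RUNS_PER_PROMPT = 3


def observation_grid():
    """Return one (prompt_type, model, run) tuple per planned observation."""
    runs = range(1, RUNS_PER_PROMPT + 1)
    return list(product(PROMPT_TYPES, MODELS, runs))


grid = observation_grid()
print(len(grid))  # 75
```

Laying the grid out first makes it easy to check that no cell is skipped before any scoring starts.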

What we do not do. We do not query models through their API with system prompts designed to extract favourable results. We do not use jailbreaks, role-play instructions, or leading phrasing. The goal is to replicate what a normal user gets when they ask an AI system a normal question. If the method is not reproducible by someone with an account and five minutes, it is not measuring what we claim it measures.

The Quellan Discovery Score

The four dimensions roll up into a single score from 0 to 100. We call it the Quellan Discovery Score.

Weighted: presence 30%, ranking 25%, framing 25%, reasoning depth 20%. Presence is weighted highest because you cannot be visible if you are absent. Reasoning depth is weighted lowest because it is the dimension most subject to interpretation. The weighting is debatable, which is fine. The weighting is published, which is the point.
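The roll-up can be sketched as a weighted sum. The weights are the published ones. Everything else here is an assumption for illustration: the text does not say how each dimension is scored internally, so this sketch assumes each dimension arrives as a sub-score between 0 and 1, and the example values are hypothetical.

```python
# Published weights; they sum to 1.0.
WEIGHTS = {
    "presence": 0.30,
    "ranking": 0.25,
    "framing": 0.25,
    "reasoning_depth": 0.20,
}


def discovery_score(dimension_scores):
    """Combine four sub-scores (each assumed to lie in [0, 1]) into a 0-100 score."""
    if set(dimension_scores) != set(WEIGHTS):
        raise ValueError("expected exactly the four dimensions")
    for name, value in dimension_scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return 100.0 * sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)


# Hypothetical sub-scores, not taken from any Quellan analysis.
example = {"presence": 0.9, "ranking": 0.6, "framing": 0.4, "reasoning_depth": 0.5}
print(round(discovery_score(example), 1))  # 62.0
```

In this hypothetical case a brand with strong presence still lands at 62, because weak framing and ranking pull the total down.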

The score is not a leaderboard against the entire web. It is benchmarked against the brand's actual category. Nike's score is compared to athletic footwear, not to the whole S&P 500. Oatly against plant-based beverages. LVMH against luxury conglomerates. Cross-category comparisons are apples-to-oranges and we do not pretend otherwise.

A number without the analysis behind it is a marketing claim. A number with the analysis behind it is a finding.

Scope

Worth saying out loud, because methods that do not name their own limits are hiding something.

In scope

Brand visibility in top-tier LLM chat interfaces
Presence, ranking, framing, reasoning depth
Comparative position within category
English-language queries, for now

Out of scope

Revenue impact prediction
Voice interfaces and smart speakers
Brand-owned AI surfaces
Year-over-year comparison without adjustment

The method does not predict revenue impact. We measure visibility in AI answers. We do not claim, on present evidence, that a ten-point score improvement produces a particular increase in purchase. That link is the next piece of research. We will publish it when the data supports a claim.

The method does not stay static. Models change, prompts that worked in April do not work in October, new models enter the top tier. Every Quellan analysis is dated, and we re-benchmark the prompt set twice a year. A score from March 2026 is not directly comparable to a score from March 2027 without adjustment, and we flag that in every report.

Open Method

Three reasons we are publishing this openly.

01 · Inspectable

A method that cannot be inspected is not a method.

It is a trade secret dressed as rigour. We are trying to move the category forward, and a category does not move forward on trade secrets.

02 · Shared vocabulary

Citation is the asset.

If "Quellan Discovery Score" becomes the way CMOs, journalists, and competitors talk about AI visibility, Quellan wins. We would sooner have the method adopted widely than the IP protected narrowly.

03 · Critiquable

Version 1.1 will be better than 1.0.

If you think the prompt set is wrong, write us. If the weighting is off, make the case. If a dimension is missing, propose it. The method is versioned for this reason.

How to cite

Analyzed using The Quellan Method v1.0 (April 2026), Wolfgang.
quellan.io/method

That is all. No licensing, no permission required, no gatekeeping. Use the framework. Run the method. Come to your own conclusions. If you run a Quellan-style analysis on a brand we have not covered yet, send us the findings. We will read them.

The Five Metrics

The four dimensions describe what is being measured. The five metrics are the specific readings we publish from them. Every Reading cites at least one. Each sits downstream of the protocol above, so anyone with an account on the five systems can reproduce the number.

01 · Representation Gap

The distance between what the brand built and what AI retrieves.

How we read it. One brand-specific query, one comparison query, one discovery query. Three runs across the five systems, forty-five observations. For each observation, the descriptor the model returns is compared to the brand's own current positioning. A match scores 1.0, a partial overlap 0.5, a mismatch or absence 0.0. Representation Gap is one minus the mean score, reported as a decimal between 0.00 and 1.00.

Worked example, Oatly. The brand's current thesis is a functional-drinks and dairy-replacement argument. AI returns "oat milk" in forty-two of forty-five observations. Score 0.07. Representation Gap = 0.93. Present, mis-framed, at the altitude that counts.
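The arithmetic behind the reading can be sketched as follows. The scoring values (1.0, 0.5, 0.0) come from the text. The label names and the function name are illustrative, and the Oatly input assumes that the three observations not returning "oat milk" were full matches, which the text implies but does not state.

```python
# Score per observation, as described in the text.
MATCH_SCORES = {"match": 1.0, "partial": 0.5, "mismatch": 0.0, "absent": 0.0}


def representation_gap(labels):
    """Return one minus the mean match score, rounded to two decimals."""
    if not labels:
        raise ValueError("need at least one observation")
    mean_score = sum(MATCH_SCORES[label] for label in labels) / len(labels)
    return round(1.0 - mean_score, 2)


# Assumption: 42 mismatches ("oat milk") and 3 full matches.
oatly = ["mismatch"] * 42 + ["match"] * 3
print(representation_gap(oatly))  # 0.93
```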

02 · Signal Lag

The weeks between a cultural move and its presence in AI answers.

How we read it. One event-linked query ("what is the latest with [brand]?") and one status query ("who is the current [role] at [brand]?"). Run weekly from the date of the event until at least three of five models reflect the change on two of three runs. Signal Lag is the time from the event to the first week in which that threshold is met, reported in weeks. If twelve weeks pass without the threshold, the reading is logged as indefinite.

Worked example, Loewe. Jonathan Anderson's Loewe exit was announced 17 March 2025, and he was confirmed at Dior Men in the same window. As of April 2026, five of five models still describe Loewe as under Anderson. Fifty-six weeks, threshold not met. Signal Lag = indefinite pending refresh.
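The threshold rule can be sketched as a weekly check. The thresholds (three of five models, two of three runs, twelve-week cutoff) come from the text. The input format, a per-week mapping from model name to the number of runs that reflected the change, is an assumption, and the example data is hypothetical.

```python
def signal_lag_weeks(weekly_hits, min_models=3, min_runs=2, cutoff_weeks=12):
    """Return the first week in which the threshold is met, or None.

    weekly_hits[i] maps model name -> runs (out of 3) that reflected the
    change in week i + 1. None means the threshold was not met within the
    weeks supplied, up to the cutoff; after twelve full weeks that is the
    "indefinite" case.
    """
    for week, hits in enumerate(weekly_hits[:cutoff_weeks], start=1):
        models_updated = sum(1 for runs in hits.values() if runs >= min_runs)
        if models_updated >= min_models:
            return week
    return None


# Hypothetical weekly data for illustration only.
MODELS = ["ChatGPT", "Perplexity", "Gemini", "Claude", "Copilot"]
no_change = {model: 0 for model in MODELS}
week3 = {"ChatGPT": 2, "Perplexity": 3, "Gemini": 2, "Claude": 1, "Copilot": 0}

print(signal_lag_weeks([no_change, no_change, week3]))  # 3
print(signal_lag_weeks([no_change] * 12))  # None
```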

03 · Preference Gap

The distance between being mentioned and being recommended.

How we read it. One mention query ("what are the major [category] brands?") and one recommendation query ("I want to buy [category product], what should I get?"). Three runs per query across the five systems, thirty observations, fifteen per query. Presence Rate is the share of mention-query observations where the brand appears. Preference Rate is the share of recommendation-query observations where the brand lands in the top three. Preference Gap is Presence Rate minus Preference Rate, in percentage points.

Worked example, Nike. Presence Rate 91 percent, named in nearly every athletic-footwear query. Preference Rate 28 percent, top-three in roughly one in four. Preference Gap = 63pp. Seen, not chosen.
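A minimal sketch of the calculation, assuming each observation has been reduced to a yes/no flag: "brand appears" for the mention query, "brand is in the top three" for the recommendation query. The function names and the sample flags are hypothetical. With the Nike figures above, the same subtraction gives 91 minus 28, or 63 percentage points.

```python
def rate(flags):
    """Share of observations flagged True, as a percentage."""
    if not flags:
        raise ValueError("need at least one observation")
    return 100.0 * sum(flags) / len(flags)


def preference_gap(mention_flags, top_three_flags):
    """Presence Rate minus Preference Rate, in percentage points."""
    return rate(mention_flags) - rate(top_three_flags)


# Hypothetical data: 15 observations per query (3 runs x 5 systems).
mentioned = [True] * 14 + [False] * 1
top_three = [True] * 4 + [False] * 11
print(round(preference_gap(mentioned, top_three), 1))  # 66.7
```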

04 · Niche Compression

The degree to which a category collapses to one winner in AI answers.

How we read it. One category-leadership query, rephrased five ways ("best," "top," "most respected," "leading," "iconic"), run across the five systems. Twenty-five observations. Compression is the share of observations in which the most frequently first-named brand is named first. 1.0 is full compression, one winner every time. 0.2 is fully fragmented.

Worked example, niche perfume. Le Labo named first in twenty of twenty-five observations, Aesop three, Diptyque two. Compression = 0.80 to Le Labo. The category has a consensus answer before the reader asks.
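The worked example can be reproduced with a simple count, assuming each observation has been reduced to the name of the first brand mentioned. The function name is illustrative. The counts are the ones given in the example.

```python
from collections import Counter


def niche_compression(first_named):
    """Return (leading brand, share of observations where it is named first)."""
    if not first_named:
        raise ValueError("need at least one observation")
    brand, count = Counter(first_named).most_common(1)[0]
    return brand, count / len(first_named)


# First-named brand per observation, using the counts from the example.
observations = ["Le Labo"] * 20 + ["Aesop"] * 3 + ["Diptyque"] * 2
print(niche_compression(observations))  # ('Le Labo', 0.8)
```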

05 · Lead Time

The days between a Quellan reading and the first trade-press citation in the same frame.

How we read it. Not prompt-based. Each reading is timestamped on publish. We then track the first trade-press piece that covers the same move in the same frame over the following ninety days. Lead Time is the days from our reading to that citation. If trade press beats us to the frame, the reading is disqualified from the metric. We will publish a baseline once ten readings have closed.

Note. Lead Time is the only metric in the family that a reader cannot reproduce directly, because it requires the Quellan publish log. Each measurement is timestamped and the cited trade pieces are dated, so the individual reading is inspectable even when the full protocol is not.
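The date arithmetic can be sketched as follows. The ninety-day window and the disqualification rule come from the text. The function name, the choice to return None for every non-qualifying case, and the dates in the example are assumptions for illustration.

```python
from datetime import date


def lead_time_days(reading_published, first_citation, window_days=90):
    """Days from the reading to the first trade-press citation, or None.

    None covers three cases: no citation found, trade press published
    first (disqualified), or the citation fell outside the window.
    """
    if first_citation is None:
        return None
    delta = (first_citation - reading_published).days
    if delta < 0 or delta > window_days:
        return None
    return delta


# Hypothetical dates, not taken from the Quellan publish log.
print(lead_time_days(date(2026, 4, 1), date(2026, 4, 22)))  # 21
print(lead_time_days(date(2026, 4, 1), date(2026, 3, 25)))  # None
```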

Method inheritance

All five metrics inherit the substrate of The Quellan Method v1.0: ChatGPT, Perplexity, Gemini, Claude, and Copilot, three runs per prompt per model, fresh sessions, no personalization.

Verification Protocol

No reading enters drafting before its top-line claims have been independently verified against a primary source. This rule sits ahead of the rewrite step, not after it.

The minimum verification set for any reading is five items. The date of the event, documented in a primary source. The location, named in a primary source. The name of the concept, appearing in a primary source in the form we cite. The count, meaning any figure on the page, sourced or explicitly softened. The roster, meaning every named person, product, or show, checked against a primary-source confirmation that the specific item exists.

A primary source is the brand newsroom, brand press release, earnings filing, first-person interview, or official announcement from the subject. A journalist is cited by name when the scoop is theirs. Aggregators are not cited, because aggregators are where the lead is found, not where the fact is established.

Verified before drafting

Date of the event
Location of the event
Name of the concept
Counts, figures, tenures
Named people, products, shows

Ruled out

Plausible specifics without a source
News pegs on events we cannot confirm
Invented dates, attendance, roster
Aggregator citations in place of primary sources

Each verified reading carries a one-page note listing every claim and its primary source. If a claim cannot be verified, it is either removed or the reading is published shorter. Nothing is invented to fill a gap in the prose.

The measurement is only as clean as the facts under it.
