White Paper · AEO Research

The Two-Clock Model: A Research Paper on the Perception Gap in Answer Engine Optimization

AISearch Global — Framework Research · July 2026 · 8 pages · 6 sources cited
Abstract. An AEO score measures whether a site can be read by machines. It does not measure whether AI platforms have updated what they say about the business behind that site. This paper names and examines the space between those two measurements: the perception gap. Drawing on recent research into knowledge cutoffs, temporal misalignment, retrieval freshness, and generative engine optimisation, we argue the gap is a structural feature of how large language models and retrieval systems work, not evidence that optimisation work has failed. We define the two clocks that produce it, walk through the mechanisms behind each, present client data illustrating the gap in practice, and set out the three levers that narrow it over time: citation building, external mentions, and answer-format content. We close with a working diagnostic for telling an expected lag apart from a genuine problem.

1. Introduction

A business runs an AEO program. Its structural score climbs from the teens into the nineties within weeks. Then someone asks ChatGPT, Perplexity, or Gemini about that business, and the model either says nothing or gets it wrong. The natural conclusion is that the AEO work didn't take. That conclusion is usually wrong, and it's wrong because it treats two separate questions as one.

The first question is whether a site is machine-readable: schema in place, entities defined, content structured so a crawler or retriever can parse it. The second question is whether an AI platform has actually updated what it knows and says about the business. Most AEO reporting only ever answers the first question, because it's the one that's easy to measure and moves on a timeline the practitioner controls. The second question is harder, slower, and governed by systems no agency or business can directly operate.

This paper is about the space between those two answers — what we call the perception gap. It is not a general survey of AEO tactics. It is an explanation of one specific, underexamined mechanism: why structural progress and AI recognition move at different speeds, what research into how language models actually work tells us about why that happens, and how to tell a normal lag apart from a genuine failure.

2. Why Machines Don't Update on Your Schedule

Several separate strands of research point at the same conclusion from different angles: the systems that decide what an AI platform says about a business do not, and structurally cannot, move at the speed a website can be edited.

2.1 Knowledge cutoffs are not a clean line

AI providers publish a knowledge cutoff date alongside each model release, and it is usually treated as a hard boundary. Cheng et al. (2024) show it isn't one. Their analysis of open pretraining datasets finds that a model's demonstrated knowledge of a given topic or resource often diverges from its stated cutoff, for two reasons: newer training data dumps contain more old material than their dump dates suggest, and deduplication processes that remove near-identical text can inadvertently strip out or distort recently updated pages. The practical result is that "the model was trained up to [date]" tells you far less than it appears to about what the model actually knows about a specific entity, or how current that knowledge is.

2.2 Models default to older knowledge even when newer knowledge exists

Zhao et al. (2024) tested this directly. Using 20,000 time-sensitive questions spanning 2000–2023, they found that models such as LLaMA2 — despite a pretraining cutoff in 2022 — mostly answered as though their knowledge stopped around 2019. Deliberately realigning the model to its actual cutoff year improved answer accuracy by up to 62%, but that realignment is a research intervention, not a feature switched on inside consumer products. In practice, a model's internal sense of "now" runs behind its stated cutoff by default. For a business whose most relevant activity — new positioning, new offers, recent coverage — happened in the last year, that default lag matters more than the headline cutoff date.

2.3 Facts decay at different rates, and the model doesn't sort them

Zhang and Choi (2023) introduce the idea of fact duration: some facts are stable for years, others change within months, and a model has no built-in way to tell which is which. A business's name and industry category are durable facts. Its current positioning, differentiators, and recent work are volatile facts — and volatile facts are exactly what determines whether a model recommends one business over a competitor. The category of information that matters most for AI visibility is the category most prone to staying stale in a model's memory.

2.4 Retrieval doesn't solve this on its own

Modern AI search products increasingly ground answers in retrieval — pulling from an index rather than relying purely on trained-in memory. That helps, but it isn't real-time. Current research on retrieval system design describes a persistent staleness–latency–cost tradeoff: most production retrieval pipelines refresh on a fixed cycle rather than continuously, which creates windows where the index is out of date by design, not by accident. The index behind an AI platform's answer updates on its own cadence — not the moment a business publishes something new.
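As a rough illustration of the fixed-cycle point, the sketch below assumes a hypothetical index that refreshes every `refresh_days` days, starting from day 0. The function name and the 30-day cycle are illustrative assumptions, not figures from any platform.

```python
import math

def days_until_indexed(publish_day: float, refresh_days: float) -> float:
    """Days a page waits before the next scheduled refresh picks it up.

    Assumption: refreshes happen at day 0, refresh_days, 2 * refresh_days, ...
    and a page published on a refresh day is picked up that same day.
    """
    if refresh_days <= 0:
        raise ValueError("refresh_days must be positive")
    next_refresh = math.ceil(publish_day / refresh_days) * refresh_days
    return next_refresh - publish_day

# Hypothetical 30-day cycle: the wait depends only on where in the cycle
# the page happens to be published, not on how good the page is.
waits = [days_until_indexed(day, 30) for day in (31, 45, 59)]
print(waits)  # [29, 15, 1]
```

Under these assumptions, a page published just after a refresh waits almost a full cycle, which is the "out of date by design" window described above.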

2.5 New and smaller entities carry the gap longest

Badhe, Shah, and Kathrotia (2026) describe how long-tail knowledge — information that's sparse, low-frequency, or newly formed relative to a model's training data — is disproportionately lost to mechanisms like gradient dilution and representational interference during training. A business three months into an AEO program is, from a model's point of view, exactly this kind of long-tail entity: thin, recent, under-represented relative to larger or longer-established competitors. That isn't a failure of the AEO work. It's a description of where a young or small brand structurally sits in a model's knowledge, until enough external signal accumulates to move it out of the tail.

2.6 Structural optimisation is real, and it works — on its own axis

None of this means structural work is wasted. Aggarwal et al.'s GEO paper (2024), presented at KDD, is the strongest available evidence that content-level interventions — adding citations, statistics, quotations, and clearer structure — can lift a page's visibility inside generative engine responses by up to 40% in controlled benchmark testing. That's the fast clock, empirically confirmed. What the GEO paper doesn't claim, because it isn't what it measures, is how quickly any specific production model updates its live output for any specific business after those changes are made. That's the slow clock's territory, and it's a different question.

2.7 Even "visible" isn't a fixed state

Schulte, Bleeker, and Kaufmann (2026) measured how stable AI search answers are across repeated identical queries. Cited-source overlap between separate runs of the same query was only 32–43%, and day-to-day brand overlap averaged 45–59%. Their conclusion: AI search visibility is a distribution, not a single-point outcome, and a one-off check is not a reliable read on it. This is a further reason the perception clock looks slow and uneven rather than like a clean line climbing upward: even where a business is being surfaced, that surfacing is probabilistic, not a switch that stays flipped.
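To show what an overlap figure of this kind means in practice, here is a minimal sketch that compares the sources cited across repeated runs of one query. It uses Jaccard overlap as the measure, which is an assumption of this sketch and not necessarily the metric the cited study used. The domains are hypothetical.

```python
from itertools import combinations

def jaccard(a: set[str], b: set[str]) -> float:
    """Items two runs share, as a fraction of all items either run cited."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def mean_pairwise_overlap(runs: list[set[str]]) -> float:
    """Average Jaccard overlap across every pair of runs of the same query."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical cited domains from three runs of one identical query.
runs = [
    {"a.example", "b.example", "c.example"},
    {"a.example", "c.example", "d.example"},
    {"a.example", "e.example", "f.example"},
]
print(round(mean_pairwise_overlap(runs), 2))  # 0.3
```

In this made-up example only one domain appears in every run, so a single check could easily report a source as cited or not cited depending on which run it happened to sample.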

3. The Two-Clock Model

Put together, the mechanisms above describe two independent systems running in parallel whenever AEO work is done. They don't run at the same speed, and only one of them takes direction from the business doing the work.

| Dimension | Structural clock | Perception clock |
| --- | --- | --- |
| What it is | How machine-readable the site is: schema, entity signals, content format, crawlability | What AI platforms actually know and cite about the business |
| What moves it | Direct edits to the site | Training cycles, retrieval refresh, citation accumulation |
| Who controls it | The business / its agency | The AI platforms |
| Typical speed | Days to weeks | Weeks to months |
| Supporting evidence | Aggarwal et al., 2024 (GEO) | Cheng et al. 2024; Zhao et al. 2024; Zhang & Choi 2023; Schulte et al. 2026 |

The structural clock is the one every AEO checklist measures, because it's the one that responds directly to work. Schema goes in, signals fire, the score moves — visibly, often daily. The perception clock is different in kind, not just speed. Publishing does not produce an instant update in what a model knows or says, because that update depends on cycles the business doesn't operate: when a model was last trained, when a retrieval index was last refreshed, and how much third-party citation has accumulated in the meantime.

4. The Perception Gap

The perception gap is the space between a high structural score and low, absent, or inaccurate AI recognition. It is the central concept in this paper, because it's the point where most businesses — and most agencies — misdiagnose what's actually happening.

4.1 Definition

Formally: the perception gap is the measurable difference, at a given point in time, between a business's structural AEO score and its AI perception score across the platforms it's being tracked against. It is not a single number so much as a widening-then-narrowing curve over the life of an AEO engagement.
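The definition can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed scoring method: it assumes both scores sit on the same 0–100 scale, and it averages perception across platforms, which the definition above does not specify. All platform names and readings are hypothetical.

```python
def perception_gap(structural: float, perception_by_platform: dict[str, float]) -> float:
    """Structural score minus the mean perception score across tracked platforms.

    Assumptions: both scores share a 0-100 scale, and a simple mean is an
    acceptable way to combine platforms.
    """
    scores = list(perception_by_platform.values())
    if not scores:
        raise ValueError("need at least one platform score")
    return structural - sum(scores) / len(scores)

# Hypothetical snapshots: month -> (structural score, perception per platform).
snapshots = {
    0: (20, {"platform_a": 10, "platform_b": 10, "platform_c": 10}),
    2: (90, {"platform_a": 30, "platform_b": 40, "platform_c": 50}),
    9: (90, {"platform_a": 80, "platform_b": 85, "platform_c": 90}),
}
gap_curve = {month: perception_gap(s, p) for month, (s, p) in snapshots.items()}
print(gap_curve)  # {0: 10.0, 2: 50.0, 9: 5.0}
```

The made-up series widens and then narrows, which is the curve shape the definition describes. Tracking it per month, not as a single figure, is what makes the gap interpretable.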

4.2 Why the gap is the expected state, not a failure

Every mechanism reviewed in Section 2 points in the same direction. Stated cutoffs don't match effective cutoffs (Cheng et al.). Models default to knowledge older than even their stated cutoff (Zhao et al.). The facts that matter most for AI recommendation are the ones most prone to staying stale (Zhang & Choi). Retrieval indexes refresh on a schedule, not on demand. New or small entities sit in the long tail of a model's knowledge until enough external signal pulls them out (Badhe, Shah & Kathrotia). And even where a business is being surfaced, that surfacing is a probability distribution, not a fixed state (Schulte, Bleeker & Kaufmann).

Taken together, these findings describe a perception clock that cannot move at the structural clock's speed, by construction. A wide gap in the early months of an AEO engagement is not a sign that the work failed. It is the default, predictable output of how these systems are built.

4.3 When the gap becomes a genuine signal

The gap stops being routine and starts being a real problem when it fails to narrow over an extended period despite sustained citation-building effort. Based on client implementation data (Section 5), the gap is typically widest between about six weeks and three months into an engagement, when structural work is essentially complete but AI recognition has barely moved, and it should show measurable narrowing by month six. If it hasn't narrowed at all by that point, the diagnostic questions are:

  • Is the business actually accumulating third-party citations and mentions, or only publishing on its own site? Self-published content does not move the perception clock the way external validation does.
  • Is this a category-wide pattern or a business-specific one? If competitors in the same space are equally unrecognised, the lag may simply reflect how slowly that category is covered by AI training and retrieval sources generally.
  • Is there conflicting or negative information elsewhere online that a model may be weighting instead of, or alongside, the business's own material?
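The month-six check above can be expressed as a simple rule. The sketch below is one possible reading: it compares the gap at month six with the widest gap seen up to that point. The function name and the `min_narrowing` threshold are hypothetical choices for illustration, and the rule does not check whether citation-building effort was actually sustained.

```python
def classify_gap(gap_by_month: dict[int, float], check_month: int = 6,
                 min_narrowing: float = 5.0) -> str:
    """Label a gap series as an expected lag or a signal worth investigating.

    check_month follows the month-six guideline in the text; min_narrowing
    is a hypothetical threshold, in score points, chosen for illustration.
    """
    observed = {m: g for m, g in gap_by_month.items() if m <= check_month}
    if check_month not in observed:
        return "too early to tell"
    peak = max(observed.values())
    narrowing = peak - observed[check_month]
    return "expected lag" if narrowing >= min_narrowing else "investigate"

# Hypothetical gap readings, keyed by month of the engagement.
print(classify_gap({1: 50, 2: 55, 3: 56, 6: 40}))  # expected lag
print(classify_gap({1: 50, 3: 56, 6: 56}))         # investigate
print(classify_gap({1: 50, 2: 55}))                # too early to tell
```

An "investigate" result is only a prompt to work through the three diagnostic questions, not a verdict that the program has failed.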

5. Empirical Illustration: Client Zero

AISearch Global's own implementation data offers a single working illustration of the two clocks in practice. Over a twelve-month engagement, the structural score moved from 19 to 70 within the first month and past 90 by month two, then plateaued. That is the fast clock, behaving the way Section 2.6 suggests a well-executed structural program should. The perception scores on the three AI platforms tracked (36, 30, and 41 at their low point) barely moved over the same period. It took roughly nine months for perception to reach where the structural score already stood in week two.

This is one case, not a controlled study, and Schulte, Bleeker, and Kaufmann's (2026) finding that AI visibility is a distribution rather than a single-point outcome is a reason to treat any individual timeline as illustrative rather than predictive. AEO and GEO outcomes vary by market, competitive density, and which AI platforms a business is being measured against. The value of the Client Zero data isn't the specific numbers — it's the shape of the curve: fast, early structural gains; a wide, slow-closing perception gap; and gradual convergence over roughly two to three quarters, driven by accumulating external signal rather than further site edits. See the full case study at Client Zero and the live Visibility Audit Dashboard.

6. Closing the Gap: Three Levers

The perception clock cannot be forced to run faster directly — there is no setting that makes a model re-learn a business sooner. It can only be influenced indirectly, by giving AI platforms more high-quality material to find, evaluate, and reference. Three levers do this, and each maps onto a specific mechanism from Section 2.

6.1 Citation building

Getting referenced by external sources AI platforms already trust — directories, industry publications, review platforms — gives retrieval indexes fresh, authoritative material to surface the next time they refresh. This addresses the retrieval-staleness mechanism in Section 2.4 directly: it doesn't make the index refresh faster, but it improves what's waiting to be picked up when it does.

6.2 External mentions

Independent third-party coverage is a stronger perception signal than anything a business publishes about itself. This is the most direct countermeasure to the long-tail knowledge problem in Section 2.5: a business becomes less long-tail, in a model's terms, in proportion to how much other, independent sources are saying about it. Models weight what others say about a business more heavily than what it says about itself.

6.3 Answer-format content

Content written to directly answer the specific questions a business's customers actually ask gets picked up and cited more readily than generic marketing copy. This is the practical application of the GEO findings in Section 2.6: structuring content in the ways Aggarwal et al.'s benchmark shows measurably improve inclusion in generated answers. It is also what the Citation Consistency layer of the AEO Traction Stack is built for: the layer that starts the perception clock in earnest, rather than only optimising the page itself.

7. Implications for AEO/GEO Practice

For agencies: sell and report on two workstreams, not one. A single AEO score, presented alone, sets a client up to expect a linear result the underlying systems cannot produce. Reporting both scores side by side, with the gap explained as a mechanism rather than a mystery, is what separates a structural-fix vendor from a genuine AEO practice.

For businesses: measure both clocks, expect the gap to be widest early, and treat premature abandonment as the largest single cause of wasted AEO spend. Walking away at month two, because the structural score looks finished and AI recognition doesn't yet reflect it, cuts the engagement off exactly when the citation-building work that closes the gap is meant to begin.

For measurement generally: Schulte, Bleeker, and Kaufmann's finding that cited-source overlap runs as low as 32–43% across repeated identical queries means a single perception check, at a single point in time, is not a reliable read. Any claim about where a business stands on AI recognition should be based on repeated measurement across time and across prompts, not one dashboard snapshot.
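One way to report repeated measurement is as a mention rate with an uncertainty range, not a yes/no reading. The sketch below uses a Wilson score interval, a standard interval for proportions. Treating each run as an independent sample is an assumption, and the counts are hypothetical.

```python
import math

def mention_rate_interval(mentions: int, runs: int,
                          z: float = 1.96) -> tuple[float, float, float]:
    """Observed mention rate plus a Wilson score interval.

    z = 1.96 corresponds to roughly 95% confidence. Runs are assumed to be
    independent samples, which real repeated queries may only approximate.
    """
    if runs <= 0 or not 0 <= mentions <= runs:
        raise ValueError("need runs > 0 and 0 <= mentions <= runs")
    p = mentions / runs
    denom = 1 + z**2 / runs
    centre = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return p, centre - half, centre + half

# Hypothetical: the business was mentioned in 12 of 20 runs spread across
# several days and several phrasings of the prompt.
rate, low, high = mention_rate_interval(12, 20)
print(f"{rate:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

Even with twenty runs the interval is wide, which is why a single dashboard snapshot says little about where a business actually stands.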

8. Limitations

The Client Zero data in Section 5 is a single illustrative case, not a controlled study, and the timelines it shows should not be read as a guarantee for any other business, market, or competitive set.

The research cited in Section 2 studies model behaviour at a general level. Publicly benchmarked, mechanism-level findings about any single production consumer AI product — ChatGPT, Gemini, or Perplexity specifically, at any given moment — are far scarcer, and those platforms continue to change their training and retrieval systems on their own timelines. The specific figures cited here will keep shifting; the underlying structural claim — that two independent clocks exist and move at different speeds — is what this paper argues holds regardless of which figures are current.

9. Conclusion

An AEO score and an AI's actual recognition of a business are answers to two different questions, produced by two different systems, on two different timelines. The structural clock is fast, direct, and fully within a business's control. The perception clock is slow, indirect, and governed by training cycles, retrieval refresh schedules, and citation accumulation that belong to the platforms, not the business. The space between them — the perception gap — is not a sign the work has failed. It is the normal, well-evidenced relationship between these two clocks in the early months of any AEO engagement, and it closes through sustained citation building, external mentions, and answer-format content, not through more schema.

Frequently Asked Questions

What is the two-clock model in AEO?

It describes two parallel timelines in AEO work. The structural clock measures how machine-readable a site is and moves fast, in days or weeks, under the business's direct control. The perception clock measures what AI platforms actually know and cite about the business, governed by training cycles and retrieval refresh schedules, and moves slowly, often over months, outside the business's control.

Why did our AEO score improve but AI still doesn't recognise us?

This is the expected result of the perception gap, not a failure. Structural work moves the fast clock. AI platforms only update what they know about a business on their own training and retrieval cycles — the slow clock — which lags behind even after structural work is complete.

Can the perception clock be sped up directly?

Not directly. There is no setting that makes a model re-learn a business faster. It can only be influenced indirectly, through citable content and accumulating external mentions over time.

Is a gap between AEO score and AI recognition a sign something is broken?

No. A gap between a high structural score and low AI recognition is the normal state early in AEO work. It only signals a genuine problem if it fails to narrow over an extended period — roughly six months — despite sustained citation-building effort.

Why does a single AI-visibility check sometimes contradict a check done a week earlier?

Because AI search answers are probabilistic. Research on repeated identical queries (Schulte, Bleeker & Kaufmann, 2026) found cited-source overlap of only 32–43% between runs, and day-to-day brand-mention overlap of 45–59%. A single check is a sample from a distribution, not a fixed reading — which is why perception should be tracked over repeated measurements, not one snapshot.

References

  1. Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24). arXiv:2311.09735.
  2. Badhe, S., Shah, D., & Kathrotia, N. (2026). Long-Tail Knowledge in Large Language Models: Taxonomy, Mechanisms, Interventions and Implications. arXiv:2602.16201.
  3. Cheng, J., Marone, M., Weller, O., Lawrie, D., Khashabi, D., & Van Durme, B. (2024). Dated Data: Tracing Knowledge Cutoffs in Large Language Models. arXiv:2403.12958.
  4. Schulte, J., Bleeker, M., & Kaufmann, P. (2026). Don't Measure Once: Measuring Visibility in AI Search (GEO). arXiv:2604.07585.
  5. Zhang, M. J. Q., & Choi, E. (2023). Mitigating Temporal Misalignment by Discarding Outdated Facts. Proceedings of EMNLP 2023. arXiv:2305.14824.
  6. Zhao, B., Brumbaugh, Z., Wang, Y., Hajishirzi, H., & Smith, N. A. (2024). Set the Clock: Temporal Alignment of Pretrained Language Models. Findings of the Association for Computational Linguistics: ACL 2024. arXiv:2402.16797.
  7. AISearch Global (2026). The Two-Clock Model: Why Your AEO Score and AI Perception Move at Different Speeds.
  8. AISearch Global (2026). AEO Traction Stack.
  9. AISearch Global (2026). AISearch Global: Client Zero.
