A Mental Model for How LLMs Actually Choose (For Marketers)
Visibility Architecture: How It All Fits Together
Sorilbran Stone · Five-Talent Strategy House · 2026
What This Is
Between February and April 2026, three interconnected concepts landed in the same place at the same time for me. They weren’t planned as a stack. They emerged from the work — from multiple fetch tool failures, a conversation with a stubborn AI, a disambiguation campaign, testing a new service in my consultancy, and a content creation session that went 80 minutes longer than expected.
What they add up to is a complete operating system for AI visibility strategy. A mental model for LLM behavior optimization – essentially, how machines find, evaluate, and choose who to recommend, and what you do about each layer.
This is a mental model that I’m still thinking through and iteratively testing. But it’s formed enough to help you if you’re a practitioner struggling to figure out where AI visibility fits.
Read This First: The Map
Two frameworks. Six concepts. One system. Here’s the full picture before you read anything else. This document is organized around two frameworks that describe the same system from two different angles.
The behavioral framework — what LLMs do:
Recall — the machine answers from encoded knowledge, without retrieving anything.
Retrieval — the machine goes out to find a current answer in real time.
Inference — the machine decides it already knows enough and constructs an answer without retrieving.
The positional framework — where your brand sits in the machine’s understanding:
Legibility — the machine can identify you accurately. It knows who you are.
Eligibility — the machine considers you credible enough to include in an answer.
Recommendability — the machine chooses you when constructing an answer.
The behaviors tell you what the machine is doing. The tiers tell you what it thinks of you while it’s doing it. A Legibility gap means recall fails. An Eligibility gap means inference gets it wrong. A Recommendability gap means retrieval finds you but can’t use you.
The retrieval failures section is where both systems break down in the real world. The behaviors section is the terrain map. The tiers section is the destination map. Read in sequence for the full picture. Jump to the section where you’re most stuck if you need triage.
This page is dense. But it contains links to more anecdotal, narrative-driven breakouts of each topic. Let’s get into it.
The AEO Gap We Sense, But Misidentify
Why current AEO strategy is incomplete — and what it’s missing.
It’s likely we haven’t yet learned to give proper weight to the fundamental differences in how large language models behave compared to search engines — to say nothing of the generative nature of the tech itself. The same query, same browser, same user will turn up different answers across different systems, every time.
I would venture a bet that many AEO strategies are fundamentally SEO tactics — only now the outcomes are monitored in AI visibility tracking tools, or in AI visibility tracking features inside legacy SEO tools.
By now, most marketers would say that SEO delivers better results and greater ROI than paid ads — between $5 and $10 for every $1 spent, according to FirstPageSage.com. But the return on AEO? Largely undocumented. We haven’t yet reached the point where we have a significant body of historical data from which to confidently build repeatable processes.
We’re figuring it out.
The terminology alone tells the story. Fewer than a third of practitioners used consistent terminology throughout 2025 — the same discipline gets called GEO, AEO, GSO, LLMO, and AI visibility depending on who’s writing. The tools changed, the dashboards changed, the vocabulary changed. The underlying strategic logic, for most practitioners, stayed the same. Visibility Engineer – that’s what I started calling myself after it became clear I was no longer a content marketing coordinator.
The big problem is that search engines and large language models don’t behave the same way. While we understand how a search engine indexes and ranks, an LLM does three structurally different things — and it may do any one of them at any given moment.
In the absence of historical data and transparency, the only stable anchor is behavior. Engineer visibility around behaviors — both your ideal customer’s and the LLMs they use to find you.
There are three.
The Three LLM Behaviors
Understanding these three behaviors is the difference between optimizing for one thing and optimizing for the whole system.
Recall — what the machine already knows
The machine answers from what it already knows. No search, no retrieval, no real-time pull. This is parametric knowledge — information encoded within the weights and biases of the model during training. It’s what shapes how a machine understands the world, recognizes entities, and forms default answers. When a machine knows who you are without having to look you up, that’s recall doing its job.
Recall is the long game. Training data is a lagging indicator — what got encoded reflects a version of the world that existed at some point before the model was released. A 2024 Johns Hopkins study found that effective knowledge cutoffs frequently lag stated cutoffs by years, not months, because training datasets pull heavily from cached and deduplicated older versions of web content. The cadence gap between what the machine encoded and what’s actually true right now isn’t a fixed window. It’s industry-dependent. In stable industries, a six to twelve month lag is manageable. In fast-moving industries — technology, finance, anything disrupted in the last eighteen months — training data may be encoding a world that no longer exists.
For a technical breakdown of how training data, retrieved data, and live web data function as distinct layers in AI systems, this piece from Firecrawl is worth the read — it’s the developer-facing mirror of what this document maps for brand and visibility strategy.
This makes recall a behavior you manage through probability, not precision. The strategic play: show up consistently in the kinds of sources training data pulls from, so that when the next training cycle runs, there’s a reasonable chance your ideas get encoded. You can’t control what gets in. You can increase the odds.
Retrieval — what the machine goes out to find
The machine goes out to find a current answer in real time. This is the layer most AEO strategy is already teaching you to optimize for. Be crawlable. Be findable. Be structured in a way machines can parse.
But retrieval carries more strategic weight than most practitioners are currently giving it — specifically in fast-moving industries. When training data is encoding a lagging version of reality, retrieval becomes the primary accuracy mechanism. It’s the correction layer. It’s how the machine updates what it thinks it knows.
Retrieval is also the layer most responsible for personalized answers. When someone asks an AI to recommend something and the answer feels tailored to their specific context, that’s retrieval doing its job. That’s essential to understand when you’re optimizing to show up consistently in “best of” and “recommend me” queries. Those queries are about who has the most relevant signal for that specific user, at that specific moment, in that specific model.
Worth noting: ChatGPT mentions brands 3.2x more often than it provides clickable citations, making brand recognition — not just content volume — the primary driver of AI visibility. That’s the generative in Generative AI. The answer isn’t static. It’s constructed fresh every time.
Inference — what the machine constructs when it decides it already knows
The machine decides it already knows enough and constructs an answer without retrieving anything. No error message. No flag. Just an answer — accurate or not — assembled from what the machine is already holding. This is the most dangerous failure mode because it’s the most invisible. The machine isn’t broken. It made a judgment call.
And inference is also a diagnostic. What the machine constructs when it decides not to retrieve tells you exactly where your stack is weakest. Wrong identity inference → Legibility gap. Wrong credibility inference → Eligibility gap. Wrong capability inference → Recommendability gap. The mirror shows you where the leak is.
A note on industry resistance.
Not all machines respond equally to retrieval signals across all industries. LLMs apply more scrutiny — and resist being influenced more — in high-stakes categories: health, finance, legal, safety. The barrier to eligibility is structurally higher in these spaces. A practitioner needs to account for two variables simultaneously: how fast the industry is moving — which determines how much retrieval has to compensate for lagging training data — and how resistant the machine is to being influenced in that space — which determines how high the eligibility bar actually is and what kinds of corroboration will move the needle.
The Three Retrieval Failures
Where behavior problems become visible — and what to do about each one.
Two of these announce themselves. One doesn’t.
Retrieval Failure 1: Page too dynamic to crawl → Fix the page.
The AI tried to read your page and couldn’t parse it. The architecture is too complex — too much JavaScript, too much dynamic content assembling itself in real time. The crawler sees the skeleton of the page, not the finished version.
I’ve seen this many times with brands that paid a web developer who’s good at designing websites but doesn’t know enough about search marketing to realize the site they’ve built can’t generate organic traffic for the company. That means your messaging, your services, and your proof points get seen only when you tell people to go take a look. The data will consistently show you’re top-heavy with branded searches — disproportionately reliant on being in the room, on outbound, and on paid ads to drum up traffic, because the site remains largely invisible to the machine.
The fix is structural: simpler page architecture, static rendering, less visual complexity. A page that looks great in a browser but means nothing to a crawler is not a visible page.
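A quick way to spot-check this failure is to compare how much of a page’s raw HTML survives as plain visible text. This is a minimal sketch, not a production crawler: the `crawler_visible_ratio` function and its threshold are illustrative assumptions, and a real audit would also render the page with a headless browser and diff the two versions.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def crawler_visible_ratio(raw_html: str) -> float:
    """Rough share of the page's bytes that survive as plain text.
    A very low ratio suggests the content is assembled client-side,
    so a non-rendering crawler sees mostly skeleton."""
    parser = TextExtractor()
    parser.feed(raw_html)
    text = " ".join(parser.chunks)
    return len(text) / max(len(raw_html), 1)
```

Feed it the response body from a plain HTTP fetch (no JavaScript execution). A near-zero ratio on a page that looks rich in the browser is the “skeleton, not the finished version” problem made measurable.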
Retrieval Failure 2: Site blocking bots → Fix the mirror.
The AI tried to reach your page and the site turned it away — deliberately. Major publications do this. Some platforms do it by default. When this happens, the AI doesn’t stop looking. It mirrors: it assembles an answer from whatever other sources on the web have covered, cited, or summarized the same content. Press mentions, third-party citations, partner pages, external profiles — those become the answer.
If you haven’t invested the time and effort to build corroborating signals for your brand — that all-important secondary layer — and what’s known about you is thin, outdated, or wrong, that’s what someone gets when the machine returns an answer. The AI doesn’t stop to correct. It mirrors what’s there, then confidently delivers it as a response — even if it’s completely wrong.
Fixing the mirror means building and maintaining the secondary layer intentionally — so that when machines can’t access you as the primary source of truth, what they reach for is still accurate.
Retrieval Failure 3: Machine already thinks it knows → Fix the signal.
Man, this one annoys the heck out of me. Because it doesn’t announce itself. The AI doesn’t throw an error — it just answers. Confidently. From inference rather than retrieval. It looked at the URL, the context of the query, what it already knew about the topic — and decided it probably had enough information to answer without actually going to your page. Sometimes it’s right. Often it isn’t. I’d credit this behavior as the source of many experiences that leave users feeling like the machine is hallucinating.
The fix here isn’t a page fix. I initially thought that had to be it, but it’s not. And the fix isn’t a prompt to force the machine to comply — your ICP doesn’t search that way. The fix is the signal. And “fix” is such a misleading word because you’re not going to fix it — not completely. Too many moving pieces. But you can plan for it by integrating inference into your visibility strategy. The machine’s confidence level is based on what it thinks it already knows — which is shaped by how much accurate, corroborated information about you exists across the web. Build enough of that signal in the right places, and the machine is more likely to check before assuming.
The Three Visibility Tiers: Legibility, Eligibility, Recommendability
The destination map. Where you’re trying to land — and what happens when you try to skip a tier.
Each tier has a distinct definition, a distinct set of failure modes, and a distinct set of signals that move you forward. The order is non-negotiable. You cannot skip a tier. Attempting to optimize for Recommendability before Legibility is solved wastes resources and produces results that look like strategy but perform like noise.
Tier 1: Legibility — Can the machine identify you accurately without visiting your website?
Legibility is the degree to which an AI system can identify, parse, and connect the information that constitutes a person or business as a coherent entity. A legible entity is one the machine can describe accurately, consistently, and without inference.
This is entity infrastructure. My bread and butter — where I live, and why I formalized the Minimum Viable Knowledge Graph as a repeatable process. Your name, your role, your area of expertise — consistent across every platform, every profile, every place your name appears online. Schema markup that tells machines how to categorize you. A canonical bio on a domain you own that everything else points back to and agrees with.
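The schema-markup piece of that infrastructure can be as small as a JSON-LD Person block on the canonical bio page. A minimal sketch: the names and URLs below are hypothetical placeholders, and a real implementation would include whichever Schema.org properties actually apply to the entity.

```python
import json

def person_schema(name: str, job_title: str, url: str, same_as: list[str]) -> str:
    """Emit minimal JSON-LD Person markup — the canonical bio's
    machine-readable mirror. The sameAs links tie every external
    profile back to one coherent entity."""
    data = {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "url": url,
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2)

# Hypothetical example entity, for illustration only.
markup = person_schema(
    "Jane Doe",
    "AI Visibility Engineer",
    "https://example.com/about",
    ["https://www.linkedin.com/in/janedoe"],
)
```

Drop the output into a `<script type="application/ld+json">` tag on the page everything else points back to — the point is that this one block and every external profile describe the same entity in the same terms.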
When multiple sources say the same thing about you, machines build that picture with confidence. When sources conflict, machines hedge. Hedging language — “she claims,” “according to their website,” “details may vary” — is the machine telling anyone who asks: I’m not sure who this person is right now. Legibility is the first brick. Nothing else holds without it.
Legibility primarily shapes Recall. What the machine has encoded about you determines what it defaults to when it doesn’t retrieve. Remove a major authority signal and the machine doesn’t fail at retrieval — it reverts to whatever version of you it encoded before that signal existed. That’s a Legibility problem manifesting as a Recall failure.
The machine isn’t making this up. Content with consistent entity information across websites, social platforms, and third-party sites is 28–40% more likely to be referenced by AI systems.
Failure mode: You build for Eligibility and Recommendability, generate content, earn mentions, optimize for snippets — and the machine attributes all of it to someone else, or to a fragmented version of you that doesn’t match current reality. The work doesn’t compound. It leaks.
Tier 2: Eligibility — Does the machine consider you credible enough to include in an answer?
Eligibility is the degree to which an AI system assesses an entity as credible, authoritative, and trustworthy on a given topic. An eligible entity is one the machine determines belongs in the answer set for a specific query — not just one that can be identified, but one that has earned the right to be included.
A machine can identify you perfectly and still not include you. Legibility tells the machine who you are. Eligibility tells it whether you belong in the answer. This is the proof layer — the signals that tell a machine your expertise is real, documented, and verified by sources other than yourself. Third-party press mentions. Guest articles on credible publications. Podcast appearances. Documented client outcomes with specific, attributable results. Consistent publishing on the same topic cluster over time — because one article is a claim, but twelve articles over eighteen months is a pattern the machine reads as expertise.
Eligibility is built through corroboration. And corroboration strategy has to account for who the client is and what signals they realistically have access to. Traditional authority signals — VC funding, Crunchbase listings, research budgets, mainstream press coverage — are structurally inaccessible to many of the founders who most need visibility strategy. That’s not a personal deficit. It’s a structural gap. For these clients, compensating signals aren’t a workaround — they’re the strategy. Notion Marketplace listings. Amazon author pages. Local press. Industry association citations. Podcast networks. Community platforms. These are the authority nodes available to founders operating outside the traditional credentialing systems, and they do real work in the machine’s eligibility calculus when they’re built deliberately and consistently.
Eligibility influences Inference. When the machine decides it already knows enough to answer without retrieving, the secondary signal layer is what it reaches for. Strong corroboration means accurate inference. Thin corroboration means the mirror is wrong.
The data backs this up. Research consistently shows that 89% of AI-cited links originate from earned media — not brand-owned channels. AI systems don’t primarily learn about you from your own website. They learn from third-party coverage, analyst mentions, industry roundups, and editorial references in publications they already trust. Which means the secondary layer isn’t supplementary. It’s load-bearing.
Failure mode: The machine knows who you are but hedges — “according to their website,” “she claims,” “the company states.” Unconfirmed claims don’t get recommended. They get footnoted.
Tier 3: Recommendability — When the machine is constructing an answer, does it choose you?
Recommendability is the degree to which an AI system selects a specific entity to include in a generated answer, as distinct from merely recognizing the entity as credible. A recommendable entity is one whose content is structured, specific, and citable enough that the machine can extract and use it to answer a precise question — and attribute it with confidence rather than with hedging language.
Eligibility earns you a seat at the table. Recommendability determines whether you get called on. This is content architecture — specifically, whether your content is built in forms that AI systems can lift, use, and attribute without modification.
Definitional snippets: named concepts defined in two clean sentences that the machine can pull for direct-answer queries. Answer-first structure: leading with the answer, then supporting it, because content that buries the answer in paragraph four gets passed over for content that leads with it. Named proprietary frameworks: a named concept with a definition makes the author the definitional source — the machine cites them by name when the concept appears in a query.
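Answer-first structure is checkable. This is an illustrative heuristic under loose assumptions — a naive sentence split and a crude definitional pattern — not an established extractability metric; treat it as a spot check on drafts, nothing more.

```python
def is_answer_first(text: str, max_sentences: int = 2) -> bool:
    """Crude spot check: does a definitional pattern ('X is ...',
    'X means ...') appear within the opening sentences? Content that
    defines its concept up front is easier for a machine to lift
    for direct-answer queries than content that buries the answer."""
    # Naive sentence split; adequate for a draft check, not production NLP.
    sentences = [s.strip() for s in text.replace("?", ".").split(".") if s.strip()]
    lead = sentences[:max_sentences]
    return any((" is " in s) or (" means " in s) for s in lead)
```

Run it over the opening of each piece in a topic cluster: a cluster where most openings fail the check is a cluster written for scrollers, not extractors.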
This is the layer most AI visibility advice skips entirely. It assumes that if you are visible and credible, you will be recommended. That’s not true. You can be legible, eligible, and still not recommended — because your content was written for human readers who scroll and scan, not for machines that extract and cite.
Retrieval and Legibility together inform Recommendability. Both have to be working.
Failure mode: The branded search trap. Strong performance on branded queries, thin or absent performance on unbranded category queries. The machine knows who you are. It considers you credible. But when it’s constructing an answer to a question your brand should be answering, it reaches for someone whose content is structured in a form it can actually use.
How the Behaviors and Tiers Connect
This is where the two frameworks become one system.
Legibility shapes what the machine recalls. The entity structure you build — the consistency, the schema, the canonical signals — is what gets encoded and recalled by default. A weak Legibility layer means recall fails before any other behavior even activates.
Retrieval shapes how what’s recalled gets interpreted. When the machine goes out to find current information, what it finds either confirms or complicates what it already knows. Strong retrieval signals reinforce accurate recall. Weak retrieval signals leave the machine relying on whatever it encoded — which, depending on the cadence gap, may be significantly out of date.
Those two together inform Recommendability. Whether the machine chooses to include you in a generated answer depends on whether it knows you accurately and whether it found content it could actually use. Both inputs have to be working.
Inference is the pressure test for the whole stack. It doesn’t map cleanly to one tier because it’s cross-cutting — activated by weaknesses anywhere in the chain. What the machine constructs when it decides not to retrieve is a real-time diagnostic of which tier is the weakest link. Inference output isn’t just a failure to manage. It’s information.
The Diagnostic: Where Are You in the Stack?
Identify the bottleneck before you build anything. Optimizing for the wrong tier wastes resources and produces results that don’t compound.
Three tools give a practitioner a complete picture:
MVKG health scoring — proactive. The Minimum Viable Knowledge Graph scorecard maps the entity structure across its five nodes and surfaces where the architecture is thin. A weak entity structure is the best predictor of inference failure — because the machine can only recall and infer accurately from what’s been built for it to find. Run this before strategy begins.
Inference output analysis — reactive. Ask the machine questions about the brand without web search enabled. What it constructs from memory alone tells you what’s encoded, what’s missing, and what it’s getting wrong. Wrong identity recall → Legibility gap. Hedged credibility signals → Eligibility gap. Misrepresented capabilities → Recommendability gap.
Hedge signals mapping — confirmatory. Track the language AI systems use when they respond to queries about the brand. Hedging language — she claims, according to their website, may vary by source — is the machine’s confidence score made visible. Where hedging appears tells you which tier the machine doesn’t yet trust. Consistent hedging on identity signals → Legibility work remains. Consistent hedging on expertise claims → Eligibility work remains. Absence of hedging with weak category presence → Recommendability is the bottleneck.
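Hedge-signal mapping lends itself to simple automation. A minimal sketch: the phrase lists below are illustrative starting points drawn from the examples above, not a validated taxonomy, and Recommendability is deliberately absent because it is diagnosed by the absence of hedging plus absence from unbranded answers, not by a phrase.

```python
# Hedge phrases bucketed by the tier they most often implicate.
# Illustrative starting lists, not a validated taxonomy.
HEDGE_MAP = {
    "legibility": ["may refer to", "possibly the same", "details may vary"],
    "eligibility": ["she claims", "he claims", "they claim",
                    "according to their website", "the company states"],
}

def classify_hedges(ai_response: str) -> dict[str, list[str]]:
    """Scan an AI answer for hedging language and bucket each hit
    by the visibility tier it points at."""
    text = ai_response.lower()
    hits = {}
    for tier, phrases in HEDGE_MAP.items():
        found = [p for p in phrases if p in text]
        if found:
            hits[tier] = found
    return hits
```

Running this over a log of query responses about the brand turns “which tier doesn’t the machine trust yet” from an impression into a tally.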
Blind Spot Accounting
Every machine has a knowledge cutoff. Every industry has a velocity. Where those two things diverge is where strategies fail.
A practitioner building visibility strategy needs to estimate two things before the work begins: how far the machine’s knowledge lags behind current reality for this specific industry, and how resistant the machine is to being corrected through the retrieval layer in this specific space.
For stable industries, training data and current reality are close enough that recall is a reliable base. For fast-moving industries, the cadence gap may be severe enough that retrieval has to do the heavy lifting — not just for discoverability, but for accuracy. The machine may be confidently recommending a competitor that pivoted, a tool that was acquired, a regulation that changed, a company that no longer exists.
Blind spot accounting isn’t a fully formed methodology yet. But it’s a real diagnostic step — probing the machine’s knowledge cutoff for the relevant topic cluster, estimating the lag, and building retrieval signals specifically to compensate for what the machine doesn’t know it doesn’t know. The goal isn’t to make the machine omniscient. It’s to make sure that when the machine reaches for information about this brand in this industry, what it finds is current enough to be useful and accurate enough to be trusted.
A visibility strategy built without accounting for the machine’s blind spots is a strategy built on an incomplete map.
The Through-Line
Three retrieval failures tell you what’s breaking. Three LLM behaviors tell you what you’re optimizing for. Three visibility tiers tell you where you’re going.
Each is a different altitude of the same system. The fetch tool piece is ground level — field diagnostics. The behaviors piece is the terrain map — understanding the environment you’re navigating. The tiers piece is the destination map — knowing what winning looks like before you start.
They’re designed to be read in sequence. They interlink. And they build on the one piece that started all of it — the canonical bio — which is the first brick every other layer rests on.
sorilbran.com · Five-Talent Strategy House · 2026
About the Author
Sorilbran Stone
AI Visibility Engineer and founder of Five-Talent Strategy House in Detroit. She helps founders and marketing teams build the infrastructure that gets them found — and accurately represented — in AI-generated answers.
