Citations Without Clicks
A reference dossier on Answer Engine Optimization in 2026: what the category actually is, what its tools can and cannot measure, and how to decide whether to buy, build, or wait.

The category, honestly
Answer Engine Optimization, or AEO, is the practice of measuring and improving how a brand shows up inside AI-generated answers. The surfaces are no longer just Google's ten blue links. They include ChatGPT, Gemini, Perplexity, Claude, Google AI Overviews, Google AI Mode, Bing Copilot, and the rest of the systems that synthesize an answer instead of ranking pages. The unit of work moves with it. Traditional SEO optimizes pages for rankings and click-through. AEO monitors prompts, mentions, citations, source selection, sentiment, and share of voice inside answers that may never produce a click at all.
The why-now is real, but messier than vendors imply. Q2 2026 is not yet complete, so there is no clean number for the share of informational queries answered inline without a click. What exists is a stack of proxies.
Semrush tracked AI Overviews on 15.69% of all queries in November 2025, after a July peak of 24.61%, and found AI Overviews bleeding from informational into commercial, transactional, and navigational intent. Pew Research Center found that when an AI summary appeared, users clicked a traditional result in 8% of visits versus 15% without one, and clicked a link inside the summary in only 1%. Similarweb reported zero-click news searches climbing from 56% to 69% between May 2024 and May 2025. Ahrefs' December 2025 rerun of its AI Overviews study found a 58% lower click-through rate for the top-ranking page when an AI Overview was present.
Most operationally relevant: Similarweb reported AI platform visits up 28.6% year-over-year into January 2026, while AI referrals to external sites stayed flat. Attention is moving into AI interfaces faster than referral traffic is moving out of them.
The category-formation signal is also strong. HubSpot launched HubSpot AEO on April 14, 2026, at $50 per month. Profound sells Answer Engine Insights and Prompt Volumes, with prompt-volume data sourced from double opt-in consumer panels. AthenaHQ, Otterly.AI, Goodie, Bluefish, Ahrefs Brand Radar, Semrush AI Visibility Toolkit, and Adobe LLM Optimizer all ship some mix of monitoring, citation tracking, prompt-volume estimation, and recommendations.
The disagreement is not whether AI answers matter. It is whether AEO is a new discipline or a new label for overlapping SEO, digital PR, brand authority, content structure, and analytics work.
Mike King at iPullRank uses GEO, Generative Engine Optimization, and frames the target as technical infrastructure and authority signals. The original 2023 GEO research paper from Aggarwal and collaborators reported visibility gains of up to 40% in benchmark settings. Aleyda Solis treats AEO as overlapping heavily with SEO, but with a different retrieval style, different optimization targets, and different metrics. SEOFOMO's practitioner survey found the naming itself unsettled, with only 4% of respondents using AEO as their preferred label.
The honest definition: AEO is an emerging measurement and content-operations layer for AI answer surfaces. It is not separate from SEO. It is also not reducible to it. It adds new reporting targets, new uncertainty, and new buyer questions.
The rest of this dossier works from that definition. The tools section names what AEO vendors actually measure and what they cannot prove. The measurable-versus-noise section separates real signal from vendor framing. The buy/build/wait section names the threshold that a manual baseline of buyer prompts has to clear before paid tooling makes sense.
What the tools actually do
Most AEO tools do four jobs. They run or observe prompts and record whether a brand appears. They parse answers for mentions, citations, sentiment, and competitors. They estimate prompt demand or generate prompt sets. They recommend content, PR, review, or technical changes intended to lift inclusion in future answers. The hard part is that these jobs sit at very different confidence levels. Counting a brand mention in a captured answer is measurable. Estimating the universe of real prompts is partly modeled. Claiming that a recommended change caused a citation lift is usually speculative unless the vendor shows controlled tests.
| Tool | Public price | What it measures | What it does not prove |
|---|---|---|---|
| HubSpot AEO | $50/month, $45 annual | Visibility, sentiment, prompt tracking, competitor and citation analysis across ChatGPT, Gemini, Perplexity | No published statistical confidence, sample-size methodology, or repeated-run variance handling |
| HubSpot AEO Grader | Free | One-shot brand sentiment, recognition, share of voice, source quality across OpenAI, Perplexity, Gemini | Measures broad characterization more than durable citation inclusion |
| Profound | Custom; plan prompt counts public | Answer Engine Insights monitors AI responses; Prompt Volumes estimates real prompts from double opt-in panels across ChatGPT, Gemini, Claude, Perplexity | Panel data still under-samples enterprise, B2B, and niche buyer behavior |
| AthenaHQ | Self-serve from $295/month | Visibility, competitor monitoring, citation intelligence, prompt-volume estimation, content optimization across nine engines | Credit math gets expensive at scale; proprietary citation engine and volume estimates need careful causal reading |
| Otterly.AI | $29 / $189 / $489 monthly | User-defined prompts run across ChatGPT, AI Overviews, Perplexity, Copilot; AI Mode and Gemini as add-ons | Reports what happened on the tracked prompt set, not whether the prompt set represents the buyer universe |
| Goodie | Demo-only; tiers by prompt count | Explorer, Pro, and Enterprise tiers cover 100 to 500+ prompts and 3,000 to 15,000+ AI responses across up to eleven engines | Price opacity makes ROI hard to assess pre-demo; "optimization actions" count is not business impact |
| Bluefish AI | Custom, demo-based | Enterprise AI monitoring, GEO measurement, AI optimization, AI commerce | Public pages expose little methodology; April 2026 $43M Series B is signal, not validation |
| Semrush AI Visibility Toolkit | $99/month | One domain, 25 prompts, 300 daily AI Analysis queries, 1,000 daily Prompt Research queries | Low prompt limit at base; extra domains, locations, and users compound cost |
| Ahrefs Brand Radar | From $199/month; custom prompts from $50 | AI mentions, custom prompts, search demand, web visibility across AI Overviews, AI Mode, ChatGPT, Perplexity, Copilot, Gemini, Grok | Search-backed prompt database scales, but custom prompts still face AI answer variance |
| Adobe LLM Optimizer | Custom annual; minimum 1,000 prompts | Daily prompt analysis, weekly trends, citations, mentions, recommendations, optional Auto-Optimize for AEM and CDN | Too heavy for SMBs; edge-tied recommendations need controlled validation |
Citation tracking and brand monitoring
This is the most mature use case. The tool runs a defined prompt set against the engines, captures the response, and records whether the brand, URL, competitor, or domain appears. Every serious vendor in the table does some version of this. The reliable measurement is narrow: for prompt set P, engine E, geography G, and date range D, how often did the brand appear, with which citations, and near which competitors. Anything broader is a generalization the tool cannot back.
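The narrow measurement described above can be sketched in a few lines. This is a minimal illustration, not any vendor's method: the `Capture` schema, the `appearance_rate` function, and the brand names are all hypothetical, and the sketch assumes brand mentions have already been parsed out of each captured answer.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Capture:
    """One captured answer for one prompt run (hypothetical schema)."""
    prompt_id: str
    engine: str
    geo: str
    run_date: date
    brands_mentioned: frozenset


def appearance_rate(captures, brand, engine, geo, start, end):
    """Share of captured answers inside the declared frame that mention `brand`.

    Returns (rate, n), where n is the number of captures in the frame.
    rate is None when no capture falls inside the frame.
    """
    in_frame = [
        c for c in captures
        if c.engine == engine and c.geo == geo and start <= c.run_date <= end
    ]
    if not in_frame:
        return None, 0
    hits = sum(1 for c in in_frame if brand in c.brands_mentioned)
    return hits / len(in_frame), len(in_frame)


# Illustrative data only: four captures, three of them inside the frame.
captures = [
    Capture("p1", "chatgpt", "US", date(2026, 4, 1), frozenset({"Acme", "Globex"})),
    Capture("p1", "chatgpt", "US", date(2026, 4, 2), frozenset({"Globex"})),
    Capture("p2", "chatgpt", "US", date(2026, 4, 2), frozenset({"Acme"})),
    Capture("p2", "perplexity", "US", date(2026, 4, 2), frozenset({"Acme"})),
]
rate, n = appearance_rate(
    captures, "Acme", "chatgpt", "US", date(2026, 4, 1), date(2026, 4, 30)
)
print(f"Acme appeared in {rate:.0%} of {n} captured answers in this frame")
```

Returning the sample size alongside the rate is deliberate: a rate without the frame and the count behind it is the kind of generalization the tool cannot back.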
Content optimization
Less mature. HubSpot recommends pages to create or update. AthenaHQ ships a citation engine. Goodie surfaces "optimization actions." Adobe ships Auto-Optimize for eligible Adobe Experience Manager and CDN setups. Semrush adds AI checks to its site audit. The measurable part is whether the recommendation was implemented and whether later answer captures changed. The speculative part is causality. Engines change, competitors change, models change, indexes update. A before-and-after chart is not enough.
Prompt-volume estimation
The most contested product claim in the category. Profound has the strongest public methodology because its Prompt Volumes data is sourced from real, anonymized, double opt-in panel conversations refreshed weekly across four engines. AthenaHQ surfaces numerical volume estimates. Semrush ships Prompt Research. HubSpot suggests prompts from company, competitor, and industry context. These help with prioritization. They are not search-volume equivalents. The universe is less observable, less standardized, and more private.
Competitive analysis
Every serious tool now offers share-of-voice comparison. The output is only as useful as the prompt set. "Who are the best CRM tools" is weak. "What CRM should a 70-person B2B SaaS company with HubSpot marketing and Salesforce sales use" is closer to a real buying journey. This is where operators should spend the most time editing prompts before trusting any dashboard.
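Share of voice has no single standard definition, and vendors may compute it differently. One common reading, assumed here for illustration, is each brand's share of all brand mentions across the captured answers for a prompt set. The function name and brand names below are hypothetical.

```python
from collections import Counter


def share_of_voice(answers_brands):
    """Each brand's share of total brand mentions across captured answers.

    answers_brands: iterable of collections of brand names, one per answer.
    A brand counts at most once per answer. Returns {} if nothing was mentioned.
    """
    counts = Counter()
    for brands in answers_brands:
        counts.update(set(brands))  # de-duplicate within a single answer
    total = sum(counts.values())
    if total == 0:
        return {}
    return {brand: count / total for brand, count in counts.items()}


# Illustrative data only: four captured answers, one with no tracked brand.
answers = [
    {"Acme", "Globex"},
    {"Globex"},
    {"Globex", "Initech"},
    set(),
]
sov = share_of_voice(answers)
for brand, share in sorted(sov.items()):
    print(f"{brand}: {share:.0%}")
```

The number is only meaningful relative to the prompt set that produced `answers`. Swap a generic prompt for a buyer-specific one and the shares can move without anything changing in the market.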
What is measurable, and what is vendor noise
AEO measurement today is synthetic monitoring plus answer parsing. Pick prompts, pick engines, run on a schedule, capture answers, parse mentions and citations, report visibility, sentiment, share of voice, and source domains. Otterly says this directly. Profound says its prompts can be auto-generated, manually uploaded, or pulled from Prompt Volumes, and its responses come from a RAG-based monitor across ChatGPT, Perplexity, Microsoft Copilot, and AI Overviews.
That makes citation rate and mention frequency measurable, but only inside a declared test frame. A sound statement: "Your brand was mentioned in 12% of answers for this 400-prompt set, across these engines, in these regions, during these dates." A weak statement: "Your brand appears in 12% of relevant AI searches." The second requires knowing the universe of relevant prompts, the distribution of real user behavior, personalization, geography, temporal change, and model routing. Few vendors disclose enough to back it.
The biggest technical problem is non-determinism. SparkToro and Gumshoe ran 2,961 prompts across ChatGPT, Claude, and Google's AI search surfaces with repeated trials. The same brand list came back less than 1% of the time. The same list in the same order came back less than 0.1% of the time. This does not make AEO measurement useless. It makes single screenshots useless, and it makes rank-style reporting fragile. Visibility percentage across repeated runs is the credible unit. Exact answer position is not.
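If visibility across repeated runs is the unit, it helps to report it with an interval rather than a point value. The sketch below uses the Wilson score interval for a binomial proportion, which is a standard statistical tool and not something any vendor in this dossier is known to publish. It assumes each run is an independent trial, which real repeated prompts only approximate, and the 7-of-20 figure is made up for illustration.

```python
import math


def wilson_interval(hits, runs, z=1.96):
    """Wilson score interval for a visibility rate of hits/runs.

    z=1.96 corresponds to roughly 95% confidence.
    Assumes runs are independent trials, which is an approximation here.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= hits <= runs:
        raise ValueError("hits must be between 0 and runs")
    p = hits / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    # Clamp to [0, 1] to absorb floating-point error at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Illustrative: the brand appeared in 7 of 20 repeated runs of one prompt.
low, high = wilson_interval(7, 20)
print(f"visibility 35%, plausible range {low:.0%} to {high:.0%}")
```

With only 20 runs the range is wide, which is the practical point: a handful of runs cannot distinguish a real change in visibility from run-to-run noise.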
Vendor disclosures vary. Profound discloses the most on prompt-volume sourcing: panel basis, anonymization, multi-engine aggregation, weekly refresh, GDPR and CCPA language. HubSpot's free Grader discloses what data it sends to the engines and the five dimensions it scores, but its paid AEO product does not publish a sample-size model. Otterly discloses prompt-based monitoring, daily runs, and the reasons manual searches diverge from platform results: memory, history, location, personalization.
AthenaHQ, Goodie, Bluefish, Semrush, Ahrefs, and Adobe expose plan limits and feature classes, but rarely the statistical treatment of repeated runs, confidence intervals, prompt-set selection, or engine drift.
The second problem is the gap between visibility and influence. A citation in ChatGPT or AI Overviews can shape a buyer without producing a referral visit. Similarweb's flat-referrals-while-visits-grow chart is the warning. But influence on purchase is harder to measure than mention frequency. Teams need downstream signals: branded search lift, direct traffic, sales-call mentions, demo-form source notes, AI-referral sessions, assisted pipeline, and controlled content tests where possible.
The third problem is causality. If a vendor recommends a comparison page and the brand starts appearing more often two weeks later, the tool may be right. It may also be benefiting from a model update, fresh indexing, PR coverage, reviews, forum threads, competitor decay, or prompt-set changes. Citation rate is measurable with caveats. Purchase influence is measurable indirectly. Causal lift from any specific AEO action is the least measurable thing in the stack.
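One way to get closer to a controlled test is to hold out a comparable prompt cluster that the change should not affect, then compare how visibility moved in both. The sketch below is a plain difference-in-differences calculation. It is an assumption of this example rather than a method any vendor documents, it depends on the two clusters otherwise trending alike, and the counts are invented.

```python
def rate(hits, runs):
    """Visibility rate for a (hits, runs) pair."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return hits / runs


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in treated visibility minus change in control visibility.

    Each argument is a (hits, runs) tuple. The control cluster absorbs
    shifts that hit both clusters alike, such as a model update.
    """
    treated_change = rate(*treated_after) - rate(*treated_before)
    control_change = rate(*control_after) - rate(*control_before)
    return treated_change - control_change


# Illustrative counts only. Treated prompts relate to the new comparison page;
# control prompts are similar buyer prompts the page should not affect.
lift = diff_in_diff(
    treated_before=(10, 100),
    treated_after=(22, 100),
    control_before=(12, 100),
    control_after=(18, 100),
)
print(f"estimated lift attributable to the change: {lift:+.0%}")
```

In this made-up case the raw before-and-after chart would show a 12-point gain, while the control cluster suggests half of that would have happened anyway. The estimate still carries sampling noise and does not rule out PR coverage or competitor decay that touched only the treated prompts.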
Buy, build, or wait
There are three rational paths. The wrong move is buying a tool because the dashboard looks like a rank tracker. AEO dashboards are not rank trackers with new labels. They are sampling instruments for a volatile answer layer.
Buy a tool now
Buy when AI-answer presence is already a competitive question. That usually means B2B SaaS, marketplaces, ecommerce, agencies, education, finance, travel, and any consumer-adjacent category where buyers ask "best X for Y" before a demo or a purchase. The buyer profile is a marketing team with real content capacity, named competitors, budget above the low hundreds per month, and enough volume to care.
HubSpot AEO fits HubSpot-centric SMB and mid-market. Otterly or Semrush fits SEO teams that want affordable baseline monitoring. AthenaHQ, Goodie, Profound, Bluefish, Adobe, and Ahrefs Brand Radar fit teams that need scale, deeper competitive intelligence, multi-engine coverage, enterprise controls, or prompt-volume data.
The threshold condition is simple: the team has to be able to act on the findings. If no one can update content, earn citations, fix technical access, improve review surfaces, or brief sales on the gaps, the tool becomes expensive theater.
Build internal monitoring
Build when the use case is narrow, the team has technical capacity, or the vendor price exceeds the decision value. A focused B2B company can start with 100 to 300 curated buyer prompts, split by journey stage, product, competitor, and use case.
The internal system runs prompts on a schedule where terms allow, captures answers, parses brand and competitor mentions, logs citations, and stores raw outputs. It should repeat prompts often enough to estimate variance, not snapshot once. Build is also right when the team needs methodology control, like enterprise-procurement-only prompts or industry-specific compliance questions.
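The parsing step of such a system can start very simply. The sketch below assumes the raw answer text has already been captured and stored elsewhere. It separates brand mentions in the prose from cited domains, using only the standard library. The function name, the regular expression, and the sample answer are illustrative assumptions, and word-boundary matching like this will miss brand names that end in punctuation.

```python
import re
from urllib.parse import urlparse

# Deliberately simple URL pattern; stops at whitespace and common closers.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def parse_answer(text, brands):
    """Return brand mentions (from prose only) and cited domains for one answer."""
    urls = [u.rstrip(".,;:") for u in URL_RE.findall(text)]
    # Remove URLs before matching so a cited domain is not counted as a mention.
    prose = URL_RE.sub(" ", text)
    mentions = {
        b for b in brands
        if re.search(rf"\b{re.escape(b)}\b", prose, re.IGNORECASE)
    }
    domains = set()
    for u in urls:
        host = urlparse(u).netloc.lower()
        domains.add(host[4:] if host.startswith("www.") else host)
    return {"mentions": mentions, "domains": sorted(domains)}


# Illustrative answer text and brand list.
answer = (
    "Top picks: Acme CRM (https://www.acme.example/pricing) and Globex, "
    "see https://reviews.example/crm."
)
parsed = parse_answer(answer, ["Acme", "Globex", "Initech"])
print(parsed)
```

Keeping mentions and citations as separate fields, and keeping the raw answer alongside them, makes it possible to re-parse later when the matching rules improve.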
The weakness is coverage. AI Overviews, AI Mode, ChatGPT, Perplexity, Gemini, Copilot, and Claude do not expose identical APIs or identical user experiences. Google reports AI feature traffic inside the normal Search Console "Web" type, with no separate AIO breakout. Internal builds are good baselines. They rarely match specialized vendor coverage out of the box.
Wait, but run a baseline
Wait when the category is not yet valuable enough for paid tooling. That describes brands with low recognition, weak content foundations, no clear competitor set, no marketing owner, or no evidence that buyers use AI engines during discovery. Wait also when higher-return work is queued: crawlability, category pages, comparison content, reviews, case studies, knowledge-base hygiene.
Waiting is not ignoring. The minimum move is manual baseline testing: run the top 20 to 50 buyer prompts across ChatGPT, Perplexity, Gemini, and AI Overviews monthly, log whether the brand appears, record cited sources. If the brand is absent from every high-intent prompt and buyers mention AI research in calls, paid monitoring becomes easier to justify. If neither is true, the next dollar belongs in basic content and authority first.
The contested part is timing. AEO tools are early. The search shift is not fictional. Make the call on answer-visibility risk, buyer behavior, content capacity, and budget, not on vendor urgency.
Pitfalls and anti-patterns
Optimizing for citations that don't drive business value
A citation is not a conversion path. A brand can appear in low-intent answers and gain nothing. Separate awareness prompts from buying prompts. Track downstream: branded search, direct visits, AI-referral sessions, demo forms, sales-call notes, assisted pipeline. The Similarweb referrals plateau is the warning. AI influence can grow while referral traffic stays small.
Confusing AEO with SEO
The overlap is high but the target is different. Google says standard SEO best practices remain relevant for AI Overviews and AI Mode and that there are no extra technical requirements beyond being indexable and snippet-eligible. That does not mean nothing changed. Aleyda Solis names the practical differences: query fan-out, synthesis, content chunks, factual spans, citations, and mention-based metrics. Treat AEO as an extension of search and brand operations, not a replacement for SEO.
Over-investing in measurement before the category settles
Tool prices range from $29 a month to enterprise annual contracts. That is a sign of category immaturity, not just market segmentation. SEOFOMO's survey is a useful brake: practitioners report traditional SEO still drives more than 95% of SEO ROI in many contexts, while AI search traffic and revenue often sit at 5% or less.
Ignoring AEO because it feels speculative
The opposite error is assuming search traffic returns to its old shape. Pew, Ahrefs, Similarweb, and Semrush all point to a real shift in how users receive answers, even when they disagree on causality and magnitude. Waiting is fine. Blindness is not.
Buying vendor claims without methodology
Any visibility score that does not disclose prompt set, engine coverage, geography, refresh cadence, repeated-run strategy, and source-parsing rules is hard to act on. Ask: How were prompts selected? How many times are they run? Are runs personalized or neutral? Are citations, mentions, and recommendations separated? How are hallucinated citations handled? What is the variance over repeated runs? Without those answers, the score is a sales artifact.
Optimizing for one engine
ChatGPT, Gemini, Perplexity, Claude, Copilot, AI Overviews, and AI Mode behave differently. HubSpot's paid product covers three engines. AthenaHQ and Goodie list broader coverage by tier. Single-engine measurement can work for a narrow buyer base. It cannot stand in for AI-answer visibility as a category.
Treating answers as stable
SparkToro's repeated-prompt findings are the clearest warning. AI recommendations vary from run to run on the same prompt. Teams that chase exact answer order will waste time. Teams that track repeated visibility, source patterns, and prompt clusters will learn more.
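A team can check this instability on its own prompts before trusting any ordering. The sketch below compares repeated runs of a single prompt pairwise and reports how often the brand set matched, how often the exact order matched, and how often each brand appeared at all. Pairwise comparison is an assumption made for this example and may not match how SparkToro and Gumshoe computed their figures. The brand lists are invented.

```python
from collections import Counter
from itertools import combinations


def run_stability(runs):
    """Summarize agreement across repeated runs of the same prompt.

    runs: list of ordered brand lists, one per run.
    Returns (same_set_rate, same_order_rate, per_brand_visibility).
    """
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs")
    same_set = sum(1 for a, b in pairs if set(a) == set(b)) / len(pairs)
    same_order = sum(1 for a, b in pairs if list(a) == list(b)) / len(pairs)
    counts = Counter(brand for run in runs for brand in set(run))
    visibility = {brand: c / len(runs) for brand, c in counts.items()}
    return same_set, same_order, visibility


# Illustrative data only: four runs of one prompt.
runs = [
    ["Acme", "Globex", "Initech"],
    ["Globex", "Acme", "Initech"],
    ["Acme", "Globex", "Umbrella"],
    ["Acme", "Globex", "Initech"],
]
same_set, same_order, visibility = run_stability(runs)
print(f"same brand set: {same_set:.0%}, same order: {same_order:.0%}")
print(visibility)
```

Even in this small example the exact order rarely repeats while per-brand visibility stays readable, which is why repeated visibility is the more useful thing to track.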
What to validate before you spend
- List the 20 to 50 prompts your buyers would actually ask before purchase, split by problem, category, comparison, pricing, integration, risk, and competitor.
- Run them across ChatGPT, Gemini, Perplexity, and AI Overviews where available. Log brand, competitors, cited sources, sentiment accuracy, and whether the answer would help or hurt the buying journey.
- Validate that AI-answer visibility matters to your funnel: AI referrals, branded search change, sales-call notes, demo-form source fields, customer interviews.
- Confirm capacity. AEO data only matters if the team can update content, improve citations, earn third-party mentions, repair technical access, and measure downstream impact.
- Set the budget threshold before the demo. If the expected value of better AI visibility cannot beat internal monitoring and basic content fixes, wait.
The answer layer is a sampling problem
The answer layer is not a new ranking surface. It is a sampling problem dressed as a dashboard.
If your tool cannot tell you what its prompt set represents, how often it ran, and how much the answer drifted between runs, it is selling you confidence, not measurement.
Methodology
This dossier reads every public product, pricing, and documentation page shipped by HubSpot, Profound, AthenaHQ, Otterly.AI, Goodie, Bluefish, Semrush, Ahrefs, and Adobe, and grades them against their own disclosed methodology. Vendors that publish a sample-size model, a repeated-run protocol, or a panel basis are credited. Vendors that ship a "visibility score" without one are not. The why-now is anchored in five independent datasets: Pew Research Center on click behavior under AI summaries, Ahrefs on click-through decay under AI Overviews, Similarweb on zero-click news searches and AI referrals, Semrush on AI Overview prevalence by query intent, and SparkToro and Gumshoe on repeated-prompt variance. The measurement critique is built on those five plus standard sampling logic, not vendor marketing. The category framing draws on the most rigorous practitioner work in the space: Mike King at iPullRank, Aleyda Solis, the SEOFOMO practitioner survey, and the 2023 GEO paper by Aggarwal and collaborators that introduced the academic frame. Where they disagree, the disagreements are named. Public prices are current as observed on April 29, 2026. No vendor demos, sandbox trials, or private references were used, and none were needed. Every claim above is sourced.
Sources
- HubSpot, Introducing HubSpot AEO (April 14, 2026)
- HubSpot AEO product page
- HubSpot, AEO Grader vs. Otterly.ai
- Profound, Answer Engine Insights
- Profound, Prompt Volumes
- AthenaHQ, Plans & Pricing
- Otterly.AI home
- Otterly.AI pricing
- Goodie pricing
- Bluefish AI
- Semrush, AI Visibility Toolkit
- Ahrefs Brand Radar
- Ahrefs Plans & Pricing
- Adobe LLM Optimizer pricing
- Adobe Experience League, LLM Optimizer docs
- Similarweb, Answer Engine Optimization: The Complete 2026 Guide
- Similarweb, Generative AI Statistics for 2026
- Pew Research Center, Google users are less likely to click on links when an AI summary appears (July 22, 2025)
- Ahrefs, Update: AI Overviews Reduce Clicks by 58% (February 4, 2026)
- Semrush, AI Overviews' Impact on Search in 2025
- Google Search Central, AI features and your website
- SparkToro, AIs are highly inconsistent when recommending brands or products (January 27, 2026)
- iPullRank, The Fall of the Blue Links and the Rise of GEO
- Aleyda Solis, The AI Search Content Optimization Checklist
- SEOFOMO, The State of AI Search Optimization, 2025 Edition
- Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan, Deshpande, GEO: Generative Engine Optimization (arXiv 2023, ACM SIGKDD 2024)
Tools Mentioned
- HubSpot AEO: Brand monitoring across ChatGPT, Gemini, and Perplexity at $50/month
- HubSpot AEO Grader: Free brand-perception check across OpenAI, Perplexity, and Gemini
- Profound: Answer Engine Insights and Prompt Volumes with double opt-in panel data
- AthenaHQ: Multi-engine visibility, citation intelligence, and content optimization, from $295/month
- Otterly.AI: Affordable AI search monitoring across ChatGPT, AI Overviews, Perplexity, and Copilot, from $29/month
- Goodie: Closed-loop AEO platform with optimization actions and AI commerce visibility
- Bluefish AI: Enterprise AI marketing suite for agentic commerce and brand monitoring
- Semrush AI Visibility Toolkit: SEO-team AEO monitoring at $99/month
- Ahrefs Brand Radar: AI mention monitoring and custom prompt tracking, from $199/month
- Adobe LLM Optimizer: Enterprise prompt-based AI visibility with optional Auto-Optimize for AEM and CDN


