Authority
The Shortlist Index

Methodology

What we test, how we score, what we publish, and what we keep internal.

The Shortlist Index measures how often B2B SaaS companies are cited by name when AI tools answer software recommendation questions.

We test each company against a standardized set of buyer-intent questions across five AI engines: ChatGPT (OpenAI gpt-4o), Perplexity (sonar), Gemini (gemini-2.5-pro), Claude (claude-sonnet-4-6), and Google AI Overviews. Questions span category-level searches ("best CRM for a small team"), problem-led searches ("how do I reduce churn?"), and direct comparisons ("HubSpot vs Pipedrive").

Each company receives a Shortlist Score from 0 to 100. The score reflects four factors: how often the company is mentioned, how high it ranks in AI responses, how many of the five engines mention it, and how well its website content is structured for AI extraction.

We update every category weekly. Scores are based on the prior 7 days of testing. We do not accept payments to improve a ranking, do not adjust rankings based on advertising relationships, and do not allow companies to submit corrections to their scores. Scores are determined entirely by what the AI engines say — we are the measurement layer, not the influence layer.

What we publish

  • The full leaderboard for every active category (top 50 by score).
  • Per-engine scores for the top 10 in each category.
  • Week-over-week deltas (▲ gainers, ▼ losers).
  • All of the above are refreshed weekly, every Monday morning.

What stays internal

  • The exact prompt set per category.
  • The weighting formula across the four scoring factors.
  • The parsing logic that extracts brand names from AI responses.
  • The full prompt response archive (raw text from every engine, every run).

We keep these internal so the Index can't be gamed by optimizing specifically for our test set. Companies that want to improve their score must improve their general AI visibility — which is the correct outcome.

Anti-gaming policies

  1. No paid placements. Authority does not accept payment for higher rankings. If a retainer client's score improves, it's because their AI visibility actually improved, which anyone can check by running comparable buyer-intent prompts against the same engines.
  2. No score submissions. Companies can't edit their own scores. Scores reflect what AI tools say about them, period.
  3. Rate-limited testing. We never query a company's site faster than 1 request per 10 seconds, so no company can readily detect a run and temporarily boost its AI footprint to influence it.
  4. Randomized prompt order. Prompts are run in randomized sequence per engine, per week.
  5. Public WoW deltas. Any unexplained 30-point jump is visible to every other company in the category.
  6. Client scores are not excluded. Authority retainer clients appear in the Index normally. If a client's score improves, that's public proof the work moves the needle, which is a marketing advantage, not a conflict.
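
Policies 3 and 4 above are the two mechanical ones, so here is a minimal Python sketch of how they could work. It is an illustration under assumptions, not our production code: the names MinIntervalLimiter and weekly_prompt_order are hypothetical, and seeding the shuffle by engine and ISO week is one possible way to get a different but reproducible order per engine, per week.

```python
import random
import time


class MinIntervalLimiter:
    """Enforce a minimum gap between consecutive requests to one site."""

    def __init__(self, min_interval_s=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def weekly_prompt_order(prompts, engine, iso_week):
    """Return the prompts in a shuffled order specific to one engine and week.

    Assumption: the seed is derived from the engine name and ISO week string,
    which makes a given run reproducible for later review.
    """
    rng = random.Random(f"{engine}:{iso_week}")
    order = list(prompts)        # copy so the caller's list is not mutated
    rng.shuffle(order)
    return order
```

A caller would invoke limiter.wait() before each request to a given company's site, using one limiter instance per site.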

Company sourcing

For each category, we source candidate companies from G2 top 50 + Capterra top 50, deduplicated, then filtered to a $500K–$500M ARR band via Crunchbase and LinkedIn headcount cross-references. We add up to 5 manual entries per category for emerging tools with active community presence (Hacker News mentions, Reddit threads, significant Twitter/X following).

Target count per category: 20–30 companies. Below 20 isn't credible as a leaderboard. Above 35 isn't scannable.
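
The sourcing steps above (merge, deduplicate, filter by ARR band, add manual entries) can be sketched roughly as follows. This is a simplified illustration: the Candidate type, its est_arr_usd field, and the name-based deduplication are assumptions made for the example, and the sketch assumes manual entries are added after the ARR filter rather than passed through it.

```python
from dataclasses import dataclass

ARR_MIN_USD = 500_000
ARR_MAX_USD = 500_000_000
MAX_MANUAL_ENTRIES = 5


@dataclass(frozen=True)
class Candidate:
    name: str
    est_arr_usd: float  # hypothetical field: ARR estimate from cross-references


def normalize(name):
    """Lowercase and collapse whitespace so 'Acme  CRM' and 'acme crm' match."""
    return " ".join(name.lower().split())


def build_candidates(g2_top, capterra_top, manual):
    """Merge both lists, dedupe by name, keep the ARR band, then add manual entries."""
    seen = {}
    for c in list(g2_top) + list(capterra_top):
        seen.setdefault(normalize(c.name), c)  # first occurrence wins

    in_band = [c for c in seen.values()
               if ARR_MIN_USD <= c.est_arr_usd <= ARR_MAX_USD]

    kept = {normalize(c.name) for c in in_band}
    extras = []
    for c in manual:
        if len(extras) == MAX_MANUAL_ENTRIES:
            break
        if normalize(c.name) not in kept:      # skip manual duplicates
            extras.append(c)
            kept.add(normalize(c.name))
    return in_band + extras
```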

How we run the weekly job

Every Sunday at 02:00 UTC, our cron job runs the prompt suite across all 5 engines for every company in every active category. Monday morning, scores update. Anomaly checks (any 20+ point WoW change, any engine returning empty for 5+ companies) hold the publish until manually reviewed. We never partially publish — either the full category updates or it doesn't.
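
The two anomaly checks described above can be expressed as a small gate that runs before publishing. The sketch below is illustrative only: the function name and the input shapes (dictionaries keyed by company and by engine) are assumptions, and "20+ point" is read as an absolute change of 20 or more in either direction.

```python
WOW_THRESHOLD = 20           # points, week over week
EMPTY_ENGINE_THRESHOLD = 5   # companies with an empty response from one engine


def publish_hold_reasons(prev_scores, new_scores, empty_by_engine):
    """Return the reasons to hold a category; an empty list means it can publish.

    prev_scores, new_scores: dict of company -> score (0 to 100)
    empty_by_engine: dict of engine -> set of companies that got an empty response
    """
    reasons = []
    for company, new in new_scores.items():
        prev = prev_scores.get(company)
        # Companies with no prior score have no delta to check.
        if prev is not None and abs(new - prev) >= WOW_THRESHOLD:
            reasons.append(f"{company}: WoW change of {new - prev:+}")
    for engine, companies in empty_by_engine.items():
        if len(companies) >= EMPTY_ENGINE_THRESHOLD:
            reasons.append(f"{engine}: empty for {len(companies)} companies")
    return reasons
```

Because the check is per category and returns all reasons at once, a reviewer sees everything that tripped the hold, and the category either publishes in full or not at all.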

Errors and fail-modes

  • If a single engine fails for a company, we skip that engine for the week and compute the score on the remaining four. The company's record gets a "data partial" flag.
  • If two or more engines fail, we skip the company that week and carry forward the prior score.
  • If the cron itself fails, we alert via Sentry and retry. Existing scores stay in place.
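
The per-company fallback rules above amount to a three-way decision. Here is a minimal sketch of that decision, assuming a hypothetical score_fn that computes a score from whichever engine results are available; the engine keys and status strings other than "data partial" are placeholders chosen for the example.

```python
ENGINES = ("chatgpt", "perplexity", "gemini", "claude", "ai_overviews")


def resolve_weekly_score(engine_results, prior_score, score_fn):
    """Apply the engine-failure rules for one company.

    engine_results: dict of engine -> result, with None (or a missing key)
                    meaning that engine failed this week
    prior_score:    last published score, used when the week is skipped
    score_fn:       hypothetical callable scoring a dict of successful results
    Returns a (score, status) tuple.
    """
    ok = {e: engine_results[e] for e in ENGINES
          if engine_results.get(e) is not None}
    failed = len(ENGINES) - len(ok)

    if failed == 0:
        return score_fn(ok), "complete"
    if failed == 1:
        # One engine down: score on the remaining four and flag the record.
        return score_fn(ok), "data partial"
    # Two or more engines down: skip the company and keep last week's score.
    return prior_score, "carried forward"
```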

What "cited" actually means

A citation means the AI engine's response mentions the brand by name, and our parsing logic confirms that the mention refers to the brand and not an unrelated entity. Soft mentions (e.g. "tools like X, Y, and Z") count. Indirect mentions ("the leading provider in the space") don't, because a citation requires a name.
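
To make the name requirement concrete, here is a deliberately simple sketch of a by-name check. It is not our parsing logic, which stays internal. The function is_cited is hypothetical and only tests whether the brand appears as a whole token; it does not attempt the entity disambiguation described above, so a brand that is also a common word would need extra handling.

```python
import re


def is_cited(brand, response_text):
    """True when the brand name appears as a whole token, case-insensitively."""
    # Lookarounds instead of \b so brands ending in punctuation still match cleanly.
    pattern = r"(?<!\w)" + re.escape(brand) + r"(?!\w)"
    return re.search(pattern, response_text, flags=re.IGNORECASE) is not None
```

With this check, "tools like HubSpot, Pipedrive, and Close" counts as a citation for HubSpot, while "the leading provider in the space" counts for no one.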

Versioning

The methodology version is currently v1.0. We'll publish a changelog when we revise scoring weights, add engines, or change the company sourcing process. Existing weekly snapshots are preserved at the version they were generated under.

Questions about your score?

Run your free Shortlist Score for the same audit applied to your specific domain and category. It gives you the breakdown that the public Index leaderboard doesn't show.