How a Think Tank Improved Its AI Visibility for Policy Reports
A non-partisan policy institute produced high-quality, peer-reviewed analysis but found its research omitted from AI-generated syntheses and literature summaries. This is a composite, illustrative scenario built to demonstrate the generative engine optimization (GEO) method, not a real client engagement.
Updated June 9, 2026 • Hypothetical scenario
Profile details
The organization profile at a glance
Industry: Think Tanks & Public Policy Research
Organization Size: Mid-Sized
Team Size: Solo Communications Manager (part-time web support)
Difficulty: Moderate
Estimated Timeline: Variable
The situation
What was the problem?
The organization produced high-quality, peer-reviewed public policy analysis but found its research omitted from AI-generated synthesis and literature summaries.
When journalists and policymakers used conversational search engines to summarize current policy debates, the think tank's foundational insights were routinely credited to larger competitors or missed entirely.
The challenge
Public policy research consumption has shifted rapidly from traditional keyword searches to multi-criteria, analytical conversational queries. Instead of searching for "urban housing policy report 2026," target audiences—such as legislative aides, journalists, and academic researchers—now ask AI engines to synthesize competing viewpoints on complex topics.
To surface in these answers, content must be optimized for an AI knowledge graph. Think of a knowledge graph as a massive web of interconnected real-world entities (people, places, organizations, and concepts) and the explicit relationships between them. Traditional SEO optimizes for isolated keywords; GEO optimizes for these relational connections. If an AI engine cannot confidently map your scholar to a specific policy concept, your organization's research is unlikely to appear in the generated summary.
What target audiences are asking AI
"What are the primary economic arguments against implementing a local vacancy tax, and which think tanks have published data on this in the last two years?"
"Summarize the consensus among non-partisan research groups regarding the long-term infrastructure funding gap in California."
"Give me a breakdown of recent policy recommendations for grid modernization that account for regional supply chain constraints."
Conversational engines do not look for exact keyword matches to generate an answer. They scan their internal index for semantically rich nodes that directly answer the user's multi-layered intent. If your content forces the model to guess the relationship between a policy recommendation and your institution, it will simply cite a competitor that states the relationship clearly.
Baseline GEO audit
Diagnostic prompts were run across multiple LLMs to evaluate the visibility, accuracy, and citation frequency of the organization's research portfolio. Ratings are illustrative, not measured.
Audit category        | ChatGPT  | Gemini   | Claude   | Perplexity
----------------------|----------|----------|----------|-----------
AI Visibility         | Weak     | Moderate | Weak     | Moderate
Entity Clarity        | Moderate | Moderate | Weak     | Moderate
Program/Service Pages | Missing  | Missing  | Missing  | Missing
FAQ Content           | Weak     | Weak     | Missing  | Moderate
Trust Signals         | Moderate | Strong   | Moderate | Strong
Expert Profiles       | Weak     | Moderate | Weak     | Weak
The audit showed that although the organization had strong foundational trust signals from its long-standing academic backlinks, its core insights were functionally invisible to AI synthesis. Perplexity and Gemini occasionally cited the website when pushed with deep-dive prompts, but ChatGPT and Claude omitted the think tank's findings from broader policy summaries. The root cause was architectural: because the insights sat behind generic introductory paragraphs or deep inside PDF files, the models could not reliably extract the policy positions or credit them to the institution's experts.
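An audit like this can be made repeatable by saving each engine's answers and tallying how often the organization is named. The sketch below is illustrative only: the organization name, the engine labels, the sample answers, and the rating cutoffs are all hypothetical assumptions, not the method or data behind the ratings above.

```python
# Hypothetical organization name and saved answers. In practice the answers
# would be collected by running the same diagnostic prompts on each engine.
ORG_NAME = "Example Policy Institute"

responses = {
    "engine_a": [
        "Several groups, including the Example Policy Institute, have published data.",
        "Larger national institutes dominate this debate.",
    ],
    "engine_b": [
        "Larger national institutes dominate this debate.",
        "No regional sources were identified.",
    ],
}


def mention_rate(answers: list[str], org_name: str) -> float:
    """Return the share of answers that mention the organization by name."""
    if not answers:
        return 0.0
    hits = sum(org_name.lower() in answer.lower() for answer in answers)
    return hits / len(answers)


def label(rate: float) -> str:
    """Map a mention rate to a coarse rating. The cutoffs are arbitrary assumptions."""
    if rate == 0:
        return "Missing"
    if rate < 0.34:
        return "Weak"
    if rate < 0.67:
        return "Moderate"
    return "Strong"


audit = {
    engine: label(mention_rate(answers, ORG_NAME))
    for engine, answers in responses.items()
}
print(audit)
```

A plain substring match misses paraphrased or abbreviated mentions, so a real audit would likely need a list of name variants and a manual review of borderline answers.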
Key issues found
Three architectural problems explained why the research was invisible to AI synthesis.
1. Locked narrative architecture (PDF-only insight)
The core value of the organization (its data tables, policy recommendations, and methodology) was nested entirely inside multi-page PDF documents. The corresponding HTML landing pages contained only brief, stylized marketing copy and a download button. Modern LLM pipelines can parse PDFs, but extracting text from them tends to be costlier and less reliable than reading HTML, so structured text on the page itself is more likely to be indexed and quoted. By hiding the substance behind a download link, the organization was effectively filtering itself out of AI training and retrieval pipelines.
2. Disconnected scholar entities
Scholar bio pages were treated as simple corporate "About Me" pages. They listed names and narrative text but lacked structural connections to the reports those scholars authored or the specific policy domains they specialized in. If an AI model cannot programmatically verify that a report on economic policy was written by a credentialed economist with a verifiable publication history, it down-ranks the content's reliability and avoids citing it as an authoritative source.
3. Vague, indirect copywriting
Page introductions relied on academic jargon or generalized, narrative throat-clearing (e.g., "In an era of unprecedented change, addressing the complex challenges of our communities requires a multifaceted approach to policy valuation..."). When a model scans a page for a direct answer to a prompt, vague prose dilutes the primary entities. The model abandons the text in favor of sources that use clear, direct, and explicit subject-verb-object structures.
Recommended GEO improvements
The fixes that turn flat, unstructured documents into machine-readable semantic content.
Transforming text for direct answers
Before
"Our latest comprehensive policy brief dives deep into the ongoing challenges facing municipal transit frameworks in the post-pandemic landscape. Through rigorous data collection and stakeholder engagement, our research team analyzes the systemic issues that continue to impact operational sustainability across various metropolitan corridors, offering a path forward for local leadership looking to make a meaningful difference."
After
"This policy brief analyzes municipal transit funding deficits in three mid-sized California cities from 2024 to 2026," with explicit key findings (an average 34% operational funding deficit; service frequency on high-density routes down 12%) and named policy recommendations.
Why we chose it: Retrieval systems tend to weight the opening of an HTML document heavily when judging semantic relevance. Structuring the top of the page around direct answers makes it easier for retrieval-augmented generation (RAG) systems to extract clean snippets for user answers.
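A quick way to check a rewritten introduction is to test whether the page's key entities appear near the top. The sketch below assumes a hypothetical helper, `lead_covers_terms`, and an arbitrary 100-word window; neither reflects how any particular engine actually scores pages.

```python
import re


def lead_covers_terms(page_text: str, terms: list[str], window: int = 100) -> dict[str, bool]:
    """Report which terms appear within the first `window` words of the page text."""
    words = re.findall(r"\S+", page_text)
    lead = " ".join(words[:window]).lower()
    return {term: term.lower() in lead for term in terms}


# Opening sentences taken from the before/after example above.
before = (
    "Our latest comprehensive policy brief dives deep into the ongoing challenges "
    "facing municipal transit frameworks in the post-pandemic landscape."
)
after = (
    "This policy brief analyzes municipal transit funding deficits in three "
    "mid-sized California cities from 2024 to 2026."
)

# Hypothetical list of entities a reader's prompt is likely to contain.
terms = ["transit funding", "California", "2024"]

print(lead_covers_terms(before, terms))
print(lead_covers_terms(after, terms))
```

The "before" opening contains none of the three terms, while the "after" opening contains all of them, which is the difference the rewrite is meant to produce.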
Implementing AboutPage and scholar (Person) schema
Before
A standard HTML bio block: a name, a job title, and a narrative paragraph with no machine-readable links between the scholar, their credentials, and the reports they authored.
After
Custom JSON-LD on every scholar and report page, explicitly connecting the researcher entity to academic credentials, publications, and institutional affiliations (see the scholar schema example below).
Why we chose it: Schema provides an unambiguous map for AI knowledge graphs. By explicitly defining relationships, you remove the ambiguity that causes AI models to ignore unverified content.
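On the report side, the same idea can be sketched by generating JSON-LD that points back to the author and publisher through `@id` references. Everything specific here is a hypothetical placeholder (the example.org URLs, the headline wording, the publication date); the type and property names (`Report`, `author`, `publisher`, `about`, `datePublished`) come from schema.org.

```python
import json

# All names, URLs, and dates below are hypothetical placeholders.
report = {
    "@context": "https://schema.org",
    "@type": "Report",
    "@id": "https://example.org/reports/transit-funding#report",
    "headline": "Municipal Transit Funding Deficits in Three Mid-Sized California Cities",
    "datePublished": "2026-01-15",
    # The author and publisher are referenced by @id so that the scholar page
    # and the organization page can each describe their own entity once.
    "author": {"@id": "https://example.org/scholars/jane-doe#person"},
    "publisher": {"@id": "https://example.org/#organization"},
    "about": [
        {"@type": "Thing", "name": "Municipal transit funding"},
        {"@type": "Thing", "name": "Public transportation policy"},
    ],
}

json_ld = json.dumps(report, indent=2)

# The markup is embedded in the report page inside a JSON-LD script element.
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

Using `@id` references rather than repeating the full author record keeps the report and bio pages consistent, provided the scholar page declares an entity with the matching `@id`.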
Converting PDF data into semantic HTML tables
Before
Critical data points, charts, and statutory comparisons hidden inside downloadable PDFs—or, worse, pasted onto pages as JPEG screenshots of charts.
After
The same data rebuilt directly on the report's webpage as clean, semantic HTML tables with clear row and column headers.
Why we chose it: When a user asks for a comparative data breakdown, an engine is far more likely to extract data from a cleanly formatted webpage table than to parse a visual chart or a table buried inside a download. Text-based extraction from clean HTML tables remains significantly more reliable for citation generation.
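As a rough illustration of what a "clean, semantic" table means in practice, the sketch below renders a caption, column headers, and a header cell for each row. The helper name `render_table` and the caption are assumptions; the two figures are the key findings quoted in the rewritten introduction above.

```python
from html import escape


def render_table(caption: str, headers: list[str], rows: list[list[str]]) -> str:
    """Build a semantic HTML table: a caption, column headers, and a row header per row."""
    head = "".join(f'<th scope="col">{escape(h)}</th>' for h in headers)
    body_rows = []
    for first, *rest in rows:
        cells = "".join(f"<td>{escape(c)}</td>" for c in rest)
        body_rows.append(f'<tr><th scope="row">{escape(first)}</th>{cells}</tr>')
    return (
        "<table>\n"
        f"  <caption>{escape(caption)}</caption>\n"
        f"  <thead><tr>{head}</tr></thead>\n"
        "  <tbody>\n    " + "\n    ".join(body_rows) + "\n  </tbody>\n"
        "</table>"
    )


table_html = render_table(
    caption="Key findings, 2024 to 2026",  # hypothetical caption
    headers=["Metric", "Finding"],
    rows=[
        ["Average operational funding deficit", "34%"],
        ["Service frequency on high-density routes", "Down 12%"],
    ],
)
print(table_html)
```

The `scope` attributes tell a parser which header each data cell belongs to, which is the structural information a chart screenshot cannot carry.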
Scholar schema example
The backend markup that connects a researcher entity to their credentials, employer, and policy domains.
Defining worksFor, alumniOf, and knowsAbout gives an engine verifiable authority signals it can map to specific policy concepts.
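A minimal sketch of such markup, generated here in Python, might look like the following. The person, employer, university, and URLs are hypothetical placeholders; `Person`, `worksFor`, `alumniOf`, `knowsAbout`, and `CollegeOrUniversity` are schema.org terms.

```python
import json

# Every name and URL below is a hypothetical placeholder.
scholar = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://example.org/scholars/jane-doe#person",
    "name": "Jane Doe",
    "jobTitle": "Senior Fellow, Transportation Policy",
    "url": "https://example.org/scholars/jane-doe",
    # Employer: ties the scholar to the institution's own entity.
    "worksFor": {
        "@type": "Organization",
        "@id": "https://example.org/#organization",
        "name": "Example Policy Institute",
    },
    # Credential source: the institution that granted the scholar's degree.
    "alumniOf": {
        "@type": "CollegeOrUniversity",
        "name": "Example State University",
    },
    # Policy domains the scholar should be associated with.
    "knowsAbout": ["Municipal transit funding", "Infrastructure finance"],
}

scholar_json_ld = json.dumps(scholar, indent=2)

# The markup is embedded in the bio page inside a JSON-LD script element.
scholar_script = f'<script type="application/ld+json">\n{scholar_json_ld}\n</script>'
print(scholar_script)
```

Giving the scholar a stable `@id` lets report pages reference the same entity instead of restating the biography, which is how the scholar-to-report connection becomes explicit.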
Common questions
What people ask most about making research citable by AI.
Why were the think tank's insights invisible to AI synthesis despite strong trust signals?
The organization's core value (its data tables, policy recommendations, and methodology) was nested entirely inside multi-page PDF documents, while the HTML landing pages held only brief marketing copy and a download button. Retrieval pipelines tend to favor high-quality, structured HTML text on the page itself, so hiding the substance behind a download link effectively filtered the organization out of AI retrieval pipelines.
Why does scholar entity authority matter for AI citation?
AI engines evaluate information quality using authority, expertise, and trust signals. If a model cannot programmatically verify that a report was written by a credentialed expert with a verifiable publication history, it down-ranks the content's reliability and avoids citing it as an authoritative source.
Are clean HTML tables better than chart images for AI extraction?
Yes. While multimodal models can technically read images, text-based data extraction from clean HTML tables remains significantly more reliable for LLM retrieval pipelines and citation generation. Pasting a JPEG screenshot of a data chart does not make that data dependably visible to AI.