Research & Think Tanks GEO Case Study

The challenge

Public policy research consumption has shifted rapidly from traditional keyword searches to multi-criteria, analytical conversational queries. Instead of searching for "urban housing policy report 2026," target audiences—such as legislative aides, journalists, and academic researchers—now ask AI engines to synthesize competing viewpoints on complex topics.

To surface in these answers, content must be optimized for an AI Knowledge Graph. Think of a Knowledge Graph as a massive web of interconnected real-world entities (people, places, organizations, and concepts) and the explicit relationships between them. Traditional SEO optimizes for isolated keywords; generative engine optimization (GEO) optimizes for these relational connections. If an AI engine cannot definitively map your scholar to a specific policy concept, your organization's research will not appear in the generated summary.

What target audiences are asking AI

"What are the primary economic arguments against implementing a local vacancy tax, and which think tanks have published data on this in the last two years?"
"Summarize the consensus among non-partisan research groups regarding the long-term infrastructure funding gap in California."
"Give me a breakdown of recent policy recommendations for grid modernization that account for regional supply chain constraints."

Conversational engines do not look for exact keyword matches to generate an answer. They scan their internal index for semantically rich nodes that directly answer the user's multi-layered intent. If your content forces the model to guess the relationship between a policy recommendation and your institution, it will simply cite a competitor that states the relationship clearly.

Baseline GEO audit

Diagnostic prompts were run across multiple LLMs to evaluate the visibility, accuracy, and citation frequency of the organization's research portfolio. Ratings are illustrative, not measured.

Audit category	ChatGPT	Gemini	Claude	Perplexity
AI Visibility	Weak	Moderate	Weak	Moderate
Entity Clarity	Moderate	Moderate	Weak	Moderate
Program/Service Pages	Missing	Missing	Missing	Missing
FAQ Content	Weak	Weak	Missing	Moderate
Trust Signals	Moderate	Strong	Moderate	Strong
Expert Profiles	Weak	Moderate	Weak	Weak

The audit revealed that the organization had strong foundational trust signals, thanks to a legacy footprint of academic backlinks, yet its core insights were functionally invisible to AI synthesis. Perplexity and Gemini occasionally cited the website when explicitly pushed with deep-dive prompts, but ChatGPT and Claude omitted the think tank's findings entirely from broader policy summaries. The root cause was architectural: because the organization's insights were trapped behind generic introduction paragraphs or deep within PDF files, the models could not accurately extract the exact policy positions or credit them to the institution's experts.

Key issues found

Three architectural problems explained why the research was invisible to AI synthesis.

1. Locked narrative architecture (PDF-only insight)

The organization's core value (its data tables, policy recommendations, and methodology) sat entirely inside multi-page PDF documents. The corresponding HTML landing pages contained only brief, stylized marketing copy and a download button. Modern AI pipelines can parse PDFs, but doing so is more costly than reading HTML, so engines tend to favor high-quality, structured text on the page itself. By hiding the substance behind a download link, the organization was effectively filtering itself out of AI training and retrieval pipelines.

2. Disconnected scholar entities

Scholar bio pages were treated as simple corporate "About Me" pages. They listed names and narrative text but lacked structural connections to the reports those scholars authored or the specific policy domains they specialized in. If an AI model cannot programmatically verify that a report on economic policy was written by a credentialed economist with a verifiable publication history, it down-ranks the content's reliability and avoids citing it as an authoritative source.

3. Vague, indirect copywriting

Page introductions relied on academic jargon or generalized, narrative throat-clearing (e.g., "In an era of unprecedented change, addressing the complex challenges of our communities requires a multifaceted approach to policy valuation..."). When a model scans a page for a direct answer to a prompt, vague prose dilutes the primary entities. The model abandons the text in favor of sources that use clear, direct, and explicit subject-verb-object structures.

Recommended GEO improvements

The fixes that turn flat, unstructured documents into machine-readable semantic content.

Transforming text for direct answers

Before

"Our latest comprehensive policy brief dives deep into the ongoing challenges facing municipal transit frameworks in the post-pandemic landscape. Through rigorous data collection and stakeholder engagement, our research team analyzes the systemic issues that continue to impact operational sustainability across various metropolitan corridors, offering a path forward for local leadership looking to make a meaningful difference."

After

A direct opening sentence, "This policy brief analyzes municipal transit funding deficits in three mid-sized California cities from 2024 to 2026," followed by explicit key findings (an average 34% operational funding deficit; service frequency on high-density routes down 12%) and named policy recommendations.

Why we chose it: AI engines tend to weigh the opening of an HTML document heavily when judging semantic relevance. Structuring the top of the page around direct answers makes it easier for retrieval-augmented generation (RAG) systems to extract clean snippets for user answers.
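One way an editorial team might check a page lead before publishing is a small lint that counts concrete figures and filler phrases in the opening words. The sketch below is an assumption for illustration only: the `lead_signals` helper, the filler list, and the 100-word window are hypothetical, and this is not how any AI engine scores a page. The two sample leads are adapted from the Before and After text above.

```python
import re

# Hypothetical filler list for illustration; a real style guide would define its own.
FILLER_PHRASES = (
    "dives deep",
    "path forward",
    "meaningful difference",
    "in an era of",
    "multifaceted approach",
)

# Matches integers, decimals, and percentages such as "2026", "3.5", or "34%".
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?%?")


def lead_signals(text, max_words=100):
    """Count concrete figures and filler phrases in the first max_words words."""
    lead = " ".join(text.split()[:max_words]).lower()
    return {
        "figures": len(NUMBER_PATTERN.findall(lead)),
        "filler": sum(lead.count(phrase) for phrase in FILLER_PHRASES),
    }


BEFORE = (
    "Our latest comprehensive policy brief dives deep into the ongoing "
    "challenges facing municipal transit frameworks in the post-pandemic "
    "landscape. Through rigorous data collection and stakeholder engagement, "
    "our research team analyzes the systemic issues that continue to impact "
    "operational sustainability across various metropolitan corridors, "
    "offering a path forward for local leadership looking to make a "
    "meaningful difference."
)

AFTER = (
    "This policy brief analyzes municipal transit funding deficits in three "
    "mid-sized California cities from 2024 to 2026. Key findings: an average "
    "34% operational funding deficit; service frequency on high-density "
    "routes down 12%."
)

before_signals = lead_signals(BEFORE)
after_signals = lead_signals(AFTER)
print("before:", before_signals)
print("after:", after_signals)
```

A lead with no figures and several filler phrases is a candidate for rewriting; the counts are a prompt for an editor, not a quality score.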

Implementing ProfilePage and scholar (Person) schema

Before

A standard HTML bio block: a name, a job title, and a narrative paragraph with no machine-readable links between the scholar, their credentials, and the reports they authored.

After

Custom JSON-LD on every scholar and report page, explicitly connecting the researcher entity to academic credentials, publications, and institutional affiliations (see the code below).

Why we chose it: Schema provides an unambiguous map for AI knowledge graphs. By explicitly defining relationships, you remove the ambiguity that causes AI models to ignore unverified content.

Converting PDF data into semantic HTML tables

Before

Critical data points, charts, and statutory comparisons hidden inside downloadable PDFs—or, worse, pasted onto pages as JPEG screenshots of charts.

After

The same data rebuilt directly on the report's webpage as clean, semantic HTML tables with clear row and column headers.

Why we chose it: When a user asks for a comparative data breakdown, an engine is far more likely to extract data from a cleanly formatted webpage table than to parse a visual chart or a table buried inside a download. Text-based extraction from clean HTML tables remains significantly more reliable for citation generation.
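A minimal sketch of what rebuilding report data as a semantic table might look like, using the two findings quoted earlier in this case study. The `render_table` helper, the caption, and the column names are assumptions for illustration, not the organization's actual tooling.

```python
from html import escape


def render_table(caption, columns, rows):
    """Render rows as an HTML table with a caption and scoped header cells."""
    head = "".join(f'<th scope="col">{escape(c)}</th>' for c in columns)
    body = []
    for row in rows:
        # The first cell of each row is its header; the rest are data cells.
        first, *rest = row
        cells = f'<th scope="row">{escape(str(first))}</th>'
        cells += "".join(f"<td>{escape(str(v))}</td>" for v in rest)
        body.append(f"<tr>{cells}</tr>")
    return (
        "<table>"
        f"<caption>{escape(caption)}</caption>"
        f"<thead><tr>{head}</tr></thead>"
        f"<tbody>{''.join(body)}</tbody>"
        "</table>"
    )


# Figures taken from the rewritten policy brief lead; labels are illustrative.
html_table = render_table(
    caption="Municipal transit key findings, 2024 to 2026",
    columns=["Metric", "Value"],
    rows=[
        ("Average operational funding deficit", "34%"),
        ("Service frequency decline on high-density routes", "12%"),
    ],
)
print(html_table)
```

The `caption` and the `scope` attributes on header cells are what tie each value to its row and column labels in the markup itself, rather than leaving that link to visual layout.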

Scholar schema example

The backend markup that connects a researcher entity to their credentials, employer, and policy domains.

ProfilePage JSON-LD for a scholar

JSON

{
  "@context": "https://schema.org",
  "@type": "ProfilePage",
  "mainEntity": {
    "@type": "Person",
    "name": "Dr. Aris Vance",
    "jobTitle": "Senior Research Fellow",
    "worksFor": {
      "@type": "ResearchOrganization",
      "name": "The Public Policy Institute"
    },
    "alumniOf": {
      "@type": "CollegeOrUniversity",
      "name": "University of Michigan"
    },
    "knowsAbout": [
      "Energy Infrastructure",
      "Grid Modernization",
      "Public Policy"
    ]
  }
}

Defining worksFor, alumniOf, and knowsAbout gives an engine verifiable authority signals it can map to specific policy concepts.
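The scholar markup covers the person side of the relationship; the recommendations also call for JSON-LD on report pages. A minimal sketch of how report markup might point back to the scholar follows. Everything specific in it is an assumption: the `example.org` identifiers, the report title, and the publication date are placeholders, and `build_report_jsonld` is a hypothetical helper. The `Report` type and the `author`, `publisher`, and `about` properties are standard schema.org vocabulary.

```python
import json

# Hypothetical identifiers: example.org URLs stand in for the institute's real pages.
SCHOLAR_ID = "https://example.org/scholars/aris-vance#person"
ORG_ID = "https://example.org/#organization"


def build_report_jsonld(headline, date_published, topics):
    """Return schema.org Report markup that references the scholar by @id."""
    return {
        "@context": "https://schema.org",
        "@type": "Report",
        "headline": headline,
        "datePublished": date_published,
        "author": {
            "@type": "Person",
            "@id": SCHOLAR_ID,
            "name": "Dr. Aris Vance",
        },
        "publisher": {
            "@type": "ResearchOrganization",
            "@id": ORG_ID,
            "name": "The Public Policy Institute",
        },
        "about": list(topics),
    }


report = build_report_jsonld(
    headline="Grid Modernization Policy Brief",  # placeholder title
    date_published="2026-01-15",  # placeholder date
    topics=["Grid Modernization", "Energy Infrastructure"],
)
report_json = json.dumps(report, indent=2)
print(report_json)
```

For the link to resolve in both directions, the `Person` object on the scholar's profile page would need to carry the same `@id` value used here.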

Common questions

What people ask most about making research citable by AI.

Why were the think tank's insights invisible to AI synthesis despite strong trust signals?

The organization's core value—its data tables, policy recommendations, and methodology—was nested entirely inside multi-page PDF documents, while the HTML landing pages held only brief marketing copy and a download button. LLMs prioritize high-quality, structured HTML text on the primary page canvas, so hiding the substance behind a download link effectively filtered the organization out of AI retrieval pipelines.

Why does scholar entity authority matter for AI citation?

AI engines evaluate information quality using authority, expertise, and trust signals. If a model cannot programmatically verify that a report was written by a credentialed expert with a verifiable publication history, it down-ranks the content's reliability and avoids citing it as an authoritative source.

Are clean HTML tables better than chart images for AI extraction?

Yes. While multimodal models can technically read images, text-based data extraction from clean HTML tables remains significantly more reliable for LLM retrieval pipelines and citation generation. Pasting a JPEG screenshot of a data chart does not make that data dependably visible to AI.
