AI Search

How ChatGPT Finds Information

ChatGPT doesn't crawl the web on its own — it blends what it learned during training with live searches through Bing's index, reading pages in fragments before writing an answer. Knowing how that process works helps any organization make sure it gets found, and described accurately.

By Matt. Updated July 9, 2026.

Quick answer

How does ChatGPT find the information in its answers?

ChatGPT answers most questions from what it learned during training. When a question needs current information — recent news, local details, or anything past its training cutoff — it runs a live web search: it queries Microsoft's Bing index, opens a shortlist of pages, reads them in chunks looking for the passage that answers the question, and writes a new answer with links back to its sources.

That means ChatGPT doesn't crawl the internet independently — it depends on Bing to find candidate pages in the first place, and it only reads the pieces of each page that seem relevant rather than the whole thing. Both of those details matter for anyone who wants their own organization's information found and cited correctly.

Why this matters for your organization

Traditional search hands someone a list of links and lets them click through. ChatGPT skips that step: it reads several sources itself and hands back one written answer, often with no click to any website at all. When someone asks "does the library still offer free tax prep for seniors?" or "what's the fastest way to reach the county permits office?", the answer they get back — accurate or not — may be the only impression they ever form of your organization.

That shift matters most for small teams. A national brand has PR staff and paid monitoring tools watching what AI tools say about it. A county library, a regional food bank, or a five-person accounting firm usually has neither, which makes understanding how ChatGPT actually finds and reads content one of the few free levers available for making sure it describes you correctly. For a first look at whether it already does, see "How to check your organization in ChatGPT." The broader shift is covered in "What is AI search?" and "GEO vs AEO vs SEO."

Key concepts and terms

The handful of terms that explain how the system actually works.

Training knowledge vs. live search: Some of what ChatGPT "knows" is baked into the model itself from training — general facts, language, and reasoning patterns. The rest comes from a live web search run at the moment you ask, which is how it reaches anything newer than its training cutoff.
Retrieval-Augmented Generation (RAG): The technical name for the process above: the system fetches outside information before it drafts a response, rather than answering purely from what it already "remembers." It's the reason ChatGPT can cite sources instead of just asserting facts.
Bing's web index: ChatGPT does not crawl the live internet on its own. Its search feature queries Microsoft's Bing index to find candidate pages. A February 2025 Seer Interactive analysis of 500+ citations found that 87% matched Bing's own top search results. If a page isn't in Bing's index, ChatGPT generally can't find it.
Sliding-window chunk retrieval: Rather than reading an entire page top to bottom, ChatGPT scans it in overlapping segments, or "chunks," looking for the one that answers the question. It's efficient, but it means an important detail placed far from the relevant chunk can get missed. (More on structuring pages around this: "What are entities?")
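
To make the chunking idea concrete, here is a minimal Python sketch of sliding-window retrieval. It is an illustration only, not OpenAI's actual implementation: the function names, the window size, the overlap, and the simple word-overlap scoring are all assumptions chosen for readability. Production systems most likely score chunks with something more sophisticated than shared words.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def sliding_chunks(words, size=40, overlap=10):
    """Split a list of words into windows of `size` that overlap by `overlap`."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the page
    return chunks


def best_chunk(page_text, question, size=40, overlap=10):
    """Return the chunk sharing the most words with the question.

    Word overlap is a stand-in scoring rule (an assumption for this sketch).
    """
    question_words = set(tokenize(question))
    chunks = sliding_chunks(page_text.split(), size, overlap)
    if not chunks:
        return ""

    def score(chunk):
        return len(question_words & set(tokenize(" ".join(chunk))))

    return " ".join(max(chunks, key=score))


# Hypothetical page: stale filler first, the real answer in the middle.
page = "old news " * 20 + "dental program hours are Monday to Friday" + " filler" * 20
answer_chunk = best_chunk(page, "dental program hours", size=10, overlap=3)
```

Only the winning chunk is kept, which is why a caveat sitting several paragraphs away from the direct answer can drop out of the final response.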

How ChatGPT finds information, step by step

The path from a question to a cited answer, in five stages.

1. Deciding whether to search at all

When you ask a question, ChatGPT first judges whether its training data is enough to answer it well. If the question needs current or local information, such as "what's the fastest way to renew a business license in [county]?", it rewrites your conversational question into a more search-friendly query before doing anything else.

2. Getting a shortlist from Bing

That query goes to Microsoft's Bing index; OpenAI does not operate an independent search index of its own. Bing returns a list of candidate pages with basic metadata. At this stage, ChatGPT hasn't actually read any of the pages yet; it's judging relevance from titles and snippets alone.

3. Selecting and reading pages

ChatGPT drops the clearly irrelevant results and picks a smaller set of promising pages. Using its OAI-SearchBot crawler, it fetches those pages and strips out the layout, ads, navigation menus, and scripts, leaving just the raw text behind.

4. Reading in chunks, not whole pages

Long pages can overwhelm what the model can hold in mind at once, so it scans the text in a sliding window, looking for the segment that most directly answers the question. That segment gets pulled into its working context; the rest of the page is set aside.

5. Writing the answer and citing sources

Finally, ChatGPT combines the retrieved text with your original question and writes a new answer in its own words, with numbered citations linking back to the pages it used, so you can check the source yourself.
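
The final stage, combining retrieved passages with the original question, can be pictured with a short Python sketch. This shows the general retrieval-augmented pattern rather than ChatGPT's real prompt format, which is not public. The `Passage` class, the `build_grounded_prompt` function, the instruction wording, and the example URL are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    """One retrieved chunk and the page it came from (hypothetical structure)."""
    url: str
    text: str


def build_grounded_prompt(question, passages):
    """Number each source so the written answer can cite it as [n]."""
    lines = [
        "Answer the question using only the numbered sources below.",
        "Cite sources inline as [n].",
        "",
        "Sources:",
    ]
    for number, passage in enumerate(passages, start=1):
        lines.append(f"[{number}] {passage.url}")
        lines.append(f"    {passage.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


prompt = build_grounded_prompt(
    "How do I renew a business license?",
    [Passage("https://example.org/permits", "Renewals are accepted online or by mail.")],
)
```

Because each passage carries its URL through to the prompt, the numbered citations in the finished answer can point back to the pages that were actually read.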

Training knowledge vs. live search

The two kinds of knowledge ChatGPT draws on, and why the difference matters.

Dimension	Training Knowledge	Live Search
What it is	Facts, language, and reasoning patterns baked into the model during training	Information retrieved from the live web through Bing at the moment you ask
Why it matters	Fixed as of a "knowledge cutoff" date — it doesn't update on its own	Makes the system current and able to answer past that cutoff
Example	Knowing that the U.S. Capitol is in Washington, D.C.	Finding whether the county clerk's office is open this Saturday

What this looks like in practice

Two scenarios showing how the retrieval process helps — or hurts — real organizations.

The clinic described as closed

What happened

A community health clinic's sliding-scale dental program is alive and well, but a user asking ChatGPT about it gets told the program was discontinued. The clinic's current hours and eligibility rules are buried in the middle of a long PDF, below an old news mention that the model's retrieval chunk happens to land on instead.

What fixed it

The clinic adds a short, plain-text program page with the current hours and eligibility stated in the first two sentences, dated at the top. Within a few weeks, ChatGPT's answers about the program come from the new page instead of the stale one.

The lesson: the sliding window rewards whichever chunk answers the question most directly — not necessarily the newest or most authoritative one.

The library that vanished from event answers

What happened

A county library runs a full weekend events calendar, but asking ChatGPT "what's happening at the library this weekend?" turns up nothing about it. The calendar only renders after a JavaScript widget loads in the browser — invisible to the plain-text crawler behind ChatGPT's search feature.

What fixed it

The library adds a simple, server-rendered list of the week's events in plain HTML beneath the interactive calendar. The next time someone asks, ChatGPT finds and cites the library's own page.

The lesson: content that only exists inside JavaScript, a login, or a paywall is functionally invisible to ChatGPT's search feature, no matter how good it is.
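
A rough way to see what a plain-text fetcher sees is to strip a page down to its visible text without running any scripts. The Python sketch below uses the standard library's `html.parser`. The class name, the helper function, and the two sample pages are invented for illustration, and a real crawler's extraction rules are certainly more involved.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text outside <script> and <style> tags, without running any JavaScript."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the text a script-free reader would find in the raw HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Hypothetical pages: the event exists only inside a script in the first one.
js_only_page = '<div id="calendar"></div><script>render(["Story time, Saturday 10am"])</script>'
server_rendered_page = "<ul><li>Story time, Saturday 10am</li></ul>" + js_only_page

js_only_text = visible_text(js_only_page)
server_rendered_text = visible_text(server_rendered_page)
```

In this sketch the script-only page yields no visible text at all, while the version with a plain HTML list exposes the event. That mirrors the library's fix.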

What this gets right, and where it breaks down

A balanced look at the trade-offs of AI-driven information retrieval.

What this gets right

Fewer invented facts. Grounding an answer in real, retrieved text gives the model something to summarize instead of guessing, which reduces (but doesn't eliminate) the chance of a made-up answer.
Current information. A live search lets ChatGPT answer questions its training data alone never could — this week's hours, this month's news, this year's requirements.
One answer instead of ten tabs. Instead of a user opening several websites themselves, ChatGPT reads a handful of sources at once and summarizes what they agree — and disagree — on.
A visible source trail. Inline citations let a reader click through and check where an answer came from, the same way they'd check a footnote.

Where this breaks down

It depends on someone else's index. ChatGPT's search feature currently pulls from Microsoft's Bing index rather than crawling the web on its own. If a page isn't indexed by Bing, or a site has blocked OpenAI's search crawler, ChatGPT generally won't find or cite it, no matter how good the content is.

It reads in pieces, not as a whole. Because of the sliding-window approach described above, an important qualification placed far from the specific answer a user asked for can get missed entirely, producing a technically-sourced but incomplete answer.

It can't see behind logins, paywalls, or heavy JavaScript. Content locked behind a members-only portal, a paywall, or a page that only renders after a script runs is invisible to it, even if it's the single most authoritative answer available. (See "Can AI tools see my website?" to check your own.)

Make your content easier for ChatGPT to find

Five concrete changes, roughly in order of impact.

Put the direct answer at the top (20 min): State the answer in two or three plain sentences right under the heading, before any backstory. ChatGPT's chunking favors pages that answer clearly up front.
Use real, descriptive headings (30 min): Genuine H2/H3 headings, not bolded paragraph text, tell the sliding-window scraper, and human skimmers, exactly what each section covers. See "Structured data for beginners" for the next step.
Put key facts in plain text, not just PDFs or images (time varies): Hours, prices, and eligibility rules that only exist inside a scanned PDF or an image are invisible to ChatGPT's crawler. Repeat the same information in plain HTML text somewhere on the page too.
Confirm ChatGPT's crawler can reach your site (15 min): Check your robots.txt file for any rule blocking OAI-SearchBot, the crawler OpenAI documents as powering ChatGPT's search feature. That's separate from GPTBot, which only governs AI training. Use "Can AI tools see my website?" to check yours.
Ask ChatGPT your own audience's questions (30 min): Try the handful of questions people actually ask about your organization and see what it says. For an ongoing routine instead of a one-time check, see "How to measure AI visibility."
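
For the robots.txt check, Python's built-in `urllib.robotparser` module can tell you whether a given crawler name is allowed to fetch a URL. The sketch below is a minimal example under stated assumptions: the sample robots.txt, the `crawler_allowed` helper, and the example.org URL are made up for illustration, so substitute your own file's contents and a real page address.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt, user_agent, url):
    """Check whether `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: training crawler blocked, search crawler allowed.
EXAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""

search_ok = crawler_allowed(EXAMPLE_ROBOTS, "OAI-SearchBot", "https://example.org/hours")
training_ok = crawler_allowed(EXAMPLE_ROBOTS, "GPTBot", "https://example.org/hours")
```

With this sample file, the search crawler is allowed and the training crawler is blocked, which is the combination many organizations that want to be cited but not used for training would aim for.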

One sourcing note: the Seer Interactive analysis cited above is directional — a sample of 500+ citations from February 2025 — and this kind of finding can shift as OpenAI's technology changes, so treat "Bing drives ChatGPT's citations" as a current pattern rather than a permanent rule.

Common questions

Does ChatGPT copy text directly from search engines?

No. ChatGPT reads the text returned from a web search, isolates the relevant passages, and writes a new answer in its own words. It only reproduces exact quotes when a user specifically asks for them.

Why might ChatGPT give outdated or inaccurate information about my organization?

Usually because the page it read is out of date, or because a recent update to your site has not yet been picked up by Bing's index, which is what ChatGPT's search feature relies on. It can also happen if ChatGPT decides its training data is "close enough" and skips a live search altogether.

Can we block ChatGPT from reading our website — or make sure it can?

Yes, either way. Your site's robots.txt file can allow or block OAI-SearchBot, the crawler ChatGPT's search feature uses to find and cite pages. That's a separate setting from GPTBot, which only affects whether your content can be used to train OpenAI's models. Most organizations that want to be found and cited should make sure OAI-SearchBot is allowed.

How does ChatGPT decide which websites to cite?

It starts with whatever pages Bing's index returns for the search query, then ChatGPT applies its own filtering on top — favoring pages that clearly match the question and are easy for its scraper to read and extract an answer from.

Does ChatGPT only cite big, well-known websites, or can a small organization be cited too?

Small organizations get cited regularly. ChatGPT cites whatever pages its search step retrieves and finds relevant to the question, not a fixed list of approved sources. A community clinic's own program page can outrank a national news article if it answers the specific question more directly and is easy for the crawler to read.
