CORRECTION · APR 2026 · 10 min read

Everyone (Including Me) Got the Claude Code Haiku Pipeline Wrong — Here’s What the Code Actually Says

TL;DR

Haiku isn’t told to summarize your page. It receives your content plus the user’s original question with an empty system prompt and the instruction "Provide a concise response based on the content above." That means Haiku is answering a question using your page, not creating a general summary of it. Your content’s survival depends on how well it answers the specific query, not how well-structured it is for summarization. Also: a GrowthBook feature flag means some users’ searches are processed by Haiku while others use the main model — citation quality is being A/B tested across the user base.

01. The claim everyone made (including me)

Within 48 hours of the Claude Code source leak, a dozen analyses appeared — Growtika, Wise Relations, Alex Kim, our own Part 1 — and most of us said some version of the same thing: "Haiku summarizes your content before the main model sees it."

To be fair, a couple of technical deep-dives got closer. Mikhail Shilkov noted that Haiku "pre-filters to just answer the question," and the Quercle blog described the prompt as pinning Haiku to "answer only from the provided content." But none of the GEO-focused analyses picked up on this distinction — and the GEO implications are where the framing actually changes what you do.

The "summarize" framing led to advice like: "Front-load your key points so the summary captures them." "Use clear headings so the summarizer can identify structure." "Dense, structured content survives lossy summarization better than narrative prose."

The problem? The code doesn’t say "summarize." And for GEO, the difference isn’t merely semantic — it changes the optimization target.

02. What the code actually does

Here’s the full makeSecondaryModelPrompt() function from prompt.ts — the exact instructions Haiku receives. I’m showing the entire thing because every word matters:

src/tools/WebFetchTool/prompt.ts — the complete function
export function makeSecondaryModelPrompt(
  markdownContent: string,
  prompt: string,
  isPreapprovedDomain: boolean,
): string {
  const guidelines = isPreapprovedDomain
    ? `Provide a concise response based on the content above.
       Include relevant details, code examples, and
       documentation excerpts as needed.`
    : `Provide a concise response based only on the content
       above. In your response:
 - Enforce a strict 125-character maximum for quotes from
   any source document. Open Source Software is ok as long
   as we respect the license.
 - Use quotation marks for exact language from articles;
   any language outside of the quotation should never be
   word-for-word the same.
 - You are not a lawyer and never comment on the legality
   of your own prompts and responses.
 - Never produce or reproduce exact song lyrics.`
  return `
Web page content:
---
${markdownContent}
---
${prompt}
${guidelines}
`
}

That’s the entire function. No summarization instruction anywhere.

Look at what Haiku receives: your page content, then the user’s original prompt (the ${prompt} variable), then guidelines about how to respond. The instruction isn’t "summarize this content." It’s "provide a concise response based on the content."
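To make the assembly order concrete, here’s a quick sanity check. The function body below is trimmed from the excerpt above (preapproved branch only, so the guidelines fit on one line); the page content and user question are made-up examples, not anything from the leaked source:

```typescript
// Trimmed from the leaked makeSecondaryModelPrompt (preapproved branch).
function makeSecondaryModelPrompt(markdownContent: string, prompt: string): string {
  const guidelines = `Provide a concise response based on the content above. Include relevant details, code examples, and documentation excerpts as needed.`;
  return `
Web page content:
---
${markdownContent}
---
${prompt}
${guidelines}
`;
}

// Hypothetical inputs — illustrative only.
const assembled = makeSecondaryModelPrompt(
  "# React Server Components\nRSC render on the server...",
  "how do I debug hydration errors in RSC?",
);

// The user's question lands AFTER the page content, and the only
// instruction is the trailing guideline — no "summarize" anywhere.
console.log(assembled.includes("summarize")); // false
```

Content first, question second, guidelines last: Haiku sees your page as context for a question, not as a document to condense.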

And here’s how it’s called in utils.ts:

src/tools/WebFetchTool/utils.ts — applyPromptToMarkdown()
const assistantMessage = await queryHaiku({
  systemPrompt: asSystemPrompt([]),   // <-- EMPTY system prompt
  userPrompt: modelPrompt,
  signal,
  options: {
    querySource: 'web_fetch_apply',
  },
})

asSystemPrompt([]) — an empty array. Haiku gets no special instructions at all.

Haiku receives an empty system prompt. No "you are a summarizer." No "extract key facts." No "preserve structure." Nothing. It gets your page content, the user’s question, and some copyright guidelines. That’s it.

What this corrects: Haiku isn’t summarizing your page. It’s answering the user’s specific question using your page as context. A summary preserves breadth — it tries to capture the whole document. A question-targeted response preserves only what’s relevant to the query being asked. These are fundamentally different operations, and they lead to fundamentally different optimization strategies.

03. Why this changes the optimization advice

Under the "Haiku summarizes" model, the advice is about structure: headings, front-loading, tables, numbered lists. Make your content easy to summarize and the important stuff will survive compression.

Under the "Haiku answers a question" model, the advice is about query alignment: make sure your content actually answers the questions people ask.

Think about the difference. Imagine a 5,000-word article about React server components. Under the summarization model, a well-structured article with clear headings would get a good summary regardless of what the user asked. Under the question-answering model, that same article might return totally different Haiku output depending on whether the user asked "what are React server components?" vs. "how do I debug hydration errors in RSC?" vs. "RSC vs. client components performance comparison."

If the user’s specific question is about debugging hydration errors and your article doesn’t mention debugging until paragraph 20, Haiku — working with no system prompt, no special instructions, just the raw content and the question — is going to struggle to extract a useful answer. It won’t "summarize" its way to your debugging section. It’ll try to answer the question with whatever it finds first, or return a thin response that the main Claude model won’t find cite-worthy.

The revised optimization: Don’t just structure your content for summarization. Structure it to answer specific questions. Identify the 3-5 most likely questions someone would ask about your topic and make sure clear, direct answers to each of those questions appear in your content — ideally near the top, and ideally as self-contained passages that Haiku can extract without needing surrounding context.

The front-loading advice from Part 1 still holds, but the reason is different. You’re not front-loading for a summarizer that scans top-down. You’re front-loading because a small model answering a question will weight the first content it encounters — and without a system prompt telling it otherwise, Haiku has no special reason to search deeper into the document for a better answer.
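One way to act on this is a quick self-audit before publishing. The sketch below is entirely mine — nothing like it exists in Claude Code — and the keyword heuristic is deliberately crude: for each likely user question, it checks whether the question’s key terms appear in the opening stretch of the page, where a small model with first-match bias is most likely to look:

```typescript
// Hypothetical query-alignment audit — not part of Claude Code.
interface AuditResult {
  question: string;
  termsFound: number;
  termsTotal: number;
  answeredEarly: boolean;
}

function auditQueryAlignment(
  pageText: string,
  questions: string[],
  windowChars = 2000, // how far into the page we expect Haiku to "look"
): AuditResult[] {
  const window = pageText.slice(0, windowChars).toLowerCase();
  return questions.map((question) => {
    // Crude keyword extraction: words of 4+ letters, minus a few stopwords.
    const stop = new Set(["what", "how", "does", "with", "this", "that", "your"]);
    const terms = [...new Set(
      question.toLowerCase().match(/[a-z]{4,}/g) ?? [],
    )].filter((t) => !stop.has(t));
    const found = terms.filter((t) => window.includes(t)).length;
    return {
      question,
      termsFound: found,
      termsTotal: terms.length,
      answeredEarly: terms.length > 0 && found / terms.length >= 0.7,
    };
  });
}
```

Run your draft against your 3-5 most likely questions; any question that doesn’t come back `answeredEarly` is a candidate for moving its answer up the page.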

04. Preapproved domains get an entirely different Haiku

The prompt differential between preapproved and non-preapproved domains is bigger than anyone has reported. It’s not just the 125-character quote limit. The entire instruction set diverges.

For non-preapproved domains (everyone outside the allowlist), Haiku gets:

Provide a concise response based only on the content above. In your response:
 - Enforce a strict 125-character maximum for quotes from any source document.
   Open Source Software is ok as long as we respect the license.
 - Use quotation marks for exact language from articles; any language outside
   of the quotation should never be word-for-word the same.
 - You are not a lawyer and never comment on the legality of your own prompts
   and responses.
 - Never produce or reproduce exact song lyrics.

For preapproved domains (~96 developer doc sites), Haiku gets:

Provide a concise response based on the content above. Include relevant
details, code examples, and documentation excerpts as needed.

Three differences jump out:

First: "based only on" vs. "based on." Non-preapproved domains get "only" — Haiku is explicitly constrained to use nothing but the page content. Preapproved domains drop that word, giving Haiku latitude to supplement with its training knowledge. That’s a subtle but meaningful difference: if your content has a gap, Haiku can fill it from training data for preapproved sites but must leave it unfilled for everyone else.

Second: preapproved domains can include "code examples and documentation excerpts as needed" with no quote limit. Non-preapproved domains are restricted to 125-character quotes with strict paraphrasing rules. This means preapproved domains can get entire code blocks reproduced faithfully, while your carefully written tutorial gets paraphrased into Haiku’s own words.

Third: the copyright guardrails — "you are not a lawyer," "never produce exact song lyrics" — only apply to non-preapproved domains. Preapproved domains are trusted. This makes sense (it’s documentation, not copyrighted journalism), but it means the legal-caution filter only affects non-preapproved content.

What this means: The two-tier system isn’t just about bypassing Haiku (as we said in Part 1). Even when preapproved content goes through Haiku, it gets permissive treatment — longer quotes, code reproduction, training knowledge supplements. For everyone else, Haiku operates under copyright-cautious restrictions that force aggressive paraphrasing. If you’re writing developer documentation that competes with preapproved domains, you’re at a double disadvantage: your content gets processed more aggressively and quoted less faithfully.
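A small practical consequence of the quote rule: if your key sentences run longer than 125 characters, they can’t survive verbatim under the restrictive prompt. The checker below is my sketch, not leaked code — it simply mirrors the rule by flagging double-quoted spans in a model response that exceed the limit:

```typescript
// Hypothetical checker mirroring the 125-character quote guideline.
// Not from the Claude Code source.
const QUOTE_LIMIT = 125;

function findOverlongQuotes(response: string, limit = QUOTE_LIMIT): string[] {
  // Collect every double-quoted span, then keep those over the limit.
  const quoted = response.match(/"([^"]*)"/g) ?? [];
  return quoted
    .map((q) => q.slice(1, -1)) // strip the surrounding quote marks
    .filter((q) => q.length > limit);
}
```

Flipped around, the same number is a writing target: keep the sentences you most want cited under ~125 characters so they remain quotable even for non-preapproved domains.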

05. Some users’ searches use a completely different model

This is the second finding I haven’t seen anyone cover, and it has significant implications for anyone running GEO experiments.

The WebSearchTool has a GrowthBook feature flag that controls which model processes the search:

src/tools/WebSearchTool/WebSearchTool.ts — call()
const useHaiku = getFeatureValue_CACHED_MAY_BE_STALE(
  'tengu_plum_vx3',
  false,
)
// ...
thinkingConfig: useHaiku
  ? { type: 'disabled' as const }
  : context.options.thinkingConfig,
// ...
model: useHaiku ? getSmallFastModel() : context.options.mainLoopModel,
toolChoice: useHaiku ? { type: 'tool', name: 'web_search' } : undefined,

When tengu_plum_vx3 is enabled, the search uses Haiku with thinking disabled. When it’s off, the search uses the main model (Opus or Sonnet) with whatever thinking configuration the user has — including extended thinking if they’re using ultrathink.

This isn’t about WebFetch (the content processing step). This is about WebSearch itself — the step where the user’s question gets transformed into search queries and search results get selected. The model that interprets your question and decides what to search for can be Haiku (small, fast, no chain-of-thought) or the main model (large, capable, potentially with extended thinking).

The default is false, so searches go through the main model out of the box. But because this is a GrowthBook flag, Anthropic can flip it for any percentage of users at any time without a code release. We have no way to know what the current split is.

Why this matters for GEO experiments: If you’re running citation experiments against Claude Code, your results may be non-deterministic in ways you can’t control. Two users asking the same question might get different search queries — one generated by Opus with extended thinking, another by Haiku with no thinking at all. The quality and specificity of those search queries determines which URLs surface, which determines what gets cited. If your experiments show inconsistent citation rates, this flag could be why. There’s also a toolChoice difference: with Haiku, the tool is force-called (type: 'tool'), while the main model can choose whether to search at all. This means Haiku-mode users always trigger a web search, while main-model users might decide the question doesn’t need one.
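To make the branching concrete, here’s a self-contained sketch of the selection logic shown above. The flag value is passed in as a plain boolean (in the real code it comes from getFeatureValue_CACHED_MAY_BE_STALE), and the model names and types are simplified placeholders of mine:

```typescript
// Sketch of the tengu_plum_vx3 branching from WebSearchTool.ts.
// Types and model names are guesses; the three ternaries mirror the excerpt.
type ToolChoice = { type: "tool"; name: string } | undefined;

interface SearchConfig {
  model: string;
  thinking: "disabled" | "inherit";
  toolChoice: ToolChoice;
}

function selectSearchConfig(useHaiku: boolean, mainLoopModel: string): SearchConfig {
  return {
    // Haiku mode swaps in the small fast model...
    model: useHaiku ? "claude-haiku" : mainLoopModel,
    // ...disables chain-of-thought...
    thinking: useHaiku ? "disabled" : "inherit",
    // ...and force-calls web_search, while the main model may skip it.
    toolChoice: useHaiku ? { type: "tool", name: "web_search" } : undefined,
  };
}
```

Two users, same question, different flag value: one gets a forced Haiku search with no thinking, the other gets the main model deciding whether to search at all.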

06. A third detail nobody’s discussed: web search is US-only

One line in the WebSearch prompt stopped me:

src/tools/WebSearchTool/prompt.ts — getWebSearchPrompt()
Usage notes:
  - Domain filtering is supported to include or block specific websites
  - Web search is only available in the US

Claude Code’s web search tool prompt explicitly states "Web search is only available in the US." I want to be careful here — this could mean several things. It might be enforced at the API level, meaning international users genuinely can’t web-search. It could be a documentation note about the beta rollout that’s since been expanded. Or it could be aspirational text that doesn’t match current behavior. Anthropic’s public docs list Claude’s availability across many countries, but don’t specifically address whether web search in Claude Code is geo-restricted.

If it is enforced, the GEO implication is significant: international Claude Code users would rely entirely on training data for their responses, never triggering web search citations. For those users, optimization shifts from "be findable in live search" to "be prominent enough to appear in training data." Worth investigating, but I’m flagging this as unconfirmed rather than asserting it as fact.

07. Revised optimization framework

Based on the actual code (correcting our Part 1 where needed):

REVISED OPTIMIZATION STACK: 8 ACTIONS

Corrected priorities based on what the code actually says.

1. Answer the most likely questions directly. Haiku answers questions rather than summarizing; query alignment beats structural optimization.

2. Make answers self-contained and near the top. Haiku has no system prompt telling it to search deeper; first-match bias is real.

3. Write descriptive title tags. Search results are {title, url}, so the title is the citation label (unchanged from Part 1).

4. Use hard numbers, specific stats, and structured data. The 125-character quote limit forces paraphrasing; unique data points survive paraphrasing intact.

5. Cover questions from multiple angles on one page. Different user queries trigger different Haiku outputs from the same page; breadth of Q&A coverage means more citation surface.

6. Accept that citation fidelity is lower for non-preapproved domains. Your words will be paraphrased; optimize for the idea being attributed, not exact quotes.

7. Account for non-determinism in experiments. tengu_plum_vx3 means search quality varies across users by model selection.

8. Consider your audience’s geography. Web search may be US-only; international users get training data only.

08. What I got wrong, and why corrections matter

I could have just updated Part 1 silently. The reason I’m writing this as a separate post is that the "Haiku summarizes your content" framing has spread beyond our article — it’s in Growtika’s analysis, Giuseppe Gurgone’s pre-leak reverse engineering, and multiple Hacker News threads. It’s becoming accepted wisdom, and it’s not quite right.

The distinction between "summarize this page" and "answer this question using this page" sounds academic until you’re deciding how to structure a 3,000-word guide. Under the summarization model, you’d optimize for scannability — headings, bullets, front-loaded facts. Under the question-answering model, you’d optimize for query coverage — making sure each likely question has a clear, extractable answer.

The best content does both. But if you’re choosing where to invest your next hour of optimization work, the code says query alignment wins. The headings aren’t for Haiku’s structural parsing — they’re for Haiku’s ability to quickly locate the answer to a specific question. Same visible output, different underlying reason, and the reason matters when the edge cases diverge.

I’ve added an update note to Part 1 linking here. If I get something wrong again, I’ll correct it again. That’s how this works.


Methodology: Same as Part 1 — direct examination of the Claude Code v2.1.88 leaked source. The key files for this article are src/tools/WebFetchTool/prompt.ts (47 lines — I’ve shown the entire function), src/tools/WebFetchTool/utils.ts (the applyPromptToMarkdown call with empty system prompt), and src/tools/WebSearchTool/WebSearchTool.ts (the tengu_plum_vx3 flag and model selection logic). Every code excerpt is verbatim.

Part 1: I Read All 512,000 Lines of Claude Code’s Leaked Source

Growtika: Claude’s Code Just Leaked — 7 GEO Findings

Alex Kim: The Claude Code Source Leak — fake tools, frustration regexes, undercover mode

Giuseppe Gurgone: How Claude Code Eats the Web

Mikhail Shilkov: Inside Claude Code’s Web Tools

Quercle: How Claude Code Web Tools Work

WaveSpeed AI: Claude Code Architecture Deep Dive

Written by AI. Obviously.