Internal Training Manual

SEO/GEO Training Manual

Operating playbook for the SEO/GEO/Website Lead at Kaizen AI Lab. How to direct Dr. Strange to audit sites, write agent-friendly copy, deploy schemas, and ship a measurable SAGEO optimization plan.

Prepared For: Jen + Brandwyn
Authors: Don Ho + Sebastian 🦀
Version: 1.1
Date: 2026-05-17
Status: APPROVED

Kaizen AI Lab — SEO/GEO Training Manual

For: Jen Villadolid (SEO/GEO/Website Lead) + Brandwyn Boyle (CGO)

This manual serves two readers with two different needs:

  • Jen needs to understand and execute the production workflow — directing Dr. Strange, reviewing his output, shipping deliverables, managing the per-client process.
  • Brandwyn needs to understand the product deeply enough to sell it — what we do, why it's faster than competitors, what's different about our framework, and what to promise (and not promise) clients.

When you see a section tagged [For Brandwyn], that's specifically positioned for sales conversations. [For Jen] sections are the execution playbook. Everything else applies to both.


Welcome, Jen + Brandwyn

This manual is the operating playbook for SEO/GEO at Kaizen AI Lab. The thing we're walking into is unusual: most agencies in this space don't have a productized framework yet. We do. And we don't run it by hand — we run it by directing an AI agent, Dr. Strange 🔮, who has five specialized skills built into him.

For Jen: your job is not to be the person who manually audits sites and writes copy. Your job is to be the strategist and director who:

  1. Tells Dr. Strange what to audit and how
  2. Reviews his output
  3. Turns his findings into a client roadmap
  4. Iterates with him to ship the work

Think of Dr. Strange as a junior analyst who never sleeps, never gets tired, never asks for a raise, but who will produce literal garbage if you don't brief him well. The quality of what comes out of this system is entirely a function of how well you brief it. That's the actual skill.

For Brandwyn: your job is to sell this. To do that, you need to understand the workflow well enough to sound credible in front of a prospect — what we do, why it's faster than anyone else, what's defensible, what we can promise, and what we can't. You don't need to execute Dr. Strange's skills; you need to describe them with confidence. This manual gives you that vocabulary.

This is the document that gets us all working from the same page.


Part 0: The Lay of the Land

What You're Walking Into

Kaizen has built — over the last 60 days — a complete SEO/GEO/AEO operating system. The pieces:

  • SEO/GEO Template v1.1 — our productized client deliverable framework (the what we ship)
  • Dr. Strange's 4 core skills: seo-audit, geo-analysis, seo-content-writer, schema-markup (the how we execute)
  • GEO Score — proprietary 0-100 measurement system that tracks AI citation rates (the proof to clients)
  • 900+ SME library files — pre-built industry knowledge that seeds Citation Magnet content (the competitive moat)
  • ahrefs-intel skill — colony-shared Ahrefs data layer (full keyword, backlink, traffic, SERP, and broken-link intel via API). You never have to open ahrefs.com to do your job. Dr. Strange calls this skill for SEO work; Black Widow and Jarvis use it too. The measurement layer is autonomous.
  • Discord-based workflow — every Dr. Strange job for an active client posts to that client's dedicated #client-[slug] channel, where Jen reviews and tags Don for sign-off (the quality gate). #seo-geo is reserved for miscellaneous inquiries and pre-conversion prospect work only — no formal client deliverables land there

Why GEO Matters Right Now

The market has fundamentally shifted. From a16z's 2025 analysis:

  • ChatGPT queries are 23 words on average vs. Google's 4-word average
  • Session length in AI search: ~6 minutes vs. Google's 60-90 seconds
  • ChatGPT is already driving referral traffic to tens of thousands of distinct domains
  • Apple is building Perplexity/Claude into Safari — Google's distribution chokehold is cracking

From Andrew Warner's interview with Zapier (March 2026):

  • Zapier is mentioned millions of times per month by LLMs in product recommendations
  • The play is no longer "rank on Google" — it's "be the brand the model recommends"

From @denohawari (March 2026):

  • His team has driven $30.52M in client revenue using LLM SEO over the past year
  • A B2B SaaS scaled from $20k to $100k MRR in 4 months — 760%+ non-branded traffic growth
  • Method: "decision pages" — [competitor] vs [your brand], alternatives to [X], best [tool] for [specific use case]

Translation: Traditional SEO still drives ~60% of website discovery, but it's the floor, not the ceiling. The growth vector is GEO/AEO — being cited when someone asks ChatGPT, Perplexity, Claude, or Google AI Overviews for a recommendation in your client's industry.

The ⅓ / ⅔ Heuristic (Calibratable, Not Fixed)

Our current allocation for client work: ⅓ of effort on SEO foundation, ⅔ on GEO/AEO optimization.

This is a starting baseline, not a law. After 3-5 client builds produce real data, we calibrate. Some clients (local service businesses with strong existing SEO) may need 20/80. Brand-new businesses with no web presence may need 50/50 until their SEO floor exists.

The Three Disciplines Defined

Discipline | Goal | Key Mechanism | Key Metric
SEO | Rank in Google/Bing | Keywords, backlinks, technical, E-E-A-T | Rankings, organic traffic, CTR
GEO | Get cited in AI responses | Structured content, statistical claims with citations, domain optimization | AI citation frequency, AI Share of Voice
AEO | Be the direct answer (voice + AI Overviews) | Declarative sentences, Q&A format, Speakable schema | Brand mentions in AI answers, zero-click visibility

Part 1: Meet Dr. Strange — The Tool You're Directing

Dr. Strange 🔮 is an autonomous AI agent in our colony with five specialized skills. He lives in Discord. You instruct him via natural-language prompts; he produces deliverables; everything routes through the client's dedicated Discord channel for Don's review before going to the client.

His fifth skill — ahrefs-intel — lives in the shared colony skills directory (kaizen-colony/skills/ahrefs-intel/SKILL.md) because other bots can also call it (Black Widow for bizdev intel, Jarvis for research). But Dr. Strange is the primary owner. He chains it into his audit and content flows automatically. You can also invoke it directly when you need ad-hoc Ahrefs intel outside an audit cycle.

His Five Skills

Skill 1: seo-audit — Technical + On-Page + Local Auditing

What it does: Full diagnostic of a website. Crawls up to 50 pages, runs Lighthouse + PageSpeed Insights, validates schema, checks Core Web Vitals, audits on-page SEO (titles, metas, headings, content depth), checks local SEO (GBP, NAP consistency, citations), and compares against competitors if you provide them.

Inputs you give him:

  • target_url — the site to audit (required)
  • client_name — for file organization (required)
  • audit_scope: full | technical_only | on_page_only | local_only (default: full)
  • keyword_targets — array of keywords to check rankings for (optional)
  • competitor_urls — array of competitor URLs (optional, but recommended)

What you get back:

  • seo-audit-report.md — full human-readable report with severity-ranked issues
  • technical-issues.json — machine-readable issue list (category, severity, recommendation, effort)
  • keyword-rankings.json — Brave Search + Ahrefs data per keyword
  • competitor-comparison.md — side-by-side comparative analysis

Cost per audit: ~$0.30 in API calls. High-margin, demo-ready.

How to invoke (paste this in the client's #client-[slug] channel):

Dr. Strange — run seo-audit:
  client_name: "Rideout Law"
  target_url: "https://rideoutlawgroup.com"
  audit_scope: "full"
  keyword_targets:
    - "California foreclosure attorney"
    - "wrongful foreclosure lawyer"
    - "loan modification attorney California"
  competitor_urls:
    - "https://competitor1.com"
    - "https://competitor2.com"

Skill 2: geo-analysis — AI Citation Audit (Our Secret Weapon)

What it does: Fires 10-20 target queries at GPT-4o and Grok, running each query 3 times per model. It classifies each response as cited (1.0) / referenced (0.5) / absent (0.0), takes a majority vote across the 3 runs, and calculates a GEO Score 0-100 per model + overall. Also runs Brave Search for a traditional SERP baseline. Stores results in an append-only history file for trend tracking.
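
The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual code: the manual does not state how ties are broken or how per-query votes are averaged, so the median fallback, the unweighted means, and the function names `majority_vote` and `geo_score` are all assumptions.

```python
from statistics import mean, median

# Score values named in the manual: cited = 1.0, referenced = 0.5, absent = 0.0.
VALID_SCORES = {1.0, 0.5, 0.0}


def majority_vote(runs: list[float]) -> float:
    """Return the score that most runs agree on.

    Assumption: when no score has a strict majority (e.g. 1.0, 0.5, 0.0),
    fall back to the median run. The real skill may break ties differently.
    """
    if not runs or any(r not in VALID_SCORES for r in runs):
        raise ValueError("each run must be 1.0, 0.5, or 0.0")
    for score in VALID_SCORES:
        if runs.count(score) > len(runs) / 2:
            return score
    return median(runs)


def geo_score(results: dict[str, dict[str, list[float]]]) -> dict[str, float]:
    """Compute a 0-100 score per model plus an overall score.

    `results` maps model -> query -> list of per-run classifications.
    Assumption: a model's score is the mean of its per-query votes x 100,
    and the overall score is the unweighted mean of the model scores.
    """
    per_model = {
        model: round(100 * mean(majority_vote(runs) for runs in queries.values()), 1)
        for model, queries in results.items()
    }
    per_model["overall"] = round(mean(per_model.values()), 1)
    return per_model
```

Under these assumptions, a model that is cited on half of its queries and absent on the rest scores 50, which is the kind of number the scorecard puts in front of a prospect.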

This is the skill that closes deals. When a prospect asks "why should I pay you?", you run this against their current site, show them they score 12/100, and show them their competitor scores 47/100. Game over.

Inputs:

  • client_url — required
  • client_name — required
  • test_queries — array, minimum 10 queries (the questions ideal customers would ask AI)
  • ai_models: ["openai", "grok"]

What you get back:

  • geo-report.md — human-readable scorecard with query-by-query breakdown
  • geo-scores.json — machine-readable scores per model + per query
  • data/geo-history/{client_slug}.json — append-only historical log for trend tracking
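
To show what "append-only" means in practice, here is a small Python sketch. The record layout (a JSON array of objects with `date` and `overall` keys) and both function names are hypothetical; the manual only states that the history file exists and is never rewritten.

```python
import json
from datetime import date
from pathlib import Path


def append_geo_run(history_path: Path, run_date: date, overall_score: float) -> list[dict]:
    """Append one run to the history file without modifying earlier entries."""
    history = json.loads(history_path.read_text()) if history_path.exists() else []
    history.append({"date": run_date.isoformat(), "overall": overall_score})
    history_path.parent.mkdir(parents=True, exist_ok=True)
    history_path.write_text(json.dumps(history, indent=2))
    return history


def score_change_since_baseline(history: list[dict]) -> float:
    """Latest overall score minus the first (baseline) overall score."""
    if not history:
        raise ValueError("history is empty")
    return history[-1]["overall"] - history[0]["overall"]
```

Because earlier entries are never touched, the first record stays the baseline that every monthly re-run is compared against.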

Cost per audit: ~$0.15-0.30. Re-run monthly to show progress.

Query design tips (this is the leverage point):

  • Mirror how actual buyers prompt AI, not how SEO pros think about keywords
  • LLM queries are 23 words on average — write long, natural prompts
  • Mix branded ("Is [client] a good [service]?") and unbranded ("best [service] in [city]") queries
  • Include comparison queries ("[client] vs [competitor]")
  • Include decision queries ("Should I hire [client] for [use case]?")
  • Include problem-aware queries ("My [problem]. Who should I call?")
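
As a starting point for drafting, the query styles above can be filled from templates. This is only a sketch: the function names and parameters are hypothetical, and templated output should be rewritten into longer, natural prompts before it goes to Dr. Strange.

```python
def build_test_queries(client: str, service: str, city: str,
                       competitor: str, problem: str, use_case: str) -> list[str]:
    """Fill one template per query style named in the tips above."""
    return [
        f"Is {client} a good {service}?",           # branded
        f"best {service} in {city}",                # unbranded
        f"{client} vs {competitor}",                # comparison
        f"Should I hire {client} for {use_case}?",  # decision
        f"{problem}. Who should I call?",           # problem-aware
    ]


def meets_minimum(queries: list[str], minimum: int = 10) -> bool:
    """geo-analysis needs at least 10 test queries; duplicates are not counted here."""
    return len(set(queries)) >= minimum
```

One pass through the templates yields five queries, so a second set of variations (different cities, services, or problems) is needed to reach the minimum of 10.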

How to invoke:

Dr. Strange — run geo-analysis:
  client_name: "Rideout Law"
  client_url: "https://rideoutlawgroup.com"
  ai_models: ["openai", "grok"]
  test_queries:
    - "best foreclosure attorney in California"
    - "what should I do if I receive a notice of default in California"
    - "Rideout Law Group reviews"
    - "California foreclosure defense lawyer near me"
    - "can I sue my lender for wrongful foreclosure in California"
    - "Rideout Law vs [competitor]"
    - "I just got a foreclosure notice in Orange County, who do I call"
    - "best lawyer for loan modification in California"
    - "California homeowner facing foreclosure what are my options"
    - "specialized foreclosure attorneys Sacramento California"

Skill 3: seo-content-writer — Client Website Copy

What it does: Writes web copy for the CLIENT'S brand (not Don's voice — the client's voice). Adapts to brand tone, incorporates SME context, targets specific keywords, and optionally applies GEO optimization patterns.

Content types he can produce:

  • service_page — Service pages for any industry (1000-1800 words) — the default for most clients
  • practice_area: Legal/law firm clients only. Practice area pages (1200-2000 words). Functionally the legal industry's name for service pages, with conventions (jurisdictions, case results, attorney credentials) baked in. Don't use this for non-legal clients; pick service_page instead.
  • location_page — Location-specific pages (800-1500 words) — any industry with geographic service areas
  • faq — FAQ pages (1500-3000 words) — any industry
  • blog — Blog posts (1000-2500 words) — any industry
  • landing_page — Conversion pages (500-1000 words) — any industry
  • bio — Team bios (300-500 words) — any industry
  • meta_descriptions — Batch title tags + meta descriptions — any industry
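
A brief can be sanity-checked against the list above before it is sent. The sketch below copies the word-count ranges from that list; the function name `check_brief` and the `is_legal_client` flag are hypothetical, and nothing here is part of the skill itself.

```python
from typing import Optional

# Word-count ranges copied from the content-type list above.
# meta_descriptions is a batch job with no stated range, so it maps to None.
WORD_RANGES: dict[str, Optional[tuple[int, int]]] = {
    "service_page": (1000, 1800),
    "practice_area": (1200, 2000),
    "location_page": (800, 1500),
    "faq": (1500, 3000),
    "blog": (1000, 2500),
    "landing_page": (500, 1000),
    "bio": (300, 500),
    "meta_descriptions": None,
}


def check_brief(content_type: str, word_count_target: int, is_legal_client: bool) -> list[str]:
    """Return a list of problems with a brief; an empty list means it looks fine."""
    if content_type not in WORD_RANGES:
        return [f"unknown content_type: {content_type}"]
    problems = []
    if content_type == "practice_area" and not is_legal_client:
        problems.append("practice_area is for legal clients only; use service_page")
    word_range = WORD_RANGES[content_type]
    if word_range and not word_range[0] <= word_count_target <= word_range[1]:
        problems.append(
            f"word_count_target {word_count_target} is outside {word_range[0]}-{word_range[1]}"
        )
    return problems
```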

Reminder: Kaizen works across many verticals, including financial services, F&B, real estate, tea/hospitality, SaaS, healthcare, professional services, and more. The four core Dr. Strange skills are industry-agnostic. The only place industry matters in this skill is practice_area (legal only) vs service_page (everyone else). All other content types are universal.

Inputs:

  • client_name — required
  • content_type — required (one of above)
  • target_keywords — required array
  • word_count_target — required integer
  • client_voice — required object with three keys:
    • description (string, required) — tonal description like you'd brief a new copywriter ("Professional, authoritative, warm but not casual. Trust-building. Avoid legal jargon...")
    • existing_copy_urls (array of URLs, recommended) — 2-5 URLs from the client's existing website that best represent the voice we want to match (about page, top service pages, founder bio). Dr. Strange fetches and reads these as calibration anchors.
    • existing_blog_urls (array of URLs, recommended): 2-5 URLs from the client's existing blog/resource content. These show how the client writes in long-form, which is critical for new article generation.
    Why both: the description tells Dr. Strange the target; the existing URLs show him the actual baseline he's matching to. If the client says "warm and approachable" but their existing copy is dry and corporate, the URLs win: we match the real voice, not the aspirational one. If you want to deliberately upgrade the voice, say so in the description and Dr. Strange will use the existing copy as a "do not regress past this point" floor while elevating tone.
  • sme_context — optional array of SME knowledge strings or file paths
  • geo_optimize — optional boolean (apply GEO patterns)

What you get back:

  • content.md — finished copy
  • seo-metadata.json — title tag, meta description, OG tags, canonical
  • schema-suggestions.json — recommended JSON-LD schema types

How to invoke:

Dr. Strange — run seo-content-writer:
  client_name: "Rideout Law"
  content_type: "practice_area"
  target_keywords: ["wrongful foreclosure", "foreclosure defense attorney California", "predatory lending lawsuit"]
  word_count_target: 1500
  client_voice:
    description: "Professional, authoritative, warm but not casual. Trust-building. Avoid legal jargon — clients are scared homeowners, not lawyers."
    existing_copy_urls:
      - "https://rideoutlawgroup.com/about/"
      - "https://rideoutlawgroup.com/foreclosure-defense/"
      - "https://rideoutlawgroup.com/attorneys/"
    existing_blog_urls:
      - "https://rideoutlawgroup.com/blog/notice-of-default-california/"
      - "https://rideoutlawgroup.com/blog/loan-modification-guide/"
  sme_context:
    - "/data/workspace/sme-cowork-library/coworkhive-sme-library-900/foreclosure-defense-sme.md"
  geo_optimize: true

Skill 4: schema-markup — JSON-LD Structured Data

What it does: Generates, validates, or audits JSON-LD structured data. This is the machine-readable layer that AI crawlers consume — it's how you tell Google and ChatGPT "this is a LegalService at this address with these specific services."

Three actions:

  • generate — produces JSON-LD for a given schema type
  • validate — checks existing JSON-LD against schema.org spec
  • audit_existing — fetches a page, finds all schema, identifies gaps + errors

Supported schema types:

  • LocalBusiness / LegalService / MedicalBusiness / etc.
  • FAQPage — Q&A pairs that produce FAQ rich results
  • Article — for blog posts (headline, author, dates)
  • Person — for team bios
  • Organization — sitewide entity definition
  • BreadcrumbList — for interior pages
  • Service — for service offerings
  • Review — for testimonials

Data-Sourcing Responsibility (Dr. Strange Owns This)

Dr. Strange confirms and fills out the schema's entity_data autonomously before generating. He is NOT to ask Jen for fields he can discover himself. His standard pre-generation sweep:

  1. Client intake Google Sheet — read every field the client already provided (name, address, phone, email, GBP URL, services, service areas, hours, years in business, etc.)
  2. Live site crawl — fetch the client's website and extract: existing footer NAP, contact page, about page, services pages, team bios, hours, social handles, payment methods accepted
  3. Google Business Profile — pull NAP, hours, attributes, categories, photos count, review count + rating, GBP URL
  4. Existing schema on the site: run audit_existing first; what's already declared correctly is reused, not regenerated
  5. sameAs enrichment — search the open web (Brave) for the client's Wikipedia, Wikidata, LinkedIn, Crunchbase, Yelp, BBB, Facebook, X, Instagram, YouTube, official-directory profiles. Every matched URL gets added to sameAs for entity disambiguation (Kalicube gate #4)
  6. Vertical-specific fields:
    • Legal: bar admissions + jurisdictions + practice areas (from attorney bios + state bar lookups)
    • Medical: NPI numbers + specialties + insurances accepted (from contact/insurance pages)
    • F&B: cuisine type + price range + delivery channels (from menu + footer)
    • SaaS: pricing tier + integrations + ICP (from pricing + integrations pages)

Only escalate to Jen when:

  • A required field is genuinely missing from all sources (e.g., client never provided area_served, site doesn't list it, GBP doesn't have it)
  • Two sources conflict and Dr. Strange can't determine which is canonical (e.g., site says "Sacramento, CA" but GBP says "Roseville, CA" — that's a Jen call)
  • Vertical-specific field requires legal/regulatory judgment (e.g., HIPAA-sensitive medical data, attorney-advertising compliance for jurisdictions Dr. Strange isn't sure about)
  • Client has explicitly flagged a field as "do not publish" and Dr. Strange needs a substitute

The format when escalating: a single consolidated question block in the client's #client-[slug] channel with every missing/ambiguous field listed at once, his best guess for each, and the source he checked. Not a stream of one-field-at-a-time pings. Jen answers, Dr. Strange generates, Jen tags Don for sign-off.
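
The gather-then-escalate rule can be pictured as a merge over the sources with two escalation triggers: missing everywhere, or conflicting. The Python below is a rough sketch under simplifying assumptions: values are compared as exact strings (a real sweep would normalize addresses and phone numbers first), the first source listed supplies the best guess, and `resolve_fields` is a hypothetical name.

```python
def resolve_fields(required: list[str],
                   sources: dict[str, dict[str, str]]) -> tuple[dict[str, str], list[str]]:
    """Fill each required field from the sources; collect anything Jen must decide.

    `sources` maps a source name (e.g. "intake_sheet", "site", "gbp") to the
    fields found there. Returns (resolved fields, escalation lines).
    """
    resolved: dict[str, str] = {}
    escalations: list[str] = []
    for field in required:
        found = {name: data[field] for name, data in sources.items() if data.get(field)}
        distinct = set(found.values())
        if not found:
            escalations.append(f"{field}: missing from all sources ({', '.join(sources)})")
        elif len(distinct) > 1:
            seen = "; ".join(f"{name} says {value!r}" for name, value in found.items())
            best_guess = next(iter(found.values()))  # first source listed wins as the guess
            escalations.append(f"{field}: conflict ({seen}); best guess {best_guess!r}")
        else:
            resolved[field] = distinct.pop()
    return resolved, escalations
```

Joining the returned escalation lines into one message gives the single consolidated question block described above, rather than one ping per field.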

Why this matters: schema is the most data-hungry skill in the stack — fifty fields can hide inside a single LegalService block. If Dr. Strange asks Jen for every field, the workflow collapses. If he gathers what he can and surfaces only the genuine gaps, Jen reviews schemas instead of authoring them.

How to invoke (audit existing site):

Dr. Strange — run schema-markup:
  action: "audit_existing"
  client_name: "Rideout Law"
  page_url: "https://rideoutlawgroup.com"

How to invoke (generate new schema):

Dr. Strange — run schema-markup:
  action: "generate"
  schema_type: "LegalService"
  client_name: "Rideout Law"
  entity_data:
    name: "Rideout Law Group"
    description: "California foreclosure defense and wrongful foreclosure attorneys"
    url: "https://rideoutlawgroup.com"
    phone: "+19165550101"
    email: "info@rideoutlawgroup.com"
    address:
      street: "..."
      city: "Sacramento"
      state: "CA"
      zip: "95814"
    geo:
      lat: 38.5816
      lng: -121.4944
    services: ["Foreclosure Defense", "Loan Modification", "Wrongful Foreclosure Lawsuits"]
    area_served: ["California", "Sacramento County", "Los Angeles County", "Orange County"]
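
For orientation, this is roughly how the entity_data keys above could map onto schema.org properties. The manual does not show the skill's real output, so treat the Python below as an illustrative sketch: the function name is hypothetical, and listing services through hasOfferCatalog is one common modeling choice among several.

```python
import json


def legal_service_jsonld(entity: dict) -> str:
    """Map entity_data keys to schema.org properties and return a JSON-LD string."""
    address = entity["address"]
    doc = {
        "@context": "https://schema.org",
        "@type": "LegalService",
        "name": entity["name"],
        "description": entity["description"],
        "url": entity["url"],
        "telephone": entity["phone"],
        "email": entity["email"],
        "address": {
            "@type": "PostalAddress",
            "streetAddress": address["street"],
            "addressLocality": address["city"],
            "addressRegion": address["state"],
            "postalCode": address["zip"],
        },
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": entity["geo"]["lat"],
            "longitude": entity["geo"]["lng"],
        },
        "areaServed": entity["area_served"],
        # One common way to list services; other modelings are possible.
        "hasOfferCatalog": {
            "@type": "OfferCatalog",
            "name": "Services",
            "itemListElement": [
                {"@type": "Offer", "itemOffered": {"@type": "Service", "name": s}}
                for s in entity["services"]
            ],
        },
    }
    return json.dumps(doc, indent=2)
```

The returned string is what ends up inside a `<script type="application/ld+json">` tag on the page.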

Skill 5: ahrefs-intel — Ahrefs API Intelligence (Shared Colony Skill, Dr. Strange-Owned)

What it does: Single interface for all Ahrefs API data inside the colony — keyword intel, backlink/refdomain profiles, traffic history, SERP analysis, content gaps, rank tracking, broken-link recovery, anchor health. Replaces every manual ahrefs.com UI session our team would otherwise run.

Lives at: kaizen-colony/skills/ahrefs-intel/SKILL.md — shared so Black Widow + Jarvis can use it too, but Dr. Strange is the primary caller in SEO/GEO workflows.

Twelve actions:

  • domain_overview — DR, traffic, refdomains snapshot (cheap first probe, ~5 units)
  • traffic_history — 6-month organic + refdomains trendline
  • top_pages — what's currently driving traffic (don't break these)
  • keyword_universe — every keyword the client ranks for, quick-wins auto-flagged
  • rank_tracker — curated keyword list tracked over time with weighted visibility score + auto-trend detection (canonical client-reportable ranking source)
  • content_gap — keywords competitors rank for that client doesn't, leverage-scored
  • backlink_intel — refdomain summary + anchor distribution + recent backlinks
  • broken_backlinks — broken inbound links with outreach priority (highest-ROI recovery play)
  • anchor_intel — over-optimization warnings (>25% exact-match = risk)
  • keyword_questions — real "People Also Ask" data → FAQ + blog seeds
  • keyword_research — full keyword expansion from seeds (overview + matching terms)
  • serp_analysis — what's currently ranking for a target keyword
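
The anchor_intel warning in the list above comes down to a share calculation. A minimal Python sketch follows, assuming "exact-match" means the anchor text equals a target keyword after trimming and lowercasing; the skill's own matching rules are not documented here, and both function names are hypothetical.

```python
from collections import Counter


def exact_match_share(anchors: list[str], target_keywords: set[str]) -> float:
    """Fraction of backlink anchors whose text exactly matches a target keyword."""
    if not anchors:
        return 0.0
    normalized = Counter(a.strip().lower() for a in anchors)
    targets = {k.strip().lower() for k in target_keywords}
    exact = sum(count for text, count in normalized.items() if text in targets)
    return exact / len(anchors)


def is_over_optimized(anchors: list[str], target_keywords: set[str],
                      threshold: float = 0.25) -> bool:
    """Flag a profile when the exact-match share is above the 25% risk line."""
    return exact_match_share(anchors, target_keywords) > threshold
```

With 3 exact-match anchors out of 10 the share is 30%, which crosses the line; 2 out of 10 does not.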

Budget: 400K units/month workspace-wide. Full new-client baseline sweep = ~500-800 units. We can run ~500-800 new client baselines per month OR sustain ~2,500 active retainer clients on this plan alone.
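
The capacity figures follow from simple division, worked here in Python. The per-retainer number is only what the manual's totals imply, not a stated per-client cost.

```python
MONTHLY_UNITS = 400_000
BASELINE_UNITS_LOW, BASELINE_UNITS_HIGH = 500, 800

# Baselines per month if the whole budget went to new-client sweeps.
max_baselines = MONTHLY_UNITS // BASELINE_UNITS_LOW    # cheapest sweeps
min_baselines = MONTHLY_UNITS // BASELINE_UNITS_HIGH   # most expensive sweeps

# Implied monthly allowance per client if all units went to ~2,500 retainers.
units_per_retainer = MONTHLY_UNITS / 2_500
```

The two capacities are alternatives for the same 400K units, so a month that mixes new baselines and retainers lands somewhere between them.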

Brand Radar (Ahrefs's AI-mention tracker) has no public API as of 2026-05-17 — confirmed. For AI citation tracking, geo-analysis is the source of truth. We'll revisit if Ahrefs ships a /brand-radar/* v3 route.

How to invoke (baseline sweep — Dr. Strange chains this into seo-audit automatically, but you can call it standalone):

Dr. Strange — run ahrefs-intel baseline:
  client_name: "Rideout Law"
  target: "rideoutlawgroup.com"
  competitors: ["comp1.com", "comp2.com", "comp3.com"]
  country: "us"
  date_from: "2025-11-17"
  date_to: "2026-05-17"

How to invoke (monthly rank tracker for retainer reporting):

Dr. Strange — run ahrefs-intel:
  action: "rank_tracker"
  client_name: "Rideout Law"
  target: "rideoutlawgroup.com"
  keywords: [<the 50-100 keyword priority list>]
  country: "us"
  date: "2026-05-17"

You should not need to open ahrefs.com to do your job. If you ever find yourself wanting to (ad-hoc visualization, a feature we haven't wrapped, exploratory click-through), tell me — we'll either add it to ahrefs-intel or get you a seat.


Part 2: The Standard Engagement Flow

This is the playbook for every new SEO/GEO client. It explains the what and the why for each phase.

[For Jen] The action-layer checklist lives in a separate file: kaizen/training-manuals/seo-geo-client-checklist-v1.md. Copy that file into /data/workspace/clients/[client-slug]/checklist.md for each new client, fill in the header block, and check items off as you go. That file is the audit trail. This section is the reasoning behind every item on it. When you need the why, read here. When executing, work the checklist.

[For Brandwyn] This section is the sales narrative. When a prospect asks "what do you actually do?", this is the structure you walk them through — Discovery + Baseline in 3 days, then a 30-day parallel-stream build, then monthly retainer. The deliverables, the timeline, and the agent-stream parallelism are the proof points that justify our pricing and speed.

Phase 1: Discovery + Baseline (Day 1-3)

Goal: Establish where the client currently stands. Build the proof points you'll use to demonstrate ROI later.

Step 1: Client intake via per-client Google Form → Google Sheet (Sebastian builds)

For each new Kaizen client, Sebastian generates a dedicated Google Form pre-filled with the right business + service + market questions for that vertical (legal, F&B, real estate, SaaS, healthcare, etc.). The form lives in the client's onboarding email and the link gets dropped in their dedicated Discord channel. Answers flow into a per-client Google Sheet that Dr. Strange reads at audit time.

You don't have to chase the client for context fields by hand — Sebastian provisions the form when the client channel is created, the client fills it in async, and Dr. Strange consumes it via Sheets API on first audit.

The canonical Form specification lives at kaizen/intake/client-intake-form-template.md. That doc has the full 30-question schema, vertical add-ons (legal, F&B, real-estate, SaaS, professional services), and the provisioning workflow.

Status as of 2026-05-17: The Form spec is finalized and ready to provision. The provisioning script needs one more OAuth scope grant from Don (forms.body, drive.file, spreadsheets) before it can run autonomously. Until that happens, the manual fallback is documented in the intake template — Don creates the first few Forms by hand from the spec, ~10 minutes each, while we wait for the scope grant. See kaizen/intake/client-intake-form-template.md → "Activation."

The canonical intake fields below describe what the Form collects (summary view — the full spec is in the intake template).

CLIENT CONTEXT SCHEMA — [Client Name]

BUSINESS BASICS:
- Business name
- Primary URL
- Address (if local business)
- Phone
- GBP URL
- Years in business
- Team size

SERVICES + MARKET:
- Primary service
- Secondary services
- Service areas
- Target customer
- Average customer value

SEO GOALS:
- Top 5 keywords client wants to rank for
- Keywords currently ranking for (if known)
- Keywords client should rank for but doesn't

CURRENT STANDINGS:
- Reviews: total, star rating, monthly volume
- GBP monthly views (if known)
- Monthly website traffic (if known)
- Map pack status
- Biggest SEO problem (one sentence)

COMPETITORS (minimum 3):
1. name — URL — GBP if local — why they're beating us
2. name — URL — GBP if local — why they're beating us
3. name — URL — GBP if local — why they're beating us

VOICE + EXISTING CONTENT (for client_voice calibration — see Pillar B):
- Brand voice description (how do you want to sound?)
- 2-5 existing web page URLs that best represent your current voice
- 2-5 existing blog/resource URLs that best represent your current voice (if any)
- Anything about your voice you want to deliberately change

WHAT'S ALREADY BEEN TRIED:
- List prior SEO work, agencies, tools, results

This context gets referenced by Dr. Strange in every subsequent skill invocation. Do not skip this step. This is the #1 reason agencies produce generic-feeling work — they never load the business context before starting. The Google Form workflow exists to make it impossible to skip.
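
One way to make "do not skip this step" checkable is to validate the intake record before any skill runs. The sketch below covers only a handful of the fields above; the class names, attribute names, and the `gaps` method are hypothetical and do not reflect the Form or Sheet layout.

```python
from dataclasses import dataclass, field


@dataclass
class Competitor:
    name: str
    url: str
    why_ahead: str


@dataclass
class ClientContext:
    business_name: str
    primary_url: str
    primary_service: str
    top_keywords: list[str] = field(default_factory=list)
    competitors: list[Competitor] = field(default_factory=list)
    voice_description: str = ""
    voice_page_urls: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """List intake gaps to chase before running any skill."""
        problems = []
        if len(self.competitors) < 3:
            problems.append(f"need at least 3 competitors, got {len(self.competitors)}")
        if len(self.top_keywords) < 5:
            problems.append(f"need the top 5 keywords, got {len(self.top_keywords)}")
        if not self.voice_description:
            problems.append("brand voice description is missing")
        if not 2 <= len(self.voice_page_urls) <= 5:
            problems.append("provide 2-5 existing page URLs for voice calibration")
        return problems
```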

Why Google Forms (not Typeform): we stay on existing tools (Google Workspace). One less SaaS subscription, native integration with Sheets, and Dr. Strange reads Sheets natively.

Step 2: Run seo-audit (full scope)

This gives you the technical + on-page + local picture.

Step 3: Run geo-analysis (10-20 queries)

This gives you the AI visibility baseline. Save the date — every monthly re-run compares against this.

Step 4: Run schema-markup (action: audit_existing)

This shows you what structured data exists, what's broken, and what's missing.

Step 5: Run ahrefs-intel for the full Ahrefs baseline sweep

This is a shared colony skill — Dr. Strange calls it automatically as part of his SEO audit workflow, but you can also invoke it directly when you need ad-hoc intel.

A full new-client baseline sweep includes:

  • domain_overview — DR, traffic, refdomains snapshot
  • traffic_history — 6-month organic traffic + refdomains trendline
  • top_pages — what's currently driving traffic (don't break these)
  • keyword_universe — every keyword the client ranks for, with quick-wins (position 4-20, volume ≥ 50, KD ≤ 50) auto-flagged
  • content_gap — keywords competitors rank for that the client doesn't, leverage-scored
  • backlink_intel — referring domains, anchor distribution, recent backlinks
  • broken_backlinks — broken inbound links with outreach priority — the highest-ROI recovery play
  • anchor_intel — over-optimization warnings (>25% exact-match = risk)
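
The quick-win thresholds in the keyword_universe item are easy to apply by hand when spot-checking a result file. A minimal Python sketch, assuming each keyword row is a dict with `position`, `volume`, and `kd` keys (hypothetical names; the real JSON layout may differ) and sorting by volume as an arbitrary choice:

```python
def is_quick_win(position: int, volume: int, kd: int) -> bool:
    """Apply the manual's thresholds: position 4-20, volume >= 50, KD <= 50."""
    return 4 <= position <= 20 and volume >= 50 and kd <= 50


def quick_wins(keywords: list[dict]) -> list[dict]:
    """Return quick-win rows, highest search volume first."""
    hits = [k for k in keywords if is_quick_win(k["position"], k["volume"], k["kd"])]
    return sorted(hits, key=lambda k: k["volume"], reverse=True)
```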

You invoke it in the client's #client-[slug] channel:

Dr. Strange — run ahrefs-intel baseline:
  client_name: "Rideout Law"
  target: "rideoutlawgroup.com"
  competitors: ["comp1.com", "comp2.com", "comp3.com"]
  country: "us"
  date_from: "2025-11-17"
  date_to: "2026-05-17"

Unit cost: ~500-800 Ahrefs API units per full client baseline (we have 400K/month — capacity for ~500-800 new client baselines monthly, or ~2,500 active retainers).

What you get back:

  • domain_overview.json + domain_overview-summary.md
  • traffic_history.json + chart-ready time series
  • top_pages.json ranked by traffic
  • keyword_universe.json with quick-wins flagged in summary
  • content_gap.json sorted by leverage score (run, don't walk, on these)
  • backlink_intel.json with anchor distribution
  • broken_backlinks.json with outreach priority per link
  • anchor_intel.json with over-optimization warnings

Brand Radar (Ahrefs's 2026 AI-mention tracker) has no public API as of 2026-05-17 — confirmed. For now, our internal geo-analysis skill is the source of truth for AI citation tracking. Revisit if Ahrefs ships a /brand-radar/* v3 route.

Deliverable for Phase 1 (two formats):

  1. Internal: client-baseline-report.md posted in the client's dedicated Discord channel for Don's review.
  2. Client-facing: an HTML version of the same report, published to a private Cloudflare Pages link (or equivalent) that Jen/Don sends to the client. Dr. Strange generates both formats from the same source data — the MD is the working draft; the HTML is what the client actually clicks. Never send the raw MD to the client.

Both formats summarize:

  • Overall SEO Score from seo-audit (0-100)
  • Overall GEO Score from geo-analysis (0-100)
  • Top 5 critical technical issues
  • Top 5 keyword gaps
  • Top 5 schema gaps
  • Competitor position summary
  • Recommended deployment model (see Part 4 below)

Phase 2: The Three Re-Architecture Pillars (Week 1-4)

Once you have the baseline, Don approves a deployment model and budget, and you start the actual work. Three pillars run in parallel:

Pillar A: Re-Architect the Schema

Goal: Every page has correct, complete, AI-citable structured data.

Who does what: Dr. Strange does the schema work autonomously — audit, generate, validate, deliver. Jen reviews his output, flags any vertical-specific issues (e.g., LegalService nuance, MedicalBusiness compliance), and signs off before deployment. You are not re-architecting schemas by hand. You are the editor.

Workflow (Dr. Strange runs this end-to-end; Jen reviews at step 4):

  1. Dr. Strange runs schema-markup audit_existing against the live site to identify gaps
  2. For each gap, Dr. Strange autonomously runs schema-markup with action: generate, the correct schema_type, and the entity data already on file from the client intake Sheet
  3. Dr. Strange validates every generated schema against Google's Rich Results Test (https://search.google.com/test/rich-results) before delivery
  4. Jen reviews the consolidated schema package in the client's Discord channel — checks vertical correctness, business-fact accuracy, and overall completeness. Tag Don in-channel when ready for final sign-off.
  5. Dr. Strange (or Carson, for greenfield builds) deploys the JSON-LD into <head> per page

Schema priority order (from kalicube-geo-playbook.md, Jason Barnard, April 2026):

  1. Entity definition first: Organization or LocalBusiness sitewide (this is the "Entity Home" that AI uses to identify the business)
  2. Service / LegalService schemas on every service page
  3. FAQPage on FAQ sections (single highest-ROI rich result)
  4. Person on bio pages
  5. Article on blog posts
  6. BreadcrumbList sitewide
  7. Review schemas where testimonials exist
  8. Speakable markup on the 2-3 most important answer paragraphs per page (for voice + AI Overviews)

Critical technical detail (from @Charles_SEO, March 2026):

"Googlebot only fetches the first 2MB of your page's HTML. Everything after that cutoff doesn't exist to Google — not fetched, not rendered, not indexed. Make sure you put your meta tags, title, canonicals, and structured data as HIGH as possible in the document. If they're below the 2MB cutoff, Google doesn't know they exist."

Also: external CSS/JS files get their own 2MB limit per file. PDFs get 64MB.
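
A quick way to act on this is to measure where the critical tags start in the raw HTML. The Python below is a sketch with stated assumptions: "2MB" is read as 2 * 1024 * 1024 bytes, only the first occurrence of each tag is checked, and a block that starts before the cutoff but runs past it is not detected. The function and constant names are hypothetical.

```python
import re

# Assumption: "2MB" is read as 2 * 1024 * 1024 bytes of the raw HTML response.
HTML_FETCH_LIMIT = 2 * 1024 * 1024

CRITICAL_PATTERNS = {
    "title": rb"<title[\s>]",
    "canonical": rb"<link[^>]+rel=[\"']canonical[\"']",
    "json_ld": rb"<script[^>]+type=[\"']application/ld\+json[\"']",
}


def tags_past_cutoff(html: bytes, limit: int = HTML_FETCH_LIMIT) -> dict[str, int]:
    """Return {tag name: byte offset} for critical tags that start beyond the limit.

    A tag that is absent is not reported here; missing tags are a separate
    audit finding.
    """
    late = {}
    for name, pattern in CRITICAL_PATTERNS.items():
        match = re.search(pattern, html, flags=re.IGNORECASE)
        if match and match.start() >= limit:
            late[name] = match.start()
    return late
```

An empty result means every tag that was found starts inside the window; any entry is a reason to move that tag higher in the document.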

Pillar B: Re-Write the Copy (Agent-Friendly)

Goal: Every page serves humans AND AI crawlers simultaneously. Humans see modern design + interactivity; AI sees clean semantic HTML with extractable, declarative content.

The 7 GEO Content Principles (apply to every page):

  1. Definitive statements, not hedging. "X is..." not "X may be..." LLMs prefer citable, authoritative declarations.
  2. Bottom-line-up-front structure. Answer first, context second. The first paragraph of every page should be a 2-sentence answer the LLM can lift verbatim.
  3. Question-format H2s. Mirror how users prompt AI. The H2 is the question; the first sentence under it is the answer.
  4. Statistical authority + citations. Every claim backed by "[Source, Year]" inline. Specific numbers ("reduces X by 34%") get cited at significantly higher rates than vague claims.
  5. Entity clarity. First mention of the business = full name + location + descriptor. "Rideout Law Group, a Sacramento, California-based foreclosure defense firm..."
  6. Comparison content. "Unlike traditional X, [Client] does Y." Helps AI position the client in competitive queries.
  7. Breadth + depth. Cover every angle AI might synthesize from. LLMs prefer comprehensive single-page resources over thin pages.

The "Claim-Frame-Prove" passage pattern (from Kalicube Framework, April 2026):

When a user prompts "What should I do if I get a notice of default?", the AI reassembles an answer out of passages that carry Claim, Frame, Proof in a form it can lift verbatim. Passages structured as "Claim first sentence, Frame second sentence, Proof third sentence" extract cleanly. Passages structured as "long discursive paragraph with the answer buried at the end" don't.

Apply to every key paragraph:

  • Claim (sentence 1): The answer.
  • Frame (sentence 2): The context/qualifier.
  • Proof (sentence 3): Statistic, source, case example.

Example for Rideout Law:

Claim: California homeowners have 90 days from a notice of default to cure the default or negotiate alternatives. Frame: This is the most critical window in the entire foreclosure process under California Civil Code §2924c. Proof: Rideout Law Group has resolved 87% of cases that enter our office within this 90-day window through loan modification, reinstatement, or wrongful foreclosure litigation (internal data, 2022-2025).
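
When reviewing drafts in bulk, the pattern can be spot-checked mechanically. The sketch below is a hypothetical review helper, not a feature of seo-content-writer: it flags a Claim that hedges (Principle 1) and a Proof with no number in it. The hedge-word list is an assumption and deliberately short.

```python
import re
from dataclasses import dataclass

# Assumed hedge words; extend per client vertical.
HEDGES = re.compile(r"\b(may|might|could|possibly|perhaps)\b", re.IGNORECASE)


@dataclass
class Passage:
    claim: str
    frame: str
    proof: str

    def problems(self) -> list[str]:
        """List the ways this passage departs from Claim-Frame-Prove."""
        issues = []
        if HEDGES.search(self.claim):
            issues.append("claim hedges instead of stating the answer")
        if not re.search(r"\d", self.proof):
            issues.append("proof carries no number, year, or statute reference")
        return issues

    def render(self) -> str:
        """Join the three sentences into one liftable paragraph."""
        return " ".join(part.strip() for part in (self.claim, self.frame, self.proof))


good = Passage(
    claim="California homeowners have 90 days from a notice of default to cure the default or negotiate alternatives.",
    frame="This is the most critical window in the entire foreclosure process under California Civil Code §2924c.",
    proof="Rideout Law Group has resolved 87% of cases that enter our office within this 90-day window (internal data, 2022-2025).",
)
weak = Passage("Foreclosure may be stoppable.", "It depends.", "Many clients succeed.")
print(good.problems())
print(weak.problems())
```

A clean result only means the passage has the right shape. Whether the claim is true and the proof is real is still a human review call.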

Decision Pages (from @denohawari, March 2026):

These are the highest-ROI pages in the GEO era. They're explicitly built to capture AI recommendation queries:

  • [competitor] vs [your client] — the head-to-head page
  • alternatives to [competitor] — for buyers exiting a competitor
  • best [service] for [specific use case/customer profile] — captures decision-stage queries
  • [service] for [specific industry/region] — niche specificity wins in AI

"AI doesn't reward whoever has the most content, or who's been in the game the longest. It rewards whoever is the clearest answer when buyers ask questions. If you optimize your SEO for AI, you can sideline competitors by capturing their demand before the buyers even start searching."

Apply Dr. Strange:

Dr. Strange — run seo-content-writer:
  client_name: "Rideout Law"
  content_type: "service_page"
  target_keywords: ["foreclosure defense California", "stop foreclosure California"]
  word_count_target: 1500
  client_voice:
    description: "Professional, authoritative, warm. Trust-building tone for scared homeowners."
    existing_copy_urls:
      - "https://rideoutlawgroup.com/about/"
      - "https://rideoutlawgroup.com/foreclosure-defense/"
    existing_blog_urls:
      - "https://rideoutlawgroup.com/blog/notice-of-default-california/"
  sme_context:
    - "/data/workspace/sme-cowork-library/coworkhive-sme-library-900/foreclosure-defense-sme.md"
    - "/data/workspace/clients/rideout-law/discovery-notes.md"
  geo_optimize: true

Voice calibration is everything. The seo-content-writer skill explicitly tests for "does this sound like the CLIENT, not like Don or Sebastian?" If client_voice.description is vague AND no existing_copy_urls are provided, Dr. Strange flags it for clarification. Don't let him generate generic voice.

Always provide both: the description (target voice) and 2-5 existing URLs (baseline voice). The URLs are non-negotiable when the client has any existing web presence — Dr. Strange reads them as calibration anchors and pressure-tests his output against them before delivery. Skipping the URLs is the single biggest cause of generic-feeling copy.

Pillar C: Build the SEO/GEO Optimization Plan (Ahrefs-Powered)

Goal: A prioritized, time-boxed, evidence-backed roadmap the client signs off on.

The framework (30-day standard — Kaizen's hard differentiator):

Most SEO agencies quote 90-180 day rebuilds. We do it in 30 days because the entire colony works in parallel, Dr. Strange automates Ahrefs/audit/build pipelines, and the SME library pre-seeds content. Day 1 is kickoff; Day 30 is hand-off to monthly retainer with the first client report in hand.

Week 1: Foundation + Diagnosis (Days 1-7)

All four streams run in parallel — Dr. Strange handles technical, Quill handles content prep, Black Widow handles authority groundwork, Pepper owns client comms.

  • All P0 technical issues from seo-audit resolved (SSL, sitemap, robots.txt, canonicals, indexing blockers)
  • Google Business Profile fully optimized (categories, attributes, services, photos)
  • Google Search Console + Analytics installed and verified
  • All sitewide schemas deployed (Organization, BreadcrumbList)
  • /llms.txt published (emerging convention — low cost, do not over-position to client)
  • Baseline GEO Analysis recorded
  • Baseline Ahrefs snapshot via ahrefs-intel (domain_overview + keyword_universe + backlink_intel + rank_tracker on the agreed 50-keyword priority list)
  • Content briefs drafted for all 8-12 core pages (voice, structure, FAQ seeds, target queries) — handoff-ready for Week 2 build

Week 2: Core Pages Built + Shipped (Days 8-14)

  • 8-12 pages built/rewritten with dual-audience architecture (Dr. Strange content_writer + Quill review)
  • Per-page schemas deployed (Service, FAQPage, Speakable)
  • FAQ sections populated from real "People Also Ask" data (ahrefs-intel keyword_questions)
  • Image optimization (WebP/AVIF, blur-up, alt text)
  • Internal linking audit + fix (orphan pages, anchor text optimization)
  • P1 technical issues resolved (redirect chains, hreflang, mobile, Core Web Vitals)
  • First 2 blog/resource articles drafted (publish end of Week 2)

Week 3: Content Velocity + Authority Push (Days 15-21)

  • 4 blog/resource articles published this week (question-format, citation-heavy) — accelerated cadence vs the old 2/week to compress the timeline
  • 10-15 directory submissions (industry-relevant + local citations) — completed in parallel
  • 1 Citation Magnet pillar piece published (see Part 3)
  • GBP posts: 3 this week with local landmarks + service keywords
  • Reddit/forum content seeding begins (see Part 5)
  • First targeted backlink outreach campaign launched (Black Widow uses broken_backlinks from baseline)
  • Mid-flight GEO Audit (Day 21) — compare to Day 1 baseline; flag any regressions for immediate fix

Week 4: Optimize, Audit, Hand Off (Days 22-30)

  • 3 more blog/resource articles published (7 total for the month)
  • Audit which AI citations are working, which aren't (geo-analysis re-run)
  • Update underperforming pages with fresh stats + citations
  • Expand FAQ coverage based on emerging query patterns
  • Video content for 3-5 highest-value pages (script + record, publish by Day 30)
  • Day 30 GEO Audit + Ahrefs rank_tracker re-run — compare to Day 1 baseline
  • Day 30: First client report delivered + hand off to monthly retainer scope

Why this compresses cleanly (don't let clients assume it's a corner-cut):

  • The 90-day legacy timeline assumed serial human work. Kaizen runs parallel agent streams: Dr. Strange (technical + build), Quill (content), Black Widow (authority/outreach), Pepper (client comms) — all four streams execute concurrently from Day 1.
  • The SME library (900+ industry files) pre-seeds Citation Magnet + blog content. We're not researching from zero.
  • ahrefs-intel is one skill call for what used to be hours of UI work per pull.
  • Real-world reality check: technical fixes, schema deploys, GBP optimization, and most on-page work do not need 6 weeks. They need a working pipeline. Content velocity is genuinely the bottleneck — and we solved that by frontloading briefs in Week 1 so Weeks 2-4 are pure execution.

What still takes longer than 30 days (be honest with clients):

  • Indexing + ranking movement on new content: 4-12 weeks after publish (Google's clock, not ours).
  • Backlink acquisition results: 30-90 days from outreach.
  • Compound AI citation lift: 60-90 days as crawlers re-index the optimized site.
  • That's why the 30-day sprint hands off to a monthly retainer — the build is done; the patience-game on rankings continues with monthly optimization.

ahrefs-intel is your power tool throughout — all of these are autonomous skill actions, not manual UI work:

ahrefs-intel action | Use For
top_pages | What's already driving traffic? Don't break those
keyword_questions | Real "People Also Ask" data → seeds FAQ + blog content
keyword_research | Find low-difficulty, high-intent keywords competitors aren't targeting (overview + matching terms)
content_gap | Keywords competitors rank for that client doesn't (the priority page list, leverage-scored)
backlink_intel | Referring-domain profile + anchor text distribution
broken_backlinks | Broken inbound links → recover lost SEO equity via publisher outreach
anchor_intel | Anchor text health check; flags over-optimization risk
serp_analysis | What's currently ranking for a target keyword (per-query competitive landscape)
traffic_history | Trendline data for monthly client reports + regression detection

Rank tracking (track 50-200 target keywords over time): use ahrefs-intel rank_tracker (shipped in v1.1, 2026-05-17). Returns position + SERP-feature presence + top-3 competitors per keyword, plus a weighted visibility score using a standard CTR curve. Auto-diffs vs prior run when prev_rank_tracker.json exists. This is the canonical client-reportable ranking source; keyword_universe is for discovery, not tracking.
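
To make "weighted visibility score" concrete, here is one common way such a score is computed. This is a sketch under stated assumptions: the CTR values are made-up placeholders, and the formula (captured CTR-weighted volume as a share of the all-rank-1 maximum) is an illustration, not the formula rank_tracker actually uses.

```python
from typing import Optional

# Hypothetical click-through rates by organic position; not rank_tracker's curve.
CTR_BY_POSITION = {1: 0.28, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05,
                   6: 0.04, 7: 0.03, 8: 0.025, 9: 0.02, 10: 0.018}


def visibility_score(rankings: dict[str, tuple[Optional[int], int]]) -> float:
    """Score 0-100 from {keyword: (position or None if unranked, monthly volume)}.

    Each keyword contributes CTR(position) x volume; the total is divided by
    what the same keywords would earn if every one ranked #1.
    """
    best = CTR_BY_POSITION[1]
    captured = sum(CTR_BY_POSITION.get(pos, 0.0) * vol for pos, vol in rankings.values())
    possible = sum(best * vol for _, vol in rankings.values())
    return round(100 * captured / possible, 1) if possible else 0.0


# Two equal-volume keywords: one ranks #1, one is unranked.
demo = {"stop foreclosure California": (1, 100), "foreclosure defense California": (None, 100)}
print(visibility_score(demo))
```

The point of weighting by volume and position is that moving a high-volume keyword from position 8 to position 3 shows up as a bigger lift than the same move on a low-volume keyword, which is the story the client report needs to tell.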

Ahrefs's "Brand Radar" (their 2026 AI-mention tracker) has no public API as of 2026-05-17 — confirmed. For AI citation tracking, our internal geo-analysis skill is the source of truth. Revisit if Ahrefs ships a /brand-radar/* v3 route.


Part 3: The Citation Magnet — Our Differentiator

A Citation Magnet is a dedicated section of the client's website designed to be the authoritative source AI engines cite when answering questions in the client's domain. It's not a blog. It's an AI-native knowledge base.

Why It Matters

This is the productized offering that separates Kaizen from every other SEO agency. We have 900+ SME library files (industry knowledge bases at 13,000+ words each). No competitor has that raw material. The Citation Magnet is how we turn that moat into client value.

The 4-Part Architecture

1. Industry Knowledge Graph

  • Top 50-100 questions in the client's industry
  • Each entry: declarative 1-2 sentence answer + statistic + dated citation + primary source link
  • Uses DefinedTerm + FAQPage + Speakable schema
  • Auto-generatable from the SME library files
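
A minimal sketch of turning knowledge-graph entries into FAQPage JSON-LD follows. The entry fields (question, answer) and the function name are assumptions about how the SME-derived data might be shaped, and the sample entry is placeholder text, not client data.

```python
import json


def faq_page_jsonld(entries: list[dict]) -> str:
    """Build schema.org FAQPage JSON-LD from [{'question': ..., 'answer': ...}, ...]."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": entry["question"],
                "acceptedAnswer": {"@type": "Answer", "text": entry["answer"]},
            }
            for entry in entries
        ],
    }
    return json.dumps(data, indent=2)


# Placeholder entry: declarative answer + statistic + dated citation, per the entry format above.
entries = [
    {
        "question": "What is an example question?",
        "answer": "An example answer states the fact first. It then adds one statistic (Example Source, 2025).",
    }
]
faq_json = faq_page_jsonld(entries)
print(faq_json)
```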

2. "AI Audit" Public Page

  • Format: "Here's what AI currently says about [topic], and here's what the data actually shows"
  • Positions the client as the authority correcting AI misinformation
  • Generates backlinks from journalists + industry pros
  • Highly citable (AI prefers correction content)

3. Structured Data Feed

  • /llms.txt — plain-text overview of the site's authoritative topics
  • /knowledge-base/index.json — structured JSON feed of all Q&A pairs
  • XML sitemap with <lastmod> dates signaling freshness
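
For the /llms.txt piece, a small generator sketch is below. It assumes the general shape of the public llms.txt proposal (a title line, a one-line summary in a blockquote, then sections of links); since this is an emerging convention, treat the layout as an assumption and check the current proposal before shipping. All names and URLs are placeholders.

```python
def build_llms_txt(site_name: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt body from {section heading: [(title, url, note), ...]}."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content; example.com URLs are hypothetical.
llms_txt = build_llms_txt(
    site_name="Example Firm",
    summary="Plain-language answers about an example industry.",
    sections={
        "Knowledge base": [
            ("Notice of default FAQ", "https://example.com/kb/notice-of-default",
             "Top questions with dated citations"),
        ],
    },
)
print(llms_txt)
```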

4. Monthly "State of [Industry]" Report

  • Auto-generated from SME library (human review only, not authoring)
  • Published as webpage + downloadable PDF
  • Recurring citation target (AI re-crawls fresh content frequently)
  • Builds email list (gated PDF)
  • Generates social shares + backlinks

Validation Gate (Important)

Before we sell Citation Magnet as a standard deliverable, we need ONE proof-of-concept build that generates a Citation Magnet page from an existing SME library file using Dr. Strange's automated pipeline. If this requires significant human authoring rather than automated generation with human review, we don't ship it as a productized offering yet.

Pilot client: BDH Consultants (decided 2026-05-17). Dr. Strange + Jen run the first Citation Magnet POC against the tea/F&B SME library files for BDH. Until that pilot is shipped and validated, do not promise Citation Magnet as a deliverable to any other client.

Active client targets (immediate scope):

  • rideoutlaw.com
  • BDH Consultants (Citation Magnet pilot)
  • Platform
  • Fleet Intelligence
  • Chaiagra
  • additional clients TBD as bandwidth permits

These are the sites we work on first. New prospects continue through the normal sales cycle but don't get prioritized over this active list.


Part 4: Pricing + Deployment Models

We support three deployment paths. The model determines scope and timeline. All pricing is owned by Brandwyn (CGO) and must be aligned with her before any quote goes out. This manual does not list dollar figures — pricing decisions are Brandwyn's call, working with Don. Jen is not customer-facing, so quoting is not part of her scope; she scopes the work, Brandwyn prices it.

Model A: Greenfield Build (Full SEO/GEO)

When: New business, no existing site, or existing site is beyond salvaging.

  • Full Astro 6 hybrid architecture on Cloudflare Pages
  • Complete dual-audience page design
  • Citation Magnet architecture included
  • 30-day launch sequence
  • Pricing: routed through Brandwyn

Model B: Migration (Existing → Astro)

When: Client has WordPress/Squarespace/Wix with valuable content but a platform that limits SEO/GEO.

  • Phased content migration to Astro
  • URL mapping + 301 redirect plan
  • Preserve existing SEO equity during transition
  • Citation Magnet added in Phase 2
  • Pricing: routed through Brandwyn
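
On the URL mapping + 301 redirect plan: one check worth running before a migration goes live is that every old URL resolves to its final destination in a single hop, since redirect chains are a P1 issue in the sprint. The sketch below is a hypothetical helper, not part of seo-audit; paths are placeholders.

```python
def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Resolve each old URL to its final target so every 301 is one hop.

    Raises ValueError if the map contains a loop.
    """
    flat = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Follow the chain until the target is no longer itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat


# /old-services redirects to /services, which itself redirects to /practice-areas.
plan = {"/old-services": "/services", "/services": "/practice-areas"}
print(flatten_redirects(plan))
```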

Model C: SEO/GEO Overlay (Keep Existing Platform)

When: Budget-conscious SMB. Most common entry point. Don's preferred starter.

  • Add JSON-LD to existing pages (works on any platform)
  • Publish /llms.txt and knowledge-base/index.json
  • Build Citation Magnet pages as subdirectory/subdomain
  • Add FAQ schema to existing service pages
  • Implement Speakable markup
  • Monthly GEO audits + optimization
  • Pricing: routed through Brandwyn

The SEO/GEO Audit as Sales Tool

Standalone GEO Audit (geo-analysis run) — Brandwyn sets the price.

  • Show prospect their current AI visibility score (likely low)
  • Show where competitors are being cited instead
  • Propose Overlay or Build to close the gap

Cost to Kaizen per audit: ~$0.30 in API calls. High-margin, demo-ready, urgency-creating. This is the lead-gen weapon — Brandwyn and Don decide what to charge for it.


Part 5: Off-Site GEO Tactics (The Zapier Playbook)

This is the part most SEO agencies still don't do. From the Zapier interview (Andrew Warner, March 2026) — these are the tactics that get Zapier mentioned millions of times by LLMs.

What's autonomous vs. what needs human judgment: Some of these can be agent-run end-to-end (Quill drafts Reddit replies, Black Widow does publisher outreach, Dr. Strange tracks citation movement). Others legitimately require Jen's manual touch — voice calibration, relationship judgment, ethical line-walking. Each tactic below is tagged.

Tactic 1: Reddit (Highest ROI for GEO)

Autonomy: HYBRID. Quill can draft answers and identify threads autonomously; Jen must approve every post before it goes live (Reddit accounts get banned fast for AI-detection; voice calibration is non-negotiable). House account kaizen_strategist (or per-client account where appropriate) — never use Don's personal handle.

  • LLMs heavily weight Reddit content because moderators vet answers over time
  • Older posts are more valuable — LLMs trust them more
  • Use a house account
  • Answer questions that customers are likely to type into LLMs
  • Ask questions that customers are likely to type into LLMs (then answer your own questions)
  • Don't obsess over upvotes — Zapier found little correlation between vote counts and GEO utility
  • It's a volume play — spread across hundreds of threads, not viral on one
  • Agent-run pieces: thread discovery (Quill scrapes target subreddits + scores threads by LLM-query likelihood), draft answers, schedule posts
  • Jen-run pieces: final voice review + post-publish moderation engagement

Tactic 2: YouTube

Autonomy: MOSTLY MANUAL. Video production, talent management, and creator partnerships require human judgment. Dr. Strange can support (script outlines from SME library, target-query mapping) but is not autonomous here.

  • Gemini uses YouTube heavily; other LLMs are influenced by it indirectly
  • Create your own videos, especially for B2B (less competition in B2B video)
  • Work with both big creators (polished, high-reach) AND small creators (outsized LLM influence per view)
  • Example from Warner's interview: an 835-view video earned 5.9% of the question citation share in its niche
  • Agent-run pieces: script outlines from SME library, target-query research, post-publish citation tracking
  • Jen-run pieces: filming/production coordination, creator outreach, contract negotiation

Tactic 3: Correct Outdated Articles

Autonomy: MOSTLY AUTONOMOUS. Black Widow can identify, draft, and send publisher outreach end-to-end. Jen approves the publisher list and reviews any high-stakes outreach (legacy press, sensitive verticals).

  • Older articles about the client continue to influence LLM responses long after they're outdated
  • Message publishers with outdated info; ask for updates
  • Publishers often comply because they want credible content
  • Track which 3rd-party articles AI is citing → systematically refresh them
  • Agent-run pieces: Black Widow identifies outdated articles via geo-analysis citation tracking, drafts publisher outreach, sends from a shared Kaizen outreach inbox, follows up on a cadence
  • Jen-run pieces: approve target list weekly, review any responses that need human judgment (corrections, disputes)

Tactic 4: Tools for Measuring LLM Citations

We stay 100% on internal tools for now (confirmed 2026-05-17). No third-party AI citation trackers needed.

  • geo-analysis (Dr. Strange's skill) — our internal source of truth. Multi-model testing across GPT-4o + Grok, citation classification, GEO Score, append-only history.
  • Ahrefs Brand Radar — has no public API as of 2026-05-17. Not wired up. Revisit if Ahrefs ships a /brand-radar/* v3 route.
  • Profound / Petra Labs / Amplitude — not in scope. Existing tools cover the need.

Part 6: Measurement + Monthly Reporting

Every client gets a monthly SEO/GEO report. Plain language, never jargon (see Part 8 for the translation guide).

Success Metrics (Day 30 build target + 90-day compound target)

The 30-day build delivers a measurable Day 30 lift, but compound results land in the 60-90 day window as Google re-indexes and AI crawlers re-cite. Targets below are split accordingly — set client expectations against the right column.

Metric | Source | Day 30 Target | 90-Day Target
GEO Score | geo-analysis | Baseline + 5 pts | Baseline + 15 pts
Citations (% queries where AI cites client) | geo-analysis | 10%+ | 30%+
References (% queries where client is named) | geo-analysis | 25%+ | 50%+
SERP Presence (top-10 organic rankings) | ahrefs-intel rank_tracker | 20%+ of target keywords | 60%+ of target keywords
Organic Traffic | Google Analytics | +5-10% vs. baseline | +25% vs. baseline
Lead Attribution | CRM / intake form | Tracking live by Day 30 | "How did you find us?" = "AI assistant" tracked
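
The Day 30 column can be checked mechanically before the report goes out. The sketch below encodes the numeric Day 30 targets from the table (using the low end of the +5-10% traffic range); the Snapshot fields and function name are illustrative assumptions, and the sample numbers are invented for the demo.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    geo_score: float        # 0-100, from geo-analysis
    citation_rate: float    # share of test queries where AI cites the client
    reference_rate: float   # share of test queries where the client is named
    top10_share: float      # share of target keywords ranking in the top 10
    organic_sessions: int   # from Google Analytics


def day30_misses(baseline: Snapshot, day30: Snapshot) -> list[str]:
    """Return the metrics that fall short of the Day 30 targets."""
    checks = {
        "GEO Score": day30.geo_score >= baseline.geo_score + 5,
        "Citations": day30.citation_rate >= 0.10,
        "References": day30.reference_rate >= 0.25,
        "SERP Presence": day30.top10_share >= 0.20,
        "Organic Traffic": day30.organic_sessions >= baseline.organic_sessions * 1.05,
    }
    return [metric for metric, met in checks.items() if not met]


# Invented demo numbers, not client data.
baseline = Snapshot(22, 0.02, 0.10, 0.08, 1000)
day30 = Snapshot(28, 0.12, 0.20, 0.24, 1060)
print(day30_misses(baseline, day30))  # ['References']
```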

Monthly Report Template

MONTHLY SEO/GEO REPORT — [Client Name]
Month: [Month Year]

YOUR AI VISIBILITY SCORE: [X]/100 (↑/↓ from last month)

WHEN CUSTOMERS ASK AI ABOUT YOUR INDUSTRY:
- [X] of [Y] questions: AI recommends YOUR business ✅
- [X] of [Y] questions: AI mentions you but not first 🔶
- [X] of [Y] questions: AI doesn't mention you yet 🔴

TOP WINS THIS MONTH:
- "[Query]" — ChatGPT now cites [Client] (was absent last month)
- "[Query]" — moved from mention to direct recommendation

PRIORITY TARGETS NEXT MONTH:
- "[Query]" — competitor [X] is being cited; here's our plan to earn that spot

TRADITIONAL SEO:
- Google rankings: [X] keywords in top 10 (was Y last month)
- Website traffic: [X] visits (+Y% vs last month)
- Leads from website: [X] ([Y] mentioned finding you through AI)

THIS MONTH'S WORK:
- [List of pages updated, schemas added, content shipped]

NEXT MONTH'S PRIORITIES:
- [List of upcoming work, tied to specific GEO Score gaps]

The Revenue Attribution Reality Check

Honest caveat to know going in: the chain from "AI cited the client" → "customer walked in the door" is imperfect. Most AI-driven visits don't carry UTM parameters. Your three best proxies:

  1. The "How did you find us?" intake form field (include "AI assistant: ChatGPT/Perplexity/Claude/etc.")
  2. GEO Score trends (rising = more AI mentions = more downstream brand recall)
  3. Anecdotal feedback ("a customer told me ChatGPT recommended you")

Don't oversell attribution precision to clients. The honest pitch: "AI is increasingly how your customers research. Our job is to make sure when they ask, you're the answer. Here's how we measure that."


Part 7: The Tips Layer (Stuff Most Manuals Miss)

These are the small, high-leverage details that separate good SEO work from great SEO work. Pulled from Keep.md, recent industry sources, and lessons learned.

Technical: The 2MB Rule (Charles Floate, March 2026)

  • Googlebot fetches only the first 2MB of HTML. Everything after that = invisible to Google.
  • Place these AS HIGH IN THE DOCUMENT AS POSSIBLE: <title>, <meta> tags, canonicals, all structured data (JSON-LD).
  • External CSS/JS files get their own 2MB limit per file.
  • PDFs get 64MB. Use PDFs for resource downloads that need to be indexed.
  • WRS (Web Rendering Service) is stateless — it clears localStorage + session data between requests. If your content depends on session state to render, Google can't see it.

Technical: Templated Website Builders Quietly Block AI Crawlers

This isn't a Shopify-only problem. All the major templated site builders — Shopify, Wix, Squarespace, GoDaddy, Webflow's default templates, and most "AI website builder" platforms — ship with robots.txt configs that block GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, and CCBot by default. The platforms made this change quietly across 2025-2026 in response to publisher pressure on AI training data.

The diagnostic: before any other work on a new client, fetch https://[client-domain]/robots.txt and grep for GPTBot, ClaudeBot, PerplexityBot, anthropic-ai, CCBot, ChatGPT-User. If any are Disallow'd, the client's GEO ceiling is artificially capped — AI engines literally cannot read their site. Fix this first or the rest of the work compounds at zero.
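
The diagnostic can be scripted with Python's standard-library robots.txt parser. The sketch below works on robots.txt text you have already fetched, so it runs offline; the function name and the sample file are illustrative. Note that it tests access to the site root only, so a client that blocks just some directories would need a per-URL check.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai", "PerplexityBot", "CCBot"]


def blocked_ai_bots(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI user agents that this robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# Sample file: two AI bots blocked outright, everyone else only kept out of /admin.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin
"""
print(blocked_ai_bots(sample))  # ['GPTBot', 'CCBot']
```

An empty list on the homepage is the minimum bar. Anything in the list goes to the top of the P0 queue.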

Platform-specific fixes:

  • Shopify: Online Store → Themes → Edit code → Add robots.txt.liquid to templates folder → Remove AI bot disallow blocks OR explicitly Allow: / for AI user agents → Keep /admin, /cart, /checkout, /account disallowed
  • Wix: SEO Tools → robots.txt Editor → Override default with explicit Allow rules for AI bots. Wix only exposes this on Premium plans — if the client is on a free plan, upgrade is part of the engagement.
  • Squarespace: Settings → Advanced → URL Mappings won't do it; Squarespace doesn't expose robots.txt directly. Workaround: add <meta name="robots" content="all"> and explicit AI-bot meta directives (<meta name="GPTBot" content="index, follow"> etc.) to every page via Code Injection in Site-Wide Header.
  • GoDaddy Website Builder: Settings → SEO → Search Engine Visibility. The setting is binary (allow/block all crawlers) — confirm "allow." For more granular control, GoDaddy users typically need to migrate off the builder.
  • Webflow: Project Settings → SEO → robots.txt → manually override the default block list. Webflow gives full control here, so it's a 2-minute fix.
  • WordPress (any host): Edit robots.txt directly or via Yoast / Rank Math plugin. Most managed WordPress hosts (WP Engine, Kinsta, Flywheel) ship clean defaults but verify per client.

The escalation rule: if the client is on a platform where unblocking requires migration (most aggressive: GoDaddy Website Builder, some Wix free plans), that becomes a deployment-model decision (Model B migration to Astro). Don't try to optimize AI visibility on a platform that's structurally blocking it.

Google's Own Guidance on AI Optimization (Source: Google Search Central, updated 2026-05-15)

Google published an official "Optimizing for Generative AI Features on Google Search" guide (developers.google.com/search/docs/fundamentals/ai-optimization-guide). Read it directly when in doubt; it is the canonical Google position. Highlights:

How Google's AI features actually retrieve content

Google's AI Overviews + AI Mode are built on two named mechanisms — both worth understanding because they shape what content gets surfaced:

  1. Retrieval-Augmented Generation (RAG) / grounding — Google retrieves relevant, up-to-date pages from its Search index, then uses those pages to generate the AI response with clickable links back. Implication: indexability + freshness + ranking are still the gate. If you don't rank, you don't get retrieved, you don't get cited.

  2. Query Fan-Out — Google's AI model generates a set of related concurrent queries from the user's original prompt and fetches results for all of them, then synthesizes. Example: "how to fix a lawn that's full of weeds" fans out to "best herbicides for lawns," "remove weeds without chemicals," "how to prevent weeds." Implication: covering the adjacent intent space for a topic (not just the head term) is the leverage point. Comprehensive single-page resources that satisfy the fan-out cluster outperform thin pages on the head term alone. This is the mechanical reason keyword_questions (real PAA data) seeds our FAQ work — it maps the fan-out space.

What Google explicitly says NOT to do

  • ❌ Don't create special "machine-readable files" (including /llms.txt) just for Google AI features
  • ❌ Don't "chunk" content into tiny pieces for AI
  • ❌ Don't rewrite content specifically for AI systems (it understands synonyms + meaning)
  • ❌ Don't pursue inauthentic "mentions" across the web for AI ranking purposes
  • ❌ Don't create excessive long-tail pages targeting fan-out query variations; this violates the scaled content abuse spam policy

What Google says TO do

  • ✅ Create unique content with a distinct perspective (first-hand reviews, expertise-driven, original POV)
  • ✅ Non-commodity content > commodity content (a piece on "Why We Waived the Inspection & Saved Money: A Look Inside the Sewer Line" beats "7 Tips for First-Time Homebuyers" every time)
  • ✅ People-first content that satisfies user needs
  • ✅ Meet Search technical requirements (indexable + eligible for snippet)
  • ✅ Standard semantic HTML, proper JavaScript handling, crawlable
  • ✅ Good page experience across all devices
  • ✅ Merchant Center + Google Business Profiles for product + local data
  • ✅ Verify your site in Search Console (mandatory for diagnosis)
  • ✅ For very large + frequently updated sites, manage crawl budget

Where Google's position contradicts other "GEO" advice, and what to do about it

This is critical to internalize. Google's guidance specifically pushes back on three things that other GEO sources (Kalicube, Zapier, denohawari) actively recommend:

Tactic | Google's position | Other AI platforms (ChatGPT / Perplexity / Claude)
/llms.txt | Not needed | Some evidence it helps (anecdotal, not platform-confirmed)
Schema markup (JSON-LD) | Not required for AI search; keep for rich results only | Strong evidence it helps entity disambiguation (Kalicube gate #4)
Chunking / Claim-Frame-Prove structure | Not required | Strong evidence it improves citation extractability

Our position: We continue all three because Kaizen's product is multi-platform AI visibility, not Google-AI-only. ChatGPT, Perplexity, Claude, Copilot, and Gemini each have different retrieval mechanisms — what helps on one may not help on another. Google saying "we don't need X" is not the same as "X doesn't help anywhere."

What changes for client conversations:

  • Don't oversell /llms.txt, schema, or chunking as "Google AI Overview levers." They aren't (per Google).
  • DO sell them as multi-platform AI visibility levers with measured impact via geo-analysis (which tests across GPT-4o + Grok + others, not just Google).
  • When a client asks "but Google says you don't need this," agree on Google specifically, then point them at the geo-analysis score lift across non-Google platforms.

The synthesis

GEO is not about anti-SEO tactics. It's about doing SEO better — with more uniqueness, more first-hand expertise, more declarative structure, more entity clarity. The same content that wins in Google AI Overviews wins in Google traditional search. The work we do additionally for non-Google AI platforms (schema, /llms.txt, Claim-Frame-Prove structure) is measured separately and justified by multi-platform geo-analysis lift, not by Google AI claims.

Agentic Experiences + Universal Commerce Protocol (UCP) — Forward-Looking

Emerging frontier flagged by Google in May 2026: AI agents that act on users' behalf (booking reservations, comparing product specs, completing transactions). These browser agents access your site by:

  • Analyzing visual renderings (screenshots)
  • Inspecting the DOM structure
  • Interpreting the accessibility tree

Implications for Kaizen client work:

  • Accessibility tree matters more than ever — agents parse it. Don't skip ARIA labels, semantic HTML, alt text.
  • DOM structure quality affects agent comprehension. Bloated component trees and aria-hidden chaos hurt.
  • Screenshots: agents see what users see, so visual hierarchy, contrast, and color-coding matter. Don't rely on hover states or invisible text.

Universal Commerce Protocol (UCP) at ucp.dev is the emerging open standard for AI agents transacting on websites. Watch this; it'll be a Pillar A schema-equivalent for transactional sites within 12 months.

For now: Google's pointer is to "agent-friendly website best practices" at web.dev. Add this to the monthly retainer scope for any client with transactional pages (ecommerce, booking, lead-gen) starting in 2026 Q3.

Brandwyn sales angle ([For Brandwyn]): When a prospect asks "what about agents?" or "where is this heading?", UCP is the answer. Position Kaizen as the firm that's already preparing client sites for the agent era: schema, accessibility, and DOM hygiene, all set up so agents can transact on the client's behalf, not just read. This is a 12-month-out moat that Google itself is telling us to build.

Voice: The Most Common Mistake

When writing client copy, the failure mode is writing in Don's voice or Sebastian's voice instead of the client's voice. Always pressure-test:

  • Does this sound like the client? Or like a consultant?
  • Is the formality calibrated? (Law firm ≠ SaaS startup ≠ medical practice ≠ tea shop)
  • Are industry-specific terms used correctly?
  • Would the client's existing site feel consistent with this?

If client_voice is vague when you brief Dr. Strange, stop and clarify with Don before generating. Generic voice is worse than no voice.

The Kalicube "First-Failing Gate" Rule (Jason Barnard, April 2026)

The AI engine pipeline has 10 gates between "discovered" and "won." Confidence passes multiplicatively — 90% at each of 10 gates = 35% at the end. A single weak gate destroys everything downstream.

The rule: Locate the earliest gate where confidence drops below threshold and fix THAT one first. Investing in citation quality when pages don't render properly = waste. Investing in third-party corroboration when entity isn't disambiguated = waste.

Common first-failing gates:

  1. Rendering — pages don't load / JS fails / 2MB cutoff hits
  2. Indexability — robots.txt or meta noindex blocks
  3. Annotation — schema is missing or broken
  4. Entity disambiguation — Google/LLMs can't tell which "Smith Law" we're talking about (fix with sameAs links to Wikipedia, Wikidata, LinkedIn, GBP)
  5. Extraction — content is written discursively, not in Claim-Frame-Prove blocks
  6. Citation — no statistics, no dated sources, no authoritative anchors
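
For gate 4, the sameAs fix is a short piece of JSON-LD. The sketch below builds one in Python, assuming a hypothetical helper named organization_jsonld; the firm name, site URL, and profile URLs are placeholders, and this is not the output format of the schema-markup skill.

```python
import json

def organization_jsonld(name, url, same_as, schema_type="Organization"):
    """Build a schema.org JSON-LD dict whose sameAs links tie the entity to known profiles."""
    return {
        "@context": "https://schema.org",
        "@type": schema_type,
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }

# Placeholder entity and profile URLs, for illustration only.
markup = organization_jsonld(
    name="Smith Law",
    url="https://www.example.com/",
    same_as=[
        "https://www.wikidata.org/wiki/Q0000000",
        "https://www.linkedin.com/company/example-smith-law",
    ],
    schema_type="LegalService",
)

# The printed JSON goes inside a <script type="application/ld+json"> tag.
print(json.dumps(markup, indent=2))
```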

Always diagnose before you act. Your first GEO Analysis run tells you the symptom; your seo-audit + schema audit tell you which gate is failing.

The "Hidden Salesforce" Frame (For Client Conversations)

When selling GEO to skeptical clients, the most effective frame (per Kalicube):

"Seven AI platforms — Google, ChatGPT, Perplexity, Claude, Copilot, Siri, Alexa — are working 24/7 either for you or for your competitors. Untrained, they default to whichever competitor has the best-corroborated content. Trained, they become the most scalable sales channel a business has ever had. The question isn't whether AI is recommending businesses in your space. It's whose business it's recommending."

CEOs understand the salesforce argument in ~8 seconds. Use it.

Content Drift vs. Corroboration Decay (Governance)

Most agencies monitor content drift (your own content going stale). They miss corroboration decay: third-party references that supported your credibility get rewritten, drop off, or get superseded by competitor-favorable articles. Your site hasn't changed; the evidence base under you has.

Add to monthly retainer scope:

  • Track full brand footprint, not just owned site
  • Refresh third-party content on a deliberate cadence (quarterly minimum)
  • Set Google Alerts for client name + key terms; flag any new article that contradicts or supersedes our positioning
  • Reach out to publishers proactively (correction requests, refresh offers)
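
One way to notice corroboration decay is to fingerprint each tracked third-party page and compare fingerprints between checks. The sketch below is an assumption about how that could work, not a description of existing Kaizen tooling; the function names and URLs are hypothetical, and fetching the page text is left out.

```python
import hashlib

def fingerprint(text):
    """Hash page text after collapsing whitespace and case, so trivial reflows don't count."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def changed_references(previous, current_texts):
    """Return (url, status) pairs for pages that changed or disappeared.

    previous maps URL -> fingerprint from the last check.
    current_texts maps URL -> page text fetched now; a missing key or None
    means the page could not be retrieved.
    """
    flagged = []
    for url, old_hash in previous.items():
        text = current_texts.get(url)
        if text is None:
            flagged.append((url, "missing"))
        elif fingerprint(text) != old_hash:
            flagged.append((url, "changed"))
    return flagged

# Placeholder URLs and text, for illustration only.
previous = {
    "https://example.com/review": fingerprint("Smith Law won the 2025 award."),
    "https://example.org/listing": fingerprint("Smith Law, estate planning."),
    "https://example.net/profile": fingerprint("Profile of Smith Law."),
}
current_texts = {
    "https://example.com/review": "Smith Law  won the 2025 award.",  # whitespace only
    "https://example.org/listing": "Jones Law, estate planning.",    # rewritten
    # example.net/profile is no longer reachable
}
print(changed_references(previous, current_texts))
# [('https://example.org/listing', 'changed'), ('https://example.net/profile', 'missing')]
```

A flagged page still needs a human read: the hash says something changed, not whether the change hurts the client's positioning.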

Decision Pages > Top-of-Funnel Content

From @denohawari's $30M case study:

"Don't write 'what is HR software?' — that's broad keyword content where giants always win. Write '[competitor] vs [your brand]', 'alternatives to [competitor]', 'best [tool] for [specific use case]'. These pages directly match the questions buyers ask AI. AI rewards specificity, not authority age."

For every client, build at minimum:

  • 1 [client] vs [top competitor] page
  • 2-3 best [service] for [specific customer profile] pages
  • 1 alternatives to [competitor] page (if the competitor is gettable)

The Karpathy "Untrained Salesforce" Insight (Operating Note)

Some of the most important content on a well-built website is rarely visited by humans, because it exists to teach the AI what the brand is. The seven AI platforms read that content to form their understanding of the client even when no human ever clicks the page.

Don't prune low-traffic pages purely on analytics. If the page exists to feed the entity understanding (about page, technical service definitions, methodology pages), it earns its place even with zero human visits. AI eats it.


Part 8: Client-Facing Translation Guide

Never lead with technical jargon in sales conversations. Use plain English.

| What We Call It (Internal) | What the Client Hears |
| --- | --- |
| SEO/GEO Optimization | "We make sure your business shows up when people search Google AND when they ask AI assistants like ChatGPT for recommendations" |
| Astro Islands architecture | "Your site loads in under 2 seconds on any device, faster than 95% of your competitors" |
| Dual-audience page design | "Your website works for both Google and AI assistants. Most sites only work for one" |
| ⅓/⅔ SEO/GEO-AEO split | "We spend most of our effort making sure AI recommends YOUR business, not just getting you to page 1 of Google, because that's where your customers are heading" |
| Citation Magnet | "We build a section of your site that makes you the source AI recommends when customers ask questions in your industry" |
| /llms.txt | "A file that tells AI systems what your business does and why you're the expert, like a business card for ChatGPT" |
| JSON-LD structured data | "Hidden code that helps Google and AI understand exactly what services you offer, where you're located, and why you're qualified" |
| Speakable schema | "We mark the most important parts of your pages so voice assistants like Siri and Alexa can read them to customers" |
| GEO Analysis / AI citation audit | "We check whether AI assistants are recommending your business, and show you exactly which questions you're winning and losing" |
| GEO Score | "Your AI Visibility Score: a simple number showing how often AI recommends you vs. your competitors" |
| E-E-A-T signals | "Proof that you're a real expert (credentials, experience, reviews), the stuff that makes both Google and AI trust you" |
| FAQPage schema | "We format your FAQ so it can appear directly in Google search results and AI answers, not just on your website" |
| Decision pages | "Pages that win the moment a buyer is choosing between options, like '[competitor] vs us' or 'best X for [your situation]'" |
| Topic clusters | "A web of connected pages that makes AI see you as THE expert on your topic, not just someone who wrote one article about it" |
| Claim-Frame-Prove structure | "We write your content so AI can lift your answers directly into ChatGPT's response. That's how citations happen" |

Part 9: Workflow Quick Reference

Where to Post What — The Per-Client Channel Model

Every Kaizen client gets their own dedicated Discord channel. That channel is where all of that client's audits, GEO analyses, schemas, copy, roadmaps, and deliverables live. Don, Jen, and Dr. Strange (plus whichever colony bots are needed) are in every client channel. The client is not in the channel; it is internal-only.

The old shared #seo-geo, #website-builds, and #projects channels still exist, but they are not where formal client work happens. #seo-geo specifically is for miscellaneous SEO/GEO inquiries and potential client/prospect inquiries that surface before someone is an official Kaizen client — sales-cycle stuff, exploratory chatter, lightweight demos. Don is not going to review Dr. Strange's deliverables there.

The moment a prospect signs and becomes a client, Sebastian provisions a dedicated #client-[slug] channel and every Dr. Strange invocation, audit, copy draft, schema, and roadmap for that client lands there. Don reviews and signs off in the client channel. Nowhere else.

| What | Channel | ID |
| --- | --- | --- |
| All formal client work: every Dr. Strange invocation + review + sign-off | #client-[client-slug] (per-client) | provisioned at onboarding |
| Questions or tags for Don on a specific client | the relevant #client-[client-slug] channel | |
| Cross-client decisions that need Don's input | #war-room | 1487891856791703724 |
| Miscellaneous SEO/GEO inquiries + potential client/prospect inquiries | #seo-geo | 1487707978860593375 |
| Pre-conversion website build experiments / demos | #website-builds | 1487707980857217154 |
| Pre-conversion deliverable previews for prospects | #projects | 1482998914419261542 |

Channel provisioning: when a prospect converts to a client, Sebastian creates the #client-[slug] channel, invites Don + Jen + the colony bots, and drops the client's intake form link as a pinned message. The old prospect-channel threads stay where they are (historical record); new work happens in the dedicated channel.

Tagging Don: any question or request for Don about a specific client goes in that client's channel, period. Don does not want one-off questions sent to #war-room or any shared channel. The only thing that goes in #war-room is a cross-client decision — a strategy call that touches multiple clients at once, a pricing-policy question, a process change. If it's about one client, it goes in that client's channel.
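
The routing rules above reduce to a small decision function. The Python sketch below is a memory aid, not a bot that exists in the colony: route_channel, the kind labels, and the "acme" slug are all hypothetical, while the channel names come from the table.

```python
def route_channel(kind, client_slug=None):
    """Return the Discord channel for a piece of work, following the Part 9 table.

    kind is a hypothetical label: "client_work", "don_question",
    "cross_client", "prospect_inquiry", "build_demo", or "prospect_preview".
    """
    per_client = {"client_work", "don_question"}
    shared = {
        "cross_client": "#war-room",
        "prospect_inquiry": "#seo-geo",
        "build_demo": "#website-builds",
        "prospect_preview": "#projects",
    }
    if kind in per_client:
        # Anything about one client goes to that client's channel, never a shared one.
        if not client_slug:
            raise ValueError("client-specific work needs a client slug")
        return f"#client-{client_slug}"
    if kind in shared:
        return shared[kind]
    raise ValueError(f"unknown kind: {kind}")

print(route_channel("client_work", "acme"))   # #client-acme
print(route_channel("don_question", "acme"))  # #client-acme
print(route_channel("cross_client"))          # #war-room
```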

The Quality Gate (Non-Negotiable)

Every Dr. Strange output is reviewed before client delivery. No exceptions.

The review happens in the client's dedicated channel — not in a shared review channel. Workflow:

  1. Dr. Strange (or whichever bot owns the deliverable) posts the output to the client's channel
  2. Jen reviews first — voice, vertical accuracy, completeness
  3. Jen tags Don in-channel for final sign-off (@Don ready for review — [deliverable type])
  4. Don signs off (or red-lines) in-channel
  5. Sebastian publishes the client-facing version (HTML link, email, etc.) only after Don's sign-off

This includes:

  • Audit reports
  • GEO Analysis reports
  • Generated copy
  • Generated schemas
  • Proposed roadmaps

Why: this is how we catch voice misfires, factual errors, and over-promising before they hit the client. Dr. Strange is fast but not perfect. The review gate is what makes the system trustworthy. Tagging Don in the client channel (vs. a separate review channel) keeps the full audit trail in one place — Don can scroll the channel and see every step.

Daily / Weekly / Monthly Rhythm

Daily:

  • Sweep each active client's dedicated channel for any Dr. Strange runs awaiting your review (this is where formal client work lives)
  • Check #seo-geo for any miscellaneous SEO/GEO inquiries or prospect questions worth surfacing to Brandwyn/Don
  • Track active client tasks in shared task queue

Weekly:

  • Run ahrefs-intel baseline refresh on each active retainer client (Dr. Strange does it; you just review issues)
  • Review rank_tracker movement vs. prior run (Dr. Strange auto-diffs and surfaces gainers/losers)
  • Brief Dr. Strange on the next batch of pages/schemas/audits in each client channel
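
For intuition about what a gainers/losers diff looks like, here is a minimal Python sketch. It is not the ahrefs-intel implementation: the snapshot shape (keyword mapped to position), the function name, and the sample keywords are assumptions for illustration.

```python
def diff_ranks(prior, current):
    """Split keywords into gainers and losers between two rank snapshots.

    Each snapshot maps keyword -> position (1 is best). Keywords missing
    from either snapshot are skipped. A positive change means the keyword moved up.
    """
    gainers, losers = [], []
    for keyword in prior.keys() & current.keys():
        change = prior[keyword] - current[keyword]
        if change > 0:
            gainers.append((keyword, change))
        elif change < 0:
            losers.append((keyword, change))
    # Biggest movers first in each list.
    gainers.sort(key=lambda item: -item[1])
    losers.sort(key=lambda item: item[1])
    return gainers, losers

# Invented sample data.
prior = {"estate lawyer austin": 12, "probate attorney": 5, "will drafting": 8}
current = {"estate lawyer austin": 7, "probate attorney": 9, "will drafting": 8}
gainers, losers = diff_ranks(prior, current)
print(gainers)  # [('estate lawyer austin', 5)]
print(losers)   # [('probate attorney', -4)]
```

When reviewing the real output, the losers list is usually where the week's brief for Dr. Strange starts.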

Monthly:

  • Re-run geo-analysis for every retainer client
  • Generate monthly SEO/GEO report per client (template in Part 6)
  • Update historical trend data
  • Review third-party corroboration decay (Google Alerts queue)

Appendix A: Source Material

This manual was synthesized from:

Internal docs:

  • kaizen/sageo-optimization-template-v1.1.md — internal template (Jarvis → Sebastian, April 2026)
  • kaizen-colony/skills/dr-strange/seo-audit/SKILL.md
  • kaizen-colony/skills/dr-strange/geo-analysis/SKILL.md
  • kaizen-colony/skills/dr-strange/seo-content-writer/SKILL.md
  • kaizen-colony/skills/dr-strange/schema-markup/SKILL.md
  • kaizen-colony/bot-core-files/ATLANTIS-CHANNEL-MAP.md

External (via Keep.md):

  • Google Search Central, "Optimizing for Generative AI Features on Google Search" (canonical; updated 2026-05-15) — RAG + Query Fan-Out mechanisms, /llms.txt position, schema position, agentic experiences + UCP
  • web.dev: Agent-friendly website best practices — Google's pointer for preparing sites for browser agents
  • Universal Commerce Protocol (UCP) — emerging standard for AI agent transactions
  • Jason Barnard / Kalicube, "Extending IBM's GEO Playbook: 18-Component Operational Framework" (April 2026)
  • a16z, "How Generative Engine Optimization (GEO) Rewrites the Rules of Search" (May 2025)
  • Andrew Warner + Zapier, "GEO Playbook: How LLMs Decide What to Recommend" (March 2026)
  • @denohawari, "How We Drove $30.52M For Clients With LLM SEO" (March 2026)
  • @bloggersarvesh, "My Chief of SEO, Claude Cowork" + 7 Claude SEO Prompts (March 2026)
  • Charles Floate, "Googlebot's 2MB HTML Cutoff" (March 2026)
  • IBM GEO Playbook via Search Engine Land (April 2026)
  • Ahrefs, "How AI Search Drives Traffic + Conversions" (2026)

Peer-reviewed:

  • Aggarwal et al., "GEO: Generative Engine Optimization" (KDD 2024) — arxiv.org/abs/2311.09735

Manual v1.1 by Don Ho + Sebastian 🦀 for Jen Villadolid + Brandwyn Boyle. Approved 2026-05-17.


Changelog

v1.1 — 2026-05-17

  • Audience expanded to Jen + Brandwyn. Brandwyn needs the workflow vocabulary to sell the product confidently; the manual now serves both readers with [For Jen] and [For Brandwyn] tagging where relevant.
  • "SAGEO" terminology retired. It was shorthand, not a real product name. Replaced everywhere with "SEO/GEO" — the actual product reference. Same framework; cleaner naming.
  • Part 2 extracted to a standalone checklist. The standard engagement flow now lives as its own document (kaizen/training-manuals/seo-geo-client-checklist-v1.md) so Jen can manually check off each step per client. The Part 2 section in this manual now points to that checklist.
  • ahrefs-intel reframed as Dr. Strange's Skill 5 (lives in the shared colony skills dir so other bots can call it, but Dr. Strange is the primary owner).
  • 90-day Pillar C condensed to 30-day standard with parallel agent streams (Dr. Strange / Quill / Black Widow / Pepper running concurrently).
  • Schema re-architecture: Dr. Strange does the work, Jen reviews (was previously framed as Jen doing it manually).
  • Part 5 Zapier playbook: every tactic now tagged AUTONOMOUS / HYBRID / MOSTLY MANUAL so we know what runs end-to-end vs. what needs Jen's manual judgment.
  • Part 7: AI crawler blocks generalized from Shopify-only to all templated builders (Shopify, Wix, Squarespace, GoDaddy, Webflow, WordPress) with platform-specific fix steps.
  • Part 9: per-client channel model — every Kaizen client gets their own Discord channel where all formal Dr. Strange work + review + sign-off happens. #seo-geo is for miscellaneous + prospect inquiries only (no formal client deliverables land there). #website-builds / #projects are pre-conversion only. Quality gate = tag Don in the client channel, not in a shared review channel.
  • Channel routing simplified: #the-cabinet row removed from the Part 9 table (it's Sebastian-only bot-coordination, not Jen's workflow). Tagging Don for client-specific questions = that client's channel, always. Cross-client decisions = #war-room (1487891856791703724).
  • Phase 1 deliverable: internal MD + client-facing HTML link. Sebastian builds a per-client Google Form for intake; answers flow into a Google Sheet that Dr. Strange reads on audit. Stays on Google Workspace (no Typeform).
  • client_voice variable expanded to include the client's existing web copy and blog articles as a calibration anchor. Dr. Strange now reads the existing site as input to voice matching, not just the verbal description.
  • Citation Magnet pilot client: BDH Consultants (decided 2026-05-17).
  • Pricing routed through Brandwyn. No dollar figures in the manual — Brandwyn owns pricing decisions working with Don.
  • Active client targets named: rideoutlaw.com, BDH Consultants, Platform, Fleet Intelligence, Chaiagra (plus others as bandwidth allows).
  • Part 10 removed. All open questions resolved inline; the section was no longer useful.
  • rank_tracker action shipped in ahrefs-intel v1.1; Brand Radar confirmed to have no public API as of 2026-05-17.
  • Co-author added: Don Ho (alongside Sebastian).
  • Part 7 expanded with Google's official AI-search guidance (May 2026 update): RAG + Query Fan-Out mechanisms named explicitly. Acknowledged that Google specifically pushes back on /llms.txt, chunking, and schema-as-AI-lever — and reconciled by reframing those as multi-platform AI visibility levers (justified by geo-analysis lift across non-Google platforms), not Google-AI-Overview levers. Added new subsection on Agentic Experiences + Universal Commerce Protocol (UCP) as the 12-month frontier.

v1.0 — 2026-05-17 (superseded)

Initial draft. Single-author (Sebastian). Status DRAFT. Replaced same-day by v1.1 after Don's red-line pass.