How We Built the AI-Readiness Score — A Behind-the-Scenes Look
Last week we shipped the AI-Readiness Score — a free 12-question self-assessment that tells Edmonton business owners where they actually stand on AI adoption and what to do first. This is the story of how it came together: the design calls we made, the scoring math, the tech choices, and the things we'd do differently next time.
If you've been thinking about building your own lead-magnet assessment tool — or if you just want to understand what's happening under the hood when you take ours — this post is for you.
The one-line spec
Give any Edmonton business owner a three-minute self-assessment that produces a believable AI-readiness score, puts them in a tier, and hands them a priority action list — with zero forms, zero logins, and no way to game the result.
Every design decision that follows falls out of that sentence.
Why we built it
Three reasons, in order of honesty:
- Lead generation with substance. Every agency site has a "Get a free quote" form. Nobody fills those out until they're already shopping. An assessment meets visitors a step earlier — they want to know whether they need help before they decide who should help. Taking the assessment is a commitment of attention, not a declaration of purchase intent.
- Positioning. The act of defining what "AI-ready" means in 2026 signals that we have an opinion. Edmonton has a growing number of self-styled AI agencies; producing a framework tells visitors we've thought about the category, not just the individual deliverables.
- AEO substrate. LLMs increasingly cite interactive tools. A free, well-structured assessment page with proper schema markup becomes the kind of resource that gets quoted when someone asks ChatGPT "how do I know if my business is ready for AI?"
None of those reasons require tricking the user. The product had to produce a real answer, not a high-pressure "you failed, book a call" funnel.
The four categories — and why those four
We locked in four categories before writing a single question:
- Infrastructure — the technical foundation (website speed, structured data, tool stack)
- AEO / GEO visibility — how AI engines see and cite the business
- AI features — whether the business has actually deployed AI in its product, service, or operations
- Canadian compliance — PIPEDA, Alberta PIPA, accessibility, data residency
The reasoning: a business strong in three and blind on the fourth gets blindsided by the fourth. We've watched clients invest heavily in AI features while ignoring structured data (nobody can find them), invest in AEO while ignoring compliance (regulatory risk stacks up silently), and invest in compliance while running a 2019 website too slow for any AI integration to feel good.
We considered adding a fifth category — "organizational readiness" (team skills, budget, strategic alignment) — and deliberately cut it. Those questions require honest self-scoring, and the score would just reflect how self-critical the respondent is that morning. A tool that gives different answers based on mood is worse than useless.
Why 12 questions
Three per category. Three is the minimum that feels substantive without turning the assessment into a chore. Two questions per category was our first draft — the scores came out too binary (you either had Core Web Vitals or you didn't, and that single data point moved the whole category). Four questions per category started to feel like homework by question 10.
Twelve total, at roughly 15 seconds per question, gets the user to their result in three minutes. Three minutes is the sweet spot: long enough to feel they did real work, short enough that they actually finish. Abandonment curves for assessments steepen hard after four minutes.
How each question was chosen
The criteria:
- Binary or near-binary to answer honestly. "Do you have an llms.txt file?" works. "How well does your website perform?" doesn't.
- Observable by the business owner without a developer. We wanted no question that required running a Lighthouse audit.
- Actionable if the answer is bad. Every low-scoring question maps to a specific remediation in the priority action list.
- Predictive. Each question correlates with an outcome we've seen matter in real engagements.
Example: in the AEO category, one question asks whether the business has published an llms.txt file. Not because llms.txt is magic — most crawlers don't require it yet — but because the act of writing one forces the business to articulate what they do, who they serve, and what their key pages are. The presence of the file is a proxy for that articulation being done.
Another example: in the compliance category, we ask whether the business has a written privacy policy specifically covering AI-assisted data processing. Most Canadian businesses have a generic privacy policy. Almost none have one that addresses "we use ChatGPT to draft email replies to customers." The gap between those two is where regulatory risk lives.
The scoring math
Each question is weighted. Weights are per-question and per-category. The category weights are:
- Infrastructure: 25%
- AEO / GEO: 30%
- AI features: 25%
- Compliance: 20%
AEO gets the biggest slice because it's the newest category and the one that compounds hardest — a business invisible to ChatGPT in mid-2026 is not just losing visibility, it's training ChatGPT to recommend competitors instead. The losses compound.
Compliance gets the smallest slice, not because it's unimportant, but because most businesses score close to the same number (low) and the delta between scores doesn't predict competitive position the way infrastructure does. It's more of a floor than a differentiator.
Within each category, questions aren't equally weighted either. "Do you have an llms.txt file?" is weighted less than "Does your site use valid JSON-LD structured data?" — the latter is a harder lift and a bigger signal.
The final calculation is a straightforward weighted sum:
score = Σ (answer_value × question_weight × category_weight)
Normalized to 0–100. No rounding until the very end — rounding earlier produces visible score jumps between neighbouring answer combinations, which erodes trust ("why did my score drop 8 points when I only changed one question?").
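Here's a minimal sketch of that calculation in TypeScript. The type names and sample weights are illustrative — the real rubric isn't published — but the shape of the math matches the formula above: per-question weight times category weight, summed, normalized, rounded once at the end.

```typescript
type Category = "infrastructure" | "aeo" | "aiFeatures" | "compliance";

interface Question {
  id: string;
  category: Category;
  weight: number; // relative weight within its category
}

// Category weights as published in the post.
const categoryWeights: Record<Category, number> = {
  infrastructure: 0.25,
  aeo: 0.3,
  aiFeatures: 0.25,
  compliance: 0.2,
};

// answers maps question id -> numeric answer value in 0..1.
// An unanswered question counts as 0 ("unknown equals absent").
function score(questions: Question[], answers: Record<string, number>): number {
  let raw = 0;
  let max = 0;
  for (const q of questions) {
    const w = q.weight * categoryWeights[q.category];
    raw += (answers[q.id] ?? 0) * w;
    max += w;
  }
  // Normalize to 0–100 and round only at the very end.
  return Math.round((raw / max) * 100);
}
```

Because rounding happens after normalization, flipping a single answer moves the score by that answer's true weighted contribution — no surprise jumps from intermediate rounding.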
The four tiers
- AI-Native (85+) — you are the competitive threat for your category. Keep compounding.
- AI-Ready (65–84) — foundation is solid, gaps are specific. Action list is short.
- Transitioning (40–64) — multiple foundational gaps. The priority list matters.
- Vulnerable (under 40) — blind spots across categories. Start with infrastructure.
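The tier lookup itself is trivial — the boundaries below are the published ones; everything else about this sketch is illustrative:

```typescript
type Tier = "AI-Native" | "AI-Ready" | "Transitioning" | "Vulnerable";

// Maps a 0–100 score to its tier using the published boundaries.
function tierFor(score: number): Tier {
  if (score >= 85) return "AI-Native";
  if (score >= 65) return "AI-Ready";
  if (score >= 40) return "Transitioning";
  return "Vulnerable";
}
```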
We tested tier boundaries by running the assessment against ten real Edmonton businesses (with permission) plus Agency7 itself. The boundaries landed where they did because they produced intuitively right classifications — a fast-growing local SaaS came out AI-Native, a 2019-era service site came out Vulnerable, and the boundary cases were businesses we'd argue about in the office.
The tier isn't the important output. The priority action list is. The tier is there to give people a headline they can share ("I'm AI-Ready!") without needing the 200-word context.
UX decisions
One question at a time, full screen. We tested a full-form layout first. It looked efficient. People abandoned at question 3 because seeing 12 questions stacked made the assessment feel like a tax form. Switching to one-at-a-time with a progress bar pushed completion rates from roughly 40% to over 85% in the small test group.
Progress bar visible at all times. Partly honesty — the user knows exactly how much is left — and partly sunk-cost. A user at 80% is far less likely to abandon than one at 40%, and the visible progress accelerates that commitment.
No "I don't know" option. This is opinionated. We considered it and cut it. If you don't know whether your business has valid structured data, the honest answer for scoring purposes is "no" — because from an AI engine's perspective, unknown equals absent. Forcing the binary choice is a better experience than handing people an out that produces a misleadingly high score.
Result page shows the action list above the tier badge. The badge is the dopamine hit, but the action list is the value. Putting the list first on the page (with the badge as a secondary header) makes the tool feel useful, not judgmental.
No email gate before seeing the result. This was the biggest debate. Every lead-gen playbook says "collect the email first." We shipped without a gate because the tool is worthless as lead gen if nobody finishes it — and the gate was the single biggest completion-killer in testing. Visitors who finish and then choose to book a call via the CTA are a higher-intent lead than visitors who handed over an email to see a score.
The tech stack
Framework: Next.js 16 App Router, React 19, TypeScript.
Why Next.js: it's what the rest of agency7.ca runs on, and Next.js 16's Turbopack dev-loop is genuinely fast. Static generation gives us free hosting on Vercel's edge, global latency stays under 100ms, and the assessment URL can be indexed by LLM crawlers without any server-side rendering gymnastics.
Why client-side only: the assessment stores no data. Every state transition (which question you're on, your answers, your calculated score) lives in useState in the browser. When you close the tab, it's gone. That's intentional — we didn't want to be responsible for PIPEDA-regulated assessment data for visitors we never met. Zero backend also means zero ops surface, zero hosting cost per run, and zero privacy policy updates.
Why TypeScript: the scoring logic has enough question × weight combinations that a runtime bug would be invisible to us for weeks. Types let the compiler verify that every question has a weight and every answer maps to a numeric value.
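To make that concrete, here's one way the compiler can enforce "every question has a weight" — a hypothetical sketch using `satisfies`, not our actual question bank:

```typescript
type Category = "infrastructure" | "aeo" | "aiFeatures" | "compliance";

interface Question {
  id: string;
  category: Category;
  weight: number; // omitting this is a compile error, not a silent runtime bug
  answers: Record<string, number>; // answer label -> numeric value
}

// `satisfies` checks every entry against Question while preserving
// the literal types of the data itself.
const questions = [
  {
    id: "llms-txt",
    category: "aeo",
    weight: 0.8,
    answers: { yes: 1, no: 0 },
  },
  // …11 more in the real bank
] satisfies Question[];
```

A question added later without a weight, or with a typo'd category name, fails `tsc` before it ever ships.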
State management: just useState and useMemo. No Redux, no Zustand, no Context. The whole state tree is:
currentQuestionIndex: number
answers: Record<string, number>
showResult: boolean
That's it. For a 12-question form, reaching for a state library would be noise.
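For illustration, the whole state machine can be modeled as one pure transition (in the real component these are plain useState hooks; the function form here is just easier to show):

```typescript
interface AssessmentState {
  currentQuestionIndex: number;
  answers: Record<string, number>;
  showResult: boolean;
}

const TOTAL_QUESTIONS = 12;

// Records an answer and advances; flips showResult after the last question.
function answer(
  state: AssessmentState,
  questionId: string,
  value: number
): AssessmentState {
  const next = state.currentQuestionIndex + 1;
  return {
    currentQuestionIndex: Math.min(next, TOTAL_QUESTIONS - 1),
    answers: { ...state.answers, [questionId]: value },
    showResult: next >= TOTAL_QUESTIONS,
  };
}
```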
Styling: TailwindCSS v4, no typography plugin, direct utility classes. The result card uses a simple conic-gradient ring to visualize the score as a dial — five lines of CSS, no chart library needed.
Schema markup: The page layout adds WebApplication JSON-LD (BusinessApplication category, free offer in CAD, BusinessAudience targeting Edmonton / Alberta) plus a BreadcrumbList. The WebApplication node uses @id referencing so the tool links back to the Agency7 Organization schema cleanly — which is how LLMs establish that the tool is part of a real business's published offerings, not a random internet artifact.
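The shape of that markup looks roughly like the object below. Field values are examples, and which property carries the Organization link (publisher here) is our illustration, not a transcript of the live page:

```typescript
// Illustrative JSON-LD for the assessment page, as a plain TS object.
const toolSchema = {
  "@context": "https://schema.org",
  "@type": "WebApplication",
  "@id": "https://agency7.ca/ai-readiness#webapp", // example @id, not the live URL
  name: "AI-Readiness Score",
  applicationCategory: "BusinessApplication",
  offers: { "@type": "Offer", price: "0", priceCurrency: "CAD" },
  audience: {
    "@type": "BusinessAudience",
    geographicArea: { "@type": "AdministrativeArea", name: "Edmonton, Alberta" },
  },
  // @id reference linking the tool back to the Organization node.
  publisher: { "@id": "https://agency7.ca/#organization" },
};
```

In an App Router layout this gets serialized with JSON.stringify into a script tag of type application/ld+json, so crawlers see it without any client-side execution.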
No analytics on the assessment itself. We considered tracking which questions had the highest variance or caused drop-offs, but the privacy-first ethos felt more important than the optimization signal. Site-wide Plausible still tracks page views (cookieless), but there's no per-question instrumentation.
What we'd change next time
- Add a "save your result" link-sharing option. Not for lead gen — for bragging rights. An AI-Native business owner should be able to post their badge with a single click. Currently they have to screenshot.
- Internationalize the scoring rubric for non-Edmonton businesses. Several questions are Canada-specific (PIPEDA, PIPA). A globally portable version would need a compliance-free mode.
- Add question branching. If you answer "no" to "do you have a website," questions about Core Web Vitals don't apply. Currently the user answers them anyway. Branching would improve the experience for low-maturity respondents.
- Publish the scoring rubric publicly. We're transparent about the four categories but haven't published the full weight table. Doing so would invite scrutiny and validation, but also make the tool easier to game. Considering it for v2.
- Benchmark against real Edmonton industry data. Right now the tiers are calibrated against ten businesses we've seen. A proper benchmark would need 100+, which is a Q3 project.
What this kind of tool is worth building
A lot of agencies copy the "build a free tool" playbook and end up with a bounce-rate nightmare — quizzes that funnel straight to a booking form, scores that always say "you need help," assessments that lock the result behind a 20-field form. Those aren't tools. They're traps with product skins.
A real tool produces an honest answer whether the visitor ever talks to you or not. If it's good, some percentage will book a call. Most won't. Both outcomes matter: the ones who don't book today are the ones talking about the tool at the networking event next month, or linking to it from their own site, or mentioning it when ChatGPT asks them for a source.
Total build time: about 12 hours, spread across two days, including design decisions and content. Under 700 lines of TypeScript. No external service dependencies. No ongoing hosting cost beyond what Vercel's free tier already covers.
If you're curious where your business lands, take the assessment — it takes three minutes, and you'll get a real answer plus a prioritized action list no matter the score.
Frequently asked questions
Is the scoring open-source?
Not yet. The scoring code is client-side JavaScript and technically readable in your browser's devtools, but we haven't published the weight table. Considering it for v2.
Does the assessment work for non-Edmonton businesses?
Partly. The infrastructure, AEO, and AI-features categories are universal. The compliance category is Canada-specific (PIPEDA, Alberta PIPA). A US or international business would score artificially low on compliance — not inaccurate, just less relevant to their own regulatory environment.
Why no email gate before the result?
Because the result is worthless to everyone — including us — if visitors don't finish the assessment, and the gate was the single biggest completion-killer in testing. The booking CTA after the result converts at a rate we're happy with.
Can I see the list of questions before I start?
The first category reveals itself the moment you start, and you can navigate back. We deliberately don't show all 12 questions up-front because it changes answer behaviour — people rehearse their answers instead of responding honestly.
Will the assessment be updated?
Yes. The AEO category especially — what counts as "AI-ready" in that dimension is shifting monthly. Expect a scoring refresh roughly every quarter.
How accurate is the score?
Accurate enough to be useful, not precise enough to argue over. The tiers classify reliably against businesses we've worked with directly. The individual score (say, 72 vs. 76) matters less than the tier boundary or the priority action list. Treat it like a GPA, not a Lighthouse score.
What's the follow-up when I finish the assessment?
You see your tier, your score, and a prioritized list of actions. There's a CTA to book a strategy call with Agency7, but no email capture, no auto-triggered email sequence, no drip campaign. If you want to talk, the button is there; if not, you've still gotten the value.
Can I take it more than once?
Yes. The assessment stores no state between sessions. Every visit starts fresh. Useful if you've changed something substantial and want to see the delta.
Ready to see where you stand? Take the AI-Readiness Score — three minutes, no email, honest answer.
Want to talk about the gaps the assessment uncovered? Book a free strategy call at consult.agency7.ca/ai-audit.