# AI Feature Scorecard

> A 3-axis decision tool for picking the first AI feature to ship in your SaaS.
> Drop into Notion, Linear, or a Google Doc. Score honestly.
>
> **Source:** [Which AI Feature Should I Ship First?](https://therobin.dev/blog/which-ai-feature-to-ship-first) — therobin.dev

---

## How to use this

1. List 3 candidate AI features you've been considering.
2. Score each on the three axes below (1–10).
3. Apply the weighted formula.
4. Run the highest-scoring feature through the disqualifier checklist.
5. The first feature that survives both is your v1.

---

## The 3 axes

### Axis 1 — User Value (weight 40%)

How much does this feature improve your user's job-to-be-done?

- **10:** Every active user encounters it weekly; removes a clearly painful step; removing it would generate support tickets.
- **7:** Used by most users monthly; meaningful time savings.
- **5:** Useful but not essential; medium frequency.
- **3:** Nice-to-have; "would be cool"; low frequency.
- **1:** Novelty / demo value only.

### Axis 2 — Engineering Cost (weight 30%, score INVERSELY)

How expensive is the v1 — including the unsexy parts? Higher score = cheaper.

- **10:** 2 weeks, no new infra, single prompt + UI.
- **7:** 3–4 weeks, modest new components (e.g. simple retrieval).
- **5:** 6 weeks, embeddings + vector DB + re-ranking.
- **3:** 8+ weeks, multiple agent steps, tool use, eval harness, custom UI.
- **1:** 3+ months, deep ML/infra work, dedicated team needed.

### Axis 3 — LLM Cost-per-Request (weight 30%, score INVERSELY)

What does it cost to run, per request, at production scale? Higher score = cheaper.

- **10:** Fractions of a penny per request (Haiku 4.5 / GPT-4o-mini, sub-1k tokens).
- **7:** ~£0.005–£0.02 per request (Sonnet 4.6 short context).
- **5:** ~£0.03–£0.10 per request (Sonnet 4.6 long context, embeddings + LLM).
- **3:** ~£0.15+ per request (Opus 4.7, multi-step agents, large contexts).
- **1:** £1+ per interaction (multi-call agents, large context windows on every call).

> **Math reminder:**
> ```
> cost_per_request = (input_tokens × input_price) + (output_tokens × output_price)
> monthly_cost     = cost_per_request × requests_per_user × MAU
> ```
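The math reminder above is easy to sketch in code. This is a minimal Python version; the per-token prices and token counts below are illustrative placeholders, not real provider rates — substitute your model's actual pricing (note both prices are *per token*, in the same currency).

```python
# Hypothetical per-token prices in GBP -- replace with your provider's real rates.
INPUT_PRICE = 0.0000008   # £ per input token
OUTPUT_PRICE = 0.000004   # £ per output token

def cost_per_request(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single LLM call: tokens in each direction times the per-token price."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

def monthly_cost(per_request: float, requests_per_user: float, mau: int) -> float:
    """Projected monthly bill across all monthly active users."""
    return per_request * requests_per_user * mau

req = cost_per_request(input_tokens=800, output_tokens=300)
print(f"£{req:.5f} per request")                                    # £0.00184 per request
print(f"£{monthly_cost(req, requests_per_user=20, mau=5000):,.2f}")  # £184.00 per month
```

Running the projection at both your current MAU and 10× MAU is a quick way to catch features that are cheap today but ruinous at scale.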

---

## Scorecard table

| Feature | User Value (×0.4) | Eng Cost INV (×0.3) | LLM Cost INV (×0.3) | Weighted Score |
|---------|:-:|:-:|:-:|:-:|
| Candidate A: ____________________ |   |   |   |   |
| Candidate B: ____________________ |   |   |   |   |
| Candidate C: ____________________ |   |   |   |   |

Weighted score formula:
```
final_score = (user_value × 0.4) + (eng_cost_inv × 0.3) + (llm_cost_inv × 0.3)
```
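If you'd rather not do the weighting by hand, the formula translates directly. A small Python sketch (the weights are the ones defined above; remember axes 2 and 3 are already scored inversely, so higher = cheaper):

```python
WEIGHTS = {"user_value": 0.4, "eng_cost_inv": 0.3, "llm_cost_inv": 0.3}

def final_score(user_value: float, eng_cost_inv: float, llm_cost_inv: float) -> float:
    """Weighted scorecard total. All three inputs are 1-10 scores."""
    return (user_value * WEIGHTS["user_value"]
            + eng_cost_inv * WEIGHTS["eng_cost_inv"]
            + llm_cost_inv * WEIGHTS["llm_cost_inv"])

# Sanity check: a feature scoring 6 / 8 / 7 on the three axes.
print(round(final_score(6, 8, 7), 1))  # 6.9
```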

---

## Disqualifier checklist (run on top-scored feature)

- [ ] **Evals defined.** I know how to measure whether v1.1 is better than v1.0.
- [ ] **Cost ceiling set.** I've written down the maximum monthly LLM bill I'll tolerate at projected scale.
- [ ] **Fallback specified.** I know what happens when the LLM API is down, returns malformed output, or refuses a request.
- [ ] **Owner named.** I know who maintains this feature after launch — including monitoring prompt drift and model deprecations.

If any box is unchecked, drop to the next-highest-scoring feature and re-run.

---

## Worked example (from the post)

UK B2B SaaS, 5,000 MAU.

| Feature | UV | EC inv | LLM inv | Score |
|---------|:-:|:-:|:-:|:-:|
| AI document Q&A | 8 | 5 | 6 | **6.5** |
| AI-generated email replies | 6 | 8 | 7 | **6.9** ← winner |
| AI support agent (autonomous) | 9 | 2 | 3 | **5.1** |
| AI semantic search | 7 | 4 | 7 | **6.1** |

The most exciting feature (the autonomous agent) scored lowest because of axes 2 and 3. The most boring feature (pre-filled email replies) won.

This is the pattern. Boring features ship.
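The worked example above can be reproduced in a few lines. This sketch ranks the four candidates from the table using the document's weights; the scores are the ones given in the post:

```python
# (user_value, eng_cost_inv, llm_cost_inv) for each candidate, from the worked example.
candidates = {
    "AI document Q&A": (8, 5, 6),
    "AI-generated email replies": (6, 8, 7),
    "AI support agent (autonomous)": (9, 2, 3),
    "AI semantic search": (7, 4, 7),
}

def score(uv: float, ec_inv: float, llm_inv: float) -> float:
    return uv * 0.4 + ec_inv * 0.3 + llm_inv * 0.3

# Sort highest-scoring first.
ranked = sorted(candidates.items(), key=lambda kv: score(*kv[1]), reverse=True)
for name, axes in ranked:
    print(f"{score(*axes):.1f}  {name}")
# 6.9  AI-generated email replies
# 6.5  AI document Q&A
# 6.1  AI semantic search
# 5.1  AI support agent (autonomous)
```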

---

## Need this run on your actual codebase?

The £900 [AI Integration Audit](https://therobin.dev/services) does exactly this:

- Scores up to 3 candidate features against your codebase
- Delivers a written architecture and rollout plan for the top recommendation
- Projects monthly cost at your current MAU and at 10× scale
- 5 days, async, written deliverable — no meetings required

[See the audit →](https://therobin.dev/services)
