The AIght Score
The one number on this site that matters.
Every tool in the archive carries a single 0–100 score. It’s the average of five ratings, one per axis, that I assign by hand after spending real time with the tool. No algorithm. No vendor input. No paid placements.
The score isn’t generated, and there’s no API call behind it. The five axes (utility, privacy, speed, cost, transparency) have stayed the same since the archive started, and I write a number on each one only after I’ve used the tool enough to mean it.
Some tools sit in the archive for weeks before they get a score. That’s on purpose. A single number has to carry a lot of weight, so I’d rather mark a tool as “scoring in review” than publish a guess.
When a tool changes — new pricing, new privacy posture, a model swap — the score moves with it. The last-updated date on each tool page is the last day I sat with it.
The five axes
Each axis is rated 0–100. The published score is the mean of the five.
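To make the arithmetic concrete, here is a minimal Python sketch of how the five ratings combine into the headline number. It assumes the ratings are whole numbers keyed by axis name; the function name `aight_score` and the dict layout are hypothetical, not how the site actually stores anything. Every axis counts equally, as a plain mean implies. The page doesn’t say how the result is rounded for display, so the sketch leaves it unrounded.

```python
from statistics import mean

# The five fixed axes named on this page.
AXES = ("utility", "privacy", "speed", "cost", "transparency")


def aight_score(ratings: dict[str, int]) -> float:
    """Return the unweighted mean of the five axis ratings (each 0-100).

    Hypothetical helper for illustration only.
    """
    missing = [axis for axis in AXES if axis not in ratings]
    if missing:
        # In this sketch, a tool missing any rating is rejected rather than scored.
        raise ValueError(f"missing ratings: {', '.join(missing)}")
    for axis in AXES:
        if not 0 <= ratings[axis] <= 100:
            raise ValueError(f"{axis} must be between 0 and 100")
    # Equal weight per axis: sum of the five ratings divided by five.
    return mean(ratings[axis] for axis in AXES)


# Made-up ratings, purely to show the calculation.
example = {"utility": 82, "privacy": 64, "speed": 90, "cost": 71, "transparency": 58}
print(aight_score(example))  # 73
```

Rejecting incomplete ratings instead of averaging whatever is present mirrors the idea that a tool without all five numbers simply has no score yet.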
Utility
Does it actually do the thing it claims, well enough that you'd choose it over the alternative?
What it’s asking
Real, repeatable usefulness on the job it's marketed for. Demos that survive contact with messy inputs.
Red flags
- Polished landing page, broken in week-two use
- Works on the demo prompt, fails on yours
- Requires a specific prompt incantation to behave
Privacy
What happens to your data once it's in the box?
What it’s asking
Clear policy on training, retention, residency, and deletion. EU data residency options when relevant. Self-hosting as a bonus.
Red flags
- Defaults to training on your prompts unless you opt out
- Buried retention terms or none at all
- Vague "we may share with partners" clauses
Speed
How long do you actually wait for a useful output, in a realistic session?
What it’s asking
Time-to-first-useful-result. Latency at common context lengths. Streaming where it matters.
Red flags
- Headline benchmarks measured on toy prompts
- Hidden queue times under load
- "Fast" tier locked behind enterprise pricing
Cost
What does it really cost in a normal month, including the things they don't put on the pricing page?
What it’s asking
Honest monthly spend for a representative workload. Overages, throughput caps, paywall cliffs disclosed.
Red flags
- Free tier that resets daily, not monthly
- Token pricing with no visible usage meter
- Surprise per-seat minimums on the upgrade path
Transparency
How honest is the team about what the tool can and can't do?
What it’s asking
Public changelogs. Acknowledged failure modes. Real model names, not marketing names. Open weights or open source where claimed.
Red flags
- Model identity hidden behind a custom brand name
- Quiet feature removals
- Marketing speed numbers that don't match the docs
What the bands mean
Plain English for the headline number.
I keep coming back to this without thinking about it. The reason to use it outweighs the cost of switching to anything else.
Strong on most axes, real weaknesses on one. Worth your time if the weaknesses don't hit your specific case.
A serious answer for a narrow problem. Wrong tool for most people; right tool for some.
There's something fundamentally awkward about it. Use it because the alternative is worse, not because it's good.
Listed in the archive only so I can explain why I don't recommend it.
The score is an opinion, slowly formed. If you disagree, that’s the point — disagree with a person, not a vendor’s landing page.