Eval · SkillIndex
A credit score for your agents.
Your prompt's probably fine. Let me double-check. A 0–100 score for any prompt, skill, or agent. Seven-dimension breakdown. Execution test. Category percentile. Three minutes.
Real transformation · 3 minutes
From vague prompt to certified skill.
Watch how Eval turns a weak, ad-hoc prompt into a scored, structured skill a whole team can run.
Before
SkillIndex · 34 · Ad-hoc prompt
“Write a cold outreach email for our SaaS product to VP-level buyers.”
- No role or voice defined
- Skips intake — Claude has to guess the ICP
- No deliverable format — gets a single paragraph, not a sequence
- Cannot be rerun by a teammate for a different account
After
SkillIndex · 89 · Certified · Structured skill
“You are a senior B2B SaaS SDR. Phase 1: intake on company, buyer persona, pain point, competitor context. Phase 2: build the hook using SPIN. Phase 3: produce a 4-email sequence with subject lines, CTAs, and follow-up timing. Phase 4: deploy checklist. Phase 5: measure reply rate and adjust.”
- Clear SDR role + tone of voice
- Intake walks Claude through ICP, pain, context
- 4-email sequence with structured format + send timing
- Any teammate can run it and get consistent quality
Rewriting is $20 extra on top of the $10 Eval. Takes 2 minutes. Run one on your own skill →
Start an evaluation
Drop your skill.
Sample SkillIndex badge
87 / 100
Easy Carl Certified
cold-call-script
Sales · 73rd percentile · Consistency 0.91 · 3/3 scenarios passed
What we measure
Seven dimensions, one score.
Structure · 15 · Valid YAML, 5-phase format, placeholder tokens, length in band
Triggering · 15 · Trigger-word coverage, overlap avoidance, activation accuracy
Specificity · 20 · Named frameworks, numeric benchmarks, industry terminology density
Completeness · 15 · Intake depth, decision trees, variants, deployment checklist
Deliverable · 15 · Produces an artifact, multi-variant templates, deployment-ready
Measurability · 10 · KPIs, numeric targets, reporting cadence, optimization triggers
Safety · 10 · Harmful-pattern scan, legal disclaimers, PII guidance, compliance
Total · 100 · Weighted composite
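To make the weighting concrete, here is a minimal sketch of how seven dimension ratings could roll up into one score. The point caps come from the breakdown above; the 0.0 to 1.0 rating scale, the rounding, and the function name `skill_index` are assumptions for illustration, not the actual grader.

```python
# Maximum points per dimension, taken from the breakdown above.
MAX_POINTS = {
    "structure": 15,
    "triggering": 15,
    "specificity": 20,
    "completeness": 15,
    "deliverable": 15,
    "measurability": 10,
    "safety": 10,
}


def skill_index(ratings: dict[str, float]) -> int:
    """Combine per-dimension ratings (0.0 to 1.0) into a 0-100 score.

    Assumption: each dimension contributes rating * cap, and the total
    is rounded to the nearest integer. The real rubric may differ.
    """
    missing = set(MAX_POINTS) - set(ratings)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = 0.0
    for name, cap in MAX_POINTS.items():
        rating = min(max(ratings[name], 0.0), 1.0)  # clamp to [0, 1]
        total += rating * cap
    return round(total)


example = {
    "structure": 0.8,
    "triggering": 0.6,
    "specificity": 0.9,
    "completeness": 1.0,
    "deliverable": 0.8,
    "measurability": 0.5,
    "safety": 1.0,
}
print(skill_index(example))  # 12 + 9 + 18 + 15 + 12 + 5 + 10 = 81
```

Because the caps sum to 100, a perfect rating on every dimension yields exactly 100, and Specificity moves the total more than any other single dimension.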
How it works
Four steps. Three minutes.
Drop in your prompt
Paste the text or upload the file. SKILL.md, system prompt, agent definition — whatever you've got. Encrypted at rest. I never share it.
Pay $10, sit back
Grading takes about 3 minutes. I'll email you when it's done.
Read the scorecard
SkillIndex score, 7-dimension breakdown, execution test results, category percentile. Receipts for every score — quoted lines from your own skill.
Fix it, or pay me to
Apply the specific recommendations yourself. Or pay +$20 and I'll send the rewritten version in two minutes.
What makes Eval different
Built for skill creators. Not LLM engineers.
Activation simulation
We test your skill against 20 synthetic user messages and show you which ones would trigger it. Reveals exactly what to add to your frontmatter description.
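As a rough illustration of the idea, the sketch below checks a skill description against a handful of user messages and reports which ones would activate it. The word-overlap rule in `would_trigger` is a hypothetical stand-in for the real routing decision, and every name here is an assumption.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, skipping words of three letters or fewer."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def would_trigger(description: str, message: str, min_overlap: int = 2) -> bool:
    """Hypothetical rule: trigger when the message shares at least
    min_overlap words with the skill description."""
    return len(tokens(description) & tokens(message)) >= min_overlap


def activation_report(description: str, messages: list[str]) -> dict:
    """Split messages into hits and misses and compute the hit rate."""
    if not messages:
        raise ValueError("need at least one message")
    hits = [m for m in messages if would_trigger(description, m)]
    misses = [m for m in messages if m not in hits]
    return {"hit_rate": len(hits) / len(messages), "hits": hits, "misses": misses}


report = activation_report(
    "Write cold outreach email sequences for SaaS sales",
    ["Draft a cold email for a SaaS buyer", "Help me plan a birthday party"],
)
print(report["misses"])
```

The misses are the useful part: any miss that should have triggered the skill points to wording worth adding to the description.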
Skill DNA analysis
We extract the frameworks your skill references and compare to top-performing skills in the same category. "You cite SPIN but not MEDDIC — 87% of top B2B sales skills use both."
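A minimal sketch of that comparison, assuming a known list of framework names and a set of cited frameworks per top skill. The function names and the sample data are hypothetical and only show the shape of the check.

```python
import re


def cited_frameworks(text: str, known: set[str]) -> set[str]:
    """Return the known framework names that appear as whole words in text."""
    return {
        f for f in known
        if re.search(rf"\b{re.escape(f)}\b", text, re.IGNORECASE)
    }


def framework_gaps(
    text: str, top_skills: list[set[str]], known: set[str]
) -> dict[str, float]:
    """Map each framework the skill does not cite to the share of
    top skills that do cite it."""
    if not top_skills:
        raise ValueError("need at least one top skill to compare against")
    mine = cited_frameworks(text, known)
    gaps = {}
    for name in sorted(known - mine):
        share = sum(name in s for s in top_skills) / len(top_skills)
        if share > 0:
            gaps[name] = share
    return gaps


# Made-up cohort data, for illustration only.
known = {"SPIN", "MEDDIC", "BANT"}
top = [{"SPIN", "MEDDIC"}, {"SPIN", "MEDDIC"}, {"SPIN"}, {"MEDDIC", "BANT"}]
print(framework_gaps("Phase 2: build the hook using SPIN.", top, known))
```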
Benchmark percentile
Every submission is ranked against its category cohort, drawn from our library of 600+ reference skills. Know exactly where you stand.
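One common way to compute a percentile like this is the share of cohort scores that fall strictly below yours. The sketch below uses that definition as an assumption; the cohort values are made up and the function name is hypothetical.

```python
from bisect import bisect_left


def category_percentile(score: float, cohort_scores: list[float]) -> int:
    """Percent of cohort scores strictly below `score`, rounded."""
    if not cohort_scores:
        raise ValueError("cohort is empty")
    ordered = sorted(cohort_scores)
    below = bisect_left(ordered, score)  # count of scores < score
    return round(100 * below / len(ordered))


cohort = [50, 60, 70, 80, 90, 95, 40, 30, 85, 75]  # made-up scores
print(category_percentile(87, cohort))  # 8 of 10 scores are below 87, so 80
```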
Staleness detection
Flags outdated tools and tactics. "Your email marketing skill references Mailchimp but not Kit, Beehiiv, or Klaviyo — all dominant in 2025+."
Pricing
Pay once, or go unlimited.
Per skill
Eval
$10
One-time evaluation
Start Eval
- ✓Full SkillIndex report
- ✓Execution test + percentile
- ✓Shareable URL + PDF
- ✓+$20 for rewritten version
Monthly
Eval Pro
$99/mo
For skill creators and teams
- ✓Unlimited evaluations
- ✓Version tracking + diffs
- ✓A/B test framework
- ✓Regression alerts
- ✓Team dashboard, up to 25 skills
- ✓30% off rewrites and Agent Packages
Enterprise
Eval Team
$299/mo
100 skills, unlimited seats
Contact Sales
- ✓Everything in Pro
- ✓Up to 100 skills tracked
- ✓SSO + audit log
- ✓Custom rubrics (HIPAA, SOX, etc.)
- ✓API access for CI/CD
- ✓Quarterly portfolio review
FAQ
What's a SkillIndex score?
A 0–100 score for how well your prompt actually works — not how long it is. 80+ gets the Easy Carl Certified badge. I grade against 620+ reference skills across 35 categories.
What counts as a 'skill'?
Anything text-based that tells an AI what to do. SKILL.md, system prompts, Claude Project CLAUDE.md files, agent definitions, one-off prompts you're tired of tweaking. If it's text, I can grade it.
How does the execution test work?
I generate 3 synthetic intakes that match your skill's domain, run the skill against each, and grade the outputs. I also run the same inputs multiple times to see how consistent the results are.
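Here is a minimal sketch of how repeated runs could be summarized into a pass count and a consistency figure. The pass mark of 70, the spread-based consistency formula, and the function name are all assumptions for illustration, not the actual grading logic.

```python
from statistics import mean, pstdev


def summarize_execution(
    grades_by_scenario: dict[str, list[float]], pass_mark: float = 70.0
) -> dict:
    """Summarize repeated-run grades (0-100) for each scenario.

    Assumptions: a scenario passes when its mean grade reaches pass_mark,
    and consistency is 1 minus the average normalized spread across runs.
    """
    if not grades_by_scenario:
        raise ValueError("need at least one scenario")
    runs = list(grades_by_scenario.values())
    passed = sum(mean(g) >= pass_mark for g in runs)
    # 50 is the largest possible population std dev for grades in [0, 100],
    # so dividing by it maps the spread onto a 0-1 scale.
    spread = mean([pstdev(g) / 50.0 for g in runs])
    return {
        "passed": passed,
        "scenarios": len(runs),
        "consistency": round(1.0 - spread, 2),
    }


summary = summarize_execution(
    {"intake_a": [80, 80, 80], "intake_b": [90, 90, 90], "intake_c": [60, 60, 60]}
)
print(f"{summary['passed']}/{summary['scenarios']} scenarios passed")
```

Identical grades across repeated runs give a consistency of 1.0; the more the grades swing between runs, the lower the figure.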
What's Eval Pro?
$99/month. Unlimited grades, version tracking so you see when a rewrite actually improves things, A/B tests, regression alerts, team dashboard. For people who grade a lot.
Are my skills private?
Yes. Encrypted at rest. Never public unless you share the report URL. Reports expire after 90 days unless you're a Member or Pro. I don't train on your stuff. I don't sell it.
Can I evaluate an agent, not just one skill?
Yes. Upload the whole agent folder. I score each skill, flag overlaps and gaps, and give you a team-level SkillIndex for the agent overall.