Trust Scores: How to Measure AI Code Quality Automatically
Your AI generates code fast, but how do you know if it's good? Trust Scores provide a 0-100 metric for AI code quality based on pattern adherence and violation history.
TL;DR
- The Problem: Linters catch syntax errors, not pattern violations—can't tell if AI code follows YOUR conventions
- Why It Matters: 500 lines of AI code in 30 seconds, but 30+ minutes to review manually for quality
- SnapBack's Solution: Trust Score (0-100) measures pattern adherence + violation history in real-time
- Key Benefit: Trust Score 82/100 = 82% pattern compliance—merge with data-driven confidence, not gut feel
Your AI coding assistant just generated 500 lines of code in 30 seconds.
Question: Is it good code?
You could:
- ✅ Read all 500 lines carefully (30+ minutes)
- ⚠️ Scan quickly and hope for the best (risky)
- ❌ Merge and pray (disaster waiting to happen)
What if you could see a number: Trust Score 82/100?
That number tells you:
- This code follows 82% of your established patterns
- It has low violation history
- Architecture rules are respected
- You can merge with confidence
This is Trust Score—the first real-time quality metric for AI-generated code.
The Problem: No Metrics for AI Code Quality
Traditional code quality metrics don't work for AI-generated code:
Static Analysis Catches Syntax, Not Intent
// ✅ Linter says: "All good"
async function getUser(id: string) {
  const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
  if (!user) throw new Error('User not found');
  return user;
}
// ❌ But YOUR codebase uses `Result<T, E>`, not exceptions
// ❌ And YOUR team enforces custom error types (NotFoundError)
// ❌ Linter doesn't know this—it only checks syntax
Linters and TypeScript catch bugs, not pattern violations.
Code Review is Slow and Inconsistent
Different reviewers enforce different standards:
- Alice rejects `any` types aggressively
- Bob allows them in test files
- Charlie doesn't notice them
Human review is necessary but slow and subjective.
Test Coverage is Lagging
// AI generates new feature
async function calculateDiscount(user: User, cart: Cart) {
  // 50 lines of complex logic
}
// Your test coverage: 0% (not written yet)
// How confident are you in merging this?
Tests validate behavior, but only if you write them.
Introducing Trust Score: 0-100 Scale for AI Code Quality
Trust Score measures how well AI-generated code aligns with your codebase's established patterns and architecture rules.
Trust Score = (Pattern Adherence × 60%) + (Violation History × 40%)
What It Measures
Pattern Adherence (60% weight):
- Does code follow established conventions?
- Are architectural rules respected?
- Is the style consistent with existing code?
Violation History (40% weight):
- How many times has this file violated patterns before?
- Are violations trending up or down?
- Is this a high-risk area of the codebase?
Range: 0-100 where:
- 90-100: Excellent (aligns perfectly with patterns)
- 70-89: Good (minor deviations, safe to merge)
- 50-69: Fair (review carefully before merging)
- 0-49: Poor (significant pattern violations, do not merge)
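As a concrete illustration of the headline formula, here is a minimal sketch (not SnapBack's actual implementation) that applies the 60/40 weighting, assuming both inputs are already normalized to a 0-100 scale:

```typescript
// Hypothetical sketch of: Trust Score =
// (Pattern Adherence × 60%) + (Violation History × 40%).
// Both inputs are assumed to be 0-100 scores.
function trustScore(patternAdherence: number, violationHistoryScore: number): number {
  const score = patternAdherence * 0.6 + violationHistoryScore * 0.4;
  // Clamp to the 0-100 range and round to a whole number
  return Math.round(Math.min(100, Math.max(0, score)));
}

// Example: 90% pattern adherence, 70/100 violation-history score
// → 90 × 0.6 + 70 × 0.4 = 54 + 28 = 82
console.log(trustScore(90, 70)); // 82
```

With these weights, pattern adherence dominates: a file with perfect history but poor adherence still lands in the "Fair" band.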
How Trust Score is Calculated
Step 1: Pattern Recognition
SnapBack's pattern memory system tracks your accepts/rejects over time:
// Week 1: You reject AI's exception-throwing pattern 3 times
// Pattern recorded: "User prefers `Result<T, E>` over exceptions"
// Week 4: AI suggests exception again
const pattern = patternMemory.check(code);
// Result: pattern.confidence = 92% ("User strongly prefers `Result<T, E>`")
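One way an accept/reject tracker like the one behind `patternMemory.check` could work is sketched below. The class name and method names are hypothetical (the post doesn't show this internals); the idea is just that confidence grows with the fraction of rejections recorded against a pattern:

```typescript
// Hypothetical sketch of an accept/reject tracker. Confidence that the
// user rejects a pattern is the share of 'reject' verdicts recorded for it.
type Verdict = 'accept' | 'reject';

class PatternMemory {
  private history = new Map<string, Verdict[]>();

  record(pattern: string, verdict: Verdict): void {
    const log = this.history.get(pattern) ?? [];
    log.push(verdict);
    this.history.set(pattern, log);
  }

  // Confidence (0-100) that the user rejects this pattern.
  rejectionConfidence(pattern: string): number {
    const log = this.history.get(pattern) ?? [];
    if (log.length === 0) return 0; // no data yet
    const rejects = log.filter((v) => v === 'reject').length;
    return Math.round((rejects / log.length) * 100);
  }
}

const memory = new PatternMemory();
memory.record('throw-exceptions', 'reject');
memory.record('throw-exceptions', 'reject');
memory.record('throw-exceptions', 'reject');
memory.record('throw-exceptions', 'accept');
console.log(memory.rejectionConfidence('throw-exceptions')); // 75
```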
Step 2: Architecture Validation
Checks code against explicit architecture rules:
// Your .snapbackrc defines rules
{
  "architectureRules": {
    "noCircularDeps": true,
    "layerBoundaries": {
      "platform": ["core", "contracts"],
      "core": ["contracts"],
      "contracts": []
    }
  }
}
// AI suggests: import { Button } from 'platform/web/Button' in core/db.ts
const architectureViolation = checkLayerBoundary(importPath, currentFile);
// Result: violation = true (core can't import from platform)
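The layer rules can be enforced with a small lookup. This is a hypothetical sketch (`checkLayerBoundary`'s implementation isn't shown in the post), under the reading that each layer's list in `layerBoundaries` names the layers it is allowed to import from:

```typescript
// Hypothetical sketch of a layer-boundary check. `allowed` mirrors the
// .snapbackrc example: each layer maps to the layers it may import from.
const allowed: Record<string, string[]> = {
  platform: ['core', 'contracts'],
  core: ['contracts'],
  contracts: [],
};

// Assume the first path segment names the layer (e.g. 'core/db' → 'core').
function layerOf(path: string): string {
  return path.split('/')[0];
}

// True when importing `importPath` from `currentFile` crosses a boundary.
function checkLayerBoundary(importPath: string, currentFile: string): boolean {
  const from = layerOf(currentFile);
  const to = layerOf(importPath);
  if (from === to) return false; // same layer is always fine
  return !(allowed[from] ?? []).includes(to);
}

console.log(checkLayerBoundary('core/db', 'platform/web/Button.tsx')); // false (allowed)
console.log(checkLayerBoundary('platform/web/Button', 'core/db.ts')); // true (violation)
```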
Step 3: Violation History Analysis
Tracks file-level violation trends:
const violationHistory = {
  'auth.ts': [
    { date: '2026-01-15', type: 'exception-thrown' },
    { date: '2026-01-22', type: 'missing-validation' },
    { date: '2026-02-05', type: 'circular-import' },
  ],
};
// 3 violations in 3 weeks = high-risk file
const riskScore = calculateRisk(violationHistory['auth.ts']);
// Result: riskScore = HIGH
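One plausible shape for `calculateRisk` (not shown in the post) combines a weighted violation count with time decay, similar to the `violationDecayDays` option described later. The thresholds and linear decay here are illustrative assumptions:

```typescript
// Hypothetical sketch of violation-history risk scoring with time decay:
// older violations count for less, and a weighted count maps to a bucket.
interface Violation {
  date: string; // ISO date
  type: string;
}

function calculateRisk(
  violations: Violation[],
  now: Date = new Date('2026-02-10'), // fixed "today" for the example
  decayDays = 90
): 'LOW' | 'MEDIUM' | 'HIGH' {
  const msPerDay = 24 * 60 * 60 * 1000;
  const weighted = violations.reduce((sum, v) => {
    const ageDays = (now.getTime() - new Date(v.date).getTime()) / msPerDay;
    if (ageDays > decayDays) return sum; // fully decayed, ignore
    return sum + (1 - ageDays / decayDays); // linear decay toward 0
  }, 0);
  if (weighted >= 2) return 'HIGH';
  if (weighted >= 1) return 'MEDIUM';
  return 'LOW';
}

const authViolations: Violation[] = [
  { date: '2026-01-15', type: 'exception-thrown' },
  { date: '2026-01-22', type: 'missing-validation' },
  { date: '2026-02-05', type: 'circular-import' },
];
console.log(calculateRisk(authViolations)); // HIGH (three recent violations)
```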
Step 4: Calculate Final Score
function calculateTrustScore(file: string, changes: Change[]): number {
  const patternScore = evaluatePatternAdherence(changes); // 0-100
  const architectureScore = evaluateArchitecture(changes); // 0-100
  const violationPenalty = calculateViolationPenalty(file); // 0-40

  const patternWeight = 0.6;
  const architectureWeight = 0.4;

  const baseScore = (patternScore * patternWeight) +
    (architectureScore * architectureWeight);

  return Math.max(0, baseScore - violationPenalty);
}
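To make Step 4 concrete, here is a worked run of the same arithmetic with fixed stand-in values (the `evaluate*` helpers aren't shown in the post, so plain numbers substitute for them):

```typescript
// Worked example of the Step 4 calculation with illustrative stub values.
const patternScore = 90;      // stand-in for evaluatePatternAdherence(changes)
const architectureScore = 100; // stand-in for evaluateArchitecture(changes)
const violationPenalty = 18;  // stand-in for calculateViolationPenalty(file), 0-40

const baseScore = patternScore * 0.6 + architectureScore * 0.4; // 54 + 40 = 94
const finalScore = Math.max(0, baseScore - violationPenalty);   // 94 - 18 = 76
console.log(finalScore); // 76
```

Note how the violation penalty can pull an otherwise "Excellent" base score down into the "Good" band, which is the point: history matters, not just the current diff.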
Real-World Example: Trust Score Evolution
Day 1: New Codebase (Low Trust)
// AI generates auth module
async function validateToken(token: string) {
  const decoded = jwt.verify(token, SECRET);
  if (!decoded) throw new Error('Invalid token');
  return decoded;
}
// Trust Score: 45/100
// Reasons:
// - Uses exceptions (your team prefers `Result<T, E>`) [-20]
// - No custom error types (NotAuthenticatedError) [-15]
// - Hardcoded SECRET instead of env var [-10]
// - New file, no violation history [+0]
Recommendation: Review carefully before merging.
Day 30: Patterns Established (Medium Trust)
// AI generates user service (learned patterns)
async function getUser(id: string): Promise<Result<User, NotFoundError>> {
  const user = await db.query('SELECT * FROM users WHERE id = ?', [id]);
  if (!user) return err(new NotFoundError('User not found'));
  return ok(user);
}
// Trust Score: 78/100
// Reasons:
// + Uses `Result<T, E>` pattern [+30]
// + Custom error type (NotFoundError) [+20]
// + Follows DB query conventions [+15]
// + File has low violation history [+13]
Recommendation: Safe to merge after quick review.
Day 90: High Trust (Established Intelligence)
// AI generates payment processing
async function processPayment(
  userId: string,
  amount: number
): Promise<Result<Payment, PaymentError>> {
  const user = await getUser(userId);
  if (user.isErr()) return err(new PaymentError('User not found'));

  const payment = await paymentService.charge(user.value, amount);
  if (payment.isErr()) return err(payment.error);

  await notificationService.send(user.value.email, 'Payment processed');
  return ok(payment.value);
}
// Trust Score: 94/100
// Reasons:
// + Perfect `Result<T, E>` usage [+35]
// + Custom error types [+20]
// + Follows service composition pattern [+20]
// + Proper error propagation [+15]
// + Zero violations in last 60 days [+4]
Recommendation: Merge with confidence.
Why Trust Score Changes Everything
1. Objective Quality Gate for CI/CD
# .github/workflows/ci.yml
- name: Check Trust Score
  run: |
    TRUST_SCORE=$(snap check --mode trust --output json | jq '.score')
    if [ "$TRUST_SCORE" -lt 70 ]; then
      echo "Trust Score too low: $TRUST_SCORE"
      exit 1
    fi
Block merges automatically if Trust Score < 70.
2. Faster Code Reviews
// Reviewer sees PR with Trust Score
Trust Score: 85/100
- Pattern Adherence: 92%
- Architecture: 100%
- Violation History: Low
// Reviewer decision:
"Trust Score is 85, patterns look good. Approved."
// Review time: 2 minutes instead of 20
3. Track Quality Over Time
$ snap stats --history
Week 1: Trust Score 58/100 (learning patterns)
Week 4: Trust Score 72/100 (patterns emerging)
Week 8: Trust Score 81/100 (strong adherence)
Week 12: Trust Score 89/100 (excellent alignment)
# Quality improving over time = Pattern Memory working
4. Identify High-Risk Areas
// Trust Score by module
{
  "auth.ts": 45,         // ⚠️ High-risk (review carefully)
  "payment.ts": 68,      // ⚠️ Medium-risk (extra caution)
  "ui/Button.tsx": 92,   // ✅ Low-risk (safe to merge)
  "utils/string.ts": 95  // ✅ Low-risk (safe to merge)
}
// Focus review time on auth.ts and payment.ts
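The per-module scores above turn directly into a review queue. A minimal sketch (hypothetical helper, using the example scores and the 70-point gate threshold from earlier):

```typescript
// Sketch: build a review queue from per-module Trust Scores,
// worst-first, keeping only files below the CI gate threshold of 70.
const scores: Record<string, number> = {
  'auth.ts': 45,
  'payment.ts': 68,
  'ui/Button.tsx': 92,
  'utils/string.ts': 95,
};

const reviewQueue = Object.entries(scores)
  .filter(([, score]) => score < 70) // below the gate → needs human attention
  .sort(([, a], [, b]) => a - b)     // lowest score first
  .map(([file]) => file);

console.log(reviewQueue); // ['auth.ts', 'payment.ts']
```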
5. Measure AI Tool Performance
Compare Trust Scores across different AI tools:
Copilot average Trust Score: 76/100
Cursor average Trust Score: 81/100
Claude Code average Trust Score: 84/100
→ Claude Code aligns best with your patterns
Trust Score vs. Traditional Metrics
| Metric | What It Measures | When It Helps | Limitations |
|---|---|---|---|
| Linter | Syntax errors, basic style | Always | Doesn't know your patterns |
| TypeScript | Type correctness | Always | Doesn't catch logic issues |
| Test Coverage | % of code tested | After tests written | Lagging indicator |
| Code Review | Human judgment | Always | Slow, subjective |
| Trust Score | Pattern adherence + history | Real-time during generation | Requires Pattern Memory training |
Trust Score complements, doesn't replace, these metrics.
Real-Time Trust Score with MCP Integration
With SnapBack's MCP integration, your AI can query Trust Score in real-time:
// Claude Desktop with SnapBack MCP
You: "Refactor the payment module"
Claude: [Queries SnapBack via MCP]
→ check({ mode: 'architecture', files: ['payment.ts'] })
SnapBack Response:
- Trust Score: 68/100 (Medium risk)
- Violation History: 3 violations in last 30 days
- Pattern Issues: Missing error handling, inconsistent naming
Claude: "I've checked the Trust Score for payment.ts (68/100).
This is a medium-risk file with recent violations.
I'll refactor carefully and add missing error handling
to bring the Trust Score above 80..."
This is intelligence-aware refactoring: Claude understands risk context before touching code.
Setup MCP Integration
# Claude Desktop
snap tools configure --claude
# Claude Code
snap tools configure --claude-code
Now Claude references Trust Scores during conversations.
Getting Started with Trust Score
Step 1: Install SnapBack
Trust Score works with Cursor, Copilot, Claude, and Windsurf:
npm install -g @snapback/cli
snap init
Step 2: Build Pattern Memory (2-4 Weeks)
Trust Score accuracy improves as Pattern Memory learns:
- Week 1: Trust Scores fluctuate (learning patterns)
- Week 4: Trust Scores stabilize (patterns established)
- Week 8: Trust Scores highly accurate (strong intelligence)
Step 3: Check Trust Score Anytime
# Check current Trust Score
snap check --mode trust
# Output:
Trust Score: 82/100
Pattern Adherence: 88%
Architecture: 95%
Violation History: 3 (Medium)
Recommendations:
- auth.ts: Low Trust (45) - Review carefully
- payment.ts: Medium Trust (68) - Extra caution
Step 4: Set CI/CD Gates (Optional)
# Block merges if Trust Score < 70
- run: snap check --mode trust --min-score 70
Trust Score Configuration
Customize Trust Score calculation in .snapbackrc:
{
  "trustScore": {
    "minAcceptableScore": 70,
    "patternWeight": 0.6,
    "violationWeight": 0.4,
    "violationDecayDays": 90,
    "highRiskFiles": [
      "auth.ts",
      "payment.ts",
      "billing.ts"
    ],
    "excludeFiles": [
      "**/*.test.ts",
      "scripts/**"
    ]
  }
}
Options:
- minAcceptableScore: Threshold for CI/CD gates (default: 70)
- patternWeight: How much pattern adherence matters (default: 0.6)
- violationWeight: How much violation history matters (default: 0.4)
- violationDecayDays: How long violations affect the score (default: 90 days)
- highRiskFiles: Files that require higher Trust Scores
- excludeFiles: Files excluded from Trust Score calculation
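A tool consuming this config would typically merge a partial user `trustScore` block over the documented defaults. A hedged sketch (types and function names are hypothetical, defaults taken from the options above):

```typescript
// Hypothetical sketch: merge a partial .snapbackrc "trustScore" block
// over the documented defaults, so unset options fall back cleanly.
interface TrustScoreConfig {
  minAcceptableScore: number;
  patternWeight: number;
  violationWeight: number;
  violationDecayDays: number;
  highRiskFiles: string[];
  excludeFiles: string[];
}

const defaults: TrustScoreConfig = {
  minAcceptableScore: 70,
  patternWeight: 0.6,
  violationWeight: 0.4,
  violationDecayDays: 90,
  highRiskFiles: [],
  excludeFiles: [],
};

function loadConfig(userConfig: Partial<TrustScoreConfig>): TrustScoreConfig {
  return { ...defaults, ...userConfig };
}

const config = loadConfig({ minAcceptableScore: 80, highRiskFiles: ['auth.ts'] });
console.log(config.minAcceptableScore); // 80 (user override)
console.log(config.violationDecayDays); // 90 (default)
```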
Frequently Asked Questions
Q: What's a good Trust Score?
- 90+: Excellent—code aligns perfectly with patterns
- 70-89: Good—safe to merge with quick review
- 50-69: Fair—requires careful review
- <50: Poor—do not merge without significant review
Q: Why is my Trust Score low on Day 1?
Trust Score depends on Pattern Memory. New projects have few patterns, so scores are lower. After 2-4 weeks of coding, Trust Scores stabilize and become accurate.
Q: Can I use Trust Score with any AI tool?
Yes. Trust Score works with:
- Cursor
- Copilot
- Claude (Desktop and Claude Code)
- Windsurf
Q: Does Trust Score replace code review?
No. Trust Score helps prioritize review time:
- High Trust Score (85+) → Quick review
- Medium Trust Score (65-84) → Standard review
- Low Trust Score (<65) → Thorough review
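The triage rule above is simple enough to express as a function, using the thresholds straight from this answer:

```typescript
// Map a Trust Score to a review depth, using the FAQ's thresholds:
// 85+ → quick, 65-84 → standard, below 65 → thorough.
function reviewDepth(trustScore: number): 'quick' | 'standard' | 'thorough' {
  if (trustScore >= 85) return 'quick';
  if (trustScore >= 65) return 'standard';
  return 'thorough';
}

console.log(reviewDepth(92)); // quick
console.log(reviewDepth(70)); // standard
console.log(reviewDepth(45)); // thorough
```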
Q: How is Trust Score different from linters?
Linters check syntax. Trust Score checks pattern adherence—whether code follows YOUR established conventions, architecture rules, and style preferences.
See Trust Score in Action
Trust Score is built into SnapBack and works with all major AI coding tools.
Try it free:
npm install -g @snapback/cli
snap init
snap stats # See your Trust Score
Resources:
- Pattern Memory (Trust Score Foundation)
- SOPR Architecture (How We Calculate in <50ms)
- MCP Integration (Real-Time Trust Scores)
Integration Guides:
Your AI generates code fast. Trust Score tells you if it's good.