🎯 Compliance & AI

Zero-Tolerance Policy Enforcement

From blind spots to real-time risk visibility. How a judgment-focused interface turned an imperfect AI model into a tool that prevents incidents instead of explaining them.

Timeline
4 months
My role
Design Lead
Outcome
73% incident reduction
Team
2 designers, 4 eng, ML leads
The stakes

A bad call here doesn't cost a sprint

In a regulated call center, a missed compliance violation isn't a UX papercut. It's a fine, a lost license, or a headline. Leadership was flying blind: quality teams manually reviewed roughly two percent of calls, violations surfaced days later, and by the time anyone acted the damage was done. The real problem wasn't effort. It was a system blind to risk at scale.

Compliance failures rarely come from people not trying. They come from systems that can't see.
The problem

Why manual auditing kept failing

  • Sampling missed the long tail. The two percent reviewed was never the two percent that mattered.
  • Feedback came too late to change behavior. Agents learned about a violation days after the call.
  • Auditors drowned in low-risk calls and had no way to triage toward genuine risk.
  • Leaders managed reactively, explaining incidents after the fact instead of preventing them.
Figure: Before-state journey map of the broken compliance process.
Manual sampling (~2% of calls) → Delayed review (days after the call) → Reactive escalation (leaders explain incidents after the fact)
What I owned

My role as the design lead

I led design end to end: the research with auditors and agents, the information architecture for surfacing risk, the core interaction model, and the working relationship with the ML team that decided what the model could and couldn't be trusted to flag. My job wasn't to decorate a dashboard. It was to design the judgment layer between an imperfect model and a human who had to act on it.

  • Ran contextual interviews with auditors to map how they actually triage risk.
  • Defined a risk-scoring surface that ranked calls by likelihood and severity.
  • Designed the human-in-the-loop review flow so people could confirm, dismiss, or escalate fast.
  • Set the trust model: where the AI decides, where it suggests, and where a human must sign off.
The approach

Design as risk management

The core insight: not all violations are equal, and not all model confidence is equal. So the interface had to communicate two things at once — how risky a call was, and how sure the system was. We designed a triage queue that pushed high-severity, high-confidence calls to the top, with clear states for the gray-area cases that genuinely needed a human.

We deliberately avoided full automation. A model that auto-closes calls feels efficient until it's confidently wrong about a serious violation. The design kept a human accountable for every high-stakes decision while letting the AI remove the grunt work of finding the needle.

Figure: After-state risk queue. Every call is flagged in real time, scored, and sorted by severity and confidence.
All calls analyzed (100% coverage) → Risk scored (severity + confidence) → Proactive coaching (same-day intervention)
Risk queue prioritization:
  • High severity + high confidence → escalate
  • High severity + low confidence → review
  • Low severity + any confidence → monitor
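The prioritization rules above amount to a small routing function. The sketch below is a minimal illustration in Python, not the production logic: the `Severity` enum, the `route` function, and the 0.80 cut-off for "high confidence" are all hypothetical, because the case study does not say where that line sits. The rules are also silent on medium-severity calls, so sending them to human review is an assumption.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical threshold: the case study does not state where "high confidence" starts.
HIGH_CONFIDENCE = 0.80


def route(severity: Severity, confidence: float) -> str:
    """Map a flagged call to a queue action using the prioritization rules."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if severity is Severity.HIGH:
        # High severity + high confidence -> escalate; + low confidence -> human review.
        return "escalate" if confidence >= HIGH_CONFIDENCE else "review"
    if severity is Severity.LOW:
        # Low severity + any confidence -> monitor.
        return "monitor"
    # Assumption: medium severity is not covered by the rules, so a person reviews it.
    return "review"


if __name__ == "__main__":
    # Example values taken from the final mockup's queue.
    print(route(Severity.HIGH, 0.87))  # escalate
    print(route(Severity.HIGH, 0.52))  # review
    print(route(Severity.LOW, 0.78))   # monitor
```

Keeping the threshold as a single named constant makes it an explicit, reviewable decision rather than a number buried inside model code.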
The hard design problem wasn't the dashboard. It was deciding where the machine stops and a person starts.
The outcome

From reactive to preventive

Coverage went from a sliver to every call. Violations that used to surface days later now flag in near real time, so a team lead can coach an agent the same day. Within two quarters, reported compliance incidents dropped sharply, and leadership shifted from explaining incidents to preventing them.

  • 100% call coverage replaced ~2% sampling.
  • ~73% reduction in compliance incidents over two quarters.
  • Audit team time redirected from finding violations to resolving them.
What I took from it

The leadership lesson

The temptation with AI products is to maximize automation and call it progress. The leadership call here was the opposite: to deliberately keep humans in the loop where the cost of a confident error was unacceptable. Designing that boundary — and getting product, engineering, and compliance to agree on it — was the real deliverable. The interface was just where the agreement became visible.

Executive Summary

  • Challenge: Leadership was flying blind; quality teams reviewed only 2% of calls. Violations surfaced days later, by which time the damage was done.
  • Solution: Real-time risk detection with human-in-the-loop review. I designed the judgment layer between an imperfect AI and accountable humans.
  • Outcome: 100% call coverage replaced 2% sampling, and incidents dropped 73% in two quarters. The organization shifted from reacting to incidents to preventive coaching.
Success Metrics

  • Call coverage: 100% (from 2%)
  • Incident reduction: 73% (over two quarters)
  • Feedback speed: same-day (previously days later)
Impact timeline:
  • Before: manual sampling → delayed escalation → reactive management
  • After: real-time detection → immediate review → preventive coaching
Audit teams were redirected from finding violations to resolving them.
Stakeholder Map

  • Auditors: triage risk, review violations, prevent incidents
  • Agents: real-time feedback, same-day coaching, compliance support
  • Leadership: risk visibility, preventive action, incident prevention
  • ML team: model accuracy, threshold definition, automation bounds
The trust model defines where the AI decides, where it suggests, and where humans sign off.
Research Plan

  • Contextual interviews: on-site with auditors, to learn how they triage risk
  • Workflow mapping: documenting the current state to identify bottlenecks
  • Risk scoring interviews: what makes a call high-risk, and how severity weighs against likelihood
  • ML model sessions: understanding model confidence and setting automation boundaries
Outcome: a risk-focused information architecture and an interaction model for exercising judgment within the model's confidence bounds.
Research Insights

Key findings from auditor research:
  • Sampling blindness: 2% coverage never captures what matters.
  • Feedback lag: a delay of days means agents have lost the context of the call.
  • Triage overload: low-risk calls drown out genuine risk.
  • Reactive posture: leaders explain incidents after the fact.
Core insight: Compliance failures rarely come from people not trying. They come from systems that can't see.
JTBD

Jobs to Be Done

The jobs our users are trying to do:
  • Auditors: "I need to quickly identify which calls pose real risk so I can focus on what matters."
  • Agents: "I need feedback while the interaction is fresh so I can immediately correct my approach."
  • Leadership: "I need visibility into violations as they happen so I can coach teams proactively."
Opportunity Map

Unmet needs → design opportunities:
  • Coverage gap (every call matters, but only 2% were reviewed) → Opportunity 1: real-time detection with 100% visibility
  • Feedback delay (days of lag prevent behavior change) → Opportunity 2: same-day coaching with context intact
Core opportunity: design the human-AI boundary. Decide where the AI acts automatically, where it suggests and humans confirm, and where humans must always sign off. The interface is the agreement between the system and human judgment.
Design Principles

  1. Surface risk clearly. Don't bury violations; make them impossible to miss.
  2. Make confidence explicit. Show when the model is sure and when it's guessing.
  3. Keep humans accountable. Humans own the call, not the system.
  4. Enable fast action. Review and escalate in clicks, not minutes.
  5. Design for nuance in the gray area. High severity + low confidence signals genuine uncertainty that needs human judgment.
Workflow Architecture

Detection pipeline: incoming calls (100% coverage) → detection (AI flags violations) → scoring (severity + confidence) → queue (ranked by risk)
Human review layer (auditor review interface): confirm violation → route to team lead → escalate if needed → close when resolved
Feedback loops back to agents the same day, and leadership sees trends in real time. Every decision is audited, every escalation is tracked, and every outcome improves the next round.
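The human review layer can be read as a small state machine in which every transition is logged, which is one way to make "every decision is audited" enforceable. The sketch below is illustrative only: the state names (`unreviewed`, `confirmed`, `with_team_lead`, `escalated`, `closed`), the `TRANSITIONS` table, and the `ReviewItem` class are hypothetical; only the actions themselves (confirm, dismiss, escalate, route, close) come from the workflow described here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical transition table; action names follow the review workflow.
TRANSITIONS = {
    "unreviewed": {"confirm": "confirmed", "dismiss": "closed", "escalate": "escalated"},
    "confirmed": {"route": "with_team_lead", "escalate": "escalated"},
    "with_team_lead": {"escalate": "escalated", "close": "closed"},
    "escalated": {"close": "closed"},
    "closed": {},
}


@dataclass
class ReviewItem:
    call_id: str
    state: str = "unreviewed"
    audit_log: list = field(default_factory=list)

    def apply(self, action: str, actor: str, note: str = "") -> str:
        """Apply an action if the current state allows it, and record who did what."""
        allowed = TRANSITIONS[self.state]
        if action not in allowed:
            raise ValueError(f"{action!r} is not allowed from state {self.state!r}")
        new_state = allowed[action]
        # Every decision is appended to the audit trail before the state changes.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "from": self.state,
            "to": new_state,
            "note": note,
        })
        self.state = new_state
        return new_state


if __name__ == "__main__":
    item = ReviewItem("call-0915")
    item.apply("confirm", actor="auditor")
    item.apply("route", actor="auditor", note="coaching needed")
    item.apply("close", actor="team lead")
    for entry in item.audit_log:
        print(entry["from"], "->", entry["to"], "by", entry["actor"])
```

Rejecting illegal transitions (for example, confirming a call that was already dismissed) keeps the audit trail consistent; a real system would also persist the log rather than hold it in memory.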
User Flow

Auditor's daily workflow: 1. Log in → 2. Risk queue → 3. Open call → 4. Review → 5. Action
  • Step 3, Open call: shows the call recording, transcript, flagged moments, model confidence, and severity score.
  • Step 5, Action: Confirm violation (routes to team lead), Dismiss (false positive), or Escalate (critical).
IA

Information Architecture

  • Risk queue: all violations sorted by severity × confidence, built for a quick scan and bulk actions.
  • Violation detail: call recording and transcript, the timestamp of the violation, and why the model flagged it.
  • Auditor controls: confirm or dismiss, escalate or route, add context notes.
  • Coaching interface: team leads view violations tied to their agents and assign training modules.
Navigation principle: everything surfaces risk. Every page leads auditors toward judgment: what matters, why it matters, and what to do.
Concept Exploration

  • Concept 1, Full automation: the AI closes violations automatically. Risk: confident errors. Rejected.
  • Concept 2, Passive monitoring: a dashboard of trends with no actionable alerts. Problem: no urgency. Rejected.
  • Concept 3, Human-in-the-loop: the AI surfaces, humans judge. ✓ Selected.
Why concept 3 won: it keeps humans accountable for high-stakes decisions. The AI handles the grunt work (finding the needle), but humans confirm violations, dismiss false positives, and own the judgment. Model confidence is explicit, so high severity + low confidence is a clear signal that human review matters. Auditors spend their time resolving violations, not hunting for them, and agents get same-day coaching. Leadership shifts from reactive incident management to preventive behavior change. The interface design is the visible agreement on where the machine stops and the person starts.
Wireframes

  • Screen 1, Risk queue: sorted by severity × confidence, in the order High / High, High / Low, Medium / High, Low / any. Each violation item shows the call, agent, and severity.
  • Screen 2, Violation detail: call info (agent, time, duration); audio playback with confidence; transcript with the violation segment highlighted; Confirm, Dismiss, and Escalate buttons; a notes field for auditor context.
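The wireframe's queue order (High / High, then High / Low, then Medium / High, then Low) suggests the "severity × confidence" sort is severity-first, with confidence breaking ties inside each severity band, rather than a literal product of two numbers. The sketch below assumes that reading; the `Flag` record, `SEVERITY_RANK` table, and `order_queue` function are hypothetical, and the sample rows come from the final mockup.

```python
from dataclasses import dataclass

# Hypothetical rank order; a higher rank sorts first.
SEVERITY_RANK = {"high": 3, "medium": 2, "low": 1}


@dataclass(frozen=True)
class Flag:
    agent: str
    violation: str
    severity: str      # "high", "medium", or "low"
    confidence: float  # model confidence between 0.0 and 1.0


def order_queue(flags: list[Flag]) -> list[Flag]:
    """Sort by severity band first, then by confidence within each band (both descending)."""
    return sorted(flags, key=lambda f: (-SEVERITY_RANK[f.severity], -f.confidence))


if __name__ == "__main__":
    queue = [
        Flag("Priya Patel", "Minor tone issue", "low", 0.78),
        Flag("Mike Johnson", "Disclosure violation", "high", 0.52),
        Flag("Sarah Chen", "Policy disclosure", "high", 0.87),
    ]
    for f in order_queue(queue):
        print(f"{f.severity}/{f.confidence:.0%}  {f.agent}: {f.violation}")
```

Sorting on the literal product `SEVERITY_RANK[f.severity] * f.confidence` would behave differently: a 95%-confidence medium flag (2 × 0.95 = 1.9) would outrank a 52%-confidence high flag (3 × 0.52 = 1.56), which is the opposite of the order the wireframe shows.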
Design System

  • Risk badges: high severity, medium severity, low severity, and a confidence badge (for example, "Confidence: high").
  • Violation card (summary list item): Agent: Sarah Chen | Call: 09:15 AM today | Duration: 2m 34s. Violation: policy disclosure on third-party liability, confidence 87%. Severity: High | Status: Unreviewed.
  • Action buttons (used consistently): Confirm, Dismiss, Escalate, Route to lead.
Final Design

Figure: Final interface mockup with a header, risk queue, and detail panel.
Header: Zero-Tolerance Compliance · Auditor: Sarah
Risk queue (sorted by severity × confidence):
  • High / High: Sarah Chen • 09:15 AM · Policy disclosure, 87%
  • High / Low: Mike Johnson • 11:42 AM · Disclosure violation, 52%
  • Low severity: Priya Patel • 02:30 PM · Minor tone issue, 78%
Violation detail: Sarah Chen (Agent ID 4782), today 09:15 AM, 2m 34s. Policy disclosure on third-party liability. Confidence: 87% | Severity: High.
Transcript excerpt: "...our liability coverage does NOT include..." [Highlighted: no mention of exclusions.]
Actions: Confirm, Dismiss, Escalate. Note for team lead: [Agent mentioned policy but didn't clarify...]
Usability Testing

Usability testing with 8 auditors:
  • Findability. Task: find a high-risk violation. Result: 100% success, 8 seconds on average. ✓ Risk queue sorting works.
  • Judgment. Task: confirm or dismiss violations. Result: confidence scores reduced. ✓ Low-confidence flags invite caution.
  • Decision speed. Time per violation: median 45 seconds. ✓ Fast enough for same-day review.
  • Escalation behavior. The Dismiss button prevented over-escalation; false positive rate: 12%. ✓ Auditors trust the model but verify.
Key validation insight: auditors prefer explicit confidence over a perfectly-right-but-opaque AI. When they see "87% confident," they act decisively. When they see "52% confident," they slow down and verify. The interface enabled judgment rather than removing it.
What I'd Do Next

  • Phase 1: Expand to email and chat compliance (Q2–Q3)
  • Phase 2: Agent coaching dashboard (Q3–Q4)
  • Phase 3: WFM (workforce management) system integration (Q4)
  • Phase 4: Apply the trust model to other decisions (2024+)
Why these improvements matter: each adds specificity without changing the core (the AI surfaces, humans judge), so the trust model scales. Agent coaching ensures behavior change, and WFM integration closes the feedback loop. The pattern of confidence, human judgment, and accountability applies anywhere risk and automation collide.
Future-State AI Vision

Where AI goes next (without replacing judgment):
  • Multi-language detection: understand violations across languages. Status: feasible now.
  • Behavioral prediction: detect repeat patterns before escalation. Status: needs more data.
  • Real-time coaching: suggest corrective language to agents. Status: needs a pilot.
  • Legal interpretation: understand jurisdiction-specific rules. Status: research phase.
The critical constraint: none of these improvements replace human judgment. A prediction says "Sarah needs coaching," not "Sarah violated policy." A language model interprets context, but auditors confirm. Real-time coaching suggests, but agents decide. The interface stays the same (confidence signals, human judgment, accountability); the AI just gets smarter at what it suggests.
NOTE — placeholder metrics: the numbers on this page are illustrative. Replace with your real results before sharing.
More high-stakes work

Building something where the wrong call gets expensive?

Start a conversation