Outcome Hit Rate
Outcome Hit Rate measures whether CS interventions actually change outcomes, not just whether they happen. For every intervention your team runs (QBR, outreach, health score response, save attempt), it asks three questions: Was the target right? Was the timing right? Did the outcome change?
A hit requires all three. Which, in my experience, is a higher bar than most teams realize.
Most CS dashboards track activity: touches completed, QBRs held, cadences run, playbooks fired. Those are input metrics, and they confirm the team was active, which is useful for capacity planning and not much else. McKinsey's analysis of 100+ B2B SaaS companies found that top-quartile companies (median 24x EV/revenue multiple) achieve 113% NRR while bottom-quartile peers (5x multiple) sit at 98%. That's fifteen points of NRR separating the two groups, and the companies on the high end don't appear to be doing more. They appear to be choosing better.
The three questions work as a sequential filter, and the distinctions between them matter more than they seem at first.
First filter: was this the right account? The question asks whether this account had a problem you could actually influence. Some accounts are going to churn regardless. The deal was wrong, or the use case was a stretch, or the real constraint sits upstream of your product. Intervening there isn't proactive CS. It's a non-productive intervention, capacity spent on an account where the outcome was already determined (more on that in the Velocity Trap page).
Second: could the customer still change course? AI-enhanced health scores can now flag churn risk 60-90 days in advance at 85%+ accuracy. But an accurate flag doesn't help once the decision has already been made. If the customer decided to leave three weeks ago, in a meeting you weren't invited to, your QBR deck isn't saving anything.
Third, and hardest: did the intervention actually matter? This is the counterfactual question. You're not asking whether the intervention happened. You're asking whether the account would look different today if it hadn't. In my experience, most teams never get around to asking it; the methodological difficulty provides convenient cover for not wanting the answer.
I use this primarily as a retrospective exercise. Once a quarter, pull a sample of 50-80 interventions and score each one against all three questions. The scoring is imperfect (you're making judgment calls on each criterion), but even rough scoring reveals patterns that activity metrics completely miss. I'll admit I resisted doing this the first time because I suspected the answer would be uncomfortable.
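If you want to run the tally outside a spreadsheet, here's a minimal sketch of the scoring, assuming the sampled interventions are exported as a flat list; the field names and sample records are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class ScoredIntervention:
    account_id: str
    right_account: bool    # Q1: a problem we could actually influence
    right_timing: bool     # Q2: the customer could still change course
    outcome_changed: bool  # Q3: the account looks different because we acted

    @property
    def hit(self) -> bool:
        # A hit requires all three, not any one of them.
        return self.right_account and self.right_timing and self.outcome_changed

def outcome_hit_rate(sample: list[ScoredIntervention]) -> float:
    """Share of sampled interventions that pass all three filters."""
    return sum(i.hit for i in sample) / len(sample) if sample else 0.0

# Illustrative quarterly sample: only one of four passes every filter.
sample = [
    ScoredIntervention("acct-001", True, True, True),
    ScoredIntervention("acct-002", True, False, False),   # decision already made upstream
    ScoredIntervention("acct-003", False, False, False),  # outcome was never in play
    ScoredIntervention("acct-004", True, True, False),    # right target, no counterfactual impact
]
print(f"Outcome Hit Rate: {outcome_hit_rate(sample):.0%}")  # Outcome Hit Rate: 25%
```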
Across the client portfolios where I've run this exercise, the first-time hit rate consistently lands around 15%.
That number shocks people, though it probably shouldn't. Bain's CS report found that CSMs spend over half their time on low-value, repetitive tasks. If more than 50% of activity is structurally unproductive, a 15% hit rate on outcome-changing interventions shifts the question. You stop asking "why aren't my CSMs more effective" and start asking "why are we sending them into accounts where the outcome was already decided."
The math: 200 interventions per quarter at a 15% hit rate generate 30 meaningful outcomes. The other 170 are non-productive. Now restructure targeting (using something like the Milestone-to-Intervention Model) and cut volume to 120 interventions while raising the hit rate to 40%. That's 48 meaningful outcomes. You end up with 60% more impact while running 40% fewer interventions. The leverage is entirely in targeting.
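The same arithmetic, spelled out so you can swap in your own volumes and hit rates; the numbers below are the illustrative ones from this paragraph, not benchmarks.

```python
# Baseline: high volume, weak targeting
baseline_outcomes = 200 * 0.15   # 30 outcome-changing interventions, 170 non-productive
# Restructured: lower volume, better targeting
targeted_outcomes = 120 * 0.40   # 48 outcome-changing interventions

print(f"{targeted_outcomes / baseline_outcomes - 1:.0%} more impact")  # 60% more impact
print(f"{1 - 120 / 200:.0%} fewer interventions")                      # 40% fewer interventions
```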
At fully-loaded CSM costs of $55-75/hour, those 80 eliminated non-productive interventions (averaging 2 hours each with prep, execution, follow-up) represent $8,800-$12,000 per quarter in recaptured capacity per CSM. Across a team of 8, that's $70K-$96K quarterly redirected from waste to impact.
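And the capacity math behind that figure. One assumption worth making explicit: the cost paragraph treats the 200-to-120 reduction as per CSM, which is the only reading that makes the team-of-8 figure add up.

```python
eliminated = 200 - 120        # non-productive interventions removed per quarter (assumed per-CSM volumes)
hours_each = 2                # prep + execution + follow-up
rate_low, rate_high = 55, 75  # fully loaded cost per CSM hour, in dollars

per_csm = (eliminated * hours_each * rate_low, eliminated * hours_each * rate_high)
team = (per_csm[0] * 8, per_csm[1] * 8)

print(per_csm)  # (8800, 12000)  -> $8,800-$12,000 recaptured per CSM per quarter
print(team)     # (70400, 96000) -> roughly $70K-$96K per quarter across a team of 8
```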
There's a related problem that I think gets underexplored. In the client expansion pipelines I've audited, roughly 40% of deals classified as "expansion" are actually repair. A second team adopting the product because the first implementation was too narrow. An "upsell" that's really a feature the customer should have had from day one. Benchmarkit's 2025 data shows expansion becomes the dominant growth engine beyond $20M ARR, with companies in the $15-30M range getting 40% of growth from existing customers (up from 30% in 2021). But if a significant chunk of that expansion revenue is repair in disguise, you can carry that distortion for quarters before anyone notices, which is exactly what makes it dangerous.
At one client, I tagged every expansion deal. Forty-one percent were repair. The forecast had them at 115% of target. Adjusted for repair, they were at 68%. Nobody wants to hear their forecast just dropped 47 points. But they stopped defending the number and started planning around reality, which is the point.
Outcome Hit Rate applied to expansion motions catches this, because repair doesn't satisfy the "did the outcome change?" criterion the way genuine expansion does.
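A rough sketch of how that tagging surfaces the gap, assuming each expansion deal gets a manual repair-or-genuine label during the audit; the deal values and the repair criterion are illustrative.

```python
# Each tuple: (deal value in dollars, manually tagged as repair?)
expansion_deals = [
    (40_000, True),   # second team adopting because the first rollout was too narrow
    (25_000, False),  # genuinely new use case
    (18_000, True),   # feature the customer should have had from day one
    (60_000, False),
]

total = sum(value for value, _ in expansion_deals)
repair = sum(value for value, is_repair in expansion_deals if is_repair)
repair_share = repair / total

reported_attainment = 1.15  # forecast: 115% of target
# Simplification, matching the example in the text: treat attainment as entirely expansion-driven.
adjusted_attainment = reported_attainment * (1 - repair_share)

print(f"Repair share of 'expansion': {repair_share:.0%}")      # ~41%
print(f"Attainment adjusted for repair: {adjusted_attainment:.0%}")  # ~68%
```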
Question three, the counterfactual, is where this whole framework gets shaky, and I want to be honest about that. Would this account have renewed without the QBR? Would that expansion have happened organically? You won't always know. Sometimes the best you can do is tag the intervention and check back in 90 days. Staircase AI's research found that customers with regular QBRs are twice as likely to renew, which is a useful baseline. But "regular QBRs" and "well-targeted QBRs" are not the same population. There's clearly a relationship between activity and retention. The harder question, and the one the QBR studies don't answer, is whether you could strip out half the activity and keep the retention gains.
Some teams I've worked with have started using a simple attribution framework: tag interventions at execution time, then, at renewal or churn, have the CSM (and ideally the customer) assess which interventions were consequential. It's subjective and it's directional, but I'd rather have a flawed attribution signal than keep allocating resources with zero feedback on what worked. Based on the teams I've audited, that's where most are operating today.
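A minimal version of that two-pass tagging, assuming nothing fancier than a record you update twice; the field names and values are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class InterventionRecord:
    account_id: str
    kind: str                             # e.g. "QBR", "save attempt", "health-score response"
    executed_on: date
    # Second pass, filled in at renewal or churn:
    outcome: Optional[str] = None         # "renewed", "churned", "expanded"
    consequential: Optional[bool] = None  # the CSM's (and ideally the customer's) judgment call

def close_out(record: InterventionRecord, outcome: str, consequential: bool) -> None:
    """Attach the outcome and the (admittedly subjective) attribution call."""
    record.outcome = outcome
    record.consequential = consequential

# Pass 1: tag at execution time.
qbr = InterventionRecord("acct-017", "QBR", date(2025, 3, 4))
# Pass 2: revisit at renewal and record whether it appeared to matter.
close_out(qbr, outcome="renewed", consequential=False)
print(qbr)
```

Even this much gives you a feedback loop: at quarter end, the closed-out records are exactly the sample you score against the three questions.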
Dillon Young is the founder of Customer Value Labs, where he builds and maintains revenue systems infrastructure for B2B SaaS teams. If you suspect your team's interventions aren't landing where they should, that's worth a conversation.