
Churn Risk Shows Up in CS Calls Before It Shows Up in Health Scores

Alex Handsaker · 8 min read

Most churn detection models are built from product usage data, NPS scores, and support ticket frequency. Those inputs are useful once a customer is already disengaging - they measure consequences of a problem that usually started earlier in the customer relationship. By the time usage drops or a detractor score appears, the root cause has been developing across CS conversations that were recorded but never properly analysed.

The signals that predict churn exist in CS calls before they show up in health scores. Escalation language, stakeholder engagement changes, absent expansion mentions, deferred action item follow-through - all surface in transcripts weeks before they register in any dashboard. The question isn't whether the signals are there. It's whether the extraction layer is there to surface them, and whether it's calibrated to mean something specific in your customer context.

Why sentiment scores miss the mechanism

Sentiment analysis applied to CS call transcripts sounds like a direct measure of customer health. In practice, it measures emotional register rather than churn risk. A customer can express everything politely while systematically pulling back: deferring to new stakeholders, skipping follow-ups, asking fewer questions about product roadmap. The sentiment score stays neutral. The churn risk is building.

The signals that actually predict churn are behavioural and structural, not emotional. Escalation language reflects urgency and unresolved issues. Stakeholder engagement changes are often more predictive still - a champion who was highly engaged starts attending calls intermittently, or introduces a new contact without explanation. Action item follow-through is a third: a customer who consistently defers what they committed to is signalling something the transcript captures directly, even when the call tone stays positive. If your health model is built primarily on sentiment, you're measuring how customers sound rather than what they're doing - and those two things diverge most sharply in the weeks before a churn decision is made.

[Figure: a neutral sentiment gauge missing the risk while engagement drops, actions are deferred, and expansion signals are absent. Sentiment can stay neutral while the structural churn signals are already moving.]

The signals that matter - and what makes them reliable

A practical CS churn risk extraction schema covers five fields, each defined as a Brick in a churn-risk Kit: escalation language detected (yes/no with supporting quote), stakeholder engagement score (1-5, based on who attended and how actively they participated), expansion intent mentioned (yes/no - customers who are growing rarely churn), action item completion referenced (whether prior actions were followed through or deferred), and review attendance signal (was a scheduled QBR or check-in confirmed, attended, or avoided). These fields are deterministic: the Bricks and Kits mechanism returns the same result for the same transcript, and the answers exist in the transcript itself rather than requiring inference.
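As a rough sketch of the shape of this signal record - the field names and types here are illustrative, not Semarize's actual schema:

```python
# Illustrative sketch of the five-field churn-risk signal record as plain
# Python dataclasses. Names and structure are hypothetical, not Semarize's API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChurnRiskSignals:
    escalation_detected: bool                # yes/no
    escalation_quote: Optional[str]          # supporting quote when detected
    stakeholder_engagement: int              # 1-5, per your attendance/participation rubric
    expansion_intent: bool                   # growing customers rarely churn
    action_items_completed: Optional[bool]   # followed through vs. deferred; None if not referenced
    review_attendance: str                   # "confirmed" | "attended" | "avoided"

example = ChurnRiskSignals(
    escalation_detected=True,
    escalation_quote="We expected this to be fixed by now.",
    stakeholder_engagement=2,
    expansion_intent=False,
    action_items_completed=False,
    review_attendance="avoided",
)
```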

But extracting these fields generically - with no knowledge of what escalation means at your company, what good engagement looks like for your customer tier, or what expansion signals are specific to your product - produces signals calibrated to a model's inference of what churn risk looks like in general. That's a starting point. It isn't a reliable foundation for a health model, because the definitions that matter are yours, not the model's.

What knowledge grounding changes

Knowledge grounding means attaching your own CS documents to the evaluation so it checks against your definitions rather than model inference. Your escalation policy tells the evaluation what constitutes a formal escalation in your support process versus a standard concern. Your CS methodology document defines what a healthy QBR looks like for your product and customer tier. Your product expansion documentation tells it which signals - a customer asking about additional seats, a new use case, a question about a higher tier - actually constitute genuine expansion intent for your offering.

When those documents are attached, extraction becomes specific rather than generic. The escalation Brick doesn't detect frustration language in general - it checks whether the customer expressed something that meets your company's defined threshold. The expansion intent Brick doesn't flag any mention of product growth - it checks for the specific signals your team has defined as meaningful. Account-level grounding goes further: when the evaluation has access to account context - product tier, committed outcomes from the original sale, renewal date, stakeholder structure - it can check whether what happened on a call is consistent with where that account should be. A commitment made at sale that has never been confirmed in a QBR is a different signal from a feature a customer simply hasn't needed. Without that context, both accounts look the same.
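To make the attachment idea concrete, here is a minimal sketch of what a grounded Kit configuration could look like. The structure, file names, and keys are assumptions for illustration only; the real attachment mechanism is described in the knowledge grounding documentation.

```python
# Hypothetical Kit configuration with knowledge grounding attached. The shape
# of this config is an assumption for illustration, not Semarize's actual API.
kit_config = {
    "kit": "churn-risk",
    "knowledge": [
        "docs/escalation_policy.pdf",    # defines your formal-escalation threshold
        "docs/cs_methodology.pdf",       # defines a healthy QBR per customer tier
        "docs/expansion_signals.pdf",    # which mentions count as expansion intent
    ],
    # Field-level scoping: each Brick sees only the documents relevant to it.
    "bricks": {
        "escalation_detected": {"knowledge": ["docs/escalation_policy.pdf"]},
        "expansion_intent": {"knowledge": ["docs/expansion_signals.pdf"]},
    },
    # Account-level context, attached per evaluation rather than per Kit.
    "account_context": {
        "tier": "enterprise",
        "renewal_date": "2025-11-30",
        "committed_outcomes": ["reporting rollout", "SSO adoption"],
    },
}
```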

This is the distinction that matters most. Generic extraction tells you something happened. Knowledge-grounded extraction tells you whether what happened was significant for this customer, against your criteria, in your product context. See knowledge grounding for how document attachment and field-level scoping work in practice.

[Figure: a CS transcript and CS knowledge documents feeding a churn-risk Kit that returns health score inputs for escalation, engagement, and follow-through. Grounded CS signals calibrate churn risk to your escalation, success, and expansion definitions.]

Building a health score from conversation data

Once those five fields are being extracted consistently - grounded in your definitions - they feed a health score that updates after each touchpoint rather than on a fixed reporting cycle. The score accumulates across the account's call history: a customer whose last three calls showed declining stakeholder engagement, no expansion mentions, and deferred action items has a very different risk profile from one where the same calls showed growing participation and expansion discussion, even if both have similar product usage metrics.

The weighting of individual signals should reflect what predicts churn in your specific customer base. For some products, escalation language is the strongest leading indicator. For others, the absence of expansion intent is more predictive - customers who aren't growing are disproportionately likely to churn at renewal. Stakeholder engagement changes tend to be broadly predictive because they reflect how much the customer's organisation values the relationship, which shows up in call attendance before it shows up anywhere else. Because the signals are grounded in your knowledge base, the score reflects your standards rather than generic model assumptions - calibration that generic extraction can't achieve.
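As an illustration of the accumulation and weighting described above, here is a sketch in Python. The weights, decay factor, and normalisation are placeholders - they should be fitted against your own churned-account history, not copied as-is.

```python
# Sketch: weighted health-score input from extracted conversation signals.
# All weights are placeholders; fit them against your own churn history.
WEIGHTS = {
    "escalation": -0.30,        # escalation language detected (negative signal)
    "engagement": 0.25,         # stakeholder engagement, normalised to 0-1
    "expansion": 0.20,          # expansion intent mentioned
    "follow_through": 0.15,     # prior action items completed
    "review_attendance": 0.10,  # scheduled QBR/check-in attended
}

def call_health_input(signals: dict) -> float:
    """Map one call's extracted signals to a single health contribution."""
    score = 0.0
    score += WEIGHTS["escalation"] * (1.0 if signals["escalation_detected"] else 0.0)
    score += WEIGHTS["engagement"] * (signals["stakeholder_engagement"] - 1) / 4
    score += WEIGHTS["expansion"] * (1.0 if signals["expansion_intent"] else 0.0)
    score += WEIGHTS["follow_through"] * (1.0 if signals.get("action_items_completed") else 0.0)
    score += WEIGHTS["review_attendance"] * (1.0 if signals["review_attendance"] == "attended" else 0.0)
    return score

def account_health(call_history: list[dict], decay: float = 0.7) -> float:
    """Accumulate across the account's call history (ordered oldest to newest),
    with more recent calls weighted highest."""
    total, norm = 0.0, 0.0
    for age, signals in enumerate(reversed(call_history)):  # age 0 = most recent
        w = decay ** age
        total += w * call_health_input(signals)
        norm += w
    return total / norm if norm else 0.0
```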

The practical consequence is an earlier intervention window. Sentiment scores and usage data typically surface churn risk late, when there's limited time to act. Knowledge-grounded conversation signals typically surface it earlier, while there's still room to change the outcome. Retrospective analysis on churned accounts usually reveals a gap of three to six weeks between when grounded conversation signals first appeared and when health scores dropped. That's the early warning time being left on the table while transcripts sit in unstructured storage.

[Figure: a timeline of escalation, stakeholder change, and deferred action signals appearing in CS calls before the later usage drop and health score drop. CS call signals can surface churn risk weeks before traditional health inputs move.]

What this changes operationally

When churn risk signals are extracted automatically and grounded in your definitions, CS managers stop making health assessments based on intuition and call notes. They review a structured signal record: which accounts showed escalation language meeting your policy threshold in the last 30 days, which had declining stakeholder engagement across three consecutive calls by your rubric, which haven't mentioned expansion in the last two QBRs as your product defines it. That's a prioritisation tool - it directs attention to accounts that need it before the situation becomes critical, while there's still time to intervene.
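A sketch of that prioritisation query, assuming the signal records are stored as one dated row per call (the record shape here is an assumption for illustration):

```python
# Sketch: prioritisation over stored signal records. Assumed record shape:
# one dict per call with "account", "date", "call_type", and extracted fields.
from datetime import date, timedelta

def accounts_needing_attention(records: list[dict], today: date) -> set[str]:
    flagged = set()
    cutoff = today - timedelta(days=30)
    by_account: dict[str, list[dict]] = {}
    for r in sorted(records, key=lambda r: r["date"]):
        by_account.setdefault(r["account"], []).append(r)
    for account, calls in by_account.items():
        # Escalation meeting your policy threshold in the last 30 days.
        if any(c["escalation_detected"] and c["date"] >= cutoff for c in calls):
            flagged.add(account)
        # Declining stakeholder engagement across three consecutive calls.
        last3 = [c["stakeholder_engagement"] for c in calls[-3:]]
        if len(last3) == 3 and last3[0] > last3[1] > last3[2]:
            flagged.add(account)
        # No expansion mention in the last two QBRs.
        qbrs = [c for c in calls if c.get("call_type") == "qbr"][-2:]
        if len(qbrs) == 2 and not any(c["expansion_intent"] for c in qbrs):
            flagged.add(account)
    return flagged
```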

For SaaS CS teams managing 40 to 80 accounts per CSM, the question isn't whether signals exist in the transcripts - they always do. The question is whether there's an extraction layer that surfaces them automatically, against your standards, across every account. Manual transcript review at that volume isn't a reliable system. An extraction pipeline that runs on every call and routes grounded signals into your CS platform is.

How Semarize supports CS churn detection

Semarize is built around the combination of structured extraction and knowledge grounding that makes churn signal detection reliable rather than directional. Each signal is defined as a Brick in a Kit: escalation detected (yes/no plus supporting quote), stakeholder engagement score (1-5 with a rubric you define), expansion intent mentioned (yes/no), action item completion referenced (yes/no with context). The same Kit runs against every CS touchpoint - QBRs, check-ins, onboarding calls, executive reviews - and returns a consistent signal record across the account timeline.

Knowledge grounding is applied at the Kit level. Attach your escalation policy, CS methodology, and product expansion documentation and each Brick evaluates against those documents rather than model inference. Critically, each Brick accesses only the knowledge relevant to its specific question - so attention isn't diluted across the full knowledge base. The signals you get back are calibrated to your definitions, not to what a general model assumes churn risk looks like. Because the outputs are structured fields, they route directly into your CS platform via the API - health score inputs update after each call, without a manual review step, across every account.
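As a sketch of the routing step - the output shape and the endpoint URL are illustrative assumptions, not Semarize's or any CS platform's actual API:

```python
# Sketch: route structured Kit output into a CS platform after each call.
# The kit_output shape and the webhook URL are assumptions for illustration.
import json
import urllib.request

def route_signals(account_id: str, kit_output: dict) -> None:
    payload = {
        "account_id": account_id,
        "health_inputs": {
            "escalation_detected": kit_output["escalation_detected"],
            "stakeholder_engagement": kit_output["stakeholder_engagement"],
            "expansion_intent": kit_output["expansion_intent"],
            "action_items_completed": kit_output["action_items_completed"],
        },
    }
    req = urllib.request.Request(
        "https://cs-platform.example.com/api/health-inputs",  # hypothetical endpoint
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)
```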

Semarize extracts deterministic, knowledge-grounded churn risk signals from CS call recordings and returns them as structured fields your health scoring model can use.

Start building →

Common questions

Can we use sentiment at all, or should we drop it entirely?

Sentiment is worth keeping as one signal among several, but it shouldn't be the primary churn indicator. Sentiment captures emotional register; churn risk is more often structural - declining engagement, missing follow-through, stakeholder changes. A customer can sound positive on every call while systematically pulling back. Combine sentiment with the deterministic signals (escalation language, engagement score, action completion) rather than relying on it alone. If sentiment is the only signal you're measuring, you're likely getting a false sense of health on the accounts that matter most.

What does “escalation language” look like in practice for extraction rules?

Escalation language covers phrases that express urgency, unresolved frustration, or explicit dissatisfaction: “this is becoming a blocker,” “we expected this to be fixed by now,” “our team is losing confidence.” Without knowledge grounding, the evaluation uses its own inference about what escalation means. With your escalation policy attached, it checks against your specific threshold - which may be higher or lower than the model's default, and will vary by account tier and customer communication style. The supporting quote is returned alongside the yes/no so the CSM has the specific context to act on, not just a flag.

How do we score stakeholder engagement without making it subjective?

Define the rubric for each score level against observable facts in the transcript. A 5 means the primary stakeholder and at least one additional decision-relevant contact were present, asked substantive questions, and confirmed next steps. A 3 means the primary contact attended but engagement was passive - no questions, no commitments. A 1 means key stakeholders were absent or delegated to someone without authority. Each level is an evidence standard, not a judgment call. Attach this rubric as part of your knowledge base and every call is evaluated against the same definition, regardless of which CSM ran it.
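Encoded as attachable data, the rubric might look like the sketch below. Levels 2 and 4 are interpolated here for completeness; all wording is illustrative.

```python
# Sketch: the engagement rubric as attachable data. Wording is illustrative;
# the point is that each level is an observable evidence standard, not a vibe.
ENGAGEMENT_RUBRIC = {
    5: "Primary stakeholder plus at least one decision-relevant contact present; "
       "substantive questions asked; next steps confirmed.",
    4: "Primary stakeholder present and active; questions asked, but next steps "
       "left loosely defined.",  # interpolated level
    3: "Primary contact attended but engagement was passive: no questions, "
       "no commitments.",
    2: "Primary contact attended intermittently or delegated mid-relationship "
       "without a handover.",  # interpolated level
    1: "Key stakeholders absent, or delegated to someone without authority.",
}
```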

What documents should we attach to a CS churn risk Kit?

Start with three: your escalation policy (so the evaluation knows your specific threshold for a formal escalation versus a standard concern), your CS methodology or success plan template (so the evaluation understands what a healthy touchpoint looks like for your customer tiers), and your product expansion documentation (so expansion intent is checked against your actual product model, not a generic definition). Account-specific context - committed outcomes from the original sale, renewal dates, known stakeholder structure - can be added as account-level knowledge for the most precise signal calibration.

How do we validate that conversation signals predict churn earlier than health scores?

Run a retrospective analysis on churned accounts from the previous 6-12 months. Extract the churn risk signals from their CS call transcripts going back 90 days before churn. Compare when the conversation signals first appeared to when health scores dropped or support tickets spiked. If the conversation signals preceded the other indicators by three to six weeks, they're giving you the early warning window you need. Run this with knowledge-grounded extraction specifically - grounded signals show earlier and more precise onset points because they're calibrated to your definitions rather than model inference.
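A minimal sketch of that comparison, assuming you have per-account dates for the first grounded conversation signal and the first health-score drop:

```python
# Sketch: measure the lead time of conversation signals over health-score
# drops across churned accounts. The input shape is assumed for illustration.
from datetime import date
from statistics import median

def signal_lead_days(accounts: list[dict]) -> float:
    """Median days by which the first grounded conversation signal preceded
    the first health-score drop, across churned accounts."""
    leads = [
        (a["health_score_drop_date"] - a["first_signal_date"]).days
        for a in accounts
        if a.get("first_signal_date") and a.get("health_score_drop_date")
    ]
    return median(leads) if leads else 0.0

churned = [
    {"first_signal_date": date(2024, 3, 4), "health_score_drop_date": date(2024, 4, 8)},
    {"first_signal_date": date(2024, 5, 20), "health_score_drop_date": date(2024, 6, 17)},
]
print(signal_lead_days(churned))  # 31.5 -> roughly the 3-6 week window described above
```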
