CallMiner - How to Get Your Conversation Data

A practical guide to getting your conversation data out of CallMiner - covering REST API authentication, historical backfill, incremental polling, real-time API flows, and how to route structured data into your downstream systems.

What you'll learn

  • What interaction data you can extract from CallMiner - audio, text, chat, video transcripts, metadata, and speaker labels
  • How to access data via the CallMiner REST API - OAuth 2.0 authentication, endpoints, and pagination
  • Three extraction patterns: historical backfill, incremental polling, and real-time API
  • How to connect CallMiner data pipelines to Zapier, n8n, and Make
  • Advanced use cases - custom compliance scoring, attrition prediction, omnichannel consistency, and warehouse analytics

Data

What Data You Can Extract From CallMiner

CallMiner captures interactions across multiple channels - voice, chat, email, and video. Every interaction produces a set of structured assets that can be extracted via API - the transcript, speaker identification, timing metadata, channel type, and contextual information about the interaction and its associated contact.

Common fields teams care about

Full transcript text (audio, chat, email, video)
Speaker labels (agent vs. customer)
Agent ID and agent name
Channel type (voice / chat / email / video)
Interaction date, time, and duration
Customer identifier and contact metadata
Interaction direction (inbound / outbound)
Disposition and wrap-up codes
OVTS-compatible transcript format (OVTS: Open Voice Transcription Standard)
Associated campaign or queue metadata

API Access

How to Get Transcripts via the CallMiner API

CallMiner exposes interactions and transcripts through a REST API secured with OAuth 2.0. The workflow is: obtain an access token from the developer portal, list interactions by date range, then fetch the transcript for each interaction ID.

1

Authenticate with OAuth 2.0

CallMiner uses OAuth 2.0 with client credentials. Register your application at developer.callminer.com to obtain a client_id and client_secret. Exchange them for a Bearer token via the token endpoint.

POST https://auth.callminer.com/oauth/token
Content-Type: application/x-www-form-urlencoded

grant_type=client_credentials
&client_id=<your_client_id>
&client_secret=<your_client_secret>

# Response:
# { "access_token": "eyJ...", "token_type": "Bearer", "expires_in": 3600 }

API access is enterprise or partner-gated. Contact your CallMiner account representative or apply through developer.callminer.com to provision credentials. Tokens expire (the example response above reports expires_in in seconds), so implement automatic refresh in your pipeline.

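The refresh advice above can be sketched as a small token cache. This is a minimal illustration, not CallMiner's client: the TokenCache class, the request_token helper, and the 60-second safety margin are all hypothetical, and the response fields (access_token, expires_in) are taken from the example response shown above.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional


def request_token(client_id: str, client_secret: str,
                  url: str = "https://auth.callminer.com/oauth/token") -> dict:
    """POST the client-credentials form from the guide and return the parsed JSON."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    req = urllib.request.Request(
        url, data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


class TokenCache:
    """Caches a bearer token and fetches a new one shortly before it expires."""

    def __init__(self, fetch_token: Callable[[], dict],
                 clock: Callable[[], float] = time.monotonic,
                 safety_margin_s: float = 60.0) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._safety_margin_s = safety_margin_s  # assumed margin, tune as needed
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            # Refresh early so no request goes out with a nearly expired token.
            self._expires_at = now + float(payload["expires_in"]) - self._safety_margin_s
        return self._token
```

In a pipeline you would build the cache once, for example TokenCache(lambda: request_token(client_id, client_secret)), and call get() before every request rather than storing the token string yourself.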
2

List interactions by date range

Call the GET /v1/interactions endpoint with startDate and endDate query parameters. Results are paginated - each response includes an offset or nextPage token to fetch the next page.

GET https://api.callminer.com/v1/interactions?startDate=2025-01-01T00:00:00Z&endDate=2025-02-01T00:00:00Z&limit=100
Authorization: Bearer <access_token>
Content-Type: application/json

The response returns an array of interaction objects with id, channel, agentId, duration, startTime, and associated metadata. Keep paginating until no more results are returned.
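A pagination loop for this step might look like the sketch below. The response shape is an assumption: it presumes the page body is a JSON object with an "interactions" array and a "nextPage" token, which you should check against your tenant's actual responses. fetch_page is a hypothetical callable that performs the GET for a given token.

```python
from typing import Callable, Iterator, Optional


def iter_interactions(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every interaction, following page tokens until none is returned."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        # "interactions" and "nextPage" are assumed field names.
        items = page.get("interactions", [])
        yield from items
        token = page.get("nextPage")
        if not token or not items:
            break
```

Because it is a generator, callers can start fetching transcripts for the first page while later pages have not been requested yet.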

3

Fetch the transcript

For each interaction ID, request the transcript via GET /v1/interactions/{id}/transcript. The response contains an array of utterances, each with a speaker role, timestamp, and text segment.

GET https://api.callminer.com/v1/interactions/INT-20250115-00482/transcript
Authorization: Bearer <access_token>

Each utterance in the response includes speakerRole (agent / customer), startTime, endTime, and text. Reassemble into plain text by concatenating utterances, or preserve the structured format for per-speaker analysis. CallMiner also supports OVTS format for cross-platform interoperability.
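Reassembling utterances into plain text could be done along these lines. The field names (speakerRole, startTime, text) come from the description above; the "role: text" line layout and the assumption that startTime values sort chronologically (numbers or ISO 8601 strings) are illustrative choices.

```python
from typing import Iterable, Mapping


def to_plain_text(utterances: Iterable[Mapping]) -> str:
    """Join utterances into one 'role: text' line each, ordered by start time."""
    ordered = sorted(utterances, key=lambda u: u["startTime"])
    lines = []
    for utterance in ordered:
        text = (utterance.get("text") or "").strip()
        if not text:
            continue  # skip empty segments rather than emitting blank lines
        role = utterance.get("speakerRole") or "unknown"
        lines.append(f"{role}: {text}")
    return "\n".join(lines)
```

Keeping the role prefix on each line preserves most of the per-speaker structure even after flattening to a single string.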

4

Handle rate limits and transcript availability

Rate limits

CallMiner enforces per-endpoint rate limits that vary by access tier. When you receive a 429 response, back off using the Retry-After header. For bulk operations, pace requests and persist your pagination token between runs.

Transcript timing

Audio transcripts are not available the instant an interaction ends. CallMiner processes recordings asynchronously - typical lag varies by interaction length and system load. Text-based channels (chat, email) are generally available faster. Build a buffer into your extraction timing or implement a retry with exponential backoff.
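One way to combine both pieces of advice is a retry wrapper that honours Retry-After when present and otherwise falls back to exponential backoff with jitter. This is a sketch under assumptions: send is a hypothetical callable returning (status, headers, body), the header lookup assumes the exact key "Retry-After", only the seconds form of the header is parsed, and the base, cap, and attempt count are placeholder values.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base_s: float = 1.0, cap_s: float = 60.0,
                  jitter: Callable[[], float] = random.random) -> float:
    """Seconds to wait before the next attempt."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled here; use exponential backoff
    # Exponential growth, capped, scaled into [50%, 100%] to spread out retries.
    return min(cap_s, base_s * 2 ** attempt) * (0.5 + 0.5 * jitter())


def call_with_retry(send: Callable[[], Tuple[int, Mapping[str, str], object]],
                    max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, object]:
    """Call send() until it stops returning 429 or attempts run out."""
    status, body = 0, None
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body
```

The same wrapper can be reused for transcripts that are not ready yet by treating your API's "not available" response as retryable, whatever status that turns out to be.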

Patterns

Key Extraction Flows

There are three practical patterns for getting transcripts out of CallMiner. The right choice depends on whether you're doing a one-off migration, running ongoing extraction, or need near real-time processing via CallMiner's real-time API.

Backfill (Historical Export)

One-off migration of past interactions

1

Define your date range — typically 6–12 months of historical interactions, or all available data if migrating off CallMiner’s native analytics

2

Call GET /v1/interactions with startDate and endDate parameters. Paginate through the full result set, collecting all interaction IDs

3

For each interaction ID, fetch the transcript via GET /v1/interactions/{id}/transcript. Pace requests to stay within rate limits

4

Store each transcript with its interaction metadata (interaction ID, date, agent, channel, disposition) in your data warehouse or object store

5

Once the backfill completes, run your analysis pipeline against the stored data in bulk

Tip: Persist your pagination token between batches. If the process is interrupted, you can resume from where you left off instead of re-scanning from the start.
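The tip above can be sketched as a small file-based checkpoint. The file layout (a JSON object with page_token and done) and the function names are hypothetical; any durable store works equally well.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(path: Path, page_token: Optional[str], done: int) -> None:
    """Write the checkpoint atomically so a crash never leaves a half-written file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"page_token": page_token, "done": done}, handle)
    os.replace(tmp_name, path)  # atomic rename on the same filesystem


def load_checkpoint(path: Path) -> dict:
    """Return the saved state, or a fresh state when no checkpoint exists yet."""
    if not path.exists():
        return {"page_token": None, "done": 0}
    return json.loads(path.read_text(encoding="utf-8"))
```

Saving after each fully processed page, rather than after each interaction, keeps writes cheap while bounding the rework after an interruption to a single page.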

Incremental Polling

Ongoing extraction on a schedule

1

Set a cron job or scheduled trigger (hourly, daily, etc.) that runs your extraction script

2

On each run, call GET /v1/interactions with startDate set to your last successful poll timestamp

3

Fetch transcripts for any new interaction IDs returned. Use the interaction ID as a deduplication key to avoid reprocessing

4

Route each transcript and its metadata to your downstream pipeline — analysis tool, warehouse, or automation platform

5

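Update your stored timestamp to the time this run started (not the time it finished), so interactions that arrive while the run is in progress are picked up by the next poll cycle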

Tip: Account for transcript processing delay on audio channels. An interaction that ended 10 minutes ago may not have a transcript yet. Polling with a 1–2 hour lag reduces empty fetches. Text channels are typically available sooner.
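A polling cycle with a lag and ID-based deduplication could be sketched as follows. The helper names are hypothetical, the one-hour default lag is just the low end of the range suggested in the tip, and the sketch stores the end of the queried window as the next startDate, which is one variation on the timestamp bookkeeping described in the steps.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set, Tuple


def next_window(last_end: datetime, now: datetime,
                lag: timedelta = timedelta(hours=1)) -> Tuple[datetime, datetime]:
    """Return (startDate, endDate) for this poll; endDate trails `now` by `lag`."""
    end = now - lag
    # Never let the window run backwards if the job fires more often than the lag.
    return last_end, max(end, last_end)


def new_ids(interactions: Iterable[dict], seen: Set[str]) -> List[str]:
    """Return IDs not processed before, recording them in `seen` as it goes."""
    fresh = []
    for item in interactions:
        if item["id"] not in seen:
            seen.add(item["id"])
            fresh.append(item["id"])
    return fresh
```

After a successful run, persist the returned end value as the next last_end so consecutive windows touch without overlapping.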

Real-Time API

Near real-time on interaction completion

1

Configure a real-time API endpoint or webhook listener in your CallMiner admin settings. CallMiner fires events when an interaction is processed and the transcript becomes available

2

When the event fires, parse the payload to extract the interaction ID and metadata

3

Immediately fetch the transcript via GET /v1/interactions/{id}/transcript using the interaction ID from the event

4

Route the transcript and metadata downstream — to your analysis pipeline, CRM updater, or automation tool

Note: Real-time API availability depends on your CallMiner plan and access tier. Not all accounts have access to real-time event triggers. Check with your CallMiner account representative for your plan's capabilities.
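Steps 2 and 3 might be handled by something like the sketch below. The event payload shape is purely an assumption (a JSON object carrying an "interactionId" string); the guide does not specify it, so check the payload your account actually sends. Only the transcript path comes from the steps above.

```python
import json
from typing import Optional
from urllib.parse import quote


def interaction_id_from_event(raw_body: bytes) -> Optional[str]:
    """Extract the interaction ID from an event body, or None if it is unusable."""
    try:
        event = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if not isinstance(event, dict):
        return None
    interaction_id = event.get("interactionId")  # assumed field name
    if isinstance(interaction_id, str) and interaction_id:
        return interaction_id
    return None


def transcript_path(interaction_id: str) -> str:
    """Build the transcript path, escaping the ID so it cannot alter the URL."""
    return f"/v1/interactions/{quote(interaction_id, safe='')}/transcript"
```

Returning None for anything unexpected lets the listener acknowledge and log bad events instead of crashing on them.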

Automation

Send CallMiner Transcripts to Automation Tools

Once you can extract transcripts from CallMiner, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from CallMiner trigger through Semarize evaluation to CRM, Slack, or database output.

Zapier - No-code automation

CallMiner → Zapier → Semarize → CRM

Detect new CallMiner interactions on a schedule, fetch the transcript, send it to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.

Example Zap

  • Trigger: Schedule (Every Hour) - polls for new CallMiner interactions
      App: Schedule by Zapier
      Event: Every Hour
      Output: triggers extraction flow
  • Webhooks by Zapier - list new interactions from CallMiner API
      Method: GET
      URL: https://api.callminer.com/v1/interactions
      Auth: Bearer (OAuth token)
      Params: startDate={{last_run}}&limit=50
  • Webhooks by Zapier - fetch transcript from CallMiner API (for each interaction)
      Method: GET
      URL: https://api.callminer.com/v1/interactions/{{id}}/transcript
      Auth: Bearer (OAuth token)
  • Webhooks by Zapier - POST /v1/runs (sync) to Semarize (once the transcript is returned)
      Method: POST
      URL: https://api.semarize.com/v1/runs
      Auth: Bearer smz_live_...
      Body: { kit_code, mode: "sync", input: { transcript } }
  • Formatter by Zapier - extract brick values from the structured Semarize response
      Extract: bricks.compliance_score.value
      Extract: bricks.empathy_score.value
      Extract: bricks.escalation_risk.value
  • Salesforce - Update Record: write scored signals to Contact record
      Object: Contact
      Compliance Score: {{compliance_score}}
      Empathy Score: {{empathy_score}}
      Escalation Risk: {{escalation_risk}}

Setup steps

1

Create a new Zap. Choose Schedule by Zapier as the trigger and set it to run every hour. This avoids needing a direct CallMiner trigger integration.

2

Add a "Webhooks by Zapier" Action (Custom Request) to list new interactions from CallMiner. Set method to GET, URL to https://api.callminer.com/v1/interactions, add your OAuth Bearer token, and pass startDate as a parameter.

3

Add another "Webhooks by Zapier" Action to fetch the transcript for each interaction. Set method to GET, URL to https://api.callminer.com/v1/interactions/{{id}}/transcript with the Bearer token.

4

Add a third "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. In the body, set kit_code to your Kit, mode to "sync", and map the transcript text into input.transcript.

5

Add a Formatter step to extract individual brick values from the Semarize JSON response — compliance_score, empathy_score, escalation_risk, etc.

6

Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record.

7

Test each step end-to-end, then turn on the Zap.

Watch out for: Zapier has step data size limits that can truncate very long transcripts. For interactions over 60 minutes, consider storing the transcript in cloud storage and passing a reference URL instead of inline text. Use mode: "sync" so Semarize returns results inline - Zapier doesn't natively support polling loops.
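For teams prototyping outside Zapier first, the Semarize request body and the brick extraction from the flow above can be sketched in a few lines. The body fields (kit_code, mode, input.transcript) and the bricks.<name>.value path are taken from the example Zap; treating bricks as a top-level key of the response is an assumption, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, Optional


def build_run_body(kit_code: str, transcript: str) -> Dict[str, Any]:
    """Body for POST /v1/runs in sync mode, as shown in the example flow."""
    return {"kit_code": kit_code, "mode": "sync", "input": {"transcript": transcript}}


def extract_bricks(response: Dict[str, Any],
                   names: Iterable[str]) -> Dict[str, Optional[Any]]:
    """Pull bricks.<name>.value for each name; missing bricks map to None."""
    bricks = response.get("bricks") or {}
    return {name: (bricks.get(name) or {}).get("value") for name in names}
```

Mapping missing bricks to None, rather than raising, mirrors how a Formatter step leaves a field empty and keeps the CRM write from failing on a partial response.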

n8n - Self-hosted workflows

CallMiner → n8n → Semarize → Database

Poll CallMiner for new interactions on a schedule, fetch transcripts, send each one to Semarize for analysis, then write the structured scores and signals to your database. n8n's native loop support handles pagination and batch processing.

Example Workflow

  • Cron - Every Hour: triggers the workflow on schedule
      Mode: Every Hour
      Timezone: UTC
  • HTTP Request - List Interactions: GET /v1/interactions (CallMiner)
      Method: GET
      URL: https://api.callminer.com/v1/interactions
      Auth: Bearer (OAuth token)
      Params: startDate={{$now.minus(1, 'hour')}}&limit=100
  • HTTP Request - Fetch Transcript: GET /v1/interactions/{id}/transcript (CallMiner), for each interaction ID
      URL: https://api.callminer.com/v1/interactions/{{$json.id}}/transcript
  • Code - Reassemble Transcript: concatenate utterances into plain text
      Join: utterances[].text by speakerRole
  • HTTP Request - Semarize: POST /v1/runs (sync)
      URL: https://api.semarize.com/v1/runs
      Auth: Bearer smz_live_...
      Body: { kit_code, mode: "sync", input: { transcript } }
  • Postgres - Insert Row: write the returned scores and signals to the database
      Table: interaction_evaluations
      Columns: interaction_id, agent_id, channel, compliance_score, empathy_score

Setup steps

1

Add a Cron node (called Schedule Trigger in recent n8n versions) as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most contact center volumes).

2

Add an HTTP Request node to list new interactions from CallMiner. Set method to GET, URL to https://api.callminer.com/v1/interactions, configure OAuth Bearer auth, and set startDate to one interval ago.

3

Add a Split In Batches node (called Loop Over Items in recent n8n versions) to iterate over the returned interaction IDs. Inside the loop, add an HTTP Request node to fetch each transcript via GET /v1/interactions/{id}/transcript.

4

Add a Code node (JavaScript) to reassemble the utterances array into a single transcript string. Join each utterance’s text, prefixed by speaker role.

5

Add another HTTP Request node to send the transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token. Set kit_code, mode to "sync", and map the transcript into input.transcript.

6

Add a Code node to extract the brick values from the Semarize response — compliance_score, empathy_score, escalation_risk, evidence, confidence.

7

Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use interaction_id as the primary key for upserts.

8

Activate the workflow. Monitor the first few runs to verify Semarize responses are arriving and writing correctly.

Watch out for: Use interaction IDs as deduplication keys to prevent reprocessing. You can also use async mode with n8n's native loop - POST /v1/runs (default async), then poll GET /v1/runs/:runId with a Wait + IF loop until status is "succeeded".
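The async variant mentioned above amounts to a poll-until-done loop, sketched here outside n8n. Only the "succeeded" status comes from the text; the "failed" status name, the poll interval, and the timeout are assumptions, and get_run is a hypothetical callable that performs GET /v1/runs/:runId and returns the parsed JSON.

```python
import time
from typing import Callable, Iterable


def wait_for_run(get_run: Callable[[], dict],
                 poll_interval_s: float = 2.0,
                 timeout_s: float = 120.0,
                 failure_statuses: Iterable[str] = ("failed",),  # assumed name
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the run reports status 'succeeded', fails, or times out."""
    deadline = clock() + timeout_s
    failures = set(failure_statuses)
    while True:
        run = get_run()
        status = run.get("status")
        if status == "succeeded":
            return run
        if status in failures:
            raise RuntimeError(f"run ended with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError("run did not finish before the timeout")
        sleep(poll_interval_s)
```

In n8n the same shape maps onto a Wait node for the sleep and an IF node for the status check; the explicit timeout is worth keeping in either form so a stuck run cannot loop forever.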

Make - Visual automation with branching

CallMiner → Make → Semarize → CRM + Slack

Fetch new CallMiner transcripts on a schedule, send each to Semarize for structured analysis, then use a Router to branch the scored output - alert on compliance flags via Slack and write all signals to your CRM.

Example Scenario

  • Schedule - Every 30 min: triggers the scenario on interval
      Interval: 30 minutes
  • HTTP - List New Interactions: GET /v1/interactions (CallMiner)
      Method: GET
      URL: https://api.callminer.com/v1/interactions
      Auth: Bearer (OAuth token)
      Params: startDate={{formatDate(...)}}&limit=100
  • HTTP - Fetch Transcript: GET /v1/interactions/{id}/transcript (per interaction)
      Iterator: for each interaction in response
      URL: https://api.callminer.com/v1/interactions/{{item.id}}/transcript
  • HTTP - Semarize: POST /v1/runs (sync), returns structured output
      URL: https://api.semarize.com/v1/runs
      Auth: Bearer smz_live_...
      Body: { kit_code, mode: "sync", input: { transcript } }
  • Router - Branch on Compliance Flag: route by Semarize output
      Branch 1: IF compliance_score < 0.7
      Branch 2: ALL (fallthrough)
  • Branch 1 (compliance risk): Slack - Alert Channel, notify team about flagged interaction
      Channel: #compliance-alerts
      Message: Low compliance on {{interaction_id}}, score: {{score}}
  • Branch 2 (all interactions): Salesforce - Update Record, write all scored signals to Contact
      Compliance Score: {{compliance_score}}
      Empathy Score: {{empathy_score}}
      Escalation Risk: {{escalation_risk}}

Setup steps

1

Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (15–60 minutes is typical for contact center volumes).

2

Add an HTTP module to list new interactions from CallMiner. Set method to GET, URL to https://api.callminer.com/v1/interactions, configure OAuth Bearer auth, and filter by startDate since the last run.

3

Add an Iterator module to loop through each interaction. For each, add an HTTP module to fetch the transcript via GET /v1/interactions/{id}/transcript.

4

Add another HTTP module to send the transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript from the previous step. Parse the response as JSON.

5

Add a Router module. Define Branch 1 with a filter: bricks.compliance_score.value less than 0.7. Leave Branch 2 as a fallthrough (no filter).

6

On Branch 1, add a Slack module to alert your compliance team when a low score is detected. Map the score, interaction ID, and agent into the message.

7

On Branch 2, add a Salesforce module to write all brick values (compliance_score, empathy_score, escalation_risk) to the Contact record.

8

Set the scenario schedule and activate. Monitor the first few runs in Make’s execution log.

Watch out for: Each API call counts as an operation. A scenario processing 50 interactions uses roughly 150 operations: one list call, then a transcript fetch, a Semarize run, and a CRM write per interaction, with any Slack alerts on top. Use mode: "sync" to avoid needing a polling loop for each run.
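The Router logic in this scenario is simple enough to express as a pure function, which can be handy for testing the threshold before wiring it into Make. The 0.7 threshold, the two branches, and the message template come from the example scenario; the function names, the destination labels, and the handling of a missing score are illustrative assumptions.

```python
from typing import List, Optional


def route(compliance_score: Optional[float], threshold: float = 0.7) -> List[str]:
    """Return the branches an interaction should follow, alert branch first."""
    targets = []
    # Branch 1: only interactions scoring strictly below the threshold alert.
    if compliance_score is not None and compliance_score < threshold:
        targets.append("slack_alert")
    # Branch 2: fallthrough, every interaction is written to the CRM.
    targets.append("crm_update")
    return targets


def alert_message(interaction_id: str, score: float) -> str:
    """Format the Slack message used on the compliance-risk branch."""
    return f"Low compliance on {interaction_id}, score: {score:.2f}"
```

A score of exactly 0.7 does not alert here because the filter is "less than"; decide deliberately whether a missing score should alert, since this sketch treats it as no alert.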

What you can build

What You Can Do With CallMiner Data in Semarize

Semarize delivers portable compliance scoring, attrition prediction, consistent omnichannel measurement, and the ability to build your own analytics on structured conversation signals from CallMiner.

Custom Scoring Framework Portability

Compliance on Your Terms

What Semarize generates

framework_version = "TCPA-v2026.1"
disclosure_compliance = 0.94
prohibited_phrases = 0
evidence_linked = true

Your compliance team needs scores that match your exact regulatory framework — updated on your timeline, against your jurisdiction’s requirements. Pull interaction transcripts from CallMiner and run them through your own compliance kit in Semarize. You define the exact disclosure sequences, consent language, and prohibited phrases for your jurisdiction. When regulations change, you update your Semarize kit the same day. The structured output feeds directly into your compliance database. Auditors get evidence-backed scores against your framework, with every violation linked to the exact transcript evidence.

Compliance Framework Comparison (Same Day Update)

Framework               Coverage   Update cadence
CallMiner Default       68%        Quarterly
Your Custom Framework   94%        Same Day

Rules checked (4): disclosure sequence; consent language (state-specific); prohibited phrases (TCPA v2026.1); Mini-Miranda compliance. The custom framework covers all four; the default misses two.

Agent Attrition Prediction Model

Workforce Intelligence

What Semarize generates

frustration_trend = "rising"
coaching_receptivity = 0.42
attrition_risk = 0.78
early_warning_weeks = 6

Your workforce planning team wants to predict which agents will leave within 90 days. Pull 12 months of transcripts and score every interaction through an agent wellbeing kit. Semarize extracts frustration_frequency, coaching_receptivity, performance_trend_slope, and customer_escalation_rate per agent per month. Feed the structured output into a gradient boosting model. The model identifies that agents with declining coaching_receptivity AND rising frustration_frequency churn within 90 days with 78% accuracy. HR intervenes with targeted support 6 weeks earlier.

Agent Attrition Risk - 90 Day Window (78% accuracy)

Agent       Attrition risk   Receptivity   Status
R. Torres   78%              0.42          Intervention Triggered
K. Patel    61%              0.55          Watch
M. Chen     34%              0.71          Healthy
J. Brooks   82%              0.38          Intervention Triggered

HR intervenes 6 weeks earlier with targeted support

Omnichannel Experience Consistency

Unified CX Scoring

What Semarize generates

phone_empathy = 0.81
chat_empathy = 0.59
email_empathy = 0.72
gap = 22%

Your contact center handles calls, chats, and emails through CallMiner. CallMiner scores each channel separately with different models. Your CX team needs one consistent score. Pull transcripts from all channels and run them through the same Semarize experience quality kit. Every interaction — regardless of channel — gets scored for empathy_demonstrated, resolution_clarity, effort_reduction, and brand_alignment. A quarterly report shows that chat interactions score 22% lower on empathy than phone calls. The training team builds a chat-specific empathy module and scores normalise within 8 weeks.

Omnichannel Experience Consistency (scored by the same Semarize kit)

Channel   Empathy   Resolution   Effort   Brand
Phone     0.81      0.77         0.69     0.74
Chat      0.59      0.72         0.65     0.68
Email     0.72      0.80         0.74     0.71

Chat empathy is 22% lower than phone. Training module deployed; scores normalised in 8 weeks.

Custom Speech Analytics Data Lake

Structured Pipeline to Snowflake

Vibe-coded

What Semarize generates

daily_interactions = 2,500+
typed_columns = 7
pipeline_latency = "< 5min"
storage = "Snowflake"

A data engineering lead vibe-codes an Airflow pipeline that exports every CallMiner interaction via API, scores it through Semarize, and lands typed rows in Snowflake. Each interaction becomes a row with: agent_id, channel, compliance_score (float), empathy_score (float), resolution_achieved (bool), escalation_risk (float), topic_primary (varchar). dbt models build agent daily scorecards, compliance trend reports, and CSAT prediction features. The BI team builds Tableau dashboards on conversation data that’s queryable, joinable, and fully owned by the organisation.

Speech Analytics Data Lake Pipeline (vibe-coded with Airflow)

CallMiner (REST API) → Semarize (structured JSON) → Snowflake (typed rows) → Tableau (dashboards)

Snowflake schema, 7 typed columns:

  • agent_id (varchar)
  • channel (varchar)
  • compliance_score (float)
  • empathy_score (float)
  • resolution_achieved (bool)
  • escalation_risk (float)
  • topic_primary (varchar)

2,500+ daily interactions · < 5min latency · Owned by the organisation
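One way to keep the seven columns typed before they reach the warehouse is to coerce each record into a fixed row type in the pipeline task. This is a sketch under assumptions: the column names and types come from the schema above, while the input shapes (an interaction dict with agentId and channel, and bricks keyed by name with a value field) and the names InteractionRow and to_row are hypothetical.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class InteractionRow:
    agent_id: str
    channel: str
    compliance_score: float
    empathy_score: float
    resolution_achieved: bool
    escalation_risk: float
    topic_primary: str


def to_row(interaction: dict, bricks: dict) -> InteractionRow:
    """Coerce one scored interaction into the seven typed columns."""
    def value(name: str):
        return bricks[name]["value"]

    resolved = value("resolution_achieved")
    # bool("false") is True in Python, so insist on a real JSON boolean.
    if not isinstance(resolved, bool):
        raise TypeError("resolution_achieved must be a JSON boolean")
    return InteractionRow(
        agent_id=str(interaction["agentId"]),
        channel=str(interaction["channel"]),
        compliance_score=float(value("compliance_score")),
        empathy_score=float(value("empathy_score")),
        resolution_achieved=resolved,
        escalation_risk=float(value("escalation_risk")),
        topic_primary=str(value("topic_primary")),
    )
```

astuple(row) yields the values in column order, which is convenient for parameterised bulk inserts; failing loudly on a missing or mistyped brick keeps bad rows out of the downstream dbt models.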

Watch out for

Common Challenges & Gotchas

These are the issues that come up most often when teams start extracting transcripts from CallMiner at scale.

Enterprise / partner-gated access

CallMiner API access is not self-serve. You need to work with your account representative or apply through the developer portal. Budget time for provisioning — it can take days to weeks depending on your agreement.

OAuth 2.0 token management

CallMiner uses OAuth 2.0 for authentication. Access tokens expire and must be refreshed. If your automation does not handle token refresh, requests will start failing once the token TTL elapses, and a scheduled job with no alerting can keep failing unnoticed.

Multi-channel data shape differences

Audio, chat, email, and video interactions return different metadata fields. A pipeline built for audio transcripts may miss fields from chat interactions or break on missing speaker labels in email threads.
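A defensive normalisation step can absorb most of these shape differences. The sketch below maps each utterance or message to one common record and tolerates missing fields; the input field names follow the transcript fields described earlier in this guide, and the output keys and the "unknown" fallback are illustrative choices rather than anything CallMiner defines.

```python
from typing import Any, Dict


def normalize_utterance(raw: Dict[str, Any], channel: str) -> Dict[str, Any]:
    """Map one utterance or message to a common shape, tolerating missing fields."""
    return {
        "channel": channel,
        # Email threads may carry no speaker label; fall back instead of failing.
        "speaker": raw.get("speakerRole") or "unknown",
        "start": raw.get("startTime"),
        "end": raw.get("endTime"),
        "text": (raw.get("text") or "").strip(),
    }
```

Running every channel through the same function means downstream code can rely on all five keys being present, even when some values are None.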

API rate limits

Exceeding rate limits results in throttled responses. Implement exponential backoff and pace bulk operations to avoid hitting ceilings, especially during large historical backfills.

Transcript processing delays

Audio interactions require transcription before data is available via API. Attempting to fetch a transcript too soon after an interaction ends will return empty or incomplete data. Build in a delay or retry mechanism.

Large payload sizes at scale

Contact centers generate thousands of interactions daily. Fetching all interactions in a single request is not feasible. Plan for pagination, batching, and incremental processing from the start.

Duplicate processing protection

Without idempotency checks, re-running an extraction flow can process the same interaction twice. Use interaction IDs as deduplication keys to ensure each transcript is handled exactly once.
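A minimal deduplication store can be built on a primary-key constraint, sketched here with SQLite from the standard library. The table name, function names, and the choice of SQLite are assumptions for illustration; the same pattern applies to any database with unique keys.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store that remembers processed interaction IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed (interaction_id TEXT PRIMARY KEY)")
    return conn


def claim(conn: sqlite3.Connection, interaction_id: str) -> bool:
    """Return True the first time an ID is claimed, False on every later attempt."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO processed (interaction_id) VALUES (?)",
            (interaction_id,))
    return cursor.rowcount == 1
```

Claiming before processing means a crash mid-run leaves that interaction marked as done; if losing an interaction is worse than processing it twice, record the ID only after the downstream write succeeds and make that write an upsert.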
