Get Your Data
CallMiner - How to Get Your Conversation Data
A practical guide to getting your conversation data out of CallMiner - covering REST API authentication, historical backfill, incremental polling, real-time API flows, and how to route structured data into your downstream systems.
What you'll learn
- What interaction data you can extract from CallMiner - audio, text, chat, video transcripts, metadata, and speaker labels
- How to access data via the CallMiner REST API - OAuth 2.0 authentication, endpoints, and pagination
- Three extraction patterns: historical backfill, incremental polling, and real-time API
- How to connect CallMiner data pipelines to Zapier, n8n, and Make
- Advanced use cases - custom compliance scoring, attrition prediction, omnichannel consistency, and warehouse analytics
Data
What Data You Can Extract From CallMiner
CallMiner captures interactions across multiple channels - voice, chat, email, and video. Every interaction produces a set of structured assets that can be extracted via API - the transcript, speaker identification, timing metadata, channel type, and contextual information about the interaction and its associated contact.
Common fields teams care about
API Access
How to Get Transcripts via the CallMiner API
CallMiner exposes interactions and transcripts through a REST API secured with OAuth 2.0. The workflow is: register your application in the developer portal, exchange its credentials for an access token at the token endpoint, list interactions by date range, then fetch the transcript for each interaction ID.
Authenticate with OAuth 2.0
CallMiner uses OAuth 2.0 with client credentials. Register your application at developer.callminer.com to obtain a client_id and client_secret. Exchange them for a Bearer token via the token endpoint.
POST https://auth.callminer.com/oauth/token
Content-Type: application/x-www-form-urlencoded
grant_type=client_credentials
&client_id=<your_client_id>
&client_secret=<your_client_secret>
# Response:
# { "access_token": "eyJ...", "token_type": "Bearer", "expires_in": 3600 }
List interactions by date range
Call the GET /v1/interactions endpoint with startDate and endDate query parameters. Results are paginated - each response includes an offset or nextPage token to fetch the next page.
GET https://api.callminer.com/v1/interactions?startDate=2025-01-01T00:00:00Z&endDate=2025-02-01T00:00:00Z&limit=100
Authorization: Bearer <access_token>
Content-Type: application/json
The response returns an array of interaction objects with id, channel, agentId, duration, startTime, and associated metadata. Keep paginating until no more results are returned.
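The pagination loop described above can be sketched in Python as follows. This is a minimal illustration, not CallMiner's documented contract: the `items` and `nextPage` response keys are assumed names, and `fetch_page` is a hypothetical caller-supplied function that performs one GET /v1/interactions request and returns the parsed JSON body.

```python
from typing import Callable, Iterator, Optional


def iter_interactions(
    fetch_page: Callable[[Optional[str]], dict],
) -> Iterator[dict]:
    """Yield every interaction across all pages.

    fetch_page(token) is a hypothetical helper: it requests one page
    (token is None for the first page) and returns the parsed JSON.
    The "items" and "nextPage" keys are assumptions; rename them to
    match the real response.
    """
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        items = page.get("items", [])
        yield from items
        token = page.get("nextPage")
        # Stop when there is no further token or the page came back empty.
        if not token or not items:
            break
```

Keeping the HTTP call behind `fetch_page` means the same loop works whether the API pages by token or by offset; only the helper changes.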
Fetch the transcript
For each interaction ID, request the transcript via GET /v1/interactions/{id}/transcript. The response contains an array of utterances, each with a speaker role, timestamp, and text segment.
GET https://api.callminer.com/v1/interactions/INT-20250115-00482/transcript
Authorization: Bearer <access_token>
Each utterance in the response includes speakerRole (agent / customer), startTime, endTime, and text. Reassemble into plain text by concatenating utterances, or preserve the structured format for per-speaker analysis. CallMiner also supports OVTS format for cross-platform interoperability.
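Reassembling utterances into plain text might look like the sketch below. It uses the speakerRole and text fields named above and assumes the utterances arrive in chronological order; the "unknown" fallback label is an illustrative choice for channels where a speaker role is missing.

```python
from typing import Dict, List


def utterances_to_text(utterances: List[Dict]) -> str:
    """Join utterances into one 'role: text' line per utterance.

    Assumes the list is already in chronological order. Utterances with
    no text are skipped; a missing speakerRole becomes "unknown".
    """
    lines = []
    for utterance in utterances:
        text = (utterance.get("text") or "").strip()
        if not text:
            continue
        role = utterance.get("speakerRole") or "unknown"
        lines.append(f"{role}: {text}")
    return "\n".join(lines)
```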
Handle rate limits and transcript availability
Rate limits
CallMiner enforces per-endpoint rate limits that vary by access tier. When you receive a 429 response, back off using the Retry-After header. For bulk operations, pace requests and persist your pagination token between runs.
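A minimal sketch of the 429 handling described above. The `send` callable is hypothetical: it performs one request and returns a (status, headers, body) tuple. The sketch assumes Retry-After carries a number of seconds (the HTTP-date form is not handled) and that header keys are spelled exactly "Retry-After"; when the header is absent or unparsable it falls back to a doubling wait, which is an assumption rather than documented CallMiner behavior.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, headers, body)


def request_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns something other than HTTP 429."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep again
        try:
            # Honor the server's hint when it is a plain number of seconds.
            wait = float(headers.get("Retry-After", ""))
        except ValueError:
            # Header missing or not numeric: fall back to a doubling wait.
            wait = default_wait * (2 ** attempt)
        sleep(wait)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```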
Transcript timing
Audio transcripts are not available the instant an interaction ends. CallMiner processes recordings asynchronously - typical lag varies by interaction length and system load. Text-based channels (chat, email) are generally available faster. Build a buffer into your extraction timing or implement a retry with exponential backoff.
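The retry-with-backoff approach could be sketched like this. How the API signals a transcript that is not ready yet is not stated here, so the sketch assumes the hypothetical `fetch_transcript` helper returns an empty list or None in that case; the delay values are placeholders to tune against the lag you actually observe.

```python
import time
from typing import Callable, List, Optional


def fetch_when_ready(
    fetch_transcript: Callable[[str], Optional[List[dict]]],
    interaction_id: str,
    max_attempts: int = 6,
    base_delay: float = 30.0,
    max_delay: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[dict]]:
    """Poll for a transcript, doubling the wait after each empty result.

    Returns the utterances once they are non-empty, or None if the
    transcript is still unavailable after max_attempts tries.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        utterances = fetch_transcript(interaction_id)
        if utterances:
            return utterances
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the backoff
    return None
```

Returning None instead of raising lets a scheduled job park the interaction ID and try again on the next run.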
Patterns
Key Extraction Flows
There are three practical patterns for getting transcripts out of CallMiner. The right choice depends on whether you're doing a one-off migration, running ongoing extraction, or need near real-time processing via CallMiner's real-time API.
Backfill (Historical Export)
One-off migration of past interactions
Define your date range — typically 6–12 months of historical interactions, or all available data if migrating off CallMiner’s native analytics
Call GET /v1/interactions with startDate and endDate parameters. Paginate through the full result set, collecting all interaction IDs
For each interaction ID, fetch the transcript via GET /v1/interactions/{id}/transcript. Pace requests to stay within rate limits
Store each transcript with its interaction metadata (interaction ID, date, agent, channel, disposition) in your data warehouse or object store
Once the backfill completes, run your analysis pipeline against the stored data in bulk
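For a long backfill it can help to split the date range into smaller windows, so a failed run resumes from the last completed window rather than from the beginning. Windowing is a suggestion of this sketch, not something the API requires, and `date_windows` is a hypothetical helper. Whether endDate is inclusive or exclusive is not stated here, so adjacent windows share a boundary and deduplication by interaction ID is still advisable.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def _iso(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC ISO 8601 string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def date_windows(
    start: datetime, end: datetime, days: int = 7
) -> Iterator[Tuple[str, str]]:
    """Yield (startDate, endDate) pairs covering the range from start to end.

    Both arguments must be timezone-aware. The final window is clipped
    so it never extends past end.
    """
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield _iso(cursor), _iso(upper)
        cursor = upper
```

Each pair can be passed as the startDate and endDate query parameters, with the last completed window persisted between runs.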
Incremental Polling
Ongoing extraction on a schedule
Set a cron job or scheduled trigger (hourly, daily, etc.) that runs your extraction script
On each run, call GET /v1/interactions with startDate set to your last successful poll timestamp
Fetch transcripts for any new interaction IDs returned. Use the interaction ID as a deduplication key to avoid reprocessing
Route each transcript and its metadata to your downstream pipeline — analysis tool, warehouse, or automation platform
Update your stored timestamp to the time this run started, so interactions that arrive while the run is in progress are picked up by the next poll cycle
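The polling steps above can be sketched with a small JSON checkpoint file. Everything named here is illustrative: `list_since` and `process` are hypothetical callables supplied by the caller, and the state file layout is an assumption. At real volumes the seen-ID set belongs in a database rather than a file that grows without bound.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Iterable, Optional


def run_poll(
    state_path: Path,
    list_since: Callable[[Optional[str]], Iterable[dict]],
    process: Callable[[dict], None],
    now: Optional[datetime] = None,
) -> int:
    """Run one poll cycle and return how many new interactions were handled."""
    state = {"last_poll": None, "seen_ids": []}
    if state_path.exists():
        state = json.loads(state_path.read_text())
    seen = set(state["seen_ids"])

    # Capture the timestamp before listing, so anything created while
    # this run is in progress falls inside the next cycle's window.
    started = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")

    handled = 0
    for interaction in list_since(state["last_poll"]):
        if interaction["id"] in seen:
            continue  # deduplicate on interaction ID
        process(interaction)
        seen.add(interaction["id"])
        handled += 1

    # Persist the checkpoint only after the whole cycle succeeded.
    state_path.write_text(
        json.dumps({"last_poll": started, "seen_ids": sorted(seen)})
    )
    return handled
```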
Real-Time API
Near real-time on interaction completion
Configure a real-time API endpoint or webhook listener in your CallMiner admin settings. CallMiner fires events when an interaction is processed and the transcript becomes available
When the event fires, parse the payload to extract the interaction ID and metadata
Immediately fetch the transcript via GET /v1/interactions/{id}/transcript using the interaction ID from the event
Route the transcript and metadata downstream — to your analysis pipeline, CRM updater, or automation tool
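The event-handling steps above might be sketched as a single function that a webhook listener calls with the raw request body. The event payload shape is not specified here, so the `interactionId` and `metadata` keys are assumptions, and `fetch_transcript` and `route` are hypothetical caller-supplied callables.

```python
import json
from typing import Callable, List


def handle_event(
    raw_body: bytes,
    fetch_transcript: Callable[[str], List[dict]],
    route: Callable[[dict], None],
) -> bool:
    """Process one event; return False if the payload is unusable."""
    try:
        event = json.loads(raw_body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    if not isinstance(event, dict):
        return False

    # "interactionId" and "metadata" are assumed key names.
    interaction_id = event.get("interactionId")
    if not interaction_id:
        return False

    transcript = fetch_transcript(interaction_id)
    route(
        {
            "interaction_id": interaction_id,
            "metadata": event.get("metadata", {}),
            "transcript": transcript,
        }
    )
    return True
```

Keeping the handler free of any web framework makes it easy to call from whichever listener you deploy.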
Automation
Send CallMiner Transcripts to Automation Tools
Once you can extract transcripts from CallMiner, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from CallMiner trigger through Semarize evaluation to CRM, Slack, or database output.
CallMiner → Zapier → Semarize → CRM
Detect new CallMiner interactions on a schedule, fetch the transcript, send it to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.
Setup steps
Create a new Zap. Choose Schedule by Zapier as the trigger and set it to run every hour. This avoids needing a direct CallMiner trigger integration.
Add a "Webhooks by Zapier" Action (Custom Request) to list new interactions from CallMiner. Set method to GET, URL to https://api.callminer.com/v1/interactions, add your OAuth Bearer token, and pass startDate as a parameter.
Add another "Webhooks by Zapier" Action to fetch the transcript for each interaction. Set method to GET, URL to https://api.callminer.com/v1/interactions/{{id}}/transcript with the Bearer token.
Add a third "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. In the body, set kit_code to your Kit, mode to "sync", and map the transcript text into input.transcript.
Add a Formatter step to extract individual brick values from the Semarize JSON response — compliance_score, empathy_score, escalation_risk, etc.
Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record.
Test each step end-to-end, then turn on the Zap.
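For reference, the Semarize request configured in the steps above can be expressed outside Zapier as follows. The URL, Bearer auth, and the kit_code, mode, and input.transcript fields come from the steps; the function name and the use of Python's standard urllib are choices made for this sketch. The function only builds the request and does not send it.

```python
import json
import urllib.request

SEMARIZE_RUNS_URL = "https://api.semarize.com/v1/runs"


def build_semarize_request(
    api_key: str, kit_code: str, transcript: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a synchronous run."""
    body = {
        "kit_code": kit_code,
        "mode": "sync",
        "input": {"transcript": transcript},
    }
    return urllib.request.Request(
        SEMARIZE_RUNS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen with a timeout and parse the response body with json.load.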
CallMiner → n8n → Semarize → Database
Poll CallMiner for new interactions on a schedule, fetch transcripts, send each one to Semarize for analysis, then write the structured scores and signals to your database. n8n's native loop support handles pagination and batch processing.
Setup steps
Add a Cron node as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most contact center volumes).
Add an HTTP Request node to list new interactions from CallMiner. Set method to GET, URL to https://api.callminer.com/v1/interactions, configure OAuth Bearer auth, and set startDate to one interval ago.
Add a Split In Batches node to iterate over the returned interaction IDs. Inside the loop, add an HTTP Request node to fetch each transcript via GET /v1/interactions/{id}/transcript.
Add a Code node (JavaScript) to reassemble the utterances array into a single transcript string. Join each utterance’s text, prefixed by speaker role.
Add another HTTP Request node to send the transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token. Set kit_code, mode to "sync", and map the transcript into input.transcript.
Add a Code node to extract the brick values from the Semarize response — compliance_score, empathy_score, escalation_risk, evidence, confidence.
Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use interaction_id as the primary key for upserts.
Activate the workflow. Monitor the first few runs to verify Semarize responses are arriving and writing correctly.
CallMiner → Make → Semarize → CRM + Slack
Fetch new CallMiner transcripts on a schedule, send each to Semarize for structured analysis, then use a Router to branch the scored output - alert on compliance flags via Slack and write all signals to your CRM.
Setup steps
Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (15–60 minutes is typical for contact center volumes).
Add an HTTP module to list new interactions from CallMiner. Set method to GET, URL to https://api.callminer.com/v1/interactions, configure OAuth Bearer auth, and filter by startDate since the last run.
Add an Iterator module to loop through each interaction. For each, add an HTTP module to fetch the transcript via GET /v1/interactions/{id}/transcript.
Add another HTTP module to send the transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript from the previous step. Parse the response as JSON.
Add a Router module. Define Branch 1 with a filter: bricks.compliance_score.value less than 0.7. Leave Branch 2 as a fallthrough (no filter).
On Branch 1, add a Slack module to alert your compliance team when a low score is detected. Map the score, interaction ID, and agent into the message.
On Branch 2, add a Salesforce module to write all brick values (compliance_score, empathy_score, escalation_risk) to the Contact record.
Set the scenario schedule and activate. Monitor the first few runs in Make’s execution log.
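The Router logic in this scenario amounts to a small decision function. The sketch below mirrors the two branches using the bricks.compliance_score.value path and the 0.7 threshold from the steps above; the destination labels "slack" and "crm" are placeholders chosen for this example.

```python
from typing import List


def route_result(result: dict, threshold: float = 0.7) -> List[str]:
    """Return the destinations a Semarize result should be sent to.

    Every result goes to the CRM. Results whose compliance score is
    below the threshold are also sent to Slack. A missing or
    non-numeric score does not trigger an alert.
    """
    destinations = ["crm"]
    score = result.get("bricks", {}).get("compliance_score", {}).get("value")
    if isinstance(score, (int, float)) and score < threshold:
        destinations.insert(0, "slack")
    return destinations
```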
What you can build
What You Can Do With CallMiner Data in Semarize
Semarize delivers portable compliance scoring, attrition prediction, consistent omnichannel measurement, and the ability to build your own analytics on structured conversation signals from CallMiner.
Custom Scoring Framework Portability
Compliance on Your Terms
What Semarize generates
Your compliance team needs scores that match your exact regulatory framework — updated on your timeline, against your jurisdiction’s requirements. Pull interaction transcripts from CallMiner and run them through your own compliance kit in Semarize. You define the exact disclosure sequences, consent language, and prohibited phrases for your jurisdiction. When regulations change, you update your Semarize kit the same day. The structured output feeds directly into your compliance database. Auditors get evidence-backed scores against your framework, with every violation linked to the exact transcript evidence.
Learn more about QA & Compliance
Agent Attrition Prediction Model
Workforce Intelligence
What Semarize generates
Your workforce planning team wants to predict which agents will leave within 90 days. Pull 12 months of transcripts and score every interaction through an agent wellbeing kit. Semarize extracts frustration_frequency, coaching_receptivity, performance_trend_slope, and customer_escalation_rate per agent per month. Feed the structured output into a gradient boosting model. The model identifies that agents with declining coaching_receptivity AND rising frustration_frequency churn within 90 days with 78% accuracy. HR intervenes with targeted support 6 weeks earlier.
Learn more about Data Science
Omnichannel Experience Consistency
Unified CX Scoring
What Semarize generates
Your contact center handles calls, chats, and emails through CallMiner. CallMiner scores each channel separately with different models. Your CX team needs one consistent score. Pull transcripts from all channels and run them through the same Semarize experience quality kit. Every interaction — regardless of channel — gets scored for empathy_demonstrated, resolution_clarity, effort_reduction, and brand_alignment. A quarterly report shows that chat interactions score 22% lower on empathy than phone calls. The training team builds a chat-specific empathy module and scores normalise within 8 weeks.
Learn more about Customer Success
Custom Speech Analytics Data Lake
Structured Pipeline to Snowflake
What Semarize generates
A data engineering lead vibe-codes an Airflow pipeline that exports every CallMiner interaction via API, scores it through Semarize, and lands typed rows in Snowflake. Each interaction becomes a row with: agent_id, channel, compliance_score (float), empathy_score (float), resolution_achieved (bool), escalation_risk (float), topic_primary (varchar). dbt models build agent daily scorecards, compliance trend reports, and CSAT prediction features. The BI team builds Tableau dashboards on conversation data that’s queryable, joinable, and fully owned by the organisation.
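The typed row described above could be modeled as a dataclass before loading. The column names and types come from the list above; `interaction_id` is added here as an assumed key column, the `bricks[name]["value"]` shape follows the path used in the Make example, and `to_row` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionRow:
    interaction_id: str  # assumed key column, not in the list above
    agent_id: str
    channel: str
    compliance_score: float
    empathy_score: float
    resolution_achieved: bool
    escalation_risk: float
    topic_primary: str


def to_row(interaction: dict, bricks: dict) -> InteractionRow:
    """Combine interaction metadata and brick values into one typed row."""

    def value(name: str):
        return bricks[name]["value"]

    resolved = value("resolution_achieved")
    if not isinstance(resolved, bool):
        # Avoid bool("false") == True style surprises.
        raise TypeError("resolution_achieved must be a boolean")

    return InteractionRow(
        interaction_id=str(interaction["id"]),
        agent_id=str(interaction["agentId"]),
        channel=str(interaction["channel"]),
        compliance_score=float(value("compliance_score")),
        empathy_score=float(value("empathy_score")),
        resolution_achieved=resolved,
        escalation_risk=float(value("escalation_risk")),
        topic_primary=str(value("topic_primary")),
    )
```

Coercing types at this boundary means a malformed response fails in the pipeline task instead of producing a badly typed warehouse row.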
Learn more about RevOps
Watch out for
Common Challenges & Gotchas
These are the issues that come up most often when teams start extracting transcripts from CallMiner at scale.
Enterprise / partner-gated access
CallMiner API access is not self-serve. You need to work with your account representative or apply through the developer portal. Budget time for provisioning — it can take days to weeks depending on your agreement.
OAuth 2.0 token management
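CallMiner uses OAuth 2.0 for authentication. Access tokens expire (the example token response above reports expires_in: 3600), and with the client credentials grant you get a new one by calling the token endpoint again. If your automation does not do this before the token expires, requests will start failing partway through a run, typically with 401 responses.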
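One way to handle this is a small cache that requests a fresh token shortly before the current one expires. In this sketch `request_token` is a hypothetical callable that POSTs to the token endpoint and returns the parsed JSON (with the access_token and expires_in fields shown in the example response); the 60-second safety margin is an arbitrary choice.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Hand out a cached access token, renewing it before it expires."""

    def __init__(
        self,
        request_token: Callable[[], dict],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._request_token = request_token
        self._skew = skew_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, fetching a new one when needed."""
        if self._token is None or self._clock() >= self._expires_at:
            payload = self._request_token()
            self._token = payload["access_token"]
            lifetime = float(payload.get("expires_in", 0))
            # Renew slightly early so in-flight requests do not expire.
            self._expires_at = self._clock() + lifetime - self._skew
        return self._token
```

Call `get()` immediately before each API request rather than storing the token string in your workflow configuration.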
Multi-channel data shape differences
Audio, chat, email, and video interactions return different metadata fields. A pipeline built for audio transcripts may miss fields from chat interactions or break on missing speaker labels in email threads.
API rate limits
Exceeding rate limits results in throttled responses. Implement exponential backoff and pace bulk operations to avoid hitting ceilings, especially during large historical backfills.
Transcript processing delays
Audio interactions require transcription before data is available via API. Attempting to fetch a transcript too soon after an interaction ends will return empty or incomplete data. Build in a delay or retry mechanism.
Large payload sizes at scale
Contact centers generate thousands of interactions daily. Fetching all interactions in a single request is not feasible. Plan for pagination, batching, and incremental processing from the start.
Duplicate processing protection
Without idempotency checks, re-running an extraction flow can process the same interaction twice. Use interaction IDs as deduplication keys to ensure each transcript is handled exactly once.