Get Your Data
Salesforce - How to Get Your Conversation Data
A practical guide to getting your conversation data from Salesforce - covering the Salesforce API, Einstein Conversation Insights, Service Cloud Voice recordings, and how to route structured data into downstream systems.
What you'll learn
- What conversation data you can extract from Salesforce - call logs, Einstein Conversation Insights transcripts, Service Cloud Voice recordings, and activity data
- How to access data via the Salesforce API - Connected Apps, OAuth 2.0, and SOQL queries
- Three extraction patterns: SOQL-based export, scheduled polling, and Platform Event-driven flows
- How to connect Salesforce data pipelines to Zapier, n8n, and Make
- Advanced use cases - custom scoring, CRM enrichment, compliance, and warehouse analytics
Data
What Data You Can Extract From Salesforce
Salesforce stores conversation data across multiple objects - Task records for call activity, VoiceCall records for Service Cloud Voice, and ConversationEntry objects for Einstein Conversation Insights. Each source provides different levels of detail depending on your org's configuration and add-ons.
Common fields teams care about
API Access
How to Get Call Data via the Salesforce API
Salesforce exposes call data through its REST API. The workflow is: authenticate via a Connected App with OAuth 2.0, query call records using SOQL, then fetch recordings and transcripts from the relevant objects.
Authenticate
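Create a Connected App in Salesforce Setup (Setup → App Manager → New Connected App). Enable OAuth settings and select the scopes api and refresh_token, offline_access (Salesforce lists the latter as one scope under two names). Use the OAuth 2.0 JWT bearer flow for server-to-server integrations, or the web server flow when a user authorizes the app interactively.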
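A minimal sketch of the JWT bearer flow using only the Python standard library. The app signs a short-lived JWT (iss = consumer key, sub = the integration user's username, aud = the login URL) and posts it to the token endpoint. The `sign` callable is an assumption standing in for whatever RSA library you use with the private key whose certificate is uploaded to the Connected App; the helper names are hypothetical.

```python
import base64
import json
import time
from typing import Callable, Optional
from urllib.parse import urlencode

# Production token endpoint; sandboxes use https://test.salesforce.com instead.
TOKEN_URL = "https://login.salesforce.com/services/oauth2/token"
JWT_BEARER_GRANT = "urn:ietf:params:oauth:grant-type:jwt-bearer"


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_jwt_assertion(
    consumer_key: str,
    username: str,
    sign: Callable[[bytes], bytes],
    audience: str = "https://login.salesforce.com",
    now: Optional[float] = None,
    lifetime_s: int = 180,
) -> str:
    """Return a signed JWT for the OAuth 2.0 JWT bearer flow.

    `sign` (assumed) must return an RS256 signature over its input bytes.
    """
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "RS256"}
    claims = {
        "iss": consumer_key,            # Connected App consumer key
        "sub": username,                # user the integration acts as
        "aud": audience,                # login (or test) URL
        "exp": issued_at + lifetime_s,  # keep the assertion short-lived
    }
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = sign(signing_input.encode("ascii"))
    return f"{signing_input}.{b64url(signature)}"


def token_request_body(assertion: str) -> str:
    """Form-encoded body to POST to TOKEN_URL."""
    return urlencode({"grant_type": JWT_BEARER_GRANT, "assertion": assertion})
```

POST the body to `TOKEN_URL` as `application/x-www-form-urlencoded`; the JSON response carries `access_token` and `instance_url`, and the instance URL is the base for the REST calls that follow.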
Authorization: Bearer {access_token}
Query call records
For Task-based calls, Service Cloud Voice records, and Einstein Conversation Insights, use SOQL queries via the REST API endpoint GET /services/data/v59.0/query?q={SOQL}.
-- Task-based calls
SELECT Id, Subject, Description, CallDurationInSeconds,
CallType, ActivityDate, WhoId, WhatId
FROM Task
WHERE TaskSubtype = 'Call'
AND ActivityDate >= 2026-01-01
-- Service Cloud Voice
SELECT Id, CallType, CallDurationInSeconds,
FromPhoneNumber, ToPhoneNumber, VendorCallKey
FROM VoiceCall
WHERE CreatedDate >= 2026-01-01T00:00:00Z
-- Einstein Conversation Insights
-- Query ConversationEntry objects for transcript segments
Use the REST API: GET /services/data/v59.0/query?q={SOQL}. Results are paginated - each response includes a nextRecordsUrl if more records exist.
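To make the paging concrete, here is a sketch that builds the Task query from above and keeps following nextRecordsUrl until the response reports `done`. `get_json` is a hypothetical stand-in for an authenticated HTTP GET that returns parsed JSON. Note that ActivityDate takes a bare date literal, while CreatedDate on VoiceCall needs a full datetime literal.

```python
from datetime import date
from typing import Callable, Iterator
from urllib.parse import quote

API_VERSION = "v59.0"


def task_calls_since(since: date) -> str:
    """SOQL for call Tasks; ActivityDate is a date field, so it takes a bare date literal."""
    return (
        "SELECT Id, Subject, Description, CallDurationInSeconds, "
        "CallType, ActivityDate, WhoId, WhatId "
        "FROM Task WHERE TaskSubtype = 'Call' "
        f"AND ActivityDate >= {since.isoformat()}"
    )


def query_all(get_json: Callable[[str], dict], instance_url: str, soql: str) -> Iterator[dict]:
    """Yield every record matching `soql`, following nextRecordsUrl until done.

    `get_json` is a hypothetical helper: an authenticated GET returning parsed JSON.
    """
    url = f"{instance_url}/services/data/{API_VERSION}/query?q={quote(soql)}"
    while True:
        page = get_json(url)
        yield from page.get("records", [])
        if page.get("done", True):
            return
        # nextRecordsUrl is a path relative to the instance URL.
        url = instance_url + page["nextRecordsUrl"]
```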
Access recordings and transcripts
Service Cloud Voice recordings can be accessed via the VoiceCall content endpoint. Einstein Conversation Insights provides transcript segments through ConversationEntry objects with speaker labels. Third-party CTI recordings require following the provider's URL/API for recording access.
-- Service Cloud Voice recording
GET /services/data/v59.0/sobjects/VoiceCall/{id}/Content
-- Einstein Conversation Insights
-- Query ConversationEntry objects for transcript
-- segments with speaker labels
-- Third-party CTI recordings
-- Follow the provider's URL/API for recording access
Einstein Conversation Insights returns transcript data as ConversationEntry records, each with a speaker ID and text segment. Third-party CTI providers (e.g., Five9, RingCentral) store recordings externally - check their documentation for API access.
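A sketch of fetching a Service Cloud Voice recording with the standard library, using the VoiceCall content path given above. The guide gives the path but not the response format, so treat the streaming download as an assumption to verify against your org; the function names are hypothetical.

```python
import urllib.request

API_VERSION = "v59.0"


def voicecall_content_request(instance_url: str, access_token: str, voice_call_id: str) -> urllib.request.Request:
    """Build (without sending) a GET for the VoiceCall content path from this guide."""
    url = (
        f"{instance_url}/services/data/{API_VERSION}"
        f"/sobjects/VoiceCall/{voice_call_id}/Content"
    )
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"}, method="GET")


def download(req: urllib.request.Request, path: str, timeout_s: float = 60.0) -> int:
    """Stream the response to `path` in chunks (recordings can be large); return bytes written."""
    written = 0
    with urllib.request.urlopen(req, timeout=timeout_s) as resp, open(path, "wb") as out:
        while chunk := resp.read(64 * 1024):
            out.write(chunk)
            written += len(chunk)
    return written
```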
Handle authentication and limits
API limits
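Salesforce enforces a rolling 24-hour limit on API requests that scales with org edition and license count, plus a cap on concurrent long-running requests. Check current usage with the REST Limits resource (GET /services/data/v59.0/limits, DailyApiRequests) or the Sforce-Limit-Info header returned on REST responses; querying ApiEvent requires the Event Monitoring add-on.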
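One way to watch the daily allocation from your own extraction code, assuming you already make authenticated REST calls: read the `api-usage=USED/MAX` value from the Sforce-Limit-Info response header, or `DailyApiRequests.Remaining` from the REST Limits resource (GET /services/data/v59.0/limits). The 80% pause threshold below is a hypothetical choice, not a Salesforce rule.

```python
import re
from typing import Optional, Tuple

# Matches api-usage=USED/MAX but not the per-app-api-usage entry.
_API_USAGE = re.compile(r"(?:^|[;\s])api-usage=(\d+)/(\d+)")


def parse_limit_info(header_value: str) -> Optional[Tuple[int, int]]:
    """Parse the org-wide (used, max) pair from a Sforce-Limit-Info header value."""
    match = _API_USAGE.search(header_value)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def daily_requests_remaining(limits_json: dict) -> int:
    """Read DailyApiRequests.Remaining from the REST Limits resource response."""
    return int(limits_json["DailyApiRequests"]["Remaining"])


def should_pause(used: int, maximum: int, threshold: float = 0.8) -> bool:
    """Hypothetical policy: stop extracting once usage crosses `threshold` of the max."""
    return used >= maximum * threshold
```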
Einstein requirements
Einstein Conversation Insights requires Sales Cloud Einstein or Service Cloud Einstein. Not all orgs have it enabled. Check your org's entitlements before building transcript-dependent flows.
Patterns
Key Extraction Flows
There are three practical patterns for getting call data out of Salesforce. The right choice depends on whether you're doing a one-off migration, running ongoing extraction, or need near real-time processing.
Backfill (Historical Export)
One-off migration of past call data
Create a Connected App with the necessary OAuth scopes (api, refresh_token, offline_access)
Write a SOQL query for Tasks or VoiceCalls filtered by date range
Execute the query with pagination - results arrive in batches of up to 2,000 records, and while more remain each response includes a nextRecordsUrl for the next batch (queryMore is the SOAP API equivalent)
Fetch recordings where available via the VoiceCall content endpoint or provider API
Send each transcript and call metadata to Semarize for structured analysis
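The backfill steps above reduce to a loop like this sketch. The date window is half-open, so consecutive runs (for example, month by month) never double-count a day. `fetch_transcript` and `submit` are hypothetical hooks for your recording source and your Semarize client, and the records would come from a paginated query over the generated SOQL.

```python
from datetime import date
from typing import Callable, Iterable, Optional


def backfill_soql(start: date, end: date) -> str:
    """Call Tasks with ActivityDate in [start, end); date fields take bare date literals."""
    return (
        "SELECT Id, Subject, CallDurationInSeconds, CallType, ActivityDate, WhoId, WhatId "
        "FROM Task WHERE TaskSubtype = 'Call' "
        f"AND ActivityDate >= {start.isoformat()} AND ActivityDate < {end.isoformat()} "
        "ORDER BY ActivityDate"
    )


def backfill(
    records: Iterable[dict],
    fetch_transcript: Callable[[dict], Optional[str]],
    submit: Callable[[str, dict], None],
) -> int:
    """Submit every record that has a transcript; return how many were sent."""
    sent = 0
    for record in records:
        transcript = fetch_transcript(record)
        if not transcript:
            continue  # no recording or transcript available for this call
        submit(transcript, record)  # transcript plus the call's metadata
        sent += 1
    return sent
```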
Incremental Polling
Ongoing extraction on a schedule
Schedule a job (cron, Lambda, etc.) that runs your extraction script at regular intervals
Query Tasks or VoiceCalls modified since the last run using SystemModstamp or LastModifiedDate
Filter out already-processed record IDs to avoid reprocessing
Fetch recordings and transcripts for new or updated records
Route each transcript and its metadata to Semarize for structured analysis
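A sketch of the polling bookkeeping, with two assumptions worth naming. The REST API returns SystemModstamp as `2026-01-05T10:00:00.000+0000`, which is not one of the documented SOQL datetime literal formats, so the watermark is converted before reuse. And because the query uses `>=` on a whole-second value, some records come back twice; tracking processed (Id, SystemModstamp) pairs rather than bare IDs skips those repeats while still letting a later update to the same call through.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Set, Tuple

API_TIMESTAMP = "%Y-%m-%dT%H:%M:%S.%f%z"  # e.g. 2026-01-05T10:00:00.000+0000


def parse_api_timestamp(value: str) -> datetime:
    return datetime.strptime(value, API_TIMESTAMP)


def to_soql_datetime(value: str) -> str:
    """Convert an API timestamp to a SOQL datetime literal (UTC, whole seconds)."""
    return parse_api_timestamp(value).astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def incremental_soql(watermark: str) -> str:
    # >= rather than > so records that share the watermark's second are not missed.
    return (
        "SELECT Id, Subject, CallDurationInSeconds, SystemModstamp "
        "FROM Task WHERE TaskSubtype = 'Call' "
        f"AND SystemModstamp >= {to_soql_datetime(watermark)} "
        "ORDER BY SystemModstamp"
    )


def select_new(
    records: Iterable[dict], processed: Set[Tuple[str, str]], watermark: str
) -> Tuple[List[dict], str]:
    """Skip record versions already handled; return the rest plus the advanced watermark."""
    fresh = [r for r in records if (r["Id"], r["SystemModstamp"]) not in processed]
    newest = max([watermark] + [r["SystemModstamp"] for r in fresh], key=parse_api_timestamp)
    return fresh, newest
```

After each record is processed successfully, add its (Id, SystemModstamp) pair to the set and persist both the set and the new watermark for the next run.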
Platform Event-Driven
Near real-time on record creation
Create a Platform Event or enable Change Data Capture (CDC) on Task or VoiceCall objects in Salesforce Setup
Subscribe via CometD (Streaming API), the Pub/Sub API, or an Apex trigger that listens for new call records
When new call data is captured, push the record to your processing endpoint
Fetch the recording and process via Semarize for structured analysis
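Once a subscriber hands you a decoded change event, the filtering step might look like this sketch. It assumes the standard Change Data Capture message shape on the /data/TaskChangeEvent channel, where `data.payload.ChangeEventHeader` carries `entityName`, `changeType`, and `recordIds`, and a CREATE event includes the new record's populated fields such as TaskSubtype. Storing the replayId lets a restarted subscriber resume within the event retention window; the function names are hypothetical.

```python
from typing import List, Optional


def new_call_task_ids(message: dict) -> List[str]:
    """Task IDs from a /data/TaskChangeEvent message that represent newly created calls."""
    payload = message.get("data", {}).get("payload", {})
    header = payload.get("ChangeEventHeader", {})
    if header.get("entityName") != "Task" or header.get("changeType") != "CREATE":
        return []
    # On CREATE, the payload carries the new record's populated fields.
    if payload.get("TaskSubtype") != "Call":
        return []
    # One event can cover several records, so recordIds is a list.
    return list(header.get("recordIds", []))


def replay_id(message: dict) -> Optional[int]:
    """Position to persist so a restarted subscriber can resume from here."""
    return message.get("data", {}).get("event", {}).get("replayId")
```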
Automation
Send Salesforce Call Data to Automation Tools
Once you can extract call data from Salesforce, the next step is routing it through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from Salesforce trigger through Semarize evaluation to CRM, Slack, or database output.
Salesforce → Zapier → Semarize → CRM
Detect new Salesforce call Tasks, fetch the recording, send it to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - back to your Salesforce Opportunity.
Setup steps
Create a new Zap. Choose Salesforce as the trigger app and select "New Record" as the event. Set the object to Task and connect your Salesforce account.
Add a Filter step to only continue when TaskSubtype equals 'Call'. This prevents non-call activities from triggering the flow.
Add a "Webhooks by Zapier" Action (Custom Request) to fetch the recording from your telephony provider. Map the call ID or vendor call key from the Task record.
Add a second "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. In the body, set kit_code to your Kit, mode to "sync", and map the transcript text into input.transcript.
Add a Formatter step to extract individual brick values from the Semarize JSON response - overall_score, risk_flag, pain_point, etc.
Add a Salesforce Action to write the extracted scores and signals back to the related Opportunity record. Test each step end-to-end, then turn on the Zap.
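The two Semarize-facing steps, sketched in Python to show what the Webhooks action sends and what the Formatter step pulls out. The request fields (kit_code, mode, input.transcript) come from the steps above. The response shape `{"bricks": {"<name>": {"value": ...}}}` is an assumption inferred from the bricks.risk_flag.value path used in the Make flow below, so confirm it against the Semarize API reference.

```python
import json
import urllib.request
from typing import Any, Dict

SEMARIZE_RUNS_URL = "https://api.semarize.com/v1/runs"


def semarize_run_request(api_key: str, kit_code: str, transcript: str) -> urllib.request.Request:
    """Build (without sending) the synchronous run request."""
    body = {"kit_code": kit_code, "mode": "sync", "input": {"transcript": transcript}}
    return urllib.request.Request(
        SEMARIZE_RUNS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def brick_values(response: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten {"bricks": {"risk_flag": {"value": True}}} into {"risk_flag": True} (assumed shape)."""
    return {name: brick.get("value") for name, brick in response.get("bricks", {}).items()}
```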
Salesforce → n8n → Semarize → Database
Poll Salesforce for new call records on a schedule, fetch recordings, send each one to Semarize for analysis, then write the structured scores and signals to your database. n8n's built-in Salesforce node handles auth and pagination automatically.
Setup steps
Add a Schedule Trigger node (named Cron in older n8n versions) as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most teams).
Add a Salesforce node. Configure OAuth credentials for your Connected App. Set the operation to SOQL Query and write your query to fetch call Tasks modified since the last run.
Add a Split In Batches node to iterate over the returned call records. Inside the loop, add an HTTP Request node to fetch each recording from your telephony provider.
Add a Code node (JavaScript) to prepare the call data - combine metadata from the Salesforce record with the transcript or recording content.
Add another HTTP Request node to send the data to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token. Set kit_code, mode to "sync", and map the transcript into input.transcript.
Add a Code node to extract the brick values from the Semarize response - overall_score, risk_flag, pain_point, evidence, confidence.
Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use call_id as the primary key for upserts.
Activate the workflow. Monitor the first few runs to verify Semarize responses are arriving and writing correctly.
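The upsert in the database step can be sketched with Python's built-in sqlite3 so it runs anywhere. PostgreSQL accepts the same `INSERT ... ON CONFLICT (call_id) DO UPDATE` form, though its drivers use a different parameter placeholder style. The table and column names are hypothetical.

```python
import sqlite3
from typing import Any, Dict

SCHEMA = """
CREATE TABLE IF NOT EXISTS call_scores (
    call_id       TEXT PRIMARY KEY,
    overall_score REAL,
    risk_flag     INTEGER,
    pain_point    TEXT
)
"""

# Requires SQLite 3.24+; PostgreSQL supports the same upsert form.
UPSERT = """
INSERT INTO call_scores (call_id, overall_score, risk_flag, pain_point)
VALUES (:call_id, :overall_score, :risk_flag, :pain_point)
ON CONFLICT (call_id) DO UPDATE SET
    overall_score = excluded.overall_score,
    risk_flag     = excluded.risk_flag,
    pain_point    = excluded.pain_point
"""


def upsert_scores(conn: sqlite3.Connection, call_id: str, bricks: Dict[str, Any]) -> None:
    """Write one row per call; re-scoring the same call overwrites the old row."""
    row = {
        "call_id": call_id,
        "overall_score": bricks.get("overall_score"),
        "risk_flag": bricks.get("risk_flag"),
        "pain_point": bricks.get("pain_point"),
    }
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, row)
```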
Salesforce → Make → Semarize → CRM + Slack
Receive new Salesforce call activity via webhook, fetch the recording, send it to Semarize for structured analysis, then use a Router to branch the scored output - alert on risk flags via Slack and write all signals back to your CRM.
Setup steps
Create a new Scenario. Add a Webhook module as the trigger - configure it to receive events from Salesforce Outbound Messages or Platform Events.
In Salesforce Setup, configure an Outbound Message or Platform Event on the Task object (filtered to call Tasks) that sends data to your Make webhook URL.
Add an HTTP module to fetch the recording from your telephony provider. Map the call ID or vendor call key from the webhook payload.
Add another HTTP module to send the recording/transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript from the previous step. Parse the response as JSON.
Add a Router module. Define Branch 1 with a filter: bricks.risk_flag.value equals true. Leave Branch 2 without a filter so it runs for every result, including calls that also match Branch 1.
On Branch 1, add a Slack module to alert your team when risk is detected. Map the score, risk flag, and call ID into the message.
On Branch 2, add a Salesforce module to write all brick values (score, risk_flag, pain_point) back to the Opportunity record.
Activate the scenario. Monitor the first few runs in Make's execution log.
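The Router's behaviour, written out as plain logic: because Branch 2 has no filter, a risky call triggers both the Slack alert and the CRM update. This is a hypothetical sketch for reasoning about the flow, not Make's API; the action names and message wording are placeholders.

```python
from typing import Any, Dict, List, Tuple

Action = Tuple[str, Dict[str, Any]]


def route(call_id: str, bricks: Dict[str, Any]) -> List[Action]:
    """Branch 1 fires only on risk; Branch 2 (no filter) fires for every result."""
    actions: List[Action] = []
    if bricks.get("risk_flag") is True:
        actions.append(("slack", {
            "text": f"Risk flagged on call {call_id} (score: {bricks.get('overall_score')})",
        }))
    actions.append(("salesforce_update", {
        "call_id": call_id,
        "overall_score": bricks.get("overall_score"),
        "risk_flag": bricks.get("risk_flag"),
        "pain_point": bricks.get("pain_point"),
    }))
    return actions
```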
What you can build
What You Can Do With Salesforce Data in Semarize
Salesforce stores your data. Semarize structures it. When conversation content is evaluated against your own frameworks and returned as typed, programmable output, new possibilities open up.
Custom QA Rubric Scoring
Contact Center QA
Your contact center runs 500 calls per day. Your QA team has a 40-point rubric covering resolution quality, empathy, troubleshooting thoroughness, and escalation handling — and needs every call scored against it. Semarize evaluates every call against YOUR rubric, returning typed scores for each dimension. QA coverage goes from 5% random sampling to 100% automated evaluation. The QA team shifts from scoring calls to coaching on the scores.
Knowledge-Grounded Resolution Accuracy
Policy & Product Verification
Your support team handles hundreds of calls daily. When agents quote return windows, warranty terms, or troubleshooting sequences, are they getting it right? Run a knowledge-grounded kit against your product documentation, return policies, and troubleshooting guides on every call. Semarize checks whether the return window quoted was accurate, whether the warranty terms matched the current policy, and whether troubleshooting steps followed the approved sequence. After scoring 3,000 calls, you discover that 12% of agents cite outdated return policy terms. The cost of honouring incorrect promises drops immediately once you target the specific agents and the specific policy sections they get wrong.
Cross-Call Commitment Continuity Scoring
Follow-Through Analysis
Action items get captured after every meeting — but do those commitments actually carry through to the next call? Run pairs of consecutive meeting transcripts through a commitment tracking kit. Semarize compares commitments_made in meeting N with commitments_referenced in meeting N+1, scoring follow_through_rate, dropped_commitments, and new_blockers_introduced. Across your sales team, the data shows a 41% commitment drop-off between discovery and demo calls. Deals where commitments carry through close at 2.8x the rate. Pipeline reviews now include commitment continuity as a deal health metric.
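The guide does not say how the kit computes these fields, but one plausible downstream version of the comparison looks like this, assuming commitments_made and commitments_referenced each arrive as lists of short commitment strings. Exact matching after lowercasing is a deliberate simplification; real transcripts would need fuzzier matching. The function name is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Union

Result = Dict[str, Union[Optional[float], List[str]]]


def follow_through(commitments_made: Iterable[str], commitments_referenced: Iterable[str]) -> Result:
    """Compare meeting N's commitments with those referenced in meeting N+1."""
    made = {c.strip().lower() for c in commitments_made}
    referenced = {c.strip().lower() for c in commitments_referenced}
    kept = made & referenced
    rate = len(kept) / len(made) if made else None  # undefined with no commitments
    return {"follow_through_rate": rate, "dropped_commitments": sorted(made - referenced)}
```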
Custom Win/Loss Conversation Evidence Engine
Outcome-Linked Signal Analysis
A data analyst vibe-codes a Retool app that pulls Semarize scores from every transcript associated with closed deals. The app correlates conversation signals with outcomes: which Brick values predict wins vs losses? After analysing 500 closed deals, the model reveals that deals where budget_confirmed=true AND decision_maker_present=true AND discovery_depth>65 close at 4x the rate. The team updates their deal qualification criteria based on actual conversation evidence — not CRM checkbox data.
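The core calculation in such an app is a close-rate ratio for one segment against everything else. This sketch assumes each deal is a dict with a boolean `won` plus the brick values named above, and reads "4x the rate" as the segment's close rate divided by the close rate of all other deals; comparing against the overall rate is another defensible choice. The function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def close_rate(deals: List[Dict]) -> float:
    return sum(1 for d in deals if d["won"]) / len(deals)


def close_rate_lift(deals: Iterable[Dict], in_segment: Callable[[Dict], bool]) -> float:
    """Close rate inside the segment divided by the close rate of all other deals."""
    rows = list(deals)
    segment = [d for d in rows if in_segment(d)]
    rest = [d for d in rows if not in_segment(d)]
    if not segment or not rest:
        raise ValueError("both groups need at least one deal")
    baseline = close_rate(rest)
    if baseline == 0:
        raise ValueError("no wins outside the segment, so the ratio is undefined")
    return close_rate(segment) / baseline


def strong_discovery(deal: Dict) -> bool:
    # The segment described above.
    return bool(
        deal["budget_confirmed"]
        and deal["decision_maker_present"]
        and deal["discovery_depth"] > 65
    )
```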
Watch out for
Common Challenges & Gotchas
These are the issues that come up most often when teams start extracting call data from Salesforce at scale.
Multiple call data sources
Salesforce stores call data across Task objects, VoiceCall records, and Einstein Conversation entries. You may need to query multiple objects.
Einstein Conversation Insights requires add-on
Full transcript access requires Einstein for Sales or Service. Without it, you only get Task metadata and notes.
API call limits vary by edition
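Salesforce enforces a rolling 24-hour API request limit that depends on edition and number of user licenses; Setup → Company Information shows your org's allocation and recent usage. Large exports should use Bulk API 2.0 to conserve the limit.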
Recording access varies by provider
Service Cloud Voice runs on Amazon Connect or a partner telephony provider, and third-party CTI providers store recordings externally. Each has different access patterns.
SOQL query complexity
Querying related records (call → contact → account → opportunity) requires multiple queries or relationship queries. Plan your data model carefully.
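Because Task.WhatId is polymorphic, one straightforward approach is two queries: collect the Opportunity IDs from the call Tasks (standard Opportunity IDs start with the key prefix 006), then fetch those Opportunities with a child-to-parent relationship field such as Account.Name in a single follow-up query. A sketch, with hypothetical helper names:

```python
from typing import Dict, Iterable, List

OPPORTUNITY_KEY_PREFIX = "006"  # standard key prefix for Opportunity record IDs


def opportunity_ids(tasks: Iterable[Dict]) -> List[str]:
    """Distinct WhatIds that point at Opportunities, in first-seen order."""
    ids: Dict[str, None] = {}
    for task in tasks:
        what_id = task.get("WhatId") or ""
        if what_id.startswith(OPPORTUNITY_KEY_PREFIX):
            ids.setdefault(what_id, None)
    return list(ids)


def opportunity_query(ids: List[str]) -> str:
    """Follow-up query; Account.Name is a child-to-parent relationship field."""
    safe = [i for i in ids if i.isalnum()]  # record IDs are alphanumeric
    if not safe:
        raise ValueError("no Opportunity IDs to query")
    in_list = ", ".join(f"'{i}'" for i in safe)
    # Very long ID lists should be split into batches to respect SOQL length limits.
    return (
        "SELECT Id, Name, StageName, Amount, Account.Name "
        f"FROM Opportunity WHERE Id IN ({in_list})"
    )
```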
Sandbox vs. production differences
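Sandboxes authenticate against https://test.salesforce.com instead of https://login.salesforce.com, use their own instance URLs, and hold different data from production. Always test in sandbox before deploying to production.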
Change Data Capture setup
CDC requires admin configuration and has per-org event delivery limits. Monitor event bus capacity for high-volume orgs.