Dialpad - How to Get Your Conversation Data

A practical guide to getting your conversation data out of Dialpad - covering API access, per-call transcript extraction, Ai Moments, webhook-triggered flows, and how to route structured data into your downstream systems.

What you'll learn

  • What conversation data you can extract from Dialpad - transcripts, Ai Moments, call recordings, and metadata
  • How to access data via the Dialpad API - authentication, endpoints, and per-call transcript retrieval
  • Two extraction patterns: batch polling and webhook-triggered via call_transcription events
  • How to connect Dialpad data pipelines to Zapier, n8n, and Make
  • Advanced use cases - agent QA scoring, contact center analytics, moment trend analysis, and custom dashboards

Data

What Data You Can Extract From Dialpad

Dialpad is a cloud communications platform with built-in Ai. Every call produces a transcript, moment detections, and rich metadata that can be extracted via API - the transcript text, speaker identification, Ai-detected key phrases and action items, call recordings, and contextual call detail records.

Common fields teams care about

Full transcript text (per-call)
Speaker labels (agent vs. caller)
Ai Moments (key phrases & action items)
Call recording (secure blob URL)
Call date, time, and duration
Call direction (inbound / outbound)
Contact / caller ID and name
Department or call center queue
Disposition and call outcome
CSAT score (if post-call survey enabled)

API Access

How to Get Transcripts via the Dialpad API

Dialpad exposes call data and transcripts through a REST API at developers.dialpad.com. The workflow is: authenticate with an API key or OAuth token, list calls to get call IDs, then fetch the transcript for each call individually.

1. Authenticate

Dialpad supports two authentication methods: API key and OAuth 2.0. For automation and server-to-server integrations, an API key is simplest - generate one in the Dialpad admin console under Integrations, then pass it as a Bearer token or apikey query parameter on every request.

Authorization: Bearer <your_api_key>
Content-Type: application/json

# Or as a query parameter:
GET https://dialpad.com/api/v2/stats/calls?apikey=<your_api_key>

Your API key needs admin-level permissions to access transcripts and call recordings. For OAuth, request the calls.read and transcripts.read scopes. Contact your Dialpad admin to provision credentials if you don't have them.

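The two ways of passing the key can be sketched in Python as follows. This is a minimal illustration using only the standard library; the helper names auth_headers and url_with_apikey are hypothetical, and the base URL and header format are taken from this guide rather than verified against Dialpad's reference.

```python
from urllib.parse import urlencode

# Base URL as used in this guide (not verified against Dialpad's reference).
DIALPAD_BASE = "https://dialpad.com/api/v2"


def auth_headers(api_key: str) -> dict:
    """Build headers for Bearer-token authentication."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def url_with_apikey(path: str, api_key: str, **params) -> str:
    """Alternative: append the key as the apikey query parameter."""
    query = urlencode({**params, "apikey": api_key})
    return f"{DIALPAD_BASE}{path}?{query}"
```

The header form is usually the safer default, since query strings tend to end up in access logs.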
2. List calls to get call IDs

Use the GET /api/v2/stats/calls endpoint or the call events webhook to collect call IDs. Filter by date range with start_date and end_date, and page through results with limit and cursor. Each call object includes a call_id that you will use to fetch the transcript.

GET https://dialpad.com/api/v2/stats/calls
    ?start_date=2025-01-01T00:00:00Z
    &end_date=2025-02-01T00:00:00Z
    &limit=100
    &cursor=<next_page_cursor>

The response returns an array of call objects with call_id, date_started, duration, direction, and participant details. Keep paginating using the cursor until no more results are returned.
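A cursor loop of this kind might look like the sketch below. The HTTP request itself is injected as fetch_page, a hypothetical callable that takes the current cursor and returns the parsed JSON body. The "items" and "cursor" response keys are assumptions for illustration; check the actual response shape before relying on them.

```python
from typing import Callable, Iterator, Optional


def iter_calls(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield call objects from every page until no cursor is returned.

    fetch_page(cursor) performs the GET and returns the decoded body.
    The "items" and "cursor" keys are assumed names, not confirmed ones.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("cursor")
        if not cursor:
            break
```

Keeping the network call outside the loop makes the pagination logic easy to test with canned pages.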

3. Fetch the transcript

For each call ID, request the transcript via GET /api/v2/transcripts/{call_id}. The response contains the transcript text, speaker labels, timestamps, and any detected Ai Moments (key phrases and action items).

GET https://dialpad.com/api/v2/transcripts/5678901234

// Response includes:
{
  "call_id": "5678901234",
  "transcript": [
    {
      "speaker": "Agent - Sarah M.",
      "text": "Thanks for calling, how can I help?",
      "timestamp": 0.5
    },
    ...
  ],
  "moments": [
    { "type": "action_item", "text": "Follow up on billing" },
    { "type": "keyword", "text": "competitor mentioned" }
  ]
}

Each entry in the transcript array includes speaker, text, and timestamp. The moments array contains Dialpad Ai-detected key phrases and action items. Reassemble into plain text by concatenating entries, or preserve the structured format for per-speaker analysis.
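As a rough sketch of the reassembly step, assuming the response shape shown in the example above (a transcript array of speaker, text, and timestamp entries, plus a moments array), the two hypothetical helpers below flatten the transcript and group moments by type:

```python
from collections import defaultdict


def transcript_to_text(transcript: list) -> str:
    """Join entries into plain text, one line per utterance."""
    return "\n".join(
        f'[{entry["timestamp"]:.1f}s] {entry["speaker"]}: {entry["text"]}'
        for entry in transcript
    )


def moments_by_type(moments: list) -> dict:
    """Group moment texts under their type, e.g. action_item or keyword."""
    grouped = defaultdict(list)
    for moment in moments:
        grouped[moment["type"]].append(moment["text"])
    return dict(grouped)
```

Keeping the speaker prefix in the flattened text preserves enough structure for per-speaker analysis downstream.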

4. Handle rate limits and transcript availability

Rate limits

Dialpad enforces a rate limit of approximately 1,200 requests/minute, which works out to about 20 per second. When you receive a 429 response, retry with exponential backoff. For bulk operations, pace requests at ~15–20 per second so you stay at or under that ceiling, and persist your pagination cursor between runs.
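One way to implement the retry is sketched below. RateLimited is a hypothetical exception that your own request function would raise on a 429 response; the attempt count, base delay, and jitter are illustrative defaults, not Dialpad recommendations.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function on an HTTP 429 response."""


def with_backoff(request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request(), doubling the wait after each 429 (1s, 2s, 4s, ...)."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Small random jitter avoids synchronized retries across workers.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

The sleep function is a parameter so the retry schedule can be tested without actually waiting.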

Transcript timing

Dialpad Ai processes transcripts in near real-time during the call, but the finalized version becomes available shortly after the call ends - typically within a few minutes. For longer calls or during peak load, allow up to 15–30 minutes. Use the call_transcription webhook event to be notified when the transcript is ready.

Patterns

Key Extraction Flows

There are two primary patterns for getting transcripts out of Dialpad. The right choice depends on whether you're doing a historical backfill or need near real-time processing as calls complete.

Batch Polling (Backfill & Incremental)

Historical export or scheduled ongoing extraction

1. Define your date range. For backfills, this may be several months of historical calls. For incremental polling, use your last successful poll timestamp as the start.

2. Call GET /api/v2/stats/calls with start_date and end_date filters. Paginate through the full result set, collecting all call IDs.

3. For each call ID, fetch the transcript via GET /api/v2/transcripts/{call_id}. Pace requests to stay within the 1,200/minute rate limit.

4. Store each transcript with its call metadata (call ID, date, duration, participants, moments) in your data warehouse or object store.

5. Route stored data to your analysis pipeline - Semarize for structured evaluation, or direct to your BI tool for reporting.

6. Update your stored timestamp to the end_date of the window you just queried, not the time the run finished, so calls that complete mid-run are picked up by the next poll cycle.

Tip: Persist your pagination cursor between batches. If the process is interrupted, you can resume from where you left off. Use call_id as a deduplication key to prevent reprocessing calls you've already handled.
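The incremental variant of this flow can be sketched as a single poll cycle. Everything that touches the network or storage is injected: list_call_ids, fetch_transcript, and store are hypothetical callables, and the state dictionary stands in for whatever durable store you use. The window end is fixed when the run starts so that calls arriving mid-run fall into the next window.

```python
from datetime import datetime, timezone


def run_poll(state, list_call_ids, fetch_transcript, store, now=None):
    """Run one incremental poll cycle and advance the stored timestamp.

    state: {"last_poll": ISO timestamp, "seen": set of processed call IDs}
    list_call_ids(start, end): returns call IDs in that window (hypothetical)
    """
    window_end = (now or datetime.now(timezone.utc)).isoformat()
    for call_id in list_call_ids(state["last_poll"], window_end):
        if call_id in state["seen"]:
            continue  # call_id is the deduplication key
        store(call_id, fetch_transcript(call_id))
        state["seen"].add(call_id)
    # Advance to the end of the queried window, not the finish time.
    state["last_poll"] = window_end
    return state
```

In production the seen set would normally live in a database (for example as a unique key on call_id) rather than in memory.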

Webhook-Triggered

Near real-time on call transcription completion

1. Register a webhook endpoint in the Dialpad admin console. Subscribe to the call_transcription event type - this fires when Dialpad Ai finishes processing a call's transcript.

2. When the webhook fires, parse the event payload to extract the call_id and basic call metadata.

3. Fetch the full transcript via GET /api/v2/transcripts/{call_id} using the call ID from the event payload.

4. Route the transcript and metadata downstream - to Semarize for structured analysis, your CRM updater, or automation platform.

Note: Webhook events may be delivered more than once or missed during outages. Implement idempotency using call_id as a deduplication key, and run a daily reconciliation poll to catch any events your webhook handler missed.
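A handler with the idempotency check might look like this sketch. It assumes the event body is JSON with a top-level call_id field, which is an assumption about the payload rather than a confirmed schema; process is a hypothetical callable that fetches the transcript and routes it downstream.

```python
import json


def handle_webhook(raw_body: bytes, seen: set, process) -> str:
    """Process a call_transcription event at most once per call ID."""
    event = json.loads(raw_body)
    call_id = str(event["call_id"])  # assumed payload field
    if call_id in seen:
        return "duplicate"  # redelivery: acknowledge without reprocessing
    process(call_id)
    # Mark as seen only after success, so a failed run can be retried.
    seen.add(call_id)
    return "processed"
```

A daily reconciliation poll can share the same seen set, so anything the webhook missed is processed exactly once.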

Automation

Send Dialpad Transcripts to Automation Tools

Once you can extract transcripts from Dialpad, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from Dialpad trigger through Semarize evaluation to CRM, Slack, or database output.

Zapier - No-code automation

Dialpad → Zapier → Semarize → CRM

Detect new Dialpad calls via webhook, fetch the transcript, send it to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.

Example Zap

1. Trigger: Webhook (Dialpad) - fires on call_transcription event
   Trigger: Catch Hook
   Event: call_transcription
   Output: call_id, direction, duration

2. Webhooks by Zapier - fetch transcript from Dialpad API
   Method: GET
   URL: https://dialpad.com/api/v2/transcripts/{{call_id}}
   Auth: Bearer <api_key>
   Result: transcript returned

3. Webhooks by Zapier - POST /v1/runs (sync) to Semarize
   Method: POST
   URL: https://api.semarize.com/v1/runs
   Auth: Bearer smz_live_...
   Body: { kit_code, mode: "sync", input: { transcript } }
   Result: structured output returned

4. Formatter by Zapier - extract brick values from Semarize response
   Extract: bricks.agent_score.value
   Extract: bricks.compliance_flag.value
   Extract: bricks.resolution_status.value

5. Salesforce - Update Record - write scored signals to Contact / Case
   Object: Case
   Agent Score: {{agent_score}}
   Compliance Flag: {{compliance_flag}}
   Resolution: {{resolution_status}}

Setup steps

1. Create a new Zap. Choose "Webhooks by Zapier" as the trigger and select "Catch Hook". Copy the webhook URL and register it in Dialpad's admin console under Webhooks, subscribing to the call_transcription event.

2. Add a "Webhooks by Zapier" Action (Custom Request) to fetch the transcript from Dialpad. Set method to GET, URL to https://dialpad.com/api/v2/transcripts/{{call_id}}, and add your API key as a Bearer token.

3. Add a second "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. In the body, set kit_code to your Kit, mode to "sync", and map the transcript text into input.transcript.

4. Add a Formatter step to extract individual brick values from the Semarize JSON response - agent_score, compliance_flag, resolution_status, etc.

5. Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record.

6. Test each step end-to-end, then turn on the Zap.

Watch out for: Zapier has step data size limits that can truncate very long transcripts. For calls over 60 minutes, consider storing the transcript in cloud storage and passing a reference URL instead of inline text. Use mode: "sync" so Semarize returns results inline - Zapier doesn't natively support polling loops.
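Outside Zapier, the Semarize request body and the brick extraction step can be sketched in Python. The body fields (kit_code, mode, input.transcript) and the bricks.<name>.value path are as described in this guide; the helper names are hypothetical and the full response schema is not shown here, so treat the lookup as an assumption.

```python
import json


def build_run_request(kit_code: str, transcript: str) -> bytes:
    """Serialize the POST /v1/runs body for a synchronous run."""
    body = {
        "kit_code": kit_code,
        "mode": "sync",
        "input": {"transcript": transcript},
    }
    return json.dumps(body).encode("utf-8")


def extract_bricks(response: dict, names) -> dict:
    """Pull bricks.<name>.value for each name; missing bricks map to None."""
    bricks = response.get("bricks", {})
    return {name: bricks.get(name, {}).get("value") for name in names}
```

Returning None for a missing brick keeps one absent signal from failing the whole CRM update.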

n8n - Self-hosted workflows

Dialpad → n8n → Semarize → Database

Poll Dialpad for new calls on a schedule, fetch transcripts, send each one to Semarize for analysis, then write the structured scores and signals to your database. n8n's native loop support handles pagination and batch processing.

Example Workflow

1. Cron - Every Hour - triggers the workflow on schedule
   Mode: Every Hour
   Timezone: UTC

2. HTTP Request - List Calls - GET /api/v2/stats/calls (Dialpad)
   Method: GET
   URL: https://dialpad.com/api/v2/stats/calls
   Auth: Bearer <api_key>
   Params: start_date={{$now.minus(1, 'hour')}}&limit=100

3. For each call ID: HTTP Request - Fetch Transcript - GET /api/v2/transcripts/{call_id} (Dialpad)
   URL: .../transcripts/{{$json.call_id}}

4. Code - Reassemble Transcript - concatenate utterances into plain text
   Join: transcript[].text by speaker

5. HTTP Request - Semarize - POST /v1/runs (sync)
   URL: https://api.semarize.com/v1/runs
   Auth: Bearer smz_live_...
   Body: { kit_code, mode: "sync", input: { transcript } }
   Result: scores & signals returned

6. Postgres - Insert Row - write structured output to database
   Table: call_evaluations
   Columns: call_id, agent_score, compliance, resolution

Setup steps

1. Add a Cron node as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most teams).

2. Add an HTTP Request node to list new calls from Dialpad. Set method to GET, URL to https://dialpad.com/api/v2/stats/calls, configure Bearer auth with your API key, and set start_date to one interval ago.

3. Add a Split In Batches node to iterate over the returned call IDs. Inside the loop, add an HTTP Request node to fetch each transcript via GET /api/v2/transcripts/{call_id}.

4. Add a Code node (JavaScript) to reassemble the transcript array into a single text string. Join each entry's text, prefixed by speaker name.

5. Add another HTTP Request node to send the transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token. Set kit_code, mode to "sync", and map the transcript into input.transcript.

6. Add a Code node to extract the brick values from the Semarize response - agent_score, compliance_flag, resolution_status, evidence, confidence.

7. Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use call_id as the primary key for upserts.

8. Activate the workflow. Monitor the first few runs to verify Semarize responses are arriving and writing correctly.

Watch out for: Use call IDs as deduplication keys to prevent reprocessing. You can also use async mode with n8n's native loop - POST /v1/runs (default async), then poll GET /v1/runs/:runId with a Wait + IF loop until status is "succeeded".
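The async pattern amounts to a poll-until-done loop. The sketch below shows the same logic in Python rather than n8n nodes. get_run is a hypothetical callable wrapping GET /v1/runs/:runId; the "succeeded" status comes from this guide, while the "failed" terminal status, the interval, and the timeout are assumptions.

```python
import time


def wait_for_run(get_run, run_id, interval=2.0, timeout=120.0, sleep=time.sleep):
    """Poll a run until it succeeds, fails, or the timeout elapses."""
    waited = 0.0
    while True:
        run = get_run(run_id)
        status = run.get("status")
        if status == "succeeded":
            return run
        if status == "failed":  # assumed terminal status name
            raise RuntimeError(f"run {run_id} failed")
        if waited >= timeout:
            raise TimeoutError(f"run {run_id} not finished after {timeout}s")
        sleep(interval)
        waited += interval
```

In n8n the Wait node plays the role of sleep and the IF node plays the role of the status checks.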

Make - Visual automation with branching

Dialpad → Make → Semarize → CRM + Slack

Fetch new Dialpad transcripts on a schedule, send each to Semarize for structured analysis, then use a Router to branch the scored output - alert on escalation flags via Slack and write all signals to your CRM.

Example Scenario

1. Schedule - Every 30 min - triggers the scenario on interval
   Interval: 30 minutes

2. HTTP - List New Calls - GET /api/v2/stats/calls (Dialpad)
   Method: GET
   Auth: Bearer <api_key>
   Params: start_date={{formatDate(...)}}&limit=100

3. HTTP - Fetch Transcript - GET /api/v2/transcripts/{call_id} (per call)
   Iterator: for each call in response
   URL: .../transcripts/{{item.call_id}}

4. HTTP - Semarize - POST /v1/runs (sync)
   URL: https://api.semarize.com/v1/runs
   Auth: Bearer smz_live_...
   Body: { kit_code, mode: "sync", input: { transcript } }
   Result: structured output

5. Router - Branch on Escalation - route by Semarize output
   Branch 1: IF escalation_flag.value = true
   Branch 2: ALL (fallthrough)

   Branch 1 - Escalation detected
   Slack - Alert Channel - notify team about flagged call
   Channel: #support-escalations
   Message: Escalation on {{call_id}}, score: {{score}}

   Branch 2 - All calls
   Salesforce - Update Record - write all scored signals to Case / Contact
   Agent Score: {{agent_score}}
   Compliance: {{compliance_flag}}
   Resolution: {{resolution_status}}

Setup steps

1. Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (15-60 minutes is typical for contact center data).

2. Add an HTTP module to list new calls from Dialpad. Set method to GET, URL to https://dialpad.com/api/v2/stats/calls, configure Bearer auth, and filter by start_date since the last run.

3. Add an Iterator module to loop through each call. For each, add an HTTP module to fetch the transcript via GET /api/v2/transcripts/{call_id}.

4. Add another HTTP module to send the transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript from the previous step. Parse the response as JSON.

5. Add a Router module. Define Branch 1 with a filter: bricks.escalation_flag.value equals true. Leave Branch 2 as a fallthrough (no filter).

6. On Branch 1, add a Slack module to alert your team when an escalation is detected. Map the score, escalation flag, and call ID into the message.

7. On Branch 2, add a Salesforce module to write all brick values (agent_score, compliance_flag, resolution_status) to the Case or Contact record.

8. Set the scenario schedule and activate. Monitor the first few runs in Make's execution log.

Watch out for: Each API call counts as an operation. A scenario processing 50 calls uses ~150 operations (one list request, plus a transcript fetch, a Semarize run, and a CRM write per call). Use mode: "sync" to avoid needing a polling loop for each run.
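The Router logic from the scenario can be expressed as a small function. This is an illustrative sketch only: route_call is a hypothetical name, the brick names come from the example scenario, and the returned tuples stand in for the Slack and Salesforce modules.

```python
def route_call(call_id: str, bricks: dict) -> list:
    """Return the downstream actions for one scored call.

    Branch 1 runs only when escalation_flag is true; Branch 2 has no
    filter and therefore runs for every call.
    """
    def value(name):
        return bricks.get(name, {}).get("value")

    actions = []
    if value("escalation_flag") is True:
        message = f"Escalation on {call_id}, score: {value('agent_score')}"
        actions.append(("slack", message))
    actions.append(("crm", {
        "agent_score": value("agent_score"),
        "compliance_flag": value("compliance_flag"),
        "resolution_status": value("resolution_status"),
    }))
    return actions
```

An escalated call therefore produces both a Slack alert and a CRM write, while every other call produces only the CRM write.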

What you can build

What You Can Do With Dialpad Data in Semarize

Custom QA scoring grounded against your playbook, cross-channel analytics, deep moment trend analysis, and building your own tools on structured conversation signals.

Knowledge-Grounded Disclosure Sequence Verification

Regulatory Evidence Automation

What Semarize generates

disclosure_sequence = "correct"
consent_verbatim = true
risk_warning = "delivered"
evidence_package = "complete"

Your financial services contact centre handles regulated calls. Required disclosures must be delivered in the right order, with the right phrasing, within the required timeframe. Run every transcript through a regulatory compliance kit grounded against your compliance policy document. Semarize verifies disclosure_sequence_correct, consent_language_verbatim_match, risk_warning_delivered, and opt_out_offered — with the exact timestamp and evidence span for each. Every call generates a structured evidence package that maps directly to your regulatory filing template. Your compliance team gets weekly audit coverage of 100% of calls. Audit prep drops from 3 weeks to 3 days.

Regulatory Disclosure Audit (Kit: Financial Services Compliance v2.4)

  1. Disclosure sequence - 0:42
  2. Consent language (verbatim) - 1:18
  3. Risk warning delivered - no timestamp: "Agent skipped required risk disclosure for credit product."
  4. Opt-out offered - 3:05

4 requirements checked · 1 failed · flagged for review (Action Required)

Policy & Procedure Accuracy Audit

Knowledge-Grounded Agent Verification

What Semarize generates

return_policy_correct = false
warranty_terms_accurate = true
troubleshooting_step_skipped = "power_cycle"
policy_version_cited = "outdated"

Run a knowledge-grounded kit against your product documentation, return policies, and troubleshooting guides on every call. Semarize checks whether the return window quoted was accurate, whether the warranty terms matched the current policy, and whether troubleshooting steps followed the approved sequence. After scoring 3,000 calls, you discover that 12% of agents cite outdated return policy terms — and the specific sections they get wrong. The cost of honouring incorrect promises drops immediately once targeted retraining addresses the exact knowledge gaps the data revealed.

IVR-to-Agent Handoff Quality (IVR → Agent, first 60 seconds scored)

  • Context Acknowledged: 34% of handoffs
  • Avg Repeat Requests: 2.4x per call
  • Resolution Efficiency: 0.54 score
  • AHT Impact - Warm Handoff Training: 6m 42s before, 5m 10s after (-23% AHT when context acknowledged in first 15s)

Structured Coaching Signal Pipeline

Typed Conversation Data for Analytics

What Semarize generates

discovery_depth = 72
empathy_demonstrated = true
resolution_quality = 0.68
typed_columns = 6

A data architect builds an Airflow DAG that runs every call through Semarize. Each call lands in BigQuery with typed columns: agent_id (string), empathy_score (float), resolution_quality (float), compliance_pass (bool), discovery_depth (int), escalation_risk (float). dbt models build derived tables: agent daily scorecards, weekly compliance summaries, and customer sentiment trends. The BI team builds Looker dashboards on structured conversation data that didn’t exist 3 months ago — no data science team required, just an API and an orchestrator.
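The typed columns listed above could be modelled as a small record type before loading. The sketch below uses a standard-library dataclass; CallRow and to_row are hypothetical names, and the assumption that each value arrives as a brick with a value field follows the bricks.<name>.value convention used earlier in this guide.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CallRow:
    """One warehouse row per call, with the column types named above."""
    agent_id: str
    empathy_score: float
    resolution_quality: float
    compliance_pass: bool
    discovery_depth: int
    escalation_risk: float


def to_row(agent_id, bricks: dict) -> CallRow:
    """Coerce brick values into typed columns; a missing brick raises KeyError."""
    def value(name):
        return bricks[name]["value"]

    return CallRow(
        agent_id=str(agent_id),
        empathy_score=float(value("empathy_score")),
        resolution_quality=float(value("resolution_quality")),
        compliance_pass=bool(value("compliance_pass")),
        discovery_depth=int(value("discovery_depth")),
        escalation_risk=float(value("escalation_risk")),
    )
```

Calling asdict(row) yields a plain dictionary suitable for a JSON-based load job.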

Escalation Prediction - Active Calls (model accuracy: 89%)

  • Call #4821 - Low: frustration 0.22, confidence 0.81, esc_prob 0.12
  • Call #4823 - Medium: frustration 0.55, confidence 0.52, esc_prob 0.47
  • Call #4827 - High (Supervisor Alert): frustration 0.73, confidence 0.38, esc_prob 0.82

Threshold rule: frustration > 0.7 AND confidence < 0.4 = 82% escalation rate
Result: 31% fewer escalations with proactive supervisor alerts
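The threshold rule can be written as a one-line predicate. The sketch below applies it to the three example calls shown above; the function name is hypothetical, and the frustration and confidence scores are assumed to be available as per-call values.

```python
def needs_supervisor_alert(frustration: float, confidence: float,
                           frustration_min: float = 0.7,
                           confidence_max: float = 0.4) -> bool:
    """Apply the rule: frustration > 0.7 AND confidence < 0.4."""
    return frustration > frustration_min and confidence < confidence_max


# Example calls from the panel above: (frustration, confidence)
calls = {
    "#4821": (0.22, 0.81),
    "#4823": (0.55, 0.52),
    "#4827": (0.73, 0.38),
}
alerts = [cid for cid, (f, c) in calls.items() if needs_supervisor_alert(f, c)]
```

Only call #4827 satisfies both conditions, which matches the single Supervisor Alert in the panel.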

Custom Workforce Analytics Engine

Structured Signals Joined with WFM & CRM Data

Vibe-coded

What Semarize generates

afternoon_empathy_drop = -15%
quality_cliff_call = 35
top_10_csat_multiplier = 2.0x
recommended_cap = 35

A workforce analytics lead vibe-codes a Metabase dashboard that joins Semarize scores from every Dialpad call with WFM schedule data and CRM outcomes. The dashboard reveals that afternoon shifts have 15% lower empathy_scores than morning shifts, that agents who handle more than 40 calls/day see quality drop by 20% after call 35, and that the top 10% of agents by resolution_quality handle 30% fewer calls — but generate 2x the CSAT. Staffing models get adjusted: high-performers get premium time slots, and daily call caps prevent quality degradation.

Workforce Analytics Engine (vibe-coded in Metabase)

  • Empathy Score by Shift: Morning 82, Afternoon 70 (afternoon shifts: -15% empathy score)
  • Call Volume vs. Quality: quality cliff at call 35 (chart spans call 1 to call 40)
  • Recommendations:
      - Set daily call cap to 35 calls
      - Top 10% by quality: 2.0x CSAT, assign premium slots
      - Rotate afternoon shifts to prevent empathy fatigue

Watch out for

Common Challenges & Gotchas

These are the issues that come up most often when teams start extracting transcripts from Dialpad at scale.

Transcripts are per-call only - no bulk endpoint

Unlike some platforms that offer bulk transcript export, Dialpad requires you to fetch transcripts one call at a time using the call_id. For historical backfills, this means building a loop that lists calls, collects IDs, and fetches each transcript individually. Plan for longer backfill times on large datasets.

Recording URLs expire

Dialpad serves call recordings via time-limited secure blob URLs. If you need the audio file, you must download it promptly after receiving the URL. Storing the URL for later retrieval won't work - the link will have expired. Build download-on-receipt into your pipeline.
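Download-on-receipt can be as simple as streaming the URL to disk the moment it arrives. The sketch below uses only the standard library; the function name and destination path are hypothetical, and the opener parameter exists so the network call can be swapped out in tests.

```python
import shutil
import urllib.request
from pathlib import Path


def download_recording(url: str, dest: Path, opener=urllib.request.urlopen) -> Path:
    """Stream a recording to dest immediately, before the URL expires."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with opener(url) as response, open(dest, "wb") as out:
        # Copy in chunks so long recordings are not held fully in memory.
        shutil.copyfileobj(response, out)
    return dest
```

Store the resulting file path or object-store key alongside the call_id, rather than the original URL.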

OAuth token refresh required

If you use OAuth rather than a static API key, access tokens expire and must be refreshed periodically. Automation workflows that run on a schedule need to handle token refresh transparently, or they'll fail silently when the token expires mid-run.
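One common way to make refresh transparent is a small token cache that renews shortly before expiry. In this sketch, refresh is a hypothetical callable that performs the OAuth refresh request and returns a (token, expires_in_seconds) pair; the 60-second safety margin is an arbitrary choice.

```python
import time


class TokenCache:
    """Return a valid access token, refreshing it shortly before expiry."""

    def __init__(self, refresh, skew=60.0, clock=time.monotonic):
        self._refresh = refresh      # returns (token, expires_in_seconds)
        self._skew = skew            # refresh this many seconds early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._expires_at - self._skew:
            token, expires_in = self._refresh()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token
```

Calling get() before every request keeps long-running batch jobs from failing when the token expires mid-run.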

Webhook delivery is not guaranteed exactly-once

Dialpad webhooks (including call_transcription events) may deliver the same event more than once, or miss delivery during outages. Implement idempotency checks using the call ID as a deduplication key, and run a periodic reconciliation poll to catch any missed events.
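The reconciliation poll reduces to a set difference between the call IDs the API lists for a period and the ones your pipeline has already processed. A minimal sketch, with a hypothetical function name:

```python
def missed_call_ids(listed, processed) -> list:
    """Return call IDs the API lists that the webhook path never handled."""
    return sorted(set(listed) - set(processed))
```

Anything returned here can be pushed through the same handler the webhook uses, so the deduplication logic stays in one place.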

Ai features require specific plan tiers

Dialpad Ai features - transcription, Ai Moments, Ai Scorecards - are not available on all plan tiers. If your account doesn't include Dialpad Ai, transcript and moments endpoints will return empty or be unavailable. Confirm your plan includes these features before building your extraction pipeline.

Contact center vs. business line data separation

Dialpad separates data between its UCaaS (business phone) and CCaaS (contact center) products. API calls for contact center data may use different endpoints or require separate credentials from business line calls. Make sure your integration targets the correct product line for the data you need.
