Deepgram - How to Get Your Transcript Data

A practical guide to getting transcripts from Deepgram - covering API authentication, pre-recorded and real-time transcription, callback-triggered flows, and how to route structured transcript data into your downstream systems.

What you'll learn

  • What transcript data you can extract from Deepgram - full text, word timestamps, speaker labels, confidence scores, and summaries
  • How to access data via the Deepgram API - authentication, pre-recorded and real-time endpoints
  • Three extraction patterns: batch processing, scheduled polling, and callback-triggered
  • How to connect Deepgram data pipelines to Zapier, n8n, and Make
  • Advanced use cases - real-time analytics, contact center QA, multi-language intelligence, and custom pipelines

Data

What Data You Can Extract From Deepgram

Deepgram is a speech-to-text API - you provide audio, it returns transcripts. Every transcription request produces structured output that goes well beyond raw text, including timing, confidence, speaker identification, and optional intelligence features like summarization and topic detection.

Common fields teams care about

Full transcript text
Word-level timestamps and confidence scores
Speaker diarization labels
Paragraph and sentence segmentation
Smart formatting (numbers, dates, currency)
Punctuation and capitalization
Language detection
Summarization (when enabled)
Topic detection (when enabled)
Audio duration and channel info

API Access

How to Get Transcripts via the Deepgram API

Deepgram offers two primary transcription modes: pre-recorded (batch) and real-time (streaming). Both use the same /v1/listen endpoint with different protocols - REST for pre-recorded, WebSocket for real-time.

1

Authenticate

Deepgram uses API key authentication. Generate a key in the Deepgram console under your project settings, then pass it in the Authorization header using the Token scheme on every request.

Authorization: Token <your-deepgram-api-key>
Content-Type: audio/wav
API keys can be scoped to specific projects and permissions. Use a dedicated key for each integration and rotate keys periodically. Never expose keys in client-side code.
2

Transcribe pre-recorded audio

Send audio to the POST /v1/listen endpoint. You can upload a file directly or provide a URL to a hosted audio file. Add query parameters to enable features like diarization, smart formatting, and summarization.

POST https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true&diarize=true

# Option A: Upload audio file directly
Content-Type: audio/wav
Body: <binary audio data>

# Option B: Provide a URL
Content-Type: application/json
Body: { "url": "https://storage.example.com/call-recording.wav" }

The response returns a results object containing channels[].alternatives[] with the transcript text, confidence, word-level timing, and speaker labels (if diarization is enabled). Max file size is 2GB.
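
As a sketch of working with that response shape, the snippet below pulls the transcript text, confidence, and per-word timing out of the results object described above. The sample payload is illustrative, not a real API response:

```javascript
// Extract transcript text and word-level detail from a Deepgram
// pre-recorded response (results.channels[].alternatives[] shape,
// as described above). With diarize=true, each word also carries
// a speaker index.
function extractTranscript(response) {
  const alt = response.results.channels[0].alternatives[0];
  return {
    text: alt.transcript,
    confidence: alt.confidence,
    words: (alt.words || []).map((w) => ({
      word: w.word,
      start: w.start,
      end: w.end,
      speaker: w.speaker,
    })),
  };
}

// Minimal illustrative payload
const sample = {
  results: {
    channels: [{
      alternatives: [{
        transcript: "thanks for calling",
        confidence: 0.98,
        words: [
          { word: "thanks", start: 0.0, end: 0.4, speaker: 0 },
          { word: "for", start: 0.4, end: 0.55, speaker: 0 },
          { word: "calling", start: 0.55, end: 1.0, speaker: 0 },
        ],
      }],
    }],
  },
};

console.log(extractTranscript(sample).text); // "thanks for calling"
```

For multi-channel audio, iterate over results.channels rather than reading index 0 only.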

3

Stream real-time audio

For live transcription, open a WebSocket connection to wss://api.deepgram.com/v1/listen. Stream audio chunks and receive transcript results as they're generated. Deepgram returns both interim (partial) and final results.

// WebSocket connection
const ws = new WebSocket(
  "wss://api.deepgram.com/v1/listen?model=nova-2&smart_format=true",
  ["token", "<your-api-key>"]
)

ws.onopen = () => {
  // Stream audio chunks as binary data
  audioStream.on("data", (chunk) => ws.send(chunk))
}

ws.onmessage = (event) => {
  const result = JSON.parse(event.data)
  const transcript = result.channel.alternatives[0].transcript
  // Process each transcript segment in real time
}

Real-time results include is_final to distinguish interim from final transcripts, speech_final to detect end of utterance, and word-level timing data. Use SDKs (Python, JavaScript, .NET, Go) for managed WebSocket handling.
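
A minimal sketch of consuming those flags: keep only segments where is_final is true, and flush the accumulated text when speech_final marks the end of an utterance. The message objects here are illustrative stand-ins for what arrives in ws.onmessage:

```javascript
// Accumulate final transcript segments from a stream of real-time
// messages, using is_final and speech_final as described above.
function makeAccumulator() {
  const finals = [];
  return {
    handle(msg) {
      const text = msg.channel.alternatives[0].transcript;
      if (!text) return null;              // empty interim results are common
      if (msg.is_final) finals.push(text); // keep only finalized segments
      // speech_final marks end of utterance - a natural flush point
      return msg.speech_final ? finals.join(" ") : null;
    },
    finals,
  };
}

const acc = makeAccumulator();
const messages = [
  { is_final: false, speech_final: false, channel: { alternatives: [{ transcript: "hello" }] } },
  { is_final: true,  speech_final: false, channel: { alternatives: [{ transcript: "hello there" }] } },
  { is_final: true,  speech_final: true,  channel: { alternatives: [{ transcript: "how are you" }] } },
];

let utterance = null;
for (const m of messages) {
  const out = acc.handle(m);
  if (out) utterance = out;
}
console.log(utterance); // "hello there how are you"
```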

4

Key features and parameters

Transcription features

Enable features via query parameters: smart_format=true for intelligent formatting, diarize=true for speaker labels, summarize=v2 for AI summaries, and detect_topics=true for topic detection.

Rate limits & concurrency

Deepgram enforces concurrency limits that vary by plan. When you hit the limit, requests return 429. For bulk operations, queue requests and limit parallelism to 5-10 concurrent calls. Real-time WebSocket connections also count against concurrency limits.
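
One way to back off after a 429, as a sketch: double the delay each attempt up to a cap, and add random jitter so parallel workers don't retry in lockstep. The base and cap values are illustrative, not Deepgram-prescribed:

```javascript
// Exponential backoff with "equal jitter" for retrying 429 responses.
// Delay grows as base * 2^attempt, capped, with the upper half randomized.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  return exp / 2 + Math.random() * (exp / 2);
}

// attempt 0 -> 500-1000ms, attempt 3 -> 4-8s, attempt 10 -> capped at 15-30s
console.log(Math.round(backoffDelayMs(0)));
```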

Patterns

Key Extraction Flows

There are three practical patterns for getting transcripts from Deepgram into your analysis pipeline. The right choice depends on whether you're processing a backlog of recordings, running ongoing transcription, or need real-time results.

Batch Processing (Historical Backfill)

One-off transcription of existing recordings

1

Gather your audio files - from cloud storage, a call recording platform, or local archives. Supported formats include WAV, MP3, FLAC, OGG, and WebM

2

For each file, POST to /v1/listen with the audio data or a URL reference. Add parameters: model=nova-2, smart_format=true, diarize=true

3

Control concurrency - queue requests and limit to 5-10 parallel calls depending on your plan tier. Handle 429 responses with exponential backoff

4

Store each transcript response with its source metadata (file name, recording date, participants) in your data warehouse or object store

5

Once the backfill completes, run your Semarize analysis pipeline against the stored transcripts in bulk

Tip: Use Deepgram's callback feature for large batches. Instead of waiting for each response synchronously, provide a callback URL and Deepgram will POST results when processing completes. This lets you fire-and-forget large volumes.
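
The queue in step 3 can be sketched as a bounded-parallelism runner. The worker function stands in for your Deepgram call (a hypothetical helper, not shown); the runner itself is generic:

```javascript
// Run tasks with bounded parallelism: `limit` lanes each pull the
// next item until the list is exhausted. Results keep input order.
async function runWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function lane() {
    while (next < items.length) {
      const i = next++;           // safe: JS is single-threaded
      results[i] = await worker(items[i]);
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return results;
}
```

Call it with a limit of 5-10 per the plan-tier guidance above, e.g. `runWithConcurrency(fileUrls, 5, transcribeFile)`, wrapping the worker with the backoff handling described earlier.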

Scheduled Polling

Ongoing transcription on a schedule

1

Set a cron job or scheduled trigger that checks your recording source (S3 bucket, call platform API, or file system) for new audio files

2

For each new file, submit it to Deepgram's /v1/listen endpoint. Use the file URL approach if recordings are already in cloud storage

3

Track which files have been processed using a database or state file. Use the file path or recording ID as a deduplication key

4

Route each transcript and its metadata to your downstream pipeline - Semarize for analysis, your warehouse, or automation platform

5

Update your processed-files log and set the next poll cycle

Tip: If your recordings land in S3 or GCS, use bucket event notifications to trigger transcription immediately instead of polling on a timer. This reduces latency and eliminates empty poll cycles.
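
The deduplication in step 3 can be sketched as a claim-and-mark operation. In production the seen-set would live in a database keyed by file path or recording ID; an in-memory Set stands in here:

```javascript
// Track processed recordings so a poll cycle never re-submits a file.
function makeDeduper(seen = new Set()) {
  return {
    // Returns only the IDs not yet processed, and marks them as seen.
    claimNew(fileIds) {
      const fresh = fileIds.filter((id) => !seen.has(id));
      fresh.forEach((id) => seen.add(id));
      return fresh;
    },
  };
}

const dedupe = makeDeduper();
console.log(dedupe.claimNew(["rec-1", "rec-2"])); // ["rec-1", "rec-2"]
console.log(dedupe.claimNew(["rec-2", "rec-3"])); // ["rec-3"]
```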

Callback-Triggered

Async transcription with webhook delivery

1

Submit audio to /v1/listen with a callback query parameter pointing to your webhook endpoint: callback=https://your-server.com/deepgram-callback

2

Deepgram processes the audio asynchronously and POSTs the full transcript result to your callback URL when complete

3

Your webhook handler receives the transcript JSON, extracts the text and metadata, and routes it downstream

4

Send the transcript to Semarize for structured analysis, then write results to your CRM, database, or notification channels

Note: Your callback URL must be publicly accessible and return a 200 status code. Deepgram does not retry failed callback deliveries indefinitely - implement your own error handling and consider using a queue (SQS, Cloud Tasks) as a buffer between Deepgram and your processing pipeline.
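
A sketch of the extraction step inside a webhook handler: the callback payload carries the same results shape as a synchronous response, and metadata.request_id ties it back to the original submission. Routing downstream is left out; the sample payload is illustrative:

```javascript
// Pull the fields your pipeline needs out of a Deepgram callback payload.
function handleCallback(payload) {
  const transcript =
    payload.results.channels[0].alternatives[0].transcript;
  return {
    requestId: payload.metadata?.request_id, // correlate with the submission
    transcript,
  };
}

const payload = {
  metadata: { request_id: "abc-123" },
  results: { channels: [{ alternatives: [{ transcript: "hi there" }] }] },
};
console.log(handleCallback(payload)); // { requestId: "abc-123", transcript: "hi there" }
```

Acknowledge with a 200 immediately and enqueue the extracted record, rather than doing downstream work inline, so slow processing never looks like a failed delivery to Deepgram.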

Automation

Send Deepgram Transcripts to Automation Tools

Once you can extract transcripts from Deepgram, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from audio input through Deepgram transcription to Semarize evaluation and final output.

Zapier - No-code automation

Audio → Deepgram → Zapier → Semarize → CRM

Detect new audio files in cloud storage, send them to Deepgram for transcription, route the transcript to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.

Example Zap
Trigger: New File in S3
Fires when a new recording appears
App: Amazon S3
Event: New Object
Output: file_url, file_name
Webhooks by Zapier
Transcribe via Deepgram API
Method: POST
URL: https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true&diarize=true
Auth: Token <api-key>
Body: { "url": "{{file_url}}" }
Transcript returned
Webhooks by Zapier
POST /v1/runs (sync) to Semarize
Method: POST
URL: https://api.semarize.com/v1/runs
Auth: Bearer smz_live_...
Body: { kit_code, mode: "sync", input: { transcript } }
Structured output returned
Formatter by Zapier
Extract brick values from Semarize response
Extract: bricks.overall_score.value
Extract: bricks.sentiment.value
Extract: bricks.action_items.value
Salesforce - Update Record
Write scored signals to Contact/Case
Object: Case
AI Score: {{overall_score}}
Sentiment: {{sentiment}}
Action Items: {{action_items}}

Setup steps

1

Create a new Zap. Choose your storage trigger (S3, Google Drive, Dropbox, etc.) and configure it to fire when new audio files appear in your recordings folder.

2

Add a "Webhooks by Zapier" Action (Custom Request) to transcribe the audio via Deepgram. Set method to POST, URL to https://api.deepgram.com/v1/listen with your desired parameters, add your API key as Authorization: Token <key>, and pass the file URL in the JSON body.

3

Add a second "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. In the body, set kit_code to your Kit, mode to "sync", and map the Deepgram transcript text into input.transcript.

4

Add a Formatter step to extract individual brick values from the Semarize JSON response - overall_score, sentiment, action_items, etc.

5

Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record.

6

Test each step end-to-end, then turn on the Zap.

Watch out for: Zapier has step data size limits that can truncate very long transcripts. For recordings over 60 minutes, consider extracting just the transcript text (not the full Deepgram response with word-level timing). Use mode: "sync" so Semarize returns results inline - Zapier doesn't natively support polling loops.
Learn more about Zapier automation
n8n - Self-hosted workflows

Audio → Deepgram → n8n → Semarize → Database

Poll for new recordings on a schedule, transcribe each via Deepgram, send the transcript to Semarize for analysis, then write the structured scores and signals to your database. n8n's native loop support handles batch processing and error recovery.

Example Workflow
Cron - Every Hour
Triggers the workflow on schedule
Mode: Every Hour
Timezone: UTC
HTTP Request - List New Files
Check storage for new recordings
Method: GET
URL: https://storage.example.com/api/recordings
Filter: created_after={{$now.minus(1, 'hour')}}
For each audio file
HTTP Request - Deepgram
POST /v1/listen (transcribe)
URL: https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true&diarize=true
Auth: Token <api-key>
Body: { "url": "{{$json.file_url}}" }
Code - Extract Transcript
Pull transcript text from Deepgram response
Extract: results.channels[0].alternatives[0].transcript
HTTP Request - Semarize
POST /v1/runs (sync)
URL: https://api.semarize.com/v1/runs
Auth: Bearer smz_live_...
Body: { kit_code, mode: "sync", input: { transcript } }
Scores & signals returned
Postgres - Insert Row
Write structured output to database
Table: transcript_evaluations
Columns: file_id, score, sentiment, action_items

Setup steps

1

Add a Cron node as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most teams).

2

Add an HTTP Request node to check your recording source for new files. This could be an S3 listing, a call platform API, or any storage system with an API.

3

Add a Split In Batches node to iterate over new files. Inside the loop, add an HTTP Request node to transcribe each file via Deepgram's POST /v1/listen endpoint with your desired parameters.

4

Add a Code node (JavaScript) to extract the transcript text from Deepgram's response. Pull results.channels[0].alternatives[0].transcript for the full text.

5

Add another HTTP Request node to send the transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token. Set kit_code, mode to "sync", and map the transcript into input.transcript.

6

Add a Code node to extract the brick values from the Semarize response - overall_score, sentiment, action_items, evidence, confidence.

7

Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use the file ID or recording ID as the primary key for upserts.

8

Activate the workflow. Monitor the first few runs to verify Deepgram and Semarize responses are arriving and writing correctly.

Watch out for: Use file IDs or recording IDs as deduplication keys to prevent reprocessing. For large files, Deepgram may take longer to respond - increase your HTTP timeout. You can also use async mode with n8n's native loop - POST /v1/runs (default async), then poll GET /v1/runs/:runId with a Wait + IF loop until status is "succeeded".
Learn more about n8n automation
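
The async-mode Wait + IF loop can be sketched as a polling function. The fetcher is injected so the loop is testable; in n8n each iteration maps to a GET /v1/runs/:runId request. The status value "succeeded" comes from the docs above; other status names ("processing", "failed") are assumptions here:

```javascript
// Poll an async Semarize run until it reaches a terminal status.
async function pollRun(runId, fetchRun, { maxAttempts = 30, sleepMs = 2000 } = {}) {
  for (let i = 0; i < maxAttempts; i++) {
    const run = await fetchRun(runId);           // GET /v1/runs/:runId
    if (run.status === "succeeded") return run;  // documented terminal status
    if (run.status === "failed") throw new Error(`Run ${runId} failed`);
    await new Promise((r) => setTimeout(r, sleepMs)); // Wait node equivalent
  }
  throw new Error(`Run ${runId} did not finish in ${maxAttempts} attempts`);
}
```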
Make - Visual automation with branching

Audio → Deepgram → Make → Semarize → CRM + Slack

Watch for new recordings, transcribe each via Deepgram, send the transcript to Semarize for structured analysis, then use a Router to branch the scored output - alert on risk flags via Slack and write all signals to your CRM.

Example Scenario
Schedule - Every 30 min
Triggers the scenario on interval
Interval: 30 minutes
HTTP - List New Recordings
Check source for new audio files
Method: GET
URL: storage API or call platform API
Filter: created_after={{formatDate(...)}}
HTTP - Deepgram Transcribe
POST /v1/listen (per file)
URL: https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true
Auth: Token <api-key>
Body: { "url": "{{item.file_url}}" }
HTTP - Semarize
POST /v1/runs (sync)
URL: https://api.semarize.com/v1/runs
Auth: Bearer smz_live_...
Body: { kit_code, mode: "sync", input: { transcript } }
Structured output
Router - Branch on Sentiment
Route by Semarize output
Branch 1: IF sentiment.value < 0.3
Branch 2: ALL (fallthrough)
Branch 1 - Negative sentiment
Slack - Alert Channel
Notify team about flagged call
Channel: #support-alerts
Message: Negative sentiment on {{file_id}}, score: {{score}}
Branch 2 - All calls
Salesforce - Update Record
Write all scored signals to Case/Contact
AI Score: {{overall_score}}
Sentiment: {{sentiment}}
Action Items: {{action_items}}

Setup steps

1

Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (15-60 minutes is typical).

2

Add an HTTP module to list new recordings from your source. This could be an S3 bucket listing, a call recording platform API, or any storage system.

3

Add an Iterator module to loop through each new file. For each, add an HTTP module to transcribe via Deepgram's POST /v1/listen endpoint with your audio URL and desired parameters.

4

Add another HTTP module to send the transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript from the Deepgram response. Parse the response as JSON.

5

Add a Router module. Define Branch 1 with a filter: bricks.sentiment.value less than 0.3 (negative sentiment threshold). Leave Branch 2 as a fallthrough (no filter).

6

On Branch 1, add a Slack module to alert your team when negative sentiment is detected. Map the score, sentiment, and recording ID into the message.

7

On Branch 2, add a Salesforce module to write all brick values (score, sentiment, action_items) to the Case or Contact record.

8

Set the scenario schedule and activate. Monitor the first few runs in Make's execution log.

Watch out for: Each API call counts as an operation. A scenario processing 50 recordings uses roughly 150 operations (a Deepgram call, a Semarize call, and a CRM or Slack write per file, plus the listing call). Use mode: "sync" to avoid needing a polling loop for each run. For large audio files, increase the HTTP module timeout.
Learn more about Make automation

What you can build

What You Can Do With Deepgram Data in Semarize

Deepgram gives you the transcript. Semarize gives you structure. Here's what becomes possible when raw audio becomes scored, categorized, actionable intelligence.

Knowledge-Grounded Bot Response Verification

AI Agent Factual Accuracy Scoring

What Semarize generates

response_accuracy = 0.91
hallucination_detected = true
policy_fabricated = "90-day_guarantee"
escalation_appropriate = true

Your product runs a voice bot powered by Deepgram’s streaming API. Every interaction produces a transcript — but are the bot’s responses factually correct? Run a knowledge-grounded kit against your policy documents and product database on every bot conversation. Semarize verifies each response against the source of truth: did the bot quote the correct return policy? Did it fabricate a “90-day satisfaction guarantee” that doesn’t exist? Did it accurately describe product specifications? A weekly quality report shows that the bot hallucinated pricing information in 4.2% of calls. The AI team retrains the pricing module guided by specific, evidence-backed accuracy signals.

Learn more about AI Evaluation
Voice Bot Quality Report - Weekly (1,247 conversations scored)

Conversation Type | Accuracy | Hallucination | Escalation | Status
Account Inquiry   | 96%      | 1.1%          | 98%        | Pass
Billing           | 84%      | 4.2%          | 71%        | Fail
Technical         | 91%      | 2.8%          | 89%        | Pass

Billing: hallucination rate 4.2% on pricing info. Retraining recommended for pricing module.

Same-Week Coaching Signal Detection

Structured Skill Metrics for Immediate Action

What Semarize generates

objection_handling_score = 0.41
discovery_technique = "surface_level"
calls_since_training = 6
regression_detected = true

Your sales team records calls through a Deepgram-powered dialler. Build an automated coaching signal system: every transcript runs through a coaching signal kit. Semarize evaluates discovery_depth, objection_handling_quality, value_articulation, and competitive_positioning as typed scores. When a rep’s objection handling drops below threshold on 3 consecutive calls, the structured output triggers a Slack alert to their manager the same week — with the specific coaching area, the score, and a link to the evidence span. The time from skill regression to coaching intervention shrinks from 6 weeks to 3 days.

Learn more about Sales Coaching
#coaching-alerts
Semarize Bot (APP) · 2:34 PM

Coaching alert for Marcus T. — skill regression detected on 3 consecutive calls

Objection Handling Regression

Skill: objection_handling

Score: 34 / 100 (threshold: 60)

Consecutive: 3 calls below threshold

Rep: Marcus T. — @jennifer.m (manager)

Evidence span

“When the prospect raised budget concerns, the rep moved to discounting without exploring the objection further…”

Custom Regulatory Framework Scoring

Your Rules, Your Timeline, Every Language

What Semarize generates

consent_obtained = 0.88
disclosure_delivered = true
prohibited_language = false
regulatory_version = "v2026-Q1"

Your contact centre handles calls in English, Spanish, and French. Compliance requirements apply equally across all languages. Deepgram transcribes all three. Semarize evaluates every transcript — regardless of language — against your compliance kit, grounded against your regulatory policy document: consent_obtained, required_disclosure_delivered, prohibited_language_absent, and data_handling_statement_given. When regulations change, you update your Semarize kit the same day — scoring stays current on your timeline. A quarterly audit reveals that Spanish-language calls have a 12% lower consent_obtained rate — a gap that becomes visible and actionable through structured cross-language scoring.

Learn more about QA & Compliance
Multilingual Compliance Dashboard (100% of calls scored)

Language | Consent | Disclosure | No Prohibited | Data Handling
English  | 97%     | 95%        | 99%           | 93%
Spanish  | 85%     | 91%        | 98%           | 88%
French   | 94%     | 92%        | 97%           | 90%

Spanish consent_obtained rate is 12% lower than English - a gap invisible before cross-language scoring.

Custom Conversation Intelligence Pipeline

Your Signals as SQL Columns

Vibe-coded

What Semarize generates

daily_volume = 500+
custom_columns = 8 typed
pipeline_latency = "< 3min"
storage = "BigQuery"

A data architect vibe-codes an Airflow DAG that processes every call through Deepgram for transcription, then Semarize for structured evaluation. The DAG handles 500+ calls per day. Each call lands in BigQuery with YOUR custom typed columns — fields that don’t exist in any platform’s native output: playbook_adherence (float), claim_accuracy (bool), coaching_priority (varchar), qualification_score (float), compliance_pass (bool). dbt models build derived tables: agent daily scorecards, weekly accuracy reports, and coaching signal dashboards. The BI team builds Looker dashboards on conversation intelligence that’s fully custom, fully owned, and fully queryable.

Learn more about Data Science
Data Architecture Pipeline (vibe-coded)

Audio (.wav / .mp3) → Deepgram (JSON transcript) → Semarize (Typed Bricks) → BigQuery (typed columns) → Looker (dashboards)

Daily throughput: 500+ calls / day

BigQuery output schema:
agent_id : string
call_duration : int
sentiment_score : float
compliance_pass : bool
resolution_quality : score
escalation_risk : float

Airflow DAG · dbt models · < 3 min latency · no data science team required

Watch out for

Common Challenges & Gotchas

These are the issues that come up most often when teams start using Deepgram for transcription at scale.

Audio quality affects accuracy

Deepgram's transcription quality depends heavily on the input audio. Background noise, low bitrate recordings, heavy accents, and speaker overlap all reduce accuracy. Pre-process audio where possible and use the appropriate model tier for your use case.

WebSocket connection management

Real-time streaming via WebSocket requires careful connection management. Handle disconnections, implement reconnection logic, and manage audio buffering. Dropped connections during a live call mean lost transcript segments.

Concurrency limits on bulk processing

Sending hundreds of audio files simultaneously will hit concurrency limits. Implement a queue with controlled parallelism - typically 5-10 concurrent requests depending on your plan tier. Monitor 429 responses and back off accordingly.

Large file processing time

Files approaching the 2GB limit can take significant time to process. For very long recordings, consider splitting audio into smaller segments before sending to Deepgram, then reassembling the transcripts downstream.

Speaker diarization accuracy

Diarization works well for 2-3 speakers with clear turn-taking, but accuracy drops with more participants, crosstalk, or poor audio separation. Validate speaker labels before using them for per-speaker analysis or agent scoring.

Callback URL reliability

When using Deepgram's callback feature, your endpoint must be publicly accessible and handle retries. If your callback URL is down when Deepgram tries to deliver results, you may need to re-submit the transcription request.

Cost management at scale

Deepgram charges per audio hour. Real-time streaming and pre-recorded transcription have different pricing. At high volumes, costs can grow quickly. Monitor usage, choose the right model tier for each use case, and avoid re-transcribing audio unnecessarily.
