Get Your Data
Deepgram - How to Get Your Transcript Data
A practical guide to getting transcripts from Deepgram - covering API authentication, pre-recorded and real-time transcription, callback-triggered flows, and how to route structured transcript data into your downstream systems.
What you'll learn
- What transcript data you can extract from Deepgram - full text, word timestamps, speaker labels, confidence scores, and summaries
- How to access data via the Deepgram API - authentication, pre-recorded and real-time endpoints
- Three extraction patterns: batch processing, scheduled polling, and callback-triggered
- How to connect Deepgram data pipelines to Zapier, n8n, and Make
- Advanced use cases - real-time analytics, contact center QA, multi-language intelligence, and custom pipelines
Data
What Data You Can Extract From Deepgram
Deepgram is a speech-to-text API - you provide audio, it returns transcripts. Every transcription request produces structured output that goes well beyond raw text, including timing, confidence, speaker identification, and optional intelligence features like summarization and topic detection.
Common fields teams care about
- transcript - the full recognized text for each channel and alternative
- words[] - word-level entries with start/end timestamps and per-word confidence
- speaker - speaker labels attached to words when diarization is enabled
- confidence - an overall confidence score for each transcript alternative
- summary and topics - optional intelligence output when those features are enabled
API Access
How to Get Transcripts via the Deepgram API
Deepgram offers two primary transcription modes: pre-recorded (batch) and real-time (streaming). Both use the same /v1/listen endpoint with different protocols - REST for pre-recorded, WebSocket for real-time.
Authenticate
Deepgram uses API key authentication. Generate a key in the Deepgram console under your project settings. Pass it as a Token in the Authorization header on every request.
Authorization: Token <your-deepgram-api-key>
Content-Type: audio/wav
Transcribe pre-recorded audio
Send audio to the POST /v1/listen endpoint. You can upload a file directly or provide a URL to a hosted audio file. Add query parameters to enable features like diarization, smart formatting, and summarization.
POST https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true&diarize=true
# Option A: Upload audio file directly
Content-Type: audio/wav
Body: <binary audio data>
# Option B: Provide a URL
Content-Type: application/json
Body: { "url": "https://storage.example.com/call-recording.wav" }
The response returns a results object containing channels[].alternatives[] with the transcript text, confidence, word-level timing, and speaker labels (if diarization is enabled). Max file size is 2GB.
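Extracting the best transcript from that response can be sketched as follows; the sample payload below is abbreviated and hand-written (not real API output), but the field path matches the response shape described above.

```javascript
// Abbreviated, hand-written sample of a pre-recorded /v1/listen response
const response = {
  results: {
    channels: [
      {
        alternatives: [
          {
            transcript: "thanks for calling how can i help",
            confidence: 0.98,
            words: [
              { word: "thanks", start: 0.08, end: 0.32, confidence: 0.99, speaker: 0 },
            ],
          },
        ],
      },
    ],
  },
};

// The full transcript lives at channels[0].alternatives[0]
const best = response.results.channels[0].alternatives[0];
console.log(best.transcript);  // full text for this channel
console.log(best.confidence);  // overall confidence for this alternative
```

Word-level entries carry their own timestamps and confidence, so per-word or per-speaker processing works off the same object.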
Stream real-time audio
For live transcription, open a WebSocket connection to wss://api.deepgram.com/v1/listen. Stream audio chunks and receive transcript results as they're generated. Deepgram returns both interim (partial) and final results.
// WebSocket connection (browser-style; the API key is passed as a subprotocol)
const ws = new WebSocket(
  "wss://api.deepgram.com/v1/listen?model=nova-2&smart_format=true",
  ["token", "<your-api-key>"]
)

ws.onopen = () => {
  // Stream audio chunks as binary data
  audioStream.on("data", (chunk) => ws.send(chunk))
}

ws.onmessage = (event) => {
  const result = JSON.parse(event.data)
  const transcript = result.channel.alternatives[0].transcript
  // Process each transcript segment in real time
}
Real-time results include is_final to distinguish interim from final transcripts, speech_final to detect end of utterance, and word-level timing data. Use SDKs (Python, JavaScript, .NET, Go) for managed WebSocket handling.
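Consuming those messages while keeping only finalized segments can be sketched as below; the sample messages are hand-written, following the result shape described above.

```javascript
// Accumulate only finalized segments from a stream of Deepgram result messages.
const finalSegments = [];

function handleResult(result) {
  const transcript = result.channel.alternatives[0].transcript;
  if (result.is_final && transcript.length > 0) {
    // Interim results are superseded by the final for the same audio span,
    // so only finals are kept.
    finalSegments.push(transcript);
  }
}

// Simulated message sequence: two interims superseded by one final
handleResult({ is_final: false, channel: { alternatives: [{ transcript: "hello" }] } });
handleResult({ is_final: false, channel: { alternatives: [{ transcript: "hello wor" }] } });
handleResult({ is_final: true, channel: { alternatives: [{ transcript: "hello world" }] } });

console.log(finalSegments.join(" ")); // "hello world"
```

For live captions you would render interim results immediately and replace them when the matching final arrives; for downstream analysis, the accumulated finals are usually all you need.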
Key features and parameters
Transcription features
Enable features via query parameters: smart_format=true for intelligent formatting, diarize=true for speaker labels, summarize=v2 for AI summaries, and detect_topics=true for topic detection.
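One convenient way to assemble those query parameters is the standard URLSearchParams API; this is a plain sketch using the parameter names listed above.

```javascript
// Build the /v1/listen request URL with feature flags
const params = new URLSearchParams({
  model: "nova-2",
  smart_format: "true",
  diarize: "true",
  summarize: "v2",
});
const url = `https://api.deepgram.com/v1/listen?${params}`;
// → https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true&diarize=true&summarize=v2
```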
Rate limits & concurrency
Deepgram enforces concurrency limits that vary by plan. When you hit the limit, requests return 429. For bulk operations, queue requests and limit parallelism to 5-10 concurrent calls. Real-time WebSocket connections also count against concurrency limits.
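A throttled queue can be sketched as below; the pool size and backoff numbers are illustrative, not official Deepgram guidance.

```javascript
// Minimal client-side concurrency pool: run `jobs` (async functions)
// with at most `limit` in flight at once.
async function runPool(jobs, limit) {
  const results = new Array(jobs.length);
  let next = 0;
  async function worker() {
    while (next < jobs.length) {
      const i = next++; // claim the next job index (single-threaded, so safe)
      results[i] = await jobs[i]();
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, jobs.length) }, worker));
  return results;
}

// Exponential backoff for 429 responses: wait 1s, 2s, 4s, ... capped at 30s
function backoffMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

In practice each job would POST one file to /v1/listen, and on a 429 response sleep for backoffMs(attempt) before retrying.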
Patterns
Key Extraction Flows
There are three practical patterns for getting transcripts from Deepgram into your analysis pipeline. The right choice depends on whether you're processing a backlog of recordings, running ongoing transcription, or need real-time results.
Batch Processing (Historical Backfill)
One-off transcription of existing recordings
Gather your audio files - from cloud storage, a call recording platform, or local archives. Supported formats include WAV, MP3, FLAC, OGG, and WebM
For each file, POST to /v1/listen with the audio data or a URL reference. Add parameters: model=nova-2, smart_format=true, diarize=true
Control concurrency - queue requests and limit to 5-10 parallel calls depending on your plan tier. Handle 429 responses with exponential backoff
Store each transcript response with its source metadata (file name, recording date, participants) in your data warehouse or object store
Once the backfill completes, run your Semarize analysis pipeline against the stored transcripts in bulk
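The backfill loop above can be sketched as follows; the transcribe function is a stand-in for the real /v1/listen request, injected so the control flow is clear on its own.

```javascript
// Backfill sketch over a list of audio URLs. `transcribe` is a placeholder
// for the actual POST to /v1/listen with the URL in the JSON body.
async function backfill(urls, transcribe, concurrency = 5) {
  const stored = [];
  for (let i = 0; i < urls.length; i += concurrency) {
    const batch = urls.slice(i, i + concurrency);
    // Each wave runs at most `concurrency` requests in parallel
    const transcripts = await Promise.all(batch.map((u) => transcribe(u)));
    batch.forEach((url, j) => stored.push({ url, transcript: transcripts[j] }));
  }
  return stored;
}
```

Each stored record would also carry the source metadata (file name, recording date, participants) before landing in your warehouse or object store.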
Scheduled Polling
Ongoing transcription on a schedule
Set a cron job or scheduled trigger that checks your recording source (S3 bucket, call platform API, or file system) for new audio files
For each new file, submit it to Deepgram's /v1/listen endpoint. Use the file URL approach if recordings are already in cloud storage
Track which files have been processed using a database or state file. Use the file path or recording ID as a deduplication key
Route each transcript and its metadata to your downstream pipeline - Semarize for analysis, your warehouse, or automation platform
Update your processed-files log and set the next poll cycle
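The deduplication step can be as simple as the sketch below; the file names are made up, and `processed` stands in for whatever database or state file persists between poll cycles.

```javascript
// Skip files already processed, using the file path as the dedup key
function selectNewFiles(listing, processed) {
  return listing.filter((path) => !processed.has(path));
}

const processed = new Set(["calls/2024-01-01.wav"]);
const listing = ["calls/2024-01-01.wav", "calls/2024-01-02.wav"];
const fresh = selectNewFiles(listing, processed);
// `fresh` holds only the unseen file; after transcription succeeds,
// record it so the next cycle skips it
fresh.forEach((path) => processed.add(path));
```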
Callback-Triggered
Async transcription with webhook delivery
Submit audio to /v1/listen with a callback query parameter pointing to your webhook endpoint: callback=https://your-server.com/deepgram-callback
Deepgram processes the audio asynchronously and POSTs the full transcript result to your callback URL when complete
Your webhook handler receives the transcript JSON, extracts the text and metadata, and routes it downstream
Send the transcript to Semarize for structured analysis, then write results to your CRM, database, or notification channels
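A minimal handler for that webhook payload might look like this; HTTP framework wiring (Express, etc.) is omitted, and the sample payload is hand-written to follow the response shape described earlier.

```javascript
// `payload` is the parsed JSON body Deepgram POSTs to your callback URL
function handleDeepgramCallback(payload) {
  const alt = payload.results.channels[0].alternatives[0];
  return {
    transcript: alt.transcript,
    confidence: alt.confidence,
    // next: forward downstream (Semarize, warehouse, notifications)
  };
}

// Hand-written sample payload for illustration
const sample = {
  results: {
    channels: [{ alternatives: [{ transcript: "order status inquiry", confidence: 0.97 }] }],
  },
};
const routed = handleDeepgramCallback(sample);
```

Return a 2xx response quickly and do the downstream routing asynchronously, so a slow CRM write never causes the callback delivery to be treated as failed.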
Automation
Send Deepgram Transcripts to Automation Tools
Once you can extract transcripts from Deepgram, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from audio input through Deepgram transcription to Semarize evaluation and final output.
Audio → Deepgram → Zapier → Semarize → CRM
Detect new audio files in cloud storage, send them to Deepgram for transcription, route the transcript to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.
Setup steps
Create a new Zap. Choose your storage trigger (S3, Google Drive, Dropbox, etc.) and configure it to fire when new audio files appear in your recordings folder.
Add a "Webhooks by Zapier" Action (Custom Request) to transcribe the audio via Deepgram. Set method to POST, URL to https://api.deepgram.com/v1/listen with your desired parameters, add your API key as Authorization: Token <key>, and pass the file URL in the JSON body.
Add a second "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. In the body, set kit_code to your Kit, mode to "sync", and map the Deepgram transcript text into input.transcript.
Add a Formatter step to extract individual brick values from the Semarize JSON response - overall_score, sentiment, action_items, etc.
Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record.
Test each step end-to-end, then turn on the Zap.
Audio → Deepgram → n8n → Semarize → Database
Poll for new recordings on a schedule, transcribe each via Deepgram, send the transcript to Semarize for analysis, then write the structured scores and signals to your database. n8n's native loop support handles batch processing and error recovery.
Setup steps
Add a Cron node as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most teams).
Add an HTTP Request node to check your recording source for new files. This could be an S3 listing, a call platform API, or any storage system with an API.
Add a Split In Batches node to iterate over new files. Inside the loop, add an HTTP Request node to transcribe each file via Deepgram's POST /v1/listen endpoint with your desired parameters.
Add a Code node (JavaScript) to extract the transcript text from Deepgram's response. Pull results.channels[0].alternatives[0].transcript for the full text.
Add another HTTP Request node to send the transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token. Set kit_code, mode to "sync", and map the transcript into input.transcript.
Add a Code node to extract the brick values from the Semarize response - overall_score, sentiment, action_items, evidence, confidence.
Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use the file ID or recording ID as the primary key for upserts.
Activate the workflow. Monitor the first few runs to verify Deepgram and Semarize responses are arriving and writing correctly.
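The transcript-extraction Code node described above can be as small as this sketch; it follows n8n's standard items-in, items-out shape, and the sample item is made up.

```javascript
// Map each incoming Deepgram response item to a { transcript } item
// for the next node in the workflow.
function extractTranscripts(items) {
  return items.map((item) => ({
    json: {
      transcript: item.json.results.channels[0].alternatives[0].transcript,
    },
  }));
}

// Hand-written sample input item for illustration
const items = [
  { json: { results: { channels: [{ alternatives: [{ transcript: "call one text" }] }] } } },
];
const out = extractTranscripts(items);
```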
Audio → Deepgram → Make → Semarize → CRM + Slack
Watch for new recordings, transcribe each via Deepgram, send the transcript to Semarize for structured analysis, then use a Router to branch the scored output - alert on risk flags via Slack and write all signals to your CRM.
Setup steps
Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (15-60 minutes is typical).
Add an HTTP module to list new recordings from your source. This could be an S3 bucket listing, a call recording platform API, or any storage system.
Add an Iterator module to loop through each new file. For each, add an HTTP module to transcribe via Deepgram's POST /v1/listen endpoint with your audio URL and desired parameters.
Add another HTTP module to send the transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript from the Deepgram response. Parse the response as JSON.
Add a Router module. Define Branch 1 with a filter: bricks.sentiment.value less than 0.3 (negative sentiment threshold). Leave Branch 2 as a fallthrough (no filter).
On Branch 1, add a Slack module to alert your team when negative sentiment is detected. Map the score, sentiment, and recording ID into the message.
On Branch 2, add a Salesforce module to write all brick values (score, sentiment, action_items) to the Case or Contact record.
Set the scenario schedule and activate. Monitor the first few runs in Make's execution log.
What you can build
What You Can Do With Deepgram Data in Semarize
Deepgram gives you the transcript. Semarize gives you structure. Here's what becomes possible when raw audio becomes scored, categorized, actionable intelligence.
Knowledge-Grounded Bot Response Verification
AI Agent Factual Accuracy Scoring
What Semarize generates
Your product runs a voice bot powered by Deepgram’s streaming API. Every interaction produces a transcript — but are the bot’s responses factually correct? Run a knowledge-grounded kit against your policy documents and product database on every bot conversation. Semarize verifies each response against the source of truth: did the bot quote the correct return policy? Did it fabricate a “90-day satisfaction guarantee” that doesn’t exist? Did it accurately describe product specifications? A weekly quality report shows that the bot hallucinated pricing information in 4.2% of calls. The AI team retrains the pricing module guided by specific, evidence-backed accuracy signals.
Learn more about AI Evaluation

| Conversation Type | Accuracy | Hallucination | Escalation | Status |
|---|---|---|---|---|
| Account Inquiry | 96% | 1.1% | 98% | Pass |
| Billing | 84% | 4.2% | 71% | Fail |
| Technical | 91% | 2.8% | 89% | Pass |
Same-Week Coaching Signal Detection
Structured Skill Metrics for Immediate Action
What Semarize generates
Your sales team records calls through a Deepgram-powered dialler. Build an automated coaching signal system: every transcript runs through a coaching signal kit. Semarize evaluates discovery_depth, objection_handling_quality, value_articulation, and competitive_positioning as typed scores. When a rep’s objection handling drops below threshold on 3 consecutive calls, the structured output triggers a Slack alert to their manager the same week — with the specific coaching area, the score, and a link to the evidence span. The time from skill regression to coaching intervention shrinks from 6 weeks to 3 days.
Learn more about Sales Coaching

Coaching alert for Marcus T. — skill regression detected on 3 consecutive calls
Objection Handling Regression
Skill: objection_handling
Score: 34 / 100 (threshold: 60)
Consecutive: 3 calls below threshold
Rep: Marcus T. — @jennifer.m (manager)
Evidence span
“When the prospect raised budget concerns, the rep moved to discounting without exploring the objection further…”
Custom Regulatory Framework Scoring
Your Rules, Your Timeline, Every Language
What Semarize generates
Your contact centre handles calls in English, Spanish, and French. Compliance requirements apply equally across all languages. Deepgram transcribes all three. Semarize evaluates every transcript — regardless of language — against your compliance kit, grounded against your regulatory policy document: consent_obtained, required_disclosure_delivered, prohibited_language_absent, and data_handling_statement_given. When regulations change, you update your Semarize kit the same day — scoring stays current on your timeline. A quarterly audit reveals that Spanish-language calls have a 12% lower consent_obtained rate — a gap that becomes visible and actionable through structured cross-language scoring.
Learn more about QA & Compliance

| Language | Consent | Disclosure | No Prohibited Language | Data Handling |
|---|---|---|---|---|
| English | 97% | 95% | 99% | 93% |
| Spanish | 85% | 91% | 98% | 88% |
| French | 94% | 92% | 97% | 90% |
Custom Conversation Intelligence Pipeline
Your Signals as SQL Columns
What Semarize generates
A data architect vibe-codes an Airflow DAG that processes every call through Deepgram for transcription, then Semarize for structured evaluation. The DAG handles 500+ calls per day. Each call lands in BigQuery with YOUR custom typed columns — fields that don’t exist in any platform’s native output: playbook_adherence (float), claim_accuracy (bool), coaching_priority (varchar), qualification_score (float), compliance_pass (bool). dbt models build derived tables: agent daily scorecards, weekly accuracy reports, and coaching signal dashboards. The BI team builds Looker dashboards on conversation intelligence that’s fully custom, fully owned, and fully queryable.
Learn more about Data Science

Watch out for
Common Challenges & Gotchas
These are the issues that come up most often when teams start using Deepgram for transcription at scale.
Audio quality affects accuracy
Deepgram's transcription quality depends heavily on the input audio. Background noise, low bitrate recordings, heavy accents, and speaker overlap all reduce accuracy. Pre-process audio where possible and use the appropriate model tier for your use case.
WebSocket connection management
Real-time streaming via WebSocket requires careful connection management. Handle disconnections, implement reconnection logic, and manage audio buffering. Dropped connections during a live call mean lost transcript segments.
Concurrency limits on bulk processing
Sending hundreds of audio files simultaneously will hit concurrency limits. Implement a queue with controlled parallelism - typically 5-10 concurrent requests depending on your plan tier. Monitor 429 responses and back off accordingly.
Large file processing time
Files approaching the 2GB limit can take significant time to process. For very long recordings, consider splitting audio into smaller segments before sending to Deepgram, then reassembling the transcripts downstream.
Speaker diarization accuracy
Diarization works well for 2-3 speakers with clear turn-taking, but accuracy drops with more participants, crosstalk, or poor audio separation. Validate speaker labels before using them for per-speaker analysis or agent scoring.
Callback URL reliability
When using Deepgram's callback feature, your endpoint must be publicly accessible and handle retries. If your callback URL is down when Deepgram tries to deliver results, you may need to re-submit the transcription request.
Cost management at scale
Deepgram charges per audio hour. Real-time streaming and pre-recorded transcription have different pricing. At high volumes, costs can grow quickly. Monitor usage, choose the right model tier for each use case, and avoid re-transcribing audio unnecessarily.