Get Your Data
Zoom - How to Get Your Conversation Data
A practical guide to getting your conversation data from Zoom meetings - covering the Zoom API, cloud recording transcripts, real-time webhooks, and how to route structured data into downstream systems.
What you'll learn
- What conversation data you can extract from Zoom - meeting transcripts, recording metadata, participant details, and audio files
- How to access data via the Zoom API - Server-to-Server OAuth, endpoints, and pagination
- Three extraction patterns: cloud recording downloads, scheduled polling, and webhook-triggered via Zoom events
- How to connect Zoom data pipelines to Zapier, n8n, and Make
- Advanced use cases - custom scoring, CRM enrichment, compliance, and warehouse analytics
Data
What Data You Can Extract From Zoom
Zoom cloud recordings produce a rich set of data assets beyond just the video file. Every meeting with cloud recording enabled generates transcripts, audio files, participant logs, and metadata that can be extracted via the Zoom API.
Common fields teams care about
API Access
How to Get Transcripts via the Zoom API
Zoom exposes cloud recordings and transcripts through a REST API. The workflow is: authenticate with Server-to-Server OAuth, list recordings by date range, then download the transcript file for each meeting.
Authenticate
Create a Server-to-Server OAuth app in the Zoom App Marketplace. Grant scopes: cloud_recording:read:admin, meeting:read:admin. Generate an access token via POST https://zoom.us/oauth/token with account_credentials grant type.
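As a rough illustration of the token step, the sketch below builds (but does not send) the token request using only the Python standard library. It assumes the client ID and secret are sent as an HTTP Basic header and the account ID as a query parameter; the helper names build_token_request and bearer_header are hypothetical, not part of any Zoom SDK.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://zoom.us/oauth/token"


def build_token_request(account_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the Server-to-Server OAuth token request without sending it."""
    # Assumption: credentials are passed as HTTP Basic auth (client_id:client_secret).
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    query = urllib.parse.urlencode(
        {"grant_type": "account_credentials", "account_id": account_id}
    )
    return urllib.request.Request(
        f"{TOKEN_URL}?{query}",
        method="POST",
        headers={"Authorization": f"Basic {basic}"},
    )


def bearer_header(access_token: str) -> dict:
    """Header to attach to every subsequent Zoom API call."""
    return {"Authorization": f"Bearer {access_token}"}
```

Sending the request with urllib.request.urlopen and reading access_token from the JSON response would complete the flow. Tokens are short-lived, so a long-running job should be prepared to request a fresh one.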
Authorization: Bearer {access_token}
List recordings
Call GET /v2/users/{userId}/recordings with from and to date parameters. The response lists meetings, each with an array of recording files (video, audio, transcript). Results are paginated via next_page_token.
GET https://api.zoom.us/v2/users/{userId}/recordings?from=2026-01-01&to=2026-01-31
The response returns an array of meeting objects, each containing recording_files[] with file_type, download_url, and status. Keep paginating until next_page_token is empty.
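The pagination loop can be sketched as follows. To keep the example runnable without network access, the HTTP call is injected as fetch_page, a hypothetical callable that takes the current next_page_token (or None) and returns the decoded JSON page. The meetings key is assumed to hold the array of meeting objects.

```python
from typing import Callable, Iterator, List, Optional


def iter_meetings(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every meeting across all pages of a recordings listing."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        # Assumption: meeting objects live under the "meetings" key.
        yield from page.get("meetings", [])
        # An empty or missing next_page_token means this was the last page.
        token = page.get("next_page_token") or None
        if token is None:
            break


def transcript_files(meeting: dict) -> List[dict]:
    """Return only the recording files whose file_type is TRANSCRIPT."""
    return [
        f for f in meeting.get("recording_files", [])
        if f.get("file_type") == "TRANSCRIPT"
    ]
```

In real use, fetch_page would issue the GET request with the from, to, and next_page_token query parameters and return the parsed body.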
Download transcript
Find the recording file where file_type is TRANSCRIPT. Download it using the download_url with your access token. The file is in VTT format with timestamps and speaker labels. Parse VTT to extract utterances.
GET {download_url}?access_token={token}
The response is a WebVTT file containing timestamped utterances with speaker labels. Parse the VTT format to extract plain text segments by speaker for downstream analysis.
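A minimal VTT parsing sketch is shown below. It assumes each cue has an optional index line, a timing line, and text in the form "Speaker Name: utterance"; cues without a "Name: " prefix get an empty speaker. Real transcripts may deviate from this layout, so treat it as a starting point rather than a complete WebVTT parser.

```python
import re
from typing import List, NamedTuple


class Utterance(NamedTuple):
    start: str
    end: str
    speaker: str
    text: str


# Timing line, e.g. "00:00:01.000 --> 00:00:04.000" (hours are optional in WebVTT).
TIMING = re.compile(
    r"((?:\d{2}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2}:)?\d{2}:\d{2}\.\d{3})"
)


def parse_vtt(vtt: str) -> List[Utterance]:
    """Split a WebVTT document into speaker-attributed utterances."""
    utterances = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n")):
        lines = [ln.strip() for ln in block.strip().split("\n") if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                break
        else:
            continue  # No timing line: the WEBVTT header or a note block.
        text = " ".join(lines[i + 1:])
        # Assumption: the speaker label is everything before the first ": ".
        # Text that merely contains ": " would be misattributed by this rule.
        speaker, sep, rest = text.partition(": ")
        if sep:
            utterances.append(Utterance(match[1], match[2], speaker, rest))
        else:
            utterances.append(Utterance(match[1], match[2], "", text))
    return utterances
```

Joining consecutive utterances from the same speaker is a common follow-up step before sending the text downstream.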
Handle recording availability and limits
Cloud recording requirement
Cloud recordings are available only if cloud recording is enabled. Local recordings do not appear in the API.
Rate limits & retention
Zoom enforces API rate limits that vary by plan. When a request is rejected with HTTP 429, honor the Retry-After header before retrying. Recordings are auto-deleted after the retention period set by the account admin.
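One way to respect rate limits is a small retry wrapper like the sketch below. The send callable is a hypothetical stand-in for whatever HTTP client is in use and returns (status, headers, body). Only a numeric Retry-After value is honored here. If the header is missing or carries a date instead, the sketch falls back to exponential backoff.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        response = send()
        status, headers, _body = response
        if status != 429 or attempt == max_attempts:
            break
        # Real HTTP clients expose case-insensitive headers; a plain dict does not.
        retry_after = headers.get("Retry-After", "")
        # Numeric seconds are used as-is; anything else falls back to backoff.
        wait = float(retry_after) if retry_after.isdigit() else delay
        sleep(wait)
        delay = min(delay * 2, 60.0)
    return response
```

The sleep function is injectable so the behavior can be tested without real waiting.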
Patterns
Key Extraction Flows
There are three practical patterns for getting transcripts out of Zoom. The right choice depends on whether you're doing a one-off migration, running ongoing extraction, or need near real-time processing.
Backfill (Historical Export)
One-off migration of past meetings
Create a Server-to-Server OAuth app in the Zoom App Marketplace and generate an access token
List recordings by date range using GET /v2/users/{userId}/recordings with from and to parameters
Filter the recording files array for entries where file_type is TRANSCRIPT
Download and parse the VTT transcript files, extracting plain text with speaker attribution
Send the parsed transcript text to Semarize for structured analysis
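For a backfill that spans many months, the date range usually has to be split into smaller windows. The sketch below assumes the listing endpoint limits each from/to query to roughly one month, which is worth confirming against Zoom's API reference. The 30-day default and the date_windows name are illustrative choices.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, max_days: int = 30) -> Iterator[Tuple[date, date]]:
    """Yield consecutive, non-overlapping (from, to) windows covering start..end inclusive."""
    cursor = start
    while cursor <= end:
        # Each window spans at most max_days calendar days, clipped to the end date.
        window_end = min(cursor + timedelta(days=max_days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)
```

Each window's bounds can be formatted with isoformat() and passed as the from and to parameters of one listing call.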
Incremental Polling
Ongoing extraction on a schedule
Schedule a job (cron, cloud function, or automation tool) to run at regular intervals
List recordings from the last N hours using the from date parameter
Cross-reference returned meeting UUIDs against your list of already-processed meetings
Download transcripts for new meetings and parse the VTT format
Route the parsed transcript text to Semarize for structured analysis
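The polling steps above can be sketched as two small helpers. Because the from parameter is a calendar date rather than a timestamp, consecutive polls will typically return overlapping results, which is why the UUID check matters. The processed set stands in for whatever persistent store (database table, key-value store) a real job would use. Both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set


def from_param(now: datetime, lookback_hours: int) -> str:
    """Return the from query value (YYYY-MM-DD) that covers the lookback window."""
    return (now - timedelta(hours=lookback_hours)).date().isoformat()


def select_new_meetings(meetings: Iterable[dict], processed: Set[str]) -> List[dict]:
    """Keep meetings whose UUID has not been processed, dropping duplicates within the batch."""
    fresh = []
    seen: Set[str] = set()
    for meeting in meetings:
        uuid = meeting.get("uuid")
        if not uuid or uuid in processed or uuid in seen:
            continue
        seen.add(uuid)
        fresh.append(meeting)
    return fresh
```

After a transcript is downloaded and routed successfully, its UUID would be added to the processed store so that a failed run can be retried safely.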
Webhook-Triggered
Near real-time on recording completion
Create a Zoom webhook-only app in the Zoom App Marketplace and subscribe to the recording.completed event
When the webhook fires, parse the event payload to extract download URLs and meeting metadata
Download the transcript file from the payload's download URL using the included download token
Parse the VTT content and route the structured transcript to Semarize for analysis
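A handler for the webhook body might look like the sketch below. The payload layout is an assumption based on the fields named above: a top-level download_token, and a payload.object holding uuid, topic, and recording_files. The TranscriptJob type and both helpers are hypothetical names.

```python
from typing import List, NamedTuple


class TranscriptJob(NamedTuple):
    meeting_uuid: str
    topic: str
    download_url: str
    download_token: str


def jobs_from_event(event: dict) -> List[TranscriptJob]:
    """Extract one download job per TRANSCRIPT file in a recording.completed event."""
    if event.get("event") != "recording.completed":
        return []
    # Assumption: meeting details sit under payload.object.
    obj = event.get("payload", {}).get("object", {})
    # Assumption: the short-lived download token is a top-level field.
    token = event.get("download_token", "")
    return [
        TranscriptJob(obj.get("uuid", ""), obj.get("topic", ""), f["download_url"], token)
        for f in obj.get("recording_files", [])
        if f.get("file_type") == "TRANSCRIPT" and "download_url" in f
    ]


def auth_header(job: TranscriptJob) -> dict:
    """Header for the download request, using the token delivered with the event."""
    return {"Authorization": f"Bearer {job.download_token}"}
```

If an event arrives with no TRANSCRIPT entry, the transcript may simply not be ready yet. Zoom also emits a recording.transcript_completed event, which may be the more reliable trigger in that case.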
Automation
Send Zoom Transcripts to Automation Tools
Once you can extract transcripts from Zoom, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from Zoom trigger through Semarize evaluation to CRM, Slack, or database output.
Zoom → Zapier → Semarize → CRM
Detect new Zoom recordings, download the transcript, send it to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.
Setup steps
Create a new Zap. Choose Zoom as the trigger app and select "New Recording" as the event. Connect your Zoom account.
Add a "Webhooks by Zapier" Action (Custom Request) to download the transcript. Filter the recording files for file_type = TRANSCRIPT and GET the download_url with your access token.
Add a second "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. In the body, set kit_code to your Kit, mode to "sync", and map the parsed transcript text into input.transcript.
Add a Formatter step to extract individual brick values from the Semarize JSON response - overall_score, risk_flag, pain_point, etc.
Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record.
Test each step end-to-end, then turn on the Zap.
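For reference, the request that the second webhook step sends can be sketched in Python as below. The body shape (kit_code, mode, and a nested input.transcript) is inferred from the step description above and should be checked against the Semarize API documentation. build_run_request is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request

SEMARIZE_RUNS_URL = "https://api.semarize.com/v1/runs"


def build_run_request(api_key: str, kit_code: str, transcript: str) -> urllib.request.Request:
    """Build the POST request that submits a transcript for a synchronous run."""
    # Assumption: "input.transcript" means a nested object {"input": {"transcript": ...}}.
    body = {
        "kit_code": kit_code,
        "mode": "sync",
        "input": {"transcript": transcript},
    }
    return urllib.request.Request(
        SEMARIZE_RUNS_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

In Zapier the same fields are entered in the Custom Request form, with the header values under Headers and the JSON under Data.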
Zoom → n8n → Semarize → Database
Poll Zoom for new recordings on a schedule, download transcripts, send each one to Semarize for analysis, then write the structured scores and signals to your database. n8n's native loop support handles pagination and batch processing.
Setup steps
Add a Schedule Trigger node (named Cron in older n8n versions) as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most teams).
Add an HTTP Request node to list new recordings from Zoom. Set method to GET, URL to https://api.zoom.us/v2/users/me/recordings, configure OAuth2 auth, and set the from parameter to one interval ago.
Add a Split In Batches node (labelled Loop Over Items in newer n8n versions) to iterate over the returned meetings. Inside the loop, filter recording files for file_type = TRANSCRIPT.
Add an HTTP Request node to download each VTT transcript file using the download_url from the recording file object.
Add a Code node (JavaScript) to parse the VTT format into structured utterances. Extract speaker labels and text segments.
Add another HTTP Request node to send the parsed transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token. Set kit_code, mode to "sync", and map the transcript into input.transcript.
Add a Code node to extract the brick values from the Semarize response - overall_score, risk_flag, pain_point, evidence, confidence.
Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use meeting UUID as the primary key for upserts. Activate the workflow and monitor the first few runs.
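The last two steps, extracting brick values and upserting by meeting UUID, can be sketched as follows. SQLite stands in for Postgres so the example runs anywhere. The ON CONFLICT clause has the same shape in Postgres, though the placeholder syntax differs by driver. The response layout (bricks.<name>.value) and the meeting_scores table are assumptions for illustration.

```python
import sqlite3
from typing import Iterable

FIELDS = ("overall_score", "risk_flag", "pain_point")

UPSERT_SQL = """
INSERT INTO meeting_scores (meeting_uuid, overall_score, risk_flag, pain_point)
VALUES (?, ?, ?, ?)
ON CONFLICT(meeting_uuid) DO UPDATE SET
    overall_score = excluded.overall_score,
    risk_flag = excluded.risk_flag,
    pain_point = excluded.pain_point
"""


def brick_values(response: dict, names: Iterable[str] = FIELDS) -> dict:
    """Pull bricks.<name>.value out of a response; missing bricks become None."""
    bricks = response.get("bricks", {})
    return {name: bricks.get(name, {}).get("value") for name in names}


def upsert_scores(conn: sqlite3.Connection, meeting_uuid: str, values: dict) -> None:
    """Insert or update one row per meeting, keyed on the meeting UUID."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS meeting_scores ("
        "meeting_uuid TEXT PRIMARY KEY, overall_score REAL, "
        "risk_flag INTEGER, pain_point TEXT)"
    )
    flag = values.get("risk_flag")
    conn.execute(
        UPSERT_SQL,
        (
            meeting_uuid,
            values.get("overall_score"),
            None if flag is None else int(bool(flag)),  # store booleans as 0/1
            values.get("pain_point"),
        ),
    )
    conn.commit()
```

Keying on the UUID means a re-run of the workflow overwrites the earlier row instead of creating a duplicate.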
Zoom → Make → Semarize → CRM + Slack
Receive Zoom recording webhooks, download transcripts, send each to Semarize for structured analysis, then use a Router to branch the scored output - alert on risk flags via Slack and write all signals to your CRM.
Setup steps
Create a new Scenario. Add a Webhook module as the trigger to receive Zoom's recording.completed event.
Configure your Zoom webhook-only app to send recording.completed events to your Make webhook URL. Handle the endpoint.url_validation verification request.
Add an HTTP module to download the transcript. Filter the recording files for file_type = TRANSCRIPT and GET the download_url with the included download token.
Add a Text Parser module or Code module to parse the VTT format into plain text with speaker labels.
Add another HTTP module to send the parsed transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript from the previous step. Parse the response as JSON.
Add a Router module. Define Branch 1 with a filter: bricks.risk_flag.value equals true. Leave Branch 2 as a fallthrough (no filter).
On Branch 1, add a Slack module to alert your team when risk is detected. Map the score, risk flag, and meeting ID into the message.
On Branch 2, add a Salesforce module to write all brick values (score, risk_flag, pain_point) to the Opportunity record. Activate the scenario.
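The endpoint.url_validation request mentioned in the setup steps is a challenge-response check. As a sketch of the expected reply, assuming the challenge arrives as payload.plainToken and the app's webhook secret token is the HMAC key, the response body can be computed like this (url_validation_response is a hypothetical helper name):

```python
import hashlib
import hmac


def url_validation_response(event: dict, secret_token: str) -> dict:
    """Build the JSON body that answers an endpoint.url_validation challenge."""
    # Assumption: the challenge string is delivered as payload.plainToken.
    plain = event["payload"]["plainToken"]
    # HMAC-SHA256 of the plain token, keyed with the webhook secret, hex-encoded.
    digest = hmac.new(
        secret_token.encode("utf-8"), plain.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return {"plainToken": plain, "encryptedToken": digest}
```

In Make, the same computation would be done with its built-in hashing functions and returned through a Webhook response module. The Python form is shown only to make the calculation explicit.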
What you can build
What You Can Do With Zoom Data in Semarize
When raw transcripts become structured, grounded, and programmable, new possibilities open up. Here's what you can build.
Customer Research Insight Extraction
Product Intelligence
What Semarize generates
Your product team runs 30 customer research calls per month on Zoom. Insights live in a PM’s notebook — if they remember to take notes. With Semarize, every call is scored against a product feedback kit. Feature requests, pain points, competitor mentions, and sentiment signals are extracted as structured data. The output feeds directly into your product backlog tool — each feature request becomes a typed record with severity score, customer segment, and the exact quote as evidence. Product decisions are made on aggregated signal data, not anecdotes.
Learn more about Data Science use cases
“...we need to import 10,000 records and there's no way to do it...”
“...we'd love real-time notifications when...”
“...it took us three weeks to get the team set up...”
Meeting Effectiveness Scoring
Operational Intelligence
What Semarize generates
Your COO wants to fix meeting culture. You run a meeting effectiveness kit on every internal Zoom call. It scores whether an agenda was followed, how many decisions were explicitly stated, whether action items were assigned with owners, and whether the meeting stayed on track. Scores feed a weekly report showing which teams run effective meetings and which waste time. After one quarter of visibility, average meeting efficiency scores improve 30% — because what gets measured gets managed.
Learn more about QA & Compliance
Training Session Evaluation
Enablement Intelligence
What Semarize generates
Your enablement team records training sessions on Zoom. Instead of manually reviewing 2-hour recordings, they run a training evaluation kit grounded against the actual curriculum document. Semarize checks whether the trainer covered all required topics, whether roleplay exercises met the rubric, and flags any statements that contradict the official playbook. New hires get a structured skills report after each session — and trainers get feedback on what they missed.
Learn more about Sales Coaching
Grounded against: Sales Curriculum v2
Playbook accuracy: 91% - 1 inaccurate claim detected
Multi-Platform Coaching Console
Sales Coaching
What Semarize generates
Your reps use both Zoom and Gong for customer calls depending on the account. A sales manager vibe-codes a coaching console that pulls Semarize scores from calls on both platforms. The console shows a unified skill view per rep — discovery, objection handling, closing — regardless of which tool recorded the call. One consistent scoring framework, one dashboard, no vendor lock-in. Coaching decisions are based on ALL conversations, not just the ones in one platform.
Learn more about Sales Coaching
Watch out for
Common Challenges & Gotchas
These are the issues that come up most often when teams start extracting transcripts from Zoom at scale.
Cloud recording must be enabled
Transcripts are only generated for cloud recordings. Local recordings don't produce API-accessible transcripts.
Server-to-Server OAuth setup
JWT apps are deprecated. Setting up S2S OAuth requires marketplace app creation and admin approval.
VTT parsing required
Transcripts come as WebVTT files that need parsing to extract plain text with speaker labels.
Recording retention limits
Cloud recordings are auto-deleted after the admin-configured retention period. Export before they expire.
API rate limits vary by plan
Rate limits differ between Pro, Business, and Enterprise plans. Implement backoff logic.
Meeting UUID vs Meeting ID
Zoom uses two identifiers. Recurring meetings share a meeting ID, but each occurrence has its own UUID. Use the UUID whenever an API call or a deduplication key must refer to one specific instance.
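A related detail when placing a UUID in a URL path: UUIDs can contain characters such as / and +. The sketch below assumes Zoom's documented rule that a UUID starting with / or containing // must be URL-encoded twice. The helper name encode_meeting_uuid is hypothetical.

```python
from urllib.parse import quote


def encode_meeting_uuid(uuid: str) -> str:
    """URL-encode a meeting UUID for use as a path segment."""
    if uuid.startswith("/") or "//" in uuid:
        # Assumption: these UUIDs need double encoding to survive path parsing.
        return quote(quote(uuid, safe=""), safe="")
    return quote(uuid, safe="")
```

For example, the result could be interpolated into a path such as /v2/meetings/{encoded_uuid}/recordings when fetching the recordings of one specific meeting instance.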
Transcript quality varies
Auto-generated transcripts may have accuracy issues with accents, technical jargon, or poor audio quality. Speaker attribution can be inconsistent for dial-in participants.