Get Your Data
Microsoft Teams - How to Get Your Conversation Data
Practical guide to getting conversation data from Microsoft Teams meetings - covering the Microsoft Graph API, meeting transcription settings, compliance exports, webhook-triggered flows, and how to route structured data into downstream systems.
What you'll learn
- What conversation data you can extract from Teams - meeting transcripts, attendance, chat context, and recording metadata
- How to access data via the Microsoft Graph API - authentication, permissions, and endpoints
- Three extraction patterns: historical backfill, incremental Graph API polling, and webhook-triggered extraction via Graph subscriptions
- How to connect Teams data pipelines to Zapier, n8n, and Make
- Advanced use cases - custom scoring, CRM enrichment, compliance, and warehouse analytics
Data
What Data You Can Extract From Microsoft Teams
Microsoft Teams captures more than just the recording. Every meeting produces a set of structured assets that can be extracted via the Graph API - the transcript itself, speaker identification, attendance data, and contextual information about the meeting and its associated calendar event.
Common fields teams care about
API Access
How to Get Transcripts via the Microsoft Graph API
Microsoft Teams exposes meetings and transcripts through the Microsoft Graph API. The workflow is: authenticate via Azure AD, find the meeting via calendar events or online meeting endpoints, then fetch the transcript content for each meeting.
Authenticate
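Register an app in Microsoft Entra ID (formerly Azure AD). For app-only access with the OAuth 2.0 client credentials flow, grant Microsoft Graph application permissions such as OnlineMeetings.Read.All, OnlineMeetingTranscript.Read.All, and CallRecords.Read.All, and have an admin grant consent. Delegated permissions such as OnlineMeetings.Read apply only when a user signs in. App-only access to online meetings also requires a tenant admin to configure an application access policy for the users whose meetings you read.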
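As a rough illustration of the client credentials flow, the token request can be sketched as below. The helper names (build_token_request, auth_header) and the placeholder tenant and app values are hypothetical; the token endpoint and the .default scope follow the standard Microsoft identity platform v2.0 conventions. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ".default" asks for every application permission already consented for the app.
GRAPH_DEFAULT_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": GRAPH_DEFAULT_SCOPE,
    }).encode("ascii")
    return Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def auth_header(access_token: str) -> dict:
    """Header to attach to each Graph call once a token has been obtained."""
    return {"Authorization": f"Bearer {access_token}"}
```

Sending the request with urllib.request.urlopen and reading access_token from the JSON response yields the value used in the Authorization header.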
Authorization: Bearer {access_token}
List meeting transcripts
Call the GET /users/{userId}/onlineMeetings/{meetingId}/transcripts endpoint to get a list of transcript objects for a specific meeting. You'll first need to find the meeting via GET /users/{userId}/onlineMeetings?$filter=... or via calendar events.
GET https://graph.microsoft.com/v1.0/users/{userId}/onlineMeetings/{meetingId}/transcripts
The response returns an array of transcript objects. Paginate using @odata.nextLink if the result set spans multiple pages.
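Following @odata.nextLink can be sketched as a small generator. The function name iter_graph_collection and the fetch_json callback are hypothetical; fetch_json stands in for whatever authenticated HTTP call you use and is assumed to return the decoded JSON page as a dict.

```python
from typing import Callable, Iterator, Optional


def iter_graph_collection(first_url: str,
                          fetch_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item of a Graph collection, following @odata.nextLink."""
    url: Optional[str] = first_url
    while url:
        page = fetch_json(url)
        # Each page carries its items under "value".
        yield from page.get("value", [])
        # The last page has no nextLink, which ends the loop.
        url = page.get("@odata.nextLink")
```

Passing the HTTP call in as a parameter keeps the paging logic testable without network access.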
Fetch transcript content
For each transcript ID, request the content via GET /users/{userId}/onlineMeetings/{meetingId}/transcripts/{transcriptId}/content. Returns VTT or DOCX format depending on the $format query parameter. Parse VTT to extract speaker-labeled utterances with timestamps.
GET https://graph.microsoft.com/v1.0/users/{userId}/onlineMeetings/{meetingId}/transcripts/{transcriptId}/content
The default format is WebVTT (.vtt), which includes timestamps and speaker labels. Use a VTT parser to extract structured utterances. To choose a format explicitly, pass $format=text/vtt for WebVTT or $format=application/vnd.openxmlformats-officedocument.wordprocessingml.document for DOCX.
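A minimal VTT parsing sketch follows. It assumes speakers are carried in WebVTT voice spans of the form <v Speaker Name>text</v>, which is how Teams transcripts commonly label them; verify against your own tenant's output. The Utterance type and parse_vtt name are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Cue timing line, e.g. "00:00:03.500 --> 00:00:06.200" (hours optional).
CUE_TIMING = re.compile(
    r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)
# WebVTT voice span: <v Speaker Name>spoken text</v>
VOICE_SPAN = re.compile(r"<v\s+([^>]+)>(.*?)</v>", re.DOTALL)


@dataclass
class Utterance:
    start: str
    end: str
    speaker: str
    text: str


def parse_vtt(vtt: str) -> List[Utterance]:
    """Turn WebVTT text into speaker-labeled utterances with timestamps."""
    utterances: List[Utterance] = []
    normalized = vtt.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        timing_idx = next(
            (i for i, line in enumerate(lines) if CUE_TIMING.search(line)), None
        )
        if timing_idx is None:
            continue  # "WEBVTT" header, NOTE blocks, and similar
        start, end = CUE_TIMING.search(lines[timing_idx]).groups()
        payload = "\n".join(lines[timing_idx + 1:])
        # Fall back to an unlabeled speaker when no voice span is present.
        spans = VOICE_SPAN.findall(payload) or [("Unknown", payload)]
        for speaker, text in spans:
            cleaned = " ".join(text.split())
            if cleaned:
                utterances.append(Utterance(start, end, speaker.strip(), cleaned))
    return utterances
```

Joining the results as "speaker: text" lines gives the plain-text transcript that downstream steps typically expect.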
Handle permissions and throttling
Rate limits
Graph API has per-app and per-tenant throttling. When you receive a 429 response, back off using the Retry-After header. Implement exponential backoff for bulk operations and batch requests where possible.
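The back-off behavior described above might look like the following sketch. The call_with_backoff name and the send callback are hypothetical; send is assumed to perform one request and return a (status, headers, body) tuple, with headers exposed as a plain mapping keyed exactly "Retry-After".

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def call_with_backoff(send: Callable[[], Response],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry on HTTP 429, preferring Retry-After over exponential backoff."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts, do not sleep again
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # server-provided wait in seconds
        else:
            # Exponential backoff with a little jitter to avoid lockstep retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
        sleep(delay)
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

The sleep parameter exists so the wait can be replaced in tests or by a scheduler-friendly delay.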
Transcription availability
Transcription must be enabled at the tenant level by an admin. Not all meeting types produce transcripts - ad-hoc calls, PSTN calls, and meetings where transcription wasn't started won't have transcript data. Build your pipeline to handle missing transcripts gracefully.
Patterns
Key Extraction Flows
There are three practical patterns for getting transcripts out of Microsoft Teams. The right choice depends on whether you're doing a one-off migration, running ongoing extraction, or need near real-time processing.
Backfill (Historical Export)
One-off migration of past meetings
Register an Azure AD app with the required Graph API permissions and obtain admin consent
Query calendar events for your target date range via GET /users/{userId}/calendar/events?$filter=start/dateTime ge '...' and start/dateTime le '...'
Resolve each calendar event to its corresponding onlineMeeting ID via the joinWebUrl or event metadata
List and fetch transcripts for each meeting via the transcripts endpoint. Parse VTT into structured utterances
Send each parsed transcript to Semarize for structured analysis, then store results in your data warehouse
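The calendar query and meeting-resolution steps above can be sketched as URL builders. The function names are hypothetical. The sketch assumes each calendar event exposes its join link under onlineMeeting.joinUrl and that the onlineMeetings collection accepts a JoinWebUrl filter; confirm both against the current Graph reference before relying on them.

```python
from typing import Optional
from urllib.parse import quote

GRAPH = "https://graph.microsoft.com/v1.0"


def calendar_events_url(user_id: str, start_iso: str, end_iso: str) -> str:
    """URL listing calendar events whose start falls inside the date range."""
    flt = f"start/dateTime ge '{start_iso}' and start/dateTime le '{end_iso}'"
    return f"{GRAPH}/users/{user_id}/calendar/events?$filter={quote(flt, safe='')}"


def join_url_of(event: dict) -> Optional[str]:
    """Return the Teams join URL of an event, or None for non-Teams events."""
    return (event.get("onlineMeeting") or {}).get("joinUrl")


def online_meeting_lookup_url(user_id: str, join_web_url: str) -> str:
    """URL resolving a join link to its onlineMeeting object."""
    escaped = join_web_url.replace("'", "''")  # OData escapes quotes by doubling
    flt = f"JoinWebUrl eq '{escaped}'"
    return f"{GRAPH}/users/{user_id}/onlineMeetings?$filter={quote(flt, safe='')}"
```

Events for which join_url_of returns None can be skipped, since they have no Teams meeting to resolve.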
Incremental Polling
Ongoing extraction on a schedule
Set up a scheduled job (cron or Cloud Function) that runs at your desired interval - hourly works well for most teams
On each run, query recent calendar events from the last N hours via the Graph API
Cross-reference returned meeting IDs with your already-processed set to avoid duplicates
Fetch new transcripts, parse VTT to structured text, and route each to Semarize for analysis
Store results in your downstream system and update your tracking store with the newly processed meeting IDs
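The tracking store used for de-duplication can be as small as one table. This is a sketch using SQLite; the ProcessedMeetings class and its table name are hypothetical, and any durable key-value store would serve the same purpose.

```python
import sqlite3
from typing import Iterable, List


class ProcessedMeetings:
    """Remembers which meeting IDs have already been handled."""

    def __init__(self, db_path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to persist between runs.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS processed (meeting_id TEXT PRIMARY KEY)"
        )

    def unseen(self, meeting_ids: Iterable[str]) -> List[str]:
        """Return IDs not yet processed, de-duplicated, in input order."""
        result = []
        for meeting_id in dict.fromkeys(meeting_ids):
            row = self.conn.execute(
                "SELECT 1 FROM processed WHERE meeting_id = ?", (meeting_id,)
            ).fetchone()
            if row is None:
                result.append(meeting_id)
        return result

    def mark(self, meeting_ids: Iterable[str]) -> None:
        """Record IDs as processed; repeats are ignored."""
        with self.conn:
            self.conn.executemany(
                "INSERT OR IGNORE INTO processed (meeting_id) VALUES (?)",
                [(m,) for m in meeting_ids],
            )
```

Calling mark only after the downstream write succeeds means a failed run is retried on the next poll rather than silently skipped.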
Webhook-Triggered (Graph Subscriptions)
Near real-time on call completion
Create a Graph subscription for /communications/callRecords. Your endpoint must be a publicly accessible HTTPS URL that can handle the validation handshake
When a call record is created, Microsoft sends a webhook notification to your endpoint with the callRecord ID
Use the callRecord ID to find the associated meeting, then resolve to the onlineMeeting and fetch the transcript
Parse the VTT content and route it to Semarize for structured analysis, then push results to your downstream systems
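The validation handshake and notification handling can be sketched as one framework-independent function. The handle_graph_webhook name is hypothetical. The sketch assumes Graph validates the endpoint by sending a validationToken query parameter that must be echoed back as plain text, and that each notification carries the callRecord ID under resourceData.id along with the clientState you set on the subscription.

```python
import json
from typing import Dict, List, Tuple


def handle_graph_webhook(query: Dict[str, str],
                         body: bytes,
                         expected_client_state: str) -> Tuple[int, str, List[str]]:
    """Return (status, response text, callRecord IDs to process)."""
    token = query.get("validationToken")
    if token is not None:
        # Handshake: echo the token back with status 200 as text/plain.
        return 200, token, []

    payload = json.loads(body or b"{}")
    record_ids: List[str] = []
    for notification in payload.get("value", []):
        # Ignore notifications that do not carry our shared secret.
        if notification.get("clientState") != expected_client_state:
            continue
        record_id = (notification.get("resourceData") or {}).get("id")
        if record_id:
            record_ids.append(record_id)
    # Acknowledge quickly; do the transcript fetch asynchronously.
    return 202, "", record_ids
```

Returning the IDs instead of fetching transcripts inline keeps the webhook response fast, which matters because slow acknowledgements can cause notifications to be retried or dropped.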
Automation
Send Teams Transcripts to Automation Tools
Once you can extract transcripts from Microsoft Teams, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from Teams trigger through Semarize evaluation to CRM, Slack, or database output.
Teams → Zapier → Semarize → CRM
Detect new Teams meetings on a schedule, fetch the transcript via Graph API, send it to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.
Setup steps
Create a new Zap. Choose "Schedule by Zapier" as the trigger and set it to run every hour. This will poll for new meetings.
Add a "Webhooks by Zapier" Action (Custom Request) to list recent calendar events from the Graph API. Use your Microsoft OAuth connection or pass a Bearer token. Filter events by start time since the last run.
Add another "Webhooks by Zapier" Action to fetch the transcript content for each meeting. Call the Graph API transcript content endpoint with the meeting ID and transcript ID.
Add a third "Webhooks by Zapier" Action. Set method to POST and URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. Set kit_code to your kit, set mode to "sync", and map the parsed transcript text into input.transcript.
Add a Formatter step to extract individual brick values from the Semarize JSON response - overall_score, risk_flag, pain_point, etc.
Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record. Test each step end-to-end, then turn on the Zap.
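For reference, the Semarize call configured in the steps above corresponds to an HTTP request roughly like the one below. The URL, Bearer auth, kit_code, mode, and input.transcript come from the steps; the exact JSON nesting is an assumption, and build_semarize_run is a hypothetical helper that builds the request without sending it.

```python
import json
from urllib.request import Request

SEMARIZE_RUNS_URL = "https://api.semarize.com/v1/runs"


def build_semarize_run(api_key: str, kit_code: str, transcript_text: str) -> Request:
    """Build (but do not send) a synchronous Semarize run request."""
    payload = {
        "kit_code": kit_code,
        "mode": "sync",
        # Assumed shape: the transcript text nested under input.transcript.
        "input": {"transcript": transcript_text},
    }
    return Request(
        SEMARIZE_RUNS_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

The same body can be pasted into the Data field of the Zapier Custom Request step, with the transcript mapped from the previous step.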
Teams → n8n → Semarize → Database
Poll Microsoft Graph for new meetings on a schedule, fetch transcripts, parse VTT to text, send each one to Semarize for analysis, then write the structured scores and signals to your database. n8n's built-in Microsoft Graph node handles OAuth automatically.
Setup steps
Add a Cron node as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most teams).
Add a Microsoft Graph node to list recent calendar events. Use n8n's built-in Microsoft Graph credential - it handles OAuth token refresh automatically.
Add a Split In Batches node to iterate over the returned events. Filter for events that have online meeting URLs to identify Teams meetings.
Add an HTTP Request node to resolve each calendar event to its onlineMeeting ID, then fetch the transcript content from the transcripts endpoint.
Add a Code node (JavaScript) to parse the VTT content into plain text with speaker labels. Extract timestamps, speaker names, and utterance text.
Add another HTTP Request node to send the parsed transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token.
Add a Code node to extract the brick values from the Semarize response - overall_score, risk_flag, pain_point, evidence, confidence.
Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use meeting_id as the primary key for upserts. Activate the workflow and monitor the first few runs.
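The last two steps, extracting brick values and upserting on meeting_id, can be sketched as follows. The sketch uses SQLite so it runs anywhere; the ON CONFLICT clause is written the same way in Postgres apart from placeholder style. The response shape (bricks.<name>.value), the table name meeting_scores, and the function names are assumptions made for illustration.

```python
import sqlite3

BRICKS = ("overall_score", "risk_flag", "pain_point")


def create_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE IF NOT EXISTS meeting_scores (
               meeting_id TEXT PRIMARY KEY,
               overall_score REAL,
               risk_flag INTEGER,
               pain_point TEXT)"""
    )


def brick_values(response: dict) -> dict:
    """Pull the value of each expected brick; missing bricks become None."""
    bricks = response.get("bricks", {})
    return {name: (bricks.get(name) or {}).get("value") for name in BRICKS}


def upsert_result(conn: sqlite3.Connection, meeting_id: str, response: dict) -> None:
    """Insert a row for the meeting, or overwrite it if it already exists."""
    values = brick_values(response)
    with conn:
        conn.execute(
            """INSERT INTO meeting_scores
                   (meeting_id, overall_score, risk_flag, pain_point)
               VALUES (?, ?, ?, ?)
               ON CONFLICT(meeting_id) DO UPDATE SET
                   overall_score = excluded.overall_score,
                   risk_flag = excluded.risk_flag,
                   pain_point = excluded.pain_point""",
            (meeting_id, values["overall_score"],
             values["risk_flag"], values["pain_point"]),
        )
```

Keying on meeting_id makes reprocessing idempotent: running the workflow twice for the same meeting updates the row instead of duplicating it.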
Teams → Make → Semarize → CRM + Slack
Fetch new Teams meeting transcripts on a schedule, send each to Semarize for structured analysis, then use a Router to branch the scored output - alert on risk flags via Slack and write all signals to your CRM.
Setup steps
Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (15-60 minutes is typical).
Add an HTTP module to list recent calendar events from the Graph API. Configure OAuth2 authentication with your Microsoft app registration credentials.
Add an Iterator module to loop through each calendar event. Filter for events with online meeting URLs to identify Teams meetings.
Add an HTTP module to fetch the transcript content for each meeting. Resolve the calendar event to its onlineMeeting ID, then fetch from the transcripts endpoint.
Add another HTTP module to send the transcript to Semarize. Set URL to https://api.semarize.com/v1/runs and add your Bearer token. In the request body, set kit_code to your kit, set mode to "sync", and map the transcript into input.transcript. Parse the response as JSON.
Add a Router module. Define Branch 1 with a filter: bricks.risk_flag.value equals true. Leave Branch 2 as a fallthrough (no filter).
On Branch 1, add a Slack module to alert your team when risk is detected. Map the score, risk flag, and meeting ID into the message.
On Branch 2, add a Salesforce module to write all brick values (score, risk_flag, pain_point) to the Opportunity record. Set the scenario schedule and activate.
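The Router logic above is equivalent to the small branching function below, shown only to make the two filters explicit. The destination labels, the function names, and the bricks.<name>.value response shape are assumptions for illustration.

```python
from typing import List


def brick(result: dict, name: str):
    """Read bricks.<name>.value, returning None when the brick is absent."""
    return (result.get("bricks", {}).get(name) or {}).get("value")


def route(result: dict) -> List[str]:
    """Return the branches a scored meeting should flow through."""
    destinations = []
    if brick(result, "risk_flag") is True:
        destinations.append("slack_alert")  # Branch 1: filter on risk_flag
    destinations.append("crm_write")        # Branch 2: no filter, always runs
    return destinations


def slack_message(meeting_id: str, result: dict) -> str:
    """Compose the alert text from score, risk flag, and meeting ID."""
    return (
        f"Risk detected in meeting {meeting_id}: "
        f"score={brick(result, 'overall_score')}, "
        f"risk_flag={brick(result, 'risk_flag')}"
    )
```

Because Branch 2 has no filter, every meeting reaches the CRM write, and only flagged meetings additionally trigger the Slack alert.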
What you can build
What You Can Do With Teams Data in Semarize
When conversation data becomes structured, grounded, and programmable, new possibilities open up. Here's what you can build.
Compliance Playbook Verification
Grounded Compliance Automation
What Semarize generates
Your legal team maintains a 50-page compliance playbook that every customer-facing call must follow. You run a compliance kit grounded against this document on every Teams meeting. Semarize checks whether required disclosures were given, whether any unauthorized commitments were made, and whether regulatory references were accurate. When auditors arrive, you export a structured evidence report — not recordings, but typed compliance signals with confidence scores and exact text span evidence for every call.
Decision & Action Item Extraction
Meeting Intelligence
What Semarize generates
Important decisions get lost in hour-long Teams meetings. You run a decision extraction kit on leadership meetings that pulls out every decision, the owner, the deadline (if stated), and any open questions. The output pipes into Jira automatically — decisions become tracked tickets with owners. "I didn't know about that decision" is no longer a valid excuse because every meeting produces a structured, machine-readable decision log.
| Decision | Owner | Deadline | Status |
|---|---|---|---|
| Expand to APAC in Q3 | Sarah K. | 2026-03-15 | |
| Hire 2 SDRs for enterprise | Tom W. | 2026-02-28 | |
| Deprecate legacy pricing | James M. | Not set | ⚠ |
| Review SOC 2 timeline | Priya R. | Not set | ⚠ |
Onboarding Quality Scoring
Customer Success
What Semarize generates
Your CS team runs onboarding calls on Teams. Onboarding quality is measured by CSAT surveys weeks later — too late to intervene. With Semarize, every onboarding call is scored against your onboarding playbook. You know immediately whether the CSM covered all setup steps, whether the customer showed signs of confusion, and how clearly the next steps were communicated. At-risk onboardings get flagged for same-week intervention, reducing time-to-value by 3 weeks on average.
Knowledge Gap Detector
Sales Enablement
What Semarize generates
An enablement manager vibe-codes a React dashboard that runs a knowledge accuracy kit — grounded against your product documentation — on every customer call recorded in Teams. The dashboard shows which product areas reps consistently get wrong. "Pricing tiers" shows up red across 4 reps. "API rate limits" is amber for the whole team. Instead of guessing what to train on, the next enablement session targets the exact knowledge gaps the data revealed.
| Product area | Sarah K. | James M. | Priya R. | Tom W. |
|---|---|---|---|---|
| Pricing | 52% | 48% | 61% | 45% |
| API Limits | 68% | 55% | 59% | 63% |
| Security | 85% | 72% | 90% | 77% |
| Integrations | 78% | 81% | 74% | 88% |
Watch out for
Common Challenges & Gotchas
These are the issues that come up most often when teams start extracting transcripts from Microsoft Teams at scale.
Transcription must be enabled at the tenant level
If the admin hasn't enabled transcription, no transcript data will be generated for meetings. This is a tenant-wide policy setting that must be configured before any transcript data becomes available.
Permissions require Azure AD admin consent
Application-level permissions for Microsoft Graph need admin consent, which can be slow in enterprise environments. Plan for lead time when setting up integrations.
Not all meeting types produce transcripts
Ad-hoc calls, PSTN calls, and channel meetings without transcription enabled won't have transcript data. Build your pipeline to skip gracefully when no transcript is found.
VTT format requires parsing
Transcripts come in WebVTT format by default, which needs parsing to extract plain text with speaker labels. You'll need a VTT parser or custom code to convert to structured text.
Graph API throttling
Microsoft Graph enforces per-app and per-tenant rate limits. Implement exponential backoff for 429 responses and batch requests where possible to stay within limits.
Meeting ID resolution
Finding the correct onlineMeeting ID from a calendar event requires additional API calls and can be non-trivial for recurring meetings. Each occurrence has a different meeting ID.
Subscription renewal
Graph webhook subscriptions expire after a short period (max 3 days for call records) and must be renewed programmatically. Build renewal logic into your pipeline to avoid missed events.
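Renewal logic can be sketched as two helpers: one that decides when a subscription is close to expiry, and one that builds the body for a PATCH to /subscriptions/{id}. The helper names, the 6-hour margin, and the 2-day default lifetime are illustrative choices; the 4,230-minute ceiling for call records is an assumption to verify against the current Graph documentation.

```python
from datetime import datetime, timedelta, timezone

# Assumed ceiling for callRecord subscriptions (just under 3 days); verify it.
MAX_LIFETIME = timedelta(minutes=4230)
# Renew when less than this much time remains.
RENEW_MARGIN = timedelta(hours=6)


def needs_renewal(expiration: datetime, now: datetime) -> bool:
    """True when the subscription expires within the renewal margin."""
    return expiration - now <= RENEW_MARGIN


def renewal_patch_body(now: datetime,
                       lifetime: timedelta = timedelta(days=2)) -> dict:
    """Body for PATCH /subscriptions/{id} that extends the expiration."""
    if lifetime > MAX_LIFETIME:
        raise ValueError("lifetime exceeds the assumed maximum for call records")
    new_expiration = (now + lifetime).astimezone(timezone.utc)
    return {"expirationDateTime": new_expiration.strftime("%Y-%m-%dT%H:%M:%SZ")}
```

Running needs_renewal from the same scheduler as the polling job is one simple way to make sure subscriptions never lapse unnoticed.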