Microsoft Teams - How to Get Your Conversation Data

Practical guide to getting conversation data from Microsoft Teams meetings - covering the Microsoft Graph API, meeting transcription settings, compliance exports, webhook-triggered flows, and how to route structured data into downstream systems.

What you'll learn

  • What conversation data you can extract from Teams - meeting transcripts, attendance, chat context, and recording metadata
  • How to access data via the Microsoft Graph API - authentication, permissions, and endpoints
  • Three extraction patterns: compliance content search, Graph API polling, and webhook-triggered via subscriptions
  • How to connect Teams data pipelines to Zapier, n8n, and Make
  • Advanced use cases - custom scoring, CRM enrichment, compliance, and warehouse analytics

Data

What Data You Can Extract From Microsoft Teams

Microsoft Teams captures more than just the recording. Every meeting produces a set of structured assets that can be extracted via the Graph API - the transcript itself, speaker identification, attendance data, and contextual information about the meeting and its associated calendar event.

Common fields teams care about

Meeting transcript text (full utterance-level transcript with speaker labels)
Speaker identification (participant names mapped to utterances)
Meeting metadata (subject, organizer, start/end time, duration)
Attendee list (who joined, when they joined/left)
Recording URL (link to the meeting recording if enabled)
Chat messages (in-meeting chat and channel messages)
Meeting series context (recurring meeting metadata)
Calendar metadata (organizer, required/optional attendees)
Compliance records (eDiscovery and communication compliance exports)
Channel and team context (which team/channel the meeting belongs to)

API Access

How to Get Transcripts via the Microsoft Graph API

Microsoft Teams exposes meetings and transcripts through the Microsoft Graph API. The workflow is: authenticate via Azure AD, find the meeting via calendar events or online meeting endpoints, then fetch the transcript content for each meeting.

1. Authenticate

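Register an app in Microsoft Entra ID (formerly Azure AD). For app-only access, grant Microsoft Graph application permissions such as OnlineMeetings.Read.All, OnlineMeetingTranscript.Read.All, and CallRecords.Read.All, then use the OAuth 2.0 client credentials flow. App-only access to online meetings also requires a tenant admin to configure an application access policy for the app.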

Authorization: Bearer {access_token}
Requires admin consent for application-level permissions. Contact your Azure AD admin to approve the app registration and grant the required Graph API scopes.
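
As a rough illustration of the client credentials flow described above, the sketch below builds the token request against the Microsoft identity platform v2.0 token endpoint with the `.default` Graph scope. The function names and the tenant, client ID, and secret values are hypothetical placeholders, not part of any Semarize or Microsoft SDK.

```python
import json
import urllib.parse
import urllib.request

# Microsoft identity platform v2.0 token endpoint; tenant_id is your directory ID.
TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
# ".default" requests every application permission already consented for the app.
GRAPH_DEFAULT_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client-credentials token request (nothing is sent here)."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": GRAPH_DEFAULT_SCOPE,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL_TEMPLATE.format(tenant_id=tenant_id),
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Send the request and return the access token (performs a network call)."""
    request = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]


def auth_header(access_token: str) -> dict:
    """Header to attach to every Graph API call."""
    return {"Authorization": f"Bearer {access_token}"}
```

In practice you would cache the token until shortly before it expires rather than requesting a new one per call.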

2. List meeting transcripts

Call the GET /users/{userId}/onlineMeetings/{meetingId}/transcripts endpoint to get a list of transcript objects for a specific meeting. You'll first need to find the meeting via GET /users/{userId}/onlineMeetings?$filter=... or via calendar events.

GET https://graph.microsoft.com/v1.0/users/{userId}/onlineMeetings/{meetingId}/transcripts

The response returns an array of transcript objects. Paginate using @odata.nextLink if the result set spans multiple pages.
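
The pagination described above can be sketched as a small generator that keeps following `@odata.nextLink` until it is absent. The `get_json` callable is an assumption of this sketch: it lets you plug in any HTTP client (or a fake for testing); `graph_get_json` is one possible implementation using only the standard library.

```python
import json
import urllib.request
from typing import Callable, Iterator

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def graph_get_json(url: str, access_token: str) -> dict:
    """GET a Graph URL and decode the JSON body (performs a network call)."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_transcripts(user_id: str, meeting_id: str, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every transcript object for a meeting, following @odata.nextLink."""
    url = f"{GRAPH_BASE}/users/{user_id}/onlineMeetings/{meeting_id}/transcripts"
    while url:
        page = get_json(url)
        yield from page.get("value", [])
        # Graph omits @odata.nextLink on the last page, which ends the loop.
        url = page.get("@odata.nextLink")
```

A call might look like `iter_transcripts(user_id, meeting_id, lambda u: graph_get_json(u, token))`.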

3. Fetch transcript content

For each transcript ID, request the content via GET /users/{userId}/onlineMeetings/{meetingId}/transcripts/{transcriptId}/content. The endpoint returns VTT or DOCX content depending on the $format query parameter. Parse the VTT to extract speaker-labeled utterances with timestamps.

GET https://graph.microsoft.com/v1.0/users/{userId}/onlineMeetings/{meetingId}/transcripts/{transcriptId}/content

The default format is WebVTT (.vtt), which includes timestamps and speaker labels. Use a VTT parser to extract structured utterances. To request a format explicitly, set $format=text/vtt for WebVTT or $format=application/vnd.openxmlformats-officedocument.wordprocessingml.document for DOCX.
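
A minimal parsing sketch follows, assuming the transcript marks speakers with WebVTT voice tags of the form `<v Speaker Name>text</v>`. Cues without a voice tag are kept with an empty speaker. The `Utterance` type and function names are illustrative only.

```python
import re
from dataclasses import dataclass

# <v Speaker Name>utterance</v>; the closing tag is treated as optional.
VOICE_TAG = re.compile(r"<v\s+([^>]+)>(.*?)(?:</v>|$)", re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")


@dataclass
class Utterance:
    start: str
    end: str
    speaker: str
    text: str


def parse_vtt(vtt: str) -> list[Utterance]:
    """Convert WebVTT text into a list of speaker-labeled utterances."""
    utterances = []
    normalized = vtt.replace("\r\n", "\n").replace("\r", "\n")
    for block in re.split(r"\n{2,}", normalized.strip()):
        lines = block.split("\n")
        timing_index = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_index is None:
            continue  # WEBVTT header, NOTE, or STYLE block
        start, _, rest = lines[timing_index].partition("-->")
        end_parts = rest.split()
        end = end_parts[0] if end_parts else ""
        payload = "\n".join(lines[timing_index + 1:])
        match = VOICE_TAG.search(payload)
        if match:
            speaker, text = match.group(1).strip(), match.group(2)
        else:
            speaker, text = "", payload
        # Drop any remaining markup and collapse whitespace across wrapped lines.
        text = " ".join(ANY_TAG.sub("", text).split())
        if text:
            utterances.append(Utterance(start.strip(), end, speaker, text))
    return utterances


def to_plain_text(utterances: list[Utterance]) -> str:
    """Render utterances as 'Speaker: text' lines."""
    return "\n".join(f"{u.speaker or 'Unknown'}: {u.text}" for u in utterances)
```

The plain-text form is usually enough for downstream analysis; keep the `Utterance` objects if you also need timestamps.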

4. Handle permissions and throttling

Rate limits

Graph API has per-app and per-tenant throttling. When you receive a 429 response, back off using the Retry-After header. Implement exponential backoff for bulk operations and batch requests where possible.
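
One way to sketch this backoff: honor `Retry-After` when it is a number of seconds, otherwise fall back to capped exponential delays. The `send` callable and its `(status, headers, body)` return shape are assumptions of this example, chosen so the retry logic stays independent of any particular HTTP client.

```python
import time
from typing import Callable, Optional


def backoff_delay(attempt: int, retry_after: Optional[str], base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next attempt (attempt is zero-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; use exponential backoff instead
    return min(cap, base * (2 ** attempt))


def call_with_backoff(send: Callable[[], tuple], max_attempts: int = 5, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out.

    send() must return (status_code, headers_dict, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

For bulk jobs, adding a small random jitter to each delay helps keep parallel workers from retrying in lockstep.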

Transcription availability

Transcription must be enabled at the tenant level by an admin. Not all meeting types produce transcripts - ad-hoc calls, PSTN calls, and meetings where transcription wasn't started won't have transcript data. Build your pipeline to handle missing transcripts gracefully.

Patterns

Key Extraction Flows

There are three practical patterns for getting transcripts out of Microsoft Teams. The right choice depends on whether you're doing a one-off migration, running ongoing extraction, or need near real-time processing.

Backfill (Historical Export)

One-off migration of past meetings

1. Register an Azure AD app with the required Graph API permissions and obtain admin consent

2. Query calendar events for your target date range via GET /users/{userId}/calendar/events?$filter=start/dateTime ge '...' and start/dateTime le '...'

3. Resolve each calendar event to its corresponding onlineMeeting ID via the joinWebUrl or event metadata

4. List and fetch transcripts for each meeting via the transcripts endpoint. Parse VTT into structured utterances

5. Send each parsed transcript to Semarize for structured analysis, then store results in your data warehouse

Tip: Use $filter on calendar events by date range. Not all meetings produce transcripts - skip gracefully when no transcript is found for a meeting.
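
Steps 2 and 3 of the backfill can be sketched as URL builders. This example assumes the calendar event carries its join link under `onlineMeeting.joinUrl` and that the online meeting is looked up with a `JoinWebUrl eq '...'` filter; the function names and the `$select` field list are illustrative choices.

```python
from typing import Optional
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def calendar_events_url(user_id: str, start_iso: str, end_iso: str) -> str:
    """URL listing a user's calendar events that start inside a date range."""
    odata_filter = f"start/dateTime ge '{start_iso}' and start/dateTime le '{end_iso}'"
    query = urlencode(
        {"$filter": odata_filter, "$select": "id,subject,start,end,onlineMeeting"},
        quote_via=quote,
        safe="$",
    )
    return f"{GRAPH_BASE}/users/{user_id}/calendar/events?{query}"


def join_url_from_event(event: dict) -> Optional[str]:
    """Return the Teams join URL of a calendar event, or None if it has none."""
    online_meeting = event.get("onlineMeeting") or {}
    return online_meeting.get("joinUrl")


def online_meeting_lookup_url(user_id: str, join_url: str) -> str:
    """URL that resolves a join URL to its onlineMeeting object."""
    escaped = join_url.replace("'", "''")  # OData escapes quotes by doubling them
    query = urlencode({"$filter": f"JoinWebUrl eq '{escaped}'"}, quote_via=quote, safe="$")
    return f"{GRAPH_BASE}/users/{user_id}/onlineMeetings?{query}"
```

Events where `join_url_from_event` returns `None` are not Teams meetings and can be skipped before any transcript lookup.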

Incremental Polling

Ongoing extraction on a schedule

1. Set up a scheduled job (cron or Cloud Function) that runs at your desired interval - hourly works well for most teams

2. On each run, query recent calendar events from the last N hours via the Graph API

3. Cross-reference returned meeting IDs with your already-processed set to avoid duplicates

4. Fetch new transcripts, parse VTT to structured text, and route each to Semarize for analysis

5. Store results in your downstream system and update your tracking store with the newly processed meeting IDs

Tip: Track processed meeting IDs in a persistent store to avoid duplicates. Build in a 15-30 minute buffer after meeting end time before attempting to fetch transcripts.
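
The tracking store and the post-meeting buffer can be sketched with SQLite from the standard library. The table name, the 30-minute buffer (the upper end of the range suggested above), and the meeting dictionary shape (`id`, `end`) are assumptions of this example.

```python
import sqlite3
from datetime import datetime, timedelta

# Wait this long after a meeting ends before trying to fetch its transcript.
TRANSCRIPT_BUFFER = timedelta(minutes=30)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the persistent store of processed meeting IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_meetings ("
        "meeting_id TEXT PRIMARY KEY, processed_at TEXT NOT NULL)"
    )
    return conn


def is_processed(conn: sqlite3.Connection, meeting_id: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM processed_meetings WHERE meeting_id = ?", (meeting_id,)
    ).fetchone()
    return row is not None


def mark_processed(conn: sqlite3.Connection, meeting_id: str, now: datetime) -> None:
    """Record a meeting as done; calling it twice for the same ID is harmless."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO processed_meetings VALUES (?, ?)",
            (meeting_id, now.isoformat()),
        )


def select_new_meetings(conn: sqlite3.Connection, meetings: list, now: datetime) -> list:
    """Keep meetings that ended long enough ago and have not been processed yet."""
    return [
        m for m in meetings
        if now - m["end"] >= TRANSCRIPT_BUFFER and not is_processed(conn, m["id"])
    ]
```

Pass a file path to `open_store` in a real job so the processed set survives restarts; the in-memory default is only convenient for trying the logic out.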

Webhook-Triggered (Graph Subscriptions)

Near real-time on call completion

1. Create a Graph subscription for /communications/callRecords. Your endpoint must be a publicly accessible HTTPS URL that can handle the validation handshake

2. When a call record is created, Microsoft sends a webhook notification to your endpoint with the callRecord ID

3. Use the callRecord ID to find the associated meeting, then resolve to the onlineMeeting and fetch the transcript

4. Parse the VTT content and route it to Semarize for structured analysis, then push results to your downstream systems

Note: Graph subscriptions require a publicly accessible HTTPS endpoint and periodic renewal (max 3 days for callRecords). Build automatic renewal logic to avoid missed events.
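
The validation handshake and notification handling can be sketched as a framework-independent function. It assumes Graph validates the endpoint by sending a `validationToken` query parameter that must be echoed back as plain text, and that notifications arrive as a JSON body with a `value` array carrying `clientState` and `resourceData.id`. The function name and return tuple are illustrative.

```python
import json
from urllib.parse import parse_qs


def handle_graph_webhook(query_string: str, body: bytes, expected_client_state: str):
    """Process one POST to the notification URL.

    Returns (status_code, content_type, response_body, call_record_ids).
    """
    params = parse_qs(query_string)
    if "validationToken" in params:
        # Handshake: echo the decoded token back as plain text with a 200.
        return 200, "text/plain", params["validationToken"][0], []

    payload = json.loads(body or b"{}")
    call_record_ids = []
    for notification in payload.get("value", []):
        # Ignore notifications that do not carry the secret set at subscription time.
        if notification.get("clientState") != expected_client_state:
            continue
        resource_id = (notification.get("resourceData") or {}).get("id")
        if resource_id:
            call_record_ids.append(resource_id)
    # Acknowledge quickly; fetch transcripts asynchronously from a queue.
    return 202, "text/plain", "", call_record_ids
```

Wiring this into Flask, FastAPI, or a cloud function is a matter of passing the raw query string and body in and writing the returned status and body out.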

Automation

Send Teams Transcripts to Automation Tools

Once you can extract transcripts from Microsoft Teams, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from Teams trigger through Semarize evaluation to CRM, Slack, or database output.

Zapier - No-code automation

Teams → Zapier → Semarize → CRM

Detect new Teams meetings on a schedule, fetch the transcript via Graph API, send it to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.

Example Zap

  • Trigger: Schedule (Every Hour) - polls for new meetings on interval
      App: Schedule by Zapier
      Event: Every Hour
      Output: timestamp
  • Webhooks by Zapier - list recent meetings via Graph API
      Method: GET
      URL: https://graph.microsoft.com/v1.0/users/{userId}/calendar/events
      Auth: Bearer (OAuth token)
      Query: $filter=start/dateTime ge '{{timestamp}}'
      Then, for each meeting:
  • Webhooks by Zapier - fetch transcript from Graph API
      Method: GET
      URL: .../onlineMeetings/{meetingId}/transcripts/{id}/content
      Auth: Bearer (OAuth token)
      Result: transcript returned
  • Webhooks by Zapier - POST /v1/runs (sync) to Semarize
      Method: POST
      URL: https://api.semarize.com/v1/runs
      Auth: Bearer smz_live_...
      Body: { kit_code, mode: "sync", input: { transcript } }
      Result: structured output returned
  • Formatter by Zapier - extract brick values from Semarize response
      Extract: bricks.overall_score.value
      Extract: bricks.risk_flag.value
      Extract: bricks.pain_point.value
  • Salesforce - Update Record: write scored signals to Opportunity
      Object: Opportunity
      AI Score: {{overall_score}}
      Risk Flag: {{risk_flag}}
      Pain Point: {{pain_point}}

Setup steps

1. Create a new Zap. Choose "Schedule by Zapier" as the trigger and set it to run every hour. This will poll for new meetings.

2. Add a "Webhooks by Zapier" Action (Custom Request) to list recent calendar events from the Graph API. Use your Microsoft OAuth connection or pass a Bearer token. Filter events by start time since the last run.

3. Add another "Webhooks by Zapier" Action to fetch the transcript content for each meeting. Call the Graph API transcript content endpoint with the meeting ID and transcript ID.

4. Add a third "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. Set kit_code, mode to "sync", and map the parsed transcript text into input.transcript.

5. Add a Formatter step to extract individual brick values from the Semarize JSON response - overall_score, risk_flag, pain_point, etc.

6. Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record. Test each step end-to-end, then turn on the Zap.

Watch out for: OAuth token refresh. Use Zapier's built-in Microsoft connection or store refresh tokens for API calls. VTT transcripts need parsing before sending to Semarize - add a Code step to convert VTT to plain text if needed.
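
For readers scripting the same calls outside Zapier, the Semarize request and the brick extraction can be sketched as below. The request body follows the shape shown in the flow above; the response layout (`bricks.<name>.value`) is inferred from the extraction paths in this guide, and the kit code and function names are hypothetical.

```python
import json
import urllib.request

SEMARIZE_RUNS_URL = "https://api.semarize.com/v1/runs"


def build_run_request(api_key: str, kit_code: str, transcript: str) -> urllib.request.Request:
    """Build a synchronous run request (nothing is sent here)."""
    payload = {"kit_code": kit_code, "mode": "sync", "input": {"transcript": transcript}}
    return urllib.request.Request(
        SEMARIZE_RUNS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def extract_brick_values(response: dict, names: list) -> dict:
    """Pull bricks.<name>.value for each requested brick; missing bricks map to None."""
    bricks = response.get("bricks") or {}
    return {name: (bricks.get(name) or {}).get("value") for name in names}
```

Sending the request is a single `urllib.request.urlopen(request)` call followed by `json.load` on the response.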
n8n - Self-hosted workflows

Teams → n8n → Semarize → Database

Poll Microsoft Graph for new meetings on a schedule, fetch transcripts, parse VTT to text, send each one to Semarize for analysis, then write the structured scores and signals to your database. n8n's built-in Microsoft Graph node handles OAuth automatically.

Example Workflow

  • Cron - Every Hour: triggers the workflow on schedule
      Mode: Every Hour
      Timezone: UTC
  • Microsoft Graph - List Calendar Events: GET /users/{userId}/calendar/events
      Resource: Calendar Event
      Operation: Get All
      Filter: start/dateTime ge {{$now.minus(1, 'hour')}}
      Then, for each meeting:
  • HTTP Request - Get Transcript: GET .../transcripts/{id}/content
      Auth: OAuth2 (Microsoft Graph)
      URL: .../onlineMeetings/{meetingId}/transcripts/{id}/content
  • Code - Parse VTT to Text: extract speaker-labeled utterances
      Parse: VTT → plain text with speakers
  • HTTP Request - Semarize: POST /v1/runs (sync)
      URL: https://api.semarize.com/v1/runs
      Auth: Bearer smz_live_...
      Body: { kit_code, mode: "sync", input: { transcript } }
      Result: scores & signals returned
  • Postgres - Insert Row: write structured output to database
      Table: call_evaluations
      Columns: meeting_id, score, risk_flag, pain_point

Setup steps

1. Add a Cron node as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most teams).

2. Add a Microsoft Graph node to list recent calendar events. Use n8n's built-in Microsoft Graph credential - it handles OAuth token refresh automatically.

3. Add a Split In Batches node to iterate over the returned events. Filter for events that have online meeting URLs to identify Teams meetings.

4. Add an HTTP Request node to resolve each calendar event to its onlineMeeting ID, then fetch the transcript content from the transcripts endpoint.

5. Add a Code node (JavaScript) to parse the VTT content into plain text with speaker labels. Extract timestamps, speaker names, and utterance text.

6. Add another HTTP Request node to send the parsed transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token.

7. Add a Code node to extract the brick values from the Semarize response - overall_score, risk_flag, pain_point, evidence, confidence.

8. Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use meeting_id as the primary key for upserts. Activate the workflow and monitor the first few runs.

Watch out for: n8n has a built-in Microsoft Graph node. Use it for auth instead of raw HTTP requests - it handles token refresh and pagination automatically. Track processed meeting IDs to avoid reprocessing.
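
The upsert keyed on meeting_id from step 8 can be illustrated as follows. SQLite stands in for Postgres here so the sketch runs without a server; the `ON CONFLICT ... DO UPDATE` clause has the same shape in both. The column types are assumptions based on the column list above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS call_evaluations (
    meeting_id TEXT PRIMARY KEY,
    score      REAL,
    risk_flag  INTEGER,
    pain_point TEXT
)
"""

# Insert a new row, or overwrite the scores if the meeting was already stored.
UPSERT = """
INSERT INTO call_evaluations (meeting_id, score, risk_flag, pain_point)
VALUES (:meeting_id, :score, :risk_flag, :pain_point)
ON CONFLICT(meeting_id) DO UPDATE SET
    score = excluded.score,
    risk_flag = excluded.risk_flag,
    pain_point = excluded.pain_point
"""


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def upsert_evaluation(conn: sqlite3.Connection, row: dict) -> None:
    """Write one evaluation; reprocessing a meeting updates it in place."""
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, row)
```

Because the write is idempotent, a workflow run that reprocesses a meeting does not create duplicate rows.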
Make - Visual automation with branching

Teams → Make → Semarize → CRM + Slack

Fetch new Teams meeting transcripts on a schedule, send each to Semarize for structured analysis, then use a Router to branch the scored output - alert on risk flags via Slack and write all signals to your CRM.

Example Scenario

  • Schedule - Every 30 min: triggers the scenario on interval
      Interval: 30 minutes
  • HTTP - List Recent Meetings: GET /users/{userId}/calendar/events
      Method: GET
      Auth: OAuth2 (Microsoft Graph)
      Query: $filter=start/dateTime ge {{formatDate(...)}}
  • HTTP - Fetch Transcript: GET .../transcripts/{id}/content
      Iterator: for each meeting in response
      URL: .../onlineMeetings/{meetingId}/transcripts/{id}/content
  • HTTP - Semarize: POST /v1/runs (sync)
      URL: https://api.semarize.com/v1/runs
      Auth: Bearer smz_live_...
      Body: { kit_code, mode: "sync", input: { transcript } }
      Result: structured output
  • Router - Branch on Risk Flag: route by Semarize output
      Branch 1: IF risk_flag.value = true
      Branch 2: ALL (fallthrough)
  • Branch 1 (risk detected): Slack - Alert Channel, notify team about flagged meeting
      Channel: #deal-alerts
      Message: Risk on {{meeting_id}}, score: {{score}}
  • Branch 2 (all meetings): Salesforce - Update Record, write all scored signals to Opportunity
      AI Score: {{overall_score}}
      Risk Flag: {{risk_flag}}
      Pain Point: {{pain_point}}

Setup steps

1. Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (15-60 minutes is typical).

2. Add an HTTP module to list recent calendar events from the Graph API. Configure OAuth2 authentication with your Microsoft app registration credentials.

3. Add an Iterator module to loop through each calendar event. Filter for events with online meeting URLs to identify Teams meetings.

4. Add an HTTP module to fetch the transcript content for each meeting. Resolve the calendar event to its onlineMeeting ID, then fetch from the transcripts endpoint.

5. Add another HTTP module to send the transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript. Parse the response as JSON.

6. Add a Router module. Define Branch 1 with a filter: bricks.risk_flag.value equals true. Leave Branch 2 as a fallthrough (no filter).

7. On Branch 1, add a Slack module to alert your team when risk is detected. Map the score, risk flag, and meeting ID into the message.

8. On Branch 2, add a Salesforce module to write all brick values (score, risk_flag, pain_point) to the Opportunity record. Set the scenario schedule and activate.

Watch out for: Graph API pagination. Each page returns up to 100 results. Use Make's repeater module to handle @odata.nextLink for large result sets. Also handle VTT parsing - add a Text Parser module before sending to Semarize.
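
The Router logic from the scenario can be expressed in a few lines, which may help when checking the filter conditions. This sketch assumes the same `bricks.<name>.value` response shape used elsewhere in this guide; the action tuples are placeholders for the Slack and Salesforce modules.

```python
def route_result(meeting_id: str, bricks: dict) -> list:
    """Return the actions the Router would trigger for one scored meeting."""
    def value(name):
        return (bricks.get(name) or {}).get("value")

    actions = []
    # Branch 1: only when the risk flag is exactly true.
    if value("risk_flag") is True:
        message = f"Risk on {meeting_id}, score: {value('overall_score')}"
        actions.append(("slack", "#deal-alerts", message))
    # Branch 2: no filter, so every meeting is written to the CRM.
    actions.append(("crm", meeting_id, {
        "overall_score": value("overall_score"),
        "risk_flag": value("risk_flag"),
        "pain_point": value("pain_point"),
    }))
    return actions
```

Note that a flagged meeting produces both actions, matching a Router whose second branch has no filter.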

What you can build

What You Can Do With Teams Data in Semarize

When conversation data becomes structured, grounded, and programmable, new possibilities open up. Here's what you can build.

Compliance Playbook Verification

Grounded Compliance Automation

What Semarize generates

policy_adherence = 0.94
disclosure_given = true
unauthorized_promise = false
regulatory_reference_accurate = true

Your legal team maintains a 50-page compliance playbook that every customer-facing call must follow. You run a compliance kit grounded against this document on every Teams meeting. Semarize checks whether required disclosures were given, whether any unauthorized commitments were made, and whether regulatory references were accurate. When auditors arrive, you export a structured evidence report — not recordings, but typed compliance signals with confidence scores and exact text span evidence for every call.

Compliance Verification (grounded against: Compliance Playbook v3.1)

  • Required disclosure given - confidence: 0.96 - "...I need to inform you..."
  • No unauthorized promises - confidence: 0.91 - No violations detected
  • Regulatory refs accurate - confidence: 0.89 - "...under Section 12B..."
  • Data handling notice - confidence: 0.94 - Not mentioned in call

3 of 4 checks passed. Overall: 94%

Decision & Action Item Extraction

Meeting Intelligence

What Semarize generates

decisions_made = 4
owners_assigned = true
deadlines_set = false
follow_up_needed = true

Important decisions get lost in hour-long Teams meetings. You run a decision extraction kit on leadership meetings that pulls out every decision, the owner, the deadline (if stated), and any open questions. The output pipes into Jira automatically — decisions become tracked tickets with owners. "I didn't know about that decision" is no longer a valid excuse because every meeting produces a structured, machine-readable decision log.

Meeting Decision Log - Q1 Planning (4 decisions extracted)

Decision                      Owner      Deadline
Expand to APAC in Q3          Sarah K.   2026-03-15
Hire 2 SDRs for enterprise    Tom W.     2026-02-28
Deprecate legacy pricing      James M.   Not set
Review SOC 2 timeline         Priya R.   Not set

2 decisions missing deadlines

Onboarding Quality Scoring

Customer Success

What Semarize generates

setup_steps_covered = 0.80
customer_confusion_detected = true
time_to_value_risk = "medium"
handoff_clarity = 0.65

Your CS team runs onboarding calls on Teams. Onboarding quality is measured by CSAT surveys weeks later — too late to intervene. With Semarize, every onboarding call is scored against your onboarding playbook. You know immediately whether the CSM covered all setup steps, whether the customer showed signs of confusion, and how clearly the next steps were communicated. At-risk onboardings get flagged for same-week intervention, reducing time-to-value by 3 weeks on average.

Onboarding Quality - Acme Corp (grounded against: Onboarding Playbook)

Setup Coverage: 80%

  • Account setup walkthrough - completed
  • Integration configuration - completed
  • Advanced features overview - missed
  • Customer showed confusion - flagged at 23:41

Handoff Clarity: 65%

Flag for follow-up: advanced features session needed

Knowledge Gap Detector

Sales Enablement

Vibe-coded

What Semarize generates

accuracy_score = 0.73
incorrect_claims = 2
knowledge_gaps = "api_rate_limits"
confidence = 0.88

An enablement manager vibe-codes a React dashboard that runs a knowledge accuracy kit — grounded against your product documentation — on every customer call recorded in Teams. The dashboard shows which product areas reps consistently get wrong. "Pricing tiers" shows up red across 4 reps. "API rate limits" is amber for the whole team. Instead of guessing what to train on, the next enablement session targets the exact knowledge gaps the data revealed.

Knowledge Accuracy by Topic (vibe-coded with Next.js)

Topic          Sarah K.   James M.   Priya R.   Tom W.
Pricing        52%        48%        61%        45%
API Limits     68%        55%        59%        63%
Security       85%        72%        90%        77%
Integrations   78%        81%        74%        88%

Legend: < 60, 60-80, > 80
Grounded against: Product Docs v4.2 · 47 calls analyzed

Watch out for

Common Challenges & Gotchas

These are the issues that come up most often when teams start extracting transcripts from Microsoft Teams at scale.

Transcription must be enabled at the tenant level

If the admin hasn't enabled transcription, no transcript data will be generated for meetings. This is a tenant-wide policy setting that must be configured before any transcript data becomes available.

Permissions require Azure AD admin consent

Application-level permissions for Microsoft Graph need admin consent, which can be slow in enterprise environments. Plan for lead time when setting up integrations.

Not all meeting types produce transcripts

Ad-hoc calls, PSTN calls, and channel meetings without transcription enabled won't have transcript data. Build your pipeline to skip gracefully when no transcript is found.

VTT format requires parsing

Transcripts come in WebVTT format by default, which needs parsing to extract plain text with speaker labels. You'll need a VTT parser or custom code to convert to structured text.

Graph API throttling

Microsoft Graph enforces per-app and per-tenant rate limits. Implement exponential backoff for 429 responses and batch requests where possible to stay within limits.

Meeting ID resolution

Finding the correct onlineMeeting ID from a calendar event requires additional API calls and can be non-trivial for recurring meetings. Each occurrence has a different meeting ID.

Subscription renewal

Graph webhook subscriptions expire after a short period (max 3 days for call records) and must be renewed programmatically. Build renewal logic into your pipeline to avoid missed events.
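
Renewal logic can be sketched as two small helpers: one decides when a subscription is close enough to expiry to renew, the other builds the PATCH that extends it. The 2-day lifetime and 12-hour margin are illustrative choices that stay under the 3-day limit mentioned above, and `now` is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timedelta

GRAPH_BASE = "https://graph.microsoft.com/v1.0"
# Illustrative values: request less than the maximum, renew well before expiry.
RENEWED_LIFETIME = timedelta(days=2)
RENEW_MARGIN = timedelta(hours=12)


def needs_renewal(expiration: datetime, now: datetime) -> bool:
    """True when the subscription expires within the renewal margin."""
    return expiration - now <= RENEW_MARGIN


def renewal_request(subscription_id: str, now: datetime):
    """Return (method, url, json_body) for extending a subscription.

    `now` must be expressed in UTC, since the timestamp is written with a Z suffix.
    """
    new_expiration = now + RENEWED_LIFETIME
    body = {"expirationDateTime": new_expiration.strftime("%Y-%m-%dT%H:%M:%SZ")}
    return "PATCH", f"{GRAPH_BASE}/subscriptions/{subscription_id}", body
```

Running `needs_renewal` from the same scheduler that drives polling keeps renewal from depending on webhook traffic arriving.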
