Fireflies.ai - How to Get Your Meeting Data
A practical guide to getting your meeting data out of Fireflies.ai - covering GraphQL API access, transcript extraction, batch polling, webhook-triggered flows, and how to route structured data into your downstream systems.
What you'll learn
- What meeting data you can extract from Fireflies.ai - transcripts, sentences, speaker labels, audio/video URLs, and meeting metadata
- How to access data via the Fireflies GraphQL API - API key authentication, queries, and pagination
- Two extraction patterns: batch polling and webhook-triggered processing
- How to connect Fireflies data pipelines to Zapier, n8n, and Make
- Advanced use cases - meeting intelligence, deal tracking, speaker analysis, and custom dashboards
What Data You Can Extract From Fireflies.ai
Fireflies.ai is an AI meeting assistant that joins your calls, records them, and generates transcripts automatically. Every meeting produces a rich set of structured data that can be extracted via the GraphQL API - the full transcript, per-sentence speaker labels, timing metadata, audio and video URLs, and contextual meeting information.
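Common fields teams care about
- Transcript ID, title, and date
- Duration, organizer email, and participants
- Sentences with speaker_name, text, start_time, and end_time
- audio_url and video_url (time-limited download links)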
How to Get Transcripts via the Fireflies GraphQL API
Fireflies exposes meeting data through a GraphQL API at api.fireflies.ai/graphql. The workflow is: authenticate with an API key, query transcripts with filters, then extract the fields you need from the response.
Authenticate
Fireflies uses API key authentication. Generate your API key from app.fireflies.ai/integrations/custom/fireflies. Pass it as a Bearer token in the Authorization header on every request.
Authorization: Bearer <your_fireflies_api_key>
Content-Type: application/json
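A minimal Python sketch of an authenticated request, using only the standard library. It assumes the caller passes the key in (in practice you would read it from an environment variable or secret store) and, following the usual GraphQL convention, treats an errors array in the response body as a failure even when the HTTP status is 200. The function names are illustrative and not part of any Fireflies SDK.

```python
import json
import urllib.request
from typing import Optional

FIREFLIES_GRAPHQL_URL = "https://api.fireflies.ai/graphql"


def build_graphql_request(
    api_key: str, query: str, variables: Optional[dict] = None
) -> urllib.request.Request:
    """Build a POST request with Bearer auth and a JSON GraphQL body."""
    body = {"query": query}
    if variables:
        body["variables"] = variables
    return urllib.request.Request(
        FIREFLIES_GRAPHQL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(api_key: str, query: str, variables: Optional[dict] = None) -> dict:
    """Send the request and return the "data" object; raise on GraphQL errors."""
    request = build_graphql_request(api_key, query, variables)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # GraphQL servers can report errors in the body alongside an HTTP 200.
    if payload.get("errors"):
        raise RuntimeError(f"GraphQL errors: {payload['errors']}")
    return payload["data"]
```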
List transcripts
Use the transcripts query to list meetings. You can filter by date, organizer email, or other fields. Results are returned as a paginated array.
POST https://api.fireflies.ai/graphql
{
  "query": "query { transcripts { id title date duration organizer_email participants } }"
}
The response returns an array of transcript objects with id, title, date, duration, and participant information. Use the IDs to fetch detailed transcript content in the next step.
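To page through a large account, a loop can request fixed-size pages until a short page comes back. This sketch assumes the transcripts query accepts limit and skip arguments with a maximum page size of 50; confirm those names against the current Fireflies schema before relying on them. run_query stands for any function that posts a query plus variables and returns the decoded JSON response body. Every page is one request against your daily limit.

```python
from typing import Callable, Iterator

# Assumption: transcripts(limit, skip) returns at most 50 results per call.
LIST_TRANSCRIPTS = """
query ListTranscripts($limit: Int, $skip: Int) {
  transcripts(limit: $limit, skip: $skip) {
    id title date duration organizer_email participants
  }
}
"""
PAGE_SIZE = 50


def iter_transcripts(
    run_query: Callable[[str, dict], dict], page_size: int = PAGE_SIZE
) -> Iterator[dict]:
    """Yield transcript summaries page by page; each page costs one API request."""
    skip = 0
    while True:
        body = run_query(LIST_TRANSCRIPTS, {"limit": page_size, "skip": skip})
        page = body["data"]["transcripts"] or []
        yield from page
        if len(page) < page_size:  # a short (or empty) page means we reached the end
            return
        skip += page_size
```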
Fetch the full transcript
For each transcript ID, query the transcript field with the specific fields you need. The sentences array gives you per-sentence granularity with speaker labels.
POST https://api.fireflies.ai/graphql
{
  "query": "query Transcript($id: String!) { transcript(id: $id) { id title date duration organizer_email participants audio_url video_url sentences { speaker_name text start_time end_time } } }",
  "variables": {
    "id": "abc123def456"
  }
}
Each object in the sentences array includes speaker_name, text, start_time, and end_time. Concatenate sentence text for a plain transcript, or preserve the structured format for per-speaker analysis.
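Both options can be sketched in a few lines: one function folds consecutive sentences from the same speaker into a single labeled turn, the other totals speaking time per speaker from start_time and end_time. Sentences without a speaker_name fall back to a hypothetical "Unknown speaker" label.

```python
from collections import defaultdict

UNKNOWN = "Unknown speaker"  # hypothetical fallback label


def to_plain_transcript(sentences: list) -> str:
    """Merge consecutive sentences by the same speaker into 'Speaker: text' lines."""
    turns = []  # each entry: [speaker, [sentence texts]]
    for s in sentences:
        speaker = s.get("speaker_name") or UNKNOWN
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(s["text"])
        else:
            turns.append([speaker, [s["text"]]])
    return "\n".join(f"{speaker}: {' '.join(texts)}" for speaker, texts in turns)


def talk_time_by_speaker(sentences: list) -> dict:
    """Sum end_time - start_time per speaker, in whatever unit the API returns."""
    totals = defaultdict(float)
    for s in sentences:
        totals[s.get("speaker_name") or UNKNOWN] += s["end_time"] - s["start_time"]
    return dict(totals)
```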
Handle rate limits and media URLs
Rate limits
Fireflies enforces daily request limits that vary by plan. Free and Pro plans allow 50 requests per day, while Business plans and above offer higher or unlimited limits. Each GraphQL query counts as one request regardless of how many fields you request. Plan your extraction to stay within limits, especially during backfills.
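For backfills, one simple way to respect the daily cap is to split the transcript IDs into per-day batches up front. This sketch assumes the 50-request Free/Pro limit, one fetch request per transcript, and a configurable number of requests reserved each day for the listing query; the reservation count is an assumption to tune to how many list pages you actually fetch.

```python
def plan_backfill(
    transcript_ids: list, daily_limit: int = 50, reserved_per_day: int = 1
) -> list:
    """Split IDs into daily batches: one fetch per ID plus reserved listing calls."""
    per_day = daily_limit - reserved_per_day
    if per_day <= 0:
        raise ValueError("daily_limit must be larger than reserved_per_day")
    return [transcript_ids[i:i + per_day] for i in range(0, len(transcript_ids), per_day)]
```

With 120 IDs and the defaults, this produces three days of work: 49, 49, and 22 fetches.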
Media URL expiration
The audio_url and video_url fields return time-limited download links that expire after approximately 24 hours. If your workflow needs the audio or video files, download them immediately upon retrieval - don't store the URL for later use.
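Because the links expire, a pipeline can stream each file to disk in the same run that fetched the URL. A standard-library sketch using urllib and shutil; it returns None when the URL field is null (for example, video_url on plans without video access). The destination paths are up to you.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Optional


def download_media(url: Optional[str], dest: Path, timeout: float = 60) -> Optional[Path]:
    """Stream a short-lived media URL to disk right away; skip when the URL is null."""
    if not url:
        return None  # e.g. video_url is null on plans without video access
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks instead of loading into memory
    return dest
```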
Key Extraction Flows
There are two practical patterns for getting transcripts out of Fireflies.ai. The right choice depends on whether you need a one-off backfill or ongoing batch polling, or near real-time processing via webhooks.
Batch Polling (Backfill & Ongoing)
Scheduled extraction of transcripts
Set up a scheduled trigger (daily or hourly) that runs your extraction script. For historical backfills, run in daily batches to stay within API rate limits
Run the transcripts query via GraphQL to list recent meetings. Filter by date to fetch only transcripts created since your last poll
For each transcript ID returned, fetch the full transcript including sentences, speaker labels, and metadata. Each fetch counts as one API request
Store each transcript with its metadata (transcript ID, date, participants, duration) in your data warehouse or object store
Route stored transcripts to your analysis pipeline - Semarize for structured scoring, your CRM for enrichment, or a dashboard for reporting
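The date filter depends on remembering where the last poll left off. A minimal watermark sketch, assuming the date field is a numeric timestamp (such as epoch milliseconds) and using a local JSON file as a hypothetical state store:

```python
import json
from pathlib import Path

STATE_FILE = Path("fireflies_poll_state.json")  # hypothetical state location


def load_watermark(path: Path = STATE_FILE) -> float:
    """Return the newest transcript date already processed (0 on first run)."""
    if path.exists():
        return json.loads(path.read_text())["last_seen"]
    return 0.0


def save_watermark(value: float, path: Path = STATE_FILE) -> None:
    path.write_text(json.dumps({"last_seen": value}))


def select_new(transcripts: list, last_seen: float) -> tuple:
    """Keep transcripts strictly newer than the watermark, oldest first."""
    fresh = sorted(
        (t for t in transcripts if t["date"] > last_seen), key=lambda t: t["date"]
    )
    new_watermark = fresh[-1]["date"] if fresh else last_seen
    return fresh, new_watermark
```

Call save_watermark only after every transcript in fresh has been stored, so a failed run is retried on the next poll instead of silently skipped.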
Webhook-Triggered
Near real-time on transcription completion
Register a webhook endpoint in your Fireflies settings or use the Zapier "Transcription Complete" trigger. Fireflies fires an event when a meeting transcript is ready
When the webhook fires, parse the event payload to extract the transcript ID and basic meeting metadata
Immediately fetch the full transcript via the GraphQL API using the transcript ID from the event payload
Route the transcript and metadata downstream - to Semarize for structured analysis, your CRM for enrichment, or Slack for notifications
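A sketch of the receiving side for a self-hosted endpoint. It assumes the payload is JSON with a meetingId field holding the transcript ID and an eventType of "Transcription completed"; confirm both against the payload your account actually sends. The handler acknowledges immediately and queues the ID, so the GraphQL fetch runs outside the webhook request.

```python
import json
import queue
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Assumption: payload looks like {"meetingId": "...", "eventType": "Transcription completed"}.
EXPECTED_EVENT = "Transcription completed"
pending_ids: "queue.Queue[str]" = queue.Queue()  # consumed by a separate fetch worker


def transcript_id_from_webhook(raw_body: bytes) -> Optional[str]:
    """Return the transcript ID for a completed-transcription event, else None."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # invalid JSON or undecodable bytes
        return None
    if not isinstance(payload, dict) or payload.get("eventType") != EXPECTED_EVENT:
        return None
    meeting_id = payload.get("meetingId")
    return meeting_id if isinstance(meeting_id, str) and meeting_id else None


class FirefliesWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        transcript_id = transcript_id_from_webhook(self.rfile.read(length))
        if transcript_id:
            pending_ids.put(transcript_id)
        # Acknowledge quickly; the GraphQL fetch happens in the worker.
        self.send_response(204)
        self.end_headers()
```

For local testing, serve it with http.server.HTTPServer(("", 8000), FirefliesWebhookHandler).serve_forever().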
Send Fireflies Transcripts to Automation Tools
Once you can extract transcripts from Fireflies, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from Fireflies trigger through Semarize evaluation to CRM, Slack, or database output.
Fireflies → Zapier → Semarize → CRM
Detect new Fireflies transcriptions, fetch the full transcript, send it to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.
Setup steps
Create a new Zap. Choose Fireflies.ai as the trigger app and select "Transcription Complete" as the event. Connect your Fireflies account.
Add a "Webhooks by Zapier" Action (Custom Request) to fetch the full transcript from Fireflies. Set method to POST, URL to https://api.fireflies.ai/graphql, add your Bearer token, and send a GraphQL transcript query that passes the transcript ID from the trigger as its id variable.
Add a Code step or Formatter to concatenate the sentences array into a plain text transcript. Join each sentence's text, prefixed by speaker_name.
Add a second "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. In the body, set kit_code to your Kit, mode to "sync", and map the transcript text into input.transcript.
Add a Formatter step to extract individual brick values from the Semarize JSON response - overall_score, risk_flag, action_items, etc.
Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record. Test each step end-to-end, then turn on the Zap.
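If you use a Code step, or run the same logic outside Zapier, the Semarize request from the second "Webhooks by Zapier" action can be built like this. The URL, kit_code, mode, and input.transcript fields come from this guide; the Content-Type header, the timeout, and the function names are assumptions.

```python
import json
import urllib.request

SEMARIZE_RUNS_URL = "https://api.semarize.com/v1/runs"


def build_semarize_run(api_key: str, kit_code: str, transcript: str) -> urllib.request.Request:
    """POST body: kit_code, mode "sync", and the transcript under input.transcript."""
    body = {"kit_code": kit_code, "mode": "sync", "input": {"transcript": transcript}}
    return urllib.request.Request(
        SEMARIZE_RUNS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumption: JSON request body
        },
        method="POST",
    )


def run_kit(api_key: str, kit_code: str, transcript: str) -> dict:
    """Send the run synchronously and return the parsed JSON response."""
    request = build_semarize_run(api_key, kit_code, transcript)
    with urllib.request.urlopen(request, timeout=120) as resp:  # hypothetical timeout
        return json.load(resp)
```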
Fireflies → n8n → Semarize → Database
Poll Fireflies for new transcripts on a schedule, fetch each one via GraphQL, send to Semarize for analysis, then write the structured scores and signals to your database. n8n's native loop support handles pagination and batch processing.
Setup steps
Add a Cron node as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most teams, but daily may be better for Free/Pro plans to conserve API requests).
Add an HTTP Request node to list new transcripts from Fireflies. Set method to POST, URL to https://api.fireflies.ai/graphql, configure Bearer auth, and send a GraphQL query for recent transcripts.
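Add a Code node to filter results to only transcripts newer than your last successful poll. Store the last poll timestamp in n8n's workflow static data ($getWorkflowStaticData('global') inside a Code node) or in an external store. Static data is only saved for production executions of an active workflow, not for manual test runs.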
Add a Split In Batches node to iterate over the returned transcript IDs. Inside the loop, add an HTTP Request node to fetch each full transcript via GraphQL.
Add a Code node (JavaScript) to reassemble the sentences array into a single transcript string. Join each sentence's text, prefixed by speaker_name.
Add another HTTP Request node to send the transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token. Set kit_code, mode to "sync", and map the transcript into input.transcript.
Add a Code node to extract the brick values from the Semarize response - overall_score, risk_flag, action_items, evidence, confidence.
Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use transcript_id as the primary key for upserts. Activate the workflow and monitor the first few runs.
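The extraction and upsert steps can be prototyped locally with sqlite3 standing in for Postgres. This sketch assumes the Semarize response nests each brick under bricks.<name>.value, the same path the Make Router filter in this guide uses; the brick names, table, and columns are hypothetical. The INSERT ... ON CONFLICT (transcript_id) DO UPDATE statement works in both SQLite (3.24+) and Postgres; with a Postgres driver you would switch the ? placeholders to that driver's style.

```python
import json
import sqlite3

# Hypothetical brick names from this guide's examples; use your Kit's names.
BRICKS = ("overall_score", "risk_flag", "action_items")

SCHEMA = """
CREATE TABLE IF NOT EXISTS meeting_scores (
    transcript_id TEXT PRIMARY KEY,
    overall_score REAL,
    risk_flag INTEGER,
    action_items TEXT,
    raw_json TEXT
)
"""

UPSERT = """
INSERT INTO meeting_scores (transcript_id, overall_score, risk_flag, action_items, raw_json)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (transcript_id) DO UPDATE SET
    overall_score = excluded.overall_score,
    risk_flag = excluded.risk_flag,
    action_items = excluded.action_items,
    raw_json = excluded.raw_json
"""


def flatten_bricks(response: dict) -> dict:
    """Pull bricks.<name>.value for each brick; missing bricks become None."""
    bricks = response.get("bricks") or {}
    return {name: (bricks.get(name) or {}).get("value") for name in BRICKS}


def upsert_scores(conn: sqlite3.Connection, transcript_id: str, response: dict) -> None:
    """Insert or overwrite the row keyed by transcript_id."""
    row = flatten_bricks(response)
    conn.execute(
        UPSERT,
        (
            transcript_id,
            row["overall_score"],
            row["risk_flag"],
            json.dumps(row["action_items"]),  # store lists as JSON text
            json.dumps(response),  # keep the raw response for audits
        ),
    )
    conn.commit()
```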
Fireflies → Make → Semarize → CRM + Slack
Fetch new Fireflies transcripts on a schedule, send each to Semarize for structured analysis, then use a Router to branch the scored output - alert on risk flags via Slack and write all signals to your CRM.
Setup steps
Create a new Scenario and set its schedule to run at regular intervals (15-60 minutes is typical).
Add an HTTP module to list new transcripts from Fireflies. Set method to POST, URL to https://api.fireflies.ai/graphql, configure Bearer auth, and send a GraphQL query for recent transcripts.
Add an Iterator module to loop through each transcript. For each, add an HTTP module to fetch the full transcript via GraphQL with the transcript ID.
Add a Text Aggregator or Tools module to concatenate the sentences array into plain text. Join speaker_name and text for each sentence.
Add another HTTP module to send the transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript from the previous step. Parse the response as JSON.
Add a Router module. Define Branch 1 with a filter: bricks.risk_flag.value equals true. Leave Branch 2 as a fallthrough (no filter).
On Branch 1, add a Slack module to alert your team when risk is detected. Map the score, risk flag, and meeting title into the message.
On Branch 2, add a Salesforce module to write all brick values (score, risk_flag, action_items) to the Opportunity record. Set the scenario schedule and activate.
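The Router's two branches reduce to a small decision function, which can be handy for testing filters before wiring up the scenario. This sketch assumes the bricks.<name>.value response shape used in the Branch 1 filter and an illustrative Slack message format; it returns what each branch would receive rather than calling Slack or Salesforce.

```python
RISK_BRICK = "risk_flag"
CRM_BRICKS = ("overall_score", "risk_flag", "action_items")  # hypothetical brick names


def brick_value(response: dict, name: str):
    """Read bricks.<name>.value, returning None when any level is missing."""
    return ((response.get("bricks") or {}).get(name) or {}).get("value")


def route(response: dict, meeting_title: str) -> dict:
    """Mimic the Router: every run goes to the CRM; flagged runs also alert Slack."""
    actions = {"crm": {name: brick_value(response, name) for name in CRM_BRICKS}}
    if brick_value(response, RISK_BRICK) is True:  # Branch 1 filter: risk_flag.value == true
        score = brick_value(response, "overall_score")
        actions["slack"] = f"Risk flagged on '{meeting_title}' (score: {score})"
    return actions
```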
What You Can Do With Fireflies Data in Semarize
Custom scoring frameworks, multi-meeting deal tracking, speaker performance benchmarking, and building your own tools on structured meeting signals.
Feature Claim Verification at Scale
Source-of-Truth Grounded QA
Your team runs sales meetings every day. Feature claims, integration capabilities, and pricing references get stated on every call — but are they accurate? Run a knowledge-grounded kit against your product documentation on every meeting. Semarize verifies each feature claim, integration capability statement, and pricing reference against the source of truth. After 200 meetings, the data shows reps consistently overstate API rate limits and misquote the enterprise tier’s SSO configuration. Product marketing gets a weekly accuracy report targeting the exact knowledge gaps — messaging corrections happen within days, not after the next QBR.
Meeting Outcome Accountability Scoring
Decision Quality & Follow-Through
Your company runs product, engineering, and customer success meetings, and all of them are recorded. Leadership wants to know which meetings actually produce outcomes with accountability. Run every meeting through an outcome accountability kit. Semarize scores each for committed_decisions (decisions with explicit owners and deadlines), owner_assignment_rate, decision_evidence_quality (was the decision grounded in data?), and follow_through_score (did the previous meeting’s decisions get referenced?). A quarterly report shows that engineering standups produce 3.2 committed decisions per hour but CS team meetings produce 0.8, and just 45% of CS decisions have assigned owners. The CS director restructures meetings around decision-forcing frameworks with explicit ownership. Accountability scores double within a month.
Stale Battlecard Detection
Competitive Intel Currency Scoring
Competitive battlecards get updated quarterly — but are reps actually using the current version on calls? Run a knowledge-grounded kit against your latest competitive intelligence docs on every sales meeting. Semarize checks each competitive claim against the current battlecard: is the competitor pricing they quoted still accurate? Did they use the approved positioning statement? Did they miss a key differentiator? After 300 meetings, the data shows 2 battlecard sections are cited incorrectly in 40% of competitive conversations. Product marketing updates those sections and measures adoption the following week.
Custom Meeting ROI Calculator
Cost-per-Decision Analysis
A COO vibe-codes a React app that calculates the actual cost of meetings by combining Semarize scores from Fireflies transcripts with calendar and compensation data. Each meeting gets a decision_density score, a projected_revenue_impact (based on deal signals extracted), and a cost_per_decision (meeting duration × attendee hourly cost ÷ decisions made). The app reveals that the company spends $2.1M/year on meetings, but only 34% produce actionable decisions. “Status update” meetings cost $14,200 per decision vs. $1,800 for deal review meetings. The executive team cuts 30% of status meetings and redirects time to deal reviews.
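The cost_per_decision formula in this example can be written as a small function. It assumes duration is given in minutes (convert if your source differs), that attendee hourly cost means the sum of each attendee's hourly cost, and that a meeting with zero decisions has no defined cost per decision.

```python
from typing import Optional, Sequence


def cost_per_decision(
    duration_minutes: float, attendee_hourly_costs: Sequence[float], decisions_made: int
) -> Optional[float]:
    """(duration in hours x combined attendee hourly cost) / decisions made."""
    if decisions_made <= 0:
        return None  # no decisions: cost per decision is undefined
    meeting_cost = (duration_minutes / 60) * sum(attendee_hourly_costs)
    return meeting_cost / decisions_made
```

For a 30-minute meeting with three attendees at $120, $80, and $100 per hour and two decisions, this returns 75.0.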
Common Challenges & Gotchas
These are the issues that come up most often when teams start extracting transcripts from Fireflies.ai at scale.
Daily API request limits
Free and Pro plans are capped at 50 API requests per day. If you need to backfill hundreds of transcripts, you'll burn through the limit quickly. Plan your extraction in daily batches or upgrade to a Business plan for higher limits.
Time-limited media URLs
Audio and video download URLs returned by the API expire after approximately 24 hours. If your pipeline fetches a URL but doesn't download the file immediately, the link will be dead when you try to use it later. Always download media assets right away.
Video access requires Business plan
The video_url field is only populated for accounts on Business plans or higher. If you're on a Free or Pro plan and your workflow depends on video access, the field will be null. Plan your pipeline around audio-only or transcript-only processing if needed.
GraphQL query complexity
The Fireflies API uses GraphQL rather than REST. If your team isn't familiar with GraphQL syntax, the learning curve can slow down initial setup. Structure your queries carefully: requesting too many nested fields can also slow down response times.
Speaker identification accuracy
Speaker labels depend on Fireflies correctly mapping participants from calendar invites and platform integrations. Unregistered guests, phone dial-ins, or unnamed participants can appear as generic labels. Validate speaker names before relying on them for per-speaker analysis.
Transcript processing delay
Transcripts are not available instantly after a meeting ends. Processing typically takes 5 to 15 minutes but can take longer during peak hours. If your automation triggers immediately on meeting end, it may fetch incomplete or unavailable data. Build in retries with a delay between attempts.
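A retry loop with exponentially growing delays covers this case. The sketch treats a transcript as ready once it comes back with a non-empty sentences array, which is an assumption about how an unfinished transcript looks; fetch is any callable that queries the transcript by ID and returns its data or None. The 5-minute initial delay mirrors the low end of the processing window described here, and the sleep parameter exists so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional


def fetch_when_ready(
    fetch: Callable[[], Optional[dict]],
    attempts: int = 5,
    initial_delay: float = 300.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Retry fetch until the transcript has sentences, doubling the wait each time."""
    delay = initial_delay
    for attempt in range(attempts):
        result = fetch()
        if result and result.get("sentences"):
            return result
        if attempt < attempts - 1:  # no point sleeping after the last attempt
            sleep(delay)
            delay *= backoff
    return None
```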
Frequently Asked Questions