Get Your Data
Symbl.ai - How to Get Your Conversation Data
A practical guide to getting your conversation data out of Symbl.ai - covering the Async API, Streaming API, Conversation API, Tracker configuration, Nebula summaries, and how to route structured intelligence into your downstream systems.
What you'll learn
- What conversation data you can extract from Symbl.ai - transcripts, topics, action items, questions, sentiment, entities, and Tracker hits
- How to access data via the Symbl.ai APIs - authentication, Async API, Streaming API, and Conversation API
- Three extraction patterns: batch processing, real-time streaming, and webhook-triggered flows
- How to connect Symbl.ai data pipelines to Zapier, n8n, and Make
- Advanced use cases - compliance monitoring, real-time intelligence, sentiment trending, and custom processing pipelines
Data
What Data You Can Extract From Symbl.ai
Symbl.ai is a processing platform - you send it audio, video, or text, and it returns structured conversation intelligence. Unlike platforms that also record calls, Symbl.ai focuses entirely on the analysis layer. Every processed conversation produces a rich set of structured outputs accessible via the Conversation API.
Structured outputs available per conversation
API Access
How to Get Data via the Symbl.ai API
Symbl.ai exposes three main API surfaces: the Async API for batch processing, the Streaming API for real-time analysis, and the Conversation API for retrieving results. The workflow is: authenticate with your App ID and Secret, submit content for processing, then retrieve structured results via the Conversation API.
Authenticate
Symbl.ai uses OAuth 2.0-style token authentication. Send your App ID and App Secret to the POST /oauth2/token/generate endpoint to receive an access token. Include this token as a Bearer token in all subsequent API calls.
POST https://api.symbl.ai/oauth2/token/generate
{
"type": "application",
"appId": "<your_app_id>",
"appSecret": "<your_app_secret>"
}
// Response:
// { "accessToken": "eyJhb...", "expiresIn": 86400 }Submit content via the Async API
The Async API accepts audio, video, or text for batch processing. Submit a file URL via POST /v1/process/audio/url (or the video/text equivalents). The response returns a conversationId and a jobId for tracking processing status.
POST https://api.symbl.ai/v1/process/audio/url
{
"url": "https://storage.example.com/calls/discovery-call.mp3",
"name": "Discovery Call - Acme Corp",
"confidenceThreshold": 0.6,
"detectTopics": true,
"detectActionItems": true,
"detectQuestions": true,
"enableSpeakerDiarization": true,
"diarizationSpeakerCount": 2
}
// Response:
// { "conversationId": "5681...a3f2", "jobId": "9f1c...b7e4" }You can also submit raw audio via POST /v1/process/audio (multipart upload) or text via POST /v1/process/text. Each endpoint returns a conversationId for result retrieval.
Check processing status
Poll the job status via GET /v1/job/{jobId} until the status is completed. Alternatively, pass a webhookUrl in your submission request and Symbl.ai will POST a callback when processing finishes.
GET https://api.symbl.ai/v1/job/9f1c...b7e4
// Response:
// { "id": "9f1c...b7e4", "status": "completed" }Processing time varies by file length and format. Audio files typically process in a fraction of their recording duration. Implement exponential backoff when polling to avoid unnecessary API calls.
Retrieve results via the Conversation API
Structured endpoints
Once processing completes, use the Conversation API to retrieve individual signal types. Each endpoint returns structured JSON for a specific intelligence category: GET /v1/conversations/{id}/messages for transcript, /topics, /action-items, /questions, /follow-ups, /entities, and /analytics.
Trackers & Nebula
Retrieve custom Tracker detections via /trackers - each hit includes the matched phrase, context, and confidence score. For abstractive summaries, call the Nebula endpoint with the conversation ID to generate a natural-language summary of the entire conversation.
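A minimal retrieval sketch that walks the structured endpoints listed above (the helper names are our own; the endpoint paths come from the Conversation API section):

```python
import json
import urllib.request

BASE = "https://api.symbl.ai/v1/conversations"

# One entry per structured endpoint described above.
SIGNALS = ["messages", "topics", "action-items", "questions",
           "follow-ups", "entities", "analytics"]

def fetch_signal(token, conversation_id, signal):
    """GET one structured signal type for a processed conversation."""
    req = urllib.request.Request(
        f"{BASE}/{conversation_id}/{signal}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def fetch_all(token, conversation_id):
    """Collect every signal type into one dict keyed by endpoint name."""
    return {s: fetch_signal(token, conversation_id, s) for s in SIGNALS}
```

Fetching all signals per conversation and storing them together keeps downstream routing simple: one document per conversationId, one key per intelligence category.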
Patterns
Key Extraction Flows
There are three practical patterns for processing conversations through Symbl.ai. The right choice depends on whether you're doing a one-off batch analysis, running ongoing extraction from a recording platform, or delivering real-time intelligence during live calls.
Batch Processing (Async API)
Process historical recordings in bulk
Collect audio/video URLs from your recording platform (Zoom, Teams, your telephony system, cloud storage, etc.)
For each file, POST to /v1/process/audio/url (or /video/url) with your desired configuration - speaker diarisation, topic detection, Tracker IDs, and confidence thresholds
Store the returned conversationId and jobId. Poll GET /v1/job/{jobId} or use webhook callbacks to track completion
Once processing completes, call the Conversation API endpoints to retrieve topics, action items, questions, entities, sentiment, and transcript data
Run your analysis pipeline against the structured output - score with Semarize, push to your warehouse, or route to downstream systems
Real-Time Streaming (WebSocket API)
Live intelligence during conversations
Open a WebSocket connection to wss://api.symbl.ai/v1/streaming/{connectionId}. Pass your access token and configuration (speaker info, Trackers, language settings) in the start_request message
Stream raw audio packets over the WebSocket as the conversation happens. Symbl.ai processes audio in real time and emits structured events back over the same connection
Listen for real-time events: topic_response, action_item_response, question_response, tracker_response, and message_response (transcript segments)
When the conversation ends, send a stop_request. Symbl.ai finalises processing and the conversation becomes available via the Conversation API for full result retrieval
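The start/stop handshake can be sketched as message builders. The field names below (insightTypes, config.speechRecognition, speaker) follow Symbl.ai's streaming message format, but treat the exact values as illustrative and verify them against the current streaming docs:

```python
import json

def start_request(speaker_name, speaker_id, sample_rate=16000):
    """Build the start_request sent as the first WebSocket message.

    The insight types, confidence threshold, and audio encoding here
    are example settings, not required values.
    """
    return json.dumps({
        "type": "start_request",
        "insightTypes": ["action_item", "question"],
        "config": {
            "confidenceThreshold": 0.6,
            "speechRecognition": {
                "encoding": "LINEAR16",
                "sampleRateHertz": sample_rate,
            },
        },
        "speaker": {"name": speaker_name, "userId": speaker_id},
    })

def stop_request():
    """Build the stop_request that finalises the conversation."""
    return json.dumps({"type": "stop_request"})
```

Between these two messages you stream binary audio frames; the structured events (topic_response, tracker_response, and so on) arrive as JSON text messages on the same socket.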
Webhook-Triggered Incremental Processing
Automatic processing when new recordings appear
Set up a webhook in your recording platform (Zoom, Teams, etc.) that fires when a new recording is available. The webhook payload includes the recording URL
Your webhook handler receives the event, extracts the recording URL, and submits it to Symbl.ai's Async API with a webhookUrl callback pointing back to your system
Symbl.ai processes the recording and POSTs a callback to your webhookUrl when complete, including the conversationId
Your callback handler retrieves structured results from the Conversation API and routes them downstream - to Semarize for scoring, your CRM, or your data warehouse
Log the conversationId and recording ID as a deduplication key to prevent reprocessing if webhooks fire multiple times
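A minimal sketch of the callback handler with deduplication. The payload field names are assumptions (log one real callback to confirm them), and the in-memory set stands in for the database table you would use in production:

```python
processed = set()  # in production: a database table keyed on these IDs

def dedup_key(recording_id, conversation_id):
    """Composite key preventing reprocessing on duplicate webhook delivery."""
    return f"{recording_id}:{conversation_id}"

def handle_callback(payload):
    """Handle the Symbl.ai completion callback POSTed to your webhookUrl.

    Returns True if the event was processed, False if it was a duplicate.
    `recordingId` here is a hypothetical field your own submission flow
    would have attached; `conversationId` comes from Symbl.ai.
    """
    key = dedup_key(payload.get("recordingId", ""), payload["conversationId"])
    if key in processed:
        return False  # duplicate delivery: skip
    processed.add(key)
    # ...retrieve Conversation API results and route downstream here...
    return True
```

Webhooks are typically delivered at-least-once, so idempotency at this boundary is what keeps double-billing and duplicate CRM writes out of the pipeline.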
Automation
Send Symbl.ai Data to Automation Tools
Once you can extract structured conversation data from Symbl.ai, the next step is routing it through Semarize for structured scoring and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from recording trigger through Symbl.ai processing and Semarize evaluation to CRM, Slack, or database output.
Recording → Symbl.ai → Zapier → Semarize → CRM
Detect a new recording from your meeting platform, submit it to Symbl.ai for processing, retrieve the structured output, send it to Semarize for scoring, then write the scored signals directly to your CRM.
Setup steps
Create a new Zap. Choose your recording source as the trigger (Zoom, Teams, or a custom webhook). Connect your account and select the "New Recording" event.
Add a "Webhooks by Zapier" Action to generate a Symbl.ai access token. POST to https://api.symbl.ai/oauth2/token/generate with your App ID and App Secret.
Add another "Webhooks by Zapier" Action to submit the recording URL to Symbl.ai. POST to https://api.symbl.ai/v1/process/audio/url with the recording URL and your processing configuration.
Add a Delay step (2-5 minutes depending on typical call length), then a "Webhooks by Zapier" Action to poll the job status. Alternatively, use Zapier's webhook trigger to receive the Symbl.ai callback.
Add HTTP Request steps to retrieve topics, action items, and messages from the Conversation API using the conversationId.
Add a "Webhooks by Zapier" Action to send the Symbl.ai output to Semarize. POST to https://api.semarize.com/v1/runs with kit_code, mode: "sync", and the transcript in input.transcript.
Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the Semarize brick values to your CRM record.
Test each step end-to-end, then turn on the Zap.
Recording → Symbl.ai → n8n → Semarize → Database
Receive new recording notifications via webhook, process through Symbl.ai, retrieve structured intelligence, send to Semarize for scoring, then write the results to your database. n8n's native loop support handles Symbl.ai job polling and batch processing.
Setup steps
Add a Webhook node as the workflow trigger. Configure your recording platform to POST new recording notifications to this webhook URL.
Add an HTTP Request node to generate a Symbl.ai access token. POST to https://api.symbl.ai/oauth2/token/generate with your App ID and App Secret. Cache the token if running multiple workflows.
Add an HTTP Request node to submit the recording to Symbl.ai. POST to https://api.symbl.ai/v1/process/audio/url with the recording URL, detection options, and speaker diarisation settings.
Add a Wait node (or a polling loop with IF node) to check job status. GET https://api.symbl.ai/v1/job/{jobId} until status is "completed". Use exponential backoff in the loop.
Add HTTP Request nodes to retrieve structured data from the Conversation API - GET /v1/conversations/{id}/messages for transcript, /topics, /action-items, /questions, and /entities.
Add a Code node (JavaScript) to assemble the Symbl.ai output into a clean transcript string. Join message text by speaker, preserving the conversation flow.
Add an HTTP Request node to send the transcript to Semarize. POST to https://api.semarize.com/v1/runs, add your API key as a Bearer token, set kit_code, mode to "sync", and map the transcript into input.transcript.
Add a Postgres (or MySQL / HTTP Request) node to write the structured Semarize output. Use conversation_id as the primary key for upserts.
Activate the workflow. Monitor the first few runs to verify the full pipeline - Symbl.ai processing, result retrieval, Semarize scoring, and database writes.
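The assembly logic the Code node implements can be sketched as follows (shown here in Python for illustration; the `text` and `from.name` fields assume the shape of the /messages response, which you should confirm against a real payload):

```python
def assemble_transcript(messages):
    """Join Symbl.ai /messages output into 'Speaker: text' lines.

    Consecutive messages from the same speaker are merged so the
    transcript reads as conversational turns rather than fragments.
    """
    lines = []
    for msg in messages:
        speaker = (msg.get("from") or {}).get("name", "Unknown")
        text = msg.get("text", "").strip()
        if not text:
            continue
        if lines and lines[-1][0] == speaker:
            lines[-1] = (speaker, lines[-1][1] + " " + text)
        else:
            lines.append((speaker, text))
    return "\n".join(f"{s}: {t}" for s, t in lines)
```

The resulting string maps directly into Semarize's input.transcript field in the following HTTP Request node.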
Recording → Symbl.ai → Make → Semarize → CRM + Slack
Fetch new recordings on a schedule, process through Symbl.ai, retrieve structured intelligence, send to Semarize for scoring, then use a Router to branch the output - alert on risk flags via Slack and write all signals to your CRM.
Setup steps
Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (30 minutes works well for most teams).
Add an HTTP module to generate a Symbl.ai access token. POST to https://api.symbl.ai/oauth2/token/generate with your App ID and App Secret.
Add an HTTP module to fetch new recordings from your source platform. Use the platform's API to list recordings since the last run timestamp.
Add an Iterator module to loop through each recording. For each, add an HTTP module to submit to Symbl.ai's Async API with the recording URL and your processing config.
Add a Sleep module (or HTTP polling loop) to wait for Symbl.ai processing. Then add HTTP modules to retrieve topics, messages, and action items from the Conversation API.
Add an HTTP module to send the assembled transcript to Semarize. POST to https://api.semarize.com/v1/runs with kit_code, mode: "sync", and input.transcript. Parse the response as JSON.
Add a Router module. Define Branch 1 with a filter: bricks.risk_flag.value equals true. Leave Branch 2 as a fallthrough (no filter).
On Branch 1, add a Slack module to alert your team when risk is detected. Map the score, risk flag, and conversation ID into the message.
On Branch 2, add a Salesforce module to write all brick values to the Opportunity record. Set the scenario schedule and activate.
What you can build
What You Can Do With Symbl.ai Data in Semarize
Semarize unlocks structured compliance scoring, cross-session trend analysis, custom evaluation frameworks, and the ability to build your own intelligence layers on top of Symbl.ai's conversation output.
Real-Time Quality Gate for AI Agents
AI Response Evaluation & Safety Scoring
What Semarize generates
Your product team uses Symbl.ai’s streaming API to power an AI voice agent. Every interaction generates a transcript — but are the AI agent’s responses accurate, relevant, and safe? Pipe Symbl.ai’s real-time transcripts into Semarize after each interaction. A quality gate kit scores every AI conversation for response_relevance, hallucination_detected, tone_appropriate, and safety_violation. When the AI agent tells a customer “we offer a full refund within 90 days” but your policy says 30 days, Semarize flags the hallucination with evidence. The AI team gets a daily quality report with scored, actionable signals from every conversation.
Learn more about AI Evaluation
AI Agent Knowledge Accuracy Audit
Grounded Verification for Automated Conversations
What Semarize generates
You process AI voice agent conversations through Symbl.ai — but how do you verify the AI agent gave correct information? Run a knowledge-grounded kit against your policy documents and product database on every AI-handled interaction. Semarize checks whether the AI cited the correct return policy, quoted accurate shipping timelines, and referenced real product specifications. After auditing 10,000 AI conversations, you discover the agent fabricates a “90-day satisfaction guarantee” that doesn’t exist in 3% of calls. The structured output feeds back into your AI agent’s training loop to close specific hallucination patterns.
Learn more about AI Evaluation
Conversation-Driven Product Feedback Loop
Feature-Level Sentiment & Urgency Scoring
What Semarize generates
Your support team processes customer calls through Symbl.ai. You need to go beyond topics and action items to quantify product sentiment at the feature level. Run every support transcript through a product feedback kit. Semarize scores each call for feature_satisfaction (per feature mentioned), churn_signal_strength, bug_report_severity, and feature_request_urgency. A monthly product board report shows structured feedback from 500+ calls: “Dashboard loading” has a satisfaction score of 32/100, mentioned in 89 calls, with 12 explicit churn threats. Product prioritisation shifts from gut feel to conversation evidence.
Learn more about Customer Success
Custom Conversation Analytics Platform
End-to-End Scored Data Pipeline
What Semarize generates
A data engineer vibe-codes a FastAPI service that chains Symbl.ai processing with Semarize evaluation. Audio files drop into an S3 bucket, a Lambda triggers Symbl.ai’s async API for transcription, then passes the transcript to Semarize for structured scoring. The scored output lands in Snowflake with typed columns: discovery_depth (int), budget_confirmed (bool), competitor_mentioned (varchar), sentiment_score (float). A dbt model aggregates scores by rep, team, and quarter. The BI team builds dashboards on structured, query-ready conversation data — fully typed and ready for analytics at scale.
Learn more about Data Science
Watch out for
Common Challenges & Gotchas
These are the issues that come up most often when teams start processing conversations through Symbl.ai at scale.
Async processing is not instant
The Async API queues files for processing. Attempting to retrieve results immediately after submission will return incomplete data. Poll the job status endpoint or use webhook callbacks to know when processing is done.
Access token expiration
Symbl.ai access tokens expire after a set period. Your integration must handle token refresh automatically - failing to do so will cause API calls to return 401 errors mid-pipeline. Cache the token and refresh before expiry.
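A sketch of refresh-before-expiry caching (the 300-second margin is an arbitrary safety buffer; `fetch` is whatever function wraps your POST to /oauth2/token/generate and returns the token plus its lifetime):

```python
import time

class TokenCache:
    """Cache an access token and refresh it before it expires.

    `fetch` is any callable returning (token, expires_in_seconds).
    The injectable `clock` exists only to make the cache testable.
    """
    def __init__(self, fetch, margin=300, clock=time.monotonic):
        self._fetch = fetch
        self._margin = margin       # refresh this many seconds early
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_at:
            self._token, expires_in = self._fetch()
            self._expires_at = self._clock() + expires_in - self._margin
        return self._token
```

Route every API call's Bearer token through `get()` and mid-pipeline 401s from expired tokens disappear.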
Concurrent processing limits
Each plan tier has a limit on how many concurrent Async jobs or Streaming connections you can run. Exceeding the limit results in queued or rejected requests. For bulk backfills, implement a job queue with concurrency control.
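One way to sketch that concurrency control (the semaphore caps in-flight jobs; `submit` is your own blocking submit-and-wait function, and `max_concurrent` should match your plan tier):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def run_backfill(recording_urls, submit, max_concurrent=5):
    """Run a bulk backfill without exceeding the concurrent-job limit.

    `submit` sends one URL to the Async API and blocks until that job
    completes. Results come back in input order.
    """
    gate = threading.Semaphore(max_concurrent)

    def guarded(url):
        with gate:  # at most max_concurrent jobs in flight at once
            return submit(url)

    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(guarded, recording_urls))
```

For very large backfills, a persistent queue (e.g. a jobs table with a status column) is sturdier than an in-process pool, since it survives restarts.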
Tracker configuration drift
Trackers need to be maintained as your business evolves. Competitor names change, product features launch, and compliance language updates. Stale Tracker configurations produce false negatives - regularly audit and update your Tracker vocabulary.
Speaker diarisation accuracy
Speaker separation quality depends on audio quality, microphone setup, and the number of speakers. Overlapping speech and poor-quality audio degrade diarisation accuracy. Validate speaker labels before using them for per-speaker analysis or attribution.
WebSocket connection management
The Streaming API uses WebSocket connections that can drop due to network issues. Implement reconnection logic with state recovery - losing a connection mid-conversation means losing real-time signals unless you have fallback processing via the Async API.
Conversation ID tracking
Every submission to Symbl.ai returns a conversation ID that you need to retrieve results. Losing or failing to store this ID means you cannot access the processed output. Use the conversation ID as your primary key and store it immediately after submission.