Symbl.ai - How to Get Your Conversation Data

A practical guide to getting your conversation data out of Symbl.ai - covering the Async API, Streaming API, Conversation API, Tracker configuration, Nebula summaries, and how to route structured intelligence into your downstream systems.

What you'll learn

  • What conversation data you can extract from Symbl.ai - transcripts, topics, action items, questions, sentiment, entities, and Tracker hits
  • How to access data via the Symbl.ai APIs - authentication, Async API, Streaming API, and Conversation API
  • Three extraction patterns: batch processing, real-time streaming, and webhook-triggered flows
  • How to connect Symbl.ai data pipelines to Zapier, n8n, and Make
  • Advanced use cases - compliance monitoring, real-time intelligence, sentiment trending, and custom processing pipelines

Data

What Data You Can Extract From Symbl.ai

Symbl.ai is a processing platform - you send it audio, video, or text, and it returns structured conversation intelligence. Unlike platforms that also record calls, Symbl.ai focuses entirely on the analysis layer. Every processed conversation produces a rich set of structured outputs accessible via the Conversation API.

Structured outputs available per conversation

Full transcript with speaker labels
Topics (auto-detected conversation themes)
Action items with assignee detection
Follow-up suggestions
Questions asked during conversation
Sentiment analysis (per message and overall)
Named entities (people, orgs, dates, etc.)
Tracker hits (custom keyword/phrase detection)
Conversation analytics (talk ratios, silence, etc.)
Nebula abstractive summaries

API Access

How to Get Data via the Symbl.ai API

Symbl.ai exposes three main API surfaces: the Async API for batch processing, the Streaming API for real-time analysis, and the Conversation API for retrieving results. The workflow is: authenticate with your App ID and Secret, submit content for processing, then retrieve structured results via the Conversation API.

1. Authenticate

Symbl.ai uses OAuth 2.0-style token authentication. Send your App ID and App Secret to the POST /oauth2/token/generate endpoint to receive an access token. Include this token as a Bearer token in all subsequent API calls.

POST https://api.symbl.ai/oauth2/token/generate

{
  "type": "application",
  "appId": "<your_app_id>",
  "appSecret": "<your_app_secret>"
}

// Response:
// { "accessToken": "eyJhb...", "expiresIn": 86400 }
Access tokens expire after 24 hours by default. Your integration must handle automatic token refresh - cache the token and regenerate before the expiresIn window closes to avoid mid-pipeline auth failures.
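
A minimal token-cache sketch of the refresh pattern described above. The fetcher callable, class name, and the 5-minute safety margin are our own illustrative choices - only the response shape ({"accessToken": ..., "expiresIn": seconds}) comes from the Symbl.ai endpoint shown earlier.

```python
import time

class TokenCache:
    """Cache a Symbl.ai access token and regenerate it before expiry."""
    REFRESH_MARGIN = 300  # refresh 5 minutes before the expiresIn window closes

    def __init__(self, fetch_token):
        self._fetch = fetch_token  # callable that POSTs to /oauth2/token/generate
        self._token = None
        self._expires_at = 0.0

    def get(self, now=None):
        now = time.time() if now is None else now
        # Regenerate when missing or inside the safety margin
        if self._token is None or now >= self._expires_at - self.REFRESH_MARGIN:
            resp = self._fetch()
            self._token = resp["accessToken"]
            self._expires_at = now + resp["expiresIn"]
        return self._token
```

Every pipeline step can then call `cache.get()` and never see a stale token, regardless of how long the run takes.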

2. Submit content via the Async API

The Async API accepts audio, video, or text for batch processing. Submit a file URL via POST /v1/process/audio/url (or the video/text equivalents). The response returns a conversationId and a jobId for tracking processing status.

POST https://api.symbl.ai/v1/process/audio/url

{
  "url": "https://storage.example.com/calls/discovery-call.mp3",
  "name": "Discovery Call - Acme Corp",
  "confidenceThreshold": 0.6,
  "detectTopics": true,
  "detectActionItems": true,
  "detectQuestions": true,
  "enableSpeakerDiarization": true,
  "diarizationSpeakerCount": 2
}

// Response:
// { "conversationId": "5681...a3f2", "jobId": "9f1c...b7e4" }

You can also submit raw audio via POST /v1/process/audio (multipart upload) or text via POST /v1/process/text. Each endpoint returns a conversationId for result retrieval.
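
The submission above can be sketched as a small helper. The payload fields mirror the request body shown earlier; the helper names (`build_submit_payload`, `submit_audio`) are ours, not part of any Symbl.ai SDK.

```python
import json
import urllib.request

SYMBL_AUDIO_URL = "https://api.symbl.ai/v1/process/audio/url"

def build_submit_payload(recording_url, name, speaker_count=2):
    """Assemble the Async API request body from the fields documented above."""
    return {
        "url": recording_url,
        "name": name,
        "confidenceThreshold": 0.6,
        "detectTopics": True,
        "detectActionItems": True,
        "detectQuestions": True,
        "enableSpeakerDiarization": True,
        "diarizationSpeakerCount": speaker_count,
    }

def submit_audio(token, payload):
    """POST the payload; the response carries conversationId and jobId."""
    req = urllib.request.Request(
        SYMBL_AUDIO_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```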

3. Check processing status

Poll the job status via GET /v1/job/{jobId} until the status is completed. Alternatively, pass a webhookUrl in your submission request and Symbl.ai will POST a callback when processing finishes.

GET https://api.symbl.ai/v1/job/9f1c...b7e4

// Response:
// { "id": "9f1c...b7e4", "status": "completed" }

Processing time varies by file length and format. Audio files typically process in a fraction of their recording duration. Implement exponential backoff when polling to avoid unnecessary API calls.
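
A polling loop with exponential backoff, as recommended above, might look like this. `check_status` stands in for a `GET /v1/job/{jobId}` call; the delay values are illustrative defaults, not Symbl.ai limits.

```python
import time

def poll_job(check_status, base_delay=2.0, max_delay=60.0,
             max_attempts=30, sleep=time.sleep):
    """Poll until the job reports a terminal status, doubling the delay each try."""
    delay = base_delay
    for _ in range(max_attempts):
        status = check_status()  # e.g. returns "in_progress" or "completed"
        if status in ("completed", "failed"):
            return status
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
    raise TimeoutError("job did not finish within the polling budget")
```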

4. Retrieve results via the Conversation API

Structured endpoints

Once processing completes, use the Conversation API to retrieve individual signal types. Each endpoint returns structured JSON for a specific intelligence category: GET /v1/conversations/{id}/messages for transcript, /topics, /action-items, /questions, /follow-ups, /entities, and /analytics.

Trackers & Nebula

Retrieve custom Tracker detections via /trackers - each hit includes the matched phrase, context, and confidence score. For abstractive summaries, call the Nebula endpoint with the conversation ID to generate a natural-language summary of the entire conversation.
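
Pulling every signal type for one conversation reduces to iterating over the endpoints listed above. This is a hedged sketch: `fetch_json` stands in for an authenticated GET, and the helper names are ours.

```python
BASE = "https://api.symbl.ai/v1/conversations"
SIGNALS = ["messages", "topics", "action-items", "questions",
           "follow-ups", "entities", "trackers", "analytics"]

def conversation_urls(conversation_id):
    """Map each signal type to its Conversation API URL."""
    return {sig: f"{BASE}/{conversation_id}/{sig}" for sig in SIGNALS}

def fetch_all(conversation_id, fetch_json):
    """Retrieve every signal type; fetch_json(url) does the authenticated GET."""
    return {sig: fetch_json(url)
            for sig, url in conversation_urls(conversation_id).items()}
```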

Patterns

Key Extraction Flows

There are three practical patterns for processing conversations through Symbl.ai. The right choice depends on whether you're doing a one-off batch analysis, running ongoing extraction from a recording platform, or need real-time intelligence during live calls.

Batch Processing (Async API)

Process historical recordings in bulk

1. Collect audio/video URLs from your recording platform (Zoom, Teams, your telephony system, cloud storage, etc.)

2. For each file, POST to /v1/process/audio/url (or /video/url) with your desired configuration - speaker diarisation, topic detection, Tracker IDs, and confidence thresholds

3. Store the returned conversationId and jobId. Poll GET /v1/job/{jobId} or use webhook callbacks to track completion

4. Once processing completes, call the Conversation API endpoints to retrieve topics, action items, questions, entities, sentiment, and transcript data

5. Run your analysis pipeline against the structured output - score with Semarize, push to your warehouse, or route to downstream systems

Tip: Respect concurrent processing limits. Submit files in controlled batches and track each jobId. If a batch is interrupted, you can resume from the last unprocessed file without re-submitting completed ones.
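
The resumable-batch tip can be sketched as follows. `submit` stands in for an Async API submission returning (conversationId, jobId), and `done` is any set-like ledger you persist between runs - both are our assumptions for illustration.

```python
def process_batch(file_urls, submit, done, batch_size=5):
    """Submit files in controlled batches, skipping anything already processed."""
    results = {}
    pending = [u for u in file_urls if u not in done]  # resume point
    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        for url in batch:
            results[url] = submit(url)  # returns (conversationId, jobId)
            done.add(url)               # persist this ledger to survive interruptions
        # In a real pipeline, wait for this batch's jobs here before submitting more,
        # so you stay under your plan's concurrent-processing limit.
    return results
```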

Real-Time Streaming (WebSocket API)

Live intelligence during conversations

1. Open a WebSocket connection to wss://api.symbl.ai/v1/streaming/{connectionId}. Pass your access token and configuration (speaker info, Trackers, language settings) in the start_request message

2. Stream raw audio packets over the WebSocket as the conversation happens. Symbl.ai processes audio in real time and emits structured events back over the same connection

3. Listen for real-time events: topic_response, action_item_response, question_response, tracker_response, and message_response (transcript segments)

4. When the conversation ends, send a stop_request. Symbl.ai finalises processing and the conversation becomes available via the Conversation API for full result retrieval

Note: WebSocket connections have concurrent limits per account. Implement reconnection logic with exponential backoff - a dropped connection during a live call means lost real-time signals unless you fall back to the Async API for post-call processing.
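
A reconnection schedule with exponential backoff and jitter, as suggested above, can be computed like this. The base, cap, and jitter range are illustrative defaults, not Symbl.ai requirements.

```python
import random

def backoff_schedule(attempts, base=1.0, cap=30.0, jitter=None):
    """Return reconnect delays: doubling from base, capped, plus 0-1s of jitter."""
    jitter = jitter if jitter is not None else random.random
    return [min(base * 2 ** n, cap) + jitter() for n in range(attempts)]
```

Drive your reconnect loop from this schedule, and fall back to the Async API for post-call processing once the attempts are exhausted.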

Webhook-Triggered Incremental Processing

Automatic processing when new recordings appear

1. Set up a webhook in your recording platform (Zoom, Teams, etc.) that fires when a new recording is available. The webhook payload includes the recording URL

2. Your webhook handler receives the event, extracts the recording URL, and submits it to Symbl.ai's Async API with a webhookUrl callback pointing back to your system

3. Symbl.ai processes the recording and POSTs a callback to your webhookUrl when complete, including the conversationId

4. Your callback handler retrieves structured results from the Conversation API and routes them downstream - to Semarize for scoring, your CRM, or your data warehouse

5. Log the conversationId and recording ID as a deduplication key to prevent reprocessing if webhooks fire multiple times

Tip: Using Symbl.ai's webhookUrl callback eliminates the need to poll for job completion. Your pipeline only activates when results are actually ready, reducing unnecessary API calls and simplifying your architecture.
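
The deduplication step above boils down to treating the recording ID as an idempotency key. A minimal sketch, assuming `seen` is a persisted mapping and `submit` wraps the Async API call:

```python
def handle_recording(recording_id, recording_url, seen, submit):
    """Submit a recording at most once, even if the webhook fires repeatedly."""
    if recording_id in seen:
        return None                        # duplicate delivery - skip
    conversation_id = submit(recording_url)
    seen[recording_id] = conversation_id   # log the pair as the dedup record
    return conversation_id
```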

Automation

Send Symbl.ai Data to Automation Tools

Once you can extract structured conversation data from Symbl.ai, the next step is routing it through Semarize for structured scoring and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from recording trigger through Symbl.ai processing and Semarize evaluation to CRM, Slack, or database output.

ZapierNo-code automation

Recording → Symbl.ai → Zapier → Semarize → CRM

Detect a new recording from your meeting platform, submit it to Symbl.ai for processing, retrieve the structured output, send it to Semarize for scoring, then write the scored signals directly to your CRM.

Example Zap

1. Trigger: New Recording - fires when a new recording is available
   App: Zoom / Teams / Custom Webhook · Event: New Recording Available · Output: recording_url, meeting_id

2. Webhooks by Zapier: submit audio to the Symbl.ai Async API
   POST https://api.symbl.ai/v1/process/audio/url · Auth: Bearer {{access_token}} · Body: { url: {{recording_url}}, detectTopics: true }
   (poll until the job status is completed)

3. Webhooks by Zapier: retrieve results from the Conversation API
   GET /v1/conversations/{{conversationId}}/topics
   GET /v1/conversations/{{conversationId}}/action-items
   GET /v1/conversations/{{conversationId}}/messages

4. Webhooks by Zapier: POST /v1/runs (sync) to Semarize
   POST https://api.semarize.com/v1/runs · Auth: Bearer smz_live_... · Body: { kit_code, mode: "sync", input: { transcript } }

5. Salesforce - Update Record: write scored signals to the Opportunity
   AI Score: {{overall_score}} · Risk Flag: {{risk_flag}} · Topics: {{top_topics}}

Setup steps

1. Create a new Zap. Choose your recording source as the trigger (Zoom, Teams, or a custom webhook). Connect your account and select the "New Recording" event.

2. Add a "Webhooks by Zapier" Action to generate a Symbl.ai access token. POST to https://api.symbl.ai/oauth2/token/generate with your App ID and App Secret.

3. Add another "Webhooks by Zapier" Action to submit the recording URL to Symbl.ai. POST to https://api.symbl.ai/v1/process/audio/url with the recording URL and your processing configuration.

4. Add a Delay step (2-5 minutes depending on typical call length), then a "Webhooks by Zapier" Action to poll the job status. Alternatively, use Zapier's webhook trigger to receive the Symbl.ai callback.

5. Add HTTP Request steps to retrieve topics, action items, and messages from the Conversation API using the conversationId.

6. Add a "Webhooks by Zapier" Action to send the Symbl.ai output to Semarize. POST to https://api.semarize.com/v1/runs with kit_code, mode: "sync", and the transcript in input.transcript.

7. Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the Semarize brick values to your CRM record.

8. Test each step end-to-end, then turn on the Zap.

Watch out for: Symbl.ai access tokens expire after 24 hours. For long-running Zaps, add a token refresh step at the start of each run. Also, Zapier has step data size limits - for very long transcripts, store the Symbl.ai output in cloud storage and pass a reference URL to Semarize.
Learn more about Zapier automation

n8n - Self-hosted workflows

Recording → Symbl.ai → n8n → Semarize → Database

Receive new recording notifications via webhook, process through Symbl.ai, retrieve structured intelligence, send to Semarize for scoring, then write the results to your database. n8n's native loop support handles Symbl.ai job polling and batch processing.

Example Workflow

1. Webhook - New Recording: receives the callback from your recording platform
   Method: POST · Path: /symbl-ingest · Output: recording_url, metadata

2. HTTP Request - Auth Token: POST /oauth2/token/generate (Symbl.ai)
   Body: { type: 'application', appId, appSecret }

3. HTTP Request - Submit Audio: POST https://api.symbl.ai/v1/process/audio/url
   Body: { url: {{recording_url}}, detectTopics: true } · Output: conversationId, jobId
   (poll until the job status is completed)

4. HTTP Request - Conversation API: fetch topics, action-items, and messages
   GET /v1/conversations/{id}/topics, /action-items, /messages

5. HTTP Request - Semarize: POST /v1/runs (sync)
   URL: https://api.semarize.com/v1/runs · Auth: Bearer smz_live_... · Body: { kit_code, mode: "sync", input: { transcript } }

6. Postgres - Insert Row: write the structured output to your database
   Table: call_evaluations · Columns: conversation_id, score, topics, action_items

Setup steps

1. Add a Webhook node as the workflow trigger. Configure your recording platform to POST new recording notifications to this webhook URL.

2. Add an HTTP Request node to generate a Symbl.ai access token. POST to https://api.symbl.ai/oauth2/token/generate with your App ID and App Secret. Cache the token if running multiple workflows.

3. Add an HTTP Request node to submit the recording to Symbl.ai. POST to https://api.symbl.ai/v1/process/audio/url with the recording URL, detection options, and speaker diarisation settings.

4. Add a Wait node (or a polling loop with an IF node) to check job status. GET https://api.symbl.ai/v1/job/{jobId} until status is "completed". Use exponential backoff in the loop.

5. Add HTTP Request nodes to retrieve structured data from the Conversation API - GET /v1/conversations/{id}/messages for the transcript, plus /topics, /action-items, /questions, and /entities.

6. Add a Code node (JavaScript) to assemble the Symbl.ai output into a clean transcript string. Join message text by speaker, preserving the conversation flow.

7. Add an HTTP Request node to send the transcript to Semarize. POST to https://api.semarize.com/v1/runs, add your API key as a Bearer token, set kit_code, set mode to "sync", and map the transcript into input.transcript.

8. Add a Postgres (or MySQL / HTTP Request) node to write the structured Semarize output. Use conversation_id as the primary key for upserts.

9. Activate the workflow. Monitor the first few runs to verify the full pipeline - Symbl.ai processing, result retrieval, Semarize scoring, and database writes.
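
The transcript-assembly step (the Code node above) is shown here in Python for illustration: join Symbl.ai messages into a speaker-labelled transcript, collapsing consecutive messages from the same speaker. The message shape follows the Conversation API's /messages response (a text field plus a "from" speaker object); the function name is ours.

```python
def assemble_transcript(messages):
    """Build a 'Speaker: text' transcript, merging consecutive same-speaker lines."""
    lines, last_speaker = [], None
    for msg in messages:
        speaker = (msg.get("from") or {}).get("name", "Unknown")
        text = msg.get("text", "").strip()
        if not text:
            continue
        if speaker == last_speaker:
            lines[-1] += " " + text            # same speaker keeps talking
        else:
            lines.append(f"{speaker}: {text}")
            last_speaker = speaker
    return "\n".join(lines)
```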

Watch out for: Use conversation IDs as deduplication keys to prevent reprocessing. If your recording platform fires duplicate webhooks, the first run should store the conversation ID - subsequent runs check for existence before submitting to Symbl.ai again.
Learn more about n8n automation

Make - Visual automation with branching

Recording → Symbl.ai → Make → Semarize → CRM + Slack

Fetch new recordings on a schedule, process through Symbl.ai, retrieve structured intelligence, send to Semarize for scoring, then use a Router to branch the output - alert on risk flags via Slack and write all signals to your CRM.

Example Scenario

1. Schedule - Every 30 min: triggers the scenario on an interval

2. HTTP - Auth Token: POST /oauth2/token/generate (Symbl.ai)
   Body: { type: 'application', appId, appSecret }

3. HTTP - Submit Recording: POST /v1/process/audio/url, iterating over each new recording
   Body: { url: {{item.recording_url}} } · Output: conversationId, jobId
   (wait for processing to complete)

4. HTTP - Conversation API: fetch topics, action-items, and messages
   GET /v1/conversations/{{conversationId}}/messages
   GET /v1/conversations/{{conversationId}}/topics

5. HTTP - Semarize: POST /v1/runs (sync)
   URL: https://api.semarize.com/v1/runs · Auth: Bearer smz_live_... · Body: { kit_code, mode: "sync", input: { transcript } }

6. Router - Branch on Risk Flag
   Branch 1 (risk detected): IF risk_flag.value = true → Slack - Alert Channel (#deal-alerts), message: "Risk on {{conversation_id}}, score: {{score}}"
   Branch 2 (all calls, fallthrough): Salesforce - Update Record on the Opportunity - AI Score: {{overall_score}}, Risk Flag: {{risk_flag}}, Topics: {{top_topics}}

Setup steps

1. Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (30 minutes works well for most teams).

2. Add an HTTP module to generate a Symbl.ai access token. POST to https://api.symbl.ai/oauth2/token/generate with your App ID and App Secret.

3. Add an HTTP module to fetch new recordings from your source platform. Use the platform's API to list recordings since the last run timestamp.

4. Add an Iterator module to loop through each recording. For each, add an HTTP module to submit to Symbl.ai's Async API with the recording URL and your processing config.

5. Add a Sleep module (or an HTTP polling loop) to wait for Symbl.ai processing. Then add HTTP modules to retrieve topics, messages, and action items from the Conversation API.

6. Add an HTTP module to send the assembled transcript to Semarize. POST to https://api.semarize.com/v1/runs with kit_code, mode: "sync", and input.transcript. Parse the response as JSON.

7. Add a Router module. Define Branch 1 with a filter: bricks.risk_flag.value equals true. Leave Branch 2 as a fallthrough (no filter).

8. On Branch 1, add a Slack module to alert your team when risk is detected. Map the score, risk flag, and conversation ID into the message.

9. On Branch 2, add a Salesforce module to write all brick values to the Opportunity record. Set the scenario schedule and activate.

Watch out for: Each API call counts as a Make operation. A scenario processing 20 recordings uses ~100+ operations (auth + submit + poll + retrieve + Semarize per recording). Use mode: "sync" for Semarize to avoid additional polling operations.
Learn more about Make automation

What you can build

What You Can Do With Symbl.ai Data in Semarize

Semarize unlocks structured compliance scoring, cross-session trend analysis, custom evaluation frameworks, and the ability to build your own intelligence layers on top of Symbl.ai's conversation output.

Real-Time Quality Gate for AI Agents

AI Response Evaluation & Safety Scoring

What Semarize generates

response_relevance = 0.87 · hallucination_detected = true · tone_appropriate = true · safety_violation = false

Your product team uses Symbl.ai's streaming API to power an AI voice agent. Every interaction generates a transcript - but are the AI agent's responses accurate, relevant, and safe? Pipe Symbl.ai's real-time transcripts into Semarize after each interaction. A quality gate kit scores every AI conversation for response_relevance, hallucination_detected, tone_appropriate, and safety_violation. When the AI agent tells a customer "we offer a full refund within 90 days" but your policy says 30 days, Semarize flags the hallucination with evidence. The AI team gets a daily quality report with scored, actionable signals from every conversation.

Learn more about AI Evaluation

AI Agent Quality Gate (last 24h, 3 conversations)

conv-0412 (Agent A): PASS - relevance 0.92
conv-0413 (Agent B): FAIL - relevance 0.87, hallucination detected
conv-0414 (Agent A): PASS - relevance 0.95

Hallucination detected in conv-0413: "We offer a full refund within 90 days" - policy states 30 days

AI Agent Knowledge Accuracy Audit

Grounded Verification for Automated Conversations

What Semarize generates

policy_citation_correct = false · product_info_accurate = true · hallucination_type = "fabricated_policy" · accuracy_rate = 0.91

You process AI voice agent conversations through Symbl.ai - but how do you verify the AI agent gave correct information? Run a knowledge-grounded kit against your policy documents and product database on every AI-handled interaction. Semarize checks whether the AI cited the correct return policy, quoted accurate shipping timelines, and referenced real product specifications. After auditing 10,000 AI conversations, you discover the agent fabricates a "90-day satisfaction guarantee" that doesn't exist in 3% of calls. The structured output feeds back into your AI agent's training loop to close specific hallucination patterns.

Learn more about AI Evaluation

Unified Deal Signal Summary

Signals from multiple sources feed into a unified deal view:
Symbl.ai: budget_confirmed = true
Zoom: timeline_set = Q3 2026
Teams: decision_maker = false

Deal Completeness: 67% - missing: decision_maker not identified in Teams follow-up

Conversation-Driven Product Feedback Loop

Feature-Level Sentiment & Urgency Scoring

What Semarize generates

feature_satisfaction = 32 · mention_count = 89 · churn_threats = 12 · request_urgency = "critical"

Your support team processes customer calls through Symbl.ai. You need to go beyond topics and action items to quantify product sentiment at the feature level. Run every support transcript through a product feedback kit. Semarize scores each call for feature_satisfaction (per feature mentioned), churn_signal_strength, bug_report_severity, and feature_request_urgency. A monthly product board report shows structured feedback from 500+ calls: "Dashboard loading" has a satisfaction score of 32/100, mentioned in 89 calls, with 12 explicit churn threats. Product prioritisation shifts from gut feel to conversation evidence.

Learn more about Customer Success

Product Feedback Report (500+ calls scored)

Feature | Satisfaction | Mentions | Urgency
Dashboard loading | 32/100 | 89 | Critical
Export functionality | 58/100 | 45 | High
Search filters | 71/100 | 34 | Medium
Notification system | 44/100 | 62 | High

"Dashboard loading" - satisfaction 32/100, mentioned in 89 calls, 12 explicit churn threats

Custom Conversation Analytics Platform

End-to-End Scored Data Pipeline

Vibe-coded

What Semarize generates

pipeline_stages = 4 · output_format = "typed SQL columns" · latency = "< 45s" · monthly_volume = 2,400

A data engineer vibe-codes a FastAPI service that chains Symbl.ai processing with Semarize evaluation. Audio files drop into an S3 bucket, a Lambda triggers Symbl.ai's Async API for transcription, then passes the transcript to Semarize for structured scoring. The scored output lands in Snowflake with typed columns: discovery_depth (int), budget_confirmed (bool), competitor_mentioned (varchar), sentiment_score (float). A dbt model aggregates scores by rep, team, and quarter. The BI team builds dashboards on structured, query-ready conversation data - fully typed and ready for analytics at scale.
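
The "typed columns" step of a pipeline like this can be sketched as a coercion function before the warehouse load. The field names mirror the columns described above; the defaults and coercion rules are illustrative assumptions, not a Semarize contract.

```python
def to_typed_row(conversation_id, bricks):
    """Coerce Semarize brick output into a typed row ready for a SQL insert."""
    return {
        "conversation_id": str(conversation_id),
        "discovery_depth": int(bricks.get("discovery_depth", 0)),
        "budget_confirmed": bool(bricks.get("budget_confirmed", False)),
        "competitor_mentioned": str(bricks.get("competitor_mentioned", "")),
        "sentiment_score": float(bricks.get("sentiment_score", 0.0)),
    }
```

Coercing at the pipeline boundary keeps type errors out of the warehouse, so dbt models and BI dashboards can rely on stable column types.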

Learn more about Data Science

Data Pipeline Architecture (vibe-coded): S3 bucket (audio) → Lambda (trigger) → Symbl.ai (transcript) → Semarize (structured JSON) → Snowflake (SQL columns)

Latency: < 45s · Monthly volume: 2,400 · Output: typed SQL · Stages: 4

Watch out for

Common Challenges & Gotchas

These are the issues that come up most often when teams start processing conversations through Symbl.ai at scale.

Async processing is not instant

The Async API queues files for processing. Attempting to retrieve results immediately after submission will return incomplete data. Poll the job status endpoint or use webhook callbacks to know when processing is done.

Access token expiration

Symbl.ai access tokens expire after a set period. Your integration must handle token refresh automatically - failing to do so will cause API calls to return 401 errors mid-pipeline. Cache the token and refresh before expiry.

Concurrent processing limits

Each plan tier has a limit on how many concurrent Async jobs or Streaming connections you can run. Exceeding the limit results in queued or rejected requests. For bulk backfills, implement a job queue with concurrency control.

Tracker configuration drift

Trackers need to be maintained as your business evolves. Competitor names change, product features launch, and compliance language updates. Stale Tracker configurations produce false negatives - regularly audit and update your Tracker vocabulary.

Speaker diarisation accuracy

Speaker separation quality depends on audio quality, microphone setup, and the number of speakers. Overlapping speech and poor-quality audio degrade diarisation accuracy. Validate speaker labels before using them for per-speaker analysis or attribution.

WebSocket connection management

The Streaming API uses WebSocket connections that can drop due to network issues. Implement reconnection logic with state recovery - losing a connection mid-conversation means losing real-time signals unless you have fallback processing via the Async API.

Conversation ID tracking

Every submission to Symbl.ai returns a conversation ID that you need to retrieve results. Losing or failing to store this ID means you cannot access the processed output. Use the conversation ID as your primary key and store it immediately after submission.
