Semarize

Outreach - How to Get Your Conversation Data

A practical guide to getting your conversation data out of Outreach - covering Kaia recordings, OAuth API access, transcript extraction, Daily Export, webhook-triggered flows, and how to route structured data into your downstream systems.

What you'll learn

  • What conversation data you can extract from Outreach - Kaia recordings, transcripts, meeting metadata, and sequence context
  • How to access data via the Outreach REST API - OAuth 2.0 authentication, key endpoints, and pagination
  • Three extraction patterns: historical backfill with Daily Export, incremental polling, and webhook-triggered
  • How to connect Outreach data pipelines to Zapier, n8n, and Make
  • Advanced use cases - sequence quality scoring, rep coaching, compliance monitoring, and pipeline intelligence

Data

What Data You Can Extract From Outreach

Outreach captures more than just the recording. Between Kaia conversation intelligence and the broader sales engagement platform, every interaction produces structured assets that can be extracted via API - recordings, transcripts, sequence activity, prospect engagement, and meeting context.

Common fields teams care about

Full Kaia transcript text
Speaker labels (rep vs. prospect)
Meeting owner / rep name
Recording URL and duration
Meeting date, time, and attendees
Associated prospect and account
Sequence name and step number
Call disposition and outcome
Prospect engagement signals
Associated CRM opportunity IDs

API Access

How to Get Transcripts via the Outreach API

Outreach exposes recordings, transcripts, and sales engagement data through a REST API at api.outreach.io. The workflow is: authenticate via OAuth 2.0, list recordings by date range, then fetch the transcript for each recording ID.

1

Authenticate with OAuth 2.0

Outreach uses OAuth 2.0 with the authorization code grant flow. Register your application in the Outreach developer portal at developers.outreach.io, obtain your client_id and client_secret, then exchange the authorization code for access and refresh tokens.

POST https://api.outreach.io/oauth/token

{
  "client_id": "<your_client_id>",
  "client_secret": "<your_client_secret>",
  "redirect_uri": "<your_redirect_uri>",
  "grant_type": "authorization_code",
  "code": "<authorization_code>"
}
Access tokens are short-lived (typically 2 hours), so your integration must refresh them automatically with the refresh_token grant type before they expire. Store refresh tokens securely: if one is revoked, the pipeline stops until someone re-authorizes it manually.
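The refresh step can be sketched in Python with only the standard library. This is a minimal sketch, not Outreach's reference client. It assumes the token endpoint accepts the standard OAuth 2.0 refresh_token grant (RFC 6749, section 6) with the same client parameters as the request above, sent form-encoded, and that the response carries access_token, expires_in, and optionally a rotated refresh_token. TokenState, refresh_params, apply_token_response, and the 5-minute safety margin are illustrative names and choices.

```python
import json
import time
import urllib.parse
import urllib.request
from dataclasses import dataclass

TOKEN_URL = "https://api.outreach.io/oauth/token"


@dataclass
class TokenState:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix time at which the access token stops working

    def needs_refresh(self, now: float, margin: float = 300.0) -> bool:
        # Refresh a few minutes early so no request goes out with an expired token.
        return now >= self.expires_at - margin


def refresh_params(state: TokenState, client_id: str, client_secret: str,
                   redirect_uri: str) -> dict:
    # Same client parameters as the authorization-code exchange, but using
    # the standard refresh_token grant (assumed, per RFC 6749 section 6).
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "grant_type": "refresh_token",
        "refresh_token": state.refresh_token,
    }


def apply_token_response(state: TokenState, body: dict, now: float) -> TokenState:
    # Keep the old refresh token if the response does not rotate it; fall back
    # to 7200 s (the "typically 2 hours" above) if expires_in is missing.
    return TokenState(
        access_token=body["access_token"],
        refresh_token=body.get("refresh_token", state.refresh_token),
        expires_at=now + float(body.get("expires_in", 7200)),
    )


def refresh(state: TokenState, client_id: str, client_secret: str,
            redirect_uri: str) -> TokenState:
    data = urllib.parse.urlencode(
        refresh_params(state, client_id, client_secret, redirect_uri)).encode()
    # Passing data makes urllib send a form-encoded POST.
    req = urllib.request.Request(TOKEN_URL, data=data, method="POST")
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Persist the new state to secure storage before using it.
    return apply_token_response(state, body, time.time())
```

Before each API call, check state.needs_refresh(time.time()) and call refresh(...) first when it returns True.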
2

List recordings by date range

Call the GET /api/v2/recordings endpoint with filter parameters for your date range. Results are paginated - each response includes pagination links to fetch the next page of results.

GET https://api.outreach.io/api/v2/recordings
    ?filter[createdAt]=2025-01-01..2025-02-01
    &page[size]=50

Authorization: Bearer <access_token>
Content-Type: application/vnd.api+json

The response returns an array of recording objects in JSON:API format, each with id, attributes.createdAt, attributes.duration, and relationship links to the associated meeting, prospect, and user. Keep requesting the URL in links.next until it is null or absent.
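A minimal pagination sketch in Python. The recordings path, the filter[createdAt] and page[size] parameters, and links.next come from the request above; the helper names (http_get_json, recordings_url, iter_pages) are hypothetical, and the page fetcher is injected so the loop can be tested without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_BASE = "https://api.outreach.io/api/v2"


def http_get_json(url: str, access_token: str) -> dict:
    """GET a JSON:API document with a Bearer token."""
    req = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/vnd.api+json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def recordings_url(start: str, end: str, page_size: int = 50) -> str:
    """First-page URL using the start..end range syntax shown above."""
    query = urllib.parse.urlencode({
        "filter[createdAt]": f"{start}..{end}",
        "page[size]": page_size,
    })
    return f"{API_BASE}/recordings?{query}"


def iter_pages(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every resource object, following links.next until it is null or missing."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("data", [])
        url = page.get("links", {}).get("next")
```

Usage: `for rec in iter_pages(recordings_url("2025-01-01", "2025-02-01"), lambda u: http_get_json(u, token)): ...`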

3

Fetch the transcript

For each recording ID, request the transcript content. The Outreach API returns transcript data as structured segments with speaker identification, timestamps, and text.

GET https://api.outreach.io/api/v2/recordings/<recording_id>
    ?include=transcript

Authorization: Bearer <access_token>
Content-Type: application/vnd.api+json

Each transcript segment includes speaker attribution, start and end timestamps, and the text content. Reassemble into plain text by concatenating segments, or preserve the structured format for per-speaker analysis.
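Reassembling segments into speaker-labelled plain text might look like the sketch below. The segment keys speaker, start, and text are assumptions for illustration; map them to the field names in the transcript payload you actually receive. Consecutive segments from the same speaker are merged into one turn, and empty segments are skipped.

```python
from typing import Iterable, List


def reassemble(segments: Iterable[dict]) -> str:
    """Join segments into 'Speaker: text' lines (hypothetical keys: speaker, start, text)."""
    turns: List[List[str]] = []
    last_speaker = None
    for seg in sorted(segments, key=lambda s: s.get("start", 0)):
        text = seg.get("text", "").strip()
        if not text:
            continue  # skip empty or whitespace-only segments
        speaker = seg.get("speaker") or "Unknown"
        if speaker == last_speaker:
            turns[-1][1] += " " + text  # same speaker keeps talking: extend the turn
        else:
            turns.append([speaker, text])
            last_speaker = speaker
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)
```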

4

Handle rate limits and transcript availability

Rate limits

Outreach enforces rate limits of approximately 10,000 requests per hour. When you receive a 429 response, back off using the Retry-After header. For bulk operations, pace requests and persist your pagination cursor between runs.
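A sketch of 429 handling that honors Retry-After when it is given in seconds and otherwise backs off exponentially up to a cap. The send callable, which should return (status, headers, body), is a hypothetical wrapper around whatever HTTP client you use; the HTTP-date form of Retry-After is not handled here and falls back to exponential backoff.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 1.0, cap: float = 300.0) -> float:
    """Seconds to wait before the next try after a 429."""
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # HTTP-date form of Retry-After: fall back to exponential backoff
    return min(cap, base * (2 ** attempt))


def send_with_retries(send: Callable[[], Response], max_attempts: int = 6,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it returns something other than 429."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts - 1:
            # Plain-dict lookup is case-sensitive; http.client headers are not.
            sleep(retry_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```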

Transcript timing

Kaia transcripts are not available the instant a meeting ends. Outreach processes recordings asynchronously - typical lag is 15 minutes to a few hours depending on recording length and system load. Build a buffer into your extraction timing or implement a retry with exponential backoff for recently completed meetings.
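For the processing lag, one option is to retry the transcript fetch with growing delays. In this sketch fetch is a hypothetical caller-supplied function that returns the reassembled transcript, or None / an empty string while Kaia is still processing; the 15-minute starting delay simply mirrors the lower end of the lag described above.

```python
import time
from typing import Callable, Optional


def fetch_when_ready(fetch: Callable[[], Optional[str]], max_attempts: int = 6,
                     first_delay: float = 900.0, factor: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Retry fetch() with exponential backoff (15 min, 30 min, 1 h, ...)."""
    delay = first_delay
    for attempt in range(max_attempts):
        text = fetch()
        if text:
            return text
        if attempt < max_attempts - 1:
            sleep(delay)
            delay *= factor
    return None  # give up for now; leave the recording for a later run
```

In a scheduled job you would usually re-queue the recording rather than sleep in-process for hours; the in-process loop just keeps the sketch self-contained.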

Patterns

Key Extraction Flows

There are three practical patterns for getting transcripts out of Outreach. The right choice depends on whether you're doing a one-off migration, running ongoing extraction, or need near real-time processing.

Backfill via Daily Export

One-off migration of historical recordings

1

Configure a Daily Export in Outreach admin settings to export recordings and transcript data for your desired date range - typically 6-12 months of historical data

2

Download the exported data files. Daily Export produces structured data sets that can be ingested directly into your data warehouse or processing pipeline

3

Alternatively, use the API: call GET /api/v2/recordings with date filters. Paginate through the full result set, collecting all recording IDs

4

For each recording ID, fetch the transcript content. Pace requests to stay within the 10,000 requests/hour rate limit

5

Store each transcript with its recording metadata (recording ID, date, participants, prospect, sequence context) in your data warehouse or object store

Tip: For large historical exports, the Daily Export feature is more reliable than paginating through the API. It avoids rate limit pressure and produces consistent data snapshots. Use the API for smaller, targeted backfills.
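The API-based backfill (steps 3 to 5) can be made resumable by saving links.next after every page and spacing requests to stay under the quoted limit. This is a single-worker sketch under stated assumptions: the approximate 10,000 requests/hour figure from above, a local JSON checkpoint file, and an injected fetch_page function. Pacer and backfill are hypothetical names.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterator, Optional

REQUESTS_PER_HOUR = 10_000  # approximate limit quoted in this guide
MIN_INTERVAL = 3600 / REQUESTS_PER_HOUR  # 0.36 s between requests


class Pacer:
    """Spaces out calls so one worker stays under REQUESTS_PER_HOUR."""

    def __init__(self, interval: float = MIN_INTERVAL,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval, self.clock, self.sleep = interval, clock, sleep
        self.last: Optional[float] = None

    def wait(self) -> None:
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self.last = now


def backfill(first_url: str, fetch_page: Callable[[str], dict],
             checkpoint: Path, pacer: Pacer) -> Iterator[dict]:
    """Yield recordings page by page, saving links.next after each page."""
    url: Optional[str] = first_url
    if checkpoint.exists():
        # Resume from the saved cursor; None means a previous run finished.
        url = json.loads(checkpoint.read_text())["next"]
    while url:
        pacer.wait()
        page = fetch_page(url)
        yield from page.get("data", [])
        url = page.get("links", {}).get("next")
        checkpoint.write_text(json.dumps({"next": url}))
```

Call pacer.wait() before each transcript fetch as well, since those requests count toward the same limit. Because the checkpoint is written only after a page's recordings have been consumed, a crash can replay one page, so keep deduplicating on recording ID.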

Incremental Polling

Ongoing extraction on a schedule

1

Set a cron job or scheduled trigger (hourly, daily, etc.) that runs your extraction script

2

On each run, call GET /api/v2/recordings with filter[createdAt] set to a range that starts at your last successful poll timestamp (e.g., <last_poll>..<window_end>). Paginate through all results

3

Fetch transcripts for any new recording IDs returned. Use the recording ID as a deduplication key to avoid reprocessing

4

Route each transcript and its metadata to your downstream pipeline - analysis tool, warehouse, or automation platform

5

After downstream writes succeed, store the end of the window you just queried as the cursor for the next poll cycle

Tip: Account for Kaia processing delay. A meeting that ended 10 minutes ago may not have a transcript yet. Polling with a 1-2 hour lag reduces empty fetches and avoids wasted API calls.
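A sketch of the windowing and dedup logic for incremental polling, assuming timezone-aware UTC datetimes and a 1-hour lag as suggested in the tip. It reuses the start..end range syntax from the filter[createdAt] example earlier; whether the API accepts full timestamps inside that range is an assumption to verify. Recordings on a window boundary may appear in two consecutive runs, which the recording-ID check absorbs.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set, Tuple

LAG = timedelta(hours=1)  # leave time for Kaia processing, per the tip above


def poll_window(last_end: datetime, now: datetime,
                lag: timedelta = LAG) -> Tuple[datetime, datetime]:
    """createdAt window for this run: from the stored cursor up to now - lag."""
    end = now - lag
    start = min(last_end, end)  # never produce an inverted range
    return start, end


def createdat_filter(start: datetime, end: datetime) -> str:
    """Format the window with the start..end range syntax (expects aware datetimes)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (f"{start.astimezone(timezone.utc).strftime(fmt)}.."
            f"{end.astimezone(timezone.utc).strftime(fmt)}")


def new_ids(recordings: Iterable[dict], seen: Set[str]) -> List[str]:
    """Return IDs not seen before, adding them to `seen` (the dedup key)."""
    fresh: List[str] = []
    for rec in recordings:
        rid = str(rec["id"])
        if rid not in seen:
            seen.add(rid)
            fresh.append(rid)
    return fresh
```

Persist end as the new cursor, and seen as a database table rather than an in-memory set, only after downstream writes succeed.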

Webhook-Triggered

Near real-time on recording completion

1

Register a webhook endpoint in Outreach via the API (POST /api/v2/webhooks). Subscribe to recording-related events so your endpoint fires when Kaia finishes processing a recording

2

When the webhook fires, parse the event payload to extract the recording ID and associated metadata (meeting, prospect, user)

3

Fetch the transcript using the recording ID from the webhook event. Add a short delay if needed - the webhook may fire slightly before the transcript is fully processed

4

Route the transcript and metadata downstream - to your analysis pipeline, CRM updater, or automation tool

Note: Outreach webhooks are configured via API, not through the admin UI. You'll need to manage webhook subscriptions programmatically. Ensure your endpoint responds with a 200 status quickly - Outreach may retry or disable webhooks that consistently time out.
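A minimal webhook receiver using Python's standard-library http.server: it acknowledges with 200 immediately and hands the recording ID to a queue for a separate worker. The event payload shape ({"data": {"id": ...}}) and the type, resource, and action values in the registration body are assumptions based on JSON:API conventions, not confirmed Outreach event formats; inspect a real delivery before relying on them. Signature verification is omitted from this sketch.

```python
import json
import queue
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Optional

# Recording IDs waiting for a worker to fetch their transcripts.
jobs: "queue.Queue[str]" = queue.Queue()


def webhook_registration_body(url: str, resource: str, action: str) -> dict:
    """JSON:API body for POST /api/v2/webhooks (resource/action values are assumptions)."""
    return {"data": {"type": "webhook",
                     "attributes": {"url": url, "resource": resource, "action": action}}}


def recording_id_from_event(payload: Any) -> Optional[str]:
    """Extract data.id from an (assumed) JSON:API-style event payload."""
    data = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(data, dict) or data.get("id") is None:
        return None
    return str(data["id"])


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and invalid UTF-8
            payload = None
        rid = recording_id_from_event(payload)
        if rid:
            jobs.put(rid)  # transcript fetch happens later, off the request path
        # Acknowledge fast so Outreach does not retry or disable the webhook.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver; blocks forever."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Run it with serve(8080) and have a worker drain jobs, fetching each transcript (with the delay or retry logic described in step 3).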

Automation

Send Outreach Transcripts to Automation Tools

Once you can extract transcripts from Outreach, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from Outreach trigger through Semarize evaluation to CRM, Slack, or database output.

Zapier - No-code automation

Outreach → Zapier → Semarize → CRM

Detect new Outreach Kaia recordings, fetch the transcript, send it to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.

Example Zap
Trigger: Webhook (Outreach)
Fires when Outreach sends a recording event
App: Webhooks by Zapier
Event: Catch Hook
Source: Outreach webhook
Webhooks by Zapier
Fetch transcript from Outreach API
Method: GET
URL: https://api.outreach.io/api/v2/recordings/{{id}}?include=transcript
Auth: Bearer (OAuth access_token)
Transcript returned
Webhooks by Zapier
POST /v1/runs (sync) to Semarize
Method: POST
URL: https://api.semarize.com/v1/runs
Auth: Bearer smz_live_...
Body: { kit_code, mode: "sync", input: { transcript } }
Structured output returned
Formatter by Zapier
Extract brick values from Semarize response
Extract: bricks.overall_score.value
Extract: bricks.risk_flag.value
Extract: bricks.coaching_priority.value
Salesforce - Update Record
Write scored signals to Opportunity
Object: Opportunity
AI Score: {{overall_score}}
Risk Flag: {{risk_flag}}
Coaching Priority: {{coaching_priority}}

Setup steps

1

Create a new Zap. Choose "Webhooks by Zapier" as the trigger and select "Catch Hook". Copy the webhook URL and register it in Outreach via POST /api/v2/webhooks to subscribe to recording events.

2

Add a "Webhooks by Zapier" Action (Custom Request) to fetch the transcript from Outreach. Set method to GET, URL to https://api.outreach.io/api/v2/recordings/{{id}}?include=transcript, and add your OAuth Bearer token.

3

Add a second "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. In the body, set kit_code to your Kit, mode to "sync", and map the transcript text into input.transcript.

4

Add a Formatter step to extract individual brick values from the Semarize JSON response - overall_score, risk_flag, coaching_priority, etc.

5

Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record.

6

Test each step end-to-end, then turn on the Zap.

Watch out for: Outreach OAuth access tokens expire after about 2 hours. Zapier's built-in OAuth connections refresh tokens automatically, but the custom Webhooks by Zapier request in step 2 does not, so you'll need to manage token refresh yourself for that call. Use mode: "sync" so Semarize returns results inline.
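The Semarize call and Formatter step from this Zap can be reproduced in Python, which can help when testing a Kit before wiring up Zapier. The URL, body fields, and the bricks.<name>.value path are taken from the example above; the JSON content type and the helper names semarize_request, extract_bricks, and run_sync are assumptions for illustration.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable

SEMARIZE_URL = "https://api.semarize.com/v1/runs"


def semarize_request(api_key: str, kit_code: str, transcript: str) -> urllib.request.Request:
    """Build the sync POST /v1/runs request used in the Zap."""
    body = {"kit_code": kit_code, "mode": "sync", "input": {"transcript": transcript}}
    return urllib.request.Request(
        SEMARIZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def extract_bricks(result: Dict[str, Any], names: Iterable[str]) -> Dict[str, Any]:
    """Formatter-step equivalent: bricks.<name>.value, or None if a brick is absent."""
    bricks = result.get("bricks") or {}
    return {name: (bricks.get(name) or {}).get("value") for name in names}


def run_sync(api_key: str, kit_code: str, transcript: str) -> Dict[str, Any]:
    """Send the transcript and return the parsed JSON response."""
    with urllib.request.urlopen(semarize_request(api_key, kit_code, transcript)) as resp:
        return json.load(resp)
```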
n8n - Self-hosted workflows

Outreach → n8n → Semarize → Database

Poll Outreach for new Kaia recordings on a schedule, fetch transcripts, send each one to Semarize for analysis, then write the structured scores and signals to your database. n8n's native loop support handles pagination and batch processing.

Example Workflow
Cron - Every Hour
Triggers the workflow on schedule
Mode: Every Hour
Timezone: UTC
HTTP Request - List Recordings
GET /api/v2/recordings (Outreach)
Method: GET
URL: https://api.outreach.io/api/v2/recordings
Auth: OAuth 2.0 Bearer
Filter: createdAt > {{$now.minus(1, 'hour')}}
For each recording ID
HTTP Request - Fetch Transcript
GET /api/v2/recordings/:id (Outreach)
URL: /api/v2/recordings/{{$json.id}}?include=transcript
Code - Reassemble Transcript
Concatenate segments into plain text
Join: segments[].text by speaker
HTTP Request - Semarize
POST /v1/runs (sync)
URL: https://api.semarize.com/v1/runs
Auth: Bearer smz_live_...
Body: { kit_code, mode: "sync", input: { transcript } }
Scores & signals returned
Postgres - Insert Row
Write structured output to database
Table: call_evaluations
Columns: recording_id, score, risk_flag, coaching_priority

Setup steps

1

Add a Cron node as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most teams).

2

Add an HTTP Request node to list new recordings from Outreach. Set method to GET, URL to https://api.outreach.io/api/v2/recordings, configure OAuth 2.0 auth, and filter by createdAt since last poll.

3

Add a Split In Batches node to iterate over the returned recording IDs. Inside the loop, add an HTTP Request node to fetch each transcript via GET /api/v2/recordings/:id?include=transcript.

4

Add a Code node (JavaScript) to reassemble the transcript segments into a single transcript string. Join each segment's text, prefixed by speaker name.

5

Add another HTTP Request node to send the transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token. Set kit_code, mode to "sync", and map the transcript into input.transcript.

6

Add a Code node to extract the brick values from the Semarize response - overall_score, risk_flag, coaching_priority, evidence, confidence.

7

Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use recording_id as the primary key for upserts.

8

Activate the workflow. Monitor the first few runs to verify Semarize responses are arriving and writing correctly.
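The upsert in step 7 can be sketched with Python's built-in sqlite3 module, which supports the same INSERT ... ON CONFLICT ... DO UPDATE syntax as Postgres. Table and column names follow the example workflow; save_evaluation is a hypothetical helper. With Postgres you would use your driver's placeholder style (for example %s in psycopg) and a BOOLEAN column for risk_flag.

```python
import sqlite3

DDL = """
CREATE TABLE IF NOT EXISTS call_evaluations (
    recording_id      TEXT PRIMARY KEY,
    score             REAL,
    risk_flag         INTEGER,
    coaching_priority TEXT
)
"""

# Re-running the workflow for the same recording overwrites instead of duplicating.
UPSERT = """
INSERT INTO call_evaluations (recording_id, score, risk_flag, coaching_priority)
VALUES (?, ?, ?, ?)
ON CONFLICT (recording_id) DO UPDATE SET
    score = excluded.score,
    risk_flag = excluded.risk_flag,
    coaching_priority = excluded.coaching_priority
"""


def save_evaluation(conn: sqlite3.Connection, recording_id: str, bricks: dict) -> None:
    """Write extracted brick values keyed by recording_id."""
    conn.execute(UPSERT, (recording_id, bricks.get("overall_score"),
                          bricks.get("risk_flag"), bricks.get("coaching_priority")))
    conn.commit()
```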

Watch out for: n8n's OAuth 2.0 credential type handles token refresh automatically. Use recording IDs as deduplication keys to prevent reprocessing. You can also use async mode with n8n's native loop - POST /v1/runs (default async), then poll GET /v1/runs/:runId with a Wait + IF loop until status is "succeeded".
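In async mode, the Wait + IF loop amounts to polling until the run reports "succeeded". In this sketch get_run is a hypothetical wrapper around GET /v1/runs/:runId; this guide names only the "succeeded" status, so the "failed" check is a placeholder for whatever terminal failure status the API actually returns.

```python
import time
from typing import Any, Callable, Dict


def wait_for_run(get_run: Callable[[str], Dict[str, Any]], run_id: str,
                 interval: float = 5.0, max_polls: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll a Semarize run until its status is "succeeded"."""
    for poll in range(max_polls):
        run = get_run(run_id)
        status = run.get("status")
        if status == "succeeded":
            return run
        if status == "failed":  # placeholder: the real failure status is an assumption
            raise RuntimeError(f"run {run_id} ended with status {status!r}")
        if poll < max_polls - 1:
            sleep(interval)  # equivalent of n8n's Wait node
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```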
Make - Visual automation with branching

Outreach → Make → Semarize → CRM + Slack

Fetch new Outreach Kaia transcripts on a schedule, send each to Semarize for structured analysis, then use a Router to branch the scored output - alert on risk flags via Slack and write all signals to your CRM.

Example Scenario
Schedule - Every 30 min
Triggers the scenario on interval
Interval: 30 minutes
HTTP - List New Recordings
GET /api/v2/recordings (Outreach)
Method: GET
Auth: OAuth 2.0 Bearer
Filter: createdAt > {{formatDate(...)}}
HTTP - Fetch Transcript
GET /api/v2/recordings/:id (per recording)
Iterator: for each recording in response
Include: transcript
HTTP - Semarize
POST /v1/runs (sync)
URL: https://api.semarize.com/v1/runs
Auth: Bearer smz_live_...
Body: { kit_code, mode: "sync", input: { transcript } }
Structured output
Router - Branch on Risk Flag
Route by Semarize output
Branch 1: IF risk_flag.value = true
Branch 2: ALL (fallthrough)
Branch 1 - Risk detected
Slack - Alert Channel
Notify team about flagged recording
Channel: #deal-alerts
Message: Risk on {{recording_id}}, score: {{score}}
Branch 2 - All recordings
Salesforce - Update Record
Write all scored signals to Opportunity
AI Score: {{overall_score}}
Risk Flag: {{risk_flag}}
Coaching Priority: {{coaching_priority}}

Setup steps

1

Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (15-60 minutes is typical).

2

Add an HTTP module to list new recordings from Outreach. Set method to GET, URL to https://api.outreach.io/api/v2/recordings, configure OAuth 2.0 auth, and filter by createdAt since the last run.

3

Add an Iterator module to loop through each recording. For each, add an HTTP module to fetch the transcript via GET /api/v2/recordings/:id?include=transcript.

4

Add another HTTP module to send the transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript from the previous step. Parse the response as JSON.

5

Add a Router module. Define Branch 1 with a filter: bricks.risk_flag.value equals true. Leave Branch 2 as a fallthrough (no filter).

6

On Branch 1, add a Slack module to alert your team when risk is detected. Map the score, risk flag, and recording ID into the message.

7

On Branch 2, add a Salesforce module to write all brick values (score, risk_flag, coaching_priority) to the Opportunity record.

8

Set the scenario schedule and activate. Monitor the first few runs in Make's execution log.

Watch out for: Every module execution counts as an operation. With 50 new recordings, the three per-recording modules (transcript fetch, Semarize call, Salesforce update) alone use roughly 150 operations, plus a few for the list and iterator modules and one Slack alert per flagged recording. Make's built-in Outreach module handles OAuth refresh, but custom HTTP modules require manual token management. Use mode: "sync" to avoid needing a polling loop.
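The Router's semantics (Branch 1 fires only for flagged recordings, Branch 2 fires for every recording) can be expressed as a small Python function, which makes the branching easy to unit-test before building the scenario. It takes brick values that have already been extracted; the action labels and field names are illustrative only.

```python
from typing import Any, Dict, List, Tuple

Action = Tuple[str, Any]


def route(recording_id: str, bricks: Dict[str, Any]) -> List[Action]:
    """Mirror the Router: Branch 1 only when risk_flag is true, Branch 2 always."""
    actions: List[Action] = []
    if bricks.get("risk_flag") is True:
        # Branch 1: Slack alert with the same fields as the example message.
        actions.append(("slack:#deal-alerts",
                        f"Risk on {recording_id}, score: {bricks.get('overall_score')}"))
    # Branch 2: fallthrough, so every recording updates the Opportunity.
    actions.append(("salesforce:update_opportunity", {
        "AI Score": bricks.get("overall_score"),
        "Risk Flag": bricks.get("risk_flag"),
        "Coaching Priority": bricks.get("coaching_priority"),
    }))
    return actions
```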

What you can build

What You Can Do With Outreach Data in Semarize

Semarize unlocks custom conversation scoring, cross-sequence quality analysis, compliance monitoring at scale, and the ability to build your own tools on structured conversation signals from Outreach.

Knowledge-Grounded Product Claim Verification

Factual Accuracy Scoring

What Semarize generates

feature_claim_accurate = false
pricing_reference_correct = true
integration_overstated = true
knowledge_gap_area = "api_limits"

Your SDRs make product claims on every booked call — feature capabilities, integration support, pricing references. Are they accurate? Run a knowledge-grounded kit against your current product documentation on every call transcript. Semarize checks whether feature claims match the current product spec, whether pricing references are up to date, and whether integration capabilities were overstated. After scoring 200 calls, the data shows reps consistently overstate API rate limits and misquote the enterprise tier’s SSO configuration. Product marketing gets a weekly accuracy report — messaging corrections happen within days, not quarters.

Illustration: SDR Discovery Scorecard (200 calls scored)
Rep | Conversion (change) | Discovery score
Sarah K. | 38% (+12%) | 84
James M. | 14% (-5%) | 59
Priya R. | 42% (+8%) | 91
Reps above 70 discovery_depth convert 3x more meetings to pipeline

Custom Regulatory Disclosure Scoring

Evidence-Backed Compliance

What Semarize generates

disclosure_sequence_correct = false
consent_language_verbatim = true
prohibited_claim = true
compliance_evidence_package = "generated"

Your compliance team needs to verify that every outbound call follows your specific regulatory disclosure policy — correct sequence, correct phrasing, within the required timeframe. Run every call transcript through a compliance kit grounded against your regulatory policy document. Semarize checks whether disclosures were delivered in the correct order, whether consent language matched the approved verbatim script, and whether any prohibited claims were made. Every call generates a structured evidence package that maps directly to your audit filing template. Audit prep drops from 3 weeks to 3 days because the evidence is structured, searchable, and already formatted.

Illustration: Compliance Audit - Weekly Digest (100% coverage)
Call #1042 | Beta feature claim | fail | "...our AI auto-dialer ships next month"
Call #1038 | Disclosure on pricing | pass | Disclosure delivered at 2:14
Call #1035 | Competitor pricing ref | pass | No competitor pricing mentioned
1 prohibited claim flagged - sent to legal review

Structured Call Signal Pipeline to Warehouse

Typed Conversation Data for BI

What Semarize generates

pain_category = "operational_cost"
urgency_level = 0.78
stakeholder_role = "vp_ops"
output_format = "typed_sql_row"

Outreach gives you activity data and Kaia gives you recordings — but neither gives you structured, queryable fields from what was actually said. Run every Kaia transcript through Semarize and land typed rows in your warehouse: pain_category, urgency_level, stakeholder_role, buying_stage as real columns. Your BI team joins conversation signals with Outreach sequence data and CRM pipeline data in dbt. For the first time, you can answer “which pain categories convert fastest from which sequences?” with a SQL query instead of listening to 50 recordings.
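Landing typed rows starts with mapping brick values onto a fixed schema. This sketch assumes the Kit exposes bricks named pain_category, urgency_level, stakeholder_role, and buying_stage in the bricks.<name>.value shape used earlier; CallSignalRow, to_row, and insert_params are hypothetical names, and missing bricks become NULLs rather than errors.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class CallSignalRow:
    recording_id: str
    pain_category: Optional[str]
    urgency_level: Optional[float]
    stakeholder_role: Optional[str]
    buying_stage: Optional[str]


def to_row(recording_id: str, bricks: Dict[str, Any]) -> CallSignalRow:
    """Map Semarize brick values (bricks.<name>.value) onto a typed row."""
    def value(name: str) -> Any:
        return (bricks.get(name) or {}).get("value")

    urgency = value("urgency_level")
    return CallSignalRow(
        recording_id=recording_id,
        pain_category=value("pain_category"),
        urgency_level=float(urgency) if urgency is not None else None,
        stakeholder_role=value("stakeholder_role"),
        buying_stage=value("buying_stage"),
    )


def insert_params(row: CallSignalRow) -> Tuple[Any, ...]:
    """Parameters in dataclass field order, ready for a parameterized INSERT."""
    return tuple(asdict(row).values())
```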

Illustration: Prospect Signal Aggregation
Cold Outbound: Pain identified
Nurture Track: Budget confirmed
Event Follow-up: Timeline set
Unified Prospect View: Budget confirmed | Timeline: Q3 2026 | Decision-maker | 3 touchpoints

Conversation-Powered Sequence Optimization

Data-Driven Sequence Design

Vibe-coded

What Semarize generates

avg_discovery_score = 71
best_step = "step_5"
conversion_correlation = 0.74
engagement_dropoff = "step_3"

A revenue analyst vibe-codes a Streamlit dashboard that correlates Semarize scores from Outreach Kaia transcripts with downstream conversion data from the CRM. The dashboard shows which sequence steps produce the highest-scoring discovery calls, which talk tracks correlate with closed-won deals, and where in the sequence prospects disengage. It's not about open rates or reply rates — it's about conversation quality at each stage. The team restructures their top sequence based on the data: moving the value-prop call from step 3 to step 5 increases pipeline conversion by 18%.

Illustration: Sequence Quality vs. Conversion (vibe-coded)
Step | Call | Score | Conversion
Step 1 | Intro call | 58 | 12%
Step 2 | Pain discovery | 65 | 18%
Step 3 | Value prop | 49 | 9%
Step 4 | Case study | 72 | 24%
Step 5 | Value prop (moved) | 81 | 31%
Recommendation: Move value-prop call from step 3 to step 5 - projected +18% pipeline conversion

Watch out for

Common Challenges & Gotchas

These are the issues that come up most often when teams start extracting transcripts from Outreach at scale.

OAuth 2.0 token management

Outreach uses OAuth 2.0 with short-lived access tokens. Your integration must handle token refresh automatically - if the refresh token expires or is revoked, the entire pipeline stops until someone re-authorizes it manually.

Kaia transcript availability delay

Kaia processes recordings asynchronously after a meeting ends, so fetching a transcript too soon returns empty or incomplete data. Wait at least 30 minutes before the first attempt (longer recordings can take a few hours), and retry with exponential backoff when the transcript is still missing.

API rate limits

The Outreach API enforces rate limits (typically ~10,000 requests/hour). Exceeding limits results in 429 responses. Implement exponential backoff and pace bulk operations carefully, especially during historical backfills.

Pagination across large datasets

Outreach API responses are paginated with cursor-based navigation. Track your cursor position carefully - losing a cursor mid-backfill means re-scanning from the start. The Daily Export feature is often more practical for very large historical extractions.

Recording availability depends on Kaia being enabled

Not all Outreach meetings produce Kaia recordings. Kaia must be enabled for the user, the meeting must be a supported type (e.g., video call), and the bot must successfully join. Check recording availability before attempting transcript extraction.

Mapping recordings to sequences and prospects

Recordings, sequences, and prospects live in separate API resources. Joining them requires multiple API calls and careful ID mapping. Plan your data model to link recording IDs to sequence steps and prospect records.
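The join can be done in memory once both resources are fetched. This sketch assumes JSON:API relationships objects in which a recording points to a prospect, and a sequence-state record points to both a prospect and a sequence; those relationship names are assumptions to check against real payloads. If a prospect has several sequence states, the last one read wins here, so in practice you would pick the state that was active at the recording's date.

```python
from typing import Any, Dict, List, Optional


def rel_id(resource: Dict[str, Any], name: str) -> Optional[str]:
    """Read relationships.<name>.data.id from a JSON:API resource object."""
    data = (resource.get("relationships", {}).get(name) or {}).get("data")
    return str(data["id"]) if isinstance(data, dict) and "id" in data else None


def join_recordings(recordings: List[dict],
                    sequence_states: List[dict]) -> List[Dict[str, Optional[str]]]:
    """Attach prospect and sequence IDs to each recording (relationship names assumed)."""
    seq_by_prospect: Dict[str, Optional[str]] = {}
    for state in sequence_states:
        pid = rel_id(state, "prospect")
        if pid is not None:
            seq_by_prospect[pid] = rel_id(state, "sequence")  # last state read wins
    rows = []
    for rec in recordings:
        pid = rel_id(rec, "prospect")
        rows.append({
            "recording_id": str(rec["id"]),
            "prospect_id": pid,
            "sequence_id": seq_by_prospect.get(pid) if pid else None,
        })
    return rows
```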

Duplicate processing protection

Without idempotency checks, re-running an extraction flow can process the same recording twice. Use recording IDs as deduplication keys to ensure each transcript is handled exactly once in your pipeline.
