Outreach - How to Get Your Conversation Data

Outreach Kaia is the conversation layer in the Outreach sales engagement platform. This guide covers the Daily Export feature (the supported path for bulk Kaia recordings and transcripts to S3, SFTP, or Azure Blob), the broader Outreach API for engagement data, and Kaia plan-tier gating.

Quick answer

The primary path for getting Kaia conversation data out of Outreach is Daily Export, which drops recordings, transcripts, and metadata into customer-owned storage (SFTP, AWS S3, or Azure Blob) on a daily schedule. The Outreach API (OAuth) covers broader engagement data, but direct programmatic access to Kaia transcripts is limited - Daily Export is what most teams use. Kaia features depend on your Outreach plan and add-on entitlement.

What you'll learn

What conversation data you can extract from Outreach - Kaia recordings, transcripts, meeting metadata, and sequence context
How to access data via the Outreach REST API - OAuth 2.0 authentication, key endpoints, and pagination
Three extraction patterns: historical backfill with Daily Export, incremental polling, and webhook-triggered extraction
How to connect Outreach data pipelines to Zapier, n8n, and Make
Advanced use cases - sequence quality scoring, rep coaching, compliance monitoring, and pipeline intelligence

Data

What Data You Can Extract From Outreach

Outreach captures more than just the recording. Between Kaia conversation intelligence and the broader sales engagement platform, every interaction produces structured assets that can be extracted via Daily Export or the API - recordings, transcripts, sequence activity, prospect engagement, and meeting context.

Common fields teams care about

Full Kaia transcript text

Speaker labels (rep vs. prospect)

Meeting owner / rep name

Recording URL and duration

Meeting date, time, and attendees

Associated prospect and account

Sequence name and step number

Call disposition and outcome

Prospect engagement signals

Associated CRM opportunity IDs

API Access

How to Get Transcripts via the Outreach API

Outreach exposes recordings, transcripts, and sales engagement data through a REST API at api.outreach.io. The workflow is: authenticate via OAuth 2.0, list recordings by date range, then fetch the transcript for each recording ID.

Authenticate with OAuth 2.0

Outreach uses OAuth 2.0 with the authorization code grant flow. Register your application in the Outreach developer portal at developers.outreach.io, obtain your client_id and client_secret, then exchange the authorization code for access and refresh tokens.

POST https://api.outreach.io/oauth/token

{
  "client_id": "<your_client_id>",
  "client_secret": "<your_client_secret>",
  "redirect_uri": "<your_redirect_uri>",
  "grant_type": "authorization_code",
  "code": "<authorization_code>"
}

Access tokens expire (typically after 2 hours). Your integration must automatically refresh tokens using the refresh_token grant type before expiry. Store refresh tokens securely - if one is revoked, the pipeline stops until manual re-authorisation.
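A minimal sketch of the refresh logic described above. It assumes the token endpoint accepts a refresh_token grant with the same client fields as the authorization code exchange, and that the token response reports its lifetime in seconds. The helper names and the five-minute safety margin are illustrative, not part of the Outreach API.

```python
from datetime import datetime, timedelta, timezone

TOKEN_URL = "https://api.outreach.io/oauth/token"  # from the request shown above


def build_refresh_payload(client_id, client_secret, redirect_uri, refresh_token):
    """Body for a refresh_token grant (fields assumed to mirror the code exchange)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    }


def expiry_from_response(issued_at, expires_in_seconds):
    """Absolute expiry time computed from an assumed lifetime-in-seconds field."""
    return issued_at + timedelta(seconds=expires_in_seconds)


def needs_refresh(expires_at, now, margin=timedelta(minutes=5)):
    """True once we are within `margin` of expiry, so refresh happens before it."""
    return now >= expires_at - margin


issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
expires = expiry_from_response(issued, 7200)  # 2 hours, the typical lifetime
print(needs_refresh(expires, issued + timedelta(minutes=30)))   # False
print(needs_refresh(expires, issued + timedelta(minutes=116)))  # True
```

Checking needs_refresh before every API call keeps the refresh decision in one place and avoids discovering an expired token mid-pagination.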

List recordings by date range

Call the GET /api/v2/recordings endpoint with filter parameters for your date range. Results are paginated - each response includes pagination links to fetch the next page of results.

GET https://api.outreach.io/api/v2/recordings
    ?filter[createdAt]=2025-01-01..2025-02-01
    &page[size]=50

Authorization: Bearer <access_token>
Content-Type: application/vnd.api+json

The response returns an array of recording objects following JSON:API format, each with id, attributes.createdAt, attributes.duration, and relationship links to the associated meeting, prospect, and user. Keep paginating until the links.next field is null.
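The pagination loop can be sketched as follows, assuming the list response carries its records under a top-level data array (the usual JSON:API layout) and the next-page URL under links.next as described above. fetch_page is a hypothetical caller-supplied function that performs the authenticated GET and returns the decoded JSON body.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.outreach.io/api/v2/recordings"


def first_page_url(start_date, end_date, page_size=50):
    """Build the first list URL using the range filter shown above."""
    params = {
        "filter[createdAt]": f"{start_date}..{end_date}",
        "page[size]": page_size,
    }
    # Keep the brackets readable instead of percent-encoding them.
    return f"{BASE_URL}?{urlencode(params, safe='[]')}"


def iter_recordings(url, fetch_page):
    """Yield recording objects page by page until links.next is missing or null."""
    while url:
        body = fetch_page(url)
        yield from body.get("data", [])
        url = (body.get("links") or {}).get("next")


# Two fake pages stand in for real API responses.
fake_pages = {
    "page-1": {"data": [{"id": "1"}, {"id": "2"}], "links": {"next": "page-2"}},
    "page-2": {"data": [{"id": "3"}], "links": {"next": None}},
}
print([r["id"] for r in iter_recordings("page-1", fake_pages.__getitem__)])
```

Injecting fetch_page keeps the loop independent of any HTTP library, so the same generator can be exercised with canned pages before it touches the live API.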

Fetch the transcript

For each recording ID, request the transcript content. The Outreach API returns transcript data as structured segments with speaker identification, timestamps, and text.

GET https://api.outreach.io/api/v2/recordings/<recording_id>
    ?include=transcript

Authorization: Bearer <access_token>
Content-Type: application/vnd.api+json

Each transcript segment includes speaker attribution, start and end timestamps, and the text content. Reassemble into plain text by concatenating segments, or preserve the structured format for per-speaker analysis.
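Both options can be sketched in a few lines. The segment field names used here (speaker, start, text) are assumptions for illustration; map them to whatever the transcript payload in your account actually contains.

```python
def segments_to_text(segments):
    """Join segments into 'Speaker: text' lines, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s["start"])
    return "\n".join(f'{s["speaker"]}: {s["text"].strip()}' for s in ordered)


def text_by_speaker(segments):
    """Group segment text per speaker, preserving order, for per-speaker analysis."""
    grouped = {}
    for s in sorted(segments, key=lambda s: s["start"]):
        grouped.setdefault(s["speaker"], []).append(s["text"].strip())
    return grouped


segments = [  # hypothetical sample data, deliberately out of order
    {"speaker": "Prospect", "start": 4.2, "text": "I own the budget."},
    {"speaker": "AE", "start": 0.0, "text": "Who is involved in the buying decision? "},
]
print(segments_to_text(segments))
print(text_by_speaker(segments))
```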

Handle rate limits and transcript availability

Rate limits

Outreach enforces rate limits of approximately 10,000 requests per hour. When you receive a 429 response, back off using the Retry-After header. For bulk operations, pace requests and persist your pagination cursor between runs.
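A small retry wrapper illustrates the 429 handling. It assumes Retry-After arrives as a number of seconds and falls back to exponential backoff when the header is missing or not numeric. send is a hypothetical caller-supplied function returning a (status_code, headers, body) tuple.

```python
import time


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 response or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        try:
            delay = float(headers.get("Retry-After", ""))
        except ValueError:
            delay = float(2 ** attempt)  # fallback: 1s, 2s, 4s, ...
        sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")


# Simulated responses: two rate-limit hits, then success.
responses = iter([(429, {"Retry-After": "3"}, None), (429, {}, None), (200, {}, "ok")])
delays = []
print(call_with_retry(lambda: next(responses), sleep=delays.append), delays)
```

Passing sleep as a parameter makes the backoff schedule observable in tests without actually waiting.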

Transcript timing

Kaia transcripts are not available the instant a meeting ends. Outreach processes recordings asynchronously - typical lag is 15 minutes to a few hours depending on recording length and system load. Build a buffer into your extraction timing or implement a retry with exponential backoff for recently completed meetings.

Patterns

Key Extraction Flows

There are three practical patterns for getting transcripts out of Outreach. The right choice depends on whether you're doing a one-off migration, running ongoing extraction, or need near real-time processing.

Backfill via Daily Export

One-off migration of historical recordings

Configure a Daily Export in Outreach admin settings to export recordings and transcript data for your desired date range - typically 6-12 months of historical data

Download the exported data files. Daily Export produces structured data sets that can be ingested directly into your data warehouse or processing pipeline

Alternatively, use the API: call GET /api/v2/recordings with date filters. Paginate through the full result set, collecting all recording IDs

For each recording ID, fetch the transcript content. Pace requests to stay within the 10,000 requests/hour rate limit

Store each transcript with its recording metadata (recording ID, date, participants, prospect, sequence context) in your data warehouse or object store

Tip: For large historical exports, the Daily Export feature is more reliable than paginating through the API. It avoids rate limit pressure and produces consistent data snapshots. Use the API for smaller, targeted backfills.
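To judge whether an API backfill is practical, it helps to estimate the request count up front. The arithmetic below assumes the approximate 10,000 requests/hour limit quoted in this guide, a page size of 50, and one transcript fetch per recording; the function names are illustrative.

```python
import math

RATE_LIMIT_PER_HOUR = 10_000  # approximate figure quoted in this guide


def backfill_requests(recording_count, page_size=50):
    """List calls to page through the recordings plus one transcript fetch each."""
    list_calls = math.ceil(recording_count / page_size)
    return list_calls + recording_count


def min_interval_seconds(limit_per_hour=RATE_LIMIT_PER_HOUR):
    """Smallest gap between requests that stays at or under the hourly limit."""
    return 3600 / limit_per_hour


def backfill_hours(recording_count, page_size=50, limit_per_hour=RATE_LIMIT_PER_HOUR):
    """Lower bound on wall-clock time when pacing at exactly the limit."""
    return backfill_requests(recording_count, page_size) / limit_per_hour


print(backfill_requests(25_000))  # 25500
print(min_interval_seconds())     # 0.36
print(backfill_hours(25_000))     # 2.55
```

In practice the limit is shared with every other integration on the account, so the real duration is likely to be longer than this lower bound.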

Incremental Polling

Ongoing extraction on a schedule

Set a cron job or scheduled trigger (hourly, daily, etc.) that runs your extraction script

On each run, call GET /api/v2/recordings with filter[createdAt] set to a range that starts at your last successful poll timestamp. Paginate through all results

Fetch transcripts for any new recording IDs returned. Use the recording ID as a deduplication key to avoid reprocessing

Route each transcript and its metadata to your downstream pipeline - analysis tool, warehouse, or automation platform

Update your stored cursor / timestamp to the upper bound of the window you just queried, so the next poll cycle picks up exactly where this one ended

Tip: Account for Kaia processing delay. A meeting that ended 10 minutes ago may not have a transcript yet. Polling with a 1-2 hour lag reduces empty fetches and avoids wasted API calls.
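One way to fold the processing lag into the polling steps is to end each window one lag interval before the current time and store that end as the next cursor. The sketch below assumes the createdAt filter accepts ISO 8601 timestamps in the same start..end range form used for dates; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def poll_window(last_cursor, now, lag=timedelta(hours=1)):
    """Return (start, end) for this run, or None if the lagged window is empty.

    The window ends `lag` before now, so recordings that are still being
    processed are left for a later run. `end` becomes the next stored cursor.
    """
    end = now - lag
    if end <= last_cursor:
        return None
    return last_cursor, end


def created_at_filter(start, end):
    """Format the window as a createdAt range value."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{start.strftime(fmt)}..{end.strftime(fmt)}"


cursor = datetime(2025, 3, 1, 9, 0, tzinfo=timezone.utc)
now = datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc)
window = poll_window(cursor, now)
print(created_at_filter(*window))
```

Because consecutive windows share a boundary, a recording created exactly on it may appear twice, which is one more reason to deduplicate on recording ID.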

Webhook-Triggered

Near real-time on recording completion

Register a webhook endpoint in Outreach via the API (POST /api/v2/webhooks). Subscribe to recording-related events so your endpoint fires when Kaia finishes processing a recording

When the webhook fires, parse the event payload to extract the recording ID and associated metadata (meeting, prospect, user)

Fetch the transcript using the recording ID from the webhook event. Add a short delay if needed - the webhook may fire slightly before the transcript is fully processed

Route the transcript and metadata downstream - to your analysis pipeline, CRM updater, or automation tool

Note: Outreach webhooks are configured via API, not through the admin UI. You'll need to manage webhook subscriptions programmatically. Ensure your endpoint responds with a 200 status quickly - Outreach may retry or disable webhooks that consistently time out.
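To respond quickly, the handler can do nothing more than validate the event and queue the recording ID for a background worker. This sketch is framework-agnostic; the payload shape it reads (a data object with an id) is an assumption for illustration, so inspect a real event from your account before relying on it.

```python
import json
from queue import Queue


def handle_webhook(raw_body, work_queue):
    """Validate the event, enqueue the recording ID, and return an HTTP status."""
    try:
        event = json.loads(raw_body)
        recording_id = event["data"]["id"]  # assumed payload layout
    except (ValueError, KeyError, TypeError):
        return 400  # malformed payload, nothing to enqueue
    work_queue.put(str(recording_id))  # the transcript fetch happens in a worker
    return 200


queue = Queue()
print(handle_webhook(b'{"data": {"type": "recording", "id": 42}}', queue))
print(handle_webhook(b"not json", queue))
```

The worker that drains the queue is where the short delay and the transcript fetch belong, so a slow Outreach response never holds the webhook connection open.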

Automation

Send Outreach Transcripts to Automation Tools

Once you can extract transcripts from Outreach, the next step is routing them through Semarize for structured analysis and into your downstream systems. Below are end-to-end example flows - each showing the full pipeline from Outreach trigger through Semarize evaluation to CRM, Slack, or database output.

Zapier - No-code automation

Outreach → Zapier → Semarize → CRM

Detect new Outreach Kaia recordings, fetch the transcript, send it to Semarize for structured analysis, then write the scored output - signals, flags, and evidence - directly to your CRM.

Example Zap

Trigger: Webhook (Outreach)

Fires when Outreach sends a recording event

App: Webhooks by Zapier

Event: Catch Hook

Source: Outreach webhook

Webhooks by Zapier

Fetch transcript from Outreach API

Method: GET

URL: https://api.outreach.io/api/v2/recordings/{{id}}?include=transcript

Auth: Bearer (OAuth access_token)

Transcript returned

Webhooks by Zapier

POST /v1/runs (sync) to Semarize

Method: POST

URL: https://api.semarize.com/v1/runs

Auth: Bearer smz_live_...

Body: { kit_code, mode: "sync", input: { transcript } }

Structured output returned

Formatter by Zapier

Extract brick values from Semarize response

Extract: bricks.overall_score.value

Extract: bricks.risk_flag.value

Extract: bricks.coaching_priority.value

Salesforce - Update Record

Write scored signals to Opportunity

Object: Opportunity

AI Score: {{overall_score}}

Risk Flag: {{risk_flag}}

Coaching Priority: {{coaching_priority}}

Setup steps

Create a new Zap. Choose "Webhooks by Zapier" as the trigger and select "Catch Hook". Copy the webhook URL and register it in Outreach via POST /api/v2/webhooks to subscribe to recording events.

Add a "Webhooks by Zapier" Action (Custom Request) to fetch the transcript from Outreach. Set method to GET, URL to https://api.outreach.io/api/v2/recordings/{{id}}?include=transcript, and add your OAuth Bearer token.

Add a second "Webhooks by Zapier" Action. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your Semarize API key as a Bearer token. In the body, set kit_code to your Kit, mode to "sync", and map the transcript text into input.transcript.

Add a Formatter step to extract individual brick values from the Semarize JSON response - overall_score, risk_flag, coaching_priority, etc.

Add a Salesforce (or HubSpot, Sheets, etc.) Action to write the extracted scores and signals to your CRM record.

Test each step end-to-end, then turn on the Zap.

Watch out for: Outreach OAuth tokens expire every 2 hours. Zapier's built-in OAuth connection handles refresh automatically, but if you're using custom webhook requests, you'll need to manage token refresh yourself. Use mode: "sync" so Semarize returns results inline.
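For teams that prefer to prototype the Semarize step in code before wiring it into a Zap, the request and the extraction can be sketched as below. The body fields and the bricks.<name>.value paths come from the flow above; the JSON Content-Type header and the function names are assumptions.

```python
import json


def build_semarize_request(api_key, kit_code, transcript):
    """Return (url, headers, body) for a synchronous run."""
    url = "https://api.semarize.com/v1/runs"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",  # assumed
    }
    body = json.dumps(
        {"kit_code": kit_code, "mode": "sync", "input": {"transcript": transcript}}
    )
    return url, headers, body


def extract_bricks(response, names=("overall_score", "risk_flag", "coaching_priority")):
    """Pull bricks.<name>.value for each name; missing bricks come back as None."""
    bricks = response.get("bricks", {})
    return {name: bricks.get(name, {}).get("value") for name in names}


# Hypothetical response fragment, for illustration only.
sample = {"bricks": {"overall_score": {"value": 84}, "risk_flag": {"value": False}}}
print(extract_bricks(sample))
```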


n8n - Self-hosted workflows

Outreach → n8n → Semarize → Database

Poll Outreach for new Kaia recordings on a schedule, fetch transcripts, send each one to Semarize for analysis, then write the structured scores and signals to your database. n8n's native loop support handles pagination and batch processing.

Example Workflow

Cron - Every Hour

Triggers the workflow on schedule

Mode: Every Hour

Timezone: UTC

HTTP Request - List Recordings

GET /api/v2/recordings (Outreach)

Method: GET

URL: https://api.outreach.io/api/v2/recordings

Auth: OAuth 2.0 Bearer

Filter: createdAt > {{$now.minus(1, 'hour')}}

For each recording ID

HTTP Request - Fetch Transcript

GET /api/v2/recordings/:id (Outreach)

URL: /api/v2/recordings/{{$json.id}}?include=transcript

Code - Reassemble Transcript

Concatenate segments into plain text

Join: segments[].text by speaker

HTTP Request - Semarize

POST /v1/runs (sync)

URL: https://api.semarize.com/v1/runs

Auth: Bearer smz_live_...

Body: { kit_code, mode: "sync", input: { transcript } }

Scores & signals returned

Postgres - Insert Row

Write structured output to database

Table: call_evaluations

Columns: recording_id, score, risk_flag, coaching_priority

Setup steps

Add a Cron node as the workflow trigger. Set the interval to your desired polling frequency (hourly works well for most teams).

Add an HTTP Request node to list new recordings from Outreach. Set method to GET, URL to https://api.outreach.io/api/v2/recordings, configure OAuth 2.0 auth, and filter by createdAt since last poll.

Add a Split In Batches node to iterate over the returned recording IDs. Inside the loop, add an HTTP Request node to fetch each transcript via GET /api/v2/recordings/:id?include=transcript.

Add a Code node (JavaScript) to reassemble the transcript segments into a single transcript string. Join each segment's text, prefixed by speaker name.

Add another HTTP Request node to send the transcript to Semarize. Set method to POST, URL to https://api.semarize.com/v1/runs. Add your API key as a Bearer token. Set kit_code, mode to "sync", and map the transcript into input.transcript.

Add a Code node to extract the brick values from the Semarize response - overall_score, risk_flag, coaching_priority, evidence, confidence.

Add a Postgres (or MySQL / HTTP Request) node to write the structured output. Use recording_id as the primary key for upserts.

Activate the workflow. Monitor the first few runs to verify Semarize responses are arriving and writing correctly.

Watch out for: n8n's OAuth 2.0 credential type handles token refresh automatically. Use recording IDs as deduplication keys to prevent reprocessing. You can also use async mode with n8n's native loop - POST /v1/runs (default async), then poll GET /v1/runs/:runId with a Wait + IF loop until status is "succeeded".
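The async pattern mentioned above, written as a plain polling loop. get_run is a hypothetical caller-supplied function that performs GET /v1/runs/:runId and returns the decoded JSON body. Only the "succeeded" status appears in this guide; the "failed" status name is an assumption.

```python
import time


def wait_for_run(get_run, run_id, interval=5.0, max_polls=60, sleep=time.sleep):
    """Poll until the run reports "succeeded"; raise on failure or timeout."""
    for _ in range(max_polls):
        run = get_run(run_id)
        status = run.get("status")
        if status == "succeeded":
            return run
        if status == "failed":  # assumed status name
            raise RuntimeError(f"run {run_id} failed")
        sleep(interval)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")


# Simulated run that succeeds on the third poll.
states = iter([{"status": "running"}, {"status": "running"}, {"status": "succeeded"}])
waits = []
result = wait_for_run(lambda rid: next(states), "run_1", interval=1.0, sleep=waits.append)
print(result["status"], waits)
```

The max_polls cap plays the role of the IF node's exit condition: without it, a run that never finishes would keep the workflow alive indefinitely.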


Make - Visual automation with branching

Outreach → Make → Semarize → CRM + Slack

Fetch new Outreach Kaia transcripts on a schedule, send each to Semarize for structured analysis, then use a Router to branch the scored output - alert on risk flags via Slack and write all signals to your CRM.

Example Scenario

Schedule - Every 30 min

Triggers the scenario on interval

Interval: 30 minutes

HTTP - List New Recordings

GET /api/v2/recordings (Outreach)

Method: GET

Auth: OAuth 2.0 Bearer

Filter: createdAt > {{formatDate(...)}}

HTTP - Fetch Transcript

GET /api/v2/recordings/:id (per recording)

Iterator: for each recording in response

Include: transcript

HTTP - Semarize

POST /v1/runs (sync)

URL: https://api.semarize.com/v1/runs

Auth: Bearer smz_live_...

Body: { kit_code, mode: "sync", input: { transcript } }

Structured output

Router - Branch on Risk Flag

Route by Semarize output

Branch 1: IF risk_flag.value = true

Branch 2: ALL (fallthrough)

Branch 1 - Risk detected

Slack - Alert Channel

Notify team about flagged recording

Channel: #deal-alerts

Message: Risk on {{recording_id}}, score: {{score}}

Branch 2 - All recordings

Salesforce - Update Record

Write all scored signals to Opportunity

AI Score: {{overall_score}}

Risk Flag: {{risk_flag}}

Coaching Priority: {{coaching_priority}}

Setup steps

Create a new Scenario. Add a Schedule module as the trigger, set to your desired interval (15-60 minutes is typical).

Add an HTTP module to list new recordings from Outreach. Set method to GET, URL to https://api.outreach.io/api/v2/recordings, configure OAuth 2.0 auth, and filter by createdAt since the last run.

Add an Iterator module to loop through each recording. For each, add an HTTP module to fetch the transcript via GET /api/v2/recordings/:id?include=transcript.

Add another HTTP module to send the transcript to Semarize. Set URL to https://api.semarize.com/v1/runs, add your Bearer token, and set kit_code, mode to "sync", and input.transcript from the previous step. Parse the response as JSON.

Add a Router module. Define Branch 1 with a filter: bricks.risk_flag.value equals true. Leave Branch 2 as a fallthrough (no filter).

On Branch 1, add a Slack module to alert your team when risk is detected. Map the score, risk flag, and recording ID into the message.

On Branch 2, add a Salesforce module to write all brick values (score, risk_flag, coaching_priority) to the Opportunity record.

Set the scenario schedule and activate. Monitor the first few runs in Make's execution log.

Watch out for: Each API call counts as an operation. A scenario processing 50 recordings uses roughly 150 operations: one list call, then a transcript fetch, a Semarize call, and a CRM update per recording, plus any Slack alerts. Make's built-in Outreach module handles OAuth refresh, but custom HTTP modules require manual token management. Use mode: "sync" to avoid needing a polling loop.
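The Router logic is simple enough to express as one function, which can be useful for checking the branching rules before building the scenario. The brick names follow the example scenario above; the action tuples are a made-up representation for illustration.

```python
def route(recording_id, bricks):
    """Return the actions for one scored recording, mirroring the two branches."""
    actions = []
    risk = bricks.get("risk_flag", {}).get("value")
    score = bricks.get("overall_score", {}).get("value")
    if risk is True:  # Branch 1: only when the flag is set
        actions.append(("slack", "#deal-alerts", f"Risk on {recording_id}, score: {score}"))
    # Branch 2: fallthrough, runs for every recording
    actions.append(("crm", recording_id, {"score": score, "risk_flag": risk}))
    return actions


flagged = {"risk_flag": {"value": True}, "overall_score": {"value": 42}}
clean = {"risk_flag": {"value": False}, "overall_score": {"value": 88}}
print(route("rec_1", flagged))
print(route("rec_2", clean))
```

A flagged recording produces both actions, which matches a Router where the second branch has no filter.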


What you can build

What You Can Do With Outreach Data in Semarize

Semarize unlocks custom conversation scoring, cross-sequence quality analysis, compliance monitoring at scale, and the ability to build your own tools on structured conversation signals from Outreach.

SDR Discovery Quality Scoring

Conversation-Level Rep Performance

What Semarize generates

discovery_depth = 84
pain_identified = true
next_step_quality = 0.91
meeting_to_pipeline_rate = 38%

Outreach tracks activity metrics - emails sent, calls made, meetings booked - but it cannot score what actually happened on the call. Semarize scores every Kaia transcript for discovery depth, pain identification, and next-step quality, then correlates each score with downstream conversion. After scoring 200 calls, the data shows reps above a 70 discovery_depth convert meetings to pipeline at 3x the rate of reps below it. Managers see per-rep scorecards with conversion correlation - coaching shifts from gut feel to evidence. Reps who improve discovery scores by 15 points see pipeline conversion climb within weeks, not quarters.


SDR Discovery Scorecard (200 calls scored)

Sarah K. - Conv: 38% (+12%)
James M. - Conv: 14% (-5%)
Priya R. - Conv: 42% (+8%)

Reps above 70 discovery_depth convert 3x more meetings to pipeline

Custom Regulatory Disclosure Scoring

Evidence-Backed Compliance

What Semarize generates

disclosure_sequence_correct = false
consent_language_verbatim = true
prohibited_claim = true
compliance_evidence_package = "generated"

Your compliance team needs to verify that every outbound call follows your specific regulatory disclosure policy — correct sequence, correct phrasing, within the required timeframe. Run every call transcript through a compliance kit grounded against your regulatory policy document. Semarize checks whether disclosures were delivered in the correct order, whether consent language matched the approved verbatim script, and whether any prohibited claims were made. Every call generates a structured evidence package that maps directly to your audit filing template. Audit prep drops from 3 weeks to 3 days because the evidence is structured, searchable, and already formatted.


Compliance Audit - Weekly Digest (100% coverage)

Call #1042 - Beta feature claim - fail: "...our AI auto-dialer ships next month"
Call #1038 - Disclosure on pricing - pass: Disclosure delivered at 2:14
Call #1035 - Competitor pricing ref - pass: No competitor pricing mentioned

1 prohibited claim flagged - sent to legal review

Structured Call Signal Pipeline to Warehouse

Typed Conversation Data for BI

What Semarize generates

pain_category = "operational_cost"
urgency_level = 0.78
stakeholder_role = "vp_ops"
output_format = "typed_sql_row"

Outreach gives you activity data and Kaia gives you recordings — but neither gives you structured, queryable fields from what was actually said. Run every Kaia transcript through Semarize and land typed rows in your warehouse: pain_category, urgency_level, stakeholder_role, buying_stage as real columns. Your BI team joins conversation signals with Outreach sequence data and CRM pipeline data in dbt. For the first time, you can answer “which pain categories convert fastest from which sequences?” with a SQL query instead of listening to 50 recordings.
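A self-contained illustration of what typed rows make possible, using an in-memory SQLite database in place of a warehouse. The table name, the sequence_name column, and the sample rows are all made up. The query averages urgency per sequence and pain category; a conversion question would need CRM outcome data joined in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE call_signals (
           recording_id TEXT PRIMARY KEY,
           sequence_name TEXT,
           pain_category TEXT,
           urgency_level REAL,
           stakeholder_role TEXT,
           buying_stage TEXT
       )"""
)
rows = [  # made-up sample rows, for illustration only
    ("rec_1", "Cold Outbound", "operational_cost", 0.78, "vp_ops", "evaluation"),
    ("rec_2", "Cold Outbound", "operational_cost", 0.41, "manager", "awareness"),
    ("rec_3", "Event Follow-up", "tool_sprawl", 0.66, "vp_ops", "evaluation"),
]
conn.executemany("INSERT INTO call_signals VALUES (?, ?, ?, ?, ?, ?)", rows)

# Average urgency per sequence and pain category, highest first.
query = """SELECT sequence_name, pain_category,
                  ROUND(AVG(urgency_level), 3), COUNT(*)
           FROM call_signals
           GROUP BY sequence_name, pain_category
           ORDER BY AVG(urgency_level) DESC"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```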


Prospect Signal Aggregation

Cold Outbound: Pain identified
Nurture Track: Budget confirmed
Event Follow-up: Timeline set

Unified Prospect View: Budget confirmed, Timeline: Q3 2026, Decision-maker, 3 touchpoints

Conversation-Powered Sequence Optimization

Data-Driven Sequence Design

Vibe-coded

What Semarize generates

avg_discovery_score = 71
best_step = "step_5"
conversion_correlation = 0.74
engagement_dropoff = "step_3"

A revenue analyst vibe-codes a Streamlit dashboard that correlates Semarize scores from Outreach Kaia transcripts with downstream conversion data from the CRM. The dashboard shows which sequence steps produce the highest-scoring discovery calls, which talk tracks correlate with closed-won deals, and where in the sequence prospects disengage. It's not about open rates or reply rates — it's about conversation quality at each stage. The team restructures their top sequence based on the data: moving the value-prop call from step 3 to step 5 increases pipeline conversion by 18%.


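Sequence Quality vs. Conversion (Vibe-coded)

Step 1 - Intro call: quality score 58, conversion 12%
Step 2 - Pain discovery: quality score 65, conversion 18%
Step 3 - Value prop: quality score 49, conversion 9%
Step 4 - Case study: quality score 72, conversion 24%
Step 5 - Value prop (moved): quality score 81, conversion 31%

Recommendation: Move value-prop call from step 3 to step 5 - projected +18% pipeline conversion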

Watch out for

Common Challenges & Gotchas

These are the issues that come up most often when teams start extracting transcripts from Outreach at scale.

OAuth 2.0 token management

Outreach uses OAuth 2.0 with short-lived access tokens. Your integration must handle token refresh automatically - if the refresh token expires or is revoked, the entire pipeline stops until re-authorised manually.

Kaia transcript availability delay

Kaia processes recordings asynchronously after a meeting ends. Attempting to fetch a transcript too soon returns empty or incomplete data. Build in a delay of at least 30 minutes or implement a retry mechanism with exponential backoff.

API rate limits

The Outreach API enforces rate limits (typically ~10,000 requests/hour). Exceeding limits results in 429 responses. Implement exponential backoff and pace bulk operations carefully, especially during historical backfills.

Pagination across large datasets

Outreach API responses are paginated with cursor-based navigation. Track your cursor position carefully - losing a cursor mid-backfill means re-scanning from the start. The Daily Export feature is often more practical for very large historical extractions.

Recording availability depends on Kaia being enabled

Not all Outreach meetings produce Kaia recordings. Kaia must be enabled for the user, the meeting must be a supported type (e.g., video call), and the bot must successfully join. Check recording availability before attempting transcript extraction.

Mapping recordings to sequences and prospects

Recordings, sequences, and prospects live in separate API resources. Joining them requires multiple API calls and careful ID mapping. Plan your data model to link recording IDs to sequence steps and prospect records.
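Because the responses follow JSON:API, related IDs sit under relationships.<name>.data.id on each resource object. A small helper can flatten them into join keys. The relationship names below are taken from the list response described earlier in this guide and may differ in your account.

```python
def related_ids(resource, names=("prospect", "user", "meeting")):
    """Read to-one relationship IDs from a JSON:API resource object."""
    relationships = resource.get("relationships", {})
    ids = {}
    for name in names:
        data = (relationships.get(name) or {}).get("data")
        ids[f"{name}_id"] = data.get("id") if isinstance(data, dict) else None
    return ids


recording = {  # hypothetical resource object, for illustration only
    "type": "recording",
    "id": "9001",
    "relationships": {
        "prospect": {"data": {"type": "prospect", "id": "77"}},
        "user": {"data": {"type": "user", "id": "5"}},
        "meeting": {"data": None},
    },
}
print({"recording_id": recording["id"], **related_ids(recording)})
```

Storing these IDs alongside each transcript at extraction time avoids a second round of API calls when the data is later joined to sequence and prospect records.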

Duplicate processing protection

Without idempotency checks, re-running an extraction flow can process the same recording twice. Use recording IDs as deduplication keys to ensure each transcript is handled exactly once in your pipeline.
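A minimal deduplication ledger, sketched with SQLite so the check and the write happen in one statement. The table and function names are illustrative; any store with a unique constraint on the recording ID works the same way.

```python
import sqlite3


def open_ledger(path=":memory:"):
    """Create (or open) a small table that records processed recording IDs."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (recording_id TEXT PRIMARY KEY)")
    return conn


def claim(conn, recording_id):
    """Return True the first time an ID is seen, False on every repeat."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed (recording_id) VALUES (?)", (recording_id,)
    )
    conn.commit()
    return cur.rowcount == 1


ledger = open_ledger()
for rec_id in ["rec_1", "rec_2", "rec_1"]:
    print(rec_id, "process" if claim(ledger, rec_id) else "skip")
```

Note that this claims the ID before processing. If a downstream step can fail, either delete the row on failure or record the claim only after the transcript has been handled.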

Structured signals

Example structured signals from Outreach Kaia recordings

Outreach Kaia gives you the recording and transcript. Semarize reads them and extracts the sales-meaningful fields - second stakeholder identified, ROI ask, technical objection - that RevOps wants pushed into the Salesforce opportunity, not buried in a transcript blob. Example: one buyer-committee discovery call.

Raw Outreach Kaia transcript snippet

AE: Can you walk me through who is involved in the buying decision?
Prospect: I own the budget, but the head of RevOps owns the tools stack. He'll need to greenlight any contract.
AE: Got it - happy to bring him in on the next call.

Structured signal output

{
  "source": {
    "tool": "Outreach",
    "ref": "outreach_kaia_call_71be09"
  },
  "signals": [
    {
      "signal_type": "next_step",
      "value": "Loop head of RevOps into next call as tools-stack approver",
      "confidence": 0.96
    },
    {
      "signal_type": "qualification_score",
      "value": 0.78,
      "confidence": 0.83
    },
    {
      "signal_type": "pain_identified",
      "value": "Tools stack decision requires RevOps lead approval",
      "confidence": 0.84
    },
    {
      "signal_type": "risk_flag",
      "value": "Two-stakeholder approval - tools-stack owner not yet engaged",
      "confidence": 0.81
    }
  ]
}
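To show how this output might be consumed, the sketch below maps signals onto CRM field names and drops anything under a confidence threshold. The Salesforce field names and the 0.82 threshold are hypothetical; the payload is the example above, reformatted.

```python
import json

payload = json.loads("""
{
  "source": {"tool": "Outreach", "ref": "outreach_kaia_call_71be09"},
  "signals": [
    {"signal_type": "next_step",
     "value": "Loop head of RevOps into next call as tools-stack approver",
     "confidence": 0.96},
    {"signal_type": "qualification_score", "value": 0.78, "confidence": 0.83},
    {"signal_type": "pain_identified",
     "value": "Tools stack decision requires RevOps lead approval",
     "confidence": 0.84},
    {"signal_type": "risk_flag",
     "value": "Two-stakeholder approval - tools-stack owner not yet engaged",
     "confidence": 0.81}
  ]
}
""")

# Hypothetical Salesforce custom field names, for illustration only.
FIELD_MAP = {
    "next_step": "Next_Step__c",
    "qualification_score": "Qualification_Score__c",
    "pain_identified": "Pain_Identified__c",
    "risk_flag": "Risk_Flag__c",
}


def signals_to_fields(payload, min_confidence=0.0):
    """Map each sufficiently confident signal to its target field name."""
    return {
        FIELD_MAP[s["signal_type"]]: s["value"]
        for s in payload["signals"]
        if s["signal_type"] in FIELD_MAP and s["confidence"] >= min_confidence
    }


fields = signals_to_fields(payload, min_confidence=0.82)
print(sorted(fields))
# ['Next_Step__c', 'Pain_Identified__c', 'Qualification_Score__c']
```

With the threshold at 0.82 the risk flag (confidence 0.81) is held back, which is the kind of gating a RevOps team might apply before writing to an opportunity record.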

Pipeline

Complete the pipeline

Outreach captures the call. Semarize translates the transcript into Salesforce-shaped signals; Clay routes them through your enrichment and outreach automations.

Source - you are here

Outreach

Get the raw conversation data

Destination

Salesforce

Push signals into Salesforce records


Automation

Clay

Trigger workflows from the signals
