CRM & Data
Azure Synapse — Conversation Analytics in the Microsoft Stack
Azure Synapse Analytics unifies data warehousing, data lake analytics, and data integration in one service. If your organization runs on Microsoft — Dynamics 365, Power BI, Teams, ADLS — Synapse is where conversation intelligence slots into the existing stack without adding another vendor.
Recommended default pattern
ADLS Gen2 staging + COPY INTO dedicated SQL pool. Land files in Azure Data Lake Storage Gen2, run COPY INTO to load into dedicated SQL pool tables. Use Data Factory pipelines to orchestrate the end-to-end flow: fetch from source, stage in ADLS, load into the pool, notify downstream consumers.
Fit check
When to Choose Synapse
Synapse is the strongest choice when your analytics stack is already rooted in the Microsoft ecosystem. Here are the signals that confirm Synapse is the right fit.
Your organization is a Microsoft shop — Dynamics 365, Power BI, Azure AD, and Teams are already in production
Synapse Link for Dataverse gives you near-real-time CRM data in your warehouse without building separate ETL pipelines
Power BI DirectQuery on dedicated SQL pools means dashboards query live warehouse data — no extracts, no stale data
Serverless SQL pools let you query Parquet/JSON files in ADLS without loading them — pay only for data processed
Dedicated SQL pools give you predictable performance for production dashboards and scheduled reporting
Azure Data Factory (integrated into Synapse Studio) provides visual pipeline building for non-engineering teams
Architecture choices
Three Synapse Decisions That Drive Cost and Performance
These three choices determine how well your Synapse deployment handles conversation data at scale. Get them right early to avoid expensive migrations later.
1. Dedicated vs serverless SQL pool
Dedicated pools give guaranteed compute for production dashboards (DWU-based pricing, always running). Serverless pools charge per TB scanned — ideal for ad-hoc exploration and querying data lake files without loading. Many teams use both: serverless for exploration, dedicated for production reporting.
Start with serverless for development, add a dedicated pool when Power BI dashboards go live.
2. Distribution strategy
HASH distribution on call_id for the signals table co-locates signals for a conversation on the same distribution. ROUND_ROBIN for the conversations table spreads data evenly. REPLICATE for small lookup tables (scoring_versions, teams) avoids data movement during JOINs entirely. Wrong distribution causes expensive data movement — the query plan will show BroadcastMoveOperation or ShuffleMoveOperation when distributions are mismatched.
Always check distribution skew after initial loads.
3. Synapse Link for Dataverse
If you use Dynamics 365, enable Synapse Link to automatically replicate CRM data to ADLS. Query it from Synapse alongside conversation signals without building a separate CRM data pipeline. Join opportunity data with call signals in a single SQL query. The trade-off is a slight replication lag (minutes, not seconds).
Enable for every Dataverse table your BI team queries regularly.
Schema design
Schema Design
A two-table model with Synapse-native types, distribution annotations, and clustered columnstore indexes. Partition by call_timestamp on monthly ranges for efficient time-range queries.
conversations
DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX, PARTITION (call_timestamp monthly)
Columns
conversation_signals
DISTRIBUTION = HASH(call_id), CLUSTERED COLUMNSTORE INDEX
Columns
Data loading
Loading Strategy
ADLS Gen2 staging with COPY INTO is the recommended starting point. It gives you the fastest load path with minimal infrastructure.
ADLS Gen2 + COPY INTO
Recommended starting pattern
Land files in ADLS Gen2
Stage Parquet files (preferred), CSV, or JSON in your Azure Data Lake Storage Gen2 container. Organize by date or source for easy partition management.
COPY INTO for the fastest load path
COPY INTO reads directly from ADLS with columnar-aware optimization. Parquet format loads significantly faster than CSV and preserves types. COPY INTO handles file-format quirks (delimiters, encoding, compression) without external staging objects.
Deduplication pattern
COPY INTO a staging table, then use a DELETE/INSERT pattern within a transaction to merge into the target table. Match on call_id for conversations, signal_id for signals. This prevents duplicate rows from retries or overlapping file drops.
Orchestrating with Data Factory
Data Factory pipelines trigger on schedule or event (new file in ADLS). Copy Activity moves data from external sources to ADLS. Stored Procedure Activity runs COPY INTO and dedup logic. Monitor pipeline runs directly in Synapse Studio — no need to leave the workspace.
Ideal for teams without dedicated data engineering resources.
Serverless SQL pool for ad-hoc querying
Query Parquet files directly in ADLS without loading them. Use OPENROWSET to read files on demand. Create external tables for repeated queries against the same file paths. Good for exploring raw transcripts before deciding what to load into dedicated pools — you only pay for bytes scanned.
Use for exploration and development before committing to dedicated pool schema.
Production operations
Operational Playbook
Synapse-specific practices for keeping your conversation data warehouse performant and cost-efficient in production.
DWU scaling
Dedicated SQL pools scale in DWU increments (DW100c to DW30000c). Start at DW100c for development. Scale up for production loads and reporting hours. Scale down overnight. Pause the pool entirely during off-hours to stop billing — a paused pool costs nothing.
Distribution skew monitoring
Check DBCC PDW_SHOWSPACEUSED to identify distribution skew. If one distribution has significantly more data than others, your HASH key may be poorly chosen. High-cardinality, evenly-distributed columns make the best HASH keys. Re-evaluate distribution strategy and use CTAS to rebuild if needed.
Statistics maintenance
Synapse doesn’t auto-create statistics. CREATE STATISTICS on columns used in WHERE, JOIN, GROUP BY, and ORDER BY clauses. Update statistics after significant data loads (more than 10% change). Stale statistics cause the query optimizer to generate poor execution plans.
Result set caching
Enable result set caching for frequently-repeated dashboard queries. Cached results return instantly without consuming DWUs. Caches are invalidated automatically when underlying data changes. Particularly effective for Power BI dashboards with predictable query patterns.
CTAS for table rebuilds
Use CREATE TABLE AS SELECT (CTAS) to rebuild tables with better distribution, partitioning, or index strategies. CTAS is faster than ALTER TABLE for structural changes in Synapse. Drop the old table and rename the new one to swap.
Monitoring and cost management
Use Synapse Studio to monitor query performance, DWU utilization, and data movement operations. Set Azure cost alerts at the resource group level. Review the DMV sys.dm_pdw_exec_requests for slow queries and sys.dm_pdw_nodes_db_column_store_row_group_physical_stats for columnstore health.
Microsoft BI ecosystem
Analysis Examples
Three patterns that leverage the Microsoft integration stack to turn conversation data into actionable intelligence.
1. Signal to deal health dashboard (Power BI DirectQuery)
Power BI connects directly to the dedicated SQL pool via DirectQuery. Build a report with a deal-level view showing latest signal scores, risk flags, and trend sparklines. DirectQuery means the report always shows current data — no scheduled refreshes, no stale extracts. Publish to the Power BI service for organization-wide access with row-level security mapped to Azure AD groups.
2. CRM + conversation signals in one view (Synapse Link)
Enable Synapse Link for Dynamics 365 Dataverse. Dynamics opportunity tables replicate automatically to ADLS. Join opportunity data with conversation signals in a single SQL query inside Synapse — no ETL pipeline needed. Build a unified view of pipeline health powered by both CRM metadata and conversation intelligence. Filter by sales stage, territory, or rep.
3. Pipeline monitoring and alerting (Data Factory + Logic Apps)
Data Factory pipelines log success/failure metrics into Azure Monitor. Connect Logic Apps to send Teams notifications when a load fails or when signal counts drop below expected thresholds. Build operational visibility into your conversation data pipeline without custom monitoring infrastructure.
Structured signals
Distribution, Synapse Link, and the Microsoft BI Stack
Synapse sits at the centre of the Microsoft data stack — its value for conversation intelligence comes from native integration with Power BI, Dynamics 365, and Data Factory rather than raw compute speed.
HASH distribution on call_id for conversation_signals ensures all signals for a conversation land on the same distribution. When Power BI issues a DirectQuery that joins conversations and signals on call_id, the join happens locally on each distribution with zero data movement. Monitor distribution skew with DBCC PDW_SHOWSPACEUSED — if one distribution is 3x larger than the mean, your key has hot values.
Synapse Link for Dataverse mirrors Dynamics 365 CRM records (accounts, contacts, opportunities) into a serverless SQL pool in near-real-time. You can JOIN conversation signals in your dedicated pool with live CRM data in the serverless pool using cross-database queries — no ETL pipeline needed for the CRM side. The replication lag is typically under 10 minutes.
NVARCHAR columns in Synapse don’t have a JSONB equivalent with native operators. Use OPENJSON() and JSON_VALUE() to extract fields from stored JSON payloads, but extract frequently-queried signal fields into typed FLOAT and NVARCHAR columns during the COPY INTO stage. Columnar compression on typed columns is 5-8x more efficient than re-parsing JSON on every query.
Result set caching (enabled per session) stores the first execution of a query and returns cached results for subsequent identical queries. For BI dashboards where multiple users view the same signal summary, this avoids re-scanning the table entirely. DWU charges only apply to the first execution.
FAQ
Frequently Asked Questions
Explore