
Google Cloud SQL — Managed PostgreSQL for Conversation Data on GCP

Cloud SQL gives you a fully managed PostgreSQL instance with automated backups, patching, replication, and failover — without the operational overhead of running Postgres yourself. If you're on GCP and need an operational database for conversation data (not a warehouse for analytics), Cloud SQL is the simplest path.

Recommended default ingestion pattern

Cloud SQL Auth Proxy + client library upserts

Use Cloud Functions (HTTP trigger or Pub/Sub trigger) to receive conversation data, normalize it, and write to Cloud SQL via the Auth Proxy. The Auth Proxy handles SSL, IAM authentication, and connection pooling automatically. Schedule with Cloud Scheduler for batch processing.
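As a sketch of the normalization step in this pattern: the function below maps an incoming webhook payload onto the conversations table columns. The incoming field names (id, started_at, duration, transcript) are hypothetical — map them from whatever your source platform actually sends.

```python
import json

def normalize_call(payload: dict) -> dict:
    """Map a hypothetical source-platform webhook payload onto the
    conversations table columns. Incoming field names are illustrative."""
    return {
        "call_id": payload["id"],
        "source_platform": payload.get("platform", "unknown"),
        "call_timestamp": payload["started_at"],  # ISO 8601 string
        "duration_seconds": int(payload.get("duration", 0)),
        "participants": json.dumps(payload.get("participants", [])),
        "deal_id": payload.get("deal_id"),
        "account_id": payload.get("account_id"),
        "transcript_text": payload.get("transcript", ""),
        "metadata": json.dumps({"raw_source": payload.get("source_url")}),
    }
```

The function returns a plain dict so the write step can bind it to a parameterized upsert statement.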

When it fits

When to Choose Cloud SQL

Cloud SQL is not a data warehouse. It's a managed operational database. It shines when you need PostgreSQL on GCP without the work of running it yourself.

You’re on GCP and want a managed PostgreSQL instance without running your own servers or managing patches, backups, and failover. Cloud SQL handles all of that automatically.

Cloud SQL Auth Proxy provides secure, IAM-based connections from Cloud Functions, Cloud Run, GKE, and external services — no VPN setup or IP whitelisting needed. SSL certificates are managed automatically.

Automated backups, point-in-time recovery, and high availability with regional failover are built in. You configure them once and they run without intervention.

You need an operational database for application-layer reads and writes — dashboards, API responses, real-time features. Not columnar analytics across millions of rows (that’s BigQuery).

Cloud SQL integrates with Looker and Looker Studio (formerly Data Studio) for direct BI connections. Point your dashboard tool at the database and start building.

You want to start simple and graduate to BigQuery for analytics later. Both are GCP-native and easy to bridge — use BigQuery Data Transfer Service or Dataflow to replicate data when the time comes.

Architecture choices

Three Decisions to Get Right on Cloud SQL

These three decisions will shape how your conversation data pipeline behaves in production. Get them right early.

1. PostgreSQL vs MySQL engine

Choose PostgreSQL. JSONB support, GIN indexes, INSERT ... ON CONFLICT (upsert), and full-text search make it significantly better for semi-structured conversation data. MySQL works, but its equivalents are weaker: JSON columns without GIN indexes, and ON DUPLICATE KEY UPDATE instead of ON CONFLICT. All guidance in this guide assumes the PostgreSQL engine.
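To make the feature list concrete, here are illustrative PostgreSQL statements for each capability. Table and column names anticipate the schema defined later in this guide; the %s placeholders follow the psycopg2/pg8000 driver style.

```python
# JSONB containment check, served by a GIN index on participants.
JSONB_CONTAINMENT = """
SELECT call_id FROM conversations
WHERE participants @> '[{"role": "decision_maker"}]'
"""

# Idempotent upsert keyed on call_id.
UPSERT = """
INSERT INTO conversations (call_id, transcript_text)
VALUES (%s, %s)
ON CONFLICT (call_id) DO UPDATE
SET transcript_text = EXCLUDED.transcript_text
"""

# Full-text search over the transcript.
FULL_TEXT = """
SELECT call_id FROM conversations
WHERE to_tsvector('english', transcript_text)
      @@ plainto_tsquery('english', 'pricing objection')
"""
```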

2. When to graduate to BigQuery

Cloud SQL excels for operational workloads: low-latency reads, transactional writes, application-layer queries. When you need columnar analytics across millions of rows, cross-source joins, or BI at scale, add BigQuery alongside Cloud SQL. Use Dataflow or BigQuery Data Transfer Service to replicate. Don’t replace Cloud SQL — use both.

3. Instance sizing and connection management

Start with db-custom-2-7680 (2 vCPU, 7.5GB RAM). Scale up CPU and RAM as needed. Enable storage auto-resize so you don’t run out of disk. Use Cloud SQL Auth Proxy for all connections — it handles SSL, IAM authentication, and connection pooling. Direct IP connections are a security risk you don’t need to take.

Schema design

Schema Design

A two-table model with PostgreSQL-native types. Keep raw conversation data separate from extracted signals so you can re-evaluate conversations with updated scoring logic without modifying the source data. All types below assume the PostgreSQL engine.

conversations

One row per call

Columns

call_id (TEXT, PK): Primary key. Unique identifier from the source platform.
source_platform (TEXT): Where the call originated — gong, zoom, teams, etc.
call_timestamp (TIMESTAMPTZ): When the call started. NOT NULL. Timezone-aware for correct ordering across regions.
duration_seconds (INTEGER): Call duration in seconds.
participants (JSONB): Array of participant names, roles, and email addresses. Indexable with GIN.
deal_id (TEXT): Foreign key to CRM deal/opportunity.
account_id (TEXT): Foreign key to CRM account/company.
transcript_text (TEXT): Full transcript text. Cloud SQL handles large text well.
metadata (JSONB): Flexible JSON for source-specific metadata. Add a GIN index for efficient querying.
ingested_at (TIMESTAMPTZ): DEFAULT now(). When this row was inserted into the database.

conversation_signals

One row per extracted signal

Columns

signal_id (TEXT, PK): Primary key. Unique identifier for this signal row.
call_id (TEXT, FK): Foreign key to conversations(call_id).
signal_name (TEXT): Name of the extracted signal — overall_score, risk_flag, pain_point. NOT NULL.
signal_value (TEXT): The extracted value. Cast to appropriate type in queries.
signal_type (TEXT): Type hint — score, boolean, category, extraction, count.
confidence (REAL): Confidence score from the evaluation (0.0 to 1.0).
scoring_version (TEXT): Version of the evaluation framework that produced this signal.
evaluated_at (TIMESTAMPTZ): When this signal was generated.

Indexes & Cloud SQL Configuration

Performance and monitoring setup

B-tree index on call_timestamp for time-range queries. This is the most common filter in dashboards and reports.

Composite index on (call_id, signal_name) for fast signal lookups by conversation and signal type.

GIN indexes on JSONB columns (participants, metadata) for efficient JSON path queries and containment checks.

Enable Query Insights for slow query monitoring. Use Cloud SQL for PostgreSQL flags to tune work_mem, shared_buffers, and other performance-critical settings.
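The three index recommendations above, written out as PostgreSQL DDL. Index names are illustrative.

```python
# Index DDL matching the recommendations; names are placeholders.
INDEX_DDL = """
CREATE INDEX idx_conversations_call_timestamp
    ON conversations (call_timestamp);

CREATE INDEX idx_signals_call_id_signal_name
    ON conversation_signals (call_id, signal_name);

CREATE INDEX idx_conversations_participants
    ON conversations USING GIN (participants);

CREATE INDEX idx_conversations_metadata
    ON conversations USING GIN (metadata);
"""
```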

For Semarize-specific modeling, versioning, and grounding lineage best practices, see the canonical modeling guide.

Data loading

Loading Strategy

Start with a Cloud Function and Auth Proxy for real-time or scheduled writes. Add bulk loading for historical backfills and Cloud Run for longer-running imports.

Cloud Function + Auth Proxy

Recommended starting pattern

Cloud Function receives webhook or runs on Cloud Scheduler trigger

Connects to Cloud SQL via Auth Proxy — automatic SSL, IAM auth, no database passwords to manage

Uses INSERT ... ON CONFLICT (call_id) DO UPDATE for idempotent writes — re-running the same batch produces the same result

Auth Proxy handles connection pooling and IAM authentication so your function code stays simple
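The idempotent write in the steps above can be sketched as a small function. This demo runs against in-memory SQLite, which accepts the same ON CONFLICT clause; against Cloud SQL you would obtain the connection through the Auth Proxy or the Cloud SQL Python Connector, and the placeholder style becomes %s with psycopg2/pg8000.

```python
import sqlite3

UPSERT_SQL = """
INSERT INTO conversations (call_id, transcript_text)
VALUES (?, ?)
ON CONFLICT (call_id) DO UPDATE
SET transcript_text = excluded.transcript_text
"""

def upsert_conversation(conn, call_id: str, transcript: str) -> None:
    # Re-running the same batch rewrites the same rows instead of
    # duplicating them, which is what makes retries safe.
    conn.execute(UPSERT_SQL, (call_id, transcript))
    conn.commit()

# Demo: a re-delivered webhook overwrites rather than duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversations (call_id TEXT PRIMARY KEY, transcript_text TEXT)")
upsert_conversation(conn, "call-1", "first version")
upsert_conversation(conn, "call-1", "re-delivered webhook")
```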

Bulk historical loads

Use Cloud SQL import from Cloud Storage (CSV or SQL dump) for the fastest bulk load path

Or connect via psql through Auth Proxy and use COPY FROM for maximum throughput

For large imports (>1GB), consider increasing instance tier temporarily — scale up, import, scale back down
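A minimal sketch of preparing a CSV for the Cloud Storage import path. Cloud SQL's CSV import does not skip a header row, so emit data rows only, and keep the column order aligned with the column list you pass to the import command. The column subset here is illustrative.

```python
import csv
import io

# Illustrative subset of conversations columns, in target-table order.
COLUMNS = ["call_id", "source_platform", "call_timestamp",
           "duration_seconds", "transcript_text"]

def to_import_csv(rows: list[dict]) -> str:
    """Serialize normalized conversation rows as header-less CSV,
    ready to upload to Cloud Storage for a Cloud SQL import."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow([row.get(c, "") for c in COLUMNS])
    return buf.getvalue()
```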

Continuous ingestion

Pub/Sub → Cloud Function → Cloud SQL for event-driven writes. A new conversation fires a webhook, the webhook handler publishes to Pub/Sub, and the Cloud Function processes the message and writes.

Cloud Scheduler → Cloud Function → Cloud SQL for scheduled batch processing. Run every 15 minutes, hourly, or daily depending on latency requirements.

Cloud Run for long-running import processes that exceed the Cloud Function timeout (9 minutes). Cloud Run supports up to 60 minutes per request.
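A sketch of the Pub/Sub-triggered step: Pub/Sub delivers the published message base64-encoded under the event's data key, so the handler decodes it before writing to Cloud SQL. The event shape shown is the background-function format and is illustrative.

```python
import base64
import json

def parse_pubsub_event(event: dict) -> dict:
    """Decode a Pub/Sub-triggered Cloud Function event. Pub/Sub wraps
    the published message in base64 under event['data']."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(payload)

# A publisher would have sent: json.dumps({"call_id": "c-42"}).encode()
event = {"data": base64.b64encode(b'{"call_id": "c-42"}').decode()}
```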

Production concerns

Operational Playbook

Cloud SQL-specific operational considerations for running a conversation data pipeline in production. Standard PostgreSQL best practices (autovacuum, EXPLAIN ANALYZE, index maintenance) still apply — these are the managed service extras.

Cloud SQL Auth Proxy

Always use Auth Proxy for connections. It handles SSL certificates automatically, supports IAM-based auth (no database passwords), and provides connection pooling. Deploy as a sidecar in GKE or as a standalone binary on Compute Engine. For Cloud Functions and Cloud Run, use the built-in Cloud SQL connector instead.

Automated backups and PITR

Cloud SQL takes daily backups automatically. Enable point-in-time recovery for the ability to restore to any second within the retention window (up to 7 days). Backups are stored in Cloud Storage — no storage management needed. Test a restore at least once before you need it in production.

Maintenance windows

Set a maintenance window for automatic PostgreSQL minor version updates and patches. Choose a low-traffic period. Cloud SQL handles patching without manual intervention — your instance may restart briefly during the window.

Query Insights

Enable Cloud SQL Query Insights to identify slow queries, see query execution plans, and track performance over time. Available in the Cloud Console. Equivalent to pg_stat_statements but with a built-in UI and no extension management.

Read replicas

Add cross-region read replicas for low-latency reads from different regions, or same-region replicas to offload analytical queries from the primary. Replicas use asynchronous replication — expect a few seconds of lag under normal conditions.

Storage auto-resize

Enable storage auto-resize to prevent disk full errors. Cloud SQL increases storage automatically when usage exceeds the threshold. Important: storage can only grow, never shrink — choose your initial size carefully and monitor growth rates.

What you can build

Analysis Examples

Three patterns that use GCP-native tools to turn conversation signals into actionable output — no third-party infrastructure required.

1. Signal dashboard (Looker Studio)

Looker Studio (formerly Data Studio) connects directly to Cloud SQL. Build a dashboard that shows signal trends by rep and team. Filter by date range and scoring version. Share with stakeholders via Google Workspace. Free to use — no BI license fees.

2. BigQuery federation for deep analytics

Use BigQuery external connections to query Cloud SQL tables from BigQuery SQL. Join conversation signals with BigQuery datasets (marketing attribution, product analytics) without replicating data. Good for ad-hoc cross-source analysis when you need warehouse-scale queries but want to keep operational data in Cloud SQL.

3. Automated alerting (Cloud Functions + Pub/Sub)

Create a Cloud Function that runs on schedule, queries for high-risk signals inserted in the last hour, and publishes alerts to a Pub/Sub topic. Downstream subscribers (Slack, email, PagerDuty) receive notifications. No third-party monitoring tool needed — the entire pipeline runs on GCP.
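A sketch of the two pieces of that alerting function: the hourly risk query and the Pub/Sub message body. The signal name and the JSON message schema are assumptions; in the real function you would run the query over your Cloud SQL connection and publish the bytes with the google-cloud-pubsub client.

```python
import json
from datetime import datetime, timezone

# Fresh high-risk signals from the last hour; interval syntax is PostgreSQL.
RISK_QUERY = """
SELECT call_id, signal_value, confidence
FROM conversation_signals
WHERE signal_name = 'risk_flag'
  AND evaluated_at >= now() - interval '1 hour'
"""

def build_alert(call_id: str, signal_value: str, confidence: float) -> bytes:
    """Serialize one high-risk signal as a Pub/Sub message body.
    Downstream subscribers (Slack, email, PagerDuty) parse this JSON."""
    return json.dumps({
        "call_id": call_id,
        "risk": signal_value,
        "confidence": confidence,
        "alerted_at": datetime.now(timezone.utc).isoformat(),
    }).encode("utf-8")
```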

Structured signals

Auth Proxy, BigQuery Federation, and When to Graduate

Cloud SQL is your operational database — it handles transactional signal writes and application reads. Here is how to get the most out of it and when to pair it with BigQuery for analytics.

Cloud SQL Auth Proxy eliminates the need for IP whitelisting or SSL certificate management. Your Cloud Function connects through the proxy using IAM authentication, and the connection is encrypted end-to-end. This is not optional security hygiene — it is the only recommended connection method for production workloads on GCP.

BigQuery federated queries (EXTERNAL_QUERY) let you join Cloud SQL signal tables with BigQuery datasets in a single SQL statement. Use this for cross-source analytics — e.g., joining conversation signals in Cloud SQL with marketing attribution data in BigQuery — without building an ETL pipeline. Federated queries run on BigQuery compute, so your Cloud SQL instance is not loaded by analytical workloads.
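A federated query might look like the following. The connection ID and the marketing attribution table are placeholders; the inner string is plain PostgreSQL executed on the Cloud SQL side, while the outer query runs on BigQuery compute.

```python
# Illustrative EXTERNAL_QUERY join; connection ID and marketing table
# are placeholders for resources you would create in your own project.
FEDERATED_SQL = """
SELECT m.campaign, s.call_id, s.signal_value
FROM `my-project.marketing.attribution` AS m
JOIN EXTERNAL_QUERY(
  'my-project.us.cloudsql-signals',
  "SELECT call_id, signal_value FROM conversation_signals WHERE signal_name = 'overall_score'"
) AS s
ON m.call_id = s.call_id
"""
```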

Query Insights (the built-in monitoring tool) surfaces the exact queries causing high CPU or lock contention on your Cloud SQL instance. Enable it from the console and watch for signal ingestion queries that are missing indexes or holding locks too long. The normalized query view groups similar queries so you can spot patterns instead of chasing individual slow queries.

When your signal table exceeds 50M rows and your BI dashboards start slowing down, that is the signal to add BigQuery alongside Cloud SQL rather than scaling up your instance. Export signals to BigQuery on a schedule (or use federated queries) and point your dashboards at BigQuery. Keep Cloud SQL for application reads and webhook ingestion.

Full modeling guide: Modeling Semarize Outputs covers versioning, lineage columns, and schema patterns for any database.
CRM sync: For pushing signals back to your CRM, see the HubSpot guide or Salesforce guide.
