Supabase Read-After-Write Consistency
This guide explains the critical read-after-write consistency issue with Supabase read replicas and how CONA solves it for financial operations.๐จ The Problem
What Was Happening
Shopify invoices were created successfully, but 20-30% had no accounting impacts (GL entries). The workflow reported success, but financial data was incomplete.Root Cause Discovery
The issue wasnโt a workflow failure - it was silent data corruption:- โ Document created in PRIMARY database
- โ Line items created in PRIMARY database
- โ Child workflow queries REPLICA (via Supavisor pooler)
- โ Document found in replica
- โ Line items missing (not yet synced from primary)
- โ Financial entry creation โsucceedsโ with empty line items
- โ No GL entries created (or incomplete)
- No retries triggered (workflow thinks it succeeded)
- No error logs (silent corruption)
- Discovered days later by users
Technical Details
documents
table row syncs quickly (50-500ms)line_items
table (foreign key) lags behind- JOIN queries return incomplete data
โ The Solution
1. Direct Primary Connection
We created a separate Prisma client that connects directly to Supabase PRIMARY database, bypassing read replicas entirely. Files Created/Modified:How It Works
Aspect | Pooled (Before) | Direct (After) |
---|---|---|
Port | 6543 (Supavisor) | 5432 (Primary) |
Routing | Load-balanced to replicas | Direct to primary |
Data freshness | Stale (50-500ms lag) | Always current |
Line items | May be missing โ | Always complete โ |
Consistency | Eventually consistent | Strongly consistent |
2. READ_AFTER_WRITE_RETRY Policy
Added specialized retry policy for activities that query recently created data:batchCreateFinancialEntriesActivity
in workflow configuration.
3. Fixed Sleep Placement
Movedworkflow.sleep("500ms")
to BEFORE re-queries in parent workflows:
๐ง Setup Required
Environment Variables
AddDIRECT_URL
to your environment files:
- Go to Project Settings โ Database
- Copy โConnection stringโ (pooled) โ
DATABASE_URL
- Copy โDirect connectionโ โ
DIRECT_URL
Build & Deploy
๐ When to Use Each Connection
Use Case | Connection | Why |
---|---|---|
Financial entry creation | prismaDirect | Must have complete line items |
Reconciliation checks | prismaDirect | Requires consistent view |
Audit trail queries | prismaDirect | Must be accurate |
Document creation | prisma | Writes go to primary anyway |
Bulk queries | prisma | Spread load across replicas |
User dashboard queries | prisma | Fast response from nearby replica |
๐ฏ Expected Results
Before Fix
- โ 20-30% invoices missing accounting impacts
- โ Silent failures (no error logs)
- โ Discovered days later by users
- โ Manual intervention required
After Fix
- โ >99.9% success rate for financial entries
- โ Complete line items always fetched
- โ Zero replica lag issues
- โ Immediate consistency
๐๏ธ Tuning Connection Pools (Advanced)
When to Adjust Connection Limits
Increase direct connection limit if:- โ
You see frequent
P2024
pool timeout errors - โ Financial entry creation is consistently slow (>5s)
- โ Your PRIMARY database can handle more connections
- โ Youโre running on dedicated infrastructure (not serverless)
- โ ๏ธ Running on serverless (multiple instances ร connections = exhaustion)
- โ ๏ธ Sharing database with other applications
- โ ๏ธ Limited database connection capacity
Environment-Based Configuration
For different environments, you can override defaults:Connection Pool Monitoring
Use Prismaโs built-in metrics to track pool health:Watch for Pool Exhaustion
Error CodeP2024
indicates pool timeout:
- Increase
connection_limit
(if database can handle it) - Increase
pool_timeout
(if operations legitimately take longer) - Optimize slow queries
- Add external pooler (PgBouncer) for serverless
Optimal Settings by Deployment
Environment | Direct Connections | Pool Timeout | Rationale |
---|---|---|---|
Serverless (Fly.io) | 3 | 30s | Multiple instances, conserve connections |
Dedicated Workers | 5 | 20s | Single instance, balanced approach |
Local Development | 10 | 15s | No connection constraints |
High-Load Production | 10 + PgBouncer | 30s | External pooler handles scaling |
๐ Monitoring
Connection Pool Health
Find Documents Missing GL Entries
Check Replication Lag
Monitor Connection Types
๐งช Testing
Verify Direct Connection Works
โ ๏ธ Important Notes
Connection Pooling Optimizations
Question: Should we optimize Prismaโs connection pooling?Answer: Yes! While pooling wasnโt the root cause, we can optimize it for better performance.
Current Configuration
According to Prismaโs connection pool documentation, the default pool size is:4 * 2 + 1 = 9 connections
Optimizations Implemented
1. Direct Connection Pool (Conservative)- Bypasses Supavisor pooler
- Connects directly to PRIMARY database
- Too many direct connections can overwhelm primary
- Used only for critical operations (financial entries, reconciliation)
- Routes through Supavisor connection pooler
- Load-balanced across read replicas
- Supavisor handles connection management
- Used for high-volume queries
Connection Pool Strategy
Client | Pool Size | Timeout | Use Case | Rationale |
---|---|---|---|---|
prismaDirect | 5 | 20s | Critical operations | Protect primary, allow complex queries |
prisma | Default | 10s | General queries | Let Supavisor manage scaling |
Performance Impact
Direct connections:- Slightly higher latency (~5-20ms more)
- Limited connections (use sparingly)
- Trade: Acceptable for critical operations
- Lower latency (nearby replica)
- Unlimited scaling
- Trade: May have stale data
When Replica Lag Exceeds 2 Seconds
- Check Supabase Status: https://status.supabase.com
- Contact Supabase Support: โFRA region replica lag > 2sโ
- Request synchronous replication for your organization
๐ Additional Prisma Performance Optimizations
1. Select Field Optimization
Problem: Including all fields wastes bandwidth and memory.2. Pagination Instead of Fetching All
- Offset pagination gets slower with larger offsets
- Cursor pagination maintains constant performance
- Essential for workflows processing thousands of documents
3. Batch Operations with createMany
and updateMany
4. Transaction Batching
- Atomic operations (all or nothing)
- Single round trip for simple transactions
- Consistent data state
5. Avoid N+1 Queries with Proper Includes
6. Use Indexes Effectively
Check if your queries are using indexes:7. Query Logging and Analysis
Enable query logging to identify slow queries:8. Prisma Accelerate (Optional - Paid Service)
Prisma Accelerate provides:- Global connection pooling (solves serverless connection issues)
- Query caching at edge (Redis-backed)
- Automatic query optimization
- Running on serverless (Fly.io) with connection exhaustion
- Global user base (edge caching beneficial)
- Repetitive expensive queries
9. Selective Field Loading for Large JSON/Text
10. Parallel Queries for Independent Operations
๐ฏ Performance Optimization Checklist
Apply these in order of impact:- Connection pooling (โ Already done)
- Use
select
to fetch only needed fields (5-10x faster for large records) - Batch operations with
createMany
/updateMany
(10-100x faster) - Add database indexes for common query patterns
- Use transactions for multi-step operations
- Cursor pagination instead of offset for large datasets
- Avoid N+1 queries with proper
include
- Parallel independent queries with
Promise.all
- Query logging to identify bottlenecks
- Consider Prisma Accelerate for serverless at scale
๐ Key Learnings
-
Replica lag affects JOINs more than simple queries
- Parent table syncs fast
- Child tables (foreign keys) lag behind
- Queries with
include
return incomplete data
-
Silent failures are worse than loud failures
- Workflow โsucceedsโ with incomplete data
- No retries triggered
- Discovered too late
-
Direct connections solve read-after-write
- Bypass pooler โ No replica routing
- 100% data consistency
- Acceptable trade-off for critical operations
-
Connection pooling is not the root cause
- Both Prisma and Supabase pooling work fine
- Issue was replica routing, not pooling
-
Performance is multi-faceted
- Connection management (pooling)
- Query optimization (select, batch, indexes)
- Application architecture (parallel queries, caching)
- Right tool for the job (direct vs pooled connections)
๐ Related Documentation
๐ Deployment Checklist
- Add
DIRECT_URL
to staging environment - Build and deploy
@cona/database
package - Build and deploy
@cona/core
package - Build and deploy
@cona/temporal-workflows
package - Deploy temporal workers
- Monitor for 24-48 hours using SQL queries
- Verify no documents with missing GL entries
- Add
DIRECT_URL
to production environment - Deploy to production
- Monitor production for 48 hours
Status: โ IMPLEMENTED
Impact: Critical - Fixes 100% of missing accounting impacts
Last Updated: 2025-01-02