Overview
A B2B analytics platform had built a sleek, feature-rich dashboard. In demos with sample data, it was snappy and impressive. But in production, with real customer data, it was unusable. Initial page load took 12+ seconds. Interactions triggered full-page freezes. Customers were churning specifically because of dashboard performance.
The engineering team had tried various fixes—caching layers, query optimization, CDN setup—but nothing moved the needle significantly. They needed a systematic approach.
The Problem
Surface-level optimizations had missed the real issues:
- N+1 Query Explosions: GraphQL resolvers triggering hundreds of database queries per request
- Render Blocking: Entire dashboard waiting for slowest widget, not streaming
- Memory Leaks: 30-minute sessions consuming 2GB+ browser memory
- Unoptimized Aggregations: Recomputing rarely-changing metrics in real time on every request
- Bundle Bloat: 4.2MB JavaScript bundle blocking first paint
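The N+1 pattern above is easy to create in GraphQL because each child resolver fetches its data independently. A minimal sketch (using a mock in-memory `db`; all names hypothetical) of how a 100-widget dashboard turns into 101 database round trips:

```typescript
// Hypothetical sketch of the N+1 pattern: one query for the widgets,
// then one query per widget for its metrics.
type Widget = { id: number };
type Metric = { widgetId: number; value: number };

let queryCount = 0; // instrument the mock "database" to count round trips

const db = {
  widgets: [...Array(100)].map((_, i) => ({ id: i })) as Widget[],
  metrics: [] as Metric[],
  async getWidgets(): Promise<Widget[]> {
    queryCount++;
    return this.widgets;
  },
  async getMetricsFor(widgetId: number): Promise<Metric[]> {
    queryCount++; // one round trip per widget: the "+N" part
    return this.metrics.filter((m) => m.widgetId === widgetId);
  },
};

// Resolving a dashboard of 100 widgets issues 1 + 100 = 101 queries.
async function resolveDashboard(): Promise<Metric[][]> {
  const widgets = await db.getWidgets();           // 1 query
  return Promise.all(
    widgets.map((w) => db.getMetricsFor(w.id))     // N queries
  );
}
```

Each widget's metrics resolver runs its own query, which is exactly what the batching work in Phase 2 eliminates.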
System Architecture
The performance problems spanned the entire stack, from the PostgreSQL query layer through the GraphQL API to the React client.
Redesigned Architecture
The solution required changes at every layer: batched data access and caching on the backend, streaming rendering on the client, and edge caching at the CDN.
Performance Optimization Flow
The new request flow prioritizes perceived performance: the dashboard shell and fastest widgets render first, while slower widgets stream in as their data resolves.
Widget Optimization Strategy
Different widgets required different optimization approaches: lighter widgets render via streaming SSR, while data-heavy visualizations rely on virtualization and Canvas rendering.
The Solution
Phase 1: Profiling & Diagnosis (Week 1)
Systematic performance profiling revealed the real bottlenecks:
| Layer | Issue | Impact |
|---|---|---|
| Database | N+1 queries, missing indexes | 60% of latency |
| API | No batching, no caching | 20% of latency |
| Client | Render blocking, memory leaks | 15% of latency |
| Network | Bundle size, no compression | 5% of latency |
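Per-layer latency shares like those above typically come from tracing. A hypothetical sketch (names and structure are illustrative, not the team's actual tooling) of aggregating timed spans into percentages:

```typescript
// Collect timed spans per layer, then compute each layer's share of latency.
const spans: { layer: string; ms: number }[] = [];

// Wrap any layer's async work in a timer that records a span.
async function timed<T>(layer: string, fn: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    spans.push({ layer, ms: Date.now() - start });
  }
}

// Aggregate spans into per-layer percentages of total latency.
function latencyShare(): Record<string, number> {
  const total = spans.reduce((sum, s) => sum + s.ms, 0) || 1;
  const byLayer: Record<string, number> = {};
  for (const s of spans) byLayer[s.layer] = (byLayer[s.layer] ?? 0) + s.ms;
  for (const k in byLayer) byLayer[k] = Math.round((100 * byLayer[k]) / total);
  return byLayer;
}
```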
Phase 2: Backend Optimization (Weeks 2-3)
- Implemented DataLoader for automatic query batching
- Added materialized views for common aggregations
- Set up Redis caching with smart invalidation
- Migrated heavy analytics to ClickHouse
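DataLoader's core trick can be sketched in a few lines. This is a simplified stand-in for the real `dataloader` package (class and names hypothetical): loads requested in the same tick are coalesced into one batch query.

```typescript
// Minimal sketch of the DataLoader batching idea: queue keys requested
// in the current tick, then resolve them all with a single batch call.
class BatchLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];
  constructor(private batchFn: (keys: K[]) => Promise<V[]>) {}

  load(key: K): Promise<V> {
    return new Promise((resolve) => {
      if (this.queue.length === 0) {
        // flush once every resolver in this tick has enqueued its key
        queueMicrotask(() => this.flush());
      }
      this.queue.push({ key, resolve });
    });
  }

  private async flush() {
    const batch = this.queue.splice(0);
    const values = await this.batchFn(batch.map((e) => e.key));
    batch.forEach((e, i) => e.resolve(values[i]));
  }
}

// One batched query instead of N single-row queries:
let batchCalls = 0;
const metricsLoader = new BatchLoader<number, number>(async (ids) => {
  batchCalls++; // e.g. SELECT ... WHERE widget_id = ANY($1)
  return ids.map((id) => id * 10);
});
```

With a loader like this wired into each resolver, the 101-query dashboard request collapses to a handful of batched queries.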
Phase 3: Frontend Optimization (Weeks 3-4)
- Code-split by route and widget
- Implemented streaming SSR with React 18
- Added virtualization for large data sets
- Built memory management with cleanup hooks
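The windowing math behind virtualization is small. A sketch assuming fixed-height rows (function name hypothetical): only rows intersecting the viewport, plus a small overscan buffer, get rendered.

```typescript
// Compute which row indices should be mounted for a virtualized list,
// given the scroll position and a fixed row height.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 3
): { start: number; end: number } {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount - 1, last + overscan),
  };
}
```

For 50,000 rows at 32px in an 800px viewport, only about 30 rows (visible rows plus overscan) are mounted at a time instead of 50,000 DOM nodes.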
Phase 4: Infrastructure & Monitoring (Week 5)
- Configured CDN with edge caching
- Set up real-user monitoring (RUM)
- Created performance budgets with CI checks
- Built anomaly alerting for regressions
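A performance-budget gate can be a small script run in CI. The budgets below are illustrative, not the team's actual thresholds: the build fails if any measured metric exceeds its budget.

```typescript
// Sketch of a CI performance-budget check: compare measured metrics
// against declared budgets and report any violations.
type Budget = { metric: string; limit: number; unit: string };

const budgets: Budget[] = [
  { metric: "bundleSizeKB", limit: 450, unit: "KB" },
  { metric: "lcpMs", limit: 1000, unit: "ms" },
  { metric: "ttiMs", limit: 1500, unit: "ms" },
];

// Returns a list of violations; an empty list means the gate passes.
function checkBudgets(measured: Record<string, number>): string[] {
  return budgets
    .filter((b) => (measured[b.metric] ?? 0) > b.limit)
    .map(
      (b) =>
        `${b.metric}: ${measured[b.metric]}${b.unit} exceeds budget ${b.limit}${b.unit}`
    );
}
```

In CI, a non-empty result would exit non-zero and block the merge, which is what keeps regressions from reaching production unnoticed.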
Results
The dashboard went from a liability to a selling point:
| Metric | Before | After | Change |
|---|---|---|---|
| Initial Load | 12.4s | 0.9s | -93% |
| Time to Interactive | 15.2s | 1.2s | -92% |
| Largest Contentful Paint | 11.8s | 0.8s | -93% |
| Memory (30-min session) | 2.1GB | 180MB | -91% |
| Bundle Size | 4.2MB | 420KB | -90% |
| Database Queries/Request | 347 | 12 | -97% |
The share of churning customers who cited "performance" dropped from 23% to under 2%.
Technical Stack
| Component | Technology |
|---|---|
| Frontend | React 18, Next.js |
| State Management | TanStack Query (aggressive caching) |
| Visualization | D3.js, Canvas (not SVG) |
| API | GraphQL, Apollo Server |
| Batching | DataLoader |
| Primary DB | PostgreSQL |
| Analytics DB | ClickHouse |
| Cache | Redis, CDN (Cloudflare) |
| Monitoring | Datadog RUM, Lighthouse CI |
| Build | Webpack, Bundle Analyzer |
Key Learnings
- Profile before optimizing: Intuition about bottlenecks is often wrong—measure first
- N+1 is the silent killer: GraphQL makes N+1 easy to create and hard to spot
- Perceived performance matters: Users care about time-to-interactive, not total load time
- Memory leaks accumulate: Performance testing needs to include long-running sessions
- Performance is a feature: After optimization, "speed" became a key differentiator in sales calls