Horizontal vs. Vertical Scaling
Overview
Scaling is the process of increasing a system's capacity to handle more load. There are two fundamental approaches: vertical scaling (scaling up) and horizontal scaling (scaling out).
Quick Reference
Vertical Scaling (Scale Up)
- Add more power (CPU, RAM, disk) to an existing machine
- Simple to implement, no code changes needed
- Limitations:
- Hard ceiling on hardware capacity
- Single point of failure
- Expensive at high specs (cost grows faster than linearly)
- Downtime during upgrades
Horizontal Scaling (Scale Out)
- Add more servers to distribute load
- No theoretical ceiling, better fault tolerance
- Challenges:
- Requires stateless application design
- Data consistency across nodes
- More operational complexity
Database Scaling
- Vertical first: Upgrade primary instance (e.g., AWS RDS supports up to 24TB RAM)
- Horizontal (sharding): Split data across multiple servers by shard key
- Sharding challenges:
- Resharding when data grows unevenly (use consistent hashing)
- Hotspot/celebrity problem (popular data on same shard)
- Cross-shard joins become difficult (de-normalize instead)
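The resharding challenge above is why consistent hashing is the usual fix: each server owns several points ("virtual nodes") on a hash ring, so adding a server only remaps the keys in its arcs. A minimal sketch — node names like `shard-0`, the MD5 hash, and 100 virtual nodes per server are illustrative choices, not a specific library's API:

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._keys = []   # sorted virtual-node hashes
        self._map = {}    # virtual-node hash -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        # Place several virtual nodes to spread the node around the ring.
        for i in range(self.vnodes):
            h = self._hash(f"{node}#{i}")
            bisect.insort(self._keys, h)
            self._map[h] = node

    def remove_node(self, node: str) -> None:
        self._keys = [h for h in self._keys if self._map[h] != node]
        self._map = {h: n for h, n in self._map.items() if n != node}

    def get_node(self, key: str) -> str:
        # A key belongs to the first virtual node clockwise from its hash.
        if not self._keys:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._keys, self._hash(key)) % len(self._keys)
        return self._map[self._keys[idx]]
```

With N shards, adding one more remaps roughly 1/(N+1) of the keys — and only to the new shard — instead of reshuffling almost everything as naive `hash(key) % N` would.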
Key Principles
- Keep web tier stateless for easy horizontal scaling
- Build redundancy at every tier
- Cache aggressively to reduce database load
- Use CDN for static assets
- Scale data tier by sharding when vertical limits are reached
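"Cache aggressively" usually means the cache-aside pattern: check the cache first and only hit the database on a miss. A minimal sketch, with a plain dict standing in for Redis or Memcached (function and variable names are illustrative):

```python
_cache = {}  # stands in for Redis/Memcached


def get_user(user_id, db_lookup):
    """Cache-aside read: serve from cache, fall back to the database on a miss."""
    if user_id in _cache:
        return _cache[user_id]        # cache hit: no database load
    value = db_lookup(user_id)        # cache miss: one database read
    _cache[user_id] = value           # populate for subsequent readers
    return value
```

A production version would also set a TTL and invalidate entries on writes, which this sketch omits.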
Questions
Q1: What is horizontal vs vertical scaling?
Detailed Explanation
Vertical Scaling:
- Upgrade the hardware of a single server
- Increase CPU cores, RAM, or storage capacity
- No changes to application architecture required
- Has physical limits (you can only make a server so big)
Horizontal Scaling:
- Add more servers to a pool
- Distribute workload across multiple machines
- Requires application design considerations (statelessness, data consistency)
- Theoretically unlimited scaling potential
Example
Consider a web application experiencing slow response times:
- Vertical approach: Upgrade from a 4-core server with 16GB RAM to a 16-core server with 64GB RAM
- Horizontal approach: Add 3 more identical servers behind a load balancer
Q2: When would you choose one over the other?
Detailed Explanation
Choose Vertical Scaling when:
- Your application is stateful and difficult to distribute
- You need a quick fix without architectural changes
- Your workload is I/O-bound or single-threaded and benefits directly from faster hardware
- Cost of re-architecting exceeds cost of bigger hardware
- You're dealing with legacy systems
Choose Horizontal Scaling when:
- You need high availability and fault tolerance
- Your traffic is unpredictable and requires elasticity
- You've hit the limits of vertical scaling
- Your application is stateless or can be made stateless
- You need geographic distribution
Example
- Database primary: Often scaled vertically first (bigger instance) because writes typically go to a single node
- Web/API servers: Usually scaled horizontally because they're stateless and can easily run in parallel
- Cache layer: Can go either way—Redis can scale vertically to a point, then requires clustering (horizontal)
Q3: What are the pros and cons of each?
Detailed Explanation
Vertical Scaling:
| Pros | Cons |
|---|---|
| Simple to implement | Hardware limits (ceiling) |
| No code changes needed | Single point of failure |
| Lower operational complexity | Downtime during upgrades |
| Better for ACID transactions | Cost grows faster than linearly |
| Simpler data consistency | Vendor lock-in risk |
Horizontal Scaling:
| Pros | Cons |
|---|---|
| No theoretical ceiling | Complex architecture |
| High availability/fault tolerance | Data consistency challenges |
| Cost-effective at scale | Network latency between nodes |
| Geographic distribution | Requires stateless design |
| Pay for what you use | Operational overhead |
Example
Cost comparison at scale:
- Vertical: A server with 2x the CPU often costs more than 2x as much
- Horizontal: Two standard servers typically cost roughly 2x, sometimes less with reserved or spot pricing
Q4: Give real-world examples of each
Detailed Explanation
Vertical Scaling Examples:
- Instagram (early days): Ran on a single PostgreSQL server that was continuously upgraded before eventually sharding
- Stack Overflow: Famous for scaling vertically—runs on a surprisingly small number of powerful servers
- Most startup MVPs: Begin with a single beefy database server
Horizontal Scaling Examples:
- Netflix: Thousands of microservices distributed globally
- Google Search: Distributes queries across massive server farms
- Facebook: Memcached clusters with thousands of nodes
- Amazon: Auto-scaling groups that add/remove EC2 instances based on demand
Example
Stack Overflow's approach: They handle billions of page views with just a handful of servers by heavily optimizing their code and using powerful hardware. This is a counterexample to "always scale horizontally"—sometimes vertical scaling with good engineering is the right choice.
Q5: What challenges come with horizontal scaling?
Detailed Explanation
Key Challenges:
- Data Consistency
  - Keeping data synchronized across nodes
  - Handling distributed transactions
  - Dealing with eventual consistency
- Session Management
  - User sessions must be shared or externalized
  - Sticky sessions vs. stateless design
  - Token-based authentication becomes preferred
- Service Discovery
  - How do services find each other?
  - Dynamic IP addresses as instances scale
  - Tools: Consul, etcd, Kubernetes DNS
- Load Balancing
  - Even distribution of traffic
  - Health checks and failover
  - Algorithm selection (round-robin, least connections, etc.)
- Operational Complexity
  - More servers = more things to monitor
  - Distributed logging and tracing
  - Configuration management at scale
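The load-balancing algorithms mentioned above can be sketched in a few lines each. Server names are illustrative; a real balancer would also fold in the health checks noted above:

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Send each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1      # caller must release() when the request ends
        return server

    def release(self, server):
        self.active[server] -= 1
```

Round-robin is fine when requests are uniform; least-connections adapts when some requests are much slower than others.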
Example
Session management evolution:
- Phase 1: Sessions stored on server (breaks with horizontal scaling)
- Phase 2: Sticky sessions via load balancer (limits flexibility)
- Phase 3: Centralized session store (Redis)
- Phase 4: Stateless JWT tokens (no server-side session needed)
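Phase 4 can be illustrated with a JWT-style signed token: any server that holds the key can verify a request, so no shared session store is needed. A simplified HMAC sketch — a real deployment would use a proper JWT library and key management, and the secret here is a placeholder:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # illustrative; load from a key-management system


def issue_token(user_id: str, ttl: int = 3600) -> str:
    """Sign a JWT-like payload; verification needs only the key, not server state."""
    payload = json.dumps({"sub": user_id, "exp": int(time.time()) + ttl}).encode()
    body = base64.urlsafe_b64encode(payload).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def verify_token(token: str):
    """Return the claims if the signature and expiry check out, else None."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered, or signed with a different key
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims if claims["exp"] > time.time() else None
```

Because the token carries its own proof of authenticity, any instance behind the load balancer can serve the request — the property horizontal scaling needs.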
Q6: Can you combine both approaches?
Detailed Explanation
Hybrid Scaling Strategy:
- Application Layer: Scale horizontally
  - Stateless web servers behind load balancers
  - Easy to add/remove instances
- Cache Layer: Start vertical, then horizontal
  - Single Redis instance initially
  - Redis Cluster when you outgrow it
- Database Layer: Vertical first, then horizontal
  - Upgrade primary instance as long as possible
  - Add read replicas (horizontal for reads)
  - Eventually shard (horizontal for writes)
- File Storage: Horizontal from the start
  - Object storage (S3) is inherently distributed
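The database-layer pattern above (writes to the primary, reads spread across replicas) is often implemented as a small routing layer. A sketch under simple assumptions — the connection names are placeholders, not a real driver API, and a real router would inspect queries more carefully than a prefix check:

```python
import itertools


class ReadWriteRouter:
    """Route writes to the primary, round-robin reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)   # reads scale out across replicas
        return self.primary               # all writes go to the single primary
```

This is why the read path scales horizontally long before the write path does: adding a replica is one more entry in the rotation, while writes stay pinned to one node until you shard.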
Example
Typical e-commerce architecture:
- 10+ stateless API servers (horizontal)
- 1 large primary database + 5 read replicas (vertical + horizontal)
- Redis cluster for sessions and caching (horizontal)
- CDN for static assets (horizontal by nature)
Q7: How does cloud computing affect this decision?
Detailed Explanation
Cloud Advantages for Scaling:
- Auto-Scaling
  - Automatically add/remove instances based on metrics
  - Far less up-front capacity planning
  - Pay only for what you use
- Managed Services
  - RDS handles database scaling complexity
  - ElastiCache manages Redis clustering
  - Reduces operational burden
- Global Infrastructure
  - Multiple regions and availability zones
  - Built-in redundancy
  - CDN integration
- Instance Variety
  - Can scale vertically with one click
  - Wide range of instance sizes
  - Specialized instances (compute, memory, storage optimized)
Cloud-Native Patterns:
- Serverless (Lambda): Automatic horizontal scaling to zero
- Containers (ECS/EKS): Easy horizontal scaling with orchestration
- Spot instances: Cost-effective horizontal scaling
Example
Before cloud:
- Horizontal scaling required purchasing, racking, and configuring new servers
- Lead time: weeks to months
- Requires accurate capacity forecasting
With cloud:
- Horizontal scaling is an API call or configuration change
- Lead time: minutes
- Can react to actual demand in real-time
Q8: What metrics determine when to scale?
Detailed Explanation
Primary Scaling Metrics:
| Metric | When to Scale | Typical Threshold |
|---|---|---|
| CPU Utilization | High compute load | 70-80% sustained |
| Memory Usage | Memory pressure | 80-85% |
| Request Latency | Slow responses | p95 > SLA target |
| Queue Depth | Backlog building | Growing trend |
| Error Rate | System stress | Above baseline |
| Connection Count | Connection exhaustion | Near pool limit |
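A scale-out decision based on the thresholds in the table might look like the following sketch. Metric names, the exact cutoffs, and the default SLA/baseline values are illustrative assumptions:

```python
def should_scale_out(metrics: dict) -> bool:
    """Return True if any supplied metric breaches its threshold.

    Thresholds mirror the table above; only metrics present in the
    dict are checked, so partial telemetry still works.
    """
    rules = {
        "cpu_percent":    lambda v: v >= 75,   # "70-80% sustained"
        "memory_percent": lambda v: v >= 80,   # "80-85%"
        "p95_latency_ms": lambda v: v > metrics.get("sla_ms", 500),
        "error_rate":     lambda v: v > metrics.get("baseline_error_rate", 0.01),
    }
    return any(check(metrics[name]) for name, check in rules.items()
               if name in metrics)
```

In practice "sustained" matters: production systems evaluate these rules over a window (e.g., 5 minutes of datapoints) rather than on a single sample, to avoid scaling on noise.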
Scaling Strategies:
- Reactive Scaling
  - Respond to current metrics
  - Risk: Lag time before new capacity is ready
- Predictive Scaling
  - Use historical patterns
  - Scale before the traffic arrives
  - Better for known events (sales, launches)
- Scheduled Scaling
  - Based on time of day/week
  - Good for predictable patterns
Example
Setting up auto-scaling on AWS:
- Target tracking policy: Maintain average CPU at 60%
- Step scaling: Add 2 instances if CPU > 80%, add 4 if > 90%
- Cooldown period: 300 seconds to prevent thrashing
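The step-scaling policy above, expressed as a sketch. The thresholds and the 300-second cooldown mirror the example; this is illustrative logic, not the AWS API:

```python
def step_scale(cpu: float, last_scale_at: float, now: float,
               cooldown: float = 300.0) -> int:
    """Return how many instances to add for the given CPU reading.

    Returns 0 while inside the cooldown window, so a burst of high
    readings triggers one scaling action instead of several (thrashing).
    """
    if now - last_scale_at < cooldown:
        return 0          # still cooling down from the previous action
    if cpu > 90:
        return 4          # severe breach: add capacity aggressively
    if cpu > 80:
        return 2          # moderate breach: add a smaller step
    return 0              # below thresholds: no change
```

The higher-threshold check runs first on purpose: step policies apply the largest matching step, so a 95% reading adds 4 instances, not 2.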