Horizontal vs Vertical

Overview

Scaling is the process of increasing a system's capacity to handle more load. There are two fundamental approaches: vertical scaling (scaling up) and horizontal scaling (scaling out).

Quick Reference

Vertical Scaling (Scale Up)

  • Add more power (CPU, RAM, disk) to existing machine
  • Simple to implement, no code changes needed
  • Limitations:
    • Hard ceiling on hardware capacity
    • Single point of failure
    • Expensive at high specs (cost grows faster than linearly)
    • Downtime during upgrades

Horizontal Scaling (Scale Out)

  • Add more servers to distribute load
  • No theoretical ceiling, better fault tolerance
  • Challenges:
    • Requires stateless application design
    • Data consistency across nodes
    • More operational complexity

Database Scaling

  • Vertical first: Upgrade the primary instance (e.g., AWS RDS offers instance classes with several terabytes of RAM)
  • Horizontal (sharding): Split data across multiple servers by shard key
  • Sharding challenges:
    • Resharding when data grows unevenly (use consistent hashing)
    • Hotspot/celebrity problem (popular data on same shard)
    • Cross-shard joins become difficult (de-normalize instead)
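
The consistent hashing mentioned above can be sketched as a hash ring. This is a minimal illustration, not a production implementation; the shard names and virtual-node count are made up:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to shards on a hash ring. Adding a shard only remaps
    the keys that fall between it and its ring neighbors, which is what
    makes resharding cheaper than with naive modulo hashing."""

    def __init__(self, shards, vnodes=100):
        self._ring = []  # sorted list of (hash, shard) points
        for shard in shards:
            self.add_shard(shard, vnodes)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_shard(self, shard, vnodes=100):
        # Each shard gets many virtual nodes so load spreads evenly.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{shard}:{i}"), shard))

    def shard_for(self, key):
        # Walk clockwise to the first ring point at or after the key's hash.
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
assignments = {k: ring.shard_for(k) for k in ["user:1", "user:2", "user:3"]}
```

The same key always lands on the same shard, and adding a fourth shard would steal roughly a quarter of the keys rather than reshuffling everything.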

Key Principles

  • Keep web tier stateless for easy horizontal scaling
  • Build redundancy at every tier
  • Cache aggressively to reduce database load
  • Use CDN for static assets
  • Scale data tier by sharding when vertical limits are reached

Questions

Q1: What is horizontal vs vertical scaling?

Detailed Explanation

Vertical Scaling:

  • Upgrade the hardware of a single server
  • Increase CPU cores, RAM, or storage capacity
  • No changes to application architecture required
  • Has physical limits (you can only make a server so big)

Horizontal Scaling:

  • Add more servers to a pool
  • Distribute workload across multiple machines
  • Requires application design considerations (statelessness, data consistency)
  • Theoretically unlimited scaling potential

Example

Consider a web application experiencing slow response times:

  • Vertical approach: Upgrade from a 4-core server with 16GB RAM to a 16-core server with 64GB RAM
  • Horizontal approach: Add 3 more identical servers behind a load balancer

Q2: When would you choose one over the other?

Detailed Explanation

Choose Vertical Scaling when:

  • Your application is stateful and difficult to distribute
  • You need a quick fix without architectural changes
  • Your workload benefits more from faster single-machine hardware (more RAM, faster disks) than from parallelism
  • Cost of re-architecting exceeds cost of bigger hardware
  • You're dealing with legacy systems

Choose Horizontal Scaling when:

  • You need high availability and fault tolerance
  • Your traffic is unpredictable and requires elasticity
  • You've hit the limits of vertical scaling
  • Your application is stateless or can be made stateless
  • You need geographic distribution

Example

  • Database primary: Often scaled vertically first (bigger instance) because writes typically go to a single node
  • Web/API servers: Usually scaled horizontally because they're stateless and can easily run in parallel
  • Cache layer: Can go either way—Redis can scale vertically to a point, then requires clustering (horizontal)

Q3: What are the pros and cons of each?

Detailed Explanation

Vertical Scaling:

| Pros | Cons |
| --- | --- |
| Simple to implement | Hardware limits (ceiling) |
| No code changes needed | Single point of failure |
| Lower operational complexity | Downtime during upgrades |
| Better for ACID transactions | Cost grows exponentially |
| Simpler data consistency | Vendor lock-in risk |

Horizontal Scaling:

| Pros | Cons |
| --- | --- |
| No theoretical ceiling | Complex architecture |
| High availability/fault tolerance | Data consistency challenges |
| Cost-effective at scale | Network latency between nodes |
| Geographic distribution | Requires stateless design |
| Pay for what you use | Operational overhead |

Example

Cost comparison at scale:

  • Vertical: A server with twice the CPU often costs far more than twice as much
  • Horizontal: Two standard servers typically cost exactly twice as much, sometimes less

Q4: Give real-world examples of each

Detailed Explanation

Vertical Scaling Examples:

  • Instagram (early days): Ran on a single PostgreSQL server that was continuously upgraded before eventually sharding
  • Stack Overflow: Famous for scaling vertically—runs on a surprisingly small number of powerful servers
  • Most startup MVPs: Begin with a single beefy database server

Horizontal Scaling Examples:

  • Netflix: Thousands of microservices distributed globally
  • Google Search: Distributes queries across massive server farms
  • Facebook: Memcached clusters with thousands of nodes
  • Amazon: Auto-scaling groups that add/remove EC2 instances based on demand

Example

Stack Overflow's approach: They handle billions of page views with just a handful of servers by heavily optimizing their code and using powerful hardware. This is a counterexample to "always scale horizontally"—sometimes vertical scaling with good engineering is the right choice.


Q5: What challenges come with horizontal scaling?

Detailed Explanation

Key Challenges:

  1. Data Consistency

    • Keeping data synchronized across nodes
    • Handling distributed transactions
    • Dealing with eventual consistency
  2. Session Management

    • User sessions must be shared or externalized
    • Sticky sessions vs. stateless design
    • Token-based authentication becomes preferred
  3. Service Discovery

    • How do services find each other?
    • Dynamic IP addresses as instances scale
    • Tools: Consul, etcd, Kubernetes DNS
  4. Load Balancing

    • Even distribution of traffic
    • Health checks and failover
    • Algorithm selection (round-robin, least connections, etc.)
  5. Operational Complexity

    • More servers = more things to monitor
    • Distributed logging and tracing
    • Configuration management at scale
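
The load-balancing point above (algorithm selection plus health checks) can be sketched as a toy picker. Server names are hypothetical, and a real balancer would track connections and health asynchronously:

```python
import itertools

class LoadBalancer:
    """Minimal sketch of two picking strategies over a health-checked pool."""

    def __init__(self, servers):
        self.servers = servers
        self.healthy = set(servers)
        self.active = {s: 0 for s in servers}  # open connection counts
        self._rr = itertools.cycle(servers)

    def mark_down(self, server):
        # A failed health check removes the server from rotation.
        self.healthy.discard(server)

    def pick_round_robin(self):
        for _ in range(len(self.servers)):
            s = next(self._rr)
            if s in self.healthy:
                return s
        raise RuntimeError("no healthy servers")

    def pick_least_connections(self):
        candidates = [s for s in self.servers if s in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers")
        return min(candidates, key=lambda s: self.active[s])

lb = LoadBalancer(["web-1", "web-2", "web-3"])
lb.mark_down("web-2")          # failed health check -> skipped by both strategies
lb.active["web-1"] = 5
print(lb.pick_least_connections())  # prints "web-3"
```

Round-robin is simplest; least-connections adapts better when requests have uneven durations.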

Example

Session management evolution:

  • Phase 1: Sessions stored on server (breaks with horizontal scaling)
  • Phase 2: Sticky sessions via load balancer (limits flexibility)
  • Phase 3: Centralized session store (Redis)
  • Phase 4: Stateless JWTs (no server-side session state needed)
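
The Phase 4 idea can be illustrated with a minimal HMAC-signed token, a simplified stand-in for a real JWT library; the secret and payload here are made up:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"shared-secret"  # hypothetical key, known to every web server

def issue_token(payload):
    """Sign the payload so any server can verify it with no shared session store."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_token(token):
    """Return the payload if the signature checks out, else None."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered, or signed with a different key
    return json.loads(base64.urlsafe_b64decode(body))

token = issue_token({"user_id": 42})
assert verify_token(token) == {"user_id": 42}
assert verify_token("x" + token[1:]) is None  # any tampering breaks the signature
```

Because the token carries its own proof of authenticity, the server that issued it and the server that verifies it can be different machines, which is exactly what horizontal scaling requires.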

Q6: Can you combine both approaches?

Detailed Explanation

Hybrid Scaling Strategy:

  1. Application Layer: Scale horizontally

    • Stateless web servers behind load balancers
    • Easy to add/remove instances
  2. Cache Layer: Start vertical, then horizontal

    • Single Redis instance initially
    • Redis Cluster when you outgrow it
  3. Database Layer: Vertical first, then horizontal

    • Upgrade primary instance as long as possible
    • Add read replicas (horizontal for reads)
    • Eventually shard (horizontal for writes)
  4. File Storage: Horizontal from the start

    • Object storage (S3) is inherently distributed
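
The database-layer split above (one primary for writes, replicas for reads) can be sketched as a naive query router. The names and the SELECT-only heuristic are illustrative; real routers must also handle replication lag and read-your-writes consistency:

```python
import random

class DatabaseRouter:
    """Sketch: writes go to the single primary; reads spread across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def route(self, sql):
        # Naive classification: anything that is not a SELECT mutates data.
        if sql.lstrip().upper().startswith("SELECT"):
            return random.choice(self.replicas)  # reads scale horizontally
        return self.primary  # writes stay on the vertically scaled primary

router = DatabaseRouter("db-primary", ["db-replica-1", "db-replica-2"])
assert router.route("INSERT INTO orders VALUES (1)") == "db-primary"
assert router.route("SELECT * FROM orders") in router.replicas
```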

Example

Typical e-commerce architecture:

  • 10+ stateless API servers (horizontal)
  • 1 large primary database + 5 read replicas (vertical + horizontal)
  • Redis cluster for sessions and caching (horizontal)
  • CDN for static assets (horizontal by nature)

Q7: How does cloud computing affect this decision?

Detailed Explanation

Cloud Advantages for Scaling:

  1. Auto-Scaling

    • Automatically add/remove instances based on metrics
    • No capacity planning needed
    • Pay only for what you use
  2. Managed Services

    • RDS handles database scaling complexity
    • ElastiCache manages Redis clustering
    • Reduces operational burden
  3. Global Infrastructure

    • Multiple regions and availability zones
    • Built-in redundancy
    • CDN integration
  4. Instance Variety

    • Can scale vertically with one click
    • Wide range of instance sizes
    • Specialized instances (compute, memory, storage optimized)

Cloud-Native Patterns:

  • Serverless (Lambda): Automatic horizontal scaling to zero
  • Containers (ECS/EKS): Easy horizontal scaling with orchestration
  • Spot instances: Cost-effective horizontal scaling

Example

Before cloud:

  • Horizontal scaling required purchasing, racking, and configuring new servers
  • Lead time: weeks to months
  • Requires accurate capacity forecasting

With cloud:

  • Horizontal scaling is an API call or configuration change
  • Lead time: minutes
  • Can react to actual demand in real-time

Q8: What metrics determine when to scale?

Detailed Explanation

Primary Scaling Metrics:

| Metric | When to Scale | Typical Threshold |
| --- | --- | --- |
| CPU Utilization | High compute load | 70-80% sustained |
| Memory Usage | Memory pressure | 80-85% |
| Request Latency | Slow responses | p95 > SLA target |
| Queue Depth | Backlog building | Growing trend |
| Error Rate | System stress | Above baseline |
| Connection Count | Connection exhaustion | Near pool limit |
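
Checking metrics against thresholds like those in the table can be sketched as follows; the specific limits and metric names here are assumptions:

```python
# Hypothetical thresholds, loosely mirroring the table above.
THRESHOLDS = {
    "cpu_percent": 80,
    "memory_percent": 85,
    "p95_latency_ms": 300,  # assumed SLA target
}

def breached(metrics):
    """Return the metrics that currently exceed their scaling threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

sample = {"cpu_percent": 91, "memory_percent": 60, "p95_latency_ms": 450}
print(breached(sample))  # prints "['cpu_percent', 'p95_latency_ms']"
```

In practice an autoscaler would evaluate these over a sustained window, not a single sample, to avoid reacting to momentary spikes.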

Scaling Strategies:

  1. Reactive Scaling

    • Respond to current metrics
    • Risk: Lag time before new capacity is ready
  2. Predictive Scaling

    • Use historical patterns
    • Scale before the traffic arrives
    • Better for known events (sales, launches)
  3. Scheduled Scaling

    • Based on time of day/week
    • Good for predictable patterns

Example

Setting up auto-scaling on AWS:

  • Target tracking policy: Maintain average CPU at 60%
  • Step scaling: Add 2 instances if CPU > 80%, add 4 if > 90%
  • Cooldown period: 300 seconds to prevent thrashing
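
The step-scaling policy with a cooldown can be sketched as a small decision function; the thresholds mirror the example above, and a real policy engine would also handle scale-in:

```python
import time

class StepScaler:
    """Sketch of step scaling: +2 instances above 80% CPU, +4 above 90%,
    with a cooldown so back-to-back alarms don't cause thrashing."""

    def __init__(self, cooldown_s=300):
        self.cooldown_s = cooldown_s
        self.last_scaled = 0.0

    def instances_to_add(self, cpu_percent, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last_scaled < self.cooldown_s:
            return 0  # still cooling down from the previous scaling action
        if cpu_percent > 90:
            step = 4
        elif cpu_percent > 80:
            step = 2
        else:
            return 0
        self.last_scaled = now
        return step

scaler = StepScaler()
assert scaler.instances_to_add(95, now=1000) == 4
assert scaler.instances_to_add(95, now=1100) == 0  # inside the 300s cooldown
assert scaler.instances_to_add(85, now=1400) == 2  # cooldown elapsed, smaller step
```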