25  Scale Up vs Scale Out

Figure 25.1: Azure App Service Deploy

25.1 The Two Scaling Strategies

When your application needs more resources to handle increased load, you have two fundamental approaches:

Scale Up (Vertical Scaling): Make your server bigger
Scale Out (Horizontal Scaling): Add more servers

25.2 Scale Up (Vertical Scaling)

25.2.1 Concept

Scaling up means adding more power to the SAME machine: more CPU, RAM, or disk.

BEFORE (Scale Up):              AFTER (Scale Up):

┌─────────────────┐            ┌─────────────────┐
│   Web Server    │            │   Web Server    │
│                 │            │                 │
│  CPU: 2 cores   │   ──>      │  CPU: 8 cores   │ ← More powerful
│  RAM: 4 GB      │            │  RAM: 16 GB     │ ← More memory
│  Disk: 100 GB   │            │  Disk: 500 GB   │ ← More storage
└─────────────────┘            └─────────────────┘

    1 Server                       1 Server
   (weaker)                      (stronger)

25.2.2 Radiology AI Example

Your CT Stroke Detection App:

BEFORE:                          AFTER:
┌─────────────────┐            ┌─────────────────┐
│  M3 Pro Mac     │            │  M3 Max Mac     │
│  • 11 cores     │   ──>      │  • 16 cores     │
│  • 36 GB RAM    │            │  • 128 GB RAM   │
│                 │            │                 │
│  Processes 5    │            │  Processes 15   │
│  CT scans/min   │            │  CT scans/min   │
└─────────────────┘            └─────────────────┘

Same number of machines (1)
But more powerful!

25.2.3 Characteristics

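✅ Advantages:
- Simpler architecture (still one machine)
- Usually no code changes needed
- No load balancing complexity
- Works for applications that cannot be split across machines (note that single-threaded code gains from faster cores, not from more cores)
- Maintains data locality (everything on one server)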

❌ Disadvantages:
- Limited: a single machine can only get so big
- Expensive: high-end hardware costs more
- Single point of failure: if the server dies, everything stops
- Downtime: upgrading usually requires a restart
- Diminishing returns: 2x cost ≠ 2x performance
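One common way to reason about the diminishing returns of adding cores is Amdahl's law, which is not named in this chapter but fits the point. The sketch below is illustrative only: the 90% parallel fraction is an assumed figure, not a measurement of any real CT pipeline.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup over one core when only part of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed workload: 90% of the processing time can run in parallel.
for cores in (2, 4, 8, 16):
    print(f"{cores:>2} cores -> {amdahl_speedup(0.9, cores):.2f}x speedup")
```

Under that assumption, doubling from 8 to 16 cores adds far less than doubling from 2 to 4, which is the "2x cost ≠ 2x performance" effect in numbers.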

25.2.4 Real Limits

Consumer Hardware:
CPU: ~64 cores max
RAM: ~192 GB max
Cost: $5,000 - $15,000

Enterprise Hardware:
CPU: ~256 cores max
RAM: ~24 TB max
Cost: $100,000 - $1,000,000+

But there IS a ceiling! ←─── This is the problem

25.3 Scale Out (Horizontal Scaling)

25.3.1 Concept

Scaling out means adding more machines of the same size and distributing the load across them.

BEFORE (Scale Out):            AFTER (Scale Out):

┌─────────────┐               ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ Web Server  │               │ Web Server  │  │ Web Server  │  │ Web Server  │
│             │    ──>        │             │  │             │  │             │
│ 2 cores     │               │ 2 cores     │  │ 2 cores     │  │ 2 cores     │
│ 4 GB RAM    │               │ 4 GB RAM    │  │ 4 GB RAM    │  │ 4 GB RAM    │
└─────────────┘               └─────────────┘  └─────────────┘  └─────────────┘

  1 Server                          3 Servers (same specs)
                                   
                                   Load Balancer
                                        │
                            ┌───────────┼───────────┐
                            │           │           │
                            ▼           ▼           ▼
                         Server 1   Server 2   Server 3
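The load balancer in the diagram can be approximated with a round-robin policy, which is only one of several possible strategies. This is a minimal sketch; the class name and the `scan-N` request IDs are made up for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def route(self, request_id):
        # Return the chosen server together with the request it will handle.
        return next(self._rotation), request_id


balancer = RoundRobinBalancer(["Server 1", "Server 2", "Server 3"])
assignments = [balancer.route(f"scan-{n}")[0] for n in range(6)]
print(assignments)
```

Each server ends up with the same number of requests, which works well when requests cost roughly the same to process.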

25.3.2 Radiology AI Example

High Volume Day at Ramathibodi:

BEFORE:                          AFTER:
┌────────────────┐              ┌────────────────┐
│  Single Server │              │ Load Balancer  │
│  Processing    │              └───────┬────────┘
│  CT Strokes    │                      │
│                │              ┌───────┼───────┬───────┐
│  Queue: 50     │              │       │       │       │
│  scans waiting │              ▼       ▼       ▼       ▼
└────────────────┘         ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
                           │Srv 1│ │Srv 2│ │Srv 3│ │Srv 4│
   Can't keep up!          └─────┘ └─────┘ └─────┘ └─────┘
                           
                           Queue: 0 - all scans processed!

25.3.3 Characteristics

✅ Advantages:
- High scaling ceiling: add servers as load grows (shared dependencies such as the database can still become the bottleneck)
- High availability: if one server fails, the others continue
- No downtime: add servers without stopping the service
- Cost effective: uses commodity hardware
- Geographic distribution: servers can run in different locations

❌ Disadvantages:
- More complex architecture
- Requires load balancing
- Need to handle state management
- Requires stateless application design
- More moving parts = more can break
- Network overhead between servers
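To make "stateless application design" concrete, here is a small sketch in which no instance keeps session data in its own memory. The `SharedSessionStore` class is a hypothetical stand-in for an external cache or database; all names and IDs are illustrative.

```python
class SharedSessionStore:
    """Stand-in for an external store (cache or database) that every instance can reach."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return dict(self._data.get(session_id, {}))

    def put(self, session_id, state):
        self._data[session_id] = dict(state)


class StatelessInstance:
    """Keeps nothing between requests; all session state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def handle_upload(self, session_id, scan_id):
        state = self._store.get(session_id)
        scans = state.get("scans", []) + [scan_id]
        self._store.put(session_id, {"scans": scans})
        return f"{self.name}: session {session_id} now has {len(scans)} scan(s)"


store = SharedSessionStore()
first = StatelessInstance("Instance 1", store)
second = StatelessInstance("Instance 2", store)

# Two requests from the same session land on different instances.
print(first.handle_upload("session-42", "ct-001"))
print(second.handle_upload("session-42", "ct-002"))
```

Because the second instance sees the scan uploaded through the first, the load balancer is free to send each request to any server.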

25.4 Azure-Specific Implementation

Looking at your diagram, let’s map these concepts to Azure services:

25.4.1 Azure App Service Plan: Both Strategies Available

┌─────────────────────────────────────────────────────────┐
│            Azure App Service Plan                       │
│                                                         │
│  SCALE UP:                 SCALE OUT:                   │
│  Change tier ───────────── Add instances                │
│                                                         │
│  ┌──────────────┐         ┌───┐ ┌───┐ ┌───┐             │
│  │              │         │   │ │   │ │   │             │
│  │  Tier: B1    │         │ 1 │ │ 2 │ │ 3 │ ← Instances │
│  │  1 core      │         │   │ │   │ │   │             │
│  │  1.75 GB     │         └───┘ └───┘ └───┘             │
│  │              │                                       │
│  └──────────────┘         Auto-scale rules:             │
│       ↓                   • CPU > 70% → add instance    │
│  ┌──────────────┐         • CPU < 30% → remove instance │
│  │  Tier: P2V3  │                                       │
│  │  4 cores     │                                       │
│  │  16 GB       │                                       │
│  └──────────────┘                                       │
└─────────────────────────────────────────────────────────┘

25.4.2 Scale Up in Azure App Service

Changing the pricing tier = Scale Up

Pricing Tiers (Examples):

Free/Shared:
┌──────────┐
│ F1       │  60 CPU min/day, 1 GB RAM
└──────────┘

Basic:
┌──────────┐
│ B1       │  1 core, 1.75 GB   $13/month
│ B2       │  2 cores, 3.5 GB   $26/month
│ B3       │  4 cores, 7 GB     $52/month
└──────────┘

Standard:
┌──────────┐
│ S1       │  1 core, 1.75 GB   $70/month
│ S2       │  2 cores, 3.5 GB   $140/month
│ S3       │  4 cores, 7 GB     $280/month
└──────────┘

Premium:
┌──────────┐
│ P1V3     │  2 cores, 8 GB     $117/month
│ P2V3     │  4 cores, 16 GB    $234/month
│ P3V3     │  8 cores, 32 GB    $468/month
└──────────┘

Each tier = bigger machine (Scale Up!)
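Choosing a tier can be treated as a lookup: find the cheapest entry that meets your core and memory needs. The sketch below uses the example figures from the table above (prices are examples, not current quotes) and a hypothetical helper `cheapest_tier`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cores: int
    ram_gb: float
    usd_per_month: int


# Example figures copied from the table above.
TIERS = [
    Tier("B1", 1, 1.75, 13), Tier("B2", 2, 3.5, 26), Tier("B3", 4, 7, 52),
    Tier("S1", 1, 1.75, 70), Tier("S2", 2, 3.5, 140), Tier("S3", 4, 7, 280),
    Tier("P1V3", 2, 8, 117), Tier("P2V3", 4, 16, 234), Tier("P3V3", 8, 32, 468),
]


def cheapest_tier(min_cores: int, min_ram_gb: float) -> Tier:
    """Return the lowest-priced tier that satisfies both requirements."""
    candidates = [t for t in TIERS if t.cores >= min_cores and t.ram_gb >= min_ram_gb]
    if not candidates:
        raise ValueError("no listed tier is large enough; this is the scale-up ceiling")
    return min(candidates, key=lambda t: t.usd_per_month)


print(cheapest_tier(min_cores=2, min_ram_gb=8).name)   # model needs 8 GB
print(cheapest_tier(min_cores=1, min_ram_gb=16).name)  # model needs 16 GB
```

This compares only cores, RAM, and price. Tiers also differ in features such as autoscale and deployment slots, so treat the result as a starting point.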

25.4.3 Scale Out in Azure App Service

Adding instances within a tier = Scale Out

Example: P2V3 Tier Scaled Out

                    ┌────────────────┐
                    │ Load Balancer  │
                    └────────┬───────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
        ┌──────────┐   ┌──────────┐   ┌──────────┐
        │Instance 1│   │Instance 2│   │Instance 3│
        │          │   │          │   │          │
        │ P2V3     │   │ P2V3     │   │ P2V3     │
        │ 4 cores  │   │ 4 cores  │   │ 4 cores  │
        │ 16 GB    │   │ 16 GB    │   │ 16 GB    │
        └──────────┘   └──────────┘   └──────────┘

Total capacity: 12 cores, 48 GB
Cost: $234/month × 3 = $702/month
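The arithmetic above generalizes to any tier and instance count. A minimal sketch, with a made-up function name and the example P2V3 figures from this chapter:

```python
def scaled_out_totals(cores: int, ram_gb: int, usd_per_month: int, instances: int) -> dict:
    """Aggregate capacity and monthly cost for identical instances of one tier."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return {
        "cores": cores * instances,
        "ram_gb": ram_gb * instances,
        "usd_per_month": usd_per_month * instances,
    }


# Three P2V3 instances (4 cores, 16 GB, $234/month each).
print(scaled_out_totals(4, 16, 234, instances=3))
```

Keep in mind that the total RAM is not one pool: each instance still has only 16 GB, so a model that needs more memory than one instance offers requires scaling up, not out.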

25.4.4 From Your Diagram: Deployment Slots

The diagram shows deployment slots (staging, production, last-known good):

┌─────────────────────────────────────────────┐
│         App Service Plan                    │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │  Production Slot                      │  │
│  │  ┌───┐  ┌───┐  ┌───┐                  │  │
│  │  │ 1 │  │ 2 │  │ 3 │ ← 3 instances    │  │ ← Scale Out
│  │  └───┘  └───┘  └───┘                  │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │  Staging Slot                         │  │
│  │  ┌───┐                                │  │
│  │  │ 1 │ ← 1 instance                   │  │
│  │  └───┘                                │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  All running on P2V3 tier ←── Scale Up      │
└─────────────────────────────────────────────┘

Slots share the same App Service Plan resources but allow:
- Testing before production
- Zero-downtime deployments
- Easy rollback
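The reason slots give you easy rollback is that a swap exchanges what the two slots serve, so the previous production version stays parked in staging. The toy model below illustrates only that idea; it is not the Azure API, and the class and version names are invented.

```python
class SlotPair:
    """Toy model of a production slot and a staging slot."""

    def __init__(self, production: str, staging: str):
        self.production = production
        self.staging = staging

    def swap(self) -> None:
        # Exchange the versions served by the two slots in one step.
        self.production, self.staging = self.staging, self.production


slots = SlotPair(production="v1", staging="v2")

slots.swap()  # release: v2 goes live, v1 is kept in staging
after_release = (slots.production, slots.staging)

slots.swap()  # rollback: swap again to restore v1
after_rollback = (slots.production, slots.staging)

print(after_release, after_rollback)
```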

25.5 Decision Matrix: When to Use Each?

25.5.1 Use Scale Up When:

✅ GOOD FOR:

┌─────────────────────────────────┐
│ • Single-threaded apps          │
│ • Memory-intensive operations   │
│ • Apps with state/sessions      │
│ • Quick fix for performance     │
│ • Development/testing           │
│ • Small to medium traffic       │
└─────────────────────────────────┘

Example: Your CT stroke model needs more RAM
to load larger deep learning models
→ Scale Up from 8GB to 16GB

25.5.2 Use Scale Out When:

✅ GOOD FOR:

┌─────────────────────────────────┐
│ • Stateless applications        │
│ • High availability required    │
│ • Variable traffic patterns     │
│ • Large number of users         │
│ • Geographic distribution       │
│ • Cost optimization             │
└─────────────────────────────────┘

Example: During peak hours (8am-12pm) at hospital,
CT scan requests increase 10x
→ Scale Out from 2 to 10 instances automatically
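A rough way to size a scale-out is to divide expected demand by what one instance can handle, then clamp the result to your minimum and maximum. The throughput figures below are assumptions chosen to match the 2-to-10 example, not benchmarks.

```python
import math


def instances_needed(demand_per_hour: float, capacity_per_instance: float,
                     minimum: int = 2, maximum: int = 10) -> int:
    """Instances required for the demand, kept within the configured bounds."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    raw = math.ceil(demand_per_hour / capacity_per_instance)
    return max(minimum, min(maximum, raw))


# Assumed: one instance processes 10 scans/hour.
print(instances_needed(10, 10))    # quiet period: the minimum keeps 2 for availability
print(instances_needed(100, 10))   # 10x peak
print(instances_needed(250, 10))   # beyond the cap: stays at the maximum
```

When demand exceeds what the maximum can serve, requests queue up, which is a signal to raise the cap or scale up the tier.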

25.6 Radiology AI Real-World Scenario

25.6.1 Development Phase

┌────────────────────────────────┐
│  Your MacBook (Local)          │
│  • M3 Pro (11 cores, 36GB)     │
│  • 1 instance                  │
│  • Testing models              │
└────────────────────────────────┘

Strategy: No scaling needed - the laptop is sufficient for development

25.6.2 Initial Production

┌────────────────────────────────┐
│  Azure App Service             │
│  • Tier: P1V3 (2 cores, 8GB)   │ ← Scaled Up from B1
│  • 1 instance                  │
│  • Handles 10 scans/hour       │
└────────────────────────────────┘

Usage: Low volume testing
Strategy: Scale Up to handle production workload

25.6.3 Growing Usage

Problem: During rounds (8am-10am), 
         50 CT scans requested simultaneously!

Solution: Scale Out!

┌──────────────────────────────────────────┐
│  Azure App Service Plan                  │
│  • Tier: P1V3 (2 cores, 8GB)             │ ← Same tier
│  • 5 instances ←────────────────────────── Scaled Out!
│  • Auto-scale rule:                      │
│    - If CPU > 70% → add 1 instance       │
│    - If CPU < 30% → remove 1 instance    │
│    - Min: 2, Max: 10                     │
└──────────────────────────────────────────┘

         Load Balancer
              │
    ┌─────────┼─────────┬─────────┬─────────┐
    ▼         ▼         ▼         ▼         ▼
┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐
│Inst 1│ │Inst 2│ │Inst 3│ │Inst 4│ │Inst 5│
└──────┘ └──────┘ └──────┘ └──────┘ └──────┘
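The auto-scale rule in the box above can be simulated in a few lines. This is a simplified sketch: the function name and the CPU readings are invented, and real autoscale engines typically average a metric over a time window and apply a cooldown between changes.

```python
def autoscale_step(current: int, cpu_percent: float, minimum: int = 2, maximum: int = 10,
                   scale_out_above: float = 70.0, scale_in_below: float = 30.0) -> int:
    """Apply one evaluation of the rule: add or remove a single instance, within bounds."""
    if cpu_percent > scale_out_above:
        return min(current + 1, maximum)
    if cpu_percent < scale_in_below:
        return max(current - 1, minimum)
    return current


instances = 2
history = []
# Invented average-CPU readings across a morning peak and the quieter period after it.
for cpu in [85, 90, 88, 75, 50, 25, 20, 10]:
    instances = autoscale_step(instances, cpu)
    history.append(instances)

print(history)
```

The gap between the 70% and 30% thresholds matters: if they were close together, the instance count would flap up and down on small CPU changes.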

25.6.4 Peak Load + Complex Model

Problem: New model requires 16GB RAM
         AND peak load still high

Solution: Scale Up + Scale Out!

┌──────────────────────────────────────────┐
│  Azure App Service Plan                  │
│  • Tier: P2V3 (4 cores, 16GB) ←───────── Scaled Up!
│  • 8 instances ←────────────────────────── Scaled Out!
│                                          │
│  Cost: $234/month × 8 = $1,872/month     │
└──────────────────────────────────────────┘

Best of both worlds!

25.7 Azure SQL Database: Same Concepts

From your diagram, the SQL Database also supports both:

25.7.1 Scale Up (Change Service Tier)

DTU-based:
┌──────────┐
│ Basic    │  5 DTUs, 2GB
│ Standard │  10-3000 DTUs, 250GB
│ Premium  │  125-4000 DTUs, 1TB
└──────────┘

vCore-based:
┌──────────┐
│ 2 vCores │  2 cores, 10.2GB
│ 4 vCores │  4 cores, 20.4GB
│ 8 vCores │  8 cores, 40.8GB
└──────────┘

25.7.2 Scale Out (Read Replicas)

Primary (Read/Write):
┌─────────────────┐
│   SQL Primary   │
│   (Write)       │
└────────┬────────┘
         │
         │ Replication
   ┌─────┴──┬────────┐
   ▼        ▼        ▼
┌──────┐ ┌──────┐ ┌──────┐
│Repli │ │Repli │ │Repli │
│ca 1  │ │ca 2  │ │ca 3  │
│(Read)│ │(Read)│ │(Read)│
└──────┘ └──────┘ └──────┘

Distribute read queries across replicas
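In application code, using read replicas usually means routing by query type: writes go to the primary and reads rotate across replicas. The sketch below is a simplified illustration with made-up server names and a deliberately naive check on the first SQL keyword.

```python
from itertools import cycle


class ReplicaRouter:
    """Sends SELECT statements to replicas in rotation and everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        replicas = list(replicas)
        self._rotation = cycle(replicas) if replicas else None

    def target_for(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._rotation is not None:
            return next(self._rotation)
        return self.primary


router = ReplicaRouter("primary", ["replica-1", "replica-2", "replica-3"])
queries = [
    "SELECT * FROM scans WHERE id = 1",
    "INSERT INTO scans (id) VALUES (2)",
    "SELECT count(*) FROM scans",
    "UPDATE scans SET status = 'done' WHERE id = 1",
]
targets = [router.target_for(q) for q in queries]
print(targets)
```

Replicas can lag slightly behind the primary, so a read that must see a write made a moment ago is safer sent to the primary.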

25.8 Key Takeaways

25.8.1 Quick Reference

SCALE UP:                    SCALE OUT:
┌──────────┐                ┌───┐┌───┐┌───┐
│          │                │   ││   ││   │
│  Bigger  │                └───┘└───┘└───┘
│  Machine │                  More Machines
│          │
└──────────┘

• Limited ceiling           • Very high ceiling
• Simple setup              • Requires load balancer
• Good for stateful         • Best for stateless
• Single point of failure   • High availability
• Quick but expensive       • Cost effective at scale

25.8.2 Combined Strategy (Most Common)

Best Practice for Production:

1. Scale Up to appropriate baseline tier
2. Scale Out for traffic variations
3. Use auto-scaling for cost optimization

Example:
┌────────────────────────────────┐
│  P2V3 tier ←── Scale Up        │
│  2-10 instances ←── Scale Out  │
│  Auto-scale: CPU-based         │
└────────────────────────────────┘

25.8.3 For Your Radiology AI Projects

Recommendation:

Development:
• Local Mac (no scaling needed)

Staging:
• P1V3 tier, 1 instance
• Test under realistic conditions

Production:
• P2V3 tier (16GB for ML models) ←── Scale Up
• 2-8 instances (auto-scale) ←────── Scale Out
• Based on daily patterns:
  - 8am-12pm: High (8 instances)
  - 12pm-6pm: Medium (4 instances)
  - 6pm-8am: Low (2 instances)
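If the daily pattern is predictable, you can schedule instance counts instead of waiting for CPU to climb. A minimal sketch, assuming each level holds until the next one starts (High from 08:00, Medium from 12:00, Low from 18:00); the function name is made up.

```python
def scheduled_instances(hour: int) -> int:
    """Planned instance count for an hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if 8 <= hour < 12:
        return 8   # morning peak
    if 12 <= hour < 18:
        return 4   # afternoon
    return 2       # evening and overnight


for hour in (3, 9, 14, 20):
    print(f"{hour:02d}:00 -> {scheduled_instances(hour)} instances")
```

A schedule like this pairs well with a CPU-based rule, which can still add instances if an unexpected surge arrives outside the planned peak.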

25.9 Summary

Scale Up = Vertical = Bigger machine (more CPU/RAM/disk)
Scale Out = Horizontal = More machines (additional instances)

In Azure:

  • Scale Up: Change pricing tier (B1 → P2V3)
  • Scale Out: Add instances (1 → 10 instances)

Best approach: Use both strategically!