A few months ago, a thread went viral in the AWS community: "We Optimized Lambda Cold Starts. Our Bill Went Up 3x."
The engineer had done everything right. They identified a user-facing Lambda function with cold start latency spikes. They enabled provisioned concurrency to eliminate cold starts. Users stopped complaining. The engineering team celebrated.
Then the invoice arrived.
This is the central irony of Lambda cold start optimization: the most effective solution (provisioned concurrency) is also expensive enough to make a thoughtful engineer pause and ask whether they've actually solved the right problem. This guide is about making that decision correctly — with real numbers, not vibes.
What Cold Starts Actually Are
A Lambda cold start happens when AWS needs to spin up a new execution environment for your function. The sequence looks like this:
- AWS allocates a micro-VM (using Firecracker)
- The runtime initializes (Python/Node.js/Java/etc.)
- Your deployment package is downloaded and unpacked
- Your initialization code runs (importing libraries, establishing DB connections, loading models)
- Your handler function finally executes
Steps 1-4 are the cold start. Step 5 is what you're actually paying for.
For a warm invocation, the execution environment already exists and AWS skips directly to step 5. The difference in latency is typically:
- Node.js/Python cold start: 200-600ms
- Java cold start: 2-15 seconds (JVM initialization is brutal)
- Functions with large dependencies: 1-5+ seconds
- Warm invocations: 1-50ms
The 500ms number you see cited everywhere is a median for well-optimized, lightweight functions. Real-world cold starts vary wildly based on runtime, package size, and initialization code.
When Cold Starts Actually Matter
Before spending a dollar on cold start optimization, answer this question honestly: Do my users actually notice 500ms?
The answer depends entirely on context. Here's a framework:
Cold Starts Are a Real Problem When:
Real-time user interactions — An API endpoint that users wait on. A checkout flow. A search function. Anything where a user is staring at a spinner and expecting sub-second response. A 500ms cold start on top of your normal latency budget is noticeable. A 5-second Java cold start is a UX disaster.
Latency-sensitive integrations — Functions in a synchronous chain where each step adds to total latency. If five Lambda functions call each other synchronously and each can cold start, the tail latency becomes deeply unpredictable.
Time-sensitive processing — Functions handling real-time payment authorization, fraud detection, or security alerts where latency directly maps to business outcomes.
Cold Starts Are NOT a Real Problem When:
Asynchronous processing — Event-driven functions processing SQS queues, S3 events, or DynamoDB streams. Users aren't waiting. A cold start means a batch starts processing 500ms later. Nobody cares.
Background jobs — Scheduled functions running overnight ETL, reporting, or cleanup. A cold start adds milliseconds to a job that takes minutes.
Low-traffic functions — If your function invokes 100 times per day, cold starts happen rarely and affect a tiny fraction of your traffic. The optimization cost exceeds the problem cost by definition.
Dev/staging environments — Provisioned concurrency in environments where developers and tests generate the traffic is pure waste.
Get an honest traffic profile before doing anything else. Add INIT_DURATION to your CloudWatch Logs Insights queries and actually measure what percentage of invocations have cold starts and what the P50/P95/P99 cold start latency is for your specific function.
-- CloudWatch Logs Insights query for cold start rate
filter @type = "REPORT"
| stats count(*) as total,
count(initDuration) as cold_starts,
avg(initDuration) as avg_cold_start_ms,
percentile(initDuration, 95) as p95_cold_start_ms,
cold_starts / total * 100 as cold_start_rate_pct
If your cold start rate is below 5% and P95 cold start latency is under 500ms, you likely have a non-problem that doesn't justify provisioned concurrency costs.
Provisioned Concurrency: The Math Nobody Shows You
Provisioned concurrency keeps a specified number of Lambda execution environments initialized and ready to handle requests instantly. It genuinely eliminates cold starts. It also costs money whether your function is handling requests or not.
Provisioned Concurrency Pricing (us-east-1):
- $0.000004646 per GB-second of provisioned concurrency
- For a 512MB function: $0.000002323 per second of provisioned concurrency
That sounds tiny. Let's scale it.
1M Invocations Scenario: The Actual Math
Without Provisioned Concurrency:
Assume a function with the following profile:
- 1M invocations/month
- Average duration: 200ms
- Memory: 512MB
- 3% cold start rate (30,000 cold starts)
- P95 cold start: 400ms
Monthly compute: 1,000,000 × 0.2s × 0.5GB = 100,000 GB-seconds At $0.0000166667/GB-second: $1.67 Plus request charges: 1M × $0.0000002 = $0.20
Total: ~$1.87/month
Cold start impact: 30,000 invocations see extra 400ms latency. For a user-facing function with 10,000 daily users, roughly 300 users per day experience a slow load. Annoying, but quantifiable.
With Provisioned Concurrency (5 instances):
Most teams start with 5-10 provisioned instances to cover typical concurrency.
Provisioned concurrency cost: 5 instances × 512MB × 720hrs/month × 3600 sec/hr = 6,652,800 GB-seconds At $0.000004646/GB-second: $30.91/month
Plus the regular compute cost (provisioned invocations are also billed normally): $1.87/month
Total: ~$32.78/month
That's a 17x increase to eliminate cold starts affecting 1% of users.
When the Math Changes
The math does work — in specific circumstances:
Higher traffic: At 50M invocations/month with sustained concurrency, the compute cost is ~$94/month. If you need 5 provisioned instances anyway to handle baseline concurrency, the incremental cost of provisioned vs. not is smaller relative to total spend.
Higher base cost functions: A 3GB, 10-second function at 1M invocations costs ~$50/month baseline. Adding $30/month of provisioned concurrency to eliminate user-visible 15-second Java cold starts is a reasonable trade.
Revenue-critical paths: If each second of cold start latency costs you measurable conversion — say you're an e-commerce checkout function — the math is different. Calculate the revenue impact of cold starts, compare to provisioned concurrency cost, and optimize from that basis.
The key insight: provisioned concurrency is a per-instance-per-second charge. It's a flat cost floor. For low-traffic functions, this floor dominates. For high-traffic functions with sustained concurrency, the incremental cost is much smaller.
SnapStart: Java's Escape Hatch
Java Lambda cold starts are notoriously painful. A Spring Boot Lambda can cold start in 8-15 seconds — not because Lambda is slow, but because the JVM initialization, Spring context loading, and dependency injection are doing substantial work.
Lambda SnapStart (introduced 2022, expanded 2023) addresses this by:
- Running your initialization code once when you publish a function version
- Taking a snapshot of the initialized execution environment (Firecracker microVM state)
- Restoring from that snapshot on subsequent cold starts instead of re-initializing
The result: Java cold starts drop from 8-15 seconds to 200-500ms — roughly on par with Python/Node.js.
The catch: SnapStart requires careful handling of initialization state. If your init code:
- Generates random seeds or cryptographic material
- Establishes connections with unique identifiers
- Loads time-sensitive data
...then restoring from a snapshot means multiple instances sharing the same initial state. AWS provides RuntimeHook interface to handle this — you implement beforeCheckpoint() and afterRestore() to re-initialize state that shouldn't be shared.
SnapStart is free. It's available for Java 11, 17, and 21 runtimes with Lambda function versions. If you're running Java Lambdas and not using SnapStart, you're leaving significant performance improvement on the table at zero cost.
Keep-Warm Strategies: A Mixed Bag
Before provisioned concurrency existed, engineers invented "keep-warm" patterns — scheduled pings that invoke Lambda functions frequently enough to keep execution environments alive.
How long does AWS keep Lambda environments warm? AWS doesn't publish this, but empirically it's roughly 5-15 minutes of inactivity before recycling. A common keep-warm pattern is a CloudWatch Events rule pinging the function every 5 minutes.
The Problems With Keep-Warm
It scales poorly. You can keep one instance warm with a ping. You can't keep 50 instances warm — you'd need 50 simultaneous pings. For functions that scale to handle concurrent load, keep-warm only helps the first instance.
It costs money. Invocation charges are small but real. At $0.0000002/invocation, pinging 12 times/hour × 720 hours/month = 8,640 invocations = $0.002. Trivial. But you're also paying for the runtime — and if your keep-warm invocations do real initialization work (DB connections, etc.), you're paying for that compute.
It's unreliable. AWS can still recycle the environment if it's under resource pressure. Keep-warm reduces cold start frequency; it doesn't eliminate it.
It doesn't help with the actual scaling problem. When traffic spikes above 1 concurrent request, you'll still get cold starts for the additional instances.
Keep-warm is worth using when:
- Your function has unpredictable but sporadic user-facing traffic
- You want to reduce cold starts without paying for provisioned concurrency
- You accept it's a partial solution, not a complete one
Keep-warm is NOT worth using when:
- You need guaranteed cold start elimination (use provisioned concurrency instead)
- Your function handles concurrent load (keep-warm can't address multiple instances)
- You're trying to solve a Java cold start problem (SnapStart is better)
Code Optimization: The Free Wins First
Before touching billing levers, fix the function itself. Many cold start problems are self-inflicted:
Reduce package size. Lambda downloads your deployment package on every cold start. A 50MB package cold starts faster than a 500MB package. Use Lambda layers for large dependencies shared across functions. Use bundlers (esbuild, webpack) to tree-shake unused code. Check your node_modules — most projects include 10x more than they need.
# Check your package size breakdown
zip -r function.zip . -x "*.git*"
unzip -l function.zip | sort -k1 -rn | head -20
Move initialization outside the handler. Any code outside your handler function runs once per cold start, then is reused across warm invocations. Database connections, SDK clients, configuration loading — all of these should be initialized outside the handler.
# BAD: runs on every invocation
def handler(event, context):
db = connect_to_database() # cold start + every invocation
return process(db, event)
# GOOD: runs once on cold start, reused on warm invocations
db = connect_to_database() # cold start only
def handler(event, context):
return process(db, event)
Lazy initialization for expensive dependencies. If you import a large library that's only needed in some code paths, import it inside the code path rather than at module load time. This trades initialization cost for first-use cost on that specific path.
Right-size memory. More memory = more CPU in Lambda. Underprovisioned functions run initialization code slower. The cost of 512MB vs 256MB is double, but if initialization completes 3x faster, the total cold start time (and cold start compute cost) may be lower. Use AWS Lambda Power Tuning (open source) to find the optimal memory/performance/cost point.
Architecture Alternatives: When Lambda Isn't the Right Tool
Sometimes the correct answer to "how do I solve Lambda cold starts?" is "don't use Lambda for this."
AWS Fargate
Fargate runs containers in ECS without managing EC2 instances. There are no cold starts in the Lambda sense — containers start once when the service starts and stay running. You pay per vCPU and memory per second.
When Fargate beats Lambda: Long-running processes, workloads that need consistent latency guarantees, large applications with significant initialization overhead, or workloads that are expensive to start/stop frequently.
When Lambda beats Fargate: True serverless scaling to zero (Fargate still costs when idle), extremely variable traffic patterns, simple event-driven functions, or true zero-maintenance operations.
Cost comparison for a steady-state API:
- Lambda 512MB, 200ms avg, 10M req/month: ~$19/month
- Fargate 0.25 vCPU / 0.5GB, always-on: ~$10/month
At sustained load, Fargate often wins on cost. Lambda wins at variable/bursty traffic.
App Runner
AWS App Runner is the highest-level serverless container option. You give it a container image, it handles everything else — load balancing, auto-scaling, TLS, deployments. App Runner auto-scales to zero when there's no traffic (billing pauses) and scales up on demand.
The cold start situation with App Runner: When scaling from zero, App Runner cold starts are 10-30 seconds (container startup, not Lambda init). This is worse than Lambda for cold starts, better than Lambda for sustained load.
When App Runner wins: You want container flexibility without ECS/EKS complexity, you have a web service that can tolerate 10-30s startup from zero, and you want AWS to manage everything.
Lambda with Function URLs and Response Streaming
For web-facing APIs, Lambda Function URLs with response streaming can make cold starts feel faster to users even when they happen — because users see bytes arriving rather than staring at a blank screen. This doesn't reduce cold start latency, but it can improve perceived performance significantly.
The Decision Framework
Here's how to actually decide what to do about your Lambda cold starts:
1. Is this function user-facing with real-time latency requirements?
NO → Stop here. Cold starts don't matter. Don't spend money.
YES → Continue.
2. What's your actual cold start rate and latency?
< 5% rate AND < 500ms P95 → Minor problem. Start with code optimization (free).
> 5% rate OR > 1s P95 → Real problem worth solving.
3. What runtime?
Java → Enable SnapStart immediately (free). Reassess after.
Python/Node → Continue.
4. What's your traffic pattern?
Sustained, predictable → Provisioned concurrency math likely works.
Bursty/variable → Provisioned concurrency math probably doesn't work.
5. Do the provisioned concurrency math:
Monthly PC cost vs. revenue impact of cold starts?
PC cost > 10% of function's value → Consider architecture alternatives.
PC cost < 10% of function's value → Enable it.
6. Architecture alternative:
Is this a long-running service that should be Fargate/App Runner instead?
Putting It Together
Cold starts are real. For the right workloads, they're worth fixing. But the most important question isn't "how do I eliminate cold starts?" — it's "do my users actually notice this, and is the fix worth the cost?"
Run the measurement. Do the math. Enable SnapStart for Java for free. Optimize your package size and initialization code for free. Then, and only then, if the numbers justify it, consider provisioned concurrency.
The team that went viral wasn't wrong to optimize. They were wrong to optimize before checking whether the cure was worse than the disease.
Provisioned concurrency is a tool. Like all tools, it works best when you know exactly what problem you're solving and have the numbers to justify the cost. The serverless promise was "pay only for what you use." Provisioned concurrency is "pay whether you use it or not."
Sometimes that trade makes sense. Make sure you're making it deliberately.
Quick Reference: Cost Levers Ranked by Cost
| Solution | Monthly Cost | Effectiveness | Best For |
|---|---|---|---|
| SnapStart (Java) | Free | Eliminates JVM cold starts | Java functions |
| Package size reduction | Free | 10-40% cold start reduction | All functions |
| Initialization optimization | Free | 20-60% cold start reduction | Functions with init code |
| Keep-warm (EventBridge) | ~$0.01 | Reduces cold start frequency | Low-traffic, single-instance |
| Provisioned concurrency | $10-100+/function/month | Eliminates cold starts | High-traffic, user-facing |
| Migrate to Fargate | Varies ($5-100+/service) | Eliminates Lambda cold starts | Sustained load services |
The cheapest solutions should always come first. Don't jump to provisioned concurrency without exhausting the free options.
Running Lambda at scale and want to understand your full serverless cost profile? The hidden costs often go beyond cold starts — idle execution environments, over-provisioned memory, and suboptimal timeout settings all add up. Start with a cost audit before reaching for any optimization levers.