Warmup solves 98% of AWS Lambda cold starts while being 100x cheaper than provisioned concurrency
Lambda cold starts are annoying, but they can be solved easily with warmup or provisioned concurrency. I've tried both and found that warmup is 100x cheaper (while still 98% effective) for my application.
TLDR:
Lambda cold starts for a simple NodeJS function are benchmarked at ~200ms.
However, the bigger the application, the slower the initial boot: with more AWS SDK and library imports, cold starts can be between ~400-600ms.
If you use APM tools as Lambda extensions (such as NewRelic, Datadog or AWS Distro for OpenTelemetry Lambda), cold starts can be between ~800ms-1.6s.
If your Lambda function is invoked frequently or is sensitive to cold starts, you can solve cold starts using warmup or provisioned concurrency.
I used AWS Lambda, EventBridge and AWS CDK to create a simple warmup function which invokes a specific Lambda function every 5 minutes at a concurrency of 10 (Github repo).
I monitored warmup efficiency using NewRelic custom tracing attributes and found that 98% of cold starts are warmup calls, which significantly reduces cold start impact in our application.
Warmup is able to keep concurrent executions at 10 most of the time, and warmup latency is <100ms for 10 concurrent invocations.
Warmup cost estimates are 100x cheaper than provisioned concurrency at the same level of concurrency.
I haven't been able to measure efficiency at high warmup concurrency (e.g. 100 or 500); however, I believe it is still much more cost-effective than provisioned concurrency.
If you still want to analyze your Lambda cold starts, you can use Datadog's built-in Serverless Tracing tool for profiling, artillery for load testing, and webpack-bundle-analyzer, source-map-explorer or the esbuild bundle size analyzer to analyze your bundle.
Lambda Cold Start & Concurrency
Before jumping into the details, let’s first be on the same page by understanding Lambda function scaling and concurrency behavior. One picture is worth 1000 words.
When your function receives a new request, one of two things can happen:
If a pre-initialized execution environment instance is available, Lambda uses it to process the request.
Otherwise, Lambda creates a new execution environment instance to process the request.
In this diagram, Lambda was able to serve 10 requests by creating 6 environment instances (A-F). So in the end, there are only 6 cold starts (Init phase).
There are no official hard numbers on how long a Lambda instance is kept "warm", but it is generally believed to stay warm for around 5-15 minutes, or even longer, after the first invocation.
Cold start benchmarks
AWS Lambda cold starts for a simple NodeJS function are benchmarked at ~200ms. However, the bigger the application, the slower the initial boot will be.
Even if you follow best practices when bundling the AWS SDK (import individual clients, include it in your bundle, or lazy-load some clients), cold starts can still be in the range of ~400-600ms.
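For illustration, with AWS SDK for JavaScript v3 those bundling practices typically look like the following sketch (the client choices here are hypothetical):

```typescript
// Import only the clients you actually use (SDK v3 is modular and
// tree-shakeable), rather than the whole monolithic aws-sdk v2 package.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

// Create clients outside the handler so they're reused across warm invocations.
const ddb = new DynamoDBClient({});

export const handler = async (event: unknown) => {
  // Lazy-load a rarely used client so it doesn't add to the init phase.
  const { S3Client } = await import('@aws-sdk/client-s3');
  const s3 = new S3Client({});
  // ... business logic using ddb and s3
  return { ok: true };
};
```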
Using additional APM tools such as NewRelic, Datadog or AWS Distro for OpenTelemetry Lambda (as Lambda extensions) might increase cold starts further to ~800ms-1.6s.
Trying to optimize cold starts can be very challenging, so if your Lambda is NOT sensitive to cold starts (i.e. they're not causing any problems), you might be better off leaving it as is.
So let's not worry too much about cold starts, and focus on building more with Lambda instead.
But if your Lambda function is invoked frequently or is sensitive to cold starts (such as my Lambda integrations behind a public API Gateway), we can still solve cold start problems using two simple methods: warmup and provisioned concurrency.
Solving Lambda Cold Starts
Warmup functions
If you use the Serverless Framework, there's a plugin called Serverless Warmup Plugin which allows you to specify the warmup concurrency and schedule for a specific Lambda function.
A simple implementation of a warmup function using AWS CDK can be found in this Github repository. It uses AWS Lambda and an EventBridge Scheduled Rule to implement the warmup logic, as described in the diagram below.
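A minimal sketch of that setup (assuming aws-cdk-lib v2 and AWS SDK for JavaScript v3; the file paths, function names and `prewarm` payload shape are illustrative, not the repository's exact code):

```typescript
// lib/warmup-stack.ts — an EventBridge rule fires the warmup function every 5 minutes.
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Rule, Schedule } from 'aws-cdk-lib/aws-events';
import { LambdaFunction } from 'aws-cdk-lib/aws-events-targets';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Construct } from 'constructs';

export class WarmupStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const warmup = new NodejsFunction(this, 'WarmupFunction', {
      entry: 'src/warmup.ts',
      environment: {
        TARGET_FUNCTION_NAME: 'my-main-function', // hypothetical target
        WARMUP_CONCURRENCY: '10',
      },
    });
    // Note: the warmup role also needs lambda:InvokeFunction on the target,
    // e.g. targetFunction.grantInvoke(warmup) if the target is in the same app.

    new Rule(this, 'WarmupSchedule', {
      schedule: Schedule.rate(Duration.minutes(5)),
      targets: [new LambdaFunction(warmup)],
    });
  }
}
```

```typescript
// src/warmup.ts — invoke the target N times in parallel with a `prewarm` flag.
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const lambda = new LambdaClient({});

export const handler = async () => {
  const concurrency = Number(process.env.WARMUP_CONCURRENCY ?? '10');
  // Synchronous (RequestResponse) invocations overlap, so each call should
  // land on a separate execution environment while the others are busy.
  await Promise.all(
    Array.from({ length: concurrency }, () =>
      lambda.send(
        new InvokeCommand({
          FunctionName: process.env.TARGET_FUNCTION_NAME!,
          InvocationType: 'RequestResponse',
          Payload: Buffer.from(JSON.stringify({ prewarm: true })),
        }),
      ),
    ),
  );
};
```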
It might also be necessary to invoke the warmup function after every Lambda deployment, to avoid a gap between the deployment and the next scheduled warmup event.
To invoke the warmup function at deployment time, you might need to use the AWS CLI in your CI/CD pipeline, since AWS CDK does not support asynchronous workflows.
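For example, a fire-and-forget invocation from the pipeline could look like this (the function name is a placeholder):

```
aws lambda invoke --function-name <warmup-function-name> --invocation-type Event /dev/null
```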
Provisioned concurrency
AWS provides Lambda provisioned concurrency, which can be configured for a Lambda function using aliases/versions. However, provisioned concurrency is priced for every GB-second it is configured (similar to DynamoDB provisioned capacity), which is very different from how a warmup function works.
For comparison, warmup only invokes the specified Lambda function for a few seconds every 5 minutes, and is thus only billed for the GB-seconds during which the warmup and Lambda invocations were active.
For a NodeJS Lambda with 512MB memory, 10 provisioned concurrency is priced at ~$69.63 per month, as shown in the screenshot above.
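As a rough sanity check (assuming us-east-1 x86 rates, where provisioned concurrency is billed at ~$0.0000041667 per GB-second), the provisioned-concurrency component alone works out to:

```
10 concurrency × 0.5 GB × ~2,628,000 s/month × $0.0000041667/GB-s ≈ $54.8/month
```

The remainder of the ~$69.63 estimate presumably comes from the duration actually executed, which is billed on top at a separate rate.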
Although AWS Application Auto Scaling supports scheduling Lambda provisioned concurrency for specific time frames, it is still priced per GB-second configured, and does not help frequently invoked Lambda functions, which have stable concurrent invocations almost all the time.
Estimating required concurrency
We can use CloudWatch's built-in ConcurrentExecutions metric to determine warmup/provisioned concurrency for a Lambda function. For reference, AWS also provides a guideline for accurately estimating required provisioned concurrency.
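The rule of thumb from that guideline boils down to Little's law; with hypothetical traffic numbers:

```
required concurrency ≈ requests per second × average duration in seconds
e.g. 50 req/s × 0.12 s/request ≈ 6 concurrent executions
```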
For my application, I use the maximum of the ConcurrentExecutions metric over 5-minute periods and see that it ranges from ~4.5-7, so I decided to set warmup concurrency to 10 for my main Lambda function.
Monitor Warmup Effectiveness
An interesting problem is how to measure the efficiency of our warmup solution. We need to be able to answer the following questions:
How many of our invocations are actual cold starts and not warmup calls?
Does warmup concurrency match actual Lambda concurrent invocations?
What is the latency of warmup calls when invoking Lambda concurrently?
Using NewRelic to monitor warmup efficiency
We use NewRelic custom tracing attributes to tag warmup calls and later use NRQL (NewRelic Query Language) to build dashboards monitoring warmup efficiency.
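Here's a minimal sketch of the tagging side, assuming the newrelic Node agent is available in the function (e.g. via the NewRelic Lambda layer) and that warmup calls carry a `prewarm` flag in the event:

```typescript
import newrelic from 'newrelic';

export const handler = async (event: { prewarm?: boolean }) => {
  if (event.prewarm) {
    // Tag the transaction so NRQL can separate warmup traffic from real traffic.
    newrelic.addCustomAttribute('prewarm', true);
    return { warmed: true }; // exit early to keep billed duration minimal
  }
  // ... actual business logic
  return { ok: true };
};
```

A query along the lines of `SELECT percentage(count(*), WHERE prewarm IS TRUE) FROM Transaction` can then chart the warmup share of traffic on a dashboard.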
We found that after deploying the warmup stack at 10 concurrency, 98% of cold starts are warmup calls, and actual cold starts drop to 2%.
With warmup scheduled at a 5-minute rate, 93% of warmup calls hit warm Lambda instances, and only 6.7% start a new Lambda instance.
Using CloudWatch to monitor concurrency
On CloudWatch, after the warmup deployment, we found the maximum concurrent executions metric per 5 minutes to be consistently around 10, which means we should have 10 active Lambda instances most of the time.
Measuring the exact number of active Lambda instances is difficult, so if you want a guaranteed number of active instances, you might want to use provisioned concurrency instead.
Using X-Ray tracing to profile warmup latency
Using X-Ray tracing on the warmup function, we found invocation latency at 10 concurrency to be <100ms (the time elapsed between the first and last invocations).
Although this is good enough for our use cases, note that our target function exits early when there's a `prewarm` flag in the Lambda event, and can finish in <100ms. So you might want to wait at least 100ms before exiting on warmup calls, to make sure every call lands on a separate instance. However, this might increase the chances of blocking actual calls to our Lambda function, so you should decide on the balance between latency and wait duration.
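In code, that trade-off is just an extra delay in the prewarm branch; a minimal sketch (the 100ms figure matches the measured warmup latency above):

```typescript
export const handler = async (event: { prewarm?: boolean }) => {
  if (event.prewarm) {
    // Hold this instance briefly so all concurrent warmup calls land on
    // distinct execution environments instead of reusing one that freed up.
    await new Promise((resolve) => setTimeout(resolve, 100));
    return { warmed: true };
  }
  // ... actual business logic
  return { ok: true };
};
```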
Cost comparison
Warmup costs
We found that most warmup costs come from invocations of the target Lambda function, which cost around $0.7 per month at 10 concurrency, and $6 per month at 100 concurrency.
We use 800ms for the estimation, but the actual billable duration might be a lot less in your use case, since we exit early when there's a `prewarm` flag in the Lambda event.
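As a rough breakdown of that $0.7 figure (assuming us-east-1 x86 rates: ~$0.0000166667 per GB-second of duration and $0.20 per million requests):

```
invocations: 10 per run × 12 runs/hour × 730 hours ≈ 87,600 per month
duration:    87,600 × 0.8 s × 0.5 GB ≈ 35,040 GB-s × $0.0000166667 ≈ $0.58
requests:    87,600 ÷ 1,000,000 × $0.20 ≈ $0.02
total:       ≈ $0.60/month, plus the warmup function's own (tiny) cost
```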
Provisioned concurrency costs
Provisioned concurrency costs are fixed and easier to estimate, at $69.6 per month for 10 and $696.3 per month for 100 concurrency, respectively.
We found that at both 10 and 100 concurrency, provisioned concurrency is ~100x more expensive than warmup at the same concurrency level.
Limitations
I haven't been able to measure efficiency at high warmup concurrency (e.g. 100 or 500). At that level, I think warmup call latencies can be >100ms, which could reduce the actual number of instances warmed (because some Lambda instances may finish early and become free to serve other warmup calls).
However, I believe that even with a lower efficiency rate at high concurrency levels, warmup functions are still much more cost-effective, and since warmup calls are only billed per execution, they fit the serverless model much better than provisioned concurrency.
Closing thoughts
If you've already followed best practices when bundling the AWS SDK, your application should be fine, and you shouldn't need to optimize cold start times further.
However, as a software developer, you’ll probably still want to optimize, benchmark and run performance profiling of your Lambda cold starts.
You can use tools such as artillery to load test your Lambda using function URLs and get lots of cold starts quickly (be careful to set very short durations to avoid unnecessary costs from invoking already-warm instances, since you only want cold starts).
You can profile your Lambda cold start with Datadog's built-in Serverless Tracing tool, which produces a nice flamegraph. Datadog offers a free 14-day trial, which should be more than enough for you to profile your cold start stack trace (after that, it's $5/function/month).
To analyze your esbuild/webpack bundle, use tools such as webpack-bundle-analyzer, source-map-explorer or esbuild bundle size analyzer.
I've tried it and had a very hard time trying to optimize my cold start time further, and learned a simple fact: "the bigger the application the slower the initial boot will be" (a very neat statement from the fastify serverless guide).
So I guess my closing note is: do nothing (until cold starts actually start causing problems). Otherwise, use a simple warmup function or provisioned concurrency, instead of trying to bring your Lambda cold starts back to ~200ms (which does not bring as much benefit as it sounds).