Hello everyone! I am writing a brief article on my journey optimising our infrastructure.
A little bit about the overall architecture:
So, we have some data in an index and we need to serve it up on a silver platter, with a side of lightning-fast search capabilities. Ideally, we'd have a public API backed by a super-charged Lambda to do the heavy lifting. However, we don't live in an ideal world. We have to deal with pesky issues like authorization, making sure that only the worthy and righteous can access our API Gateway. And to make matters worse, our OpenSearch instance is in a different account. So, we can't just go for the simple solution and call it a day. No, no, no, we must embrace the challenges and overcome them.
The current issue with this architecture:
There are two Lambdas and two API Gateways that need to talk to each other before and after connecting with OpenSearch. When we make a request to the API Gateway, it goes to one Lambda, then to another Lambda, and finally to OpenSearch before coming all the way back - it's like a game of telephone, but with more frustration and fewer giggles. And as much as we love spending our days twiddling our thumbs, we'd much rather find a way to streamline this process. So, let's dive into some solutions to speed things up.
Before we dive into the nitty-gritty details of lambdas, let's talk about cold starts. Most of the lag time associated with a Lambda can be traced back to this chilly phenomenon. Essentially, when a request comes through to the lambda, it needs to go through a few steps:
1. Download the code
2. Start a new execution environment
3. Execute initialisation code
4. Execute handler code
If the lambda hasn't run in a while, it'll have to do all four of these steps again. But if you hit it up right after it finishes, it'll be ready to rock and roll with just step 4. Unfortunately, after an indeterminate amount of time, the lambda will go back into hibernation and start the whole process over again.
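The split between the initialisation steps and the handler step can be sketched in code. This is a minimal Python sketch, not our actual service - the config dict and the timing field are purely illustrative:

```python
import time

# Module scope == "initialisation code" (step 3). This runs once per
# execution environment, i.e. only when the Lambda cold-starts.
INIT_STARTED_AT = time.time()
EXPENSIVE_CONFIG = {"loaded": True}   # placeholder for real setup work


def handler(event, context):
    # Handler scope (step 4). This runs on *every* invocation.
    # On a warm invocation, only this function executes - the module-scope
    # work above was already paid for during the cold start.
    return {
        "config": EXPENSIVE_CONFIG,
        "environment_age_s": time.time() - INIT_STARTED_AT,
    }
```

Anything defined at module scope survives between warm invocations, which is exactly the property the optimisations below exploit.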
Provisioned Concurrency:
It's important to understand the concept of concurrency in this scenario: a Lambda's concurrency is the number of requests it can serve at the same time. Provisioned concurrency keeps a specified number of execution environments initialised and ready to respond immediately, while reserved concurrency is simply a cap on the maximum number of concurrent executions the function is allowed to scale to.
The approach to optimizing Lambda performance is simple: set the provisioned concurrency to a value of at least 1 so that the Lambda is always warm and can execute the handler code immediately, skipping initialization steps 1-3. In my case, a provisioned concurrency of 1 worked well since the Lambda wasn't receiving high traffic yet. However, there are some drawbacks to this approach. Firstly, some architects consider this an anti-pattern, as Lambdas are designed to run on demand rather than being kept warm. Secondly, there's the cost factor: you pay for the provisioned environments whether or not they're serving traffic.
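Configuring this boils down to one API call against a published version or alias. A minimal sketch - the function name and alias below are hypothetical, and the helper only builds the parameters you would pass to boto3's `put_provisioned_concurrency_config`:

```python
def provisioned_concurrency_request(function_name, qualifier, executions=1):
    """Build the parameters for Lambda's PutProvisionedConcurrencyConfig call.

    `qualifier` must be a published version or alias name - provisioned
    concurrency cannot be attached to $LATEST.
    """
    if executions < 1:
        raise ValueError("provisioned concurrency must be at least 1")
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,
        "ProvisionedConcurrentExecutions": executions,
    }

# In a deployment script you would then call (requires AWS credentials):
#   import boto3
#   boto3.client("lambda").put_provisioned_concurrency_config(
#       **provisioned_concurrency_request("search-proxy", "live", 1))
```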
Memory Utilisation:
Ah, memory utilization - the elusive metric that's harder to find than a needle in a haystack. By default, lambdas operate at 128MB, but they can be juiced up to 10,240 MB. This can help to reduce the lambda's latency, but unlike ECS, there are no defined metrics to give you insight into how much memory your lambda is actually using. It's like trying to figure out how much ice cream you ate without a scale or any remaining evidence. But fear not, Cloudwatch Insights has got you covered with a handy dandy query to retrieve the memory usage metrics:
filter @type = "REPORT"
| stats max(@memorySize / 1000 / 1000) as provisionedMemoryMB,
    min(@maxMemoryUsed / 1000 / 1000) as smallestMemoryRequestMB,
    avg(@maxMemoryUsed / 1000 / 1000) as avgMemoryUsedMB,
    max(@maxMemoryUsed / 1000 / 1000) as maxMemoryUsedMB,
    provisionedMemoryMB - maxMemoryUsedMB as overProvisionedMB
With this query, you can see that our Lambda was using about 40MB on average, despite being provisioned with 128MB. Since Lambda allocates CPU in proportion to memory, I gave it a bump to 256MB anyway and was pleasantly surprised with a 1.5x performance boost. Of course, this comes at a cost - literally. The price per millisecond of your Lambda will basically double, but since it's finishing faster, the billed duration shrinks too, so the bill grows far less than you'd think.
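It's worth a quick back-of-the-envelope check, since Lambda bills compute in GB-seconds (memory x duration). A sketch, using an illustrative per-GB-second price and made-up durations consistent with the 1.5x speedup above:

```python
def lambda_compute_cost(memory_mb, duration_ms, price_per_gb_s=0.0000166667):
    """Approximate per-invocation compute cost (request fees excluded).

    The default price is illustrative - check your region's current pricing.
    """
    return (memory_mb / 1024) * (duration_ms / 1000) * price_per_gb_s


before = lambda_compute_cost(128, 300)   # 128 MB, hypothetical 300 ms
after = lambda_compute_cost(256, 200)    # 256 MB, 1.5x faster
# Memory doubled but duration dropped, so cost grows by 2/1.5 ≈ 1.33x,
# not 2x - and the latency win may well justify that.
```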
Code Optimisations:
Now comes the part where we throw away the rulebook and dive into the specifics. This is where general guidance just won't cut it: it's all about understanding which actions must be performed within the handler and which can happen before it. If you've configured provisioned concurrency, your Lambda is already warmed up and ready to execute everything within the handler. However, in our code, we found that some actions were repeated on every invocation when they could have been done once during initialization. Examples of such actions include:
Initialising variables
Establishing connections
Loading modules
In our case, we were retrieving secrets from Secrets Manager and establishing HTTP client connections inside the handler on every invocation. By moving these actions outside the handler and making them part of the initialization code, we were able to achieve even better Lambda latency.
Of course, with every solution come potential drawbacks. If a Lambda caches a credential at initialization and the underlying credential is rotated before a later invocation, requests can start failing. That's why it's best practice to reload credentials at regular intervals, or to include retries within the handler that re-fetch the credential on an auth failure. So, when it comes to improving Lambda latency, it's all about finding the right balance between the handler and the initialization code.
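One way to implement the interval-based reload is to attach a time-to-live to anything cached at initialisation, so a long-lived warm environment periodically re-reads the credential. A minimal sketch - the 15-minute TTL is an arbitrary choice, and the clock is injectable only to make the behaviour testable:

```python
import time


class TtlCache:
    """Cache a loaded value and transparently refresh it after `ttl_s` seconds."""

    def __init__(self, loader, ttl_s=900, clock=time.monotonic):
        self._loader = loader      # e.g. a function that calls Secrets Manager
        self._ttl_s = ttl_s
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl_s:
            self._value = self._loader()   # refresh on first use or expiry
            self._loaded_at = now
        return self._value
```

The handler then calls `cache.get()` on every invocation and only pays the reload cost when the TTL has lapsed; pairing this with an auth-failure retry covers the window between rotation and refresh.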
Other things to note:
1. When provisioned concurrency is configured, it must target a published version or an alias of your Lambda - it cannot be attached to $LATEST.
2. To confirm the API Gateway is hitting the provisioned instances, make sure $LATEST does not appear in the Lambda's logs.
3. AWS suggests that, in a production-like environment, less than 1% of invocations typically experience cold starts.
4. Even with provisioned concurrency, Lambda recycles the execution environment every once in a while. We don't know exactly when, but in my case this was happening roughly every 2 hours and 10 minutes.
These are just a few of the million (okay, maybe not a million, but a lot) ways to optimize your Lambdas. So go forth and make your serverless dreams come true!