Lessons I learnt while building Slack apps

A few weeks back, I built a Slack bot for the team that gives sassy replies when someone says “/hi”. The idea came from a colleague, who told me about an email bot that, when cc’ed, randomly updates the mail thread with funny things his manager usually says. I took this idea and extended it to Slack bots, where everybody gets roasted.

It also became a guessing game: every time the bot replied, people tried to guess who could have said that. We also hid a few easter eggs. For example, you can say /hi @username and the bot responds with one of the random texts we coded for that user alone. There was also /hi standup, which says stuff people generally say in standups, and many more.

A few examples of what the replies look like

We wanted to launch this as soon as possible, so I wrote a simple Spring Boot application and used ngrok to expose it at a public URL, which in turn is configured for this bot in the Slack app. Here is the tutorial which explains how to do that.

The launch was a huge success! I thought it would fade away, but people started suggesting new replies to be mapped to themselves or other coworkers. I figured it was not going away anytime soon. I had to announce downtime whenever I was commuting and fire up the app as soon as I got home so that they could continue to play with the bot. Also, the ngrok session expires every 8 hours or so, and there was a limit on the maximum number of hits per minute.

It was time to move this app to production. Before moving it to AWS, I added Slack authentication (steps explained here). Since this app is only intended for our team, I added authorization too, to make sure the app can only be called from certain channels.

Since I was reading a lot about serverless architecture at that time, I decided to play around with AWS Lambda. In spite of all the warnings about using Java, I did exactly that to understand the pain points myself.

There are tens of blogs that detail how to build a Slack bot using API Gateway and Lambda, so I won’t go into the step-by-step details of what you need to do in the AWS console to build the app. Rather, the rest of the post is about my thought process whenever I got stuck while building this app.

High level architecture

The Slack app is configured to hit an API Gateway URL for the command /hi. The Slack app sends a POST request to this URL.

API Gateway is configured to invoke bot-lambda when the call is placed.

bot-lambda does the following:

  • Authenticates every call. It loads the Slack app’s signing secret from AWS Secrets Manager. You can also use AWS Systems Manager Parameter Store or simply put these secrets in environment variables of the Lambda itself, but make sure it is all encrypted.
  • The replies are mapped to each team member in a JSON file stored in S3. After successful authorization, it sends a random response loaded from this file.
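
The reply lookup described above might look roughly like the following. This is a minimal sketch, not the bot's actual code: the class name ReplyPicker, the shape of the parsed JSON (a map from a user or keyword to a list of replies), and the fallback to a default list are all assumptions made for illustration. Downloading and parsing the S3 file is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper: picks a random reply for a target such as "@alice" or "standup".
final class ReplyPicker {
    private final Map<String, List<String>> repliesByTarget; // assumed shape of the parsed JSON file
    private final List<String> defaultReplies;               // used when the target has no entry
    private final Random random;

    ReplyPicker(Map<String, List<String>> repliesByTarget, List<String> defaultReplies, Random random) {
        if (defaultReplies.isEmpty()) {
            throw new IllegalArgumentException("defaultReplies must not be empty");
        }
        this.repliesByTarget = repliesByTarget;
        this.defaultReplies = defaultReplies;
        this.random = random;
    }

    // target is the text after "/hi"; null means a plain "/hi".
    String pick(String target) {
        List<String> pool = (target == null) ? null : repliesByTarget.get(target);
        if (pool == null || pool.isEmpty()) {
            pool = defaultReplies; // unknown target or empty list: fall back
        }
        return pool.get(random.nextInt(pool.size()));
    }
}
```

Keeping the map in memory after the first load means a warm invocation only pays for a map lookup and a random index.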

More about AWS Lambdas

The most important thing about a Slack bot is that it expects a response from the server within 3 seconds. If not, it displays “operation timeout”.

There are two execution styles in AWS Lambda:

  • Request-response: Lambda executes synchronously and returns a response to the caller, which is usually API Gateway.
  • Event: Lambda is triggered by an event. Say an object getting updated in S3 triggers the Lambda, which in turn processes this file and sends an SNS notification, etc.

The Slack bot integration is of type request-response, i.e. Slack expects to see some result within 3 seconds (else it says “operation timeout”).

How does AWS Lambda serve a request?

It needs an execution environment. It spins up a container, initializes your code, and executes it for the event passed.

What is coldstart?

This instantiation of the container in which our code runs, along with some initialization of our code, is called a cold start. Once a container is instantiated, it can handle events without undergoing that same instantiation and initialization process. These warm invocations of the Lambda function are much faster.

This cold start time is particularly high for Java because the classes have to be loaded and interpreted by the JVM. Now that AWS Lambda supports custom runtimes, there is a lot of potential to reduce this cold start delay by moving to GraalVM, which is where my next focus will be.

When does coldstart occur?

  • When there is no active container to serve the request. One such scenario is the first request, and another is when the Lambda hasn’t been invoked in a while.
  • When Lambda is scaling out, i.e. there are too many concurrent requests and it needs to handle them by creating more instances.
  • When the Lambda configuration changes.

How bad is this coldstart?

bot-lambda takes less than 100 milliseconds (0.1 second) to execute; note that this is without any dependency injection frameworks like Spring/Dagger. At times the number is much less than 100 ms, but that does not matter since AWS Lambda bills for the duration of each call rounded up to the nearest millisecond. Note that duration is not the only factor in billing; it also depends on the memory you configure. More on that later.

With a cold start, it takes close to 12–13 seconds (mainly because we are accessing other AWS services, like S3); otherwise, less than 100 ms! So a cold start, when it occurs, is really bad.

Also, cold start is worse for Lambdas with VPC enabled because it adds an extra 10 seconds: VPC access requires Lambda to create ENIs (elastic network interfaces) in the target VPC. This has since been improved using AWS Hyperplane, as mentioned here.

What have I done to reduce this coldstart?

  • I migrated to AWS SDK v2, which supposedly takes less time to connect to other AWS services. I noticed that it improved only by 2 seconds.
  • Cold start delay is directly tied to the memory you configure for the Lambda. With a greater memory configuration, the time taken to instantiate the Lambda reduces. But 3 seconds is very hard to achieve even when over-provisioning memory. Plus, memory also plays a part in billing. Why would I allocate 2 GB for a Lambda that does not even use 150 MB? Notice how smart Lambda billing is: you cannot keep increasing the memory to get a shorter execution time, and you cannot keep memory at the bare minimum either, since that would increase the execution time. There are tools on the market that help you tune that perfect balance.
  • I wanted to employ multithreading, thinking that I could run the S3 and Secrets Manager calls in parallel, but that turned out to be irrelevant since only the first AWS service call (regardless of whether it was S3 or Secrets Manager) took time to set up, i.e. there is no workload to parallelize here. Also, the memory allocated to a Lambda is directly related to its CPU allocation: to unlock 2 vCPU cores you need to allocate > 1.8 GB. And once you return a response, the process is suspended, so there is no way to return from the Lambda function with some custom reply and then continue with initialization.
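
To make the billing trade-off concrete: Lambda charges for compute in GB-seconds, that is, configured memory multiplied by billed duration. The sketch below only does that arithmetic. The class and method names are made up for illustration, the price per GB-second is left as a parameter because it depends on region and architecture, and the separate per-request charge is ignored.

```java
// Hypothetical helper for reasoning about the memory/duration trade-off.
final class LambdaCost {
    // GB-seconds consumed by one invocation: configured memory (GB) x billed duration (s).
    static double gbSeconds(int memoryMb, long billedDurationMs) {
        return (memoryMb / 1024.0) * (billedDurationMs / 1000.0);
    }

    // Compute cost for a number of identical invocations at an assumed price per GB-second.
    static double computeCost(int memoryMb, long billedDurationMs, long invocations, double pricePerGbSecond) {
        return gbSeconds(memoryMb, billedDurationMs) * invocations * pricePerGbSecond;
    }
}
```

For example, a 100 ms invocation at 2048 MB consumes 0.2 GB-seconds, eight times the 0.025 GB-seconds of the same 100 ms at 256 MB, so extra memory only pays for itself if it shortens the duration proportionally.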

So the next question we ask is: can we live with this?

How often does this worst-case scenario happen to us?

There is a hypothesis that a non-VPC Lambda is kept around for 5 minutes and a VPC Lambda for 15 minutes. These numbers are NOT from official docs but from people who tested Lambda functions with various memory settings to determine what they could be. These hypotheses are the basis for lambda-warmer plugins, which keep sending pings to the Lambda function to keep it warm so that cold starts do not happen too often. These plugins can also place concurrent requests to keep a certain number of instances warm. I bet these plugins are used by functions that are expected to execute synchronously, and that these methods are often practiced to limit the delay observed by the end user. It is certainly an anti-pattern in my eyes to use a warmer to keep Lambda instances alive and active when these Lambdas are naturally expected to be transient. My 2 cents: as long as it is not from official docs, do not bet on these hypotheses, because AWS can change anything behind the scenes.

AWS Lambda itself now supports provisioned concurrency, a promise to keep a certain number of instances active so that Lambda can handle a sudden burst of traffic. For example, if you know your business expects 10 times more traffic during a certain time of the day, you can configure Lambda to keep 10 times more instances ready to serve during that peak time.

Our bot is just a play tool. It does not deserve provisioned concurrency. Plus, the usage is random: it is down to a group of ~12 people and whether they are in the mood to say hello to a bot for fun! I mean, we are not even fun-loving people 😜 So our worst-case scenario is one user calling the bot and seeing an operation timeout and, an hour or even a day later, another one calling the bot and seeing the same operation timeout. Eventually people are just gonna say, “Oh yeah, that bot? It never works.”

My problem with this setup is that operation timeout. It does make the bot look like it never works. So we need some custom message that says 'Yes, I received the request. I am very much in working condition. Just give me a couple more seconds to respond to you.'

In technical terms, “send a custom response to the slack bot when it is a cold start”. There are various ways to do that:

  1. The Slack request comes with a response_url, which you can use to respond to users. As per the doc, these responses can be sent up to 5 times within 30 minutes of receiving the payload. It is best practice to load the S3 configuration once, when the Lambda is initialized, rather than on every invocation. You can use a static variable to record whether you have already initialized this config, and thereby determine whether it is a cold start or not. If it is a cold start, you can first send an ack to this response_url and then proceed with loading the config from S3, etc. Mind that it could take more than 3 seconds for control to even get there.
  2. So, what if it takes more than 3 seconds for the container to boot up and load the Java code? This is where we can use the integration timeout in API Gateway. I expect bot-lambda to execute within 100 milliseconds except in cold start cases, so I have set the integration timeout in API Gateway to 1.5 seconds, i.e. if the Lambda does not return a result within 1.5 seconds, an Integration Timeout is returned. In the gateway responses, I map this integration timeout to a 200 status with a “Bot is waking up :sleepy:” Slack response. As a user, if I see this message instead of operation timeout, I would think: maybe I will get a response once it wakes up, let me try again.
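
The first option can be sketched as follows. This is an illustration under assumptions, not the bot's real handler: ColdStartAck, handle, and the injected ack and loader callbacks are hypothetical names, and postAck assumes a Java 11+ runtime for java.net.http. The static field plays the role of the static variable mentioned above: it is null only on the first invocation in a fresh container.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class ColdStartAck {
    // Static state survives warm invocations in the same container; null means cold start.
    private static String cachedConfig;

    // ack: called with responseUrl only on a cold start.
    // loader: the slow config load (the S3 read in this post).
    static synchronized String handle(String responseUrl, Consumer<String> ack, Supplier<String> loader) {
        if (cachedConfig == null) {
            ack.accept(responseUrl);     // tell the user we are alive before the slow part
            cachedConfig = loader.get(); // load once, reuse on later invocations
        }
        return cachedConfig;
    }

    // One possible ack: POST a JSON message to the response_url from the Slack request.
    static void postAck(String responseUrl) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(responseUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"text\":\"Bot is waking up :sleepy:\"}"))
                .build();
        try {
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }
}
```

A handler would call something like ColdStartAck.handle(responseUrl, ColdStartAck::postAck, loadRepliesFromS3), where loadRepliesFromS3 stands in for whatever actually reads the file. Passing the ack and the loader in as callbacks is only a choice that keeps the cold start check easy to exercise without network access.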

Still ugly but better than an operation timeout

Set Integration request timeout in API gateway
Map 200 status code for Integration Timeout — since Slack app only expects 200 status within 3 seconds
Also, Map a response to the Integration timeout statuses
Bot response when there is cold start

Also, instead of using Lambda proxy integration for the integration request, I defined the mapping myself just to see how it is mapped to the Lambda event. Note that Slack authentication needs the raw data (not the decoded request body) and a few headers that are passed by the Slack app.

##  See http://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-mapping-template-reference.html
#set($body = $input.json('$'))
#set($decodedBody = $util.urlDecode($body))
#set($headerParams = $input.params().get("header"))
{
"slack-data" : $decodedBody,
"raw-slack-data" : $body,
"headers" : {
#foreach($headerName in $headerParams.keySet())
"$headerName" : "$util.escapeJavaScript($headerParams.get($headerName))"
#if($foreach.hasNext),#end
#end
}
}
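
The raw body and headers matter because Slack signs each request: the X-Slack-Signature header carries v0= followed by the hex HMAC-SHA256 of v0:<timestamp>:<raw body>, keyed with the app's signing secret, and X-Slack-Request-Timestamp carries the timestamp. A minimal Java sketch of that check follows. The class and method names are hypothetical, and the five-minute replay window is the value Slack's documentation suggests rather than something taken from this bot.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SlackSignatureVerifier {
    private static final long MAX_SKEW_SECONDS = 300; // assumed replay window: 5 minutes

    // Computes "v0=" + hex(HMAC-SHA256(secret, "v0:" + timestamp + ":" + rawBody)).
    static String sign(String signingSecret, String timestamp, String rawBody) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(signingSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] digest = mac.doFinal(("v0:" + timestamp + ":" + rawBody).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder("v0=");
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    // Returns true only if the timestamp is fresh and the signature matches the raw body.
    static boolean isValid(String signingSecret, String timestamp, String rawBody,
                           String signatureHeader, long nowEpochSeconds) {
        if (timestamp == null || rawBody == null || signatureHeader == null) {
            return false;
        }
        long ts;
        try {
            ts = Long.parseLong(timestamp);
        } catch (NumberFormatException e) {
            return false;
        }
        if (Math.abs(nowEpochSeconds - ts) > MAX_SKEW_SECONDS) {
            return false; // too old or too far in the future: possible replay
        }
        byte[] expected = sign(signingSecret, timestamp, rawBody).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison to avoid leaking how many characters matched.
        return MessageDigest.isEqual(expected, signatureHeader.getBytes(StandardCharsets.UTF_8));
    }
}
```

The HMAC has to be computed over the bytes exactly as Slack sent them, which is why the mapping template passes the undecoded body alongside the decoded one.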

If your AWS account uses Lambda for some critical applications, make sure you set the reserved concurrency for bot-lambda (mine is set to 5) so that a sudden surge in the number of users trying to access the Slack bot does not cause any issues for mission-critical Lambdas in your account.

Closing words

I would not have learnt this much about Lambdas if I had just implemented it in Python 🤷‍♀️ I was also intrigued by how serverless architecture is practiced in the industry and found that a lot of what I learnt by practice matched the principles as well. I will add the books and blogs I read about serverless architecture in the references, and possibly blog more about the common usage patterns in the future. But to close, here is an excellent definition of serverless that I found here:

An architectural approach to software solutions that relies on small independent functions running on transient servers in an elastic runtime environment.

Serverless is based on small independent functions — referred to as Functions as a Service, or FaaS. Writing small bits of working code is essential to a successful serverless solution.

Second, these functions run on transient servers. That means the actual machines hosting the code are temporary instances. Unlike on-premises implementations where you buy and nurture your own physical hardware in your headend, serverless solutions rely on virtual machines that run in the cloud.

And that leads to the third element — an elastic runtime environment. These transient virtual machines are spun up when traffic grows and spun down when traffic diminishes. You only have the number of instances running that are needed to handle the current work. This elasticity means better targeting of resources, better tracking of operating costs, and an improved focus on the health of your overall solution.

Lastly, do read this comprehensive article on martinfowler too. You wouldn’t have to read another blog/watch another serverless conference video if you can read that one big article 😁

Edit

Updated on Dec 2020 — AWS Lambda now supports 1 millisecond billing granularity 👏. More details here
