We cut our average API response time by 30% when changing from Cloud Functions to Cloud Run
Sorry about the click-baity title, but it’s not an overstatement. We ran the numbers.
At the inception of the Unloc API, “Make it Work” (from Kent Beck’s famous quote) was the name of the game. We wanted to get the basics up and running without too much fuss, and to keep it running with as little intervention as possible. So we chose to run it on Google Cloud Platform’s Cloud Functions. There are several reasons we believe this was a fantastic choice, but I won’t get into that just yet.
As our tech team grew and the company as a whole started to mature, we had a look at our codebase and thought: “This is becoming unwieldy.” This is when we started refactoring to Ports and Adapters (aka Hexagonal Architecture, which we highly recommend), and I guess this is also when we moved from “Make it Work” to “Make it Right”. This was, and still is, an interesting process, but I won’t get into that now either.
However, what I will get into is:
The problem with Cloud Functions
GCP Cloud Functions are great for a lot of things. You give your code to Google, provide a tiny amount of config, and they more or less promise that they can handle however many invocations you throw at them. Within reason, of course.
We still use Cloud Functions for a lot of stuff, but the problem with using Cloud Functions to serve an API is that each instance only processes a single request at a time, and starting a new instance takes a couple of seconds. Starting a new instance like this is known as a cold start. You can do warm-up calls to keep a handful of instances running, but that’s like adding another horse to the carriage when you should have used a car.
The cold starts were unproblematic during our early days as a startup. Today, businesses use Unloc to build their own web solutions, so 4-second spikes in response time really disrupt the user experience.
There are two reasons why it’s going to be slow, almost no matter what:
- If you use an API like ours, it’s very likely that you need data from more than one endpoint when the web page first loads. You don’t want your users to wait, so you fire all the requests in parallel. Since each instance handles only one request at a time, every request beyond the number of idle warm instances needs a fresh instance, so at least one of them is likely to trigger a cold start, adding 4-5 seconds to the load time. Ugh.
- As the user interacts with the website and fires off API calls, there is a non-negligible chance that one will hit a cold start. This makes the site suddenly feel unresponsive. After all, there is only so much you can do with spinners.
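To make the first point concrete, here is a deliberately simplified model. It assumes each instance serves exactly one request at a time (as described above for Cloud Functions), that a burst of parallel requests can only reuse instances that are already warm and idle, and that a cold start adds a fixed penalty. The names and numbers are illustrative, not Unloc code or measurements.

```typescript
// Simplified model: one request per instance, fixed cold-start penalty.
// All names here are illustrative assumptions.
interface BurstModel {
  parallelRequests: number; // requests fired at once on page load
  idleWarmInstances: number; // instances already running and free
  warmLatencyMs: number; // typical latency on a warm instance
  coldStartPenaltyMs: number; // extra time when a new instance must boot
}

// Each request beyond the idle warm pool needs a fresh instance.
export function coldStarts(m: BurstModel): number {
  return Math.max(0, m.parallelRequests - m.idleWarmInstances);
}

// With parallel loading the page waits for the slowest request,
// so a single cold start sets the load time for the whole page.
export function pageLoadMs(m: BurstModel): number {
  const penalty = coldStarts(m) > 0 ? m.coldStartPenaltyMs : 0;
  return m.warmLatencyMs + penalty;
}
```

For example, four parallel requests against two idle warm instances give two cold starts, and the whole page ends up waiting for one cold-start penalty on top of the normal latency.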
A slow API feels bad to develop against, but more importantly, if our API is slow then all products using our API will be slow.
Why Cloud Run?
Doing HTTP requests towards a Cloud Functions API is like getting hot dogs from a hot dog stall that has to open every time someone comes over to buy a hot dog. You might get to buy from an open stall, or maybe you have to wait for the guy with the funny mustache to unlock the cabinets, turn on the sausage heater and the bun toaster, shake the ketchup bottle, open the coleslaw, unfold the sign and chase away the pigeons. No one wants to wait for all that.
Our goal at Unloc is to serve the API equivalent of hot dogs super fast, at any time, every time. So we ask: why not just keep the stall open?
There is a Google Cloud product called Cloud Run. Cloud Run is like a hot dog stall that… Never mind, forget about the sausage metaphors. I guess most of you are developers anyway. Cloud Run is a fully managed platform for running highly scalable containerized applications, which sounds exactly like what we want.
Cloud Run and Cloud Functions are both similar and different. Cloud Run is like Cloud Functions in the sense that you give your code to Google, in this case as a pre-built container image instead of a ZIP file, and Google then makes sure it runs and scales.
Where it differs from Cloud Functions, and this is the important part, is that each instance can handle a whole bunch of requests at the same time, and you can set a minimum number of instances to keep running. This means that we always have a hot dog st.. Cloud Run instance open and ready to serve requests, and that more will be available on demand.
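The difference in scaling can be sketched with a rough rule of thumb: if each instance accepts up to `concurrency` requests at once, a burst needs about ceil(inFlight / concurrency) instances, and cold starts only happen when that exceeds what is already running. This is a simplification (Cloud Run’s autoscaler also takes other signals, such as CPU utilization, into account), and the function names are hypothetical. Setting concurrency to 1 approximates the one-request-per-instance behaviour described for Cloud Functions.

```typescript
// Rough scaling sketch, not Cloud Run's actual autoscaling algorithm.
export function instancesNeeded(inFlight: number, concurrency: number): number {
  if (concurrency < 1) throw new RangeError("concurrency must be >= 1");
  return Math.ceil(inFlight / concurrency);
}

// New instances (and thus cold starts) are only needed when demand exceeds
// the instances already running, which is never fewer than minInstances.
export function coldStartsFor(
  inFlight: number,
  concurrency: number,
  running: number,
  minInstances = 0,
): number {
  const alreadyUp = Math.max(running, minInstances);
  return Math.max(0, instancesNeeded(inFlight, concurrency) - alreadyUp);
}
```

With Cloud Run’s default of up to 80 concurrent requests per instance and one minimum instance, a burst of four requests needs no new instance at all, while the same burst with a concurrency of 1 and two warm instances needs two new ones.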
So how did we go about this transition?
The Unloc backend is written in TypeScript, and before the migration our endpoints were all served by Express apps deployed to Cloud Functions. As with most tasks of this scale, we did the migration in several steps.
Step 1: Run the APIs in containers
Cloud Run requires each service to be a separate container. We already had the Express apps configured, so all we had to do was write the most basic Dockerfile (huge shout-out to whoever wrote the excellent Dockerfile docs) that runs Node and starts the correct Express app.
We use Firebase to configure and deploy our functions, and there the Firebase SDK takes parameters like the trigger (HTTP) and the listener (the Express app) and handles serving for us. Cloud Run has no such wrapper, so for the Cloud Run services we had to add separate files that actually start the Express apps. You know, app.listen(port, () => ...) and so on. For flexibility, we start each Express app from a script in our package.json file, which the Dockerfile’s CMD calls.
During this process we used Docker Compose to test locally, Docker to build images, and the gcloud CLI to deploy to our development environment.
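A minimal sketch of such an entry file, assuming the Cloud Run convention that the container listens on the port given in the PORT environment variable (falling back to 8080 when it is unset). It uses Node’s built-in http module so it stays self-contained; an Express app is itself a (req, res) request handler, so passing it to createServer is equivalent to calling app.listen. The resolvePort and startServer names are hypothetical.

```typescript
import { createServer, type RequestListener, type Server } from "node:http";

// Cloud Run injects PORT; fall back to 8080 for local runs.
export function resolvePort(env: Record<string, string | undefined>): number {
  const raw = env.PORT;
  const port = raw === undefined || raw === "" ? 8080 : Number(raw);
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new Error(`Invalid PORT value: ${raw}`);
  }
  return port;
}

// Start any request handler; an Express app works here because express()
// returns a (req, res) handler function.
export function startServer(handler: RequestListener, port: number): Server {
  const server = createServer(handler);
  server.listen(port, () => {
    console.log(`Listening on port ${port}`);
  });
  return server;
}
```

In a service’s entry file this could be called as startServer(app, resolvePort(process.env)), where app is that service’s existing Express app.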
Step 2: Logging
We could have kept using our console.log statements and called it a day, but we’re not like that. No, we prefer our Yaks clean-shaven, preferably smelling of freshly cut Pine (another log-reference. I know, I’m quite clever).
We wanted to move from hard-to-search flat text logs to easier-to-search structured JSON logs (you can find a helpful guide on the difference here). Our first thought was “There has to be a library that does this”, and lo and behold, there was.
Long story short, we spent too much time on it. The library has way more features than we needed, and we have to use console.log to keep the executionId in the Cloud Functions logs anyway, so we dropped it.
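For reference, structured logging on Cloud Run doesn’t strictly need a library: a single-line JSON object written to stdout is picked up by Cloud Logging as a structured entry, with fields such as severity and message getting special treatment. The formatLogEntry and log helpers below are hypothetical names, a sketch rather than what we shipped.

```typescript
type Severity = "DEBUG" | "INFO" | "WARNING" | "ERROR";

// Build one JSON log line; extra fields end up as searchable payload fields.
export function formatLogEntry(
  severity: Severity,
  message: string,
  fields: Record<string, unknown> = {},
): string {
  // Spread fields first so they cannot overwrite severity or message.
  return JSON.stringify({ ...fields, severity, message });
}

export function log(severity: Severity, message: string, fields?: Record<string, unknown>): void {
  // One object per line on stdout, which Cloud Logging parses as structured.
  console.log(formatLogEntry(severity, message, fields));
}
```

A call like log("INFO", "Request handled", { durationMs: 42 }) then becomes an entry you can filter on severity and durationMs instead of grepping flat text.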
When moving from single-request to multi-request instances, we had to keep track of which request produced which log line. To do this, we used Node’s async hooks to store a trace ID, set by Express middleware at the start of each incoming request. The same mechanism also lets us log other useful data, like the client ID.
Very useful for debugging.
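A minimal sketch of that idea, using AsyncLocalStorage from Node’s async_hooks module (the post doesn’t say which async_hooks API was used). The middleware takes the trace ID from the X-Cloud-Trace-Context header that Google’s infrastructure adds, and otherwise generates one; the x-client-id header is a hypothetical stand-in for however the client is actually identified. Code running inside the request, including awaited calls, can then read the same context.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";
import type { IncomingMessage, ServerResponse } from "node:http";

interface RequestContext {
  traceId: string;
  clientId?: string;
}

const storage = new AsyncLocalStorage<RequestContext>();

// Node types header values as string | string[] | undefined.
function firstHeader(value: string | string[] | undefined): string | undefined {
  return Array.isArray(value) ? value[0] : value;
}

// Express-style middleware (req, res, next). Everything that runs inside
// next(), including async continuations, sees the same context.
export function traceMiddleware(req: IncomingMessage, _res: ServerResponse, next: () => void): void {
  // Header format is TRACE_ID/SPAN_ID;o=OPTIONS, so keep the part before "/".
  const cloudTrace = firstHeader(req.headers["x-cloud-trace-context"]);
  const traceId = cloudTrace?.split("/")[0] || randomUUID();
  const clientId = firstHeader(req.headers["x-client-id"]); // hypothetical header
  storage.run({ traceId, clientId }, next);
}

export function currentContext(): RequestContext | undefined {
  return storage.getStore();
}

// Log helper that tags every line with the current request's context.
export function logWithContext(message: string): void {
  const ctx = currentContext();
  console.log(JSON.stringify({ message, traceId: ctx?.traceId, clientId: ctx?.clientId }));
}
```

Registered with app.use(traceMiddleware) before the routes, every logWithContext call made while handling a request carries that request’s trace ID, even when many requests interleave on one instance.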
Step 3: Deploy
When the API was still in its early phase (the “Make it Work” phase mentioned earlier), we deployed from the command line using the Firebase CLI’s deploy command, which is fine when you don’t have many functions to deploy. When you do have a lot of functions, you quickly exceed the deployment quota, so we wrote a script to deploy them in batches.
The gist of it is that we first use Object.getOwnPropertyNames() to get all the exported members of our index files (make sure to only export functions), then run firebase deploy --non-interactive --force --only followed by a list of function names from the current batch.
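A sketch of that batching script, assuming the compiled index module is loaded with require and passed in as an object. A batch size of 10 is our placeholder, not a number from the post. The typeof filter is an extra safeguard: Object.getOwnPropertyNames also returns non-enumerable members, such as the __esModule marker TypeScript adds to CommonJS output. The firebase flags are the ones listed above, with each name prefixed by functions: as the --only filter expects; buildDeployBatches and deployAll are hypothetical names.

```typescript
import { execFileSync } from "node:child_process";

// Turn exported function names into batches of `firebase deploy` arguments.
export function buildDeployBatches(exported: object, batchSize = 10): string[][] {
  if (batchSize < 1) throw new RangeError("batchSize must be >= 1");
  const names = Object.getOwnPropertyNames(exported).filter(
    (name) => typeof (exported as Record<string, unknown>)[name] === "function",
  );
  const batches: string[][] = [];
  for (let i = 0; i < names.length; i += batchSize) {
    const only = names
      .slice(i, i + batchSize)
      .map((name) => `functions:${name}`)
      .join(",");
    batches.push(["deploy", "--non-interactive", "--force", "--only", only]);
  }
  return batches;
}

// Run the batches one after another; execFileSync throws if a deploy fails.
export function deployAll(exported: object, batchSize?: number): void {
  for (const args of buildDeployBatches(exported, batchSize)) {
    execFileSync("firebase", args, { stdio: "inherit" });
  }
}
```

Running the batches sequentially keeps each firebase deploy small, at the cost of a longer total deploy time.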
As our developer team grew, we moved from running scripts in the CLI to running pretty much the same scripts in GitHub Actions. We’d much rather press a button and forget about it than run a script locally and wait for it to complete.
Deploying the Cloud Run services consists of three steps:
- Get the path to the Dockerfile for each service
- Build an image from that Dockerfile
- Deploy the image
1. Since we use Docker Compose for local development, this step is already pretty much done. We just have to parse docker-compose.yml and get the paths from there.
2. This can be done directly on the GitHub Actions runner, using docker build. No more drama there.
3. This turned out to be the hardest step. Not particularly hard, just the hardest of the three. We use the gcloud CLI to deploy (specifically gcloud run deploy). This works like a charm when running from my local machine, since I’m authenticated as my own GCP user, but it required some fiddling to get working on GitHub Actions. We created a dedicated Service Account for deployment, gave it the required roles, and stored its key in GitHub Secrets.
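The three steps can be sketched as below. The sketch works on an already-parsed docker-compose.yml (parsing the YAML with a library is assumed and left out), where a service’s build entry is either a context path or an object with a context and an optional dockerfile relative to it. Cloud Run deploys from an image in a registry, so a push sits between build and deploy. The image naming scheme and helper names are hypothetical, and authentication is left out.

```typescript
// The parts of a parsed docker-compose.yml this sketch needs.
interface ComposeFile {
  services: Record<string, { build?: string | { context?: string; dockerfile?: string } }>;
}

interface BuildTarget {
  service: string;
  context: string;
  dockerfile: string; // relative to the context, as in Compose
}

// Step 1: find the build context and Dockerfile for each service.
export function buildTargets(compose: ComposeFile): BuildTarget[] {
  const targets: BuildTarget[] = [];
  for (const [service, def] of Object.entries(compose.services)) {
    if (def.build === undefined) continue; // e.g. services using a prebuilt image
    const context = typeof def.build === "string" ? def.build : def.build.context ?? ".";
    const dockerfile = typeof def.build === "string" ? "Dockerfile" : def.build.dockerfile ?? "Dockerfile";
    targets.push({ service, context, dockerfile });
  }
  return targets;
}

// Hypothetical image naming scheme: <registry>/<service>:<tag>.
export function imageName(registry: string, service: string, tag: string): string {
  return `${registry}/${service}:${tag}`;
}

// Steps 2 and 3 as argument lists for docker and gcloud.
export function commandsFor(t: BuildTarget, registry: string, tag: string, region: string): string[][] {
  const image = imageName(registry, t.service, tag);
  return [
    // -f is resolved from the working directory, so join it with the context.
    ["docker", "build", "-t", image, "-f", `${t.context}/${t.dockerfile}`, t.context],
    ["docker", "push", image], // Cloud Run pulls the image from a registry
    ["gcloud", "run", "deploy", t.service, "--image", image, "--region", region],
  ];
}
```

Each argument list can be run with execFileSync(cmd[0], cmd.slice(1)) from node:child_process, one service at a time.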
Now, onto the last step.
Step 4: Tinkering
No hot dog comes without a cost; the biggest cost of moving to Cloud Run is that we have to tune concurrency, the number of CPUs, and the amount of memory ourselves. To make sure these numbers were right, we decided to do some load testing.
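The knobs mentioned above correspond to gcloud run deploy flags: --concurrency, --cpu, --memory and, for the minimum number of running instances, --min-instances. A small typed helper like the hypothetical settingsToFlags below is one way to keep them in one place per service; it is a sketch, not our deploy code.

```typescript
interface RunServiceSettings {
  concurrency?: number; // max simultaneous requests per instance
  cpu?: number; // vCPUs per instance
  memory?: string; // gcloud size format, e.g. "512Mi" or "2Gi"
  minInstances?: number; // instances kept running to avoid cold starts
}

// Only emit flags that are set; anything omitted is left to gcloud and Cloud Run.
export function settingsToFlags(s: RunServiceSettings): string[] {
  const flags: string[] = [];
  if (s.concurrency !== undefined) flags.push(`--concurrency=${s.concurrency}`);
  if (s.cpu !== undefined) flags.push(`--cpu=${s.cpu}`);
  if (s.memory !== undefined) flags.push(`--memory=${s.memory}`);
  if (s.minInstances !== undefined) flags.push(`--min-instances=${s.minInstances}`);
  return flags;
}
```

The resulting flags would be appended to a gcloud run deploy command, so changing a service’s settings is a one-line change in code review.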
Ideally, we would have used JMeter or something similar, but we could not get it to run locally. So we wrote a small script to walk through most of our endpoints, with concurrent requests, ramp-up time, the works.
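Our script isn’t shown here, so this is a hedged sketch of one way to do it: a fixed number of concurrent workers that start one after another over a ramp-up period, each calling a request function repeatedly, with latencies summarized at the end. runLoadTest, summarize and the option names are hypothetical.

```typescript
import { performance } from "node:perf_hooks";

interface LoadTestOptions {
  concurrency: number; // number of parallel workers
  requestsPerWorker: number;
  rampUpMs: number; // worker start times are spread evenly over this period
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run the load test and return every request latency in milliseconds.
export async function runLoadTest(request: () => Promise<unknown>, opts: LoadTestOptions): Promise<number[]> {
  const latencies: number[] = [];
  const stagger = opts.concurrency > 1 ? opts.rampUpMs / (opts.concurrency - 1) : 0;
  const workers = Array.from({ length: opts.concurrency }, async (_, i) => {
    await sleep(i * stagger); // ramp-up: worker i starts later than worker i - 1
    for (let n = 0; n < opts.requestsPerWorker; n++) {
      const start = performance.now();
      await request();
      latencies.push(performance.now() - start);
    }
  });
  await Promise.all(workers); // rejects if any request throws
  return latencies;
}

// Mean and a nearest-rank percentile over the collected latencies.
export function summarize(latencies: number[], percentile = 95): { count: number; meanMs: number; pMs: number } {
  if (latencies.length === 0) throw new Error("no latencies recorded");
  const sorted = [...latencies].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, x) => sum + x, 0) / sorted.length;
  const rank = Math.ceil((percentile / 100) * sorted.length);
  return { count: sorted.length, meanMs: mean, pMs: sorted[Math.max(0, rank - 1)] };
}
```

In practice the request function would call fetch against an endpoint and read the body, for example () => fetch(url).then((r) => r.text()), since fetch resolves as soon as the headers arrive.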
We tested higher concurrency, lower concurrency, different numbers of CPUs, and so on. However, apart from raising memory from 512 MB to 2 GB, we basically ended up with the default settings for all the services, plus a note to increase the minimum number of active Cloud Run instances as our traffic grows. With the load testing and tweaking done, we set out to test the performance of both setups side by side. Was it going to be worth it?
Yes it was.
We did thousands of test runs with all kinds of different variables, and over this large set of data we saw that the average response time was about 30% lower for Cloud Run. 31.62%, to be exact. Great Success!
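For clarity, “30% lower” here means the relative reduction in mean response time: (mean on Cloud Functions minus mean on Cloud Run) divided by the mean on Cloud Functions. The trivial helpers below, with hypothetical names, make that definition explicit. Note that averaging per-run percentages would generally give a different figure than comparing overall means, and this sketch only shows the latter.

```typescript
// Arithmetic mean of a non-empty sample.
export function mean(xs: number[]): number {
  if (xs.length === 0) throw new Error("empty sample");
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Relative reduction of the mean, in percent: (before - after) / before * 100.
export function percentReduction(beforeMs: number[], afterMs: number[]): number {
  const before = mean(beforeMs);
  return ((before - mean(afterMs)) / before) * 100;
}
```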
Below is data from a run that really shows the difference:
These are the average numbers (the y-axis is response time in ms) from a handful of invocations, all sent in parallel. You can probably imagine how this difference would feel to a user.
There are of course a lot more details to this process than written here, but this blog post is already a lot longer than I planned, so I’ll cut it here.