Dynamic meta tags for bots and crawlers using Firebase and Cloudflare Workers


There are lots of ways to serve dynamic meta tags to bots and crawlers; most involve managing your own servers, whether that's SSR, pre-rendering, or going down the route of static site generation. But what if you don't want to manage a server or lock yourself into an SSR framework? Is there another option?

This is the position I found myself in recently when working on a project using Firebase. We've got our client-side rendered app on Firebase Hosting. It's great: fast and reliable. The problem is that when users share pages from the site, crawlers can't scrape the meta tags. So I needed to find a way to serve those meta tags, and ideally without resorting to SSR or managing a server.

Pre-rendering would be a good solution: route incoming requests based on the user agent, so users get the site from the CDN and bots get routed to a pre-render service. At first a Cloud Function or Lambda seems like a good way to approach this: check the user agent of each incoming request and route it accordingly. But there is a trade-off with serverless, and in this case it's cold starts. If you're new to serverless, a cold start happens when the platform spins down your code while it's not in use; if your workload is inconsistent, this is going to happen. The problem arises when a new request comes in and the platform needs to load and initialise your code again, which is slow. A cold start could have users waiting several seconds (5 or more) before the server is ready to handle the incoming request, which is unacceptable for this use case.

AWS offers something called “Provisioned Concurrency” for Lambda functions to mitigate cold starts. You can essentially pay to keep a number of functions “warm” 24/7, but to me that defeats the entire point of serverless, right? The benefits of paying only for what you use and scaling instantly to match demand are gone.

Anything with a cold start is out of the question because it is too slow. Enter Cloudflare Workers. Cloudflare Workers are different to the GCP and AWS serverless offerings, but they are also designed for a different purpose. They're not running Node or spinning up a VM, which allows Cloudflare to advertise a 0ms cold start time while being deployed to 155 locations on the Cloudflare CDN. This addresses both sticking points I have with using Cloud Functions/Lambda for a purpose like routing incoming requests.

Workers do come with some limitations; for example, on the free tier you only get 10ms of CPU time per invocation. But one could easily use a Worker as a reverse proxy that checks how the user agent identifies itself and serves the correct assets based on that. So let's do it.

Rather than adding our domain to Firebase Hosting, I've added it to Cloudflare. Cloudflare handles my DNS, and requests are routed through my Worker, which acts as a reverse proxy.
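The Worker itself can be quite small. Here is a minimal sketch of the reverse proxy; the `PRERENDER_URL` placeholder and the shortened bot list are assumptions for illustration (the real list of user agents is much longer), and it assumes your pre-render function accepts the requested path appended to its URL.

```javascript
// worker.js — reverse proxy sketch. PRERENDER_URL is a placeholder for
// your own pre-render endpoint, not a real URL.
const PRERENDER_URL = 'https://your-prerender-function.example.com';

// A small sample of crawler user agents; prerender-node checks many more.
const BOT_AGENTS = [
  'googlebot',
  'bingbot',
  'twitterbot',
  'facebookexternalhit',
  'linkedinbot',
  'slackbot',
];

// Does this user agent identify itself as a bot or crawler?
function isBot(userAgent) {
  const ua = (userAgent || '').toLowerCase();
  return BOT_AGENTS.some((bot) => ua.includes(bot));
}

async function handleRequest(request) {
  const url = new URL(request.url);
  if (isBot(request.headers.get('User-Agent'))) {
    // Bots get pre-rendered HTML for the requested path.
    return fetch(`${PRERENDER_URL}${url.pathname}`);
  }
  // Everyone else is proxied straight through to the origin (Firebase Hosting).
  return fetch(request);
}

// Register the handler when running inside the Workers runtime.
if (typeof addEventListener === 'function') {
  addEventListener('fetch', (event) => {
    event.respondWith(handleRequest(event.request));
  });
}
```

Regular users never leave the fast path: the Worker simply forwards their request to the origin untouched.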

The array of user agent strings and the conditional are from the prerender-node package, which is Express middleware that checks whether a user agent is a bot so it can route incoming requests to a pre-render service.

This works very well: incoming requests from users are getting served very quickly from the Cloudflare/Google CDNs.

The second part is to set up a pre-renderer. We could send bots to a service like prerender.io, but Cloud Functions support Puppeteer out of the box, and it's not the end of the world if a bot has to wait for a cold start, as long as it doesn't time out. Puppeteer can pre-render our page and return the HTML string.

The function opens a new tab in headless Chrome, makes a request to Firebase Hosting, renders out the resulting HTML and closes the tab. To be a bit smarter, I'm caching every request in Firestore so I can speed up responses when something has already been pre-rendered. My cache duration is currently set at 1 day; you could also programmatically flush routes from the cache when the content updates, whatever fits your use case.
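The caching layer can be a thin wrapper around the renderer. This is a sketch under a few assumptions: a Firestore instance is passed in (initialised elsewhere via firebase-admin), and the `prerender-cache` collection name is an invented placeholder. The one-day TTL matches the cache duration mentioned above.

```javascript
// Firestore cache sketch for pre-rendered pages.
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 1 day, as used in this article

// Is a cached entry still within the TTL?
function isFresh(cachedAtMs, nowMs = Date.now()) {
  return nowMs - cachedAtMs < CACHE_TTL_MS;
}

// db: a Firestore instance; render: async (path) => html string.
async function cachedPrerender(db, path, render) {
  // Document IDs can't contain slashes, so encode the path.
  const doc = db.collection('prerender-cache').doc(encodeURIComponent(path));
  const snap = await doc.get();
  if (snap.exists && isFresh(snap.data().cachedAt)) {
    return snap.data().html; // serve straight from the cache
  }
  const html = await render(path); // cache miss: fall back to Puppeteer
  await doc.set({ html, cachedAt: Date.now() });
  return html;
}
```

Flushing a route when its content updates would just be a `doc.delete()` (or an overwrite) keyed on the same encoded path.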

How does it perform? Well, it actually works pretty well: warm requests to the function are as low as 1.5s when the page is uncached, which is pretty reasonable. In the worst cases I've seen requests take up to 8s, which isn't great; if the page is cached it knocks several seconds off that, and it hasn't timed out for any of the bots in my unscientific tests, so I can live with it. To further improve response times, I call the function when someone clicks the share button on my site. This warms up the function and also caches that page before it gets crawled, so these requests are pretty performant.

Ultimately I think this solution works pretty well for my use case. If that cold start for bots is something you can't live with, perhaps a prerender.io type of service might be where you want to point your requests. Or perhaps you really do need to manage your own server.