Lighthouse Scanner: Stateless NodeJS Microservices with Redis DB

By Sebastian Günther

Posted in Lighthouse, Nodejs, Microservices, Redis

Lighthouse is a service to scan webpages and see how well they score in terms of SEO, performance and best practices. You can use the scanner here: https://lighthouse.admantium.com/.

Microservices execute functions. They operate on data, and they produce data. This data should not live inside the microservice itself; it should be persisted in a database. In a cloud environment, when a microservice becomes unavailable, it is replaced by a new one. The newly deployed microservice simply picks up the state from the database. With this approach, replacing an instance loses no state and causes little to no downtime.

In my lighthouse project, the scanner microservice produces two types of data: jobs, which represent scan requests that need to be executed, and reports, which are the results of the jobs. This data needs to be persisted and queried. So, what is the best database for this? How can I scale the database with increasing traffic?

In this article, I detail how to decouple the microservice from its data.

Note: The lighthouse service has been discontinued since 2024-05-18.

Which Database to Choose?

How do you choose your database? The obvious choice is to stick with those you have already worked with: MySQL, PostgreSQL, SQLite, MongoDB or CouchDB. In lighthouse, I was tempted to use MongoDB because all the moving data is easily represented with JSON objects. But the primary data that is produced in the app is not something that needs to be stored forever: a job is just a temporary entry, and even scan results are freshly produced on demand.

Which database has this implicit "time to live" for data as one of its key features? I remembered Redis, the number one key-value store according to this report. Redis works with simple commands on the console: set a value with set msg "Hello", and retrieve it with get msg. It’s that simple. Redis supports different data structures like lists, sets, sorted sets and hashes. It’s also very fast because it keeps its data in memory. Its schema-less nature means you can structure data in any way you want, so it can evolve with your requirements. The final point that convinced me was that after a tutorial of mere hours you are ready to go. In total, I spent a day learning all essential commands, data structures and configuration/management aspects, and then half a day adding it to my lighthouse app.

In the remainder of this article, I will show you Redis by example when using it to make the lighthouse scanner truly stateless.

Redis Basics

Redis provides two main commands: redis-server and redis-cli. The server listens on port 6379 by default and accepts connections from clients. The redis-cli starts an interactive terminal session. Here, you execute Redis commands to create, read, update or delete data, as well as system maintenance commands. Data that is exchanged between client and server is just serialized text. In its default configuration, there is no authentication, TLS or access control list, but all of this can be configured in a very readable config file.

Redis' nature as a pure key-value store becomes visible when considering the basic data structures:

  • Strings
  • Hashmaps
  • Lists, Sets, Sorted Sets

To define a string value, the command is simply set KEY value. For a hashmap, it’s a key followed by field-value pairs: hmset KEY field1 value field2 value (since Redis 4.0, hset accepts multiple field-value pairs as well, and hmset is considered deprecated). A list is modified with lpush LIST value1 value2. Reading these values is a simple get KEY for strings and hgetall KEY for hashes.

Let’s see an example working with a list. We create the list jobs:list and push the values job1, job2, job3 into it. Then with lrange we print the list content, starting at the index 0 until its end. We extract a value with lpop and print the list content again.

client@redis> lpush jobs:list job1 job2 job3
(integer) 3
client@redis> lrange jobs:list 0 -1
1) "job3"
2) "job2"
3) "job1"
client@redis> lpop jobs:list
"job3"
client@redis> lrange jobs:list 0 -1
1) "job2"
2) "job1"
client@redis>
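The order in the session above can look surprising at first: lpush inserts every value at the head of the list, so the value pushed last is read first. The following sketch models this behavior with a plain JavaScript array. The class ListModel is purely illustrative and hypothetical; it is neither a Redis client nor part of the lighthouse code.

```javascript
// In-memory model of a Redis list; illustrative only, not a Redis client.
class ListModel {
  constructor() {
    this.items = [];
  }

  // LPUSH inserts each value at the head, one after another.
  lpush(...values) {
    for (const value of values) this.items.unshift(value);
    return this.items.length;
  }

  // LPOP removes from the head, RPOP from the tail.
  lpop() {
    return this.items.length ? this.items.shift() : null;
  }

  rpop() {
    return this.items.length ? this.items.pop() : null;
  }

  // LRANGE bounds are inclusive; negative indexes count from the tail.
  lrange(start, stop) {
    const n = this.items.length;
    const from = start < 0 ? Math.max(n + start, 0) : start;
    const to = stop < 0 ? n + stop : Math.min(stop, n - 1);
    return from > to ? [] : this.items.slice(from, to + 1);
  }
}

const jobs = new ListModel();
jobs.lpush('job1', 'job2', 'job3');
console.log(jobs.lrange(0, -1)); // [ 'job3', 'job2', 'job1' ]
console.log(jobs.lpop());        // job3 (newest first, like a stack)
console.log(jobs.rpop());        // job1 (oldest first, like a queue)
```

If jobs should be processed in the order they arrive, pairing lpush with rpop gives first-in, first-out behavior.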

Redis features other specialized data structures: Geo for defining and working with geo coordinates, or Stream for timeseries-like data. Redis also features several modules that extend the core features and data structures. For example: RedisJSON (formerly ReJSON) for manipulating JSON, RedisGraph for implementing graphs, and modules like RedisGears, which enable in-memory, event-based data transformations.

I know, to an experienced programmer this looks very simple, simple to the point that you might ask "So what’s so great about it?" For me, this simplicity is refreshing! You pick the data structure that is most suitable for you, learn the commands, and can use it immediately. Your data is stored in a space-efficient way and manipulated with simple commands. Redis is easy to learn: within mere hours you can get a good understanding of and experience with all major data structures. And installing and running Redis just works without any additional configuration.

Building a Stateless Microservice

Now let’s see how to apply Redis when building a stateless microservice. The primary imperatives are:

  • All data must be persisted immediately
  • Persisting and reading must be fast and efficient
  • Working data must be deleted easily

In lighthouse, data is created or modified in these use cases:

  • Create and update a job
  • Create a report

Let’s discuss each case, and see which Redis data structure to apply best.

Jobs

A job object captures information and state of a scan job. It is pure working data and does not carry significance after it has been completed.

Concretely:

  • A job object consists of uuid, domain and status.
  • The uuid is the identifier
  • The status changes from started to finished or error
  • The domain is used to retrieve the scan report once it is finished
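The status lifecycle from the list above can be written down as a small lookup table. This is a minimal sketch, assuming the status names started, finished and error that appear in the commands of this article; the names TRANSITIONS and canTransition are hypothetical and not taken from the lighthouse code.

```javascript
// Allowed status transitions for a job (assumed names: started, finished, error).
const TRANSITIONS = new Map([
  ['started', ['finished', 'error']],
  ['finished', []], // terminal state
  ['error', []],    // terminal state
]);

// Returns true when a job may move from status `from` to status `to`.
function canTransition(from, to) {
  return (TRANSITIONS.get(from) ?? []).includes(to);
}

console.log(canTransition('started', 'finished')); // true
console.log(canTransition('finished', 'started')); // false
```

A guard like this could run before a status update is written, so that a finished job is never set back to started by a late or duplicated message.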

Let’s see how to create, read, update and delete these values.

To store these key-value pairs, a Redis hash map is the best choice: indexed by its uuid, with fields for status and domain. To create a hash map for a request to scan the domain http://example.com, we just execute the following command:

hset "0b25ab16-6efd-485c-b260-1766505a3811" domain "http://example.com" status "started"

After its creation, the data value can be retrieved with the following command:

hgetall "0b25ab16-6efd-485c-b260-1766505a3811"
1) "domain"
2) "http://example.com"
3) "status"
4) "started"

To update the value, we use the same command again and list only the changed fields.

hset "0b25ab16-6efd-485c-b260-1766505a3811" status "finished"

Finally, to delete this data, you either use the explicit del command or you set a time in seconds for how long the data will be kept. In lighthouse, I decided to keep jobs for exactly 24 hours (86400 seconds).

expire "0b25ab16-6efd-485c-b260-1766505a3811" 86400

Reports

Once a scan job is done, a report will be generated. This report is a single, self-contained HTML page. As of yet, there is no need to further structure this data, so I just store it completely as text with the key being the domain name.

set "example.com" "<!doctype html><html lang=\"en\"><head><meta charset=\"utf-8\"> ..."

To read this report:

get "example.com"
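Note that the job stores the domain as "http://example.com", while the report is keyed by "example.com". One way to bridge the two is a small helper that derives the report key from the job's domain. This is only a sketch: the helper name reportKey is hypothetical, and using the bare hostname as key is an assumption based on the example above.

```javascript
// Hypothetical helper: derive the report key from the domain stored in a job.
function reportKey(domain) {
  // Accept both "example.com" and "http://example.com".
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(domain);
  const url = new URL(hasScheme ? domain : `http://${domain}`);
  return url.hostname.toLowerCase();
}

console.log(reportKey('http://example.com'));       // example.com
console.log(reportKey('https://Example.com/path')); // example.com
console.log(reportKey('example.com'));              // example.com
```

With such a helper, the domain field of a job and the key of its report always map to the same string, regardless of scheme, path or letter case.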

NodeJS Wrapper

While you can execute shell commands from within a Node.js application, I like to use the ioredis library as a small wrapper. The wrapper provides you with an object called redis. This object contains methods for all the Redis commands, and their arguments are just string values. With this library, you can keep the simplicity of the Redis commands. Let’s see some examples.

The command to create a job hash map becomes the following:

redis.hset("0b25ab16-6efd-485c-b260-1766505a3811", "domain", "http://example.com", "status", "started")

We can abstract the creation, updating and setting the expiration date of a job into the following function and use it throughout the project:

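const Redis = require('ioredis');
const redis = new Redis(); // connects to 127.0.0.1:6379 by default

// Create or update a job hash and (re)set its 24-hour expiry.
async function updateJob(uuid, details) {
  await redis.hset(uuid, 'domain', details.domain, 'status', details.status);
  await redis.expire(uuid, 86400);
}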
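In updateJob, the hash is written and the expiry is set in two separate commands. If the service stops between the two, the job would stay in Redis without a time to live. A possible variant wraps both commands in a MULTI/EXEC transaction. This is a sketch, not the code used in lighthouse: the function name updateJobAtomically is hypothetical, and the client is passed in as a parameter and assumed to be an ioredis instance.

```javascript
// Sketch: `client` is assumed to be an ioredis instance (or any object
// offering the same multi()/hset()/expire()/exec() chain).
const JOB_TTL_SECONDS = 24 * 60 * 60; // 86400 seconds

async function updateJobAtomically(client, uuid, details) {
  const results = await client
    .multi() // queue the following commands in one MULTI/EXEC transaction
    .hset(uuid, 'domain', details.domain, 'status', details.status)
    .expire(uuid, JOB_TTL_SECONDS)
    .exec();

  // ioredis resolves exec() to one [error, reply] pair per queued command.
  for (const [error] of results ?? []) {
    if (error) throw error;
  }
}
```

Passing the client as an argument also makes the function easy to exercise with a stub object in unit tests, without a running Redis server.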

Reading a job cannot get simpler than this:

redis.hgetall(uuid);
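One detail to keep in mind: hgetall returns a promise, and for a key that does not exist (for example a job that has already expired) ioredis resolves it to an empty object instead of null. The following sketch wraps the call accordingly. The function name readJob is hypothetical, and the client is again passed in and assumed to be an ioredis instance.

```javascript
// Sketch: `client` is assumed to be an ioredis instance. hgetall resolves to
// a plain object, which is empty when the key is missing or has expired.
async function readJob(client, uuid) {
  const fields = await client.hgetall(uuid);
  if (Object.keys(fields).length === 0) {
    return null; // unknown or expired job
  }
  return { uuid, domain: fields.domain, status: fields.status };
}
```

A caller can then distinguish a missing job from an existing one with a simple null check, for example to answer an HTTP request with a 404.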

Conclusion

Redis is a fast and efficient in-memory database that supports a wide variety of data structures. With commands that are easy to learn and apply, you structure data as it best fits your use case. All commands are executed immediately, so a microservice that keeps its data in Redis holds no state of its own. Redis also offers advanced modules to support graph data structures, timeseries and event streams. Take a look at Redis for your next project.