2. Dealing With I/O Using The ThreadPool
Hello, dear contributor. We’re about to touch an exciting part of the implementation of Node.js and libuv. To get the most out of this chapter, it’s important that you know how the Node.js event loop works. You don’t need to memorize the exact execution order of setImmediate, setTimeout, and process.nextTick; a grasp of the architectural foundations is good enough.
Let me highlight the needed concepts for you.
Recap: The Node.js Event Loop
An event loop works much like a trip to the supermarket. You have your shopping list.
You start picking products, and you’re focused on them. While picking products from the shelves you can’t do anything else. That’s like the execution of your JS code.
When the main thread is executing code, it can’t deal with anything else. See the classic example below:
console.log('Hi')
setTimeout(function a() {
  while (true); // once this runs, the main thread is stuck here forever
}, 100);
setTimeout(function b() {
  console.log('finish execution')
}, 100);
console.log('How is it going?')
Notice that this code has to be parsed and then executed. All of this keeps the main thread busy.
It executes the four instructions in the following sequence:
- prints Hi;
- schedules function a to be executed after a given timeout;
- schedules function b to be executed after a given timeout;
- prints How is it going?.
What happens next?
After at least 100 ms, function a will be executed and then, boom! Our event loop is blocked!
Remember: the main thread executes JavaScript code, and while doing so it can’t do anything else.
Our function a is JavaScript code. The main thread will take care of it and stay blocked forever, since it contains a while (true). In this scenario, function b will never be executed, no matter how much time has passed.
All of this scheduling happens in libuv, the library that backs the Node.js concurrency model.
How Node.js Keeps the Event Loop Free
Not everybody writes infinite loops like that in Node.js, and those who do usually know what they’re getting into, right?
Most of our applications are I/O-bound, which means the CPU is underused. There’s no need for multiple threads if most of the job is waiting on a file descriptor/handle.
Remember: threads consume resources.
What libuv, and hence Node.js, does is I/O multiplexing. This is the same technique that powers NGINX, one of the most performant web servers out there.
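To make the idea concrete, here is a minimal sketch of I/O multiplexing using the POSIX poll call (libuv builds on epoll, kqueue, and similar mechanisms, but the principle is the same): a single thread watches several file descriptors and reacts only when one of them becomes ready. The pipe below exists purely for illustration.
#include <poll.h>
#include <unistd.h>
#include <cstdio>

int main() {
  // Two unrelated sources of I/O: standard input and a pipe we create ourselves.
  int pipe_fds[2];
  if (pipe(pipe_fds) != 0) return 1;

  struct pollfd watched[2];
  watched[0] = {STDIN_FILENO, POLLIN, 0};  // readable when the user types something
  watched[1] = {pipe_fds[0], POLLIN, 0};   // readable when someone writes to the pipe

  // One thread, many descriptors: block until any of them has data,
  // instead of dedicating a thread to each one.
  int ready = poll(watched, 2, 5000 /* ms */);
  if (ready > 0) {
    for (int i = 0; i < 2; i++) {
      if (watched[i].revents & POLLIN)
        printf("fd %d is readable\n", watched[i].fd);
    }
  } else if (ready == 0) {
    printf("nothing became readable within the timeout\n");
  }

  close(pipe_fds[0]);
  close(pipe_fds[1]);
  return 0;
}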
Some operations still need to run in a separate thread. By default, libuv uses the thread pool for file system and DNS operations. It also lets you run your own work on the thread pool by calling the uv_queue_work function.
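If you were using libuv directly, queuing work would look roughly like the sketch below: the first callback runs on a thread-pool thread, and the second runs back on the event loop thread once the work is done. The slow_sum workload here is just a stand-in for anything expensive.
#include <uv.h>
#include <cstdio>

struct SumRequest {
  uv_work_t req;   // libuv's work request; must outlive the operation
  long result = 0;
};

// Runs on a thread-pool thread: do the heavy lifting here.
static void slow_sum(uv_work_t* req) {
  SumRequest* data = static_cast<SumRequest*>(req->data);
  for (long i = 0; i < 100000000; i++) data->result += i;
}

// Runs back on the event loop thread once the work has finished (or was cancelled).
static void after_slow_sum(uv_work_t* req, int status) {
  SumRequest* data = static_cast<SumRequest*>(req->data);
  printf("sum computed off the main thread: %ld (status=%d)\n", data->result, status);
  delete data;
}

int main() {
  uv_loop_t* loop = uv_default_loop();

  SumRequest* data = new SumRequest();
  data->req.data = data;  // so the callbacks can find our payload

  // Hand the work to the thread pool; the event loop stays free in the meantime.
  uv_queue_work(loop, &data->req, slow_sum, after_slow_sum);

  return uv_run(loop, UV_RUN_DEFAULT);
}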
In Node.js, some modules also leverage the thread pool, such as node:crypto and node:sqlite: the first because of its heavy cryptographic operations, the second because of the I/O involved in database operations.
The ThreadPoolWork class
Node.js has its own constructs to help you when dealing with other threads. One of them is the ThreadPoolWork class. Its definition can be found in node_internals.h.
class ThreadPoolWork {
 public:
  explicit inline ThreadPoolWork(Environment* env, const char* type)
      : env_(env), type_(type) {
    CHECK_NOT_NULL(env);
  }
  inline virtual ~ThreadPoolWork() = default;

  inline void ScheduleWork();
  inline int CancelWork();

  virtual void DoThreadPoolWork() = 0;
  virtual void AfterThreadPoolWork(int status) = 0;

  Environment* env() const { return env_; }

 private:
  Environment* env_;
  uv_work_t work_req_;
  const char* type_;
};
The methods ScheduleWork and CancelWork are used to schedule and cancel work in the thread pool, respectively. Both implementations can be found in src/threadpoolwork-inl.h.
As you can see, the class is a thin wrapper around uv_queue_work and uv_cancel. It is designed to be used by Node.js modules that need to perform work in a separate thread without blocking the main event loop.
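Stripped of tracing and bookkeeping, the wrapping boils down to something like the sketch below (a simplification, not the literal source). ContainerOf is Node's helper for recovering the owning object from the embedded uv_work_t.
// Simplified sketch of ScheduleWork/CancelWork; the real code in
// src/threadpoolwork-inl.h also handles tracing and request counters.
void ThreadPoolWork::ScheduleWork() {
  uv_queue_work(
      env_->event_loop(),
      &work_req_,
      // Runs on a thread-pool thread.
      [](uv_work_t* req) {
        ThreadPoolWork* self = ContainerOf(&ThreadPoolWork::work_req_, req);
        self->DoThreadPoolWork();
      },
      // Runs back on the event loop thread.
      [](uv_work_t* req, int status) {
        ThreadPoolWork* self = ContainerOf(&ThreadPoolWork::work_req_, req);
        self->AfterThreadPoolWork(status);
      });
}

int ThreadPoolWork::CancelWork() {
  // Only succeeds if the work item is still waiting in the thread pool's queue.
  return uv_cancel(reinterpret_cast<uv_req_t*>(&work_req_));
}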
The ThreadPoolWork class is meant to be extended by classes that need to perform specific work in the thread pool. When doing so, the DoThreadPoolWork method should be implemented to perform the actual work in the thread pool, and the AfterThreadPoolWork method should be implemented to handle the result once the work has completed.
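In practice, a subclass only needs to forward the Environment to the base constructor and override those two virtual methods. A minimal hypothetical skeleton (HashJob and ExpensiveHash are made up for illustration) could look like this:
class HashJob : public ThreadPoolWork {
 public:
  HashJob(Environment* env, std::string input)
      : ThreadPoolWork(env, "hash"),  // the label identifies the type of work
        input_(std::move(input)) {}

  void DoThreadPoolWork() override {
    // Runs on a thread-pool thread: do the expensive part here, away from JS.
    result_ = ExpensiveHash(input_);  // hypothetical helper
  }

  void AfterThreadPoolWork(int status) override {
    // Runs back on the event loop thread: it is safe to touch JS state here.
    if (status == UV_ECANCELED) return;
    // ... hand result_ back to JavaScript (resolve a promise, invoke a callback) ...
  }

 private:
  std::string input_;
  std::string result_;
};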
When I started working on the node:sqlite backup implementation, I was told not to block the event loop while the backup is happening. This is a must-have: we want to keep the event loop free to handle other requests while the backup is being performed.
SQLite runs in the same process as Node.js. There is no networking or inter-process communication involved. We needed a custom implementation running in the libuv thread pool to avoid blocking the event loop.
I then used ThreadPoolWork for that:
class BackupJob : public ThreadPoolWork {
 public:
  void DoThreadPoolWork() override {
    backup_status_ = sqlite3_backup_step(backup_, pages_);
  }

  void AfterThreadPoolWork(int status) override {
    int total_pages = sqlite3_backup_pagecount(backup_);
    int remaining_pages = sqlite3_backup_remaining(backup_);
    if (remaining_pages != 0) {
      this->ScheduleWork();  // There is still work to do so we call the ScheduleWork method again
      return;
    }

    if (backup_status_ != SQLITE_OK && backup_status_ != SQLITE_DONE) {
      RejectPromiseWithError();
    } else {
      ResolvePromiseWithValue(total_pages);
    }
  }

 private:
  sqlite3_backup* backup_ = nullptr;
  int pages_;
  int backup_status_ = SQLITE_OK;
};
Of course, this is a simplified version of the actual implementation; you can check all the code in the original PR or directly in the main branch.
Notice the usage of the DoThreadPoolWork and AfterThreadPoolWork methods. The first is where the actual work is done, in this case performing a backup step using SQLite’s C API. The second is where you handle the result of the work after it has completed.
The job is then scheduled like this:
BackupJob* job = new BackupJob(); // constructor parameters are omitted for simplicity
job->ScheduleWork(); // Remember that ScheduleWork is a wrapper around uv_queue_work, as mentioned before
That’s it. Whenever you need to perform work in the thread pool, the ThreadPoolWork class is an option to consider.