How does Dataloader cache and batch database queries?

Question


Looking at the DataLoader library, how does it cache and batch queries?

The documentation shows the following usage:

    var DataLoader = require('dataloader')

    var userLoader = new DataLoader(keys => myBatchGetUsers(keys));

    userLoader.load(1)
      .then(user => userLoader.load(user.invitedByID))
      .then(invitedBy => console.log(`User 1 was invited by ${invitedBy}`));

    // Elsewhere in your application
    userLoader.load(2)
      .then(user => userLoader.load(user.lastInvitedID))
      .then(lastInvited => console.log(`User 2 last invited ${lastInvited}`));

But it's not clear to me how the load function works or what the myBatchGetUsers function might look like. Could you give an example, if possible?


javascript function node.js dataloader

terik_poe Feb 06 '17 at 17:35


2 answers

Sebastien · Answer 1 · 2017-02-13T17:00:44+0000

Facebook's DataLoader utility works by combining individual load requests into a single call to a batch function that you provide. It only works for queries that look records up by identifier (key).

There are three phases:

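Aggregation phase: every .load() call on a loader is queued, and dispatch is deferred until process.nextTick, so all loads made in the same tick are collected together.
Batch phase: the loader calls the function you provided (myBatchGetUsers) once, passing it the array of all the requested keys.
Separation phase: the result is then split up so that each load request receives its own part of the response. This is why the batch function must return its values in the same order as the keys it was given.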

This is why the example should issue only two queries:

One for users 1 and 2
Then one for the related users (invitedByID and lastInvitedID)
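The three phases can be illustrated with a minimal sketch that runs the question's example end to end. Everything here is an assumption for illustration: users is a hypothetical in-memory table standing in for a database, and createLoader is a hypothetical helper, not DataLoader's API (it also does no caching or error handling). The recorded batches show the two queries described above.

```javascript
// Minimal sketch of the three phases; not DataLoader's actual source.
// `users` is a hypothetical in-memory table standing in for a database.
const users = {
  1: { id: 1, invitedByID: 3, lastInvitedID: null },
  2: { id: 2, invitedByID: null, lastInvitedID: 4 },
  3: { id: 3, invitedByID: null, lastInvitedID: null },
  4: { id: 4, invitedByID: null, lastInvitedID: null },
}

const batches = [] // every key array the batch function receives

function myBatchGetUsers(keys) {
  batches.push(keys)
  // One value per key, in the same order as `keys`
  return Promise.resolve(keys.map(k => users[k] ?? null))
}

// Hypothetical helper, not the library's API
function createLoader(batchFn) {
  let queue = []

  function dispatch() {
    const batch = queue
    queue = []
    // Batch phase: one call with every key collected during the tick
    batchFn(batch.map(item => item.key)).then(values => {
      // Separation phase: each caller receives the value at its own index
      batch.forEach((item, i) => item.resolve(values[i]))
    })
  }

  return {
    load(key) {
      return new Promise(resolve => {
        // Aggregation phase: the first load of a batch schedules a single dispatch
        if (queue.length === 0) process.nextTick(dispatch)
        queue.push({ key, resolve })
      })
    },
  }
}

// The question's example, run against the sketch
const userLoader = createLoader(keys => myBatchGetUsers(keys))

const done = Promise.all([
  userLoader.load(1).then(user => userLoader.load(user.invitedByID)),
  userLoader.load(2).then(user => userLoader.load(user.lastInvitedID)),
]).then(([invitedBy, lastInvited]) => {
  console.log(`User 1 was invited by ${invitedBy.id}`)
  console.log(`User 2 last invited ${lastInvited.id}`)
  console.log(batches) // [ [ 1, 2 ], [ 3, 4 ] ]
})
```

Both follow-up loads happen in the same tick (right after the first batch resolves), which is why they land in a single second batch.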

To implement this with MongoDB, for example, you could define the myBatchGetUsers function using the find method with an $in query:

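    function myBatchGetUsers(keys) {
      // usersCollection is assumed to come from the official MongoDB driver, where find() returns a cursor
      return usersCollection.find({ _id: { $in: keys } }).toArray()
        .then(users => {
          // $in does not return documents in key order, so realign the results with keys
          const byId = new Map(users.map(u => [String(u._id), u]))
          return keys.map(k => byId.get(String(k)) ?? null)
        })
    }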
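An $in query returns matching documents in no particular key order, and ids with no match are simply absent, whereas the loader needs exactly one value per key, in key order. That realignment step can be tried without a database. alignToKeys is a hypothetical helper, and the rows stand in for what a query might return; keys are compared as strings, which is an assumption that also lets values such as MongoDB ObjectIds compare by their hex form.

```javascript
// Hypothetical helper: realign unordered query results to the requested keys.
// The loader expects one entry per key, in key order (null here for misses).
function alignToKeys(keys, rows, getKey) {
  const byKey = new Map(rows.map(row => [String(getKey(row)), row]))
  return keys.map(key => byKey.get(String(key)) ?? null)
}

// Rows as a database might return them: arbitrary order, key 7 missing
const rows = [{ _id: 3, name: 'c' }, { _id: 1, name: 'a' }]
const aligned = alignToKeys([1, 7, 3], rows, row => row._id)

// First entry is the row for key 1, then null for 7, then the row for key 3
console.log(aligned)
```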

Zach smith · Answer 2 · 2019-09-24T20:29:20+0000

I found it useful to re-implement the part of DataLoader that I use, to see one possible way it could work (in my case, I only use the .load() function).

Creating a new DataLoader instance gives you two things:

A list of keys (initially empty)
A function that uses this list of keys to query the database (you provide this).

The constructor might look something like this:

    function DataLoader(_batchLoadingFn) {
      this._keys = []
      this._batchLoadingFn = _batchLoadingFn
    }

DataLoader instances also need a .load() method that can access the _keys property, so it is defined on the DataLoader.prototype object:

    DataLoader.prototype.load = function(key) {
      // this._keys references the array defined in the constructor function
    }

When you create a loader with new DataLoader(fn), the fn you pass in must fetch data from somewhere: it takes an array of keys as its argument and returns a promise that resolves to an array of values corresponding, in order, to the original array of keys.

For example, here is a dummy batch function that takes an array of keys and returns a promise of the same array with each value doubled:

 const batchLoadingFn = keys => new Promise( resolve => resolve(keys.map(k => k * 2)) )

    keys: [1, 2, 3]
    vals: [2, 4, 6]

    keys[0] corresponds to vals[0]
    keys[1] corresponds to vals[1]
    keys[2] corresponds to vals[2]
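Everything downstream relies on this index-by-index correspondence, so it can help to check it. The sketch below wraps a batch function with such a check; checkedBatch is a hypothetical helper, not part of the dataloader library.

```javascript
// Hypothetical wrapper that enforces the batch-function contract:
// resolve to an array with exactly one value per key, in key order.
function checkedBatch(batchFn) {
  return keys =>
    batchFn(keys).then(values => {
      if (!Array.isArray(values) || values.length !== keys.length) {
        throw new TypeError(`expected ${keys.length} values, got ${values && values.length}`)
      }
      return values
    })
}

const batchLoadingFn = checkedBatch(keys => Promise.resolve(keys.map(k => k * 2)))
const broken = checkedBatch(keys => Promise.resolve(keys.slice(1))) // drops a value

batchLoadingFn([1, 2, 3]).then(vals => console.log(vals)) // [ 2, 4, 6 ]
broken([1, 2, 3]).catch(err => console.log(err.message)) // expected 3 values, got 2
```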

Then, each time you call the .load(identifier) function, you add the key to the _keys array, and at some point batchLoadingFn is called with the _keys array as its argument.

The trick is: how can you call .load(id) many times but have batchLoadingFn execute only once? This is the clever part, and the reason I looked into how this library works.

I found that this can be done by scheduling batchLoadingFn to run after a timeout. If .load() is called again before the timeout fires, the timeout is cancelled, the new key is added, and the call to batchLoadingFn is rescheduled. In code, this looks like:

    DataLoader.prototype.load = function(key) {
      // Cancel any pending batch call, then schedule a new one
      clearTimeout(this._timer)
      this._timer = setTimeout(() => this._batchLoadingFn(this._keys), 0)
    }

Essentially, each call to .load() cancels the pending call to batchLoadingFn and schedules a new one on a later turn of the event loop. So if .load() is called many times in a short period, batchLoadingFn is called only once. This is very similar to debouncing, which is useful when building websites where you want to do something on the mousemove event but get far more events than you want to deal with.
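For comparison, here is a minimal generic debounce sketch. debounce is a hypothetical helper written for illustration, and since Node has no mousemove events, a loop simulates a burst of them.

```javascript
// Generic debounce: only the last call in a burst runs, after `wait` ms of quiet.
// The .load() timer above follows the same cancel-and-reschedule pattern.
function debounce(fn, wait) {
  let timer
  return (...args) => {
    clearTimeout(timer) // cancel the previously scheduled run, if any
    timer = setTimeout(() => fn(...args), wait)
  }
}

let calls = 0
const onMove = debounce(() => { calls += 1 }, 50)

// Simulate a burst of mousemove-like events
for (let i = 0; i < 100; i++) onMove()

setTimeout(() => console.log('handler ran', calls, 'time(s)'), 100) // handler ran 1 time(s)
```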

But calling .load(key) also needs to push the key onto the _keys array, which we can do in the body of the .load function (just this._keys.push(key)). However, the contract for .load is that it returns a single value for the given key. At some point batchLoadingFn is called and produces results for everything in _keys, so .load must actually return a promise of its own value, resolved once that batch result is available.

The next bit I thought was especially clever (and made it worth looking at the source code)!

Instead of storing just a list of keys in _keys, the dataloader library stores each key together with a reference to the resolve function of the promise returned by .load(). Because .load() returns a promise, calling that resolve function is what delivers the value to the caller.

Thus, the _keys array actually stores a list of key/resolve pairs. When the promise returned by your batchLoadingFn resolves, each resolve function is called with the value at the same index in the results (which, we hope, is the value that corresponds to that key in the _keys array).
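The core trick, capturing a promise's resolve function so other code can settle it later, can be seen in isolation. This is a minimal sketch; queue and enqueue are hypothetical names chosen for illustration.

```javascript
// Capture each promise's resolve function so code elsewhere can settle it later.
// The queue holds { key, resolve } pairs, like the _keys array described above.
const queue = []

function enqueue(key) {
  return new Promise(resolve => queue.push({ key, resolve }))
}

const p1 = enqueue('a')
const p2 = enqueue('b')

// Later, once a batch result arrives, resolve each promise by its index
const results = ['value for a', 'value for b']
queue.forEach(({ resolve }, i) => resolve(results[i]))

Promise.all([p1, p2]).then(values => console.log(values)) // [ 'value for a', 'value for b' ]
```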

So the .load function looks like this (as far as placing each {key, resolve} pair in the _keys array goes):

    DataLoader.prototype.load = function(key) {
      const promisedValue = new Promise(resolve => this._keys.push({ key, resolve }))
      // ...
      return promisedValue
    }

All that remains is to call batchLoadingFn with the queued keys as its argument and, when it returns, call the matching resolve function for each key:

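    // Take the queued pairs and reset _keys for the next batch *before* calling
    // batchLoadingFn, so loads made while it is running are not resolved by this batch
    const batch = this._keys
    this._keys = []
    this._batchLoadingFn(batch.map(k => k.key))
      .then(values => {
        batch.forEach(({ resolve }, i) => {
          resolve(values[i])
        })
      })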

Putting it all together, the complete code looks like this:

    function DataLoader(_batchLoadingFn) {
      this._keys = []
      this._batchLoadingFn = _batchLoadingFn
    }

    DataLoader.prototype.load = function(key) {
      // Cancel the pending batch call; it is rescheduled below with this key included
      clearTimeout(this._timer)
      const promisedValue = new Promise(resolve => this._keys.push({ key, resolve }))
      this._timer = setTimeout(() => {
        console.log('You should only see me printed once!')
        // Take the queued pairs and reset _keys, so loads made while this
        // batch is in flight go into the next batch
        const batch = this._keys
        this._keys = []
        this._batchLoadingFn(batch.map(k => k.key))
          .then(values => {
            batch.forEach(({ resolve }, i) => {
              resolve(values[i])
            })
          })
      }, 0)
      return promisedValue
    }

    // Define a batch loading function
    const batchLoadingFunction = keys => new Promise(resolve => resolve(keys.map(k => k * 2)))

    // Create a new DataLoader
    const loader = new DataLoader(batchLoadingFunction)

    // Call .load() twice in quick succession
    loader.load(1).then(result => console.log('Result with key = 1', result))
    loader.load(2).then(result => console.log('Result with key = 2', result))
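The code above only batches. The question also asks about caching: by default, a DataLoader instance memoizes the promise returned by .load() for each key, so loading the same key again returns the same promise without another trip to the batch function (the library's clear(key) method removes an entry). A minimal sketch of that idea as a wrapper around any object with a .load method; withCache and the stub loader are hypothetical, not the library's code.

```javascript
// Hypothetical wrapper: memoize the promise returned for each key, so repeated
// loads of the same key share one promise and never reach the batch queue again.
function withCache(loader) {
  const cache = new Map()
  return {
    load(key) {
      if (!cache.has(key)) cache.set(key, loader.load(key))
      return cache.get(key)
    },
    clear(key) {
      cache.delete(key) // the next load of this key goes back to the loader
    },
  }
}

// Stub loader standing in for a batching loader; it records each key it sees
const seen = []
const inner = { load: key => { seen.push(key); return Promise.resolve(key * 2) } }

const cached = withCache(inner)
const same = cached.load(1) === cached.load(1) // one shared promise
cached.load(2)
console.log(same, seen) // true [ 1, 2 ]
```

Because the cache stores promises rather than values, a second load of a key that is still in flight also reuses the pending request.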

If I remember correctly, the dataloader library doesn't use setTimeout but process.nextTick instead. However, I could not get that to work.
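One way to make process.nextTick work, sketched here under the assumption that the rest of the design stays the same: a nextTick callback cannot be cancelled the way a timeout can, so instead of rescheduling on every call, only the first .load() of a new batch schedules the dispatch. The DataLoader constructor and _dispatch method below belong to this hand-rolled sketch, not to the dataloader package.

```javascript
// Hand-rolled loader (not the dataloader package) using process.nextTick.
// Only the first .load() of a batch schedules a dispatch; later loads just queue.
function DataLoader(batchLoadingFn) {
  this._keys = []
  this._batchLoadingFn = batchLoadingFn
}

DataLoader.prototype.load = function (key) {
  const promisedValue = new Promise(resolve => this._keys.push({ key, resolve }))
  if (this._keys.length === 1) {
    // First key of a new batch: dispatch once the current synchronous code finishes
    process.nextTick(() => this._dispatch())
  }
  return promisedValue
}

DataLoader.prototype._dispatch = function () {
  const batch = this._keys
  this._keys = [] // later loads start a new batch
  this._batchLoadingFn(batch.map(k => k.key)).then(values => {
    batch.forEach(({ resolve }, i) => resolve(values[i]))
  })
}

let batchCalls = 0
const loader = new DataLoader(keys => {
  batchCalls += 1
  return Promise.resolve(keys.map(k => k * 2))
})

const results = Promise.all([loader.load(1), loader.load(2), loader.load(3)])
results.then(values => console.log(values, 'in', batchCalls, 'batch call(s)')) // [ 2, 4, 6 ] in 1 batch call(s)
```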
