Should I `$inc` a follower count on the product, or use a separate collection to track followers?

I load products with endless scrolling, in batches of 12 at a time.
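The batch loading boils down to a skip/limit window over the sorted query. A minimal helper for computing that window (a pure-Python sketch; the function name `page_window` is my own, not from the original code):

```python
def page_window(page, per_page=12):
    """Return the (skip, limit) pair for a zero-based page of an endless scroll.

    Intended use (hypothetical): db.products.find().sort('followers', -1)
                                   .skip(skip).limit(limit)
    """
    if page < 0:
        raise ValueError("page must be non-negative")
    return page * per_page, per_page
```

For page 0 this yields `(0, 12)`, for page 2 it yields `(24, 12)`, matching the 12-at-a-time loading described above.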

Sometimes they need to be sorted by the number of followers they have.

Here is how I track the number of followers of each product.


Each follow is stored in a separate collection, because of MongoDB's 16 MB document size limit and because the number of follows per product should be unbounded.

Follow schema:

```javascript
var FollowSchema = new mongoose.Schema({
    user: { type: mongoose.Schema.ObjectId, ref: 'User' },
    product: { type: mongoose.Schema.ObjectId, ref: 'Product' },
    timestamp: { type: Date, default: Date.now }
});
```

Product schema:

```javascript
var ProductSchema = new mongoose.Schema({
    name: { type: String, unique: true, required: true },
    followers: { type: Number, default: 0 }
});
```

Whenever a user follows/unfollows a product, I run this function:

```javascript
ProductSchema.statics.updateFollowers = function (productId, val) {
    return Product
        .findOneAndUpdateAsync(
            { _id: productId },
            { $inc: { followers: val } },
            { upsert: true, 'new': true }
        )
        .then(function (updatedProduct) {
            return updatedProduct;
        })
        .catch(function (err) {
            console.log('Product follower update err : ', err);
        });
};
```
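For reference, `$inc` combined with `upsert: true` either bumps the counter on an existing document or creates the document and applies the increment to the default of 0, returning the updated document because of `new: true`. A small in-memory Python model of that behavior (an illustration only, not the real driver; the dict-backed `store` is my own stand-in for the collection):

```python
def inc_followers(store, product_id, val):
    """Emulate findOneAndUpdate({_id}, {$inc: {followers: val}},
    {upsert: True, new: True}) against a dict-backed 'collection'."""
    doc = store.get(product_id)
    if doc is None:
        # upsert: create the document; the counter starts at its default of 0
        doc = {'_id': product_id, 'followers': 0}
        store[product_id] = doc
    doc['followers'] += val
    return doc  # 'new: true' semantics: return the post-update document


store = {}
inc_followers(store, 'p1', 1)   # follow: creates the doc, followers -> 1
inc_followers(store, 'p1', 1)   # second follower, followers -> 2
inc_followers(store, 'p1', -1)  # unfollow, followers -> 1
```

Passing `1` on follow and `-1` on unfollow, as the function above does, is exactly what `updateFollowers(productId, val)` delegates to `$inc`.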

My questions:

1: Is there a chance that the embedded followers counter on the product could hit some kind of error and end up with invalid/inconsistent data?

2: Would it be better to run an aggregation to count the followers of each product, or would that be too expensive/slow?

Eventually I would probably rewrite this for a graph database, which seems more appropriate, but for now this is an exercise in mastering MongoDB.

3 answers

1. If you increment the counter after inserting a follow document, or decrement it after deleting one, there is a chance of inconsistency: for example, the insert succeeds but the increment fails.
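One common mitigation for that drift is to treat the follows collection as the source of truth and periodically recount. A minimal in-memory Python sketch of such a repair pass (the function name and dict-backed collections are my own, for illustration):

```python
def recount_followers(follows, products):
    """Rebuild each product's counter from the follows collection,
    overwriting any drifted embedded counts."""
    counts = {}
    for f in follows:
        counts[f['product']] = counts.get(f['product'], 0) + 1
    for pid, doc in products.items():
        doc['followers'] = counts.get(pid, 0)


# 'p1' has drifted to 5 even though only 3 follow documents exist
products = {'p1': {'followers': 5}, 'p2': {'followers': 2}}
follows = [{'product': 'p1'}, {'product': 'p1'}, {'product': 'p1'}]
recount_followers(follows, products)
# after the repair: p1 -> 3, p2 -> 0
```

In real MongoDB the same recount would be a `$group`/`$sum` aggregation over the follows collection followed by bulk updates, run on a schedule rather than per request.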

2. Intuitively, aggregation is much more expensive than a plain find in this case. I ran a test to prove it.

First create 1,000 users, 1,000 products and 10,000 follows at random. Then compare with this code:

```python
import timeit

from pymongo import MongoClient

db = MongoClient('mongodb://127.0.0.1/test', tz_aware=True).get_default_database()


def foo():
    result = list(db.products.find().sort('followers', -1).limit(12).skip(12))


def bar():
    result = list(db.follows.aggregate([
        {'$group': {'_id': '$product', 'followers': {'$sum': 1}}},
        {'$sort': {'followers': -1}},
        {'$skip': 12},
        {'$limit': 12},
    ]))


if __name__ == '__main__':
    t = timeit.timeit('foo()', 'from __main__ import foo', number=100)
    print('time: %f' % t)
    t = timeit.timeit('bar()', 'from __main__ import bar', number=100)
    print('time: %f' % t)
```

Output:

```
time: 1.230138
time: 3.620147
```

Creating an index speeds up the find query considerably:

```javascript
db.products.createIndex({ followers: 1 })
```

```
time: 0.174761
time: 3.604628
```

And if you need attributes from the product, such as its name, the aggregation approach requires another O(n) query.
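That extra step amounts to fetching the product documents for the aggregated IDs (e.g. with an `$in` query) and joining in the application. A small in-memory Python sketch of the join half (function name and sample data are my own):

```python
def attach_names(agg_page, products_by_id):
    """Join one aggregation page (product id + follower count) with the
    product documents fetched in a second query."""
    return [
        {'_id': row['_id'],
         'name': products_by_id[row['_id']]['name'],
         'followers': row['followers']}
        for row in agg_page
    ]


# one page of aggregation output, already sorted by followers descending
agg_page = [{'_id': 'p2', 'followers': 7}, {'_id': 'p1', 'followers': 3}]
# result of the second query, keyed by _id
products_by_id = {'p1': {'name': 'Widget'}, 'p2': {'name': 'Gadget'}}
page = attach_names(agg_page, products_by_id)
```

With the embedded counter, by contrast, a single `find` returns names and counts together, which is part of why the benchmark above favors it.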

My guess is that as the data grows, aggregation will get much slower still. If necessary, I can benchmark on larger data sets.


For number 1: if the only operations on this field are increments and decrements, I think you will be fine. If you start duplicating this data elsewhere or using it in joins for any reason, you risk inconsistent data.

For number 2, I recommend running both approaches in the mongo shell to benchmark them. You can also look at the explain plans for both queries to get an idea of which will perform better. I am just guessing, but it seems the incremented-counter route will perform well.

Also, the amount of data you expect matters. One approach could work perfectly at first, but past a million records the other could be the way to go. If you have a test environment, that would be a good place to check.


1) It is up to the application layer to ensure consistency, and as such there is a chance you will end up with inconsistencies. The questions I would ask are: how important is consistency in this case, and how likely is it to drift far? My take is that being off by a single follower matters less than making your endless scroll load as fast as possible for a better user experience.

2) It is probably worth profiling, but if I had to guess, I would say the aggregation approach would be the slower one.


Source: https://habr.com/ru/post/1012871/

