Retrieving a large number of records from MongoDB in a reasonable amount of time

I use MongoDB to store a query log and compute some statistics on it. Each object I store contains the query text, the date, the user, whether the user clicked on any results, and so on.

Now I am trying to get all queries for a specific day that the user did not click on, using Java. My code is something like this:

    DBObject query = new BasicDBObject();
    BasicDBObject keys = new BasicDBObject();
    keys.put("Query", 1);
    query.put("Date", new BasicDBObject("$gte", beginning.getTime()).append("$lte", end.getTime()));
    query.put("IsClick", false);
    ...
    DBCursor cur = mongoCollection.find(query, keys).batchSize(5000);

The query result contains about 20 thousand records that I need to iterate over. The problem is that this takes minutes :( I do not think this is normal. From the server log I can see:

    Wed Nov 16 16:28:40 query db.QueryLogRecordImpl ntoreturn:5000 reslen:252403 nscanned:59260 { Date: { $gte: 1283292000000, $lte: 1283378399999 }, IsClick: false } nreturned:5000 2055ms
    Wed Nov 16 16:28:40 getmore db.QueryLogRecordImpl cid:4312057226672898459 ntoreturn:5000 query: { Date: { $gte: 1283292000000, $lte: 1283378399999 }, IsClick: false } bytes:232421 nreturned:5000 170ms
    Wed Nov 16 16:30:27 getmore db.QueryLogRecordImpl cid:4312057226672898459 ntoreturn:5000 query: { Date: { $gte: 1283292000000, $lte: 1283378399999 }, IsClick: false } bytes:128015 nreturned:2661 --> 106059ms

So loading the first batch takes 2 seconds, the second 0.17 seconds, and the third 106 seconds! Strange. I tried changing the batch size, creating indexes on Date and IsClick, and rebooting the machine :P, but nothing helped. What am I doing wrong?

1 answer

There are several factors that can affect speed. Additional data will need to be collected to determine the cause here.

Some potential problems:

  • Indexes: Are you using the right indexes? You should probably have a compound index on IsClick/Date. This puts the range field second, which is the usual rule of thumb. Note that this is different from indexing on Date/IsClick; the order matters. Try .explain() on your query to find out which indexes are used.
  • Data size: In some cases, slowness is caused by too much data, either too many documents or documents that are too large. It can also come from trying to find too many needles in a really large haystack. You are returning about 252 KB of data (reslen) in the first batch and about 12k documents in total, so this is probably not the problem.
  • IO: MongoDB uses memory-mapped files and therefore uses a large amount of virtual memory. If you have more data than fits in RAM, fetching certain documents requires going to disk, which can be a very expensive operation. You can detect going to disk with tools like iostat or resmon (Windows) that track disk activity.
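The point about field order can be illustrated with a toy model, sketched here in plain Java without the MongoDB driver. It treats an index as a sorted run of keys, ordered either by IsClick then Date or by Date then IsClick, and counts how many keys fall between the scan bounds for one day of unclicked queries. The class IndexOrderSketch, its Key type, and the sample data are all hypothetical, and a real MongoDB index scan is more sophisticated than this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IndexOrderSketch {

    /** Toy index key holding the two indexed fields from the question. */
    static final class Key {
        final boolean isClick;
        final long date;

        Key(boolean isClick, long date) {
            this.isClick = isClick;
            this.date = date;
        }
    }

    /** Key order of a compound index on IsClick, then Date. */
    static final Comparator<Key> CLICK_THEN_DATE = (a, b) -> {
        int c = Boolean.compare(a.isClick, b.isClick);
        return c != 0 ? c : Long.compare(a.date, b.date);
    };

    /** Key order of a compound index on Date, then IsClick. */
    static final Comparator<Key> DATE_THEN_CLICK = (a, b) -> {
        int c = Long.compare(a.date, b.date);
        return c != 0 ? c : Boolean.compare(a.isClick, b.isClick);
    };

    /** Made-up data: 10 "days" of 100 records each, every 4th one clicked. */
    static List<Key> sampleKeys() {
        List<Key> keys = new ArrayList<>();
        for (int day = 0; day < 10; day++) {
            for (int i = 0; i < 100; i++) {
                keys.add(new Key(i % 4 == 0, day * 1000L + i));
            }
        }
        return keys;
    }

    /**
     * Counts the keys lying between (false, lo) and (false, hi) in the given
     * index order. In this toy model that is the contiguous run of the sorted
     * index a range scan has to walk for "IsClick == false, lo <= Date <= hi".
     */
    static int keysScanned(List<Key> keys, Comparator<Key> order, long lo, long hi) {
        List<Key> index = new ArrayList<>(keys);
        index.sort(order); // stands in for the index being kept in key order
        Key lower = new Key(false, lo);
        Key upper = new Key(false, hi);
        int scanned = 0;
        for (Key k : index) {
            if (order.compare(k, lower) >= 0 && order.compare(k, upper) <= 0) {
                scanned++;
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        List<Key> keys = sampleKeys();
        // One "day" of data: dates 3000..3999, of which 75 records are unclicked.
        System.out.println(keysScanned(keys, CLICK_THEN_DATE, 3000, 3999)); // prints 75
        System.out.println(keysScanned(keys, DATE_THEN_CLICK, 3000, 3999)); // prints 100
    }
}
```

With the equality field first, the run contains only the 75 matching keys; with the date first, it also contains the 25 clicked records for that day, which then have to be filtered out. The gap grows with the share of records that do not match the equality condition.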

Based on personal experience, I strongly suspect #3, possibly aggravated by #1. I would start by watching the I/O while running the query with .explain(). This should quickly narrow down the possible problems.
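To see how uneven the three batches in the question's log are, the timings can be put side by side as an average cost per returned document. This is only a small arithmetic sketch: the class BatchTimings and its method name are illustrative, and the figures are copied from the nreturned and ms fields of the log lines.

```java
import java.util.Locale;

public class BatchTimings {

    /** Average milliseconds spent per returned document in one batch. */
    static double msPerDocument(long millis, int returned) {
        return (double) millis / returned;
    }

    public static void main(String[] args) {
        // Values taken from the three log lines in the question.
        int[] returned = {5000, 5000, 2661};
        long[] millis = {2055, 170, 106059};

        int totalDocs = 0;
        for (int i = 0; i < returned.length; i++) {
            totalDocs += returned[i];
            System.out.println(String.format(Locale.ROOT,
                    "batch %d: %d docs, %.3f ms/doc",
                    i + 1, returned[i], msPerDocument(millis[i], returned[i])));
        }
        System.out.println("total docs: " + totalDocs); // prints total docs: 12661
    }
}
```

The first two batches cost roughly 0.4 and 0.03 ms per document, while the last one costs about 40 ms per document even though it returns fewer documents. A jump of that size on part of the result set would be consistent with those documents not being in RAM, although the log alone does not prove it.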


Source: https://habr.com/ru/post/1381517/

