Your problem is the asList() call.
This forces the driver to iterate over the entire cursor (80,000 documents, several gigabytes), keeping everything in memory.
batchSize(someLimit) and Cursor.batch() will not help here, because you still traverse the entire cursor regardless of the batch size.
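To see why: asList() is roughly equivalent to collecting the whole result set into one in-memory list up front. A minimal illustration with the plain MongoDB Java driver (the connection string, database, and collection names are placeholders, and the original code appears to use a mapped MYClass type rather than Document):

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    import java.util.ArrayList;
    import java.util.List;

    public class LoadEverything {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> collection =
                        client.getDatabase("mydb").getCollection("mycollection");

                // Equivalent of asList(): the driver walks the whole cursor and
                // buffers all 80,000 documents in this list before you can use any of them.
                List<Document> datalist = collection.find().into(new ArrayList<>());
                System.out.println("Loaded " + datalist.size() + " documents into memory");
            }
        }
    }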
Instead, you can:
1) Iterate over the cursor returned by datasource.getCollection("mycollection").find(), instead of collecting everything into List<MYClass> datalist with asList()
2) Read the documents one at a time and add them to a buffer (say, a list)
3) Every 1000 documents (say), call the Hadoop API, clear the buffer, and start again (see the sketch below)
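A minimal sketch of steps 1–3, assuming the current MongoDB Java driver and plain Document objects (the original code appears to use a mapped MYClass type); the connection string, database name, and the submitToHadoop helper are placeholders for whatever your Hadoop API call actually is:

    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.MongoCursor;
    import org.bson.Document;

    import java.util.ArrayList;
    import java.util.List;

    public class BufferedCursorRead {

        private static final int BUFFER_SIZE = 1000; // flush threshold from step 3

        public static void main(String[] args) {
            // Placeholder connection string and names: adjust to your environment.
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> collection =
                        client.getDatabase("mydb").getCollection("mycollection");

                List<Document> buffer = new ArrayList<>(BUFFER_SIZE);

                // Step 1: iterate the cursor instead of materializing the result set.
                try (MongoCursor<Document> cursor = collection.find().iterator()) {
                    while (cursor.hasNext()) {
                        // Step 2: read one document at a time into the buffer.
                        buffer.add(cursor.next());

                        // Step 3: every BUFFER_SIZE documents, hand the batch off
                        // and clear the buffer so memory use stays bounded.
                        if (buffer.size() >= BUFFER_SIZE) {
                            submitToHadoop(buffer);
                            buffer.clear();
                        }
                    }
                }

                // Don't forget the last partial batch.
                if (!buffer.isEmpty()) {
                    submitToHadoop(buffer);
                }
            }
        }

        // Hypothetical stand-in for your Hadoop submission code.
        private static void submitToHadoop(List<Document> batch) {
            System.out.println("Submitting batch of " + batch.size() + " documents");
        }
    }

This way at most 1000 documents sit in the application buffer at any time, no matter how large the collection is.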