Delete all fields that are empty

How can I remove all null fields from every document in a collection?


I have a collection of documents such as:

 { 'property1': 'value1', 'property2': 'value2', ... } 

but each document may have null in place of a value for any of its properties.

I would like to save disk space by deleting all null entries. Null entries carry no information in my case, because I know the format of the JSON document a priori.

5 answers
    // run in mongo shell
    var coll = db.getCollection("collectionName");
    var cursor = coll.find();
    while (cursor.hasNext()) {
        var doc = cursor.next();
        var keys = {};
        var hasNull = false;
        for (var x in doc) {
            if (x != "_id" && doc[x] == null) {
                keys[x] = 1;
                hasNull = true;
            }
        }
        if (hasNull) {
            coll.update({_id: doc._id}, {$unset: keys});
        }
    }

This is an important question, since MongoDB cannot index null values (i.e. don't query for null values or you will wait a long time), so it is best to avoid nulls and set defaults using $setOnInsert.

Here is a recursive solution to remove nulls:

    /**
     * RETRIEVES A LIST OF ALL THE KEYS IN A DOCUMENT, WHERE THE VALUE IS 'NULL' OR 'UNDEFINED'
     *
     * @param doc
     * @param keyName
     * @param nullKeys
     */
    function getNullKeysRecursively(doc, keyName, nullKeys) {
        for (var item_property in doc) {
            // SKIP BASE-CLASS STUFF
            if (!doc.hasOwnProperty(item_property))
                continue;

            // SKIP ID FIELD
            if (item_property === "_id")
                continue;

            // FULL KEY NAME (FOR SUB-DOCUMENTS)
            var fullKeyName;
            if (keyName)
                fullKeyName = keyName + "." + item_property;
            else
                fullKeyName = item_property;

            // DEBUGGING
            // print("fullKeyName: " + fullKeyName);

            // NULL FIELDS - MODIFY THIS BLOCK TO ADD CONSTRAINTS
            if (doc[item_property] === null || doc[item_property] === undefined)
                nullKeys[fullKeyName] = 1;

            // RECURSE OBJECTS / ARRAYS
            else if (doc[item_property] instanceof Object || doc[item_property] instanceof Array)
                getNullKeysRecursively(doc[item_property], fullKeyName, nullKeys);
        }
    }

    /**
     * REMOVES ALL PROPERTIES WITH A VALUE OF 'NULL' OR 'UNDEFINED'.
     * TUNE THE 'LIMIT' VARIABLE TO YOUR MEMORY AVAILABILITY.
     * ONLY CLEANS DOCUMENTS THAT REQUIRE CLEANING, FOR EFFICIENCY.
     * USES bulkWrite FOR EFFICIENCY.
     *
     * @param collectionName
     */
    function removeNulls(collectionName) {
        var coll = db.getCollection(collectionName);
        var lastId = ObjectId("000000000000000000000000");
        var LIMIT = 10000;

        while (true) {
            // GET THE NEXT PAGE OF DOCUMENTS
            // (SORT BY _id SO THAT lastId ALWAYS MARKS THE END OF THE PAGE)
            var page = coll.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(LIMIT);
            if (!page.hasNext())
                break;

            // BUILD BULK OPERATION
            var arrBulkOps = [];
            page.forEach(function(item_doc) {
                lastId = item_doc._id;

                var nullKeys = {};
                getNullKeysRecursively(item_doc, null, nullKeys);

                // ONLY UPDATE MODIFIED DOCUMENTS
                if (Object.keys(nullKeys).length > 0)
                    // UNSET INDIVIDUAL FIELDS, RATHER THAN REWRITE THE ENTIRE DOC
                    arrBulkOps.push(
                        { updateOne: {
                            "filter": { _id: item_doc._id },
                            "update": { $unset: nullKeys }
                        } }
                    );
            });

            // bulkWrite REJECTS AN EMPTY LIST, SO SKIP PAGES THAT NEED NO CLEANING
            if (arrBulkOps.length > 0)
                coll.bulkWrite(arrBulkOps, { ordered: false });
        }
    }

    // GO GO GO
    removeNulls('my_collection');

document before:

    {
        "_id": ObjectId("5a53ed8f6f7c4d95579cb87c"),
        "first_name": null,
        "last_name": "smith",
        "features": {
            "first": {
                "a": 1,
                "b": 2,
                "c": null
            },
            "second": null,
            "third": {},
            "fourth": []
        },
        "other": [
            null,
            123,
            {
                "a": 1,
                "b": "hey",
                "c": null
            }
        ]
    }

document after:

    {
        "_id" : ObjectId("5a53ed8f6f7c4d95579cb87c"),
        "last_name" : "smith",
        "features" : {
            "first" : {
                "a" : 1,
                "b" : 2
            }
        },
        "other" : [
            null,
            123,
            {
                "a" : 1,
                "b" : "hey"
            }
        ]
    }

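As you can see, it removes null, undefined, empty objects and empty arrays. A null element inside an array (such as the first element of "other") stays in place, because $unset on an array element sets it to null rather than removing it. If you want it to be more or less aggressive, modify the block marked "NULL FIELDS - MODIFY THIS BLOCK TO ADD CONSTRAINTS".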
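
One way the marked block might be tuned is to move the check into a predicate with explicit switches. The sketch below is plain JavaScript that runs outside the shell; isRemovable, collectRemovableKeys and the option names emptyObjects and emptyArrays are hypothetical names chosen for illustration and are not part of the answer's code or of MongoDB.

```javascript
// Hypothetical predicate standing in for the "NULL FIELDS" block.
// The option names (emptyObjects, emptyArrays) are illustrative assumptions.
function isRemovable(value, options) {
  if (value === null || value === undefined) return true;
  if (Array.isArray(value)) {
    return Boolean(options.emptyArrays) && value.length === 0;
  }
  if (typeof value === "object" && Object.getPrototypeOf(value) === Object.prototype) {
    return Boolean(options.emptyObjects) && Object.keys(value).length === 0;
  }
  return false;
}

// Collects dotted key paths (suitable for $unset) for every removable value.
function collectRemovableKeys(doc, prefix, options, keys) {
  for (const name of Object.keys(doc)) {
    if (name === "_id") continue; // never touch the id field
    const path = prefix ? prefix + "." + name : name;
    const value = doc[name];
    if (isRemovable(value, options)) {
      keys[path] = 1;
    } else if (typeof value === "object") {
      // Recurse into sub-documents and arrays that are being kept.
      collectRemovableKeys(value, path, options, keys);
    }
  }
  return keys;
}

// The sample document from above, without its _id.
const sample = {
  first_name: null,
  last_name: "smith",
  features: { first: { a: 1, b: 2, c: null }, second: null, third: {}, fourth: [] },
  other: [null, 123, { a: 1, b: "hey", c: null }],
};

const strictKeys = collectRemovableKeys(sample, "", { emptyObjects: false, emptyArrays: false }, {});
const aggressiveKeys = collectRemovableKeys(sample, "", { emptyObjects: true, emptyArrays: true }, {});
console.log(Object.keys(strictKeys));
console.log(Object.keys(aggressiveKeys));
```

With both switches off, the collected paths are first_name, features.first.c, features.second, other.0 and other.2.c. Turning them on adds features.third and features.fourth.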

Edits are welcome, especially from @stennie.


As mentioned in this question (MongoDB query without a field name):

Unfortunately, MongoDB does not support any method for querying all fields with a specific value.

So, you can either iterate over the documents (as in the cursor example above), or do it in a non-MongoDB way.

If the data is in a JSON file, delete all lines containing null with sed:

 sed '/null/d' ./mydata.json 
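
The sed command works line by line, so it assumes one field per line and can leave a dangling comma when the deleted line was the last field of an object. As an alternative non-MongoDB route, here is a minimal sketch that parses the JSON instead, using the standard replacer argument of JSON.stringify. The function name stripNulls is a hypothetical helper, not something from the answer.

```javascript
// Hypothetical helper: drops every null-valued property from a JSON text.
function stripNulls(jsonText) {
  const data = JSON.parse(jsonText);
  // Returning undefined from the replacer omits that property from the output.
  return JSON.stringify(data, (key, value) => (value === null ? undefined : value));
}

const cleaned = stripNulls('{"a":1,"b":null,"c":{"d":null,"e":2}}');
console.log(cleaned);
```

Null elements inside arrays are kept, because JSON.stringify writes null for an array slot whose replacer result is undefined.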

You can use MongoDB's updateMany function, but you have to name the field you want to unset explicitly, for example the year field:

 db.collection.updateMany({year: null}, { $unset : { year : 1 }}) 
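
When the document format is known in advance, the same statement can be generated for each field. The sketch below is plain JavaScript that only builds the operation documents; the field names in nullableFields and the helper buildUnsetOps are hypothetical and stand in for your own schema.

```javascript
// Hypothetical list of fields that may hold null in this schema.
const nullableFields = ["year", "title", "rating"];

// Builds one updateMany operation per field, mirroring the statement above.
function buildUnsetOps(fields) {
  return fields.map((field) => ({
    updateMany: {
      filter: { [field]: null },
      update: { $unset: { [field]: 1 } },
    },
  }));
}

const ops = buildUnsetOps(nullableFields);
console.log(JSON.stringify(ops, null, 2));
// In the mongo shell, the list could then be passed to db.collection.bulkWrite(ops).
```

Note that a filter such as { year: null } also matches documents in which the field is missing. Unsetting a missing field changes nothing, so those documents are simply left as they are.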

Starting in Mongo 4.2, db.collection.update() can accept an aggregation pipeline, which finally makes it possible to remove a field based on its value:

    // { _id: ObjectId("5d0e8...d2"), property1: "value1", property2: "value2" }
    // { _id: ObjectId("5d0e8...d3"), property1: "value1", property2: null, property3: "value3" }

    db.collection.update(
      {},
      [{ $replaceWith: {
        $arrayToObject: {
          $filter: {
            input: { $objectToArray: "$$ROOT" },
            as: "item",
            cond: { $ne: ["$$item.v", null] }
          }
        }
      }}],
      { multi: true }
    )

    // { _id: ObjectId("5d0e8...d2"), property1: "value1", property2: "value2" }
    // { _id: ObjectId("5d0e8...d3"), property1: "value1", property3: "value3" }

In detail:

  • The first part {} is the match query, which selects the documents to update (in our case, all documents).

  • The second part [{ $replaceWith: {... }] is the update aggregation pipeline (note the square brackets that indicate the use of the aggregation pipeline):

    • Using $objectToArray we first convert the document into an array of key/value pairs, such as [{ k: "property1", v: "value1" }, { k: "property2", v: null },...] .
    • Using $filter we filter this array of key/value pairs, dropping the elements for which v is null .
    • Then we convert the filtered array of key/value pairs back to an object using $arrayToObject .
    • Finally, we replace the entire document with $replaceWith .
  • Do not forget { multi: true } , otherwise only the first matching document will be updated.
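
The steps above can be mirrored in plain JavaScript to make the data flow easier to follow. This is only an analogue that runs outside MongoDB: the functions objectToArray, arrayToObject and dropNullFields are hypothetical stand-ins for the pipeline operators, and the _id is written as a plain string for simplicity.

```javascript
// Analogue of $objectToArray: { a: 1 } -> [{ k: "a", v: 1 }]
function objectToArray(doc) {
  return Object.entries(doc).map(([k, v]) => ({ k, v }));
}

// Analogue of $arrayToObject: [{ k: "a", v: 1 }] -> { a: 1 }
function arrayToObject(pairs) {
  return Object.fromEntries(pairs.map(({ k, v }) => [k, v]));
}

// Analogue of the whole $replaceWith stage for a single document.
function dropNullFields(doc) {
  const pairs = objectToArray(doc);
  const kept = pairs.filter((item) => item.v !== null); // the $filter / $ne step
  return arrayToObject(kept);
}

const before = { _id: "d3", property1: "value1", property2: null, property3: "value3" };
const after = dropNullFields(before);
console.log(after);
```

As in the pipeline, only top-level fields are inspected; a null nested inside a sub-document is left alone.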


Source: https://habr.com/ru/post/973765/

