XML vs MongoDB

I have a problem...

I need to store a daily flood of about 3,000 moderately sized XML documents (100 to 200 data elements each).

The data is somewhat unstable in the sense that the schema changes from time to time, and the changes are not announced with enough advance notice; they have to be handled retroactively with emergency fixes.

The data is consumed by both a website and some simple analytics (some averages and pie charts).

MongoDB seems like a great solution except for one problem: it requires converting between XML and JSON. I would rather keep the XML documents intact as they arrive and push any intelligent processing to the data consumer. That way, any bugs in the data loading code cannot cause permanent damage. Bugs in the consumer(s) are always harmless, since you can fix them and re-run without permanent data loss.
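The "store intact, process in the consumer" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: SQLite from the Python standard library stands in for whatever store is finally chosen, and the element names (`amount`, `totals/amount`) and function names are hypothetical, picked to show how a consumer can absorb a schema change without touching the stored documents.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store(path=":memory:"):
    """Open (or create) a store that keeps each document verbatim."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_docs ("
        " id INTEGER PRIMARY KEY,"
        " received_at TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def store_raw(conn, received_at, xml_text):
    # The loader does no parsing at all: the XML is saved exactly as it arrived.
    conn.execute(
        "INSERT INTO raw_docs (received_at, body) VALUES (?, ?)",
        (received_at, xml_text),
    )
    conn.commit()


def read_amount(xml_text):
    # Consumer-side logic. Hypothetical schema change: the amount moved
    # from <amount> to <totals><amount>, so both paths are tried.
    root = ET.fromstring(xml_text)
    for path in ("totals/amount", "amount"):
        value = root.findtext(path)
        if value:
            return float(value)
    return None


def total_amount(conn):
    # Re-runnable at any time: a bug here is fixed in code, not in the data.
    total = 0.0
    for (body,) in conn.execute("SELECT body FROM raw_docs ORDER BY id"):
        amount = read_amount(body)
        if amount is not None:
            total += amount
    return total
```

If the schema changes again, only `read_amount` needs another path; the stored documents stay untouched and the analytics can simply be recomputed.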

I do not need "massively parallel" processing capabilities. This is about 4 GB of data that fits comfortably on a 64-bit server.

I have ruled out Cassandra (due to its complicated setup) and CouchDB (due to the lack of familiar features such as indexing, which I would need because of my RDBMS mindset).

So finally my real question is ...

Is it worth looking for a native XML database, none of which are as mature as MongoDB, or should I bite the bullet, convert all the XML to JSON as it arrives, and just use MongoDB?

2 answers

You could take a look at BaseX (Basex.org), which has a built-in XQuery processor and Lucene text indexing.


This amount of data is small.

If there is no need for parallel data processing, there is no need for MongoDB. Especially with small amounts of data such as 4 GB, the overhead of distributing the work can easily exceed the actual evaluation effort.

4 GB / 60k nodes is not large for an XML database either. After a while, you will come to appreciate XQuery as a great tool for analyzing XML documents.

Is it really?

Or do you receive 4 GB daily and need to evaluate it together with all the data you have already stored? Then you will eventually reach a volume that you can no longer store and process on one machine, and distributing the work will become necessary. Not within days or weeks, but a single year will already bring you more than 1 TB.

Convert to JSON

How is your data structured? Does it adhere to any schema, or even resemble tabular data? MongoDB's capabilities for analyzing semi-structured data are worse than what XML databases provide. On the other hand, if you only want to pull a few fields from well-defined paths, and you can analyze one input file after another, MongoDB probably won't suffer much.
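As a rough illustration of the "few fields from well-defined paths" case, here is a minimal sketch using only the Python standard library. The element names, the `FIELD_PATHS` mapping, and the function names are hypothetical and not taken from the question; the point is only that a fixed path-to-field mapping turns each XML document into a flat, JSON-ready record.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping from output field name to a fixed path in the XML.
FIELD_PATHS = {
    "id": "header/id",
    "customer": "header/customer",
    "amount": "totals/amount",
}


def xml_to_record(xml_text, field_paths=FIELD_PATHS):
    """Pull a few fields out of one document; missing paths become None."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(path) for name, path in field_paths.items()}


def xml_to_json(xml_text):
    # The record is a plain dict, so it serializes directly to JSON.
    return json.dumps(xml_to_record(xml_text), sort_keys=True)
```

A record produced this way is a plain dictionary, so it could be handed to a document store as-is. Anything not listed in the mapping is dropped, which is why this only suits the case where the set of interesting fields is small and stable.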

Migrate XML to the Cloud

If you want both the capabilities of an XML database for analyzing the data and some of the capabilities of NoSQL systems for distributing the work, you could run the XML database from within such a system.

BaseX is moving to the cloud with exactly the capabilities you need, but it will probably still take some time before that feature is production-ready.


Source: https://habr.com/ru/post/1501961/

