Processing large XML documents in memory

I need to store a very large number of XML documents in memory (most likely in Oracle Coherence as a distributed cache). Around 100,000 XML documents are expected to be held in memory, and they are quite large: approx. 250 KB each. Other systems query these documents, but each system only asks for the part of the XML that is relevant to it. They will also request changes to the XML content. The load will be about 300 such requests per minute, split more or less evenly between reads and updates. It is important to note that the XML is not structured, so I will not have an XSD for it, but I do have an algorithm for retrieving and updating the XML.

My question is which will give the best performance: storing the XML documents in memory as they are and doing all extraction and updates on them with XQuery (or even hand-coded procedures), or converting the XML to objects, manipulating those in code, and converting them back to XML when other systems request them?

+4
2 answers

You have 100,000 documents of 250 KB each, which is roughly 24 GB of raw data. If you hold that in memory in a form you can process, filter, or update, you should expect an additional blow-up factor, say 10x, which puts the required memory capacity at around 240 GB.
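For reference, a quick sanity check of those numbers in Java (the 10x blow-up factor is the answer's assumption, not a measured value):

    public class MemoryEstimate {
        public static void main(String[] args) {
            long docs = 100_000;
            long bytesPerDoc = 250L * 1024;  // ~250 KB per document
            double blowUp = 10.0;            // assumed in-memory processing overhead

            double rawGb = docs * bytesPerDoc / (1024.0 * 1024 * 1024);
            System.out.printf("raw data:     %.1f GB%n", rawGb);           // ~23.8 GB
            System.out.printf("with blow-up: %.0f GB%n", rawGb * blowUp);  // ~238 GB
        }
    }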

So, if you have enough memory, that is certainly the best place to keep the data. But you need a backup strategy (what happens when the data outgrows the memory of your nodes?), and it gets more complicated still if you don't want to lose updates: what happens if a machine goes down? If you update in memory, when do you flush those updates to disk? There is plenty more to think about here.
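With Coherence specifically, one common answer to "when do you flush in-memory updates to disk?" is a write-behind CacheStore behind the distributed cache. Below is a minimal sketch assuming a hypothetical file-per-document persistence scheme; the class name and path are invented for illustration, and it extends Coherence's com.tangosol.net.cache.AbstractCacheStore:

    import com.tangosol.net.cache.AbstractCacheStore;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Hypothetical store that persists each cached XML document to its own file.
    // With a write-behind delay configured on the cache, Coherence batches calls
    // to store(), so in-memory updates reach disk without blocking clients.
    public class XmlFileCacheStore extends AbstractCacheStore {
        private final Path dir = Paths.get("/var/data/xml-store"); // illustrative path

        @Override
        public Object load(Object key) {
            try {
                Path file = dir.resolve(key + ".xml");
                if (!Files.exists(file)) {
                    return null; // not persisted yet
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new RuntimeException("load failed for " + key, e);
            }
        }

        @Override
        public void store(Object key, Object value) {
            try {
                Files.write(dir.resolve(key + ".xml"),
                            ((String) value).getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new RuntimeException("store failed for " + key, e);
            }
        }

        @Override
        public void erase(Object key) {
            try {
                Files.deleteIfExists(dir.resolve(key + ".xml"));
            } catch (IOException e) {
                throw new RuntimeException("erase failed for " + key, e);
            }
        }
    }

Write-behind itself is enabled in the cache configuration (a write-delay on a read-write backing map), which is what lets the update half of the 300 requests per minute be absorbed in memory first.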

However, to answer your second question: convert to objects or not? Most people are tempted to map XML onto objects in PHP, Ruby, Java, .NET or the like, or even to shred XML into SQL databases. If you want an honest answer: do not do this unless you have time and money to spare. Objects carry a large overhead of additional analysis, design, parsing, marshalling, testing, maintenance... In effect, they throw away the flexibility of XML, and I see that constantly underestimated. In my experience with XML and XQuery, keeping the data as XML saves about 80% on average across the things listed above.
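To illustrate the keep-it-as-XML approach, the sketch below pulls a single fragment out of a document using the JDK's built-in XPath engine (a stand-in for XQuery here, since XQuery in Java needs a third-party engine such as Saxon). The document shape and the query are invented for the example:

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;

    public class XmlFragmentQuery {
        public static void main(String[] args) throws Exception {
            // Stands in for one 250 KB document fetched from the cache.
            String xml = "<order id='42'><customer><name>ACME</name></customer></order>";

            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            // Extract only the fragment the caller asked for: no object mapping
            // and no schema, so a new element elsewhere in the document
            // changes nothing here.
            XPath xpath = XPathFactory.newInstance().newXPath();
            String name = xpath.evaluate("/order/customer/name", doc);
            System.out.println(name); // ACME
        }
    }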

Also, if you force flexible XML data into objects, you are in for a nightmare as soon as your data structures evolve.

You might want to look at 28msec, a scalable database for flexible data offered as a PaaS in the cloud. There you get everything you need out of the box (load balancing, automatic recovery, persistence management, replication, backup, failover, scaling in and out, elasticity, memory management, and so on).

This is just my personal opinion, but perhaps it helps with at least some aspects of your problem.

+7

I assume it will be faster in memory (if you have enough of it). But as with all performance questions, there is a big "it depends" here. You need to measure against your actual usage patterns.

0

Source: https://habr.com/ru/post/1395683/
