Best way to manage shared read/write access to a resource

One of my needs is to manage a shared resource (more like a read/write log) among different processes (and also several threads) in the application. The data should also survive system restarts, so it has to be a physical file or a database.

The shared resource is key/value data, so the possible operations on it are adding new key/value entries and updating or deleting existing ones.

Therefore, I am thinking of using an XML file to store the information physically; a sample of its content would be:

    <Root>
      <Key1>Value</Key1>
      <Key2>Value</Key2>
      <Key3>Value</Key3>
    </Root>

The interface for the read and write operations would look like this:

    public interface IDataHandler
    {
        IDictionary<string, string> GetData();
        void SetData(string key, string value);
    }

I can assume that the data will not exceed 500 MB, hence the XML idea; if the data grows beyond that, I will move it to a database. In addition, writes will be more frequent than reads.

A few design questions/considerations related to the above scenario:

Is it feasible to handle 500 MB of data in an XML file?

Assuming the file stays XML, how do I take care of performance?

  • I am thinking of caching the data as a dictionary (the MemoryCache class in .NET) to get good read performance. Is it OK to cache 500 MB of data in memory, or is there another option?
  • Now, if I use the above caching mechanism, what should happen during a write operation:
    - Do I have to serialize the whole dictionary back to XML on each write, or is there a way to update only the part of the XML file whose data was modified or added? Or is there another way to handle this scenario?
    - Should I improve performance further by putting write operations in a queue, with a background thread that reads the queue and performs the actual write, so the caller who writes the data does not suffer the cost of the file I/O? (A minimal sketch of this idea follows the list.)
    - To handle the multi-threaded/multi-process scenario I am planning to use a Mutex with a global name. Is there a better way to do this?
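A minimal sketch of the cache-plus-queued-writer idea (the class name, and the use of ConcurrentDictionary/BlockingCollection in place of MemoryCache and a hand-rolled queue, are illustrative assumptions, not a finished design):

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading;

    public class CachedDataHandler : IDataHandler
    {
        // In-memory copy for fast reads (MemoryCache would work here too).
        private readonly ConcurrentDictionary<string, string> _cache =
            new ConcurrentDictionary<string, string>();

        // Pending writes; one background thread drains this queue.
        private readonly BlockingCollection<KeyValuePair<string, string>> _writeQueue =
            new BlockingCollection<KeyValuePair<string, string>>();

        public CachedDataHandler()
        {
            var writer = new Thread(() =>
            {
                foreach (var item in _writeQueue.GetConsumingEnumerable())
                {
                    // Persist item.Key/item.Value to the XML file or database here.
                    // The caller has already returned, so file I/O never blocks it.
                }
            });
            writer.IsBackground = true;
            writer.Start();
        }

        public IDictionary<string, string> GetData()
        {
            // Hand out a snapshot so callers cannot mutate the cache directly.
            return new Dictionary<string, string>(_cache);
        }

        public void SetData(string key, string value)
        {
            _cache[key] = value; // readers see the new value immediately
            _writeQueue.Add(new KeyValuePair<string, string>(key, value));
        }
    }

Note that this only serializes writes within one process; coordinating several processes still needs something like the globally named Mutex, or one of the approaches from the answers below.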

I am aware that I am working from a few assumptions and building on them; if any of those assumptions are wrong, it would change most of the design. So a completely new solution is also welcome (keeping performance as the main criterion). Thanks in advance.

+6

7 answers

As you said that writes outnumber reads, I assume the data grows quickly, so my suggestion is to start developing against a database. It does not have to be a full database server such as MSSQL or MySQL; you can start with SQLite or SQL Server Compact. This makes your application future-proof with respect to handling larger volumes of data.

Keeping read-heavy data that does not change much, such as configuration, in RAM is efficient. My suggestion is to use a cache manager such as MemoryCache or the Enterprise Library Caching Block instead of writing your own; it saves a lot of time implementing thread-safe data access, and a lot of nightmares :)

    public interface IDataHandler
    {
        IDictionary<string, string> GetData();
        void SetData(string key, string value);
    }

    public class MyDataHandler : IDataHandler
    {
        public IDictionary<string, string> GetData()
        {
            return CacheManager.GetData("ConfigcacheKey") as IDictionary<string, string>;
        }

        public void SetData(string key, string value)
        {
            // Start from the cached dictionary, or a fresh one if the cache is empty.
            var data = GetData() ?? new Dictionary<string, string>();

            if (data.ContainsKey(key))
                data[key] = value;
            else
                data.Add(key, value);

            CacheManager.Add("ConfigcacheKey", data);

            // HERE write an async method to save the key/value in the database or XML file
        }
    }

If you intend to use XML, you do not need to re-serialize the dictionary to XML every time. Load the document into an XmlDocument/XDocument object, use XPath to find the element whose value needs updating (or add a new element), and save the document.
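For example, a minimal sketch with XDocument and XPath (the file name and key are assumptions):

    using System.Xml.Linq;
    using System.Xml.XPath;

    var doc = XDocument.Load("data.xml");

    // Find the element for the key; update it in place, or add it if missing.
    var element = doc.XPathSelectElement("/Root/Key2");
    if (element != null)
        element.Value = "NewValue";
    else
        doc.Root.Add(new XElement("Key2", "NewValue"));

    doc.Save("data.xml");

Keep in mind that Save still rewrites the whole file on disk; only the in-memory update is partial, so this saves you re-serializing the dictionary, not rewriting the file.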

In terms of performance, unless you are doing some crazy logic or processing truly huge data (I mean gigabytes), I recommend you stick with readily available, battle-tested components such as databases and cache managers that abstract the thread-safety concerns away from you.

+3

I see two possible approaches to this problem:

  • Use a database. IMO this is the preferred approach, since concurrent read/write access from multiple applications is exactly what databases are for.
  • Use a service application that manages the resource and is reachable by the other applications (pipes, sockets, shared memory, ...).

Critical points to remember:

  • A global Mutex does not work across machines (the XML file may live on a network share; if you cannot rule that out as "unsupported", you should not use a Mutex).
  • A "lock file" can leave stale locks (for example, if the process that created the lock file is killed, the file may remain on disk).
  • XML is a very poor format for a file that is updated frequently by several processes (if every access requires load-update-write, performance will be very poor).
+2

Base your design on the principles from this answer on Stack Overflow:

How to log efficiently asynchronously?

As you mentioned in one of your own considerations, the solution there involves threads and queues.

Also, instead of serializing the data to XML, you can probably get better performance using BinaryFormatter.
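For instance, a whole dictionary can be round-tripped in one call; this sketch assumes a file name and that the dictionary fits in memory:

    using System.Collections.Generic;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    var data = new Dictionary<string, string> { { "Key1", "Value" } };
    var formatter = new BinaryFormatter();

    // Serialize the dictionary as one compact binary blob.
    using (var stream = File.Create("data.bin"))
        formatter.Serialize(stream, data);

    // Deserialize it back.
    using (var stream = File.OpenRead("data.bin"))
        data = (Dictionary<string, string>)formatter.Deserialize(stream);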

+1
source

In terms of performance, XML files become very slow once the size exceeds 100 MB. Our requirement was to read and write ~1 GB of data on disk, with reads and writes running in parallel: data arrived from one stream and was written to a file, while another application could request the data for display in a graph or other UI. We moved to binary readers and writers, ran a performance analysis, and BinaryReader/BinaryWriter was much faster than XML for larger files.
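Something along these lines; the append-only, length-prefixed record layout here is an illustrative assumption, not the format we actually used:

    using System.IO;

    // Writer: append key/value records (BinaryWriter length-prefixes strings).
    using (var writer = new BinaryWriter(File.Open("data.bin", FileMode.Append)))
    {
        writer.Write("Key1");
        writer.Write("Value");
    }

    // Reader: scan the records back.
    using (var reader = new BinaryReader(File.OpenRead("data.bin")))
    {
        while (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            string key = reader.ReadString();
            string value = reader.ReadString();
            // use key/value, e.g. feed the chart
        }
    }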

We have since switched to HDF5, and we work with 20 GB data files with simultaneous read and write operations.

A Mutex with a global name should work; we used the same approach.
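A minimal sketch of that; the mutex name is an assumption and just needs to be unique to your application (the Global\ prefix makes it visible across all sessions):

    using System.Threading;

    // false = do not take ownership at creation time
    using (var mutex = new Mutex(false, @"Global\MyAppDataFile"))
    {
        mutex.WaitOne(); // blocks until no other thread/process holds it
        try
        {
            // read or write the shared file here
        }
        finally
        {
            mutex.ReleaseMutex();
        }
    }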

+1

I would start with a simple, lightweight gatekeeper process that is solely responsible for access to the data file. The other processes talk to the gatekeeper (e.g., via .NET Remoting in this scenario, through the IDataHandler interface) and never manipulate the file directly. That way you not only abstract away the multiple-access problems, but also gain several benefits (a minimal hosting sketch follows the list):

  • a lightweight, simple process is much more reliable and will not corrupt your data if any of the "consumer" processes crashes;
  • you have a single piece of code in which to handle reliability, locking, sharing, etc.;
  • whenever you decide to swap XML for something else, there is only one place to change the technology.
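A minimal sketch of hosting such a gatekeeper over .NET Remoting's IPC channel; the channel name, object URI, and class names are assumptions for illustration:

    using System;
    using System.Collections.Generic;
    using System.Runtime.Remoting;
    using System.Runtime.Remoting.Channels;
    using System.Runtime.Remoting.Channels.Ipc;

    // The gatekeeper owns the data file; all access is funneled through it.
    public class DataHandlerService : MarshalByRefObject, IDataHandler
    {
        private readonly Dictionary<string, string> _data = new Dictionary<string, string>();
        private readonly object _sync = new object();

        public IDictionary<string, string> GetData()
        {
            lock (_sync) return new Dictionary<string, string>(_data);
        }

        public void SetData(string key, string value)
        {
            lock (_sync)
            {
                _data[key] = value;
                // persist to the XML file here; this is the only process touching it
            }
        }
    }

    class Host
    {
        static void Main()
        {
            ChannelServices.RegisterChannel(new IpcChannel("DataHandlerChannel"), false);
            RemotingConfiguration.RegisterWellKnownServiceType(
                typeof(DataHandlerService), "DataHandler.rem", WellKnownObjectMode.Singleton);
            Console.ReadLine(); // keep the gatekeeper alive
        }
    }

    // In a consumer process:
    // var handler = (IDataHandler)Activator.GetObject(
    //     typeof(IDataHandler), "ipc://DataHandlerChannel/DataHandler.rem");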
+1

A database; do not hesitate.

If you are reluctant to stand up another server, just use SQL CE over a shared file on a network drive (as long as you need fewer than 256 simultaneous connections).

There is no huge database server to maintain, and you still get strongly typed data and all the other good things that come with a database, such as indexes, keys, and so on.

If nothing else, it does not have to perform a linear scan of the entire file every time you want to find (or update, or delete, or even add, if you want unique keys) an entry.

You are literally describing a hash table mapping keys to values. Do not use the equivalent of an array of tuples for storage; use a real persistent store.

The only advantage an XML file gives you (if you can even use it well) is that it is human-readable and editable (and is that even a bonus? Is SSMS that hard to use?).

Disadvantages:

1) A linear scan for every query.
2) No security or password protection at the application level: anyone can edit the XML file, whereas SQL CE can encrypt and password-protect the data.
3) Untyped data.
4) A verbose format (seriously, JSON would be better: faster, smaller, typed, and still human-readable).
5) SQL > XPath/XSLT.
6) If your data requirements grow, you get constraints and keys built in.

I cannot think of a more capable solution at a lower cost than an SQL CE instance.
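For completeness, a minimal upsert sketch against SQL CE; the connection string and the KeyValue table (created once elsewhere, e.g. with [Key] as the primary key) are assumptions:

    using System.Data.SqlServerCe;

    var connStr = @"Data Source=\\server\share\data.sdf";
    using (var conn = new SqlCeConnection(connStr))
    {
        conn.Open();
        using (var cmd = conn.CreateCommand())
        {
            cmd.Parameters.AddWithValue("@k", "Key1");
            cmd.Parameters.AddWithValue("@v", "Value");

            // Try the update first; insert only when no row matched.
            cmd.CommandText = "UPDATE KeyValue SET [Value] = @v WHERE [Key] = @k";
            if (cmd.ExecuteNonQuery() == 0)
            {
                cmd.CommandText = "INSERT INTO KeyValue ([Key], [Value]) VALUES (@k, @v)";
                cmd.ExecuteNonQuery();
            }
        }
    }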

+1

First of all, you should forget about using XML for high-performance systems. I would suggest JSON instead: it is lightweight, and many high-performance applications, such as Foursquare, use JSON to store their data (although not for all of it).

It is also better to try one of the NoSQL databases rather than a relational database, since they are designed specifically for high-performance systems, and several of them can store raw JSON documents. I would suggest MongoDB (there is a C# driver that supports LINQ). There are many other document-oriented NoSQL databases, but I have not used them.

For concurrency, you can use one of the concurrent collections, especially ConcurrentDictionary<TKey, TValue>, so you do not have to worry about synchronization issues yourself.
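For example, a minimal sketch; every operation below is thread safe without any explicit locks:

    using System.Collections.Concurrent;

    var cache = new ConcurrentDictionary<string, string>();

    // Add the key or overwrite its value atomically.
    cache.AddOrUpdate("Key1", "Value", (key, oldValue) => "Value");

    // Read without locking.
    string value;
    if (cache.TryGetValue("Key1", out value))
    {
        // use value
    }

    // Remove safely even if another thread races with us.
    string removed;
    cache.TryRemove("Key1", out removed);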

+1

Source: https://habr.com/ru/post/973901/
