Large static arrays slow down the load on the class, you need a better / faster search method

I have a class with several static arrays:

int [] with 17,720 elements
string [] with 17,720 elements

I noticed that when I first access this class, it takes almost 2 seconds to initialize, which causes a pause in the graphical interface that accesses it.

In particular, it is a search for Unicode character names. The first array is the index into the second array.

static readonly int[] NAME_INDEX = {
0x0000, 0x0001, 0x0005, 0x002C, 0x003B, ...

static readonly string[] NAMES = {
"Exclamation Mark", "Digit Three", "Semicolon", "Question Mark", ...

The following code is how arrays are used (given the character code). [Note: this code is not a performance issue]

int nameIndex = Array.BinarySearch<int>(NAME_INDEX, code);
if (nameIndex > 0) { return NAMES[nameIndex]; }

I think that I am looking at other options for structuring data so that 1) the class loads quickly and 2) I can quickly get the "name" for the given character code.

Should I store all these thousands of elements in static arrays?

Update
Thanks for all the suggestions. I tested the Dictionary approach, and the performance of adding all entries seems very poor.

Below is the code with Unicode data for testing arrays and dictionaries http://drop.io/fontspace/asset/fontspace-unicodesupport-zip

Solution Update
I tested my original double arrays (which are faster than both versions), with a background thread for initialization, which helped performance a bit.

However, the real surprise is how well binary files work in resource streams. This is the fastest solution discussed in this thread. Thank you all for your answers!

+4
source share
6 answers

"Should I store all these thousands of elements in static arrays?"

It would be best to store your data as a binary stream in resources in the assembly, and then load from resources. There will be some more additional utilities, but, therefore, no initialization of objects is required.

The main idea would be (no real code):

 // Load data (two streams): indices = ResourceManager.GetStream ("indexData"); strings = ResourceManager.GetStream ("stringData"); // Retrieving an entry: stringIndex = indices.GetIndexAtPosition (char); string = strings.GetStringFromPosition (stringIndex); 

If you want a really good solution (even for some extra work), look at using data memmapped files.

+3
source

So, a couple of observations. Binary search will only work if your array is sorted and it doesn't look sorted from your code snippet above.

Since your main goal is to find a specific name, your code will ask for a hash table. I would suggest using a dictionary, it will give you an O (1) (on average) search, without much overhead than just having arrays.

Regarding boot time, I agree with Andrey that the best way is to use a separate thread. When using data volume, you will have some initialization overhead. A common practice with GUIs is to use a separate thread for these actions, so you do not block the user interface.

+7
source

First

A Dictionary<int, string> will work much better than your duel arrays. Putting aside how this data gets into arrays / Dictionary (hardcoded or read from another place, such as a resource file), this is still a better and more intuitive storage mechanism.

Second

Like others, do your upload in a different thread. I would use a helper function to help you deal with this. You can use this approach:

 public class YourClass { private static Dictionary<int, string> characterLookup; private static ManualResetEvent lookupCreated; static YourClass() { lookupCreated = new ManualResetEvent(false); ThreadPool.QueueUserWorkItem(LoadLookup); } static void LoadLookup(object garbage) { // add your pairs by calling characterLookup.Add(...) lookupCreated.Set(); } public static string GetDescription(int code) { if (lookupCreated != null) { lookupCreated.WaitOne(); lookupCreated.Close(); lookupCreated = null; } string output; if(!characterLookup.TryGetValue(code, out output)) output = null; return output; } } 

In your code, call GetDescription to translate the integer to the appropriate line. If the user interface does not call this until a later time, you should see a noticeable decrease in startup time. To be safe, I have included ManualResetEvent , which will cause any GetDescription calls to block until the dictionary is fully loaded.

+5
source
+3
source

if you store arrays in a file you can do lazy loading

 public class Class1 { const int CountOfEntries = 17700; //or what ever the count is IEnumerable<KeyValuePair<int, string>> load() { using (var reader = File.OpenText("somefile")) { while (!reader.EndOfStream) { var line = reader.ReadLine(); var pair = line.Split(','); yield return new KeyValuePair<int, string>(int.Parse(pair[0]), pair[1]); } } } private static Dictionary<int, string> _lookup = new Dictionary<int, string>(); private static IEnumerator<KeyValuePair<int, string>> _loader = null; private string LookUp(int index) { if (_lookup.Count < CountOfEntries && !_lookup.ContainsKey(index)) { if(_loader == null) { _loader = load().GetEnumerator(); } while(_loader.MoveNext()) { var pair = _loader.Current; _lookup.Add(pair.Key,pair.Value); if (pair.Key == index) { return index; } } } string name; if (_lookup.TryGetValue(index,out name)) { return return name; } throw new KeyNotFoundException("The given index was not found"); } } 

the code expects the file to have one pair on each line: index0, NAME0 index1, name1

If the first requested index is at the end, it will work more slowly, probably (due to IO mostly), but if access is random, the average case will read half the values ​​for the first time, if access is not random, make sure to store the most used at the top of the file

There are a few more questions to discuss. The above code is not thread safe for the download operation and improves the responsiveness of the rest of the code, while keeping the download in the background thread

hope this helps

0
source

How about using a dictionary instead of two arrays? You can initialize the dictionary asynchronously using a stream or thread pool. The search will be O (1) instead of O (log (n)).

 public static class Lookup { private static readonly ManualResetEvent m_Initialized = new ManualResetEvent(false); private static readonly Dictionary<int, string> m_Dictionary = new Dictionary<int, string>(); public static Lookup() { // Start an asynchronous operation to intialize the dictionary. // You could use ThreadPool.QueueUserWorkItem instead of creating a new thread. Thread thread = new Thread(() => { Initialize(); }); thread.Start(); } public static string Lookup(int code) { m_Initialized.WaitOne(); lock (m_Dictionary) { return m_Dictionary[code]; } } private static void Initialize() { lock (m_Dictionary) { m_Dictionary.Add(0x0000, "Exclamation Point"); // Keep adding items to the dictionary here. } m_Initialized.Set(); } } 
0
source

Source: https://habr.com/ru/post/1308805/


All Articles