How to search using a list of strings?

I have a list of strings (a List<String> ) that can contain from 1 to 6 entries. What I want to do is use this list of strings for the search, but I want the possible searches to use any combination of two or more of these strings for the search. I used Dictionary<List<String>, String> currently.

ex. Suppose my list contains the following: fire, aero, thunder, water, blizzard, and I have the following entries in the dictionary:

 List<String>(){"fire", "aero"}, "searing wind" List<String>(){"fire", "aero", "thunder"} "firestorm" List<String>(){"aero", "thunder"}, "storm" List<String>(){"aero", "water", "blizzard"}, "snowstorm" List<String>(){"aerora", "blizzara"}, "hailstorm" 

I want the search to return the first 4 records, since my base list contains all the values ​​needed to find them. I also need to know what values ​​were used for the search, since these values ​​will need to be cleared from the base list later. The number of entries in the dictionary is likely to be ~ 400

I can think of an exhaustive way to do this search, but since the fact that order will matter when doing the search, it will take time to do all the permutations and find them. I could activate the alphabetical order in the dictionary keyword lists, if that helps. Does anyone know a better way to do this, or perhaps another, more efficient way to do this? I already use sqlite for some other things in this program, so if this allows me to find faster, I could use this.

thanks

+4
source share
1 answer

One option that you might want to explore is to use the decision tree. The idea would be like that. Select an arbitrary row, then divide all your sets into two groups β€” groups containing this row and groups not containing this row. Then, recursively repeat this procedure in both groups and build a tree from all the decisions you made. For example, we introduce the abbreviation:

A = aero

R = Aerora

F = Fire

T = Thunder

W = Water

B = blizzard

Then you can build a tree like this:

 start --> A? -- NO --> R? -- YES --> B? -- YES --> "hailstorm" | +--- YES --> F? -- YES --> T? -- YES --> "firestorm" | | | +----- NO --> "searing wind" | +----- NO --> T? -- YES --> "storm" | +----- B? -- YES --> "snowstorm" 

When you have a tree like this, you can save your attributes as a set of strings, and then view all matches as follows. Starting at the root of the tree, look at the line indicated by the given node. If this string is in your rowset, then recursively continue down the YES branch and find all matches in that part of the tree. Then, regardless of whether you looked at this branch, examine the branch "NO" to get all the other lines that could match your request.

The advantage of this approach is that if you think that you have a small number of lines as keywords, the depth of the tree can be very small - at most O (k) for k keywords - and so, at best, your search will be take only time O (k). In the worst case scenario, you simply examine the whole tree, which takes O (n) time. Moreover, using machine learning methods, you can build a very good tree structure that will have a solid compromise between the size and speed of the search.

Hope this helps!

+1
source

Source: https://habr.com/ru/post/1341574/


All Articles