The fastest way to search a huge list of large texts

I have a Windows application written in C# that loads 250,000 rows from the database and provides a "search as you type" feature, meaning that as soon as the user types something into the text box, the application should search all 250,000 records (that's quite a lot: one column with 1000 characters in each row) using a LIKE-style search and display the matching records.

The approach I used was as follows:

1- The application loads all entries into a typed List&lt;EmployeesData&gt;

    while (objSQLReader.Read())
    {
        lstEmployees.Add(new EmployeesData(
            Convert.ToInt32(objSQLReader.GetString(0)),
            objSQLReader.GetString(1),
            objSQLReader.GetString(2)));
    }

2- In the TextChanged event, using LINQ, I run the search (with a combination of regular expressions) and bind an IEnumerable&lt;EmployeesData&gt; to the ListView, which is in virtual mode.

    String strPattern = "(?=.*wood)(?=.*james)";
    IEnumerable<EmployeesData> lstFoundItems =
        from objEmployee in lstEmployees
        where Regex.IsMatch(objEmployee.SearchStr, strPattern, RegexOptions.IgnoreCase)
        select objEmployee;
    lstFoundEmployees = lstFoundItems;

3- The RetrieveVirtualItem event is handled to supply each item displayed in the ListView.

    e.Item = new ListViewItem(new String[] {
        lstFoundEmployees.ElementAt(e.ItemIndex).DateProjectTaskClient,
        e.ItemIndex.ToString() });

Although lstEmployees loads relatively quickly from SQL Server (1.5 seconds), the search in TextChanged takes more than 7 minutes using LINQ. Searching directly in SQL Server with a LIKE query takes less than 7 seconds.

What am I doing wrong here? How can I speed this search up (to no more than 2 seconds)? That is a requirement from my client, so any help is much appreciated. Please help...

+6
4 answers

Is there an index on the database column that stores the text data? If so, then something similar to the trie structure described by Nicholas is already in use. Indexes in SQL Server are implemented using B+ trees, which have an average search time on the order of log₂ n, where n is the number of records (proportional to the height of the tree). This means that if the table contains 250,000 records, the number of operations required for a search is log₂(250,000), or approximately 18 operations.

When you load all the information through a data reader and then search it with a LINQ expression, that is a linear operation, O(n), where n is the length of the list. In the worst case that is 250,000 operations. If you use a DataView instead, there are indexes that can be used during the search, which will greatly improve performance.
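A minimal sketch of the DataView approach described above. The table, column names, and sample rows are illustrative assumptions, not from the question; the point is that a DataView supports LIKE-style row filters, and sorting the view builds an index usable by Find/FindRows.

```csharp
using System;
using System.Data;

// Build a small in-memory table; in the real app this would be the
// 250,000-row result set. "SearchStr" stands in for the text column.
var table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("SearchStr", typeof(string));
table.Rows.Add(1, "james wood project alpha");
table.Rows.Add(2, "mary smith project beta");
table.Rows.Add(3, "james smith task gamma");

// A DataView can filter with a LIKE-style expression, and sorting the
// view maintains an index that DataView.Find/FindRows can exploit for
// fast key lookups, unlike a plain LINQ scan over a List<T>.
var view = new DataView(table)
{
    RowFilter = "SearchStr LIKE '%james%'",
    Sort = "Id"
};

Console.WriteLine(view.Count);          // number of matching rows
foreach (DataRowView row in view)
    Console.WriteLine(row["SearchStr"]);
```

Note that DataColumn filter expressions allow wildcards at the start or end of the pattern, but not in the middle of it.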

At the end of the day, if there are not too many queries hitting the database server, let the query optimizer do the work. As long as the LIKE operation is not performed with a wildcard at the beginning of the string (i.e. LIKE '%some_string', which prevents the index from being used) and the table has an index, you will get very good performance. If there are too many requests going to the database server, either put all the information in a DataView so an index can be used, or use the dictionary suggested above, which has O(1) (constant) lookup time, assuming the dictionary is implemented with a hash table.

+3

See my answer to this question. If you need instant responses (i.e., as fast as the user types), loading the data into memory can be very attractive. It can use a fair bit of memory, but it is very fast.

Even though there are a lot of characters (250 thousand records * 1000), how many unique values exist? An in-memory structure keyed on those values, with pointers to the records matching each key, really should not be that large, even allowing for permutations of those keys.
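One way to build the keyed structure described above is an inverted index: each unique word maps to the set of record ids containing it, so memory scales with the number of unique words rather than the raw character count. The sample rows below are hypothetical; a multi-word lookup intersects the per-word sets, mirroring the (?=.*james)(?=.*wood) regex from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows standing in for the 250,000 records; in
// practice the index is built once, right after loading from SQL.
var rows = new Dictionary<int, string>
{
    [1] = "james wood project",
    [2] = "mary wood task",
    [3] = "james smith task",
};

// Inverted index: word -> ids of rows containing that word.
var index = new Dictionary<string, HashSet<int>>(StringComparer.OrdinalIgnoreCase);
foreach (var (id, text) in rows)
    foreach (var word in text.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        if (!index.TryGetValue(word, out var set))
            index[word] = set = new HashSet<int>();
        set.Add(id);
    }

// A query like "james wood" becomes an intersection of small sets
// instead of a regex scan over every row.
static HashSet<int> Search(Dictionary<string, HashSet<int>> idx, params string[] words)
{
    HashSet<int> result = null;
    foreach (var w in words)
    {
        if (!idx.TryGetValue(w, out var set))
            return new HashSet<int>();       // a missing word means no matches
        if (result == null)
            result = new HashSet<int>(set);
        else
            result.IntersectWith(set);
    }
    return result ?? new HashSet<int>();
}

Console.WriteLine(string.Join(",", Search(index, "james", "wood").OrderBy(i => i))); // 1
```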

If the data really does not fit into memory or changes frequently, keep it in the database and use SQL Server Full-Text Indexing, which handles such searches far better than LIKE. It does mean a round trip from the application to the database, though.

Full-text indexing offers a powerful set of operators/expressions that can be used for smarter searches. It is available in the free SQL Server Express Edition, which handles up to 10 GB of data.
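As a sketch of how the application side might use full-text search: the helper below turns raw user input into a CONTAINS search term requiring every word, with prefix matching to support search-as-you-type. The table and column names (Employees, SearchStr) are assumptions for illustration, not from the question.

```csharp
using System;
using System.Linq;

// Hypothetical helper: "james wood" -> "\"james*\" AND \"wood*\"".
// Quoted prefix terms joined with AND, suitable for passing as a
// parameter to a CONTAINS predicate.
static string BuildContainsTerm(string userInput)
{
    var words = userInput.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    return string.Join(" AND ", words.Select(w => $"\"{w.Replace("\"", "")}*\""));
}

var term = BuildContainsTerm("james wood");
Console.WriteLine(term);

// In the real application the term would be sent as a parameter, e.g.:
// SELECT Id, SearchStr FROM Employees WHERE CONTAINS(SearchStr, @term)
```

Passing the term as a parameter (rather than concatenating it into the SQL string) also avoids injection issues.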

+2

You'll want to preload things and build yourself a data structure called a trie.

It's memory-intensive, but it's what the doctor ordered in this case.
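A minimal trie sketch along the lines of this answer. Each node maps the next character to a child node and records which record ids pass through it, so a prefix typed by the user resolves to candidate rows in time proportional to the prefix length. The words and ids are illustrative assumptions; in the real application you would insert every word from every row's search column.

```csharp
using System;
using System.Collections.Generic;

var trie = new TrieNode();
trie.Insert("james", 1);
trie.Insert("jane", 2);
trie.Insert("wood", 1);

// Typing "ja" immediately yields the records for "james" and "jane".
Console.WriteLine(string.Join(",", trie.FindByPrefix("ja")));
Console.WriteLine(string.Join(",", trie.FindByPrefix("zz"))); // no matches: empty

class TrieNode
{
    private readonly Dictionary<char, TrieNode> _children = new Dictionary<char, TrieNode>();
    private readonly HashSet<int> _ids = new HashSet<int>();

    public void Insert(string word, int recordId)
    {
        var node = this;
        foreach (var c in word)
        {
            if (!node._children.TryGetValue(c, out var child))
                node._children[c] = child = new TrieNode();
            node = child;
            node._ids.Add(recordId);  // every prefix node knows its records
        }
    }

    public IEnumerable<int> FindByPrefix(string prefix)
    {
        var node = this;
        foreach (var c in prefix)
        {
            if (!node._children.TryGetValue(c, out node))
                return Array.Empty<int>();
        }
        return node._ids;
    }
}
```

This is where the memory cost comes from: each record id is stored once per prefix node it reaches.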

+2

If the records can be sorted, you can use binary search, which is much, much faster on large datasets. There are ready-made implementations in the .NET classes List&lt;T&gt; (List&lt;T&gt;.BinarySearch) and Array (Array.BinarySearch).
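A short sketch of this idea with List&lt;T&gt;.BinarySearch (sample names are hypothetical). A hit returns the index in O(log n); a miss returns the bitwise complement of the insertion point, which is exactly what you need to locate the first entry sharing a typed prefix.

```csharp
using System;
using System.Collections.Generic;

var names = new List<string> { "smith", "adams", "wood", "james" };
names.Sort(StringComparer.OrdinalIgnoreCase);  // binary search requires sorted input
// names is now: adams, james, smith, wood

// Exact match: O(log n) instead of an O(n) LIKE-style scan.
int found = names.BinarySearch("wood", StringComparer.OrdinalIgnoreCase);
Console.WriteLine(found >= 0 ? $"found at index {found}" : "not found");

// A miss returns ~insertionPoint; entries with the typed prefix, if
// any, start exactly at that insertion point.
int miss = names.BinarySearch("jam", StringComparer.OrdinalIgnoreCase);
int insertionPoint = ~miss;
Console.WriteLine($"entries with prefix \"jam\" start at index {insertionPoint}");
```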

0
source

Source: https://habr.com/ru/post/905268/

