Plural matching using regex in C#

I want to use a regex in C# to search for terms, and I want the search to include the plurals of those terms. For example, if the user searches for "pipe", I also want to return results for "pipes".

So, I can do this ...

    string s = "\\b" + term + "s*\\b";
    if (Regex.IsMatch(bigtext, s))
    {
        /* do stuff */
    }

How do I modify the above so that it also matches, say, "stresses" when the user enters "stress", while still working for pipe / pipes?

+6
3 answers

Here is a regular expression written to remove plurals:

  /(?<![aei])([ie][d])(?=[^a-zA-Z])|(?<=[ertkgwmnl])s(?=[^a-zA-Z])/g 

(Demo and source)

I know this is not exactly what you need, but it may help you work something out.
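As a rough sketch of how that pattern could be applied from C#: the class and method names below (SuffixStripper, Strip) are hypothetical, and the trick of appending a space is my own assumption, needed because both alternatives require a non-letter after the suffix and would otherwise never match the last word of the input.

```csharp
using System.Text.RegularExpressions;

public static class SuffixStripper
{
    // The pattern from the answer, unchanged apart from being a C# verbatim string.
    private static readonly Regex Suffix = new Regex(
        @"(?<![aei])([ie][d])(?=[^a-zA-Z])|(?<=[ertkgwmnl])s(?=[^a-zA-Z])");

    // Removes the matched endings from every word in the text.
    public static string Strip(string text)
    {
        // Append a space so the final word has a non-letter after it,
        // then drop that space again (the pattern never removes it).
        string stripped = Suffix.Replace(text + " ", "");
        return stripped.Substring(0, stripped.Length - 1);
    }
}
```

Comparing `SuffixStripper.Strip(word)` with `SuffixStripper.Strip(term)` handles pipe / pipes, but "stresses" only loses its final "s" and becomes "stresse", so this pattern alone does not solve the stress / stresses case.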

+1

The problem you may run into is that there are many irregular nouns, such as man, fish and index. You should therefore consider using the PluralizationService, which has a Pluralize method. Here is an example showing how to use it.

After you get the plural of this term, you can easily create a regular expression that looks for both the plural and the singular term.

    // PluralizationService lives in System.Data.Entity.Design.PluralizationServices
    PluralizationService ps = PluralizationService.CreateService(CultureInfo.CurrentCulture);
    string plural = ps.Pluralize(term);
    string s = @"(" + term + "|" + plural + ")";
    if (Regex.IsMatch(bigtext, s))
    {
        /* do stuff */
    }
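A minimal sketch of the same idea with two additions that are my own assumptions, not part of the answer: every form goes through Regex.Escape so that special characters in the term are matched literally, and the alternation is wrapped in \b word boundaries as in the question. TermPattern, Build and ContainsAny are hypothetical names, and the caller supplies the word forms (for example the term plus whatever Pluralize returned).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TermPattern
{
    // Builds \b(?:form1|form2|...)\b from the given word forms.
    public static string Build(IEnumerable<string> forms)
    {
        // Escape each form so characters like '.' or '+' are matched literally.
        IEnumerable<string> escaped = forms.Distinct().Select(Regex.Escape);
        return @"\b(?:" + string.Join("|", escaped) + @")\b";
    }

    // True if any of the forms occurs in the text as a whole word.
    public static bool ContainsAny(string text, params string[] forms)
    {
        return Regex.IsMatch(text, Build(forms));
    }
}
```

For example, `TermPattern.ContainsAny(bigtext, term, plural)` matches "stresses" as a whole word but not the "stress" inside "distressed".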
+7

If you are using SQL Server, why not use Soundex? I'm not sure what you are searching against. I assume you are building dynamic SQL from the search input. If not, I think there is a SoundEx for LINQ as well.

EDIT: I stand corrected. It appears there is some LINQ to SQL support that can be used for SoundEx.

However, MSDN has a Soundex example which, in the simple tests I ran this morning, did everything I checked: http://msdn.microsoft.com/en-us/library/bb669073.aspx

The changes I made were to use .ToUpperInvariant() instead of .ToUpper(invariant), and to turn it into an extension method by declaring the parameter as (this string word) instead of (string word).

Here is an example of what I ran:

    List<string> animals = new List<string>();
    animals.Add("dogs");
    animals.Add("dog");
    animals.Add("cat");
    animals.Add("rabbits");
    animals.Add("doggie");

    string dog = "dog";
    var data = from animal in animals
               where animal.SoundEx() == dog.SoundEx()
               select animal;

data: dogs, dog, doggie
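For readers who cannot reach the MSDN page, here is a rough sketch of a SoundEx() extension method. It is my own take on the standard American Soundex rules, not the MSDN code, and the class name SoundexExtensions is hypothetical. Under these rules "dog", "dogs" and "doggie" all encode to D200, while "cat" gives C300 and "rabbits" gives R132.

```csharp
using System.Text;

public static class SoundexExtensions
{
    // Digit for each letter A..Z; '0' marks letters that are not coded
    // (vowels, H, W and Y).
    private const string Codes = "01230120022455012623010202";

    public static string SoundEx(this string word)
    {
        var result = new StringBuilder(4);
        char previous = '0';
        foreach (char raw in word.ToUpperInvariant())
        {
            if (raw < 'A' || raw > 'Z') continue; // skip anything but A-Z
            char code = Codes[raw - 'A'];
            if (result.Length == 0)
            {
                // Keep the first letter as-is, but remember its code.
                result.Append(raw);
                previous = code;
            }
            else
            {
                // Emit a digit only when it differs from the previous code.
                if (code != '0' && code != previous) result.Append(code);
                // H and W do not separate equal codes; vowels do.
                if (raw != 'H' && raw != 'W') previous = code;
            }
            if (result.Length == 4) break;
        }
        // Pad short codes with zeros, e.g. "D2" becomes "D200".
        return result.Length == 0 ? "" : result.ToString().PadRight(4, '0');
    }
}
```

Because Soundex keys on the first letter and a few consonant classes, it groups many unrelated words together, so it is better suited to widening a search than to deciding that two words are the same term.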

With SQL Server you can also use CONTAINS / FREETEXT / CONTAINSTABLE etc. together with SoundEx against the full-text catalog and rank your results (I am not familiar with newer versions of SQL Server; this goes back to the SQL Server 2000 implementation I used).

Also, if you are able to use SQL Server, you may want to look into this option: LINQ to SQL SOUNDEX - maybe?

As for the pluralization solution, you would need to be able to use .NET 4.

There is also a Levenshtein distance algorithm that may be useful.
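A minimal sketch of the Levenshtein distance, using the usual dynamic-programming formulation with two rows. The names EditDistance and Levenshtein are hypothetical, and how large a distance should still count as a match is left to the caller.

```csharp
using System;

public static class EditDistance
{
    // Number of single-character insertions, deletions and substitutions
    // needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1), // insert, delete
                    previous[j - 1] + cost);                       // substitute
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.Length];
    }
}
```

With this, "pipe" and "pipes" are at distance 1 and "stress" and "stresses" at distance 2, but so are unrelated pairs such as "pipe" and "pile" (distance 1), so a distance threshold on its own will also let through words that are not plurals.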

0

Source: https://habr.com/ru/post/914026/

