Extract keywords from text and exclude words

I have this function to extract all words from a text:

public static string[] GetSearchWords(string text)
{

    string pattern = @"\S+";
    Regex re = new Regex(pattern);

    MatchCollection matches = re.Matches(text);
    string[] words = new string[matches.Count];
    for (int i=0; i<matches.Count; i++)
    {
        words[i] = matches[i].Value;
    }
    return words;
}

I want to exclude a list of words from the returned array. The list is as follows:

string strWordsToExclude="if,you,me,about,more,but,by,can,could,did";

How can I change the function above so that it does not return the words on my list?

2 answers
// Requires: using System.Linq;
// Replace the final "return words;" in GetSearchWords with:
string strWordsToExclude="if,you,me,about,more,but,by,can,could,did";
var ignoredWords = strWordsToExclude.Split(',');
return words.Except(ignoredWords).ToArray();

I think the Except method fits your needs.
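For context, here is a minimal sketch of how Except could be folded into the original Regex-based function. The class name KeywordExtractor and the wordsToExclude parameter are my own additions for illustration, not part of the question. Two things to keep in mind: Except is a set difference, so a word that occurs several times in the text appears only once in the result, and the default comparison is case-sensitive. The sketch passes StringComparer.OrdinalIgnoreCase on the assumption that "If" should be excluded as well as "if".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordExtractor // hypothetical wrapper class
{
    public static string[] GetSearchWords(string text, string wordsToExclude)
    {
        // Split the comma-separated exclusion list into individual words.
        string[] ignoredWords = wordsToExclude.Split(',');

        // \S+ matches each run of non-whitespace characters, as in the question.
        return Regex.Matches(text, @"\S+")
            .Cast<Match>()
            .Select(m => m.Value)
            // Set difference: excluded words are dropped and duplicates collapse.
            .Except(ignoredWords, StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }
}
```

Calling KeywordExtractor.GetSearchWords("if you read about cooking you can cook", "if,you,me,about,more,but,by,can,could,did") should yield read, cooking, cook. If repeated words must be kept, a Where filter is a better fit than Except.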


If you are not forced to use Regex, you can use a bit of LINQ:

void Main()
{
    var wordsToExclude = "if,you,me,about,more,but,by,can,could,did".Split(',');

    string str = "if you read about cooking you can cook";

    var newWords = GetSearchWords(str, wordsToExclude); // read, cooking, cook
}



string[] GetSearchWords(string text, IEnumerable<string> toExclude)
{
    var words = text.Split();

    return words.Where(word => !toExclude.Contains(word)).ToArray();
}

I am assuming that a word is a sequence of non-whitespace characters.
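A possible refinement of the same idea, offered as a sketch rather than a definitive version: put the exclusion list in a HashSet so each lookup does not scan the whole list, and drop the empty entries that Split produces when the text contains consecutive spaces. The class name SearchWords and the choice of a case-insensitive comparer are assumptions of mine. Unlike Except, this keeps repeated words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchWords // hypothetical wrapper class
{
    public static string[] GetSearchWords(string text, IEnumerable<string> toExclude)
    {
        // Hash-based lookup; the case-insensitive comparer is an assumption.
        var excluded = new HashSet<string>(toExclude, StringComparer.OrdinalIgnoreCase);

        return text
            // A null separator array means "split on any whitespace".
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !excluded.Contains(word))
            .ToArray();
    }
}
```

Note that neither this version nor the \S+ pattern strips punctuation, so a token such as "can," would not match the excluded word "can".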


Source: https://habr.com/ru/post/1526272/