Comparing a string list with an available dictionary / thesaurus

Question


I have a C# program that generates a list of strings (permutations of an original string). As expected, most of the strings are random arrangements of the source letters (e.g. etam, aemt, team). I want to programmatically find the one string in the list that is a real English word. To do that, I need a thesaurus / dictionary to look up and compare each string against. Does anyone know of an available resource? I am using VS2008 with C#.
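
A minimal sketch of the comparison step the question describes, assuming the word list has already been loaded into memory (for example from a plain-text file with one word per line). `PermutationFilter` and `RealWords` are hypothetical names. A `HashSet<string>` turns each check into a hash lookup rather than a scan of the whole word list.

```csharp
using System;
using System.Collections.Generic;

public static class PermutationFilter
{
    // Keeps only the candidate strings that appear in the word list.
    // Assumption: the word list is already in memory (e.g. read from a file).
    public static List<string> RealWords(IEnumerable<string> candidates, IEnumerable<string> wordList)
    {
        // OrdinalIgnoreCase makes "Team" in the list match the candidate "team".
        HashSet<string> dictionary = new HashSet<string>(wordList, StringComparer.OrdinalIgnoreCase);
        List<string> found = new List<string>();
        foreach (string candidate in candidates)
        {
            if (dictionary.Contains(candidate))
            {
                found.Add(candidate);
            }
        }
        return found;
    }
}
```

With the candidates `etam`, `aemt`, `team` and a word list containing `Team`, only `team` is kept.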


string c# permutation spell-checking

sMaN Feb 11 '10 at 23:30


2 answers

You can also use Wiktionary. The MediaWiki API (Wiktionary runs on MediaWiki) lets you query a list of article titles, and in Wiktionary, article titles are (among other things) dictionary words. The only catch is that words from other languages are in the dictionary too, so you may sometimes get "wrong" matches. Of course, your users will also need Internet access. You can find help and documentation for the API at: http://en.wiktionary.org/w/api.php
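
Permutation lists grow quickly (a 7-letter word already has 7! = 5,040 permutations), so a single request URL may not be enough. The sketch below splits the words into batches, assuming the API's limit of 50 titles per request for ordinary clients (check the api.php help page for the current value). `WiktionaryQuery`, `BuildQueryUrls` and `MaxTitlesPerRequest` are hypothetical names. Each word is escaped with `Uri.EscapeDataString` before the batch is joined with `|`.

```csharp
using System;
using System.Collections.Generic;

public static class WiktionaryQuery
{
    // Assumption: ordinary (non-bot) clients may pass at most 50 titles per request.
    public const int MaxTitlesPerRequest = 50;

    private const string BaseUrl =
        "http://en.wiktionary.org/w/api.php?action=query&format=xml&titles=";

    // Builds one query URL per batch of at most MaxTitlesPerRequest words.
    public static List<string> BuildQueryUrls(IEnumerable<string> words)
    {
        List<string> urls = new List<string>();
        List<string> batch = new List<string>();
        foreach (string word in words)
        {
            // Escape so spaces or '&' inside a word cannot break the query string.
            batch.Add(Uri.EscapeDataString(word));
            if (batch.Count == MaxTitlesPerRequest)
            {
                urls.Add(BaseUrl + string.Join("|", batch.ToArray()));
                batch.Clear();
            }
        }
        // Flush the last, partially filled batch.
        if (batch.Count > 0)
        {
            urls.Add(BaseUrl + string.Join("|", batch.ToArray()));
        }
        return urls;
    }
}
```

For 120 words this yields three URLs (50, 50 and 20 titles); each one can then be downloaded and parsed separately.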

Here is an example request URL:

 http://en.wiktionary.org/w/api.php?action=query&format=xml&titles=dog|god|ogd|odg|gdo

This returns the following xml:

 <?xml version="1.0"?>
 <api>
   <query>
     <pages>
       <page ns="0" title="ogd" missing=""/>
       <page ns="0" title="odg" missing=""/>
       <page ns="0" title="gdo" missing=""/>
       <page pageid="24" ns="0" title="dog"/>
       <page pageid="5015" ns="0" title="god"/>
     </pages>
   </query>
 </api>

In C#, you can use System.Xml.XPath to pick out the parts you need (the page elements that have a pageid attribute). Those are the "real words."
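
To try the parsing step without a network call, the XML above can be passed straight to `XPathDocument`. This sketch (the class and method names are hypothetical) moves the filter into the XPath expression itself: `//page[@pageid]/@title` selects only the `title` attributes of pages that have a `pageid`, so missing pages are skipped without an explicit check.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class WiktionaryXPath
{
    // Returns the titles of <page> elements that carry a pageid attribute,
    // i.e. pages that exist (missing pages have a "missing" attribute instead).
    public static List<string> ExistingTitles(string rawXml)
    {
        XPathDocument doc = new XPathDocument(new StringReader(rawXml));
        XPathNavigator nav = doc.CreateNavigator();
        // Selects the title attribute nodes of existing pages only.
        XPathNodeIterator iter = nav.Select("//page[@pageid]/@title");
        List<string> titles = new List<string>();
        while (iter.MoveNext())
        {
            titles.Add(iter.Current.Value);
        }
        return titles;
    }
}
```

Fed the sample response above, it returns `dog` and `god`, in document order.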

I wrote an implementation and tested it (with the simple "dog" example above). It returned only "dog" and "god". You should test it more thoroughly.

 using System;
 using System.Collections.Generic;
 using System.IO;
 using System.Linq;
 using System.Net;
 using System.Text;
 using System.Xml.XPath;

 // Place this method inside a class.
 public static IEnumerable<string> FilterRealWords(IEnumerable<string> testWords)
 {
     string baseUrl = "http://en.wiktionary.org/w/api.php?action=query&format=xml&titles=";
     // Escape each word so characters such as '&' or spaces cannot break the query string.
     string[] escapedWords = testWords.Select(w => Uri.EscapeDataString(w)).ToArray();
     string queryUrl = baseUrl + string.Join("|", escapedWords);

     string rawXml;
     using (WebClient client = new WebClient())
     {
         client.Encoding = Encoding.UTF8; // this is very important or the text will be junk
         rawXml = client.DownloadString(queryUrl);
     }

     XPathDocument doc = new XPathDocument(new StringReader(rawXml));
     XPathNavigator nav = doc.CreateNavigator();
     XPathNodeIterator iter = nav.Select("//page");

     List<string> realWords = new List<string>();
     while (iter.MoveNext())
     {
         // If the pageid attribute has a value, the page exists,
         // so add the article title to the list.
         if (!string.IsNullOrEmpty(iter.Current.GetAttribute("pageid", "")))
         {
             realWords.Add(iter.Current.GetAttribute("title", ""));
         }
     }
     return realWords;
 }

Call it as follows:

 IEnumerable<string> input = new string[] { "dog", "god", "ogd", "odg", "gdo" };
 IEnumerable<string> output = FilterRealWords(input);

I tried LINQ to XML first, but I'm not familiar with it, so it was a pain and I gave up on it.
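
For comparison, a minimal LINQ to XML version of the same filter might look like this, under the same assumptions as above (hypothetical class and method names, XML passed in as a string). `XElement.Attribute` returns `null` when the attribute is absent, which is what separates existing pages from missing ones.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WiktionaryLinq
{
    // Returns the titles of <page> elements that have a pageid attribute.
    public static List<string> ExistingTitles(string rawXml)
    {
        XDocument doc = XDocument.Parse(rawXml);
        return doc.Descendants("page")
                  .Where(p => p.Attribute("pageid") != null)   // existing pages only
                  .Select(p => (string)p.Attribute("title"))   // explicit cast reads the value
                  .ToList();
    }
}
```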


Benny jobigan Feb 15 '10 at 11:52


Aryabhatta · Accepted Answer · 2010-02-11T23:46:12+0000

You can download a word list from the Internet (for example, one of the files listed here: http://www.outpost9.com/files/WordLists.html) and then do the lookup quickly by indexing each word under its sorted letters:

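 // Read words from file (ReadFromFile is a placeholder, e.g. File.ReadAllLines(path)).
 string[] words = ReadFromFile();

 // Map each signature (the word's letters, lower-cased and sorted) to all words sharing it.
 Dictionary<string, List<string>> permuteDict = new Dictionary<string, List<string>>();
 foreach (string word in words)
 {
     string sortedWord = SortLetters(word);
     if (!permuteDict.ContainsKey(sortedWord))
     {
         permuteDict[sortedWord] = new List<string>();
     }
     permuteDict[sortedWord].Add(word);
 }

 // To do a lookup, compute the same signature for the scrambled input.
 string sortedWordToLook = SortLetters(wordToLook);
 List<string> outWords;
 if (permuteDict.TryGetValue(sortedWordToLook, out outWords))
 {
     foreach (string outWord in outWords)
     {
         Console.WriteLine(outWord);
     }
 }

 // Helper: Array.Sort sorts in place and returns void, so it cannot be chained.
 static string SortLetters(string word)
 {
     char[] letters = word.ToLowerInvariant().ToCharArray();
     Array.Sort(letters);
     return new string(letters);
 }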
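
The same idea packaged as a self-contained class, with an in-memory word list standing in for the downloaded file (an assumption for illustration; `AnagramIndex`, `Signature`, `Build` and `Lookup` are hypothetical names). Because every permutation of the same letters has the same signature, this approach does not need the permutation list from the question at all: one sort of the scrambled input is enough to find every matching word.

```csharp
using System;
using System.Collections.Generic;

public static class AnagramIndex
{
    // Signature: the word's letters, lower-cased and sorted,
    // so all permutations of the same letters share one key.
    public static string Signature(string word)
    {
        char[] letters = word.ToLowerInvariant().ToCharArray();
        Array.Sort(letters);
        return new string(letters);
    }

    // Groups the word list by signature.
    public static Dictionary<string, List<string>> Build(IEnumerable<string> words)
    {
        Dictionary<string, List<string>> index = new Dictionary<string, List<string>>();
        foreach (string word in words)
        {
            string key = Signature(word);
            List<string> bucket;
            if (!index.TryGetValue(key, out bucket))
            {
                bucket = new List<string>();
                index[key] = bucket;
            }
            bucket.Add(word);
        }
        return index;
    }

    // Returns every word with the same letters as the input, or an empty list.
    public static List<string> Lookup(Dictionary<string, List<string>> index, string scrambled)
    {
        List<string> matches;
        return index.TryGetValue(Signature(scrambled), out matches) ? matches : new List<string>();
    }
}
```

With the words `team`, `meat`, `mate`, `tame` and `dog` indexed, looking up `etam` returns the first four, in the order they were added.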

