Spelling libraries (e.g. hunspell) in UWP applications?

Question

I am porting a writing application to the UWP platform. The only puzzle piece I have left is the NHunspell library. I use it for spell checking and thesaurus functions. I have customized it heavily and created custom dictionaries for different things (i.e. a different dictionary for each writing project). This library is a wonderful thing.

However, I cannot include this DLL in the UWP application.

1) Is there a way to force the use of this DLL? I really like how the NHunspell system is set up. It makes sense and is very fast and easy to use.

2) If not, can anyone recommend a better solution for custom dictionaries, custom spell checking, etc.?

Update 3

After considerable searching and reading on the Internet, I found a link discussing the theory of spell checking. Here is one quick example (the one I used the most).

http://www.anotherchris.net/csharp/how-to-write-a-spelling-corrector-in-csharp/

After reading this article, taking its basic code, and stripping the English words out of the Hunspell .dic files, I created my own spell checking library that works in UWP.
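As a rough illustration of the .dic cleanup step (a minimal sketch, not the library itself): a Hunspell .dic file normally starts with a line holding the entry count, and each entry is a word optionally followed by `/` and affix flags. The names `DicFileReader` and `ExtractRootWords` are made up for this example. It assumes any morphological fields are separated by a tab and does not handle escaped slashes. Dropping the flags also means the derived forms (plurals, verb endings and so on) are lost; only the root words survive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class DicFileReader
{
    // Returns the root words found in the lines of a Hunspell .dic file.
    public static List<string> ExtractRootWords(IEnumerable<string> lines)
    {
        var words = new List<string>();
        bool firstEntry = true;

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0)
            {
                continue;
            }

            if (firstEntry)
            {
                firstEntry = false;
                int count;
                // The first non-empty line is usually the entry count; skip it.
                if (int.TryParse(line, out count))
                {
                    continue;
                }
            }

            // Assumption: morphological fields, if any, follow a tab.
            int tab = line.IndexOf('\t');
            if (tab >= 0)
            {
                line = line.Substring(0, tab);
            }

            // Everything after the first slash is affix flags.
            int slash = line.IndexOf('/');
            if (slash > 0)
            {
                line = line.Substring(0, slash);
            }

            if (line.Length > 0)
            {
                words.Add(line);
            }
        }

        return words;
    }
}
```

The resulting list can then be fed into whatever word store the spell checker uses.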

Once I get it solidified, I will post it as an answer below to donate to the SO community. :)

Update 2

I am giving up on using Hunspell. It doesn't seem to be possible at all... Are there any other libraries / packages anyone can suggest?

UPDATE:

I probably need to rephrase the statement that I cannot include the DLL: I cannot include the DLL through NuGet. It complains that the DLL is not compatible with the UAP / UWP platform.

I can MANUALLY include the DLL in my project by referencing an existing DLL (not NuGet). However, that DLL does indeed prove to be incompatible with the UAP platform. A simple spell checking call works fine in WinForms, but crashes immediately with a System.IO.FileNotFoundException.

The NHunspell constructor loads the associated .dic and .aff files. However, I worked around this by loading the files into memory and then calling the alternative constructor that takes a byte array instead of a file name for each of these files. It still crashes, but with a new error, Method not found:

String System.AppDomain.get_RelativeSearchPath()

I am looking for a spell checking mechanism that will work within UAP. I would prefer it to be NHunspell, simply because I am familiar with it. However, I am not blind to the fact that this is becoming less and less viable as an option.

People I work with have suggested using the built-in spell checking options. However, I cannot use the built-in spell checking features of Windows 10 / TextBox (as far as I know), because I cannot manage custom dictionaries, and I cannot turn off things like automatic capitalization and word replacement (where it replaces a word for you if it thinks it is close enough to the right guess). These things are deal breakers for writers! The writer could disable them at the OS level, but they may want them on for other applications, just not this one.

Please let me know if there is a workaround for NHunspell. And if you don't know of one, let me know your best alternative spell checker that works within UAP.

As a side note, I also use NHunspell for its thesaurus capability. It works great in my Windows apps. I would also have to replace this functionality, hopefully with the same engine as the spell checker. However, if you know a good thesaurus engine (that is not also a spell checker), that's good too!

Thanks!

c# win-universal-app hunspell nhunspell

Jerry Mar 18 '16 at 0:05

3 answers

ganchito55 · Answer 1 · 2016-03-20T16:22:32+0000

I downloaded the source code of the NHunspell library and tried to build a library with UWP support. However, I ran into problems with the marshalling (Marshalling.cs).
The package loads DLLs that work only on x86 and x64 architectures, so the application would not work on ARM (mobile phones, tablets).
The package loads the DLL with system calls:

  [DllImport("kernel32.dll")] internal static extern IntPtr LoadLibrary(string fileName);

and I think this needs to be rewritten to work in UWP, because UWP apps run in a sandbox.

IMHO there are only two options:
1) Rewrite the Marshalling class with UWP restrictions.
2) Do not use Hunspell in your program.

I don't have much knowledge about using DLLs with UWP, but I think the rewrite could be very difficult.

Jerry · Answer 2 · 2016-03-25T14:17:23+0000

As promised, here is a class I built for spell checking.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Threading.Tasks;

    namespace Com.HanelDev.HSpell
    {
        public class HSpellProcess
        {
            private Dictionary<string, string> _dictionary = new Dictionary<string, string>();

            public int MaxSuggestionResponses { get; set; }

            public HSpellProcess()
            {
                MaxSuggestionResponses = 10;
            }

            public void AddToDictionary(string w)
            {
                if (!_dictionary.ContainsKey(w.ToLower()))
                {
                    _dictionary.Add(w.ToLower(), w);
                }
                else
                {
                    // Upper case words are more specific (but may be the first word
                    // in a sentence.) Lower case words are more generic.
                    // If you put an upper-case word in the dictionary, then for
                    // it to be "correct" it must match case. This is not true
                    // for lower-case words.
                    // We want to only replace existing words with their more
                    // generic versions, not the other way around.
                    if (_dictionary[w.ToLower()].CaseSensitive())
                    {
                        _dictionary[w.ToLower()] = w;
                    }
                }
            }

            public void LoadDictionary(byte[] dictionaryFile, bool resetDictionary = false)
            {
                if (resetDictionary)
                {
                    _dictionary = new Dictionary<string, string>();
                }

                using (MemoryStream ms = new MemoryStream(dictionaryFile))
                {
                    using (StreamReader sr = new StreamReader(ms))
                    {
                        string tmp = sr.ReadToEnd();
                        tmp = tmp.Replace("\r\n", "\r").Replace("\n", "\r");
                        string[] fileData = tmp.Split("\r".ToCharArray());

                        foreach (string line in fileData)
                        {
                            if (string.IsNullOrWhiteSpace(line) || line.StartsWith("#"))
                            {
                                continue;
                            }

                            string word = line;

                            // I added all of this for file imports (not array imports)
                            // to be able to handle words from Hunspell dictionaries.
                            // I don't get the Hunspell derivatives, but at least I get
                            // the root word.
                            if (line.Contains("/"))
                            {
                                string[] arr = line.Split("/".ToCharArray());
                                word = arr[0];
                            }

                            AddToDictionary(word);
                        }
                    }
                }
            }

            public void LoadDictionary(Stream dictionaryFileStream, bool resetDictionary = false)
            {
                string s = "";
                using (StreamReader sr = new StreamReader(dictionaryFileStream))
                {
                    s = sr.ReadToEnd();
                }

                byte[] bytes = Encoding.UTF8.GetBytes(s);
                LoadDictionary(bytes, resetDictionary);
            }

            public void LoadDictionary(List<string> words, bool resetDictionary = false)
            {
                if (resetDictionary)
                {
                    _dictionary = new Dictionary<string, string>();
                }

                foreach (string line in words)
                {
                    if (string.IsNullOrWhiteSpace(line) || line.StartsWith("#"))
                    {
                        continue;
                    }

                    AddToDictionary(line);
                }
            }

            public string ExportDictionary()
            {
                StringBuilder sb = new StringBuilder();
                foreach (string k in _dictionary.Keys)
                {
                    sb.AppendLine(_dictionary[k]);
                }

                return sb.ToString();
            }

            public HSpellCorrections Correct(string word)
            {
                HSpellCorrections ret = new HSpellCorrections();
                ret.Word = word;

                if (_dictionary.ContainsKey(word.ToLower()))
                {
                    string testWord = word;
                    string dictWord = _dictionary[word.ToLower()];
                    if (!dictWord.CaseSensitive())
                    {
                        testWord = testWord.ToLower();
                        dictWord = dictWord.ToLower();
                    }

                    if (testWord == dictWord)
                    {
                        ret.SpelledCorrectly = true;
                        return ret;
                    }
                }

                // At this point, we know the word is assumed to be spelled incorrectly.
                // Go get word candidates.
                ret.SpelledCorrectly = false;

                Dictionary<string, HSpellWord> candidates = new Dictionary<string, HSpellWord>();
                List<string> edits = Edits(word);
                GetCandidates(candidates, edits);

                if (candidates.Count > 0)
                {
                    return BuildCandidates(ret, candidates);
                }

                // If we didn't find any candidates by the main word, look for
                // second-level candidates based on the original edits.
                foreach (string item in edits)
                {
                    List<string> round2Edits = Edits(item);
                    GetCandidates(candidates, round2Edits);
                }

                if (candidates.Count > 0)
                {
                    return BuildCandidates(ret, candidates);
                }

                return ret;
            }

            private void GetCandidates(Dictionary<string, HSpellWord> candidates, List<string> edits)
            {
                foreach (string wordVariation in edits)
                {
                    if (_dictionary.ContainsKey(wordVariation.ToLower()) &&
                        !candidates.ContainsKey(wordVariation.ToLower()))
                    {
                        HSpellWord suggestion = new HSpellWord(_dictionary[wordVariation.ToLower()]);
                        suggestion.RelativeMatch = RelativeMatch.Compute(wordVariation, suggestion.Word);
                        candidates.Add(wordVariation.ToLower(), suggestion);
                    }
                }
            }

            private HSpellCorrections BuildCandidates(HSpellCorrections ret, Dictionary<string, HSpellWord> candidates)
            {
                var suggestions = candidates.OrderByDescending(c => c.Value.RelativeMatch);
                int x = 0;
                ret.Suggestions.Clear();

                foreach (var suggest in suggestions)
                {
                    x++;
                    ret.Suggestions.Add(suggest.Value.Word);

                    // Only suggest the first X words.
                    if (x >= MaxSuggestionResponses)
                    {
                        break;
                    }
                }

                return ret;
            }

            private List<string> Edits(string word)
            {
                var splits = new List<Tuple<string, string>>();
                var transposes = new List<string>();
                var deletes = new List<string>();
                var replaces = new List<string>();
                var inserts = new List<string>();

                // Splits
                for (int i = 0; i < word.Length; i++)
                {
                    var tuple = new Tuple<string, string>(word.Substring(0, i), word.Substring(i));
                    splits.Add(tuple);
                }

                // Deletes
                for (int i = 0; i < splits.Count; i++)
                {
                    string a = splits[i].Item1;
                    string b = splits[i].Item2;
                    if (!string.IsNullOrEmpty(b))
                    {
                        deletes.Add(a + b.Substring(1));
                    }
                }

                // Transposes
                for (int i = 0; i < splits.Count; i++)
                {
                    string a = splits[i].Item1;
                    string b = splits[i].Item2;
                    if (b.Length > 1)
                    {
                        transposes.Add(a + b[1] + b[0] + b.Substring(2));
                    }
                }

                // Replaces
                for (int i = 0; i < splits.Count; i++)
                {
                    string a = splits[i].Item1;
                    string b = splits[i].Item2;
                    if (!string.IsNullOrEmpty(b))
                    {
                        for (char c = 'a'; c <= 'z'; c++)
                        {
                            replaces.Add(a + c + b.Substring(1));
                        }
                    }
                }

                // Inserts
                for (int i = 0; i < splits.Count; i++)
                {
                    string a = splits[i].Item1;
                    string b = splits[i].Item2;
                    for (char c = 'a'; c <= 'z'; c++)
                    {
                        inserts.Add(a + c + b);
                    }
                }

                return deletes.Union(transposes).Union(replaces).Union(inserts).ToList();
            }

            public HSpellCorrections CorrectFrom(string txt, int idx)
            {
                if (idx >= txt.Length)
                {
                    return null;
                }

                // Find the next incorrect word.
                string substr = txt.Substring(idx);
                int idx2 = idx;
                List<string> str = substr.Split(StringExtensions.WordDelimiters).ToList();

                foreach (string word in str)
                {
                    string tmpWord = word;

                    if (string.IsNullOrEmpty(word))
                    {
                        idx2++;
                        continue;
                    }

                    // If we have the possessive version of a word, strip the 's off
                    // before testing the word. This will solve issues like
                    // "My [mother's] favorite ring."
                    if (tmpWord.EndsWith("'s"))
                    {
                        tmpWord = word.Substring(0, tmpWord.Length - 2);
                    }

                    // Skip things like ***, #HashTagsThatMakeNoSense and 1,2345.67
                    if (!tmpWord.IsWord())
                    {
                        idx2 += word.Length + 1;
                        continue;
                    }

                    HSpellCorrections cor = Correct(tmpWord);
                    if (cor.SpelledCorrectly)
                    {
                        idx2 += word.Length + 1;
                    }
                    else
                    {
                        cor.Index = idx2;
                        return cor;
                    }
                }

                return null;
            }
        }
    }
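The class above refers to several types that are not included in the answer: HSpellCorrections, HSpellWord, RelativeMatch, and the string extensions CaseSensitive(), IsWord() and WordDelimiters. The following is only a guess at what they might look like, inferred from how the class uses them; every member body here is an assumption, not the author's code. In particular, the delimiter list, the "contains an upper-case letter" rule and the position-by-position similarity score are stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Com.HanelDev.HSpell
{
    // Hypothetical result type: only the members HSpellProcess touches.
    public class HSpellCorrections
    {
        public string Word { get; set; }
        public bool SpelledCorrectly { get; set; }
        public int Index { get; set; }
        public List<string> Suggestions { get; } = new List<string>();
    }

    // Hypothetical suggestion type: a dictionary word plus its score.
    public class HSpellWord
    {
        public HSpellWord(string word)
        {
            Word = word;
        }

        public string Word { get; }
        public double RelativeMatch { get; set; }
    }

    // Stand-in score: share of positions where both strings have the same
    // character (1.0 means identical). The original scoring is unknown.
    public static class RelativeMatch
    {
        public static double Compute(string a, string b)
        {
            if (a.Length == 0 && b.Length == 0)
            {
                return 1.0;
            }

            int same = 0;
            int shared = Math.Min(a.Length, b.Length);
            for (int i = 0; i < shared; i++)
            {
                if (a[i] == b[i])
                {
                    same++;
                }
            }

            return (double)same / Math.Max(a.Length, b.Length);
        }
    }

    public static class StringExtensions
    {
        // Assumed delimiter set; single characters only, since CorrectFrom
        // advances its index by one per delimiter.
        public static readonly char[] WordDelimiters =
        {
            ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')'
        };

        // Assumed rule: a word is case sensitive if it has any upper-case letter.
        public static bool CaseSensitive(this string word)
        {
            return word != word.ToLower();
        }

        // Assumed rule: letters, apostrophes and hyphens only, with at least one letter.
        public static bool IsWord(this string word)
        {
            return word.Any(c => char.IsLetter(c)) &&
                   word.All(c => char.IsLetter(c) || c == '\'' || c == '-');
        }
    }
}
```

With stand-ins like these in the same project, a call such as new HSpellProcess().Correct("teh") should compile after loading a dictionary, although the ordering of the suggestions depends entirely on the placeholder score.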

Stefan · Answer 3 · 2016-05-09T08:21:39+0000

You can use the built-in spell checker directly, which gives you better control over its behavior, and then apply the results to the text box control yourself.

Take a look at ISpellChecker. It lets you add your own custom dictionary and gives you much more control over its behavior. And yes, it is available for UWP.
