Java Stanford NLP: Spell Checking

I am trying to measure the spelling accuracy of text samples using Stanford NLP. This is just a text metric, not a filter or anything else, so it is fine if the result is a little off, as long as the error is uniform.

My first idea was to check whether each word is known to the parser's vocabulary:

private static LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz");

@Analyze(weight=25, name="Spelling")
public double spelling() {
    int result = 0;

    for (List<? extends HasWord> list : sentences) {
        for (HasWord w : list) {
            // Count every token the parser's lexicon has not seen.
            if (!lp.getLexicon().isKnown(w.word())) {
                System.out.format("misspelled: %s\n", w.word());
                result++;
            }
        }
    }

    // Cast to double: int / int would truncate the per-sentence average.
    return (double) result / sentences.size();
}

However, this creates quite a few false positives:

misspelled: Sincerity
misspelled: Sisyphus
misspelled: Sisyphus
misspelled: fidelity
misspelled: negates
misspelled: gods
misspelled: henceforth
misspelled: atom
misspelled: flake
misspelled: Sisyphus
misspelled: Camus
misspelled: foandf
misspelled: foandf
misspelled: babby
misspelled: formd
misspelled: gurl
misspelled: pregnent
misspelled: babby
misspelled: formd
misspelled: gurl
misspelled: pregnent
misspelled: Camus
misspelled: Sincerity
misspelled: Sisyphus
misspelled: Sisyphus
misspelled: fidelity
misspelled: negates
misspelled: gods
misspelled: henceforth
misspelled: atom
misspelled: flake
misspelled: Sisyphus

Any ideas on how to make this better?

+3
2 answers

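Using the parser's isKnown(String) method as a spell checker is not a viable use case. The method is correct: "false" means that the word was not seen (with the given capitalization) in the roughly 1 million words of text the parser was trained on. But 1 million words is simply not enough text to train a comprehensive spell checker in a data-driven manner. People would typically use at least two orders of magnitude more text, and might well add some cleverness to handle capitalization. The parser includes some such cleverness for words unseen in the training data, but it is not reflected in what isKnown(String) returns.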
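For a uniform metric like the one in the question, one option is to check tokens against a much larger word list instead of the parser lexicon, and to fold case before the lookup. The following is only a minimal sketch under stated assumptions: the class SpellingMetric and the method misspelledPerSentence are hypothetical names, the sentences are assumed to be already tokenized into strings, and the tiny in-memory dictionary stands in for a large word list you would load yourself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Stanford NLP.
class SpellingMetric {

    // A token counts as known if it matches the dictionary exactly
    // (e.g. a proper noun) or after lowercasing (e.g. sentence-initial "The").
    static boolean isKnown(String token, Set<String> dictionary) {
        return dictionary.contains(token)
                || dictionary.contains(token.toLowerCase(Locale.ROOT));
    }

    // Average number of unknown tokens per sentence.
    // Tokens without any letter (punctuation, numbers) are skipped.
    static double misspelledPerSentence(List<List<String>> sentences,
                                        Set<String> dictionary) {
        if (sentences.isEmpty()) {
            return 0.0; // avoid dividing by zero
        }
        int unknown = 0;
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                boolean hasLetter = token.chars().anyMatch(Character::isLetter);
                if (hasLetter && !isKnown(token, dictionary)) {
                    unknown++;
                }
            }
        }
        return (double) unknown / sentences.size();
    }

    public static void main(String[] args) {
        // Assumption: a real dictionary would hold far more entries.
        Set<String> dictionary = new HashSet<>(
                Arrays.asList("the", "gods", "Sisyphus", "atom", "flake"));
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("The", "gods", "punished", "Sisyphus", "."),
                Arrays.asList("babby", "atom", "flake"));
        System.out.println(misspelledPerSentence(sentences, dictionary));
    }
}
```

With a Stanford NLP tokenizer you would pass w.word() for each HasWord instead of plain strings. The quality of the metric then depends almost entirely on how large and clean the word list is.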

+9

, / , (, , ) . "" , , , , . , ""? ""?

, ? lp.getLexicon(). isKnown (w.word()) ? ? , ""? NLP, , , 100% - .

0

Source: https://habr.com/ru/post/1724833/

