Need a highly efficient algorithm to check if a string contains English speech

Question


I have a large number of strings. Each contains only letters, and the words are not separated by spaces. Some of the characters form English words and the rest is just bufflegab. A string may not contain a whole sentence.

I need to find out which ones are written in real English, that is, which strings can be built by concatenating correctly spelled English words. I know I could do something with a word list. But the words are not separated, so checking every possible combination of words could take a long time.

I am looking for a high-performance algorithm or method that checks whether a string is built from English words. Perhaps there is something that gives me a probability that the string contains English.

Do you know a method or algorithm that could help? Would something like Sphinx help?

+3

algorithm

c0d3x May 24, '09 at 9:18


6 answers

Unknown · Answer 1 · 2009-05-24T09:23:13+0000

This is called a segmentation problem.

There is no trivial way to solve this problem. What I can suggest, based on my guess at your level of knowledge, is to build a trie from your dictionary and, at the first point where you find a possible word, assume that it is a word.
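A minimal sketch of the trie idea in Python. The language, the toy word list, and the names build_trie and can_segment are assumptions for illustration and do not come from the answer. Note one deliberate difference: rather than committing to the first word found, this version backtracks and memoizes each start position, so a wrong early guess does not reject a valid string.

```python
from functools import lru_cache

END = ""  # end-of-word marker; it can never equal a single character


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def can_segment(text, trie):
    """Return True if text is a concatenation of words stored in trie."""

    @lru_cache(maxsize=None)
    def solve(start):
        if start == len(text):
            return True
        node = trie
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                return False  # no dictionary word continues along this path
            # A word ends here: accept it only if the rest also segments.
            if END in node and solve(i + 1):
                return True
        return False

    return solve(0)


# Hypothetical toy dictionary, for illustration only.
WORDS = ["the", "them", "me", "cat", "sat", "on", "mat", "a"]
TRIE = build_trie(WORDS)

print(can_segment("thecatsatonthemat", TRIE))
print(can_segment("themon", TRIE))  # "the" + "mon" fails, "them" + "on" works
print(can_segment("xqzt", TRIE))
```

The recursion depth grows with the number of words in the string, so very long inputs would need an iterative version of the same idea.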


stephan · Answer 2 · 2009-05-25T15:56:06+0000

bufflegab [...] bigram [...] (N-gram) [...]

Igor Krivokon · Answer 3 · 2009-05-24T09:26:56+0000

N-grams.

See http://en.wikipedia.org/wiki/N-gram
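As a rough illustration of the N-gram idea, here is a small Python sketch that scores a string by the average log-probability of its letter bigrams. Everything in it is an assumption made for illustration: the function names, the add-one smoothing, and the tiny training sample. A real model would be trained on a large English corpus, and the threshold separating English from bufflegab would have to be tuned on real data.

```python
import math
from collections import Counter


def train_bigram_model(sample_text):
    """Count letter bigrams in the training text (non-letters are dropped)."""
    letters = [c for c in sample_text.lower() if c.isalpha()]
    counts = Counter(zip(letters, letters[1:]))
    total = sum(counts.values())
    return counts, total


def average_log_prob(text, model, alphabet_size=26):
    """Average log-probability per bigram, using add-one smoothing."""
    counts, total = model
    letters = [c for c in text.lower() if c.isalpha()]
    pairs = list(zip(letters, letters[1:]))
    if not pairs:
        return float("-inf")  # fewer than two letters: nothing to score
    vocab = alphabet_size ** 2  # number of possible letter pairs
    score = sum(math.log((counts[p] + 1) / (total + vocab)) for p in pairs)
    return score / len(pairs)


# Hypothetical, far too small training sample; illustration only.
SAMPLE = (
    "the quick brown fox jumps over the lazy dog while the cat sat on "
    "the mat and the rain in spain stays mainly in the plain"
)
MODEL = train_bigram_model(SAMPLE)

english = average_log_prob("thecatsatonthemat", MODEL)
gibberish = average_log_prob("xqzjvkwpqx", MODEL)
print(english > gibberish)
```

Because spaces are dropped before counting, the bigrams span word boundaries, which matches the unseparated strings described in the question. The score is a likelihood, not a yes/no answer, so it suits the "chance that the string contains English" part of the question.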

Brian · Answer 4 · 2009-05-24T09:25:59+0000

[...] Rabin-Karp. [...]

1800 INFORMATION · Answer 5 · 2009-05-24T09:26:12+0000

Trie. [...]

cat injection vulnerability · Answer 6 · 2009-05-25T16:25:08+0000

It depends on what precision you want, how efficient it needs to be, and what kind of text you are processing.
