Python string parsing?

For a string such as "helloyellowellow", parse all valid strings from the given string. (For example: [[hell, hello, yellow], [low, low] ........]

I am looking for the most optimized way to write code. Here is mine, but I'm not sure if this is the best way.

Full disclosure - This was an interview question.

master = [] # Dictionary for us to look up words def is_word(inputstr): #returns True/False def processstring(fstr,secstr,li): if is_word(fstr): li.append(fstr) if len(secstr) == 0: if len(li) != 0: master.append(li) return processstring(fstr+secstr[0], secstr[1:len(secstr)],li) def wrapperprocess(inpstr): li = [] if len(inpstr) == 0: return processstring('',inpstr,li) wrapperprocess(inpstr[1:len(inpstr)]) wrapperprocess('helloyellowellow') print master 
+6
source share
3 answers

Since you mentioned that you are looking for an efficient algorithm, and assuming that you go into the dictionary in advance (and not just as a called predicate), you can use Aho-Corasick .

Of course, if the input text is short, a more naive algorithm will be faster to avoid the β€œexpensive” dictionary preprocessing.

Plus, an alternative python answer: here is an easy way to just check each substring:

 def gen_words(txt): n = len(txt) for i in range(n): for j in range(i+1, n+1): subtxt = txt[i:j] if is_word(subtxt): yield subtxt 

For uniqueness do:

 all_words = set(gen_words(txt)) 
+3
source

You can do something like:

 tgt='helloyellowellow' with open('/usr/share/dict/words') as f: for word in f: word=word.strip() if word in tgt and len(word)>1: print word 

Print

 el ell he hell hello lo low loy ow owe we well ye yell yellow 

If you are just looking for an is_word function that has undefined, you can play with something like this:

 def is_word(word, dic='/usr/share/dict/words'): if not hasattr(is_word, 'words'): with open(dic) as f: is_word.words={word.strip() for word in f} return word in is_word.words and len(word)>1 

As the default data structure, Python sets have an average search time of O (1) . You are unlikely to write something on your own, which is faster.

+2
source

This is a good problem to solve,

Use Wordnet

While parsing your string starts at some index and continues to torment the index value for each increment by index, check for the existence of the same word with wordnet, it will tell you whether a particular substring makes sense or not!

To install Wordnet :

 https://pypi.python.org/pypi/Wordnet-bn/1.0 
0
source

Source: https://habr.com/ru/post/956663/


All Articles