Replacing dictionary keys that contain spaces in a text (Python)

I have a line of text and a dictionary, and I have to replace every occurrence of a dictionary key in the text with its value.

    text = 'I have a smartphone and a Smart TV'
    dict = {
        'smartphone': 'toy',
        'smart tv': 'junk'
    }

If the keys contain no spaces, I can split the text into words and look each one up in the dict, which takes roughly O(n). But now a key can contain a space, so things are more complicated. Please suggest a good way to do this, and note that a key may not match the case of the text.

Update

I thought of this solution, but it is inefficient: O(m * n) or worse.

    for k, v in dict.items():
        text = text.replace(k, v)  # or regex...
+5
5 answers

If the keywords in the text are not adjacent to each other (that is, one keyword is not immediately followed by another keyword), we can do this. It runs in O(n) for me.

    def dict_replace(dictionary, text, strip_chars=None, replace_func=None):
        """
        Replace word or word phrase in text with keyword in dictionary.

        Arguments:
            dictionary: dict with key:value, key should be in lower case
            text: string to replace
            strip_chars: string containing characters to strip from each word
            replace_func: function that, if given, transforms the final
                          replacement. Must take 2 params: key and value

        Return:
            string

        Example:
            my_dict = {
                "hello": "hallo",
                "hallo": "hello",    # Only one pass, don't worry
                "smart tv": "http://google.com?q=smart+tv"
            }
            dict_replace(my_dict, "hello google smart tv",
                         replace_func=lambda k, v: '[%s](%s)' % (k, v))
        """
        # First break word phrases in the dictionary into single words
        dictionary = dictionary.copy()
        for key in list(dictionary):  # copy the keys: the dict grows in the loop
            if ' ' in key:
                for part in key.split():
                    # Mark a single word that is only part of a phrase with False
                    if part not in dictionary:
                        dictionary[part] = False

        # Break text into words and compare one by one
        result = []
        words = text.split()
        words.append('')    # Sentinel so a match at the end of the text is flushed
        last_match = ''     # Last keyword (lower case) match
        original = ''       # Last match as it appears in the original text
        for word in words:
            key_word = word.lower()
            if strip_chars is not None:
                key_word = key_word.strip(strip_chars)
            if key_word in dictionary:
                last_match = last_match + ' ' + key_word if last_match != '' else key_word
                original = original + ' ' + word if original != '' else word
            else:
                if last_match != '':
                    if last_match in dictionary and dictionary[last_match] is not False:
                        # The whole word or phrase matched
                        if replace_func is not None:
                            result.append(replace_func(original, dictionary[last_match]))
                        else:
                            result.append(dictionary[last_match])
                    else:
                        # Only part of a keyword phrase matched
                        match_parts = last_match.split(' ')
                        match_original = original.split(' ')
                        for i in range(len(match_parts)):
                            value = dictionary.get(match_parts[i], False)
                            if value is not False:
                                if replace_func is not None:
                                    result.append(replace_func(match_original[i], value))
                                else:
                                    result.append(value)
                            else:
                                # Keep words that matched no complete key
                                result.append(match_original[i])
                result.append(word)
                last_match = ''
                original = ''

        # The last element is always the empty sentinel; drop it
        return ' '.join(result[:-1])
+1

If your keys do not have spaces:

    output = [dct[i] if i in dct else i for i in text.split()]
    ' '.join(output)

You should use dct instead of dict as the variable name so that it does not shadow the built-in dict type.

Here a list comprehension with a conditional (ternary) expression substitutes each word that is a key and leaves the other words unchanged.
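The same per-word lookup can also be written with dict.get, which returns a fallback when the key is missing. This is only a minimal sketch: the sample text and dct below are made up for illustration, and it assumes lower-case keys that contain no spaces.

```python
# Illustrative data (not from the question): single-word, lower-case keys only.
text = 'I have a Smartphone and a tablet'
dct = {'smartphone': 'toy', 'tablet': 'slab'}

# dct.get(key, default) returns the original word when there is no match.
# Lower-casing the lookup key makes the match case-insensitive.
output = ' '.join(dct.get(word.lower(), word) for word in text.split())
print(output)
```

Each word costs one hash lookup, so the whole pass stays linear in the number of words.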

If your keys do contain spaces, your approach is right:

    for k, v in dct.items():
        text = text.replace(k, v)  # str.replace returns a new string

And yes, the time complexity will be m * n, since the whole string has to be scanned once for each key in dct.

+1

Convert all dictionary keys and the input text to lower case, so that comparisons are simple. Now...

    for entry in my_dict:
        if entry in text:
            pass  # process the match

This assumes the dictionary is small enough that looping over every key is acceptable. If instead the dictionary is large and the text is small, you need to take each word, then each two-word phrase, and check whether they are in the dictionary.
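The phrase-by-phrase lookup for the large-dictionary case could be sketched as follows. The function name replace_phrases is hypothetical, the keys are assumed to be lower case, and the longest candidate phrase is tried first at each position (a design choice of this sketch, not something the answer specifies).

```python
def replace_phrases(text, mapping):
    """Replace dictionary phrases in text, longest phrase first.

    Hypothetical helper: assumes the keys of `mapping` are lower case and
    that words are separated by whitespace.
    """
    words = text.split()
    # The longest key, measured in words, bounds how far ahead we must look.
    max_len = max((len(key.split()) for key in mapping), default=1)
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = ' '.join(words[i:i + n]).lower()
            if phrase in mapping:
                result.append(mapping[phrase])
                i += n
                break
        else:
            # No key starts at this position: keep the original word.
            result.append(words[i])
            i += 1
    return ' '.join(result)


text = 'I have a smartphone and a Smart TV'
mapping = {'smartphone': 'toy', 'smart tv': 'junk'}
print(replace_phrases(text, mapping))  # I have a toy and a junk
```

The number of dictionary lookups is at most the number of words times the length of the longest key in words, regardless of how many keys the dictionary holds.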

Is this enough for you?

0

You need to check every run of adjacent words, from length 1 (each individual word) to len(array) (the entire line). You can generate these runs, called permutations in the code, as follows:

    text = 'I have a smartphone and a Smart TV'
    array = text.lower().split()
    key_permutations = [" ".join(array[j:j + i])
                        for i in range(1, len(array) + 1)
                        for j in range(0, len(array) - (i - 1))]

    >>> key_permutations
    ['i', 'have', 'a', 'smartphone', 'and', 'a', 'smart', 'tv',
     'i have', 'have a', 'a smartphone', 'smartphone and', 'and a', 'a smart', 'smart tv',
     'i have a', 'have a smartphone', 'a smartphone and', 'smartphone and a',
     'and a smart', 'a smart tv',
     'i have a smartphone', 'have a smartphone and', 'a smartphone and a',
     'smartphone and a smart', 'and a smart tv',
     'i have a smartphone and', 'have a smartphone and a', 'a smartphone and a smart',
     'smartphone and a smart tv',
     'i have a smartphone and a', 'have a smartphone and a smart',
     'a smartphone and a smart tv',
     'i have a smartphone and a smart', 'have a smartphone and a smart tv',
     'i have a smartphone and a smart tv']

Now we apply the replacements from the dictionary:

    import re

    for permutation in key_permutations:
        if permutation in dict:
            text = re.sub(re.escape(permutation), dict[permutation], text, flags=re.IGNORECASE)

    >>> text
    'I have a toy and a junk'

You will most likely want to try the permutations in reverse order, though, longest first, so that more specific phrases take precedence over individual words.
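To illustrate why the order matters, here is a small sketch in which a single word and a longer phrase overlap. The text and the replacements dictionary are invented for this example (the 'smart' key is not in the question), and the names words and candidates are placeholders.

```python
import re

# Illustrative data: 'smart' overlaps with the longer key 'smart tv'.
text = 'I have a Smart TV and a smart dog'
replacements = {'smart': 'clever', 'smart tv': 'junk'}

words = text.lower().split()
candidates = [' '.join(words[j:j + i])
              for i in range(1, len(words) + 1)
              for j in range(len(words) - i + 1)]

# Sort by number of words, longest first, so 'smart tv' is tried before 'smart'.
candidates.sort(key=lambda phrase: len(phrase.split()), reverse=True)

for phrase in candidates:
    if phrase in replacements:
        text = re.sub(re.escape(phrase), replacements[phrase], text, flags=re.IGNORECASE)

print(text)  # I have a junk and a clever dog
```

With the shortest phrases first, 'smart' would be replaced in both places and 'smart tv' would never match. Note that re.sub here still matches substrings, so a key such as 'smart' would also hit the inside of 'smartphone'.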

0

You can do this quite easily with regular expressions.

    import re

    text = 'I have a smartphone and a Smart TV'
    dict = {
        'smartphone': 'toy',
        'smart tv': 'junk'
    }

    for k, v in dict.items():
        # re.escape keeps special characters in the key literal; re.I ignores case
        regex = re.compile(re.escape(k), flags=re.I)
        text = regex.sub(v, text)

The result still depends on the order in which the dict keys are processed: if the replacement value of one entry contains the search key of another entry, that value may itself be replaced in a later iteration.
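One way to sidestep the ordering problem is a different technique from the loop in this answer: combine all keys into a single alternation pattern and replace in one pass with a callback. The sketch below assumes lower-case keys; the extra 'toy' key and the names replacements, pattern and result are added only for illustration.

```python
import re

text = 'I have a smartphone and a Smart TV'
# 'toy' is also a key here (an illustrative addition) to show that text
# produced by a replacement is not replaced a second time.
replacements = {'smartphone': 'toy', 'smart tv': 'junk', 'toy': 'gadget'}

# Put the longest keys first so a longer phrase wins over a shorter key
# that is a prefix of it.
keys = sorted(replacements, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(k) for k in keys), flags=re.IGNORECASE)

# The callback looks up each match by its lower-cased text (keys are assumed lower case).
result = pattern.sub(lambda m: replacements[m.group(0).lower()], text)
print(result)  # I have a toy and a junk
```

Because the text is scanned once, the inserted 'toy' is never seen by the pattern again, so the outcome no longer depends on dict iteration order.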

0

Source: https://habr.com/ru/post/1242063/

