If you are looking for google for word segmentation, there are actually no very good descriptions, and I'm just trying to fully understand the process that uses the dynamic programming algorithm to find line segmentation into single words. Does anyone know a place where there is a good description of the problem of word segmentation, or can anyone describe it?
Word segmentation basically just takes a string of characters and decides where to divide it into words, if you don't know and use dynamic programming, this will take into account a number of subtasks. It is quite simple to do this using recursion, but I could not find anywhere on the net to find even just a description of the iteration algorithm for this online, so if anyone has examples or can give an algorithm that will be great.
Thanks for any help.
, (.. , ), - - , "", string- > , , , :
lotsofwordstogether
:
lots, of, words, together
, , , M - , N th , , , " , , ( ) M - N.
M
N
Python wordsegment module . memoization .
Github, :
def segment(text): "Return a list of words that is the best segmenation of `text`." memo = dict() def search(text, prev='<s>'): if text == '': return 0.0, [] def candidates(): for prefix, suffix in divide(text): prefix_score = log10(score(prefix, prev)) pair = (suffix, prefix) if pair not in memo: memo[pair] = search(suffix, prefix) suffix_score, suffix_words = memo[pair] yield (prefix_score + suffix_score, [prefix] + suffix_words) return max(candidates()) result_score, result_words = search(clean(text)) return result_words
, memo search, , , max candidates.
memo
search
candidates
( , : , 1,2,3..n . , - , , .):
static List<String> connectIter(List<String> tokens) { // use instead of array, the key is of the from 'int int' Map<String, List<String>> data = new HashMap<String, List<String>>(); int n = tokens.size(); for(int i = 0; i < n; i++) { String str = concat(tokens, i, n); if (dictContains(str)) { data.put(1 + " " + i, Collections.singletonList(str)); } } for (int i = 2; i <= n; i++) { for (int start = 0; i < n; start++) { List<String> solutions = new ArrayList<String>(); for (int end = start + 1; end <= n - i + 1; end++) { String firstPart = concat(tokens, start, end); if (dictContains(firstPart)) { List<String> partialSolutions = data.get((i - 1) + " " + end); if (partialSolutions != null) { List<String> newSolutions = new ArrayList<>(); for (String onePartialSolution : partialSolutions) { newSolutions.add(firstPart + " " + onePartialSolution); } solutions.addAll(newSolutions); } } } if (solutions.size() != 0) { data.put(i + " " + start, solutions); } } } List<String> ret = new ArrayList<String>(); for(int i = 1; i <= n; i++) { // can be optimized to run with less iterations List<String> solutions = data.get(i + " " + 0); if (solutions != null) { ret.addAll(solutions); } } return ret; } static private String concat(List<String> tokens, int low, int hi) { StringBuilder sb = new StringBuilder(); for(int i = low; i < hi; i++) { sb.append(tokens.get(i)); } return sb.toString(); }
Source: https://habr.com/ru/post/1723644/More articles:Saving image. A common error occurred in GDI + - c #How to create django application scope variable? - djangohow to get touch event on imageView UITableViewCell? - iphoneHaving a problem with dynamically invoking an unmanaged VB COM dll from C #? - c #Touch event not working? in UImageview? - iphoneSpring configuration file gives BeanDefinitionStoreException - xmlMoq Test Roles.AddUserToRole - staticHow to authenticate Google gadget viewer on Appengine? - javaasp.net In a repeater, can a public function be called from another class? - asp.netBase page or wizard base page or nested wizards? - asp.netAll Articles