This is not an easy task because, as Ruach suggested, there are many viable suggestions for this message. For example, "JACK AND JILL WENT UP THE HILL" has the same encoding as "JACK AND JILL WALK CHISELED". Since these words are both grammatical sentences and the words in each of them are common, it is not clear how to choose one or the other (or any other of 40141055989476564163599 different sequences of English words that have the same encoding as this message), without delving into natural language processing.
In any case, here we propose a dynamic programming solution to the problem of finding the shortest sentence (with the smallest characters, if there is a connection). He can also calculate the total number of sentences that have the same encoding as this message. He needs a dictionary of English words in a file.
The following improvements should be the best indicator of how likely the sentence is: possible word frequencies, false positive rates in morse (for example, βIβ is a common word, but it is often found as part of other morse code sequences). The hard part will be to formulate a good evaluation function, which can be expressed in such a way that it can be calculated using dynamic programming.
MORSE = dict(zip('ABCDEFGHIJKLMNOPQRSTUVWXYZ', [ '.-', '-...', '-.-.', '-..', '.', '..-.', '--.', '....', '..', '.---', '-.-', '.-..', '--', '-.', '---', '.--.', '--.-', '.-.', '...', '-', '..-', '...-', '.--', '-..-', '-.--', '--..' ])) # Read a file containing AZ only English words, one per line. WORDS = set(word.strip().upper() for word in open('dict.en').readlines()) # A set of all possible prefixes of English words. PREFIXES = set(word[:j+1] for word in WORDS for j in xrange(len(word))) def translate(msg, c_sep=' ', w_sep=' / '): """Turn a message (all-caps space-separated words) into morse code.""" return w_sep.join(c_sep.join(MORSE[c] for c in word) for word in msg.split(' ')) def encode(msg): """Turn a message into timing-less morse code.""" return translate(msg, '', '') def c_trans(morse): """Construct a map of char transitions. The return value is a dict, mapping indexes into the morse code stream to a dict of possible characters at that location to where they would go in the stream. Transitions that lead to dead-ends are omitted. """ result = [{} for i in xrange(len(morse))] for i_ in xrange(len(morse)): i = len(morse) - i_ - 1 for c, m in MORSE.iteritems(): if i + len(m) < len(morse) and not result[i + len(m)]: continue if morse[i:i+len(m)] != m: continue result[i][c] = i + len(m) return result def find_words(ctr, i, prefix=''): """Find all legal words starting from position i. We generate all possible words starting from position i in the morse code stream, assuming we already have the given prefix. ctr is a char transition dict, as produced by c_trans. """ if prefix in WORDS: yield prefix, i if i == len(ctr): return for c, j in ctr[i].iteritems(): if prefix + c in PREFIXES: for w, j2 in find_words(ctr, j, prefix + c): yield w, j2 def w_trans(ctr): """Like c_trans, but produce a word transition map.""" result = [{} for i in xrange(len(ctr))] for i_ in xrange(len(ctr)): i = len(ctr) - i_ - 1 for w, j in find_words(ctr, i): if j < len(result) and not result[j]: continue result[i][w] = j return result def shortest_sentence(wt): """Given a word transition map, find the shortest possible sentence. We find the sentence that uses the entire morse code stream, and has the fewest number of words. If there are multiple sentences that satisfy this, we return the one that uses the smallest number of characters. """ result = [-1 for _ in xrange(len(wt))] + [0] words = [None] * len(wt) for i_ in xrange(len(wt)): i = len(wt) - i_ - 1 for w, j in wt[i].iteritems(): if result[j] == -1: continue if result[i] == -1 or result[j] + 1 + len(w) / 30.0 < result[i]: result[i] = result[j] + 1 + len(w) / 30.0 words[i] = w i = 0 result = [] while i < len(wt): result.append(words[i]) i = wt[i][words[i]] return result def sentence_count(wt): result = [0] * len(wt) + [1] for i_ in xrange(len(wt)): i = len(wt) - i_ - 1 for j in wt[i].itervalues(): result[i] += result[j] return result[0] msg = 'JACK AND JILL WENT UP THE HILL' print sentence_count(w_trans(c_trans(encode(msg)))) print shortest_sentence(w_trans(c_trans(encode(msg))))