Reduce the string using grammar rules

Question

Reduce the string using grammar rules

I am trying to find a suitable DP algorithm to simplify the string. For example, I have a string abab and a list of rules

ab -> b
ab -> c
ba -> a
cc -> b

The goal is to get all single characters that can be obtained from a given string using these rules. For this example, it will be b, c . The length of this string can be up to 200 characters. Could you suggest an efficient algorithm?

Rules are always 2 -> 1 . I have an idea to create a tree, a row is assigned to the root row, and each child is a row after one conversion, but I'm not sure if this is the best way.

+6

string language-agnostic algorithm dynamic-programming

user2875945 May 08, '15 at 20:28

source share

3 answers

Bas swinckels · Answer 1 · 2015-05-08T22:19:03+0000

For a DP problem, you always need to understand how you can build an answer for a big problem in terms of smaller subtasks. Suppose you have a simplify function that is called with an input of length n . There are n-1 ways to split the input in the first and last parts. For each of these sections, you must recursively call your simplify function for both the first part and the last part. The final answer for entering length n is the set of all possible combinations of answers for the first and last parts that are allowed by the rules.

In Python, this can be implemented as follows:

 rules = {'ab': set('bc'), 'ba': set('a'), 'cc': set('b')} all_chars = set(c for cc in rules.values() for c in cc) @ memoize def simplify(s): if len(s) == 1: # base case to end recursion return set(s) possible_chars = set() # iterate over all the possible splits of s for i in range(1, len(s)): head = s[:i] tail = s[i:] # check all possible combinations of answers of sub-problems for c1 in simplify(head): for c2 in simplify(tail): possible_chars.update(rules.get(c1+c2, set())) # speed hack if possible_chars == all_chars: # won't get any bigger return all_chars return possible_chars

Quick check:

 In [53]: simplify('abab') Out[53]: {'b', 'c'}

To do this fast enough for large lines (to avoid exponential behavior), you should use memoize decorator . This is an important step in solving DP problems, otherwise you just do brute force calculations. Another small acceleration can be obtained by returning from the function as soon as possible_chars == set('abc') , since at this point you are already sure that you can generate all possible results.

Runtime analysis: to enter length n there are 2 substrings of length n-1 , 3 substrings of length n-2 , ... n substrings of length 1, for all O(n^2) subtasks. Due to memoization, a function is called no more than once for each subtask. The maximum runtime for one subtask is O(n) due to for i in range(len(s)) , so the total runtime is no more than O(n^3) .

Jay kominek · Answer 2 · 2015-05-08T23:08:49+0000

If you read these rules from right to left, they look exactly like context free grammar rules and have basically the same meaning. You can apply a parsing algorithm to your data from the bottom up, for example, the Earley algorithm , as well as a suitable start rule; sort of

 start <- start a | start b | start c

and then just look at the syntax forest for the shortest start s chain. In the worst case, of course, O (n ^ 3), but Earley is pretty effective these days.

You can also create parses when parsing with derivatives . You might be able to effectively test them for short start s chains.

Adam stelmaszczyk · Answer 3 · 2015-05-08T22:23:16+0000

Let N be the length of the given string and R the number of rules.

Extending the tree from top to bottom gives the computational complexity O (NR ^ N) in the worst case (an input line of type aaa... and rules aa -> a ).

Evidence:

The root of the tree has (N-1) R children who have (N-1) R ^ 2 children, ... who have (N-1) R ^ N children (leaves). Thus, the total complexity is O ((N-1) R + (N-1) R ^ 2 + ... (N-1) R ^ N) = O (N (1 + R 2 + ... + R ^ N)) = (using the binomial theorem ) = O (N (R + 1) ^ N) = O (NR ^ N).

Recursive Java implementation of this naive approach:

 public static void main(String[] args) { Map<String, Character[]> rules = new HashMap<String, Character[]>() {{ put("ab", new Character[]{'b', 'c'}); put("ba", new Character[]{'a'}); put("cc", new Character[]{'b'}); }}; System.out.println(simplify("abab", rules)); } public static Set<String> simplify(String in, Map<String, Character[]> rules) { Set<String> result = new HashSet<String>(); simplify(in, rules, result); return result; } private static void simplify(String in, Map<String, Character[]> rules, Set<String> result) { if (in.length() == 1) { result.add(in); } for (int i = 0; i < in.length() - 1; i++) { String two = in.substring(i, i + 2); Character[] rep = rules.get(two); if (rep != null) { for (Character c : rep) { simplify(in.substring(0, i) + c + in.substring(i + 2, in.length()), rules, result); } } } }

Original version of Bas Swinckels O (RN ^ 3) Java (with HashMap as memoization cache):

 public static Set<String> simplify2(final String in, Map<String, Character[]> rules) { Map<String, Set<String>> cache = new HashMap<String, Set<String>>(); return simplify2(in, rules, cache); } private static Set<String> simplify2(final String in, Map<String, Character[]> rules, Map<String, Set<String>> cache) { final Set<String> cached = cache.get(in); if (cached != null) { return cached; } Set<String> ret = new HashSet<String>(); if (in.length() == 1) { ret.add(in); return ret; } for (int i = 1; i < in.length(); i++) { String head = in.substring(0, i); String tail = in.substring(i, in.length()); for (String c1 : simplify2(head, rules)) { for (String c2 : simplify2(tail, rules, cache)) { Character[] rep = rules.get(c1 + c2); if (rep != null) { for (Character c : rep) { ret.add(c.toString()); } } } } } cache.put(in, ret); return ret; }

The conclusion in both approaches:

 [b, c]

Reduce the string using grammar rules

More articles: