First of all, I'm not sure how well your approach applies to natural language processing. Also, aren't there existing NLP libraries you could use? In NLP, word order and part of speech often matter a great deal, and this approach is not very robust to word variations.
However, if you want to stick with your approach, one way to make it more readable and easier to maintain (see the fuller list of pros and cons below) looks something like this:
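To make the concern about word variations concrete, here is a minimal sketch. The class name SubstringPitfall, the helper mentions and the sample phrases are made up for illustration; the only real API used is String.contains, which is a plain, case-sensitive substring test.

```java
// Hypothetical demo class, not part of the original answer.
class SubstringPitfall {
    // The same kind of check a keyword-based matcher relies on.
    static boolean mentions(String phrase, String word) {
        return phrase.contains(word);
    }

    public static void main(String[] args) {
        // False positive: "button" happens to contain "on".
        System.out.println(mentions("press the button", "on"));   // true
        // False negative: the check is case-sensitive.
        System.out.println(mentions("Turn On the PC", "on"));     // false
        // Inflected forms match only by accident of spelling.
        System.out.println(mentions("turning the pc on", "turn")); // true
    }
}
```

Lower-casing the phrase and matching on word boundaries would reduce these misfires, but would not address word order or part of speech.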
    StringFinder finder = new StringFinder(phrase);
    if (finder.containsAll("turn", "on").andOneOf("computer", "pc").andNot("off").matches()) {
        turnOnComputer();
        return;
    } else if (finder.containsAll("turn", "off").andOneOf("computer", "pc").andNot("on").matches()) {
        turnOffComputer();
        return;
    } else if (finder.containsAll("turn", "on").andOneOf("light", "lamp").andNot("off").matches()) {
        ...
    } else if (finder.containsAll("turn").matches()) {
        // If we reached this point, the phrase contains "turn" but matched no known command
        badPhrase();
    } else if (...
With something like:
    import java.util.HashMap;
    import java.util.Map;

    class StringFinder {
        private final String phrase;
        // Remembers the result of each substring check so repeated checks stay cheap.
        private final Map<String, Boolean> cache = new HashMap<String, Boolean>();

        public StringFinder(String phrase) {
            this.phrase = phrase;
        }

        // Fails unless every given string occurs in the phrase.
        public StringFinder containsAll(String... strings) {
            for (String string : strings) {
                if (!contains(string)) return new FailedStringFinder(phrase);
            }
            return this;
        }

        // Fails unless at least one of the given strings occurs in the phrase.
        public StringFinder andOneOf(String... strings) {
            for (String string : strings) {
                if (contains(string)) return this;
            }
            return new FailedStringFinder(phrase);
        }

        // Fails if any of the given strings occurs in the phrase.
        public StringFinder andNot(String... strings) {
            for (String string : strings) {
                if (contains(string)) return new FailedStringFinder(phrase);
            }
            return this;
        }

        public boolean matches() {
            return true;
        }

        private boolean contains(String s) {
            Boolean cached = cache.get(s);
            if (cached == null) {
                cached = phrase.contains(s);
                cache.put(s, cached);
            }
            return cached;
        }
    }

    // Once a check fails, the rest of the chain runs on this object, so matches() is false.
    class FailedStringFinder extends StringFinder {
        public FailedStringFinder(String phrase) {
            super(phrase);
        }

        @Override
        public boolean matches() {
            return false;
        }
    }
Disadvantages:
- Duplicated checks: the phrase is tested for the same words several times.
- Duplicated patterns across conditions (but see the benefits below).
Benefits:
- Relatively short code.
- Checks are duplicated but cached, so performance remains high.
- Each condition sits right next to the operation it triggers, which makes the code very readable.
- Because the conditions are not nested, you can change the condition for a particular operation without restructuring the code, which makes it much easier to maintain.
- It is easy to reorder the conditions and operations to manage priorities.
- The lack of nesting makes the code easier to parallelize in the future.
- Flexible checks: you can add methods to StringFinder to factor out repeated checks, for example:
    public StringFinder containsOnAndNotOff() {
        return containsAll("on").andNot("off");
    }
  or to cover any more exotic conditions you need, such as andAtLeast3Of(String... strings) {...}.
- The cache can also be extended to remember not only whether words appear, but also whether whole patterns appear.
- You can also add one final condition, andMatches(Pattern p) (taking a regex Pattern). In fact, you can probably model many of the other checks with regular expressions. That would also simplify caching: instead of using a string as the key, use the pattern.
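As a rough illustration of the andMatches and andAtLeast3Of ideas, here is a minimal sketch of a regex-based variant. It is not the StringFinder above: the class name RegexFinder, the generalized andAtLeast(int, String...) method and the choice of cache key (the pattern source plus its flags) are all assumptions made for this example. Only standard java.util.regex and java.util calls are used.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical variant: every check is a regex, and results are cached per pattern.
class RegexFinder {
    private final String phrase;
    private final Map<String, Boolean> cache; // shared along the whole chain
    private final boolean failed;

    RegexFinder(String phrase) {
        this(phrase, new HashMap<>(), false);
    }

    private RegexFinder(String phrase, Map<String, Boolean> cache, boolean failed) {
        this.phrase = phrase;
        this.cache = cache;
        this.failed = failed;
    }

    // Keeps the chain alive only if the pattern occurs somewhere in the phrase.
    RegexFinder andMatches(Pattern p) {
        return (failed || found(p)) ? this : fail();
    }

    // Keeps the chain alive only if at least n of the words occur as whole words.
    RegexFinder andAtLeast(int n, String... words) {
        if (failed) return this;
        int hits = 0;
        for (String w : words) {
            if (found(Pattern.compile("\\b" + Pattern.quote(w) + "\\b"))) hits++;
        }
        return hits >= n ? this : fail();
    }

    boolean matches() {
        return !failed;
    }

    // Returns a failed copy so the original finder can be reused in later branches.
    private RegexFinder fail() {
        return new RegexFinder(phrase, cache, true);
    }

    // Cache key is the regex source plus its flags (an assumption of this sketch).
    private boolean found(Pattern p) {
        String key = p.pattern() + "/" + p.flags();
        return cache.computeIfAbsent(key, k -> p.matcher(phrase).find());
    }
}
```

With this sketch, new RegexFinder("please turn on the lamp").andMatches(Pattern.compile("\\bturn\\b")).andAtLeast(1, "light", "lamp").matches() is true, while a chain that requires "\\boff\\b" is false. A failed check returns a new object instead of mutating the finder, so the same finder can be reused across all the else-if branches, as in the original design.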