How to handle English contractions programmatically [Regex, JS, Ruby]

I am collecting natural-language input from users, and I need to check it against a predefined "correct" version. That part is trivial, but I'm not sure how to handle English contractions.

Suppose I expect the sentence "I'm positive you don't know what you're doing." The match should be exact, but I don't want to force users to phrase it in only one way, since that would quickly frustrate them.

So, should all possible variations of this sentence be manually entered as valid matches? For instance:

 "I'm positive you don't know what you're doing."
 "I am positive you don't know what you're doing."
 "I am positive you do not know what you're doing."
 "I am positive you do not know what you are doing."
 "I'm positive you don't know what you are doing."
 ...

And so on. Think of longer sentences and you will see how quickly this gets out of hand.
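To make the blow-up concrete, here is a small Ruby sketch that generates every spelling of the sentence above. The slot list and template are hypothetical, written only for this illustration. Each contraction doubles the count, so three contractions already give 2^3 = 8 variants, and every additional one doubles it again.

```ruby
# Each slot lists the interchangeable spellings of one contraction.
SLOTS = [["I'm", "I am"], ["don't", "do not"], ["you're", "you are"]].freeze

# Hypothetical template: each %s is filled from the matching slot.
TEMPLATE = "%s positive you %s know what %s doing."

# Array#product yields every combination with one choice per slot.
first, *rest = SLOTS
variants = first.product(*rest).map { |choice| format(TEMPLATE, *choice) }

puts variants.size   # prints 8
puts variants.first  # prints I'm positive you don't know what you're doing.
```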

Or is there a programmatic way to handle this, using regex, JS, Ruby, or Rails (the tools I use)?

Thanks for any help.

1 answer

There can't be that many English contractions. I would store each variant as a key that points to the same value, for example (pseudo-Ruby, though it can of course be done in JS as well):

 { "aren't" => :arent, "are not" => :arent }  # etc.
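Rather than typing every key by hand, one option is to list each group of equivalent spellings once and invert it into the lookup hash. A minimal Ruby sketch; the names GROUPS and LOOKUP and the token symbols are illustrative, not from any library:

```ruby
# Hypothetical table: shared token => every spelling that should map to it.
GROUPS = {
  im:    ["I'm", "I am"],
  dont:  ["don't", "do not"],
  youre: ["you're", "you are"],
  arent: ["aren't", "are not"]
}.freeze

# Invert it into the "spelling => token" lookup, keyed in lowercase so
# that lookups can ignore capitalization.
LOOKUP = GROUPS.each_with_object({}) do |(token, spellings), table|
  spellings.each { |s| table[s.downcase] = token }
end.freeze

p LOOKUP["are not"]  # prints :arent
p LOOKUP["i'm"]      # prints :im
```

Keeping the groups as the source of truth makes it harder to add a contraction without its expanded form, or vice versa.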

Then store the correct sentence using those shared values:

 ":im positive you :dont know what :youre doing" 

When you receive input, replace every matching key with its stored value, then compare the converted sentence against the stored correct one, in which the contractions are already marked the same way.
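A minimal Ruby sketch of this normalize-then-compare step. It assumes a small hypothetical lookup table, lowercase comparison, collapsing repeated spaces, and ignoring trailing sentence punctuation; these are illustrative choices, not requirements of the approach. It builds one regex from all keys with Regexp.union, replaces each whole-word match with its token, and compares the result to the stored sentence:

```ruby
# Hypothetical lookup table: lowercase spelling => shared token.
LOOKUP = {
  "i'm" => :im, "i am" => :im,
  "don't" => :dont, "do not" => :dont,
  "you're" => :youre, "you are" => :youre,
  "aren't" => :arent, "are not" => :arent
}.freeze

# One regex matching any key as whole words. Longer keys come first so
# a multi-word form is preferred over a shorter key that prefixes it.
KEY_RE = /\b(?:#{Regexp.union(LOOKUP.keys.sort_by { |k| -k.length }).source})\b/

# The stored correct sentence, written with the shared tokens.
EXPECTED = ":im positive you :dont know what :youre doing"

# Typographic apostrophes become plain ones, case and repeated spaces are
# normalized, trailing . ! ? are dropped (an assumption), then every key
# is replaced by its ":token" form.
def normalize(text)
  text.tr("’", "'")
      .downcase
      .squeeze(" ")
      .strip
      .sub(/[.!?]+\z/, "")
      .gsub(KEY_RE) { |m| ":#{LOOKUP.fetch(m)}" }
end

def matches?(input)
  normalize(input) == EXPECTED
end

puts matches?("I am positive you do not know what you're doing.")  # prints true
puts matches?("I'm sure you don't know what you're doing.")        # prints false
```

Sorting the keys by length matters because regex alternation takes the first alternative that can match, so a key like "i" would otherwise win over "i am".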

(Note: for the few cases where one contraction can stand for different phrases, such as "he's" for "he is" or "he has", you will need to make special provisions.)
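One possible special provision, sketched in Ruby under the assumption that input is plain space-separated words without attached punctuation: expand each ambiguous contraction into all of its readings and accept the input if any reading equals the expected sentence. The AMBIGUOUS table and the method names are hypothetical.

```ruby
# Hypothetical table of contractions that have more than one expansion.
AMBIGUOUS = {
  "he's" => ["he is", "he has"],
  "i'd"  => ["i would", "i had"]
}.freeze

# Every expanded reading of the input, choosing one expansion per
# ambiguous contraction. Assumes words are separated by whitespace and
# carry no attached punctuation.
def readings(text)
  options = text.downcase.split.map { |word| AMBIGUOUS.fetch(word, [word]) }
  return [""] if options.empty?

  options.first.product(*options.drop(1)).map { |choice| choice.join(" ") }
end

# Accept the input if any of its readings equals the expected sentence.
def accepted?(input, expected)
  readings(input).include?(expected.downcase)
end

puts accepted?("He's gone home", "He has gone home")  # prints true
puts accepted?("He's happy", "He is happy")           # prints true
```

The number of readings multiplies with each ambiguous contraction in a sentence, which stays manageable as long as such contractions are rare.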


Source: https://habr.com/ru/post/1266513/

