I think coming up with a bunch of regular expressions for the possible ways the data can be invalid (for example, too few or too many characters), and then tracking the results, might work. But instead of thinking about how the data can be invalid, I am curious how to “learn” patterns from bad data using AI.
I'm reminded of a funny quote, usually attributed to Jamie Zawinski:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Except, in this case, I think the manual regex route is actually your best bet!
Irony of ironies.
Anyway.
The fact is that people tend to over-engineer their solutions. Here, regular expressions are actually a fairly simple solution to your problem, while building a learner is something that will take you much longer than I think you realize.
There are fewer ways for this very limited kind of data (a date) to be expressed correctly than there are ways for it to be expressed incorrectly. There are endless ways for data to be bad. Do you want to train a learner to discover all of them? That is a rabbit hole. Think of this AI learner as a colleague or friend instead: how would you describe to them all the ways a date could be malformed? It is far easier to describe what a valid date looks like and reject everything else.
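To make that concrete, here is a minimal sketch of the "describe what is valid, reject the rest" approach. The question does not say which date format is expected, so the `YYYY-MM-DD` format, the pattern itself, and the names `VALID_DATE`, `is_valid_date`, and `find_invalid` are all assumptions for illustration only:

```python
import re

# Assumed format: YYYY-MM-DD. The question does not state one,
# so swap in whatever pattern matches your real data.
VALID_DATE = re.compile(r"[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])")


def is_valid_date(text):
    """Return True only if the whole string has the assumed date shape."""
    return VALID_DATE.fullmatch(text) is not None


def find_invalid(values):
    """Return every value that fails the whitelist pattern, in input order."""
    return [v for v in values if not is_valid_date(v)]


samples = ["2021-03-15", "2021-3-15", "15/03/2021", "2021-13-01", "20210315", ""]
print(find_invalid(samples))
```

One pattern covers every kind of malformed input at once, because anything that does not match is flagged. Note that this only checks the shape of the string: an impossible date such as `2021-02-31` would still pass, so a real check would probably need a calendar validation step on top.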
Wanting to do less work for yourself in the long run is a good instinct. But the effort of figuring out how to build a learner, not to mention training and testing it, not to mention maintaining it, outweighs any benefit the learner could give you in such a narrow use case.