When I first learned how to use regular expressions, we were taught to parse things like phone numbers (obviously always 5 digits, a space, and 6 more digits) and email addresses (obviously always alphanumerics, an "@", then alphanumerics followed by a "." and three letters). We were told we must always do this to validate the data that users enter.
Of course, I soon found out how naive that basic approach can be, but the more I look, the more I doubt the concept as a whole. The most thorough attempts to correctly validate something like an email address with regular expressions end up hundreds, if not thousands, of characters long, just to accept every legal case and reject only the illegal ones. Worse, all this effort does nothing about what actually matters: the user may have accidentally added an extra "a", may not be able to receive mail at that address at all, may have entered a different address altogether, or may be using a "+" that the pattern marks as invalid.
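To make the pitfalls concrete, here is a minimal Python sketch of the kind of naive pattern described above. The pattern `NAIVE_EMAIL` and the helper `naive_is_valid` are hypothetical illustrations, not taken from any particular course or library; they simply encode "alphanumerics, @, alphanumerics, a dot, and three letters".

```python
import re

# Hypothetical "naive" pattern matching the rule described above:
# alphanumerics, "@", alphanumerics, ".", then exactly three letters.
NAIVE_EMAIL = re.compile(r"[A-Za-z0-9]+@[A-Za-z0-9]+\.[A-Za-z]{3}")

def naive_is_valid(address: str) -> bool:
    """Return True only if the whole string matches the naive pattern."""
    return NAIVE_EMAIL.fullmatch(address) is not None

samples = [
    "alice@example.com",       # accepted
    "alice+news@example.com",  # rejected: "+" is legal but not alphanumeric
    "bob@example.co.uk",       # rejected: more than one dot in the domain
    "carol@example.info",      # rejected: four-letter top-level domain
    "alice@exampel.com",       # accepted, even though the domain is a typo
]

for s in samples:
    print(f"{s!r}: {naive_is_valid(s)}")
```

The pattern rejects a legal "+" address and common domains such as `.info` or `.co.uk`, yet happily accepts a typo in the domain name, which is exactly the mismatch between syntax and reality described above.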
At the same time, almost every site I come across still performs this kind of syntactic check. It stops me from using less common characters in an email address or a name, or balks at the idea that someone might have more or fewer than exactly one first name and one last name, or that those names might not be written entirely in Latin characters. Yet none of it verifies that this is actually my real name.
Is there any benefit to this? Once injection attacks have been dealt with (which should be done by means other than input sanitization), is there any other point to these checks?
Or, conversely, is there actually a valid way to validate user data, other than to "use" it in context and see whether it fails?
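One common reading of "using" the data is to send a confirmation message and treat the address as valid only once the recipient responds. The sketch below assumes that approach: a deliberately lenient syntax check plus a one-time token. All names (`plausible_email`, `start_verification`, `confirm`, and the in-memory `pending` store) are hypothetical, and actually sending the email is left out.

```python
import secrets
from typing import Dict, Optional

def plausible_email(address: str) -> bool:
    """Very lenient check: an "@" with something non-empty on each side.

    rpartition splits on the LAST "@", because quoted local parts may
    themselves contain "@". Almost everything else is accepted on purpose.
    """
    local, sep, domain = address.rpartition("@")
    return bool(sep) and bool(local) and bool(domain)

# Hypothetical in-memory store mapping issued tokens to pending addresses.
pending: Dict[str, str] = {}

def start_verification(address: str) -> str:
    """Issue a random token that would be mailed to the address (sending not shown)."""
    token = secrets.token_urlsafe(16)
    pending[token] = address
    return token

def confirm(token: str) -> Optional[str]:
    """Return the confirmed address if the token is known, consuming the token."""
    return pending.pop(token, None)

token = start_verification("alice+news@example.com")
print(confirm(token))  # the address, once the user follows the link
print(confirm(token))  # None: a token can only be used once
```

The design choice here is to let the syntax check catch only obvious slips (a missing "@", an empty side) and leave the real question, whether the user controls the address, to the confirmation step.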