Practical validation of user input (sensitivity and specificity)?

Question

When I first learned how to use regular expressions, we were taught to parse things like phone numbers (obviously always 5 digits, a space and 6 more digits) and email addresses (obviously always alphanumerics, an “@”, then alphanumerics followed by a “.” and three letters), which we were told we must always do to validate the data that the user enters.

Of course, as my understanding grew, I found out how stupid the basic approach can be, but the more I look, the more I doubt the concept as a whole. The most open, thorough and correct validation of something like an email address using regular expressions ends up hundreds, if not thousands, of characters long in order to accept all legal cases and reject only the illegal ones. Even worse, all this effort does absolutely nothing about the actual reality: the user may have accidentally added an “a”, may not have access to that email address at all, may even be using someone else's address, or may be using a “+” that gets marked as wrong.
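To make the complaint concrete, here are the two textbook patterns described above, transcribed literally into Python regular expressions. This is my own transcription for illustration (`naive_email_ok` and `naive_phone_ok` are hypothetical names); both accept the happy path and reject perfectly usable input.

```python
import re

# Literal transcriptions of the "textbook" rules (deliberately naive).
NAIVE_PHONE = re.compile(r"\d{5} \d{6}")  # 5 digits, one space, 6 digits
NAIVE_EMAIL = re.compile(r"[A-Za-z0-9]+@[A-Za-z0-9]+\.[A-Za-z]{3}")  # alnum@alnum.xyz

def naive_phone_ok(s: str) -> bool:
    return NAIVE_PHONE.fullmatch(s) is not None

def naive_email_ok(s: str) -> bool:
    return NAIVE_EMAIL.fullmatch(s) is not None

if __name__ == "__main__":
    print(naive_email_ok("jane@example.com"))       # True
    print(naive_email_ok("jane+news@example.com"))  # False: "+" is rejected
    print(naive_email_ok("jane.doe@example.com"))   # False: "." in the local part
    print(naive_email_ok("jane@example.co.uk"))     # False: extra label, 2-letter TLD
    print(naive_phone_ok("01234 567890"))           # True
    print(naive_phone_ok("01234-567890"))           # False: dash instead of space
```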

At the same time, it seems that every site I come across still performs this kind of rigid check: it prevents me from putting less common characters in an email address or a name, or it objects to the idea that someone might have more or fewer names than exactly one first name and one last name, or that those names might not consist exclusively of Latin characters, and all that without any verification that this is actually my real name.

Are there any advantages to this? Once injection attacks have been dealt with (which should be done using methods other than input sanitization anyway), is there any other point to these checks?

Or, on the other hand, is there really any valid way to validate user data, other than to actually “use” it in its context and see if it breaks?

regex validation business-logic

Cactus Mar 11 '16 at 15:58

2 answers

Excessive validation really is one of the banes of the Internet, especially if the person who wrote the validation code has no actual knowledge of the problem domain. No, you probably don't actually know what the correct syntax for email addresses is. Or for postal addresses, especially international ones. Or for phone numbers. Or for people's names.

Looking at a few local examples (my own email address) and extrapolating rules that encompass all possible values in a domain (all email addresses) is crazy. If you do not have perfect domain knowledge, you should not come up with domain rules. In the case of email addresses, this only leads to accepting a very narrow subset of the email addresses that are actually in use in everyday life. Gee, thanks guys.

As for people's names: whatever a person tells you their name is, is by definition their name. It's what they are called. You cannot confirm it automatically; for actual official verification, they would have to send you a copy of their birth certificate. And even then, is that really what you are interested in knowing? Or do you just need a “handle” to greet and identify them on your forum page?
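Taking that at face value, a name field can simply store whatever the user typed. The sketch below is an illustration under assumptions (the `clean_display_name` helper and the `MAX_NAME_LENGTH` storage limit are hypothetical): it only trims surrounding whitespace, applies Unicode NFC normalization so that visually identical input compares equal, and rejects empty or absurdly long input. It makes no assumption about how many parts a name has or which script it is written in.

```python
import unicodedata

# Hypothetical storage limit (in code points); a database constraint, not a rule about names.
MAX_NAME_LENGTH = 200

def clean_display_name(raw: str) -> str:
    """Accept whatever the user says their name is; only normalize and bound it."""
    # NFC turns e.g. "e" + combining acute accent into the single character "é".
    name = unicodedata.normalize("NFC", raw).strip()
    if not name:
        raise ValueError("please enter a name")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("name is too long to store")
    return name

if __name__ == "__main__":
    # Single names, non-Latin scripts and many-part names all pass unchanged.
    for sample in ["  Teller ", "毛泽东", "María José Carreño Quiñones"]:
        print(repr(clean_display_name(sample)))
```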

Facebook does (did?) a rigorous name check to make people register with their real names. Well, many of the people I know on Facebook still use some kind of contrived name, so the filter obviously doesn't work. Having said that, it may work well enough for Facebook's purposes: most people simply use their actual name, because then they don't have to worry about which particular pattern will pass the check. In that sense, such a filter can serve a specific purpose.

In the end, you need to decide why you are validating and which specific restrictions you want to apply. The problem is that people often don't think about the bigger picture before writing the validation code, and they don't have a good reason for their specific restrictions. Do not fall into this trap.

deceze Mar 11 '16 at 16:15

Alex howansky · Accepted Answer · 2016-03-11T16:14:20+0000

is there any other point to these checks?

Sure. Knowing that your data is valid is very important. In the case of email addresses, for example, sending email to addresses you haven't verified will, at the very least, result in bounces. Get enough bounces and your mail host may block you as a spammer. An invalid phone number can lead to unnecessary costs if your application tries to send an SMS to it. The list goes on and on.

Or, on the other hand, is there really any valid way to validate user data, other than to actually “use” it in its context and see if it breaks?

Yes, but regular expressions are generally not well suited to validating data. If a phone number must be “5 digits, a space, then 6 digits”, your check will fail if I type “5 digits, two spaces, then 6 digits”, or “5 digits, a dash, then 6 digits”, or just “11 digits”. Use common sense and expect whatever crazy format the user throws at you. Know what your absolute minimum requirement is. For example, if all you need is 11 digits, first strip out everything that is not a digit. Then the formatting doesn't matter.
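A minimal sketch of that “strip, then count” approach. The function name `normalize_phone` and the fixed `REQUIRED_DIGITS = 11` are illustrative assumptions taken from the example above, not a general rule (real numbering plans vary in length).

```python
import re
from typing import Optional

# Hypothetical requirement from the example: we need exactly 11 digits.
REQUIRED_DIGITS = 11

def normalize_phone(raw: str) -> Optional[str]:
    """Drop everything except ASCII digits, then check only the digit count."""
    digits = re.sub(r"[^0-9]", "", raw)
    return digits if len(digits) == REQUIRED_DIGITS else None

if __name__ == "__main__":
    # Every formatting variant from the answer normalizes to the same value.
    for sample in ["01234 567890", "01234  567890", "01234-567890", "01234567890"]:
        print(repr(sample), "->", normalize_phone(sample))
```

Storing the normalized digits has a side benefit: the same number typed two different ways compares equal later.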

Also, read the RFCs. I can't count the number of times my email address has been rejected because it contains a plus sign. The number of those rejections that came from large, tech-oriented companies whose programmers should have known better was rather disappointing.
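In the spirit of that advice, here is a deliberately permissive sanity check. It is a sketch (the `plausible_email` helper is hypothetical), not an RFC 5322 validator: it only requires something on both sides of the last “@” and no whitespace in the domain, and leaves the real test to actually sending a message to the address. Splitting on the last “@” is deliberate, because a quoted local part may itself contain one.

```python
def plausible_email(address: str) -> bool:
    """Permissive sanity check only; delivery is the real validation."""
    # rpartition splits on the LAST "@"; sep is "" when there is no "@" at all.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    # A domain name never contains whitespace; everything else is allowed through.
    return not any(ch.isspace() for ch in domain)

if __name__ == "__main__":
    for sample in ["jane+news@example.com", "jane@example.co.uk", "not-an-email", "jane@"]:
        print(repr(sample), plausible_email(sample))
```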
