In general, you should sanitize first - "for your protection and theirs." This includes stripping any invalid characters (with character encoding in mind, of course). If a field should contain only letters and spaces, strip out everything else first.
After that, you validate the result - is the name already in use (for unique fields), is it the right size, is it non-empty?
The reason you gave is exactly right - to maximize the user experience. Do not confuse the user if it can be avoided. Sanitizing first helps protect against mindless copy-and-paste behavior, but you have to be careful - if I want my name to be written as "Ke$h@", I may or may not be okay with it being changed to "Keh".
Secondly, it also prevents errors.
What happens if you are setting up usernames that do not allow special characters? Say "Brian" is already taken, and I submit "Brian" with some special characters attached. If you validate first, the name is not in use; then you strip the special characters and are left with "Brian". Uh oh - now you either have to validate AGAIN, or you get a strange error that account creation failed (if your database is configured to require unique usernames), or, even worse, it succeeds and user accounts get overwritten or corrupted.
Another example is a minimum field length: if you require a name of at least 3 letters and accept only letters, and I enter "no", you will reject it; but if I enter "no @ # $%", you might say it is valid (long enough), sanitize it, and now it is no longer valid, and so on.
An easy way to avoid this is to sanitize first, and then you don't need to think twice about validation.
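The ordering problem above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names (`sanitize`, `validate`, `register`), the in-memory `taken` set standing in for a database lookup, and the sample input "Brian$" are all hypothetical, and the letters-and-spaces rule is the one from the examples above.

```python
import re

MIN_LENGTH = 3  # minimum length used in the example above


def sanitize(name: str) -> str:
    """Keep only ASCII letters and spaces, then trim surrounding whitespace."""
    return re.sub(r"[^A-Za-z ]", "", name).strip()


def validate(name: str, taken: set) -> list:
    """Return a list of problems; an empty list means the name is acceptable."""
    problems = []
    if len(name) < MIN_LENGTH:
        problems.append("too short")
    if name.lower() in taken:  # hypothetical stand-in for a uniqueness query
        problems.append("already in use")
    return problems


def register(raw: str, taken: set) -> tuple:
    """Sanitize first, then validate once, on the value that will be stored."""
    clean = sanitize(raw)
    return clean, validate(clean, taken)


taken = {"brian"}

# Validating the raw input first lets both bad values through:
print(validate("no @ # $%", taken))  # [] (looks long enough)
print(validate("Brian$", taken))     # [] (looks unused)

# Sanitizing first catches them in a single pass:
print(register("no @ # $%", taken))  # ('no', ['too short'])
print(register("Brian$", taken))     # ('Brian', ['already in use'])
```

Whether to silently accept the sanitized value or show it to the user for confirmation is a separate decision; the sketch only shows that validation should run on the sanitized value.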
However, Neath was right about not encoding data before storage. As a rule, it is much easier to HTML-encode output when necessary than to have to remember to decode it whenever you just need plain text (for input into text fields, JSON strings, etc.). Most of the test cases you use will not include data with HTML entities, so it is easy to introduce silly bugs that are not easy to catch.
The big problem is that when such a bug is introduced, it can quickly lead to data corruption that is not easy to fix. Example: you have plain text, you mistakenly output it into a text field as HTML entities, the form is submitted and you re-encode it ... every time it is opened and re-submitted, it gets encoded again. On a busy site / form, you can end up with thousands of records encoded in different ways, with no clear way to determine which were meant to be HTML-encoded and which were not.
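To make the re-encoding problem concrete, here is a small Python sketch using the standard library's `html.escape`. The names `render_input` and `buggy_save`, the field name `title`, and the sample value are hypothetical; the point is only to contrast encoding once at output time with encoding on every save.

```python
import html
import json

stored = "Tom & Jerry"  # plain text, stored unencoded


def render_input(value: str) -> str:
    """Encode only at the point of output into HTML."""
    return '<input name="title" value="{}">'.format(html.escape(value, quote=True))


def buggy_save(value: str) -> str:
    """The mistake: encoding before storage, on every submit."""
    return html.escape(value)


# Stored raw, the same value works for every output format:
print(render_input(stored))          # <input name="title" value="Tom &amp; Jerry">
print(json.dumps({"title": stored}))  # {"title": "Tom & Jerry"}

# Encoded on every save, each round trip adds another layer:
value = stored
for _ in range(3):
    value = buggy_save(value)
    print(value)
# Tom &amp; Jerry
# Tom &amp;amp; Jerry
# Tom &amp;amp;amp; Jerry
```

Once records with different numbers of layers are mixed together, there is no reliable way to tell how many times each one should be decoded, since a user may have legitimately typed `&amp;` as text.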
Injection protection is good, but HTML encoding is not what provides it (and should not be relied on for it).