I need to generate file names from user-entered names. These names can be in any language. For instance:
- "John Smith"
- "高 岡 和 子"
- "محمد سعيد بن عبد العزيز الفلسطيني"
These are user-entered values used as-is, so I cannot guarantee that the names contain only characters that are valid in a file name.
Users will download these files from their browser, so I need to make sure that the file names are valid on all operating systems in all configurations.
For English-speaking countries, I currently do this by deleting every character that is not an ASCII letter, digit, or whitespace, and then replacing each run of whitespace with an underscore:
string = string.replaceAll("[^a-zA-Z0-9\\s]", "");
string = string.replaceAll("\\s+", "_");
Some conversion examples:
- "John Smith" → "John_Smith.ext"
- "John O'Henry" → "John_OHenry.ext"
- "John van Smith III" → "John_van_Smith_III.ext"
Obviously, this does not work internationally.
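To make the failure concrete, here is a minimal sketch that wraps the two replacements in a helper and runs them on an English and a Japanese name. The class name `AsciiFileName`, the method name `toFileName`, and the `.ext` suffix are placeholders for illustration only.

```java
class AsciiFileName {
    // Hypothetical helper wrapping the ASCII-only approach described above.
    static String toFileName(String name) {
        // Drop everything except ASCII letters, digits, and whitespace.
        String s = name.replaceAll("[^a-zA-Z0-9\\s]", "");
        // Collapse each run of whitespace into a single underscore.
        s = s.replaceAll("\\s+", "_");
        return s + ".ext";
    }

    public static void main(String[] args) {
        System.out.println(toFileName("John O'Henry")); // John_OHenry.ext
        // Every CJK character is deleted; only the separating spaces survive.
        System.out.println(toFileName("\u9AD8 \u5CA1 \u548C \u5B50")); // _.ext
    }
}
```

Any name written entirely in a non-Latin script collapses to the same near-empty result, so the generated file names are neither meaningful nor distinct.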
I considered finding or creating a blacklist of all characters that are invalid on any file system and stripping them from the names, but I could not find an exhaustive list.
I would rather use existing code from a shared library if possible. I assume this is already a solved problem, but I can't find a solution that works internationally.
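For reference, the blacklist idea might look roughly like the sketch below. The character set is an assumption, not the exhaustive list I am looking for: it covers the characters Windows reserves in file names (which include the `/` that Unix-like systems forbid) plus ASCII control characters. The class name `FileNameBlacklist`, the method name `sanitize`, and the fallback name `file` are made up for illustration.

```java
import java.text.Normalizer;

class FileNameBlacklist {
    // Assumed blacklist: \ / : * ? " < > | and ASCII control characters.
    // This is NOT an exhaustive list for every file system.
    private static final String FORBIDDEN = "[\\\\/:*?\"<>|\\p{Cntrl}]";

    static String sanitize(String name) {
        // Compose characters so equivalent inputs give the same file name.
        String s = Normalizer.normalize(name, Normalizer.Form.NFC);
        // (?U) makes \s Unicode-aware, so e.g. U+3000 also counts as whitespace.
        s = s.trim().replaceAll("(?U)\\s+", "_");
        // Remove blacklisted characters; letters from any script are kept.
        s = s.replaceAll(FORBIDDEN, "");
        // Windows does not allow a file name to end with a dot.
        s = s.replaceAll("\\.+$", "");
        // Fall back to a placeholder if nothing usable is left.
        return s.isEmpty() ? "file" : s;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("John O'Henry"));  // John_O'Henry
        System.out.println(sanitize("a/b:c"));         // abc
        System.out.println(sanitize("\u9AD8 \u5CA1")); // the two characters joined by "_"
    }
}
```

Even this sketch leaves gaps that make me doubt the approach: it does not handle names Windows reserves for devices (such as CON or NUL), and it does not enforce any length limit.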
The file name is for the user downloading the file, not for me. I am not going to store these files; the server generates them dynamically on request from data in the database. The file names exist only for the convenience of the user downloading the file.