Is there a cross-platform Java method for removing special characters in file names?

I am making a cross-platform application that renames files based on data retrieved from the Internet. I would like to sanitize the strings that I get from a web API for the current platform.

I know that different platforms have different requirements for file names, so I was wondering if there is a cross-platform way to do this?

Edit: On Windows platforms, you cannot have a question mark '?' in a file name, whereas on Linux you can. The incoming names may contain such characters, and I would like platforms that support them to keep them, but strip them out otherwise.

In addition, I would prefer a standard Java solution that does not require third-party libraries.

+52
java filesystems cross-platform filenames
Jul 20 '09 at 18:25
8 answers

As suggested elsewhere, this is usually not what you want to do. It is usually best to create a temporary file using a safe method such as File.createTempFile().
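As a minimal sketch of that suggestion, the class below lets the JVM pick the file name, so the untrusted string from the web API never reaches the file system. The class name, the method name, and the "download-" / ".tmp" prefix and suffix are hypothetical, chosen only for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class TempFileExample {

    // Stores the payload in a file whose name is generated by the JVM
    // in the default temporary directory.
    static File saveToTempFile(byte[] payload) {
        try {
            // Hypothetical prefix and suffix; the prefix must be at least 3 characters long.
            File tmp = File.createTempFile("download-", ".tmp");
            Files.write(tmp.toPath(), payload);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File f = saveToTempFile("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("Stored as " + f.getAbsolutePath());
        f.deleteOnExit();
    }
}
```

The original name can still be kept as metadata (for example in a database or a sidecar file) if it needs to be shown to the user later.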

You should not do this with a whitelist that keeps only the “good” characters. If the file name consists only of Chinese characters, you will strip out all of it. For this reason a whitelist cannot be used; you have to use a blacklist.

Linux allows pretty much anything, which can be a real pain. I would just restrict Linux to the same list that you restrict Windows to, to save yourself headaches in the future.

Using this C# snippet on Windows, I produced a list of the characters that are not valid on Windows. There are quite a few more characters in this list than you might think (41), so I would not recommend trying to create your own list.

    foreach (char c in new string(Path.GetInvalidFileNameChars()))
    {
        Console.Write((int)c);
        Console.Write(",");
    }

Here is a simple Java class that cleans a file name.

    import java.util.Arrays;

    public class FileNameCleaner {
        // Numeric values of the characters that are not valid in Windows file names.
        final static int[] illegalChars = {34, 60, 62, 124, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 58, 42, 63, 92, 47};

        static {
            // Arrays.binarySearch requires a sorted array.
            Arrays.sort(illegalChars);
        }

        public static String cleanFileName(String badFileName) {
            StringBuilder cleanName = new StringBuilder();
            for (int i = 0; i < badFileName.length(); i++) {
                int c = (int) badFileName.charAt(i);
                // A negative result means the character is not in the illegal list.
                if (Arrays.binarySearch(illegalChars, c) < 0) {
                    cleanName.append((char) c);
                }
            }
            return cleanName.toString();
        }
    }

EDIT: As Stephen suggested, you should probably also make sure that these file accesses only occur within a directory you allow.

The following answer contains sample code for setting up a custom security context in Java and then executing code inside that sandbox.

How to create a secure JEXL (scripting) sandbox?

+24
Apr 11 2018-11-11T00:

Or just do the following:

    String filename = "A20/B22b#öA\\BC#Ä$%ld_ma.la.xps";
    String sane = filename.replaceAll("[^a-zA-Z0-9\\._]+", "_");

Result: A20_B22b_A_BC_ld_ma.la.xps

Explanation:

[a-zA-Z0-9\\._] matches a letter from a-z in lower or upper case, a digit, a dot, or an underscore

[^a-zA-Z0-9\\._] is the inverse, i.e. any character that does not match the first expression

[^a-zA-Z0-9\\._]+ is a sequence of one or more characters that do not match the first expression

So every sequence of characters that are not in a-z, A-Z, 0-9, '.' or '_' will be replaced by a single underscore.
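To make the effect of the trailing + concrete, here is a small sketch that runs the same character class with and without it. The class and method names are hypothetical and exist only for this comparison.

```java
final class RegexSanitizeDemo {

    // With '+': each run of disallowed characters becomes ONE underscore.
    static String collapseRuns(String name) {
        return name.replaceAll("[^a-zA-Z0-9\\._]+", "_");
    }

    // Without '+': every disallowed character becomes its own underscore.
    static String perCharacter(String name) {
        return name.replaceAll("[^a-zA-Z0-9\\._]", "_");
    }

    public static void main(String[] args) {
        String input = "a#$b.txt";
        System.out.println(collapseRuns(input));  // a_b.txt
        System.out.println(perCharacter(input));  // a__b.txt
    }
}
```

Note that this whitelist also replaces non-ASCII letters such as ö and Ä, as the result above shows.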

+19
Jul 19 '13 at 11:37

This is based on Sarel Botha's accepted answer, which works fine as long as you don't come across any characters outside the Basic Multilingual Plane. If you need full Unicode support (and who doesn't?), use this code instead, which is Unicode-safe:

    import java.util.Arrays;

    public class FileNameCleaner {
        final static int[] illegalChars = {34, 60, 62, 124, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 58, 42, 63, 92, 47};

        static {
            Arrays.sort(illegalChars);
        }

        public static String cleanFileName(String badFileName) {
            StringBuilder cleanName = new StringBuilder();
            // codePointAt takes a char index, so step by the width of each code point.
            for (int i = 0; i < badFileName.length(); ) {
                int c = badFileName.codePointAt(i);
                if (Arrays.binarySearch(illegalChars, c) < 0) {
                    cleanName.appendCodePoint(c);
                }
                i += Character.charCount(c);
            }
            return cleanName.toString();
        }
    }

Key changes here:

  • Advance the index by Character.charCount(c) instead of 1, because codePointAt expects a char index and a character outside the BMP occupies two chars
  • Use codePointAt instead of charAt
  • Use appendCodePoint instead of append
  • No need to cast char to int. In fact, you should avoid dealing with char at all, since a single char cannot represent anything outside the BMP.
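The code-point handling described above can also be written with String.codePoints() (available since Java 8), which avoids manual index arithmetic altogether. This is only a sketch: the class name is hypothetical, and the illegal list is a shortened, illustrative subset (printable characters plus everything below 32), not the full list from the answer.

```java
import java.util.Arrays;

final class CodePointCleaner {

    // Illustrative subset of the illegal printable characters: " * / : < > ? \ |
    private static final int[] ILLEGAL = {'"', '*', '/', ':', '<', '>', '?', '\\', '|'};

    static {
        // Arrays.binarySearch requires a sorted array.
        Arrays.sort(ILLEGAL);
    }

    static String clean(String name) {
        StringBuilder out = new StringBuilder();
        name.codePoints()
            // Drop control characters (0-31) and anything in the illegal list.
            .filter(cp -> cp >= 32 && Arrays.binarySearch(ILLEGAL, cp) < 0)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP and is stored as a surrogate pair.
        String cleaned = clean("a?b\uD83D\uDE00:c");
        System.out.println(cleaned.codePointCount(0, cleaned.length()));  // 4
    }
}
```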
+10
Oct. 17 '14 at 8:21

There is a pretty good built-in Java solution: Character.isXxx().

Try Character.isJavaIdentifierPart(c):

    String name = "name.é+!@#$%^&*(){}][/=?+-_\\|;:`~!'\",<>";
    StringBuilder filename = new StringBuilder();
    for (char c : name.toCharArray()) {
        // Keep dots plus anything that may appear in a Java identifier.
        if (c == '.' || Character.isJavaIdentifierPart(c)) {
            filename.append(c);
        }
    }

Result: "name.é$_".

+6
Nov 08 '12 at 16:33

Here is the code I'm using:

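    // Requires: import org.apache.commons.lang3.SystemUtils;
    public static String sanitizeName( String name ) {
        if( null == name ) {
            return "";
        }

        if( SystemUtils.IS_OS_LINUX ) {
            // On Linux only the path separator is removed.
            return name.replaceAll( "/+", "" ).trim();
        }

        // Elsewhere, remove control characters and the characters Windows rejects.
        return name.replaceAll( "[\u0001-\u001f<>:\"/\\\\|?*\u007f]+", "" ).trim();
    }

SystemUtils comes from Apache commons-lang3.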

+6
Jul 11 '14 at 7:53

This is not clear from your question, but since you plan to accept path names from a web form (?), you should probably block attempts to rename certain things, for example "C:\Program Files". This means that you need to canonicalize the path names to eliminate "." and ".." before you do your access checks.

Given that, I wouldn't try to remove illegal characters. Instead, I would use "new File(str).getCanonicalFile()" to produce the canonical paths, then check that they satisfy your sandbox restrictions, and finally use "File.exists()", "File.isFile()", etc. to check that the source and destination are kosher and are not the same file. I would deal with illegal characters by attempting the operations and catching the exceptions.
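A minimal sketch of that canonicalize-then-check step might look like the following. The class and method names are hypothetical, and using the temporary directory as the sandbox root is just an assumption for the demo.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

final class SandboxCheck {

    // Returns true if candidate, once canonicalized, lies inside baseDir.
    static boolean isInside(File baseDir, File candidate) {
        try {
            File base = baseDir.getCanonicalFile();
            File target = candidate.getCanonicalFile();
            // Path.startsWith compares whole name elements, so a sibling such as
            // "uploads-evil" is not mistaken for a child of "uploads".
            return target.toPath().startsWith(base.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption for the demo: the sandbox root is the temp directory.
        File base = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(isInside(base, new File(base, "report.txt")));
        System.out.println(isInside(base, new File(base, "../escape.txt")));
    }
}
```

Checks such as File.exists() and File.isFile(), and the rename itself, would follow only after this test passes for both source and destination.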

+4
Jul 20 '09 at 23:11

If you want to allow more than [A-Za-z0-9], then check out the MS Naming Conventions, and don't forget to filter out "... Characters whose integer representations are in the range from 1 through 31, ...", as Aaron Digullah's example does. David Carboni's code, for example, would not be sufficient for these characters.

0
Feb 24 '18 at 12:06

Paths.get(...) throws a detailed exception with the position of an invalid character.

    // Requires: import java.nio.file.InvalidPathException; import java.nio.file.Paths;
    public static String removeInvalidChars(final String fileName) {
        try {
            Paths.get(fileName);
            return fileName;
        } catch (final InvalidPathException e) {
            if (e.getInput() != null && e.getInput().length() > 0 && e.getIndex() >= 0) {
                // Delete the offending character and retry with the shortened name.
                final StringBuilder stringBuilder = new StringBuilder(e.getInput());
                stringBuilder.deleteCharAt(e.getIndex());
                return removeInvalidChars(stringBuilder.toString());
            }
            throw e;
        }
    }
0
Feb 06 '19 at 11:38