Handling filename* parameters with spaces per RFC 5987 results in '+' in file names

I'm dealing with some legacy code (so I can't just use a URL with the encoded name as a path component) that lets users download a file from our website. Because our file names are often in different languages, they are all stored as UTF-8. I wrote code to handle the RFC 5987 conversion to a proper filename* parameter. This works fine until I have a file name that contains both spaces and non-ASCII characters. Per the RFC, the space character is not part of attr-char, so it is encoded as %20. I have current versions of Chrome and Firefox, and they all turn the %20 into + on download. I also tried leaving the space unencoded and putting the encoded file name in quotation marks, with the same result. I sniffed the response coming from the server to make sure the servlet container wasn't mangling my headers, and they look right to me. The RFC even has examples that contain %20. Am I missing something, or do all of these browsers have a bug related to this?

Thank you very much in advance. The code that I use to encode the file name is below.

Peter

    public static boolean bcsrch(final char[] chars, final char c) {
        final int len = chars.length;
        int base = 0;
        int last = len - 1; /* Last element in table */
        int p;

        while (last >= base) {
            p = base + ((last - base) >> 1);
            if (c == chars[p])
                return true; /* Key found */
            else if (c < chars[p])
                last = p - 1;
            else
                base = p + 1;
        }
        return false; /* Key not found */
    }

    public static String rfc5987_encode(final String s) {
        final int len = s.length();
        final StringBuilder sb = new StringBuilder(len << 1);
        final char[] digits = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
        final char[] attr_char = {'!','#','$','&','\'','+','-','.','0','1','2','3','4','5','6','7','8','9',
            'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','^','_',
            'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','|', '~'};

        for (int i = 0; i < len; ++i) {
            final char c = s.charAt(i);
            if (bcsrch(attr_char, c))
                sb.append(c);
            else {
                final char[] encoded = {'%', 0, 0};
                encoded[1] = digits[0x0f & (c >>> 4)];
                encoded[2] = digits[c & 0x0f];
                sb.append(encoded);
            }
        }
        return sb.toString();
    }

Update

Here is a screenshot of the download dialog that I get for a file with Chinese characters and spaces in its name, as mentioned in my comment.

[Screenshot: download dialog]

1 answer

As Julian pointed out in the comments, I made a newbie Java mistake: I forgot to convert my characters to bytes, so I was percent-encoding the character code instead of the character's byte representation, and the encoding was completely wrong. RFC 5987 clearly states this as a requirement. Once the encoding is correct, the browser recognizes the filename* parameter and uses the correct file name for the download.
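To make the difference concrete, here is a minimal sketch (not the original code) that contrasts the two approaches for a single CJK character, U+535A. The class and method names are hypothetical and exist only for this illustration. The first method mimics the mistake by hex-encoding the low 8 bits of the Java char; the second percent-encodes each UTF-8 byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
final class CharVsByteDemo {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Mimics the mistake: only the low 8 bits of the UTF-16 code unit are encoded.
    static String encodeLowByteOfChar(char c) {
        return "%" + HEX[0x0f & (c >>> 4)] + HEX[c & 0x0f];
    }

    // Percent-encodes every UTF-8 byte of the string (no attr-char check here,
    // so even ASCII letters would be escaped; this is only for comparison).
    static String encodeUtf8Bytes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append('%').append(HEX[(b >> 4) & 0x0f]).append(HEX[b & 0x0f]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char c = '\u535A'; // first character of the problem file name
        System.out.println(encodeLowByteOfChar(c));             // %5A
        System.out.println(encodeUtf8Bytes(String.valueOf(c))); // %E5%8D%9A
    }
}
```

The single escape %5A decodes to the letter Z, which has nothing to do with the intended character, whereas the three-byte sequence %E5%8D%9A is what a browser expects after the UTF-8'' prefix.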

Below is the adjusted escaping code, which works on the UTF-8 bytes of the string. The file name that was giving me problems is now encoded correctly and looks like this:

Content-Disposition: attachment; filename*=UTF-8''Museum%20%E5%8D%9A%E7%89%A9%E9%A6%86.jpg

    import java.io.UnsupportedEncodingException;
    import java.util.Arrays;

    public static String rfc5987_encode(final String s) throws UnsupportedEncodingException {
        // Work on the UTF-8 bytes of the string, not on its chars.
        final byte[] s_bytes = s.getBytes("UTF-8");
        final int len = s_bytes.length;
        final StringBuilder sb = new StringBuilder(len << 1);
        final char[] digits = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
        // attr-char from RFC 5987; must stay sorted for Arrays.binarySearch.
        final byte[] attr_char = {'!','#','$','&','+','-','.','0','1','2','3','4','5','6','7','8','9',
            'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','^','_','`',
            'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','|', '~'};

        for (int i = 0; i < len; ++i) {
            final byte b = s_bytes[i];
            if (Arrays.binarySearch(attr_char, b) >= 0)
                sb.append((char) b);
            else {
                // Percent-encode the byte as two upper-case hex digits.
                sb.append('%');
                sb.append(digits[0x0f & (b >>> 4)]);
                sb.append(digits[b & 0x0f]);
            }
        }
        return sb.toString();
    }
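For anyone who wants to check the result end to end, here is a self-contained sketch that builds the whole header value for the file name from the question. It is an illustration under assumptions, not the code from my servlet: the class name ContentDispositionSketch and the methods isAttrChar, encodeExtValue and attachmentHeaderValue are hypothetical, and it uses StandardCharsets.UTF_8 instead of the "UTF-8" string so no checked exception is needed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class ContentDispositionSketch {
    // Punctuation part of attr-char in RFC 5987 (ALPHA and DIGIT are handled separately).
    private static final String ATTR_PUNCT = "!#$&+-.^_`|~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // True if the unsigned byte value may appear unescaped in an ext-value.
    static boolean isAttrChar(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
            || (b >= '0' && b <= '9') || ATTR_PUNCT.indexOf(b) >= 0;
    }

    // Percent-encodes the UTF-8 bytes of s, leaving attr-char bytes as they are.
    static String encodeExtValue(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xff; // treat the byte as unsigned
            if (isAttrChar(b)) {
                sb.append((char) b);
            } else {
                sb.append('%').append(HEX[b >>> 4]).append(HEX[b & 0x0f]);
            }
        }
        return sb.toString();
    }

    // Builds the value part of a Content-Disposition header with a filename* parameter.
    static String attachmentHeaderValue(String fileName) {
        return "attachment; filename*=UTF-8''" + encodeExtValue(fileName);
    }

    public static void main(String[] args) {
        // "Museum " followed by three CJK characters, written as escapes to keep the source ASCII.
        System.out.println(attachmentHeaderValue("Museum \u535A\u7269\u9986.jpg"));
    }
}
```

In a servlet, the returned string would be passed to something like response.setHeader("Content-Disposition", value); the space shows up as %20 and the download dialog displays it as a space rather than a '+'.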

Source: https://habr.com/ru/post/919538/

