CRLF escaping in the multipart/form-data HTTP content type (iOS)

I am trying to upload a file using the multipart/form-data content type, and I have a question:
Should I escape CRLF when I write the file's contents into the body? I got this code online and I think it might be wrong:

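    NSMutableURLRequest *req = [NSMutableURLRequest requestWithURL:url];
    [req setHTTPMethod:@"POST"];

    // The boundary parameter is separated from the media type by a semicolon.
    NSString *contentType = @"multipart/form-data; boundary=AaB03x";
    [req setValue:contentType forHTTPHeaderField:@"Content-Type"];

    // Delimiter that opens a part, and the closing delimiter (trailing "--") that ends the body.
    NSData *boundary = [@"\r\n--AaB03x\r\n" dataUsingEncoding:NSUTF8StringEncoding];
    NSData *closingBoundary = [@"\r\n--AaB03x--\r\n" dataUsingEncoding:NSUTF8StringEncoding];

    NSMutableData *postBody = [NSMutableData data];
    [postBody appendData:boundary];
    // Each part header line ends with CRLF; an empty line separates the headers from the data.
    [postBody appendData:[@"Content-Disposition: form-data; name=\"datafile\"; filename=\"t.jpg\"\r\n" dataUsingEncoding:NSUTF8StringEncoding]];
    [postBody appendData:[@"Content-Type: image/jpeg\r\n\r\n" dataUsingEncoding:NSUTF8StringEncoding]];
    [postBody appendData:imageData]; // raw bytes, appended unescaped
    [postBody appendData:closingBoundary];
    [req setHTTPBody:postBody];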

This is wrong because imageData may contain \r\n sequences, right? If so, is there a way to escape CRLF in the raw data? Or am I missing something?

Thanks in advance!

1 answer

This is an interesting question. Looking at the RFC for multipart media types, it turns out that the composing agent must make sure that the boundary does not appear in the encapsulated data. In addition, it states the following:

NOTE: Because boundary delimiters must not appear in the body parts being encapsulated, a user agent must exercise care to choose a unique boundary parameter value. The boundary parameter value in the example above could have been the result of an algorithm designed to produce boundary delimiters with a very low probability of already existing in the data to be encapsulated without having to prescan the data.

I interpret this to mean that, to be certain the boundary value does not appear in the encapsulated data, you would need to scan the data for it. Since that is an unacceptably expensive operation in most cases, user agents are expected to simply pick a value that has a very low chance of appearing in the data.

Consider the probability that the boundary in your example occurs in a random string of bytes (which, for the sake of argument, we assume a JPEG image to be). The full string that would have to match to terminate your image data prematurely is "\r\n--AaB03x": 10 bytes, or 80 bits. Starting at any given bit, the probability that the next 10 bytes are exactly that sequence is 1 in 2^80. There are 2^23 bits in a 1 MB JPEG file. This means that the probability of a JPEG file containing the sequence is less than 2^23/2^80, or 1 in 2^57 (more than a hundred quadrillion).
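Written out as a bound, the estimate looks like this. It is only a sketch under the same simplifying assumption that the file's bytes are uniformly random and independent, which a real JPEG only approximates:

```latex
\[
P(\text{delimiter appears})
  \;\le\; \underbrace{2^{23}}_{\text{start positions (bits)}}
  \cdot \underbrace{2^{-80}}_{\text{match at one position}}
  \;=\; 2^{-57} \;\approx\; 6.9 \times 10^{-18}
\]
% Counting only byte-aligned start positions (2^{20} bytes in 1 MB) tightens the bound:
\[
P(\text{delimiter appears}) \;\le\; 2^{20} \cdot 2^{-80} \;=\; 2^{-60} \;\approx\; 8.7 \times 10^{-19}
\]
```

Either way the bound grows only linearly with file size, while each extra byte in the boundary shrinks it by a factor of 256.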

So I think the answer is that, to be 100% sure, you would need to check the data for the boundary sequence and use a different boundary if it occurs in the data. In practice, though, the chance of the boundary sequence appearing is small enough that the check is not worth it.
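For anyone who does want the 100% guarantee, the check-and-retry idea can be sketched roughly as below. This is an illustration rather than a definitive implementation: PayloadContainsDelimiter and PickBoundary are hypothetical helper names, the "Boundary-" prefix is arbitrary, and the scan looks for "--" plus the token (slightly stricter than "\r\n--" plus the token, so it also covers data that begins with the delimiter).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if "--<boundary>" occurs anywhere in the payload.
// This is the linear scan over the data that the RFC note lets you skip.
static BOOL PayloadContainsDelimiter(NSData *payload, NSString *boundary) {
    NSData *needle = [[@"--" stringByAppendingString:boundary]
                         dataUsingEncoding:NSASCIIStringEncoding];
    NSRange hit = [payload rangeOfData:needle
                               options:0
                                 range:NSMakeRange(0, payload.length)];
    return hit.location != NSNotFound;
}

// Hypothetical helper: draws random boundary tokens until one is absent
// from the payload. A UUID string is 36 ASCII characters (hex digits and
// hyphens), so the token stays well under the 70-character boundary limit.
static NSString *PickBoundary(NSData *payload) {
    NSString *candidate;
    do {
        candidate = [@"Boundary-" stringByAppendingString:[[NSUUID UUID] UUIDString]];
    } while (PayloadContainsDelimiter(payload, candidate));
    return candidate;
}

int main(void) {
    @autoreleasepool {
        // Sample payload that happens to contain the fixed delimiter from the question.
        NSData *payload = [@"abc\r\n--AaB03x\r\ndef" dataUsingEncoding:NSUTF8StringEncoding];
        NSLog(@"fixed boundary collides: %d", PayloadContainsDelimiter(payload, @"AaB03x"));
        NSLog(@"chosen boundary: %@", PickBoundary(payload));
    }
    return 0;
}
```

With a random token like this, the loop will almost always run exactly once, which is why simply generating a random boundary and skipping the scan is the usual compromise.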


Source: https://habr.com/ru/post/1336367/

