Parsing Rules
The parsing rules are documented here: https://msdn.microsoft.com/en-us/library/17w5ykft.aspx
Microsoft C/C++ startup code uses the following rules when interpreting arguments given on the operating system command line:
- Arguments are delimited by white space, which is either a space or a tab.
- The caret character (^) is not recognized as an escape character or delimiter. The character is handled completely by the command-line parser in the operating system before being passed to the argv array in the program.
- A string surrounded by double quotation marks ("string") is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument.
- A double quotation mark preceded by a backslash (\") is interpreted as a literal double quotation mark character (").
- Backslashes are interpreted literally, unless they immediately precede a double quotation mark.
- If an even number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is interpreted as a string delimiter.
- If an odd number of backslashes is followed by a double quotation mark, one backslash is placed in the argv array for every pair of backslashes, and the double quotation mark is "escaped" by the remaining backslash, causing a literal double quotation mark (") to be placed in argv.
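As a rough illustration, the rules listed above can be written as a small splitter. This is only a sketch of those listed rules and not the actual startup code: the names `CommandLineSketch` and `Split` are hypothetical, and the sketch ignores the program-name argument and any parser behavior that is not listed above.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CommandLineSketch
{
    // Hypothetical helper: splits an argument string using only the rules
    // listed above. It is an illustration, not the real CRT parser.
    public static List<string> Split(string commandLine)
    {
        var args = new List<string>();
        var current = new StringBuilder();
        var inQuotes = false;
        var hasArgument = false; // true once the current argument has started
        var i = 0;

        while (i < commandLine.Length)
        {
            var c = commandLine[i];
            if (c == '\\')
            {
                // Measure the run of consecutive backslashes.
                var start = i;
                while (i < commandLine.Length && commandLine[i] == '\\')
                {
                    i++;
                }
                var count = i - start;

                if (i < commandLine.Length && commandLine[i] == '"')
                {
                    // One literal backslash for every pair.
                    current.Append('\\', count / 2);
                    if (count % 2 == 1)
                    {
                        // Odd count: the quote is escaped, so it is literal.
                        current.Append('"');
                        i++;
                    }
                    // Even count: the quote is left in place and handled as a
                    // string delimiter on the next loop iteration.
                }
                else
                {
                    // Not followed by a quote: backslashes are literal.
                    current.Append('\\', count);
                }
                hasArgument = true;
            }
            else if (c == '"')
            {
                // Unescaped quote: toggles the quoted state.
                inQuotes = !inQuotes;
                hasArgument = true;
                i++;
            }
            else if ((c == ' ' || c == '\t') && !inQuotes)
            {
                // White space outside quotes ends the current argument.
                if (hasArgument)
                {
                    args.Add(current.ToString());
                    current.Clear();
                    hasArgument = false;
                }
                i++;
            }
            else
            {
                current.Append(c);
                hasArgument = true;
                i++;
            }
        }

        if (hasArgument)
        {
            args.Add(current.ToString());
        }
        return args;
    }
}
```

For example, `CommandLineSketch.Split("\"quoted argument\" \\\"quote")` (the command line `"quoted argument" \"quote`) should yield the two arguments `quoted argument` and `"quote`.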
Application to Generation
Unfortunately, there is no proper documentation on how to escape arguments correctly, that is, how to apply the rules above so that an argument array is passed correctly to the target application. These are the rules I followed to escape each argument:
- If the argument contains a space or a tab, wrap it in " (double quotation mark) characters.
- If the argument contains a " (double quotation mark), escape it with a \ (backslash), and also escape any \ (backslash) characters that immediately precede it with a \ (backslash) each, before appending the escaped " (double quotation mark).
- If the argument ends with one or more \ (backslash) characters and contains white space, escape each trailing \ (backslash) with a \ (backslash) before appending the closing " (double quotation mark).
Code
    using System.Text;

    /// <summary>
    /// Escape a single argument for use in an argument string
    /// passed to Process.StartInfo.Arguments.
    /// </summary>
    /// <param name="argument">
    /// The argument to escape.
    /// </param>
    /// <returns>
    /// The escaped argument <see cref="string"/>.
    /// </returns>
    public static string EscapeArguments(string argument)
    {
        using (var characterEnumerator = argument.GetEnumerator())
        {
            var escapedArgument = new StringBuilder();
            var backslashCount = 0;
            var needsQuotes = false;

            while (characterEnumerator.MoveNext())
            {
                switch (characterEnumerator.Current)
                {
                    case '\\':
                        // Backslashes are simply passed through, except when they need
                        // to be escaped when followed by a \", eg the argument string
                        // \", which would be encoded to \\\"
                        backslashCount++;
                        escapedArgument.Append('\\');
                        break;

                    case '\"':
                        // Escape any preceding backslashes
                        for (var c = 0; c < backslashCount; c++)
                        {
                            escapedArgument.Append('\\');
                        }

                        // Append an escaped double quote.
                        escapedArgument.Append("\\\"");

                        // Reset the backslash counter.
                        backslashCount = 0;
                        break;

                    case ' ':
                    case '\t':
                        // White spaces are escaped by surrounding the entire string with
                        // double quotes, which should be done at the end to prevent
                        // multiple wrappings.
                        needsQuotes = true;

                        // Append the whitespace
                        escapedArgument.Append(characterEnumerator.Current);

                        // Reset the backslash counter.
                        backslashCount = 0;
                        break;

                    default:
                        // Reset the backslash counter.
                        backslashCount = 0;

                        // Append the current character
                        escapedArgument.Append(characterEnumerator.Current);
                        break;
                }
            }

            // No need to wrap in quotes
            if (!needsQuotes)
            {
                return escapedArgument.ToString();
            }

            // Prepend the "
            escapedArgument.Insert(0, '"');

            // Escape any trailing backslashes before appending the "
            for (var c = 0; c < backslashCount; c++)
            {
                escapedArgument.Append('\\');
            }

            // Append the final "
            escapedArgument.Append('\"');

            return escapedArgument.ToString();
        }
    }

    /// <summary>
    /// Convert an argument array to an argument string for use
    /// with Process.StartInfo.Arguments.
    /// </summary>
    /// <param name="args">
    /// The args to convert.
    /// </param>
    /// <returns>
    /// The argument <see cref="string"/>.
    /// </returns>
    public static string EscapeArguments(params string[] args)
    {
        var argEnumerator = args.GetEnumerator();
        var arguments = new StringBuilder();

        if (!argEnumerator.MoveNext())
        {
            return string.Empty;
        }

        arguments.Append(EscapeArguments((string)argEnumerator.Current));

        while (argEnumerator.MoveNext())
        {
            arguments.Append(' ');
            arguments.Append(EscapeArguments((string)argEnumerator.Current));
        }

        return arguments.ToString();
    }
Test Cases
Here are the test cases that I used to verify the above code (the harness is left as an exercise for the reader).
NOTE: My test took a random number of the cases below as an input argument array, encoded it into an argument string, passed the string to a new process that output its arguments as a JSON array, and verified that the input argument array matched the output JSON array.
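The round-trip idea in the note can be sketched roughly as follows. This is not the harness I used: `RoundTripSketch` and `Check` are hypothetical names, and the child process that prints JSON is replaced by a `parse` delegate that the caller supplies, so the sketch stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RoundTripSketch
{
    // Hypothetical helper: picks a random, non-empty selection of the cases,
    // escapes it with `escape`, parses the result back with `parse`, and
    // reports whether the parsed arguments match the input arguments.
    public static bool Check(
        IReadOnlyList<string> cases,
        Func<string[], string> escape,
        Func<string, IEnumerable<string>> parse,
        Random random)
    {
        // Choose how many cases to include (between 1 and all of them).
        var count = random.Next(1, cases.Count + 1);

        // Shuffle by a random key and take the first `count` cases.
        var input = cases.OrderBy(_ => random.Next()).Take(count).ToArray();

        // Encode the argument array into a single argument string.
        var commandLine = escape(input);

        // Decode it again; in the original test this step was a new process.
        var output = parse(commandLine).ToArray();

        return input.SequenceEqual(output);
    }
}
```

Here `escape` would be the `EscapeArguments(params string[] args)` method from the Code section, and `parse` would be whatever reports the arguments the target actually received, for example a wrapper that starts a child process and reads back its argument list.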
+--------------------------------------+-------------------------------------------+
| Input String                         | Escaped String                            |
+--------------------------------------+-------------------------------------------+
| quoted argument                      | "quoted argument"                         |
| "quote                               | \"quote                                   |
| "wrappedQuote"                       | \"wrappedQuote\"                          |
| "quoted wrapped quote"               | "\"quoted wrapped quote\""                |
| \backslashLiteral                    | \backslashLiteral                         |
| \\doubleBackslashLiteral             | \\doubleBackslashLiteral                  |
| trailingBackslash\                   | trailingBackslash\                        |
| doubleTrailingBackslash\\            | doubleTrailingBackslash\\                 |
| \ quoted backslash literal           | "\ quoted backslash literal"              |
| \\ quoted double backslash literal   | "\\ quoted double backslash literal"      |
| quoted trailing backslash \          | "quoted trailing backslash \\"            |
| quoted double trailing backslash \\  | "quoted double trailing backslash \\\\"   |
| \"\backslashQuoteEscaping            | "\\\"\backslashQuoteEscaping"             |
| \\"\doubleBackslashQuoteEscaping     | "\\\\\"\doubleBackslashQuoteEscaping"     |
| \\"\\doubleBackslashQuoteEscaping    | "\\\\\"\\doubleBackslashQuoteEscaping"    |
| \"\\doubleBackslashQuoteEscaping     | "\\\"\\doubleBackslashQuoteEscaping"      |
| \"\backslash quote escaping          | "\\\"\backslash quote escaping"           |
| \\"\double backslash quote escaping  | "\\\\\"\double backslash quote escaping"  |
| \\"\\double backslash quote escaping | "\\\\\"\\double backslash quote escaping" |
| \"\\double backslash quote escaping  | "\\\"\\double backslash quote escaping"   |
| TrailingQuoteEscaping"               | TrailingQuoteEscaping\"                   |
| TrailingQuoteEscaping\"              | TrailingQuoteEscaping\\\"                 |
| TrailingQuoteEscaping\"\             | TrailingQuoteEscaping\\\"\                |
| TrailingQuoteEscaping"\              | TrailingQuoteEscaping\"\                  |
| Trailing Quote Escaping"             | "Trailing Quote Escaping\""               |
| Trailing Quote Escaping\"            | "Trailing Quote Escaping\\\""             |
| Trailing Quote Escaping\"\           | "Trailing Quote Escaping\\\"\\"           |
| Trailing Quote Escaping"\            | "Trailing Quote Escaping\"\\"             |
+--------------------------------------+-------------------------------------------+
There are other answers to this question on SO. I just prefer a hand-coded state machine to regular expressions (it also runs faster). fooobar.com/questions/95551 / ... has a good explanation of how to do this.