Because of the comments, I thought about this some more and experimented. That helped my understanding of escaping a lot, so I rewrote my answer almost completely in the hope that it is useful to later readers.
NullUserException gave you only the short version; I will try to explain it in a bit more detail. Thanks to the critical reviews by Qtax and Duncan, this answer is now, I hope, correct and useful.
A backslash has a special meaning: it is the escape character in strings. That is, a backslash and the character after it form an escape sequence that is translated into something else when something is done with the string. This "something done" already happens when the string is created. Therefore, if you want to use \ literally, you need to escape it. The escape character for this is again a backslash, so you write \\.
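As a small illustration of that last point, here is a minimal sketch (the variable names `single` and `path` are only for this example) showing that two backslashes in the source code end up as one backslash character in the string:

```python
# "\\" in source code is the escape sequence for ONE literal backslash.
single = "\\"
path = "C:\\temp"

print(len(single))                      # 1
print(path)                             # C:\temp
print([hex(ord(c)) for c in single])    # ['0x5c']
```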
So, let's start with a few examples to better understand what is going on. I additionally print the ASCII codes of the characters in the string, which I hope makes it clearer what is happening.
    s = "A\1\nB"
    print(s)
    print([x for x in s])            # the individual characters
    print([hex(ord(x)) for x in s])  # their ASCII codes
prints
    A
    B
    ['A', '\x01', '\n', 'B']
    ['0x41', '0x1', '0xa', '0x42']
So although I typed \ and 1 in the code, s does not contain these two characters. It contains the ASCII character 0x01, which is "Start of Heading". The same goes for \n: it is translated to the character 0x0a (line feed).
Since this behavior is not always wanted, raw strings can be used, in which escape sequences are ignored.
    s = r"A\1\nB"
    print(s)
    print([x for x in s])            # the individual characters
    print([hex(ord(x)) for x in s])  # their ASCII codes
I just added r before the string, and now the result is
    A\1\nB
    ['A', '\\', '1', '\\', 'n', 'B']
    ['0x41', '0x5c', '0x31', '0x5c', '0x6e', '0x42']
All characters are printed exactly as I typed them.
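To put the two variants side by side, here is a minimal sketch (the names `normal` and `raw` are only for this example) that compares the same literal with and without the r prefix:

```python
normal = "A\1\nB"    # \1 and \n are translated when the string is created
raw = r"A\1\nB"      # raw string: the backslashes stay in the string

print(len(normal))          # 4
print(len(raw))             # 6
print(normal == raw)        # False
print(raw == "A\\1\\nB")    # True: same content, written with escaped backslashes
```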
That is the situation we have so far. Now for the next step.
There may be a situation where a string has to be passed to a regular expression and must be matched literally, so every character that has a special meaning in a regular expression (for example + * $ [ .) has to be escaped. There is a special function, re.escape, that does this job.
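For completeness, a minimal sketch of what re.escape is meant for. The strings `needle` and `haystack` are made up for this example. I do not print the escaped pattern itself, because exactly which characters get a backslash differs between Python versions:

```python
import re

needle = "1+1=2 (approx.)"
haystack = "We all know that 1+1=2 (approx.) holds."

# Unescaped, '+', '(', ')' and '.' act as regex operators,
# so the pattern does not match the text literally.
unescaped = re.search(needle, haystack)

# Escaped, every special character is matched as itself.
escaped = re.search(re.escape(needle), haystack)

print(unescaped)           # None
print(escaped.group(0))    # 1+1=2 (approx.)
```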
But for this question it is the wrong function, because the string is not to be used inside the regular expression but as the replacement string for re.sub.
So, a new situation:
A raw string containing escape sequences is to be used as the replacement string for re.sub. re.sub also processes escape sequences, but with a small yet important difference from the processing before: \n is still translated to the character 0x0a, but the meaning of \1 has changed! It is now replaced by the content of capture group 1 of the regular expression in re.sub.
    import re

    s = r"A\1\nB"
    # s is the replacement string, "1 Replace 2" is the text being searched
    print(re.sub(r"(Replace)", s, "1 Replace 2"))
And the result
    1 AReplace
    B 2
\1 has been replaced by the content of the capture group, and \n by the line feed character.
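To make the two processing layers visible, here is a minimal sketch (the names `cooked` and `raw` are only for this example) that passes the same characters to re.sub once as a normal literal and once as a raw literal:

```python
import re

text = "1 Replace 2"

# Normal literal: Python itself turns \1 into chr(1) when the string
# is created, so re.sub never sees a backslash.
cooked = re.sub(r"(Replace)", "A\1B", text)

# Raw literal: re.sub receives a backslash followed by '1' and
# interprets it as a reference to capture group 1.
raw = re.sub(r"(Replace)", r"A\1B", text)

print(repr(cooked))    # '1 A\x01B 2'
print(repr(raw))       # '1 AReplaceB 2'
```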
The important point is that you have to understand this behavior. Now you have, in my opinion, two possibilities (and I will not judge which one is correct):
The creator is not sure about the behavior of the string, and if he enters \n, he wants a newline. In this case, use the following to escape only a \ that is followed by a digit.
    # Double only those backslashes that are followed by a digit
    OnlyDigits = re.sub(r"(Replace)", re.sub(r"(\\)(?=\d)", r"\\\\", s), "1 Replace 2")
    print(OnlyDigits)
    print([x for x in OnlyDigits])
    print([hex(ord(x)) for x in OnlyDigits])
Output:
    1 A\1
    B 2
    ['1', ' ', 'A', '\\', '1', '\n', 'B', ' ', '2']
    ['0x31', '0x20', '0x41', '0x5c', '0x31', '0xa', '0x42', '0x20', '0x32']
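The same idea can be wrapped in a small function. This is only a sketch: `escape_group_refs` is a hypothetical helper name, and it handles only the numeric form \1, \2 and so on, as in the code above. It shows that the digit sequences stay literal while another escape sequence such as \t is still processed by re.sub:

```python
import re

def escape_group_refs(repl):
    """Hypothetical helper: double each backslash that precedes a digit."""
    return re.sub(r"\\(?=\d)", r"\\\\", repl)

# \1 and \2 would be read as references to groups 1 and 2;
# after escaping they are inserted literally, while \t still becomes a tab.
result = re.sub(r"(Replace)", escape_group_refs(r"<\1\t\2>"), "1 Replace 2")

print(repr(result))    # '1 <\\1\t\\2> 2'
```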
The creator knows exactly what he is doing, and if he had wanted a newline, he would have typed \0xa. In this case, escape all backslashes.
    # Double every backslash so that nothing in s is interpreted
    All = re.sub(r"(Replace)", re.sub(r"(\\)", r"\\\\", s), "1 Replace 2")
    print(All)
    print([x for x in All])
    print([hex(ord(x)) for x in All])
Output:
    1 A\1\nB 2
    ['1', ' ', 'A', '\\', '1', '\\', 'n', 'B', ' ', '2']
    ['0x31', '0x20', '0x41', '0x5c', '0x31', '0x5c', '0x6e', '0x42', '0x20', '0x32']
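For this "escape everything" case there are two alternatives that are not part of the code above, sketched here with made-up variable names: doubling the backslashes with plain str.replace, or passing a function as the replacement. When the replacement is a function, re.sub inserts its return value as it is, without processing escape sequences.

```python
import re

s = r"A\1\nB"
text = "1 Replace 2"

# Alternative 1: double every backslash with str.replace (no regex needed).
escaped = s.replace("\\", "\\\\")
via_escape = re.sub(r"(Replace)", escaped, text)

# Alternative 2: pass a function; its return value is inserted unchanged.
via_function = re.sub(r"(Replace)", lambda match: s, text)

print(via_escape)      # 1 A\1\nB 2
print(via_function)    # 1 A\1\nB 2
```

Both variants give the same result as the nested re.sub call; which one reads better is a matter of taste.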