Remove non-letter characters from the beginning and end of a string

Question


I need to delete all non-letter characters from the beginning and the end of a word, but keep them if they appear between two letters.

For instance:

 '123foo456' --> 'foo'
 '2foo1c#BAR' --> 'foo1c#BAR'

I tried using re.sub(), but I could not come up with the regex.

+5

python regex

iomartin Oct 12 '12 at 16:17


6 answers

You can use str.strip for this:

 In [1]: import string
 In [4]: '123foo456'.strip(string.digits)
 Out[4]: 'foo'
 In [5]: '2foo1c#BAR'.strip(string.digits)
 Out[5]: 'foo1c#BAR'

As Matt points out in the comments (thanks, Matt), this only removes digits. To remove any non-letter character, define what you mean by a non-letter:

 In [22]: allchars = string.maketrans('', '')
 In [23]: nonletter = allchars.translate(allchars, string.letters)

and then strip:

 In [18]: '2foo1c#BAR'.strip(nonletter)
 Out[18]: 'foo1c#BAR'
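The session above relies on Python 2 names (string.maketrans called with two strings, and string.letters) that no longer exist in Python 3. Here is a minimal Python 3 sketch of the same strip-based idea, assuming only ASCII input matters. The names NON_LETTERS and strip_non_letters are made up for illustration:

```python
import string

# Every ASCII character that is not a letter (assumption: ASCII-only input).
NON_LETTERS = ''.join(
    chr(i) for i in range(128) if chr(i) not in string.ascii_letters
)

def strip_non_letters(s):
    """Strip non-letter ASCII characters from both ends of s."""
    return s.strip(NON_LETTERS)

print(strip_non_letters('123foo456'))
print(strip_non_letters('2foo1c#BAR'))
```

Non-ASCII characters at the edges are left alone by this version, because they are not in the set passed to strip.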

+6

unutbu Oct 12 '12 at 16:25


With your two examples, I was able to create a regular expression using Python's non-greedy syntax. I broke the input into three parts: leading non-letters, the part to keep, and then non-letters up to the end. Here is a test run:

 1:[123] 2:[foo] 3:[456]
 1:[2] 2:[foo1c#BAR] 3:[]

Here's the regex:

 ^([^A-Za-z]*)(.*?)([^A-Za-z]*)$

And mo.group(2) is what you want, where mo is the match object.
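To make the three groups concrete, here is a small sketch that applies the pattern with re.match and returns all three parts. The helper name split_parts is hypothetical and not part of the answer:

```python
import re

# Group 1: leading non-letters, group 2: the part to keep (non-greedy),
# group 3: trailing non-letters.
PATTERN = re.compile(r'^([^A-Za-z]*)(.*?)([^A-Za-z]*)$')

def split_parts(s):
    """Return the (leading, kept, trailing) parts of s as a tuple."""
    mo = PATTERN.match(s)
    if mo is None:
        # '.' does not match newlines, so multi-line input may not match.
        return None
    return mo.groups()

for text in ('123foo456', '2foo1c#BAR'):
    print(split_parts(text))
```

The non-greedy `.*?` in group 2 grows only as far as needed for group 3 to reach the end of the string, which is why the trailing non-letters end up in group 3 rather than group 2.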

+2

Philip Oct 12 '12 at 16:25


For Unicode compatibility:

 ^\PL+|\PL+$

\PL means "not a letter", that is, any character without the Unicode Letter property.
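Note that Python's built-in re module does not support Unicode property classes such as \P{L}; the third-party regex module does. As a standard-library alternative, here is a sketch that swaps in a different technique: it scans inward from both ends using str.isalpha, which is Unicode-aware. The function name strip_non_letters_unicode is made up for illustration:

```python
def strip_non_letters_unicode(s):
    """Drop leading and trailing characters for which str.isalpha() is False."""
    start, end = 0, len(s)
    # Advance past non-letters at the front.
    while start < end and not s[start].isalpha():
        start += 1
    # Step back over non-letters at the end.
    while end > start and not s[end - 1].isalpha():
        end -= 1
    return s[start:end]

print(strip_non_letters_unicode('123héllo456'))
print(strip_non_letters_unicode('2foo1c#BAR'))
```

Characters between the first and last letter are never inspected, so digits and symbols in the middle are kept.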

+2

Toto Oct 12 '12 at 17:42


Try the following:

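 re.sub(r'^[^a-zA-Z]*(.*?)[^a-zA-Z]*$', r'\1', string)

The parentheses capture everything between the runs of non-letters at the beginning and end of the string. The ? makes .* non-greedy, so it does not swallow the trailing non-letters. The replacement then simply outputs the captured group. Note that the replacement must be a raw string (r'\1'); a plain '\1' is the control character \x01, not a backreference.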
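A short sketch of this substitution, wrapped in a hypothetical helper named trim, also shows what happens when the replacement is not a raw string:

```python
import re

PATTERN = r'^[^a-zA-Z]*(.*?)[^a-zA-Z]*$'

def trim(s):
    """Replace the whole string with capture group 1 (the middle part)."""
    # r'\1' is a backreference to group 1.
    return re.sub(PATTERN, r'\1', s)

print(trim('123foo456'))
print(trim('2foo1c#BAR'))

# Without the r prefix, '\1' is the single control character chr(1),
# so the whole match would be replaced by that character instead.
print(repr(re.sub(PATTERN, '\1', '123foo456')))
```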

0

Martin Ender Oct 12 '12 at 16:24


result = re.sub('(.*?)([a-z].*[a-z])(.*)', '\\2', '23WERT#3T67', flags=re.IGNORECASE)

0

Matthias Oct 12 '12 at 16:25


Kent · Accepted Answer · 2012-10-12T16:24:02+0000

like this?

 re.sub('^[^a-zA-Z]*|[^a-zA-Z]*$','',s)

s is the input string.
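As a usage sketch, the same pattern can be compiled once and wrapped in a function. The names EDGE_NON_LETTERS and trim_non_letters are assumptions for illustration, not part of the answer:

```python
import re

# Either a run of non-letters anchored at the start,
# or a run of non-letters anchored at the end.
EDGE_NON_LETTERS = re.compile(r'^[^a-zA-Z]*|[^a-zA-Z]*$')

def trim_non_letters(s):
    """Remove non-letter characters from both ends of s."""
    return EDGE_NON_LETTERS.sub('', s)

for text in ('123foo456', '2foo1c#BAR'):
    print(trim_non_letters(text))
```

Because each alternative is anchored, non-letters between two letters never match and are left in place.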
