Filter special characters such as color codes from shell output

I ran shell commands in Python, wrote their output to files, and finally displayed it on a web page. However, the color escape codes in the command output were recorded as well. Is there any way to filter out these color codes, or to display them correctly on a web page? Thank you very much!

Output Log:

" 22200K .......\u001b[0m\u001b[91m... .......... ...\u001b[0m\u001b[91m.\u001b[0m\u001b[91m...... .........\u001b[0m\u001b[91m.\u001b[0m\u001b[91m \u001b[0m\u001b[91m.\u001b[0m\u001b[91m.\u001b[0m\u001b[91m.\u001b[0m\u001b[91m.\u001b[0m\u001b[91m...... 50% 28.6K 12m55s" 

Real text:

 [INFO][88] 22250K .......... .......... .......... .......... .......... 50% 35.8K 12m53s 
+6
4 answers

In the unlikely event that you have xterm256 color codes, this will filter both the "normal" ANSI codes and the xterm256 ANSI codes:

    import re

    # a holds the captured output
    print(re.sub(r'\x1b(\[.*?[@-~]|\].*?(\x07|\x1b\\))', '', a))

or, in a less obfuscated and more readable form:

 '(' + CSI + '.*?' + CMD + '|' + OSC + '.*?' + '(' + ST + '|' + BEL + ')' + ')' 

Full code with tests:

    import re

    tests = [
        u"22200K .......\u001b[0m\u001b[91m... .......... ...\u001b[0m\u001b[91m.\u001b[0m\u001b[91m...... .........\u001b[0m\u001b[91m.\u001b[0m\u001b[91m \u001b[0m\u001b[91m.\u001b[0m\u001b[91m.\u001b[0m\u001b[91m.\u001b[0m\u001b[91m.\u001b[0m\u001b[91m...... 50% 28.6K 12m55s",
        u"=\u001b[m=",
        u"-\u001b]23\u0007-",
    ]

    def trim_ansi(a):
        ESC = r'\x1b'
        CSI = ESC + r'\['
        OSC = ESC + r'\]'
        CMD = '[@-~]'
        ST = ESC + r'\\'
        BEL = r'\x07'
        # Match a CSI sequence, or an OSC sequence terminated by ST or BEL
        pattern = '(' + CSI + '.*?' + CMD + '|' + OSC + '.*?' + '(' + ST + '|' + BEL + ')' + ')'
        return re.sub(pattern, '', a)

    for t in tests:
        print(trim_ansi(t))
+4

This should work in most cases:

    import re

    # a holds the captured output
    print(re.sub(r'\x1b\[.*?[@-~]', '', a))

Update

Escape sequences start with the ESC character (ASCII decimal 27 / hex 0x1B / octal 033). For two-character sequences, the second character is in the ASCII range 64-95 (@ to _). However, most sequences are longer than two characters and begin with the characters ESC and [ (left bracket). This pair is called the CSI, for Control Sequence Introducer (or Control Sequence Initiator). The final character of these sequences is in the ASCII range 64-126 (@ to ~). ( http://en.wikipedia.org/wiki/ANSI_escape_code )
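The structure described above can also be written into the pattern directly, instead of relying on the lazy `.*?`. This is only a sketch: the names `CSI_SEQ`, `TWO_CHAR` and `strip_escapes` are made up for illustration, and the parameter-byte (0x30-0x3F) and intermediate-byte (0x20-0x2F) ranges come from ECMA-48 rather than from the paragraph quoted above. It does not handle OSC strings (ESC ] ... BEL).

```python
import re

ESC = r'\x1b'

# CSI sequence: ESC [, optional parameter bytes (0x30-0x3F),
# optional intermediate bytes (0x20-0x2F), one final byte (@ to ~).
CSI_SEQ = ESC + r'\[[\x30-\x3f]*[\x20-\x2f]*[\x40-\x7e]'

# Two-character sequence: ESC plus one byte in @ to _, excluding [
# (which would start a CSI sequence instead).
TWO_CHAR = ESC + r'[\x40-\x5a\x5c-\x5f]'

ANSI_RE = re.compile(CSI_SEQ + '|' + TWO_CHAR)


def strip_escapes(text):
    """Return text with CSI and two-character escape sequences removed."""
    return ANSI_RE.sub('', text)


print(strip_escapes('\x1b[91mfail\x1b[0m 50%'))  # prints: fail 50%
```

Compared with `\x1b\[.*?[@-~]`, this version only accepts the bytes that are allowed between the CSI and the final character, so a truncated or malformed sequence is less likely to swallow ordinary text.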

Update2

With the following "a.py":

    import re
    import sys

    # Strip CSI escape sequences from every line read on stdin
    for line in sys.stdin:
        sys.stdout.write(re.sub(r'\x1b\[.*?[@-~]', '', line))

this works smoothly for me:

 ls --color | python a.py 
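Since the question runs the commands from Python and writes the output to files, the same filter can probably be applied before the log is written, without a separate pipe stage. A rough sketch, assuming Python 3.7+ and that the whole output fits in memory; `run_and_log` and its parameters are hypothetical names, not something from the question:

```python
import re
import subprocess

# Same CSI pattern as in a.py above
ANSI_CSI = re.compile(r'\x1b\[.*?[@-~]')


def run_and_log(cmd, log_path):
    """Run cmd, strip CSI escape sequences from its output, write it to log_path."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same log
        text=True,                 # decode the output to str
    )
    clean = ANSI_CSI.sub('', result.stdout)
    with open(log_path, 'w') as log_file:
        log_file.write(clean)
    return clean
```

For example, `run_and_log(['ls', '--color=always'], 'ls.log')` should leave a log without color codes. For long-running commands you would likely want to read and filter the output line by line instead of waiting for the process to finish.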
+2

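You can pass the output through strings to remove non-printable characters. Note that this only drops the ESC byte itself, so printable remnants of a sequence such as [91m can stay in the output, and by default strings discards runs shorter than four characters: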

 ./some-script | strings 
-1

Source: https://habr.com/ru/post/987823/

