Regular expression to remove signal noise peaks

I deal with radio signals that sometimes have noise peaks.
The input looks something like this: 00000001111100011110001111100001110000001000001111000000111001111000

Before parsing the data in the signal, I need to remove the noise peaks (spikes): runs of 0s or 1s that are shorter than a minimum length (3 in this example).

So basically, I need to match the runs shown in parentheses: 0000000111110001111000111110000111000000(1)000001111000000111(00)1111000
After the match, I replace each spike with the bit that precedes it, so a clean signal looks like this: 00000001111100011110001111100001110000000000001111000000111111111000

So far I have achieved this with two different regexes:

self.re_one_spikes = re.compile("(?:[^1])(?P<spike>1{1,%d})(?=[^1])" % (self._SHORTEST_BIT_LEN - 1))
self.re_zero_spikes = re.compile("(?:[^0])(?P<spike>0{1,%d})(?=[^0])" % (self._SHORTEST_BIT_LEN - 1))

Then I iterate over the matches and replace each one.
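The question does not show the replacement loop itself, so the following is only a guess at what the two-pass approach might look like. `SHORTEST_BIT_LEN`, `replace_spikes` and `clean` are hypothetical names standing in for the class attribute and methods that are not shown.

```python
import re

SHORTEST_BIT_LEN = 3  # assumed stand-in for self._SHORTEST_BIT_LEN

# The two patterns from the question, compiled at module level.
re_one_spikes = re.compile("(?:[^1])(?P<spike>1{1,%d})(?=[^1])" % (SHORTEST_BIT_LEN - 1))
re_zero_spikes = re.compile("(?:[^0])(?P<spike>0{1,%d})(?=[^0])" % (SHORTEST_BIT_LEN - 1))


def replace_spikes(pattern, fill, signal):
    """Overwrite every 'spike' group matched by pattern with the fill bit."""
    bits = list(signal)
    for m in pattern.finditer(signal):
        start, end = m.span("spike")
        # Same-length slice assignment keeps the signal length unchanged.
        bits[start:end] = fill * (end - start)
    return "".join(bits)


def clean(signal):
    """Two passes: remove short runs of 1s first, then short runs of 0s."""
    signal = replace_spikes(re_one_spikes, "0", signal)
    return replace_spikes(re_zero_spikes, "1", signal)
```

On the sample input, `clean(...)` should overwrite the lone `1` with `0` and the `00` pair with `11`. The order of the two passes can matter for other inputs, because the second pass sees the output of the first.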

How can I do this with a single regex? Can I use regex to replace matches of different sizes?
I tried something like this without success:

re.compile("(?![\1])([01]{1,2})(?![\1])")
import re
THRESHOLD=3

def fixer(match):
    ones = match.group(0)
    if len(ones) < THRESHOLD: return "0"*len(ones)
    return ones

my_string = '00000001111100011110001111100001110000001000001111000000111001111000'
print(re.sub("(1+)",fixer,my_string))

To also remove spikes of zeros, match runs of either bit and flip the short ones:

def fixer(match):
    items = match.group(0)
    if len(items) < THRESHOLD: return "10"[int(items[0])]*len(items)
    return items

print(re.sub("(1+)|(0+)",fixer,my_string))

To match a spike of either bit, [01], with a single pattern:

(?<=([01]))(?:(?!\1)[01]){1,2}(?=\1)

 (?<=                 # Lookbehind for 0 or 1
      ( [01] )             # (1), Capture behind 0 or 1
 )
 (?:                  # Match spike, one to %d times in length
      (?! \1 )             # Cannot be the 0 or 1 from lookbehind
      [01] 
 ){1,2}
 (?= \1 )             # Lookahead, can only be 0 or 1 from capture (1)

Replace each match with $1 repeated for the length of the match (i.e. the length of group 0).

 **  Grp 0 -  ( pos 40 , len 1 ) 
1  
 **  Grp 1 -  ( pos 39 , len 1 ) 
0  

----------------------------------------

 **  Grp 0 -  ( pos 59 , len 2 ) 
00  
 **  Grp 1 -  ( pos 58 , len 1 ) 
1  
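Python's `re.sub` replacement strings cannot repeat a group a variable number of times, so one way to apply this pattern in Python is to pass a callable as the replacement. This is a minimal sketch; `remove_spikes` and `min_len` are hypothetical names, and the `{1,2}` quantifier is generalised to `{1,min_len - 1}` as an assumption.

```python
import re


def remove_spikes(signal, min_len=3):
    """Replace runs shorter than min_len with the bit that precedes them."""
    if min_len < 2:
        return signal  # no run can be shorter than one bit
    pattern = re.compile(
        r"(?<=([01]))(?:(?!\1)[01]){1,%d}(?=\1)" % (min_len - 1)
    )
    # group(1) is the bit captured by the lookbehind;
    # group(0) is the spike itself.
    return pattern.sub(lambda m: m.group(1) * len(m.group(0)), signal)


signal = "00000001111100011110001111100001110000001000001111000000111001111000"
print(remove_spikes(signal))
```

Because the lookbehind and lookahead both require a neighbouring bit, a short run at the very start or end of the signal is left untouched.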

Regex1:   (?<=([01]))(?:(?!\1)[01]){1,2}(?=\1)
Options:  < none >
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   2
Elapsed Time:    2.06 s,   2058.02 ms,   2058018 µs


50,000 iterations * 2 matches/iteration = 100,000 matches 

100,000 matches / 2 sec  =  50,000 matches per second
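The figures above appear to come from a regex testing tool rather than from Python. For a rough comparison with Python's `re` module, a timing sketch along these lines could be used; `count_matches` and `benchmark` are hypothetical helpers, and the numbers will differ by machine and regex engine.

```python
import re
import timeit

SIGNAL = "00000001111100011110001111100001110000001000001111000000111001111000"
PATTERN = re.compile(r"(?<=([01]))(?:(?!\1)[01]){1,2}(?=\1)")


def count_matches(signal=SIGNAL):
    """Run the pattern over the whole signal and count the spikes found."""
    return sum(1 for _ in PATTERN.finditer(signal))


def benchmark(iterations=50_000):
    """Return (elapsed seconds, matches per second) for the given number of scans."""
    seconds = timeit.timeit(count_matches, number=iterations)
    total_matches = count_matches() * iterations
    return seconds, total_matches / seconds
```

Calling `benchmark()` returns the elapsed time for 50,000 scans of the sample signal together with the resulting match rate.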

An alternative approach that uses str.replace() instead of a regex (in case someone finds it useful in the future). Note that it only removes spikes of 1s surrounded by 0s:

>>> my_signal = '00000001111100011110001111100001110000001000001111000000111001111000'
>>> my_threshold = 3
>>> for n in range(1, my_threshold):  # spike lengths 1 .. my_threshold - 1
...     my_signal = my_signal.replace('0{}0'.format('1'*n), '0{}0'.format('0'*n))
... 
>>> my_signal
'00000001111100011110001111100001110000000000001111000000111001111000'
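import re

def fix_noise(s, noise_thold=3):
    # A spike is a run of 1..noise_thold-1 identical bits enclosed by the opposite bit.
    pattern=re.compile(r'(?P<before>1|0)(?P<noise>(?<=0)1{1,%d}(?=0)|(?<=1)0{1,%d}(?=1))' % (noise_thold-1, noise_thold-1))
    result = s
    for noise_match in pattern.finditer(s):
        # Keep everything up to and including the bit before the spike.
        beginning = result[:noise_match.start()+1]
        end = result[noise_match.end():]
        # Overwrite the spike with the preceding bit; the length stays the same.
        replaced = noise_match.group('before')*len(noise_match.group('noise'))
        result = beginning + replaced + end
    return result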

Jordan's indexing trick, int(items[0]), is awesome!


Source: https://habr.com/ru/post/1654742/

