Bit vector and boolean value list

Question

Bit vector and boolean value list

I am trying to reproduce in Python two examples (originally written in Java) that I found in a book.

Two functions check to see if a string contains duplicate characters. The first function uses an integer ( checker ) as a bit vector, and the second function simply uses a list of logic elements. I expected better performance using the function with bits, but in fact it works worse.

Why? Did I write something wrong during the "translation" from Java to Python?

Note: for simplicity we use only lowercase letters ( a to z ), especially for the bit vector function.

 import sys import timeit def is_unique_chars_bit(my_str): checker = 0 for char in my_str: val = ord(char) - ord('a') if ((checker & (1 << val)) > 0): return False checker |= (1 << val) return True def is_unique_chars_list(my_str): if len(my_str) > 128: # Supposing we use ASCII, which only has 128 chars return False char_list = [False] * 128 for char in my_str: val = ord(char) if char_list[val]: return False char_list[val] = True return True if __name__ == '__main__': alphabet = "abcdefghijklmnopqrstuvwxyz" t_bit = timeit.Timer("is_unique_chars_bit('"+ alphabet +"')", "from __main__ import is_unique_chars_bit") t_list = timeit.Timer("is_unique_chars_list('"+ alphabet +"')", "from __main__ import is_unique_chars_list") print(t_bit.repeat(3, 200000)) print(t_list.repeat(3, 200000))

Results:

 [1.732477278999795, 1.7263494359995093, 1.7404333820004467] [0.6785205180003686, 0.6759967380003218, 0.675434408000001]

The original Java features are as follows:

 boolean isUniqueCharsBoolArray(String str) { if (str.length() > 128) return false; boolean[] char_set = new boolean[128]; for (int i = 0; i < str.length(); i++) { int val = str.charAt(i); if (char_set[val]) { return false; } char_set[val] = true; } return true; } boolean isUniqueCharsBits(String str) { for (int i = 0; i < str.length(); i++) { int val = str.charAt(i) -'a'; if ((checker & (1 << val)) > 0) { return false; } checker |= (1 << val); } return true; }

+5

performance python arraylist bit-manipulation

Kurt bourbaki Jan 7 '16 at 10:35

source share

1 answer

Tamas hegedus · Accepted Answer · 2016-01-07T12:29:27+0000

This is because integers are immutable reference classes in python. This means that integer operations are generally slow. (This is true even for python2 ints). Look at the following line:

 checker |= (1 << val)

If we expand the task, we get:

 checker = checker | (1 << val)

On this separate line, two new integers are allocated. One for 1 << val and one for bitwise or.

On the other hand, you don’t need to select objects to assign an array element, and the reason is faster.

If you are looking for the fastest way to determine if a string has duplicate characters, this function is shorter and faster than the previous two (taken from "check for duplicates in list" ):

 def is_unique_chars_set(my_str): return len(my_str) != len(set(my_str))

Timeit shows 3x acceleration (the last line is new):

 >python3 test.py [2.398782472571393, 2.3595238689519626, 2.37358552995787] [1.0055455609592512, 1.007462347465074, 1.012826469701654] [0.32564058749026437, 0.3303359144351621, 0.31772896318809885]

Note. The results can vary greatly if you use another python runtime such as IronPython

Bit vector and boolean value list

More articles: