I am trying to reproduce in Python two examples (originally written in Java) that I found in a book.
Two functions check to see if a string contains duplicate characters. The first function uses an integer ( checker ) as a bit vector, and the second function simply uses a list of logic elements. I expected better performance using the function with bits, but in fact it works worse.
Why? Did I write something wrong during the "translation" from Java to Python?
Note: for simplicity we use only lowercase letters ( a to z ), especially for the bit vector function.
import sys import timeit def is_unique_chars_bit(my_str): checker = 0 for char in my_str: val = ord(char) - ord('a') if ((checker & (1 << val)) > 0): return False checker |= (1 << val) return True def is_unique_chars_list(my_str): if len(my_str) > 128: # Supposing we use ASCII, which only has 128 chars return False char_list = [False] * 128 for char in my_str: val = ord(char) if char_list[val]: return False char_list[val] = True return True if __name__ == '__main__': alphabet = "abcdefghijklmnopqrstuvwxyz" t_bit = timeit.Timer("is_unique_chars_bit('"+ alphabet +"')", "from __main__ import is_unique_chars_bit") t_list = timeit.Timer("is_unique_chars_list('"+ alphabet +"')", "from __main__ import is_unique_chars_list") print(t_bit.repeat(3, 200000)) print(t_list.repeat(3, 200000))
Results:
[1.732477278999795, 1.7263494359995093, 1.7404333820004467] [0.6785205180003686, 0.6759967380003218, 0.675434408000001]
The original Java features are as follows:
boolean isUniqueCharsBoolArray(String str) { if (str.length() > 128) return false; boolean[] char_set = new boolean[128]; for (int i = 0; i < str.length(); i++) { int val = str.charAt(i); if (char_set[val]) { return false; } char_set[val] = true; } return true; } boolean isUniqueCharsBits(String str) { for (int i = 0; i < str.length(); i++) { int val = str.charAt(i) -'a'; if ((checker & (1 << val)) > 0) { return false; } checker |= (1 << val); } return true; }
source share