Which objects end up in the result varies: if the sets are the same size (or b is smaller), the intersecting elements come from b; if b has more elements, the objects from a are returned:
    i = "$foobar" * 100
    j = "$foob" * 100
    l = "$foobar" * 100
    k = "$foob" * 100
    print(id(i), id(j))
    print(id(l), id(k))
    a = {i, j}
    b = {k, l, 3}
    inter = a.intersection(b)
    for ele in inter:
        print(id(ele))
Output:
    35510304 35432016
    35459968 35454928
    35510304
    35432016

Now with both sets the same size:
    i = "$foobar" * 100
    j = "$foob" * 100
    l = "$foobar" * 100
    k = "$foob" * 100
    print(id(i), id(j))
    print(id(l), id(k))
    a = {i, j}
    b = {k, l}
    inter = a.intersection(b)
    for ele in inter:
        print(id(ele))
Output:
    35910288 35859984
    35918160 35704816
    35704816
    35918160

This is the relevant part of the source. The line if (PySet_GET_SIZE(other) > PySet_GET_SIZE(so)), that is, the result of that comparison, appears to decide which object is iterated over and therefore which objects are used.
    if (PySet_GET_SIZE(other) > PySet_GET_SIZE(so)) {
        tmp = (PyObject *)so;
        so = (PySetObject *)other;
        other = tmp;
    }

    while (set_next((PySetObject *)other, &pos, &entry)) {
        key = entry->key;
        hash = entry->hash;
        rv = set_contains_entry(so, key, hash);
        if (rv < 0) {
            Py_DECREF(result);
            return NULL;
        }
        if (rv) {
            if (set_add_entry(result, key, hash)) {
                Py_DECREF(result);
                return NULL;
            }
If you pass an object that is not a set, the same is not true: the length does not matter, because the objects from the iterable are always used:
    it = PyObject_GetIter(other);
    if (it == NULL) {
        Py_DECREF(result);
        return NULL;
    }

    while ((key = PyIter_Next(it)) != NULL) {
        hash = PyObject_Hash(key);
        if (hash == -1)
            goto error;
        rv = set_contains_entry(so, key, hash);
        if (rv < 0)
            goto error;
        if (rv) {
            if (set_add_entry(result, key, hash))
                goto error;
        }
        Py_DECREF(key);
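The two C branches quoted above can be modelled roughly in Python. This is only a sketch of the logic as it reads in the excerpts, not CPython's actual code; the name model_intersection is made up for illustration. The strings are built at runtime with str.join on the assumption that this yields distinct but equal objects in CPython, since a newer interpreter may fold "$foobar" * 100 into a single shared constant when the code is run as a script.

```python
def model_intersection(so, other):
    """Rough Python model of the quoted C logic (hypothetical helper)."""
    result = set()
    if isinstance(other, (set, frozenset)):
        # Set vs. set: iterate over the smaller one, look up in the larger one.
        if len(other) > len(so):
            so, other = other, so
        for key in other:
            if key in so:
                result.add(key)  # the object from the iterated set is kept
        return result
    # Any other iterable: its size may be unknown, so simply iterate over it.
    for key in other:
        if key in so:
            result.add(key)
    return result


# Equal strings built at runtime, so they should be distinct objects.
i = "".join(["$foo", "bar"])
l = "".join(["$foo", "bar"])
a = {i}

from_bigger = model_intersection(a, {l, 3})   # b is larger
from_equal = model_intersection(a, {l})       # same size
from_list = model_intersection(a, [l, 1, 2])  # b is not a set

print(next(iter(from_bigger)) is i)
print(next(iter(from_equal)) is l)
print(next(iter(from_list)) is l)
```

Under this model, the first result should hold the object from a and the other two the object from b, which matches the ids printed in the examples.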
When you pass an iterable, firstly it could be an iterator, so you cannot check its size without consuming it; and if you passed a list, each lookup would be O(n), so it makes sense to simply iterate over the iterable that was passed in. By contrast, if you have one set of 1,000,000 elements and one of 10, it makes sense to check whether each of the 10 is in the set of 1,000,000 rather than whether any of the 1,000,000 are in your set of 10. Since a lookup should be O(1) on average, that means a linear pass over 10 elements versus a linear pass over 1,000,000 elements.

If you look at wiki.python.org/moin/TimeComplexity, this is backed up:
    Intersection s & t
    Average case: O(min(len(s), len(t)))
    Worst case: O(len(s) * len(t))
    Note: replace "min" with "max" if t is not a set
So when we pass an iterable that is not a set, we should always get the objects from b:
    i = "$foobar" * 100
    j = "$foob" * 100
    l = "$foobar" * 100
    k = "$foob" * 100
    print(id(i), id(j))
    print(id(l), id(k))
    a = {i, j}
    b = [k, l, 1, 2, 3]
    inter = a.intersection(b)
    for ele in inter:
        print(id(ele))
You get objects from b:
    20854128 20882896
    20941072 20728768
    20941072
    20728768

If you really want to control which objects you keep, do the iteration and lookup yourself, keeping whichever object you want.
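One way to do that iteration and lookup by hand is sketched below. The helper name intersection_keeping is hypothetical, and the sketch assumes all items are hashable; it always iterates over the side whose objects you want to keep, so the sizes of the two collections no longer influence which objects come back.

```python
def intersection_keeping(keep_from, other):
    """Intersect two collections, returning the objects taken from keep_from.

    Hypothetical helper: 'other' is only used for membership tests, so
    every object in the result comes from 'keep_from'.
    """
    # Turn 'other' into a set once so each lookup is O(1) on average.
    if isinstance(other, (set, frozenset)):
        other_set = other
    else:
        other_set = set(other)
    return {item for item in keep_from if item in other_set}


# Equal strings built at runtime, so they should be distinct objects.
x_a = "".join(["$foo", "bar"])
x_b = "".join(["$foo", "bar"])
a = {x_a, "only in a"}
b = {x_b, 1, 2, 3}

kept_from_a = intersection_keeping(a, b)
kept_from_b = intersection_keeping(b, a)

print(next(iter(kept_from_a)) is x_a)
print(next(iter(kept_from_b)) is x_b)
```

The trade-off is that this always costs a pass over keep_from, even when it is the larger collection, so you give up the min(len(s), len(t)) behaviour in exchange for a predictable choice of objects.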