In a hash collision, how does CPython know what value is stored in the HASHVALUE index, and what value is stored in RESOLUTIONINDEX

Question

Suppose I have a dict, for example {key1: value1, key2: value2, ..., key17: value17}, and two keys give the same hash: say key13 and key5 both hash to 12. As I understand it, Python uses a collision-resolution method (open addressing, if I'm not mistaken) to handle this. So let's say value5 is stored at index 12, and value13 is stored at some other free index determined by the collision-resolution method.

Here is the part that confuses me: to get a value (for example, the one for key5), does the CPython interpreter hash the key and read the value at the HASHVALUE index? That can't be right, because then how would the interpreter know whether the value it finds there belongs to key5, or whether key5's value was placed at a different index because of a collision?

I tried looking at the C code at https://github.com/python/cpython/blob/master/Objects/dictobject.c#L1041

and the function looks like this:

PyObject *
PyDict_GetItem(PyObject *op, PyObject *key)
{
    Py_hash_t hash;
    PyDictObject *mp = (PyDictObject *)op;
    PyDictKeyEntry *ep;
    PyThreadState *tstate;
    PyObject **value_addr;

    if (!PyDict_Check(op))
        return NULL;
    if (!PyUnicode_CheckExact(key) ||
        (hash = ((PyASCIIObject *) key)->hash) == -1)
    {
        hash = PyObject_Hash(key);
        if (hash == -1) {
            PyErr_Clear();
            return NULL;
        }
    }

    /* We can arrive here with a NULL tstate during initialization: try
       running "python -Wi" for an example related to string interning.
       Let just hope that no exception occurs then...  This must be
       _PyThreadState_Current and not PyThreadState_GET() because in debug
       mode, the latter complains if tstate is NULL. */
    tstate = (PyThreadState*)_Py_atomic_load_relaxed(
        &_PyThreadState_Current);
    if (tstate != NULL && tstate->curexc_type != NULL) {
        /* preserve the existing exception */
        PyObject *err_type, *err_value, *err_tb;
        PyErr_Fetch(&err_type, &err_value, &err_tb);
        ep = (mp->ma_keys->dk_lookup)(mp, key, hash, &value_addr);
        /* ignore errors */
        PyErr_Restore(err_type, err_value, err_tb);
        if (ep == NULL)
            return NULL;
    }
    else {
        ep = (mp->ma_keys->dk_lookup)(mp, key, hash, &value_addr);
        if (ep == NULL) {
            PyErr_Clear();
            return NULL;
        }
    }
    return *value_addr;
}

but my knowledge of C is very limited, and frankly I don't understand half of what this says.

python dictionary collision hash

Ron D. Oct 4 '15 at 10:27

Raymond Hettinger · Accepted Answer · 2017-08-10T08:38:26+0000

Keys are stored with associated values

In CPython's implementation, each hash table entry is a structure that holds the hash, the key, and the value:

typedef struct {
    Py_ssize_t me_hash;
    PyObject *me_key;
    PyObject *me_value;
} PyDictEntry;

So {hash5, key5, value5} is stored in slot 12, and {hash13, key13, value13} is stored in whichever other slot the collision-resolution scheme picks.

When you look up key5, Python goes to slot 12, sees that the stored key matches key5, and returns value5.

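Conversely, when you look up key13, Python also starts at slot 12, sees that key13 != key5, and so does not return value5. It keeps following the same collision-resolution sequence until it reaches the slot holding key13, and returns value13.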

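So, to the question "how does CPython know what value is stored in the HASHVALUE index and what value is stored in RESOLUTIONINDEX": it doesn't need to know in advance. Because every entry stores its key next to its value, a lookup compares the stored key at each probed slot until it finds a match (or reaches an empty slot, meaning the key is absent).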
