2D array to represent a huge python dict, COOrdinate as a solution to save memory

Question

I am trying to update a dictionary with tuple keys, dict_with_tuples_key, using data from an array:

import numpy as np
from numba import njit

myarray = np.array([[0, 0],  # 0, 1
                    [0, 1],
                    [1, 1],  # 1, 2
                    [1, 2],  # 1, 3
                    [2, 2],
                    [1, 3]]
) # a lot of this with shape~(10e6, 2)

dict_with_tuples_key = {(0, 1): 1,
                        (3, 7): 1} # ~10e6 keys

Using an array to store dict values (thanks to @MSeifert), we get the following:

def convert_dict_to_darray(dict_with_tuples_key, myarray):
    idx_max_array = np.max(myarray, axis=0)
    idx_max_dict  = np.max(list(dict_with_tuples_key.keys()), axis=0)  # list() is needed on Python 3
    lens = np.max([list(idx_max_array), list(idx_max_dict)], axis=0)
    xlen, ylen = lens[0] + 1, lens[1] + 1
    darray = np.zeros((xlen, ylen)) # Empty array to hold all indexes in myarray
    for key, value in dict_with_tuples_key.items():
        darray[key] = value
    return darray

@njit
def update_darray(darray, myarray):
    elements = myarray.shape[0]
    for i in range(elements):
        darray[myarray[i, 0], myarray[i, 1]] += 1
    return darray

def darray_to_dict(darray):
    updated_dict = {}
    keys = zip(*map(list, np.nonzero(darray)))
    for x, y in keys:
        updated_dict[(x, y)] = darray[x, y]
    return updated_dict

darray = convert_dict_to_darray(dict_with_tuples_key, myarray)
darray = update_darray(darray, myarray)

I get the expected result:

# print darray_to_dict(darray)
# {(0, 1): 2.0,
#  (0, 0): 1.0,
#  (1, 1): 1.0,
#  (2, 2): 1.0,
#  (1, 2): 1.0,
#  (1, 3): 1.0,
#  (3, 7): 1.0, }

For a small matrix this works well, and @njit makes the update very fast, but ...

creating a huge empty darray = np.zeros((xlen, ylen)) does not fit in memory. How can we avoid allocating a very sparse dense array and store only the nonzero values, as a sparse matrix in COOrdinate format does?

python numpy sparse-matrix numba

user3313834 Feb 11 '16 at 13:18

1 answer

innoSPG · Answer 1 · 2016-02-11T14:34:29+0000

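There is dok_matrix in scipy; a dok_matrix is a Dictionary Of Keys based sparse matrix. It stores only the entries you actually set, so it does not allocate the huge empty darray = np.zeros((xlen, ylen)) that does not fit in memory.

The only changes needed are to import the right module from scipy and to change the definition of darray in convert_dict_to_darray.

It would look like this: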

from scipy.sparse import dok_matrix

def convert_dict_to_darray(dict_with_tuples_key, myarray):
    idx_max_array = np.max(myarray, axis=0)
    idx_max_dict  = np.max(list(dict_with_tuples_key.keys()), axis=0)  # list() is needed on Python 3
    lens = np.max([list(idx_max_array), list(idx_max_dict)], axis=0)
    xlen, ylen = lens[0] + 1, lens[1] + 1
    darray = dok_matrix( (xlen, ylen) )
    for key, value in dict_with_tuples_key.items():
        darray[key[0], key[1]] = value
    return darray
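
Since the question asks about the COOrdinate format specifically, here is a minimal sketch of that route, offered as one possible alternative rather than as part of the answer above. It relies on the fact that scipy's coo_matrix accepts repeated (row, col) pairs and that sum_duplicates() adds their values together, so the per-element update loop is not needed. As far as I know, scipy sparse matrices cannot be passed to an @njit function, which is another reason to avoid the loop here. The function name update_dict_sparse is hypothetical, and the sketch assumes non-negative integer indices and at least one entry in the dict or the array.

```python
import numpy as np
from scipy.sparse import coo_matrix


def update_dict_sparse(dict_with_tuples_key, myarray):
    """Return a new dict with 1 added for every (row, col) pair in myarray.

    Hypothetical helper: only the nonzero entries are ever stored.
    """
    # Existing dict entries as a (n, 2) index array plus a value array.
    dict_keys = np.array(list(dict_with_tuples_key.keys()),
                         dtype=np.int64).reshape(-1, 2)
    dict_vals = np.array(list(dict_with_tuples_key.values()),
                         dtype=np.float64)

    # Stack dict entries and array rows; each array row contributes 1.0.
    rows = np.concatenate([dict_keys[:, 0], myarray[:, 0]])
    cols = np.concatenate([dict_keys[:, 1], myarray[:, 1]])
    vals = np.concatenate([dict_vals, np.ones(len(myarray))])

    shape = (int(rows.max()) + 1, int(cols.max()) + 1)
    coo = coo_matrix((vals, (rows, cols)), shape=shape)

    # In place: merge repeated (row, col) entries by summing their values.
    coo.sum_duplicates()

    return {(int(r), int(c)): float(v)
            for r, c, v in zip(coo.row, coo.col, coo.data)}


myarray = np.array([[0, 0], [0, 1], [1, 1], [1, 2], [2, 2], [1, 3]])
dict_with_tuples_key = {(0, 1): 1, (3, 7): 1}
updated = update_dict_sparse(dict_with_tuples_key, myarray)
```

For the small example from the question, updated holds the same seven entries as the dense version. Memory use grows with the number of dict keys plus array rows, not with xlen * ylen.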