Saving a list of 1 million key-value pairs in Python

I need to save a list of 1 million key-value pairs in Python. The key will be a string or an integer, while the value will be a list of float values. For instance:

{"key":36520193,"value":[[36520193,16.946938],[26384600,14.44005],[27261307,12.467529],[16456022,11.316026],[26045102,8.891106],[148432817,8.043456],[36670593,7.111857],[43959215,7.0957513],[50403486,6.95],[18248919,6.8106747],[27563337,6.629243],[18913178,6.573106],[42229958,5.3193846],[17075840,5.266625],[17466726,5.2223654],[47792759,4.9141016],[83647115,4.6122775],[56806472,4.568034],[16752451,4.39949],[69586805,4.3642135],[23207742,3.9822476],[33517555,3.95],[30016733,3.8994896],[38392637,3.8642135],[16165792,3.6820507],[14895431,3.5713203],[48865906,3.45],[20878230,3.45],[17651847,3.3642135],[24484188,3.1820507],[74869104,3.1820507],[15176334,3.1571069],[50255841,3.1571069],[103712319,3.1571069],[20706319,2.9571068],[33542647,2.95],[17636133,2.95],[66690914,2.95],[19812372,2.95],[21178962,2.95],[37705610,2.8642135],[20812260,2.8642135],[25887809,2.8642135],[18815472,2.8642135],[17405810,2.8642135],[46598192,2.8642135],[20592734,2.6642137],[44971871,2.5],[27610701,2.45],[92788698,2.45],[52164826,2.45],[17425930,2.2],[60194002,2.1642137],[122136476,2.0660255],[205325522,2.0],[117521212,1.9820508],[33953887,1.9820508],[22704346,1.9571068],[26176058,1.9071068],[39512661,1.9071068],[43141485,1.8660254],[16401281,1.7],[31495921,1.7],[14599628,1.7],[74596964,1.5],[55821372,1.5],[109073560,1.4142135],[91897348,1.4142135],[25756071,1.25],[25683960,1.25],[17303288,1.25],[42065448,1.25],[72148532,1.2],[19192100,1.2],[85941613,1.2],[77325396,1.2],[18266218,1.2],[114005403,1.2],[16346823,1.2],[43441850,1.2],[60660643,1.2],[41463847,1.2],[33804454,1.2],[20757729,1.2],[18271440,1.2],[51507708,1.2],[104856807,1.2],[24485743,1.2],[16075381,1.2],[68991517,1.2],[96193545,1.2],[63675003,1.2],[70735999,1.2],[25708416,1.2],[80593161,1.2],[42982108,1.2],[120368215,1.2],[24379982,1.2],[14235673,1.2],[20172395,1.2],[161441314,1.2],[37996201,1.2],[35638883,1.2],[46164502,1.2],[74047763,1.2],[19681494,1.2],[95938476,1.2],[20443787,1.2],[87258609,1.2],[34784832,1.2],[30346151,1.2],[40885
516,1.2],[197129344,1.2],[14266331,1.2],[15112466,1.2],[26867986,1.2],[82726479,1.2],[23825810,1.2],[14662121,1.2],[32707312,1.2],[17477917,1.2],[123462351,1.2],[5745462,1.2],[16544178,1.2],[23284384,1.2],[45526985,1.2],[23109303,1.2],[26046257,1.2],[53654203,1.2],[133026438,1.2],[25139051,1.2],[65077694,1.2],[17469289,1.2],[15130494,1.2],[148525895,1.2],[15176360,1.2],[44853617,1.2],[9115332,1.2],[16878570,1.2],[132421452,1.2],[6273762,1.2],[124360757,1.2],[21643452,1.2],[9890492,1.2],[16305494,1.2],[18484474,1.2],[22643607,1.2],[60753586,1.2],[9200012,1.2],[30042254,1.2],[8374622,1.2],[15894834,1.2],[18438022,1.2],[78038442,1.2],[22097386,1.2],[21018755,1.2],[20845703,1.2],[164462136,1.2],[19649167,1.2],[24746288,1.2],[27690898,1.2],[42822760,1.2],[160935289,1.2],[178814456,1.2],[53574205,1.2],[41473578,1.2],[82176632,1.2],[82918057,1.2],[102257360,1.2],[17504315,1.2],[18363508,1.2],[50735431,1.2],[80647070,1.2],[40879040,1.2],[17790497,1.2],[191364080,1.2],[14429823,1.2],[22078893,1.2],[121338184,1.2],[113341318,1.2],[48900101,1.2],[38547066,1.2],[20484157,1.2],[16228699,1.2],[21179292,1.2],[15317594,1.2],[55777010,1.2],[15318882,1.2],[182109160,1.2],[45238537,1.2],[19701986,1.2],[32484918,1.2],[18244358,1.2],[18479513,1.2],[19081775,1.2],[21117305,1.2],[19325724,1.2],[136844568,1.2],[32398651,1.2],[20482993,1.2],[14063937,1.2],[91324381,1.2],[20528275,1.2],[14803917,1.2],[16208245,1.2],[17419051,1.2],[31187903,1.2],[54043787,1.2],[167737676,1.2],[24431712,1.2],[24707301,1.2],[24420092,1.2],[15469536,1.2],[26322385,1.2],[77330594,1.2],[82925252,1.2],[28185335,1.0],[24510384,1.0],[24407244,1.0],[41229669,1.0],[16305330,1.0],[26246555,1.0],[28183026,1.0],[49880016,1.0],[104621640,1.0],[36880083,1.0],[19705747,1.0],[22830942,1.0],[21440766,1.0],[54639609,1.0],[49077908,1.0],[29588859,1.0],[23523447,1.0],[20803216,1.0],[20221159,1.0],[1416611,1.0],[3744541,1.0],[21271656,1.0],[68956490,1.0],[96851347,1.0],[39479083,1.0],[27778893,1.0],[18785448,1.0],[39010580,1.0],[6
5796371,1.0],[124631720,1.0],[27039286,1.0],[18208354,1.0],[51080209,1.0],[37388787,1.0],[18462037,1.0],[31335156,1.0],[21346320,1.0],[23911410,1.0],[73134924,1.0],[807095,1.0],[44465330,1.0],[16732482,1.0],[37344334,1.0],[734753,1.0],[23006794,1.0],[33549858,1.0],[102693093,1.0],[51219631,1.0],[20695699,1.0],[4081171,1.0],[27268078,1.0],[80116664,1.0],[32959253,1.0],[85772748,1.0],[27109019,1.0],[28706024,1.0],[59701568,1.0],[23559586,1.0],[15693493,1.0],[56908710,1.0],[6541402,1.0],[15855538,1.0],[126169000,1.0],[24044209,1.0],[80700514,1.0],[21500333,1.0],[18431316,1.0],[44496963,1.0],[68475722,1.0],[15202472,1.0],[19329393,1.0],[39706174,1.0],[22464533,1.0],[81945172,1.0],[22101236,1.0],[19140282,1.0],[31206614,1.0],[15429857,1.0],[27711339,1.0],[14939981,1.0],[62591681,1.0],[52551600,1.0],[40359919,1.0],[27828234,1.0],[21414413,1.0],[156132825,1.0],[21586867,1.0],[23456995,1.0],[25434201,1.0],[30107143,1.0],[34441838,1.0],[37908934,1.0],[47010618,1.0],[139903189,1.0],[17833574,1.0],[758608,1.0],[15823236,1.0],[37006875,1.0],[10302152,1.0],[40416155,1.0],[21813730,1.0],[18785600,1.0],[30715906,1.0],[428333,1.0],[22059385,1.0],[15155074,1.0],[11061902,1.0],[1177521,1.0],[20449160,1.0],[197117628,1.0],[42423692,1.0],[24963961,1.0],[19637934,1.0],[35960001,1.0],[43269420,1.0],[43283406,1.0],[20269113,1.0],[59409413,1.0],[25548759,1.0],[23779324,1.0],[21449197,1.0],[14327149,1.0],[15429316,1.0],[16159485,1.0],[18785846,1.0],[67651295,1.0],[28389815,1.0],[19780922,1.0],[23841181,1.0],[78391198,1.0],[60765383,1.0],[37689397,1.0],[6447142,1.0],[31332871,1.0],[30364057,1.0],[14120151,1.0],[16303064,1.0],[23023236,1.0],[103610974,1.0],[108382988,1.0],[19791811,1.0],[17121755,1.0],[46346811,1.0],[45618045,1.0],[25587721,1.0],[25362775,1.0],[20710218,1.0],[20223138,1.0],[21035409,1.0],[101894425,1.0],[38314814,1.0],[24582667,1.0],[21181713,1.0],[15901190,1.0],[18197299,1.0],[38802447,1.0],[19668592,1.0],[14515734,1.0],[16870853,1.0],[16488614,1.0],[95955871,1.0],[14780915,
1.0],[21188490,1.0],[24243022,1.0],[27150723,1.0],[29425265,1.0],[36370563,1.0],[36528126,1.0],[43789332,1.0],[82773533,1.0],[19726043,1.0],[20888549,1.0],[30271564,1.0],[14874125,1.0],[121436823,1.0],[56405314,1.0],[46954727,1.0],[25675498,1.0],[12803352,1.0],[23888081,1.0],[18498684,1.0],[38536306,1.0],[22851295,1.0],[20140595,1.0],[22311506,1.0],[31121729,1.0],[53717630,1.0],[100101137,1.0],[24753205,1.0],[24523660,1.0],[19544133,1.0],[20823773,1.0],[22677790,1.0],[15227791,1.0],[57525419,1.0],[28562317,1.0],[9629222,1.0],[24047612,1.0],[30508215,1.0],[59084417,1.0],[71088774,1.0],[142157505,1.0],[15284851,1.0],[17164788,1.0],[17885166,1.0],[18420140,1.0],[19695929,1.0],[20572844,1.0],[23479429,1.0],[26642006,1.0],[43469093,1.0],[50835878,1.0],[172049453,1.0],[20604508,1.0],[21681591,1.0],[20052907,1.0],[21271938,1.0],[17842661,1.0],[6365162,1.0],[18130749,1.0],[19249062,1.0],[24193336,1.0],[25913173,1.0],[28647246,1.0],[26072121,1.0],[14522546,1.0],[16409683,1.0],[18785475,1.0],[28969818,1.0],[52757166,1.0],[7120172,1.0],[112237392,1.0],[116779546,1.0],[57107167,1.0],[26347170,1.0],[26565946,1.0],[44409004,1.0],[21105244,1.0],[14230524,1.0],[44711134,1.0],[101753075,1.0],[783214,1.0],[22885110,1.0],[39367703,1.0],[23042739,1.0],[682903,1.0],[38082423,1.0],[16194263,1.0],[2425151,1.0],[52544275,1.0],[21380763,1.0],[18948541,1.0],[34954261,1.0],[34848331,1.0],[29245563,1.0],[19499974,1.0],[16089776,1.0],[77040291,1.0],[18197476,1.0],[1704551,1.0],[15002838,1.0],[17428652,1.0],[20702626,1.0],[29049111,1.0],[34004383,1.0],[34900333,1.0],[48156959,1.0],[50906836,1.0],[15742480,1.0],[41073372,1.0],[37338814,1.0],[1344951,1.0],[8320242,1.0],[14719153,1.0],[20822636,1.0],[168841922,1.0],[19877186,1.0],[14681605,1.0],[15033883,1.0],[23121582,1.0],[23670204,1.0],[41466869,1.0],[18753325,1.0],[21358050,1.0],[78132538,1.0],[132386271,1.0],[86194654,1.0],[17225211,1.0],[107179714,1.0],[18785430,1.0],[19408059,1.0],[19671129,1.0],[24347716,1.0],[24444592,1.0],[25873045,1.0],[
7871252,1.0],[14138300,1.0],[16873300,1.0],[14546496,1.0],[165964253,1.0],[15529287,1.0],[95956928,1.0],[19404587,1.0],[21506437,1.0],[22832029,1.0],[19542638,1.0],[30827536,1.0],[5748622,1.0],[22757990,1.0],[41259253,1.0],[23738945,1.0],[19030602,1.0],[21410102,1.0],[28206360,1.0],[136411179,1.0],[17499805,1.0],[26107245,1.0],[127311408,1.0],[77023233,1.0],[20448733,1.0],[20683840,1.0],[22482597,1.0],[15485441,1.0],[28220280,1.0],[55351351,1.0],[70942325,1.0],[9763482,1.0],[15732001,1.0],[27750488,1.0],[18286352,1.0],[122216533,1.0],[19562228,1.0],[5380672,1.0],[22293700,1.0],[59974874,1.0],[44455025,1.0],[90420314,1.0],[22657153,1.0],[16660662,1.0],[14583400,1.0],[16689545,1.0],[94242867,1.0],[44527648,1.0],[40366319,1.0],[33616007,1.0],[23438958,1.0],[15317676,1.0],[14075928,1.0],[1978331,1.0],[33347901,1.0],[16570090,1.0],[32347966,1.0],[26671992,1.0],[101907019,1.0],[24986014,1.0],[23235056,1.0],[40001164,1.0],[21891032,1.0],[18139329,1.0],[9648652,1.0],[16105942,1.0],[3004231,1.0],[20762929,1.0],[28061932,1.0],[39513172,1.0],[15012305,1.0],[18349404,1.0],[22196210,1.0],[110509537,1.0],[20318494,1.0],[21816984,1.0],[22456686,1.0],[62290422,1.0],[93472506,0.8660254],[52305889,0.70710677],[67337055,0.70710677],[122768292,0.5],[35060854,0.5],[43289205,0.5],[87271142,0.5],[28096898,0.5],[79297090,0.5],[24016107,0.5],[48736472,0.5],[109982897,0.5],[98367357,0.5],[21816847,0.5],[73129588,0.5],[23807734,0.5],[76724998,0.5],[63153228,0.5],[21628966,0.5],[14465428,0.5],[42609851,0.5],[30213342,0.5],[17021966,0.5],[96616361,0.5],[97546740,0.5],[67613930,0.5],[21234391,0.5],[87245558,0.5],[36841912,0.5]]} 

I will need to search this data structure. What would be the most suitable data structure for this? I have heard Redis recommended. Is it worth looking at instead of the traditional Python data structures? If not, please suggest other mechanisms.

Edit

The value field is a list of lists. In most cases, the outer list can contain up to 1000 inner lists, each of size 2.

+6
3 answers

Redis would be suitable if ...

  • You want to share a queue between multiple processes or instances of your application.
  • You want the data to be persistent, so that if your application goes down, it can pick up where it left off.
  • You need a quick, easy solution.
  • Memory usage is a problem.

I'm not sure about the last point, but I assume that a dict or other collection type in Python is likely to have a larger memory footprint than storing all your keys and values in a single Redis hash.

Update

I tested memory usage by storing the example array a million times, both in memory and in Redis. Storing all the values in a Redis hash requires serializing the array. I chose JSON serialization, but a more efficient binary format, which Redis also supports, could be used instead.

  • 1 million copies of the provided array in a Ruby Hash (which should be comparable to a Python dict), indexed by integer keys like the one shown. Memory usage increased by ~350 MB (similar to the Python result from @gnibbler).
  • 1 million copies of the array, each serialized to a JSON string and stored in a Redis hash indexed by the same integers. Memory usage increased by ~250 MB.

Both were very fast, and Redis was a little faster than the native collection when I measured 10,000 random lookups against each. I know this is not Python, but it should at least be illustrative.

In addition, to address the OP's other concern, Redis has no problem handling very large string values. Its maximum string size is currently 512 MB.
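
As a rough illustration of the serialization trade-off described above, here is a minimal sketch that encodes one value both as JSON and as a packed binary string and checks the result against the 512 MB limit. The sample `pairs`, the helper names, and the `<Id` record layout (unsigned 32-bit id plus 64-bit float) are assumptions made for this example, not part of the original test.

```python
import json
import struct

# Hypothetical value shaped like the question's data: 1000 [id, score] pairs.
pairs = [[36520193 + i, 1.2] for i in range(1000)]

REDIS_MAX_STRING_BYTES = 512 * 1024 * 1024  # the 512 MB limit mentioned above


def to_json(value):
    """Serialize the list of pairs as compact JSON bytes."""
    return json.dumps(value, separators=(",", ":")).encode("utf-8")


def to_binary(value):
    """Pack each pair as an unsigned 32-bit id plus a 64-bit float.

    Assumes every id fits in 32 bits, as the ids in the sample do.
    """
    return b"".join(struct.pack("<Id", key, score) for key, score in value)


def from_binary(blob):
    """Inverse of to_binary: rebuild the list of [id, score] pairs."""
    return [list(item) for item in struct.iter_unpack("<Id", blob)]


json_blob = to_json(pairs)
binary_blob = to_binary(pairs)

# Print both encoded sizes in bytes for comparison.
print(len(json_blob), len(binary_blob))
print(len(json_blob) < REDIS_MAX_STRING_BYTES)
```

Either byte string could then be stored as a field value in a Redis hash, for example through the `hset` call of the redis-py client. The binary form is more compact here, at the cost of no longer being human-readable.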

+7

This really should not be a problem:

 >>> d = {str(n): list(range(20)) for n in range(1000000)}

It took ~350 MB to create. Of course, your keys and values may be much larger.
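
To check a figure like this on your own machine, one option is the standard-library `tracemalloc` module. The sketch below is only an approximation under stated assumptions: the helper name `dict_memory_bytes` and the reduced key count are chosen for illustration, and the number it reports depends on the Python version and platform.

```python
import tracemalloc


def dict_memory_bytes(n_keys, values_per_key=20):
    """Build a dict like the one above and return (key count, bytes traced)."""
    tracemalloc.start()
    d = {str(n): list(range(values_per_key)) for n in range(n_keys)}
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return len(d), current


# A smaller key count keeps the demo quick; scale n_keys up to 1_000_000
# to approximate the measurement quoted above.
count, used = dict_memory_bytes(10_000)
print(f"{count} keys use about {used / 1e6:.1f} MB")
```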

+2

I looked at memory usage in NumPy as well as Redis.

Firstly, NumPy:

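 >>> import numpy as NP
 >>> # sizes must be integers: recent NumPy rejects floats such as 1e6
 >>> K = NP.random.randint(1000, 9999, 1000000)
 >>> V = 5 * NP.random.rand(2000000).reshape(-1, 2)
 >>> kv = K.nbytes + V.nbytes
 >>> '{:15,d}'.format(kv)
 '     24,000,000'   # 24 MB, assuming 8-byte integer keys and 8-byte floats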

Now Redis:

I represented the values as strings, which should be very efficient in Redis.

 >>> from redis import Redis   # using the Python client for Redis
 >>> # with a server already running:
 >>> r0 = Redis(db=0)
 >>> for i in range(K.shape[0]):
 ...     v = ' '.join(NP.array(V[i], dtype=str).tolist())
 ...     r0.set(int(K[i]), v)   # convert the NumPy integer to a plain int key
 ...
 >>> # save the db to disk, then shut down the server
 >>> r0.shutdown()

Redis database (.rdb file) 2.9 MB

Of course, this is not an apples-to-apples comparison, because I chose what I considered the most natural model for representing the OP's data in each library: strings for Redis and 2-element arrays for NumPy.
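
Neither layout above reflects the upper bound given in the question's edit (up to 1000 pairs per key). A back-of-envelope sketch of the raw payload at that scale follows, assuming 4-byte ids and 4-byte floats and ignoring all container and index overhead. The function name and the byte widths are assumptions for illustration only.

```python
def estimate_bytes(n_keys, pairs_per_key, id_bytes=4, score_bytes=4):
    """Raw payload size, ignoring any container or index overhead."""
    return n_keys * pairs_per_key * (id_bytes + score_bytes)


# Hypothetical worst case from the question: 1 million keys, 1000 pairs each.
worst_case = estimate_bytes(1_000_000, 1000)
print(f"{worst_case / 1e9:.1f} GB")  # 8.0 GB
```

If most keys really carry close to 1000 pairs, the data may not fit comfortably in memory in any of the representations discussed, which is worth knowing before choosing between them.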

+2

Source: https://habr.com/ru/post/906375/

