Distributed memory error

I got the following error on the scheduler while running a Dask distributed job:

distributed.core - ERROR -
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/distributed/core.py", line 269, in write
    frames = protocol.dumps(msg)
  File "/usr/local/lib/python3.4/dist-packages/distributed/protocol.py", line 81, in dumps
    frames = dumps_msgpack(small)
  File "/usr/local/lib/python3.4/dist-packages/distributed/protocol.py", line 153, in dumps_msgpack
    payload = msgpack.dumps(msg, use_bin_type=True)
  File "/usr/local/lib/python3.4/dist-packages/msgpack/__init__.py", line 47, in packb
    return Packer(**kwargs).pack(o)
  File "msgpack/_packer.pyx", line 231, in msgpack._packer.Packer.pack (msgpack/_packer.cpp:231)
  File "msgpack/_packer.pyx", line 239, in msgpack._packer.Packer.pack (msgpack/_packer.cpp:239)
MemoryError

Are you running out of memory in the scheduler or on one of the workers? Or both?

1 answer

The most common cause of this error is trying to collect too much data back into a single process, as in the following dask.dataframe example:

import dask.dataframe as dd

df = dd.read_csv('s3://bucket/lots-of-data-*.csv')
df.compute()  # pulls the entire result back into the local process

This loads all of the data into distributed memory across the cluster (which is fine), and then tries to pull the entire result back to your local machine, which probably can't hold your ~100 GB of data in one place. That result is funneled back through the scheduler, which is why the MemoryError shows up there.
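If you are not sure how large the collected result would be, one option is to estimate it on the cluster first. A minimal sketch, assuming a dask.dataframe version that provides memory_usage (the S3 path is just the one from the example above):

import dask.dataframe as dd

df = dd.read_csv('s3://bucket/lots-of-data-*.csv')

# memory_usage() produces a small per-column result, so this compute() is
# cheap compared to collecting the whole dataframe.
nbytes = df.memory_usage(deep=True, index=True).sum().compute()
print("collecting df would move roughly %.1f GB to the local process" % (nbytes / 1e9))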

If you want to load the data but keep it in distributed memory on the cluster, use the Executor.persist method instead of compute:

# e is the distributed Executor (client) already connected to the scheduler
df = dd.read_csv('s3://bucket/lots-of-data-*.csv')
df = e.persist(df)  # data stays in the workers' memory; df now refers to in-cluster results

Call df.compute() only on results that are small enough to fit comfortably in the memory of a single machine.
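A minimal sketch of that pattern, assuming hypothetical column names amount and category that are not in the original question, and a placeholder scheduler address: reduce on the cluster, then compute only the small result.

import dask.dataframe as dd
from distributed import Executor

e = Executor('scheduler-address:8786')  # placeholder address
df = dd.read_csv('s3://bucket/lots-of-data-*.csv')
df = e.persist(df)  # keep the full data in distributed memory

# Reductions shrink the result on the workers, so only a small object
# travels back through the scheduler to the local process.
total = df.amount.sum().compute()                  # a single float
counts = df.groupby('category').size().compute()   # a small pandas Series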


Source: https://habr.com/ru/post/1678460/

