Why does Python crash when I try to sum this numpy array?

I am working on Ubuntu 14.04 with Python 3.4 (Numpy 1.9.2 and PIL.Image 1.1.7). This is what I'm doing:

    >>> from PIL import Image
    >>> import numpy as np
    >>> img = Image.open("./tifs/18015.pdf_001.tif")
    >>> arr = np.asarray(img)
    >>> np.shape(arr)
    (5847, 4133)
    >>> arr.dtype
    dtype('bool')

    # all of the following four cases where I incrementally increase
    # the number of rows to 700 are done instantly
    >>> v = arr[1:100,1:100].sum(axis=0)
    >>> v = arr[1:500,1:100].sum(axis=0)
    >>> v = arr[1:600,1:100].sum(axis=0)
    >>> v = arr[1:700,1:100].sum(axis=0)

    # but suddenly this line makes Python crash
    >>> v = arr[1:800,1:100].sum(axis=0)
    fish: Job 1, "python3" terminated by signal SIGSEGV (Address boundary error)

It seems to me that Python suddenly runs out of memory. If so, how can I allocate more memory for Python? As far as I can see in htop, the 32 GB of memory is nowhere near exhausted.

You can download the TIFF image here.


If I create an empty boolean array, set each pixel explicitly, and then apply the summation, it works:

    >>> w, h = img.size  # PIL reports size as (width, height)
    >>> arr = np.empty((h,w), dtype=bool)
    >>> arr.setflags(write=True)
    >>> for r in range(h):
    ...     for c in range(w):
    ...         arr.itemset((r,c), img.getpixel((c,r)))
    ...
    >>> v=arr.sum(axis=0)
    >>> v.mean()
    5726.8618436970719
    >>> arr.shape
    (5847, 4133)

But this "workaround" is not very satisfactory, since copying each pixel takes too long. Is there a faster method?

1 answer

I can reproduce the segfault using numpy v1.8.2 / PIL v1.1.7 installed from the Ubuntu repositories.

  • If I install numpy 1.8.2 in a virtualenv using pip (still using PIL v1.1.7 from the Ubuntu repositories), I no longer see the segfault.

  • If I do the opposite (installing PIL v1.1.7 using pip and using numpy v1.8.2 from the Ubuntu repositories), I still get the segfault.

This makes me think that this is caused by an old numpy bug. I could not find a good candidate in the numpy issue tracker, but I suspect that updating numpy (for example, from the current source or via pip) will probably solve the problem.
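Since the result depends on which numpy build gets imported, it may help to confirm what the interpreter actually loads. The following is only a rough sketch; `describe_module` is a hypothetical helper name, not part of numpy or PIL, and very old PIL releases may not expose `__version__` at all (in which case it reports "unknown").

```python
import importlib


def describe_module(name):
    """Return (version, path) for an importable module, or None if it is missing."""
    try:
        mod = importlib.import_module(name)
    except ImportError:
        return None
    # Not every package defines __version__ or __file__, so fall back gracefully.
    version = getattr(mod, "__version__", "unknown")
    path = getattr(mod, "__file__", "built-in")
    return version, path


for name in ("numpy", "PIL"):
    print(name, describe_module(name))
```

On Ubuntu, a path under `dist-packages` usually points to the distribution package, while a path inside the virtualenv's `site-packages` points to the pip install.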

One workaround is to convert the image mode to "P" (unsigned 8-bit ints) before creating the array, and then convert the array to boolean:

    # plain bool instead of np.bool, which newer numpy releases no longer provide
    arr2 = np.asarray(img.convert("P")).astype(bool)
    v = arr2[1:800,1:100].sum(axis=0)
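For anyone who wants to try the idea without the original TIFF, here is a minimal self-contained sketch. It makes a few assumptions: the image is a tiny synthetic 1-bit image built in memory, the intermediate mode is "L" (8-bit grayscale) rather than "P", and `bilevel_to_bool` is a hypothetical helper name chosen for illustration.

```python
import numpy as np
from PIL import Image


def bilevel_to_bool(img):
    """Convert a PIL image to a 2-D boolean array via an 8-bit intermediate."""
    as_uint8 = np.asarray(img.convert("L"))  # shape (height, width), dtype uint8
    return as_uint8.astype(bool)             # non-zero pixels become True


# Synthetic 1-bit image, 4 pixels wide and 3 pixels high, standing in for the TIFF.
demo = Image.new("1", (4, 3), 0)
for xy in [(0, 0), (0, 1), (3, 2)]:  # (x, y) coordinates of the "on" pixels
    demo.putpixel(xy, 255)

mask = bilevel_to_bool(demo)
column_sums = mask.sum(axis=0)  # number of "on" pixels per column
print(mask.shape, column_sums)
```

The conversion runs over the whole image at once, so it avoids the per-pixel Python loop from the question.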

Source: https://habr.com/ru/post/983928/

