Transforming a nested list without copying or losing precision

I'm using Mathematica 7 to process a large data set. The data set is a three-dimensional array of signed integers. The three levels can be thought of as corresponding to X points per shot, Y shots per scan, and Z scans per set.

I also have a “zeroing” shot (containing X points, which are signed fractions of integers) that I would like to subtract from every shot in the data set. Afterwards, I will no longer need the original data set.

How can I perform this transformation without creating new copies of the data set, or parts of it, in the process? Conceptually, the data set sits in memory, and I would like to scan through each element and change it at that location in memory, without permanently copying it to some other memory location.

The following self-contained code captures all aspects of what I'm trying to do:

    (* Create some offsetted data, and a zero data set. *)
    myData = Table[Table[Table[
        RandomInteger[{1, 100}], {k, 500}], {j, 400}], {i, 200}];
    myZero = Table[RandomInteger[{1, 9}]/RandomInteger[{1, 9}] + 50, {i, 500}];

    (* Method 1 *)
    myData = Table[
       f1 = myData[[i]];
       Table[
        f2 = f1[[j]];
        f2 - myZero, {j, 400}], {i, 200}];

    (* Method 2 *)
    Do[
     Do[
      myData[[i]][[j]] = myData[[i]][[j]] - myZero, {j, 400}], {i, 200}]

    (* Method 3 *)
    Attributes[Zeroing] = {HoldFirst};
    Zeroing[x_] := Module[{},
       Do[
        Do[
         x[[i]][[j]] = x[[i]][[j]] - myZero,
         {j, Length[x[[1]]]}],
        {i, Length[x]}]];

(Hat tip to Aaron Honecker for Method 3.)

On my machine (3.17 GHz Intel Core2 Duo, 4 GB RAM, 32-bit Windows 7), all three methods use roughly 1.25 GB of memory, with #2 and #3 faring slightly better.

If I don't mind losing some precision, wrapping N[ ] around the innards of myData and myZero when they are created increases their size in memory by 150 MB up front, but reduces the amount of memory required for zeroing (via methods #1-#3 above) from 1.25 GB to 300 MB! This is my working solution, but it would be great to know the best way to handle this problem.
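For clarity, here is a minimal sketch of what I mean by wrapping N[ ] around the innards (the shapes match the code above; this is just an illustration, not the measured code):

    (* Same shapes as above, but stored as machine reals via N[ ] *)
    myData = N@Table[RandomInteger[{1, 100}], {i, 200}, {j, 400}, {k, 500}];
    myZero = N@Table[RandomInteger[{1, 9}]/RandomInteger[{1, 9}] + 50, {i, 500}];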

2 answers

Unfortunately, I don't have much time, so I have to be brief ...

When working with large data, you need to be aware that Mathematica has a special storage format called packed arrays, which is much more compact and much faster than the ordinary one, but only works for machine reals or machine integers.

Evaluate ?Developer`*Packed* to see what functions are available for converting directly to/from packed arrays, in case this doesn't happen automatically.
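As a small illustration (just a sanity check, separate from the timing below), you can list these utilities and test or force packing by hand:

    (* List the packed-array utilities, then check / force packing explicitly *)
    ?Developer`*Packed*
    Developer`PackedArrayQ[N@Range[10]]       (* True: machine reals pack automatically *)
    Developer`ToPackedArray[{1.0, 2.0, 3.0}]  (* explicit conversion to a packed array *)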

So, the short explanation of why my solution works quickly and memory-efficiently is that it uses packed arrays. I checked with Developer`PackedArrayQ that my arrays never get unpacked, and I used machine reals (I applied N[] to everything):

    In[1]:= myData = N@RandomInteger[{1, 100}, {200, 400, 500}];

    In[2]:= myZero = 
      Developer`ToPackedArray@
       N@Table[RandomInteger[{1, 9}]/RandomInteger[{1, 9}] + 50, {i, 500}];

    In[3]:= myData = Map[# - myZero &, myData, {2}]; // Timing

    Out[3]= {1.516, Null}

Also, the operation you were asking for (“I would like to scan through each element and change it at that location in memory”) is called mapping (see Map[] or /@).
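In its simplest form, the same pattern looks like this (a toy example with made-up names, just to show the two notations):

    (* Map written out both ways: Map[f, list] and f /@ list *)
    offset = {10., 20., 30.};
    frame  = {{11., 22., 33.}, {14., 25., 36.}};
    Map[# - offset &, frame]       (* subtract offset from every row *)
    (# - offset &) /@ frame        (* the same, using the /@ shorthand *)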


Let me start by saying that this answer should be viewed as complementary to the one by @Szabolcs, and that his is, in my opinion, the best option. While @Szabolcs's solution is probably the fastest and best overall, it falls somewhat short of the original specification, because Map returns a (modified) copy of the original list rather than “scanning each element and changing it at that location in memory”; that behavior, AFAIK, is provided only by the Part command. I will use his ideas (converting everything into packed arrays) to show code that makes the changes in memory, in the original list:

    In[5]:= Do[myData[[All, All, i]] = myData[[All, All, i]] - myZero[[i]], 
       {i, Last@Dimensions@myData}]; // Timing

    Out[5]= {4.734, Null}

This is conceptually equivalent to Method 3 mentioned in the question, but it runs much faster because it is a partly vectorized solution and only a single loop is needed. It is, however, still at least an order of magnitude slower than @Szabolcs's solution.

In theory, this looks like a classic speed/memory trade-off: if you need speed and have memory to spare, @Szabolcs's solution is the way to go. If your memory requirements are tight, this slower method should in theory save on intermediate memory consumption (in @Szabolcs's method the original list is garbage-collected after myData is assigned the result of Map, so the final memory usage is the same, but during the computation one extra array of the size of myData is maintained by Map).

In practice, however, the memory consumption is apparently no smaller, because an extra copy of the list is for some reason kept in the Out variable in both cases during (or immediately after) the computation, even when the output is suppressed (it may also be that this effect does not show up in all cases). I don't fully understand this yet, but my current conclusion is that @Szabolcs's method is as good in terms of intermediate memory consumption as a genuinely in-place one based on list modifications. His method therefore seems the way to go in all cases, but I decided to post this answer as a complement.
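For anyone who wants to check this on their own machine, here is a rough sketch of how one might measure it (the numbers are machine-dependent, and $HistoryLength controls how many Out values the kernel keeps):

    (* Rough memory measurement around the zeroing step *)
    $HistoryLength = 0;                (* keep the kernel from retaining copies in Out *)
    before = MemoryInUse[];
    myData = Map[# - myZero &, myData, {2}];
    {MaxMemoryUsed[] - before, MemoryInUse[] - before}  (* approximate peak vs. final extra bytes *)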

