See the important clarification at the bottom of this question.
I am using numpy to speed up the processing of longitude/latitude coordinates. Unfortunately, my numpy “optimization” made my code run about 5x slower than it ran without numpy.
The bottleneck seems to be filling the numpy array with my data, and then retrieving that data after I have done the mathematical transformations. To populate the array, I basically have a loop like:
    point_list = GetMyPoints()  # returns a list of (lon, lat) tuples
    n = len(point_list)
    point_buffer = numpy.empty((n, 2), numpy.float32)
    for index in range(n):
        point_buffer[index] = point_list[index]
This loop, just filling the numpy array before I even operate on it, is extremely slow, much slower than the entire computation was without numpy. (That is, it is not just the slowness of the python loop itself; there is apparently some huge overhead in actually transferring each small piece of data from python into numpy.) There is similar slowness on the other end: after I have processed the numpy arrays, I access each modified coordinate pair in a loop, again as
    for index in range(n):
        some_python_tuple = point_buffer[index]
Again, this loop to pull the data back out is much slower than the entire original computation without numpy. So how do I actually populate a numpy array and retrieve data from it in a way that doesn't defeat the purpose of using numpy in the first place?
I am reading the data from a shapefile using a C library, which hands me the data as a regular python list. I understand that if the library handed me the coordinates already in a numpy array, there would be no “filling” of the numpy array to do. But unfortunately, my starting point for the data is a regular python list. And more to the point, in general, I want to understand how to quickly populate a numpy array with data from python.
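For reference, the bulk route I have since learned about (see the clarification below) looks something like this; a minimal sketch, assuming a flat list of (lon, lat) pairs:

    import numpy

    point_list = [(1.0, 2.0), (3.0, 4.0)]  # stand-in for what GetMyPoints() returns

    # one bulk conversion in each direction, no per-element python loop
    point_buffer = numpy.array(point_list, numpy.float32)
    back_to_python = point_buffer.tolist()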
Clarification
The loop shown above is actually idealized. I wrote it that way in this question because I wanted to focus on the problem I was seeing of slowly filling a numpy array in a loop. I now understand that doing it that way is simply slow.
In my actual application, I have a shapefile of coordinate points, and I have an API to extract the points for a given object. There are something like 200,000 objects. So I repeatedly call a function GetShapeCoords(i) to get the coordinates for object i. This returns a list of lists, where each sublist is a list of lon/lat pairs; the reason it is a list of lists is that some of the objects are multi-part (i.e., multi-polygon). In my original code, as I read in each object, I did the transformation on each point by calling a regular python function, and then drew the transformed points using PIL. The whole thing took about 20 seconds to draw all 200,000 polygons. Not terrible, but plenty of room for improvement. I noticed that at least half of those 20 seconds was spent doing the transformation logic, so I thought I would do that in numpy. And my initial implementation was to read the objects in one at a time, and keep appending all the points from the sublists into one big numpy array, on which I could then do the math in numpy.
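For concreteness, the original per-object flow was roughly the following (the stub, transform, and drawing calls are simplified stand-ins for the real code):

    from PIL import Image, ImageDraw

    def GetShapeCoords(i):  # stub for the real shapefile API described above
        return [[(10.0 + i, 20.0), (30.0, 40.0), (30.0, 20.0)]]

    def transform(lon, lat):  # stand-in for the real conversion logic
        return lon * 0.5, lat

    image = Image.new("RGB", (1024, 768))
    draw = ImageDraw.Draw(image)

    for i in range(3):  # ~200,000 objects in the real data
        for part in GetShapeCoords(i):  # each part is a list of (lon, lat) pairs
            points = [transform(lon, lat) for lon, lat in part]
            draw.polygon(points)  # PIL draws one polygon per part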
So now I understand that passing the complete python list to numpy in one go is the right way to set up a big array. But in my case, I read only one object at a time. So one thing I could do is keep appending the points into a big python list of lists of lists. And then, once I have accumulated the points for some large number of objects that way (say, 10,000 objects), I could simply assign that monster list to numpy.
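In code, what I have in mind is something like the following, except that here I flatten the points as I append them, which sidesteps the ragged-shape issue in part (a) below (the batch size and names are placeholders):

    import numpy

    batch = []  # flat list of (lon, lat) pairs accumulated across objects
    for i in range(10000):  # one batch of objects
        for part in GetShapeCoords(i):
            batch.extend(part)

    points = numpy.array(batch, numpy.float64)  # single bulk conversion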
So now my question has three parts:
(a) Is it true that numpy can take that large, irregularly shaped list of lists of lists, and slurp it up both correctly and quickly?
(b) I then want to be able to transform all the points at the leaves of that monster tree. What is the expression that tells numpy, for instance, "go into each sublist, and then into each sub-sublist, and then for each coordinate pair you find in those sub-sublists, multiply the first element (the lon coordinate) by 0.5"? Can I do that? (For the flat case, see the sketch after this list.)
(c) Finally, I need to get those transformed coordinates back out in order to draw them.
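For the flat case at least, I believe the vectorized expression I am after is something like this (a sketch, assuming the pairs are already in an (N, 2) array):

    import numpy

    points = numpy.array([(10.0, 20.0), (30.0, 40.0)], numpy.float64)
    points[:, 0] *= 0.5  # scale every lon (the first column) in place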
Winston's answer below shows how I can do this using itertools. What I want to do is very similar to what Winston does, flattening the list. But I can't just flatten it: when I go to draw the data, I need to know when one polygon stops and the next starts. So I think I could make it work if there were a way to quickly mark the end of each polygon (i.e., each sub-sublist) with a special coordinate pair, like (-1000, -1000) or something. Then I could flatten with itertools as in Winston's answer, do the transformation in numpy, and then draw from point to point using PIL. For that last step, I think I would need to reassign the modified numpy array back to a python list, and then iterate over that list in a regular python loop to do the drawing. Does that seem like my best option, short of just writing a C module to handle all the reading and drawing for me in one step?
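To make the sentinel idea concrete, here is a minimal sketch of what I am imagining (the sentinel value, helper name, and transform are all placeholders):

    import itertools
    import numpy

    SENTINEL = (-1000.0, -1000.0)  # assumed to lie outside the real coordinate range

    def flatten_with_sentinels(polygons):
        # append the sentinel pair after each polygon, then chain everything flat
        return list(itertools.chain.from_iterable(
            itertools.chain(polygon, [SENTINEL]) for polygon in polygons))

    polygons = [[(1.0, 2.0), (3.0, 4.0)], [(5.0, 6.0)]]
    points = numpy.array(flatten_with_sentinels(polygons), numpy.float64)

    real = points[:, 0] != SENTINEL[0]  # rows that are actual coordinates
    points[real, 0] *= 0.5              # transform only the real lon values

    # walk the flat array, cutting it back into per-polygon point lists for drawing
    result, current = [], []
    for lon, lat in points:
        if (lon, lat) == SENTINEL:
            result.append(current)
            current = []
        else:
            current.append((lon, lat))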