I am building a searchable catalog of sports tournaments on GAE, with web2py and a Flex front end. The user selects a location, radius, and maximum date from a set of options. I have a basic version of this request working, but it is inefficient and slow. One improvement I know I can make is to condense the many individual queries I currently use to assemble the objects into batch queries; I only just found out that this is possible. But I am also considering a more extensive redesign that uses memcache.
The main problem is that I cannot query the datastore by location, because GAE does not allow inequality comparison operators (<, <=, >=, >) on more than one property in a single query. I already use one for the date, and I would need TWO more to check both latitude and longitude, so that is not an option. At the moment, my algorithm is as follows:
1.) Query the datastore by date and fetch the results
2.) Use the destination function from the geographic distance module to find the maximum and minimum latitude and longitude for the given radius
3.) Loop through the results and discard everything with lat/lng outside the max/min bounding box
4.) Loop again and use the distance function to check the exact distance, since the box from step 2 includes some area outside the radius; discard anything whose distance falls outside it. (Is this 2/3/4 combination inefficient?)
5.) Gather the many-to-many lists and attach them to the objects (this is where I need to switch to batch operations)
6.) Return to the client
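Steps 2-4 above can be sketched as follows. This is a minimal, self-contained sketch: the event dicts and helper names are hypothetical, and the bounding-box math uses a simple spherical approximation rather than whatever your geo module provides. Note that the box check and the exact check can run in a single pass, with the cheap box comparison short-circuiting the trigonometry for most far-away points, so 2/3/4 need not mean two full loops.

```python
import math

EARTH_RADIUS_KM = 6371.0

def bounding_box(lat, lng, radius_km):
    """Step 2: max/min lat/lng bounds for the given radius (spherical approximation)."""
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    dlng = math.degrees(radius_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return lat - dlat, lat + dlat, lng - dlng, lng + dlng

def haversine_km(lat1, lng1, lat2, lng2):
    """Step 4: exact great-circle distance between two points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def filter_by_radius(events, lat, lng, radius_km):
    """Steps 3-4 in one pass: cheap box prefilter, then exact distance check."""
    min_lat, max_lat, min_lng, max_lng = bounding_box(lat, lng, radius_km)
    kept = []
    for e in events:
        if not (min_lat <= e['lat'] <= max_lat and min_lng <= e['lng'] <= max_lng):
            continue  # outside the bounding box, skip the trig entirely
        if haversine_km(lat, lng, e['lat'], e['lng']) <= radius_km:
            kept.append(e)
    return kept
```

The box is deliberately a superset of the circle, so the exact check only has to reject the "corners" the box lets through.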
Here is my plan for using memcache... let me know if I am out in left field on this, since I have no prior experience with memcache or caching servers in general.
- Keep a list in the cache of "geo objects" that represent all my data. They have five properties: latitude, longitude, event_id, event_type (pending expansion beyond tournaments), and start date. This list will be sorted by date.
- Also keep a pointer list in the cache: the start and end indexes into the cached list for every date range my application uses (next week, 2 weeks, month, 3 months, 6 months, year, 2 years).
- A scheduled task that updates the pointers daily at 12:00.
- Add new inserts to the cache as well as the datastore, and update the pointers.
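The cached list plus date-range pointers could be built like this. A hedged sketch under stated assumptions: the geo-object dicts, the range labels, and the day counts are illustrative, and the real version would put `geo_list` and `pointers` into memcache rather than returning them. Because the list is sorted by date, `bisect` gives each pointer pair in O(log n).

```python
import bisect
from datetime import date, timedelta
from operator import itemgetter

# Date ranges offered by the app, in days (labels/values are assumptions).
RANGES = {'week': 7, '2weeks': 14, 'month': 30, '3months': 91,
          '6months': 182, 'year': 365, '2years': 730}

def build_cache(geo_objects, today):
    """Sort the cached geo-object list by date and compute the range pointers.
    This is what the daily scheduled task (and each insert) would refresh."""
    geo_list = sorted(geo_objects, key=itemgetter('start_date'))
    dates = [g['start_date'] for g in geo_list]
    start = bisect.bisect_left(dates, today)  # skip events already in the past
    pointers = {}
    for label, days in RANGES.items():
        end = bisect.bisect_right(dates, today + timedelta(days=days))
        pointers[label] = (start, end)
    # In the real app: memcache.set('geo_list', geo_list); memcache.set('pointers', pointers)
    return geo_list, pointers

def slice_for_range(geo_list, pointers, label):
    """New step 1: slice the cached list for the requested date range."""
    start, end = pointers[label]
    return geo_list[start:end]
```

One design note: recomputing the pointers is cheap next to rebuilding the list itself, so refreshing them on every insert (as in the last bullet) should not hurt.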
Using this design, the algorithm will now look like this:
1.) Use the pointers to slice out the relevant portion of the list based on the submitted date
2-4.) Same as above, except operating on the geo objects
5.) Use a batch get to fetch the full tournaments for the remaining geo objects' event_ids
6.) Gather the many-to-manys
7.) Return to the client
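For the new step 5, the point is one bulk lookup instead of a get per event_id. A hedged sketch: on GAE, `db.get()` accepts a list of keys and fetches them in a single RPC, but the `Tournament` model name is an assumption, so the testable part below stubs the datastore out as a dict.

```python
# On GAE the batch fetch would look roughly like:
#   from google.appengine.ext import db
#   keys = [db.Key.from_path('Tournament', g['event_id']) for g in geo_objects]
#   tournaments = db.get(keys)   # one RPC for all keys; missing keys come back as None
#
# The generic pattern, with the datastore replaced by a dict for illustration:

def batch_fetch(store, geo_objects):
    """One bulk lookup for all surviving event_ids, preserving their order."""
    ids = [g['event_id'] for g in geo_objects]
    entities = [store.get(i) for i in ids]  # stand-in for a single db.get(keys)
    return [e for e in entities if e is not None]
```

The same trick applies to step 6: collect all the many-to-many keys first, then fetch them in one batch rather than per tournament.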
Thoughts on this approach? Thanks so much for reading and any advice you can give.
-Dane