WRT a 3 node cluster consisting of c3.2x large instances.
I have two tables. Table U contains about 65 million records and contains among other latitudes and longitudes. Table L has about 1 million records and also contains latitude and longitude.
U is stored as an ORC table.
The challenge is to determine how many U records are within a radius of 10 miles of locations in L.
select l.id, count(u.id) from U u, L l where 3960 * acos(cos(radians(l.lat)) * cos(radians(u.lat)) * cos(radians(l.long) - radians(u.long)) + sin(radians(l.lat)) * sin(radians(u.lat))) < 10.0 group by l.id;
Bit 3960 * acos(cos(radians(l.lat)) * cos(radians(u.lat)) * cos(radians(l.long) - radians(u.long)) + sin(radians(l.lat)) * sin(radians(u.lat))) < 10.0 - it's just the distance between lat / long pairs should be limited to 10 miles.
Problem: It appears that the request is running forever. While the card phase completes relatively quickly, the decline phase is stuck on some fixed percentage (80% ish)
I noticed this in the output messages that Hive emits. Number to reduce tasks defined during compilation: 1
I tried to increase the number of reducers by setting mapred.reduce.tasks to 7, but it always ends as 1. I have not been successfully increasing the number of reducers.
This answer seems to suggest that perhaps if I write my query differently, I can force more than one reducer. But I still could not do it.
Estimates of runtime : for one place in L, it takes about 60 seconds to get an answer. According to this account, it should take 60 million seconds, which is about 700 days! Is it worth so much time? Even for Hadoop.
I also tried to set additional restrictions, such as the lat limit, for a long time in a 10-mile 10-mile square box with the location in L in the center of the box, but the time taken for this is 40 seconds for 1 place, is not a big improvement.
Questions:
1) How can I get more gears? 2) Is there a better (in terms of runtime) request? 3) Any other tips to help me solve this problem.
Version: Hadoop - 2.7.0 Java 1.7.0_80 Hive 1.2.1