I run Apache Drill 1.0 (and later 1.4) locally on an Ubuntu machine with 16 GB of RAM. When I work with a very large tab-delimited file (52 million lines, 7 GB) and execute
Select distinct columns[0] from `table.tsv`
performance doesn't seem to improve the second time the same query is executed (both runs took 53 seconds). Usually, the second run of the same query takes less than half the time of the first. Drill also doesn't seem to use all of the allocated memory.
My conf/drill-env.sh file looks like this:
DRILL_MAX_DIRECT_MEMORY="14G"
DRILL_HEAP="14G"
export DRILL_JAVA_OPTS="-Xms$DRILL_HEAP -Xmx$DRILL_HEAP -XX:MaxDirectMemorySize=$DRILL_MAX_DIRECT_MEMORY -XX:MaxPermSize=14G -XX:ReservedCodeCacheSize=1G -Ddrill.exec.enable-epoll=true"
I also ran:
alter system set `planner.memory.max_query_memory_per_node`=12884901888
However, when I check memory usage with smem, it only uses about 5 GB of RAM.
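In case it helps with diagnosis, the setting and Drill's own view of its memory can be inspected from within Drill. This is only a sketch: it assumes the `sys.options` and `sys.memory` system tables are available in this Drill version, and I have not included their output here.

```sql
-- Confirm that the ALTER SYSTEM change was actually registered
-- (assumes the sys.options system table exists in this version).
SELECT *
FROM sys.options
WHERE name = 'planner.memory.max_query_memory_per_node';

-- Show heap and direct memory as reported by the Drillbit itself
-- (assumes the sys.memory system table exists in this version).
SELECT *
FROM sys.memory;
```

Comparing these values against the smem figure should show whether the limits were applied and the memory is simply not being used, or whether the settings never took effect.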